CAMBRIDGE STUDIES IN ADVANCED MATHEMATICS 120

Editorial Board: B. Bollobás, W. Fulton, A. Katok, F. Kirwan, P. Sarnak, B. Simon, B. Totaro
MULTIDIMENSIONAL STOCHASTIC PROCESSES AS ROUGH PATHS

Rough path analysis provides a fresh perspective on Itô's important theory of stochastic differential equations. Key theorems of modern stochastic analysis (existence and limit theorems for stochastic flows, Freidlin–Wentzell theory, the Stroock–Varadhan support description) can be obtained with dramatic simplifications. Classical approximation results and their limitations (Wong–Zakai, McShane's counterexample) receive "obvious" rough path explanations. Evidence is building that rough paths will play an important role in the future analysis of stochastic partial differential equations, and the authors include some first results in this direction. They also emphasize interactions with other parts of mathematics, including Caratheodory geometry, Dirichlet forms and Malliavin calculus. Based on successful courses at the graduate level, this up-to-date introduction presents the theory of rough paths and its applications to stochastic analysis. Examples, explanations and exercises make the book accessible to graduate students and researchers from a variety of fields.
Multidimensional Stochastic Processes as Rough Paths
Theory and Applications

PETER K. FRIZ
NICOLAS B. VICTOIR
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521876070

© P. K. Friz and N. B. Victoir 2010

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2010

ISBN-13 978-0-511-68004-5   eBook (EBL)
ISBN-13 978-0-521-87607-0   Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
To Wendy and Laura
Contents

Preface   page xiii
Introduction   1

The story in a nutshell   4
  1 From ordinary to rough differential equations   4
  2 Carnot–Caratheodory geometry   8
  3 Brownian motion and stochastic analysis   13

I   Basics

1 Continuous paths of bounded variation   19
  1.1 Continuous paths on metric spaces   19
  1.2 Continuous paths of bounded variation on metric spaces   21
  1.3 Continuous paths of bounded variation on R^d   29
  1.4 Sobolev spaces of continuous paths of bounded variation   39
  1.5 Comments   44

2 Riemann–Stieltjes integration   45
  2.1 Basic Riemann–Stieltjes integration   45
  2.2 Continuity properties   49
  2.3 Comments   52

3 Ordinary differential equations   53
  3.1 Preliminaries   53
  3.2 Existence   55
  3.3 Uniqueness   59
  3.4 A few consequences of uniqueness   60
  3.5 Continuity of the solution map   62
  3.6 Comments   67

4 ODEs: smoothness   68
  4.1 Smoothness of the solution map   68
  4.2 Comments   76

5 Variation and Hölder spaces   77
  5.1 Hölder and p-variation paths on metric spaces   77
  5.2 Approximations in geodesic spaces   88
  5.3 Hölder and p-variation paths on R^d   92
  5.4 Generalized variation   99
  5.5 Higher-dimensional variation   104
  5.6 Comments   111

6 Young integration   112
  6.1 Young–Lóeve estimates   112
  6.2 Young integrals   115
  6.3 Continuity properties of Young integrals   118
  6.4 Young–Lóeve–Towghi estimates and 2D Young integrals   119
  6.5 Comments   122

II   Abstract theory of rough paths

7 Free nilpotent groups   125
  7.1 Motivation: iterated integrals and higher-order Euler schemes   125
  7.2 Step-N signatures and truncated tensor algebras   128
  7.3 Lie algebra t^N(R^d) and Lie group 1 + t^N(R^d)   134
  7.4 Chow's theorem   140
  7.5 Free nilpotent groups   142
  7.6 The lift of continuous bounded variation paths on R^d   156
  7.7 Comments   163

8 Variation and Hölder spaces on free groups   165
  8.1 p-Variation and 1/p-Hölder topology   166
  8.2 Geodesic approximations   174
  8.3 Completeness and non-separability   175
  8.4 The d_0/d_∞ estimate   175
  8.5 Interpolation and compactness   177
  8.6 Closure of lifted smooth paths   178
  8.7 Comments   181

9 Geometric rough path spaces   182
  9.1 The Lyons-lift map x → S_N(x)   183
  9.2 Spaces of geometric rough paths   191
  9.3 Invariance under Lipschitz maps   196
  9.4 Young pairing of weak geometric rough paths   197
  9.5 Comments   211

10 Rough differential equations   212
  10.1 Preliminaries   212
  10.2 Davie's estimate   215
  10.3 RDE solutions   221
  10.4 Full RDE solutions   241
  10.5 RDEs under minimal regularity of coefficients   248
  10.6 Integration along rough paths   253
  10.7 RDEs driven along linear vector fields   262
  10.8 Appendix: p-variation estimates via approximations   268
  10.9 Comments   279

11 RDEs: smoothness   281
  11.1 Smoothness of the Itô–Lyons map   281
  11.2 Flows of diffeomorphisms   289
  11.3 Application: a class of rough partial differential equations   294
  11.4 Comments   301

12 RDEs with drift and other topics   302
  12.1 RDEs with drift terms   302
  12.2 Application: perturbed driving signals and impact on RDEs   316
  12.3 Comments   324

III   Stochastic processes lifted to rough paths

13 Brownian motion   327
  13.1 Brownian motion and Lévy's area   327
  13.2 Enhanced Brownian motion   333
  13.3 Strong approximations   339
  13.4 Weak approximations   354
  13.5 Cameron–Martin theorem   357
  13.6 Large deviations   359
  13.7 Support theorem   367
  13.8 Support theorem in conditional form   370
  13.9 Appendix: infinite 2-variation of Brownian motion   381
  13.10 Comments   383

14 Continuous (semi-)martingales   386
  14.1 Enhanced continuous local martingales   386
  14.2 The Burkholder–Davis–Gundy inequality   388
  14.3 p-Variation rough path regularity of enhanced martingales   390
  14.4 Burkholder–Davis–Gundy with p-variation rough path norm   392
  14.5 Convergence of piecewise linear approximations   395
  14.6 Comments   401

15 Gaussian processes   402
  15.1 Motivation and outlook   402
  15.2 One-dimensional Gaussian processes   404
  15.3 Multidimensional Gaussian processes   416
  15.4 The Young–Wiener integral   433
  15.5 Strong approximations   436
  15.6 Weak approximations   442
  15.7 Large deviations   445
  15.8 Support theorem   448
  15.9 Appendix: some estimates in G^3(R^d)   451
  15.10 Comments   452

16 Markov processes   454
  16.1 Motivation   454
  16.2 Uniformly subelliptic Dirichlet forms   457
  16.3 Heat-kernel estimates   463
  16.4 Markovian rough paths   464
  16.5 Strong approximations   467
  16.6 Weak approximations   480
  16.7 Large deviations   483
  16.8 Support theorem   484
  16.9 Appendix: analysis on free nilpotent groups   493
  16.10 Comments   499

IV   Applications to stochastic analysis

17 Stochastic differential equations and stochastic flows   503
  17.1 Working summary on rough paths   503
  17.2 Rough paths vs Stratonovich theory   506
  17.3 Stochastic differential equations driven by non-semi-martingales   515
  17.4 Limit theorems   517
  17.5 Stochastic flows of diffeomorphisms   521
  17.6 Anticipating stochastic differential equations   523
  17.7 A class of stochastic partial differential equations   525
  17.8 Comments   526

18 Stochastic Taylor expansions   528
  18.1 Azencott-type estimates   528
  18.2 Weak remainder estimates   531
  18.3 Comments   532

19 Support theorem and large deviations   533
  19.1 Support theorem for SDEs driven by Brownian motion   533
  19.2 Support theorem for SDEs driven by other stochastic processes   536
  19.3 Large deviations for SDEs driven by Brownian motion   538
  19.4 Large deviations for SDEs driven by other stochastic processes   541
  19.5 Support theorem and large deviations for a class of SPDEs   542
  19.6 Comments   544

20 Malliavin calculus for RDEs   545
  20.1 H-regularity of RDE solutions   545
  20.2 Non-degenerate Gaussian driving signals   549
  20.3 Densities for RDEs under ellipticity conditions   550
  20.4 Densities for RDEs under Hörmander's condition   553
  20.5 Comments   566

Appendices

A Sample path regularity and related topics   571
  A.1 Continuous processes as random variables   571
  A.2 The Garsia–Rodemich–Rumsey estimate   573
  A.3 Kolmogorov-type corollaries   582
  A.4 Sample path regularity under Gaussian assumptions   587
  A.5 Comments   596

B Banach calculus   597
  B.1 Preliminaries   597
  B.2 Directional and Fréchet derivatives   598
  B.3 Higher-order differentiability   601
  B.4 Comments   602

C Large deviations   603
  C.1 Definition and basic properties   603
  C.2 Contraction principles   604

D Gaussian analysis   606
  D.1 Preliminaries   606
  D.2 Isoperimetry and concentration of measure   608
  D.3 L^2-expansions   610
  D.4 Wiener–Itô chaos   610
  D.5 Malliavin calculus   613
  D.6 Comments   614

E Analysis on local Dirichlet spaces   615
  E.1 Quadratic forms   615
  E.2 Symmetric Markovian semi-groups and Dirichlet forms   617
  E.3 Doubling, Poincaré and quasi-isometry   620
  E.4 Parabolic equations and heat-kernels   623
  E.5 Symmetric diffusions   625
  E.6 Stochastic analysis   627
  E.7 Comments   635

Frequently used notation   636
References   638
Index   652
Preface
This book is split into four parts. Part I is concerned with basic material about certain ordinary differential equations, paths of Hölder and variation regularity, and the rudiments of Riemann–Stieltjes and Young integration. Nothing here will be new to specialists, but the material seems rather spread out in the literature and we hope it will prove useful to have it collected in one place. Part II is about the deterministic core of rough path theory, à la T. J. Lyons, but actually inspired by the direct approach of A. M. Davie. Although the theory can be formulated in a Banach setting, we have chosen to remain in a finite-dimensional setting; our motivation for this decision comes from the fact that the bulk of classic texts on Brownian motion and stochastic analysis takes place in a similar setting, and these are the grounds on which we sought applications. In essence, with rough paths one attempts to take probability out of the theory of stochastic differential equations – to the extent possible. Probability still matters, but the problems are shifted from the analysis of the actual SDEs to the analysis of elementary stochastic integrals, known as Lévy's stochastic area. In Part III we start with a detailed discussion of how multidimensional Brownian motion can be turned into a (random) rough path; this is followed by a similar study for (continuous) semi-martingales and large classes of multidimensional Gaussian – and Markovian – processes. In Part IV we apply the theory of rough differential equations (RDEs), path-by-path, to the (rough) sample paths constructed in Part III. In the setting of Brownian motion or semi-martingales, the resulting (random) RDE solutions are identified as solutions to classical stochastic differential equations. We then give a selection of applications to stochastic analysis in which rough path techniques have proved useful. The prerequisites for Parts I and II are essentially a good command of undergraduate analysis. Some knowledge of ordinary differential equations (existence, uniqueness results) and basic geometry (vector fields, geodesics) would be helpful, although everything we need is discussed. In Part III, we assume a general background in measure-theoretic probability theory and the basics of stochastic processes, such as Brownian motion. Stochastic area (for Brownian motion) is introduced via stochastic integration, with alternatives described in the text. In the respective chapters on semi-martingales,
Gaussian and Markovian processes, the reader is assumed to have the appropriate background, most of which we have tried to collect in the appendices. Part IV deals with applications to stochastic analysis, stochastic (partial) differential equations in particular. For a full appreciation of the results herein, the reader should be familiar with the relevant background; textbook references are thus given whenever possible at the end of chapters. Exercises are included throughout the text, often with complete (or sketched) solutions. It is our pleasure to thank our mentors, colleagues and friends. This book would not exist without the teachings of our PhD advisors, S. R. S. Varadhan and T. J. Lyons; both remained available for discussions at various stages throughout the writing process. Once we approached completion, a courageous few offered to do some detailed reading: C. Bayer, M. Caruana, T. Cass, A. Deya, M. Huesmann, H. Oberhauser, J. Teichmann and S. Tindel. Many others offered their time and support in various forms: G. Ben Arous, C. Borrell, F. Baudoin, R. Carmona, D. Chafaï, T. Coulhon, L. Coutin, M. Davis, B. Davies, A. Davie, D. Elworthy, M. Gubinelli, M. Hairer, B. Hambly, A. Iserles, I. Karatzas, A. Lejay, D. Lépingle, P. Malliavin, P. Markowitch, J. Norris, Z. Qian, J. Ramírez, J. Robinson, C. Rogers, M. Sanz-Sole and D. Stroock. This is also a welcome opportunity to thank C. Obtresal, C. Schmeiser, R. Schnabl and W. Wertz for their early teachings. The first author expresses his deep gratitude to the Department of Pure Mathematics and Mathematical Statistics, Cambridge and King's College, Cambridge, where work on this book was carried out under ideal circumstances; he would also like to thank the Radon Institute and his current affiliations, TU and WIAS Berlin, where this book was finalized. Partial support from the Leverhulme Trust and EPSRC grant EP/E048609/1 is gratefully acknowledged. The second author would like to thank the Mathematical Institute, Oxford and Magdalen College, Oxford, where work on the early drafts of this book was undertaken. Finally, it is our great joy to thank our loving families.

Peter K. Friz (Cambridge, Berlin)
Nicolas B. Victoir (Hong Kong)
June 2009
Introduction

One of the remarkable properties of Brownian motion is that we can use it to construct (stochastic) integrals of the type ∫ · dB. The reason this is remarkable is that almost every Brownian sample path (B_t(ω) : t ∈ [0, T]) has infinite variation and there is no help from the classical Stieltjes integration theory. Instead, Itô's theory of stochastic integration relies crucially on the fact that B is a martingale and stochastic integrals themselves are constructed as martingales. If one recalls the elementary interpretation of martingales as fair games, one sees that Itô integration is some sort of martingale transform in which the integrand has the meaning of a gambling strategy. Clearly then, the integrand must not anticipate the random movements of the driving Brownian motion and one is led to the class of so-called previsible processes which can be integrated against Brownian motion. When such integration is possible, it allows for a theory of stochastic differential equations (SDEs) of the form¹

  dY = Σ_{i=1}^d V_i(Y) dB^i + V_0(Y) dt,   Y(0) = y_0.   (∗)
Without going into too much detail, it is hard to overstate the importance of Itô's theory: it has a profound impact on modern mathematics, both pure and applied, not to speak of applications in fields such as physics, engineering, biology and finance. It is natural to ask whether the meaning of (∗) can be extended to processes other than Brownian motion. For instance, there is motivation from mathematical finance to generalize the driving process to general (semi-)martingales and luckily Itô's approach can be carried out naturally in this context. We can also ask for a Gaussian generalization, for instance by considering a differential equation of the form (∗) in which the driving signal may be taken from a reasonably general class of Gaussian processes. Such equations have been proposed, often in the setting of fractional Brownian motion of Hurst parameter H > 1/2,² as toy models to study the ergodic behaviour of non-Markovian systems or to provide new examples of arbitrage-free markets under transaction costs. Or we can ask for a Markovian generalization. Indeed, it is not hard to think of motivating physical examples (such as heat flow in rough media) in which Brownian motion B may be replaced by a Markov process X^a with uniformly elliptic generator in divergence form, say (1/2) Σ_{i,j} ∂_i(a^{ij} ∂_j ·), without any regularity assumptions on the symmetric matrix a^{ij}.

¹ Here B = (B^1, ..., B^d) is a d-dimensional Brownian motion.
² Hurst parameter H = 1/2 corresponds to Brownian motion. For H > 1/2, one has enough sample path regularity to use Young integration.

The Gaussian and Markovian examples have in common that the sample path behaviour can be arbitrarily close to Brownian motion (e.g. by taking H = 1/2 ± ε resp. a uniformly ε-close to the identity matrix I). And yet, Itô's theory has a complete breakdown! It has emerged over recent years, starting with the pioneering work of T. Lyons [116], that differential equations driven by such non-semi-martingales can be solved in the rough path sense. Moreover, the so-obtained solutions are not abstract nonsense but have firm probabilistic justification. For instance, if the driving signal converges to Brownian motion (in some reasonable sense which covers ε → 0 in the aforementioned examples) the corresponding rough path solutions converge to the classical Stratonovich solution of (∗), as one would hope. While this alone seems to allow for flexible and robust stochastic modelling, it is not all about dealing with new types of driving signals. Even in the classical case of Brownian motion, we get some remarkable insights. Namely, the (Stratonovich) solution to (∗) can be represented as a deterministic and continuous image of Brownian motion and Lévy's stochastic area

  A^{jk}_t(ω) = (1/2) ( ∫_0^t B^j dB^k − ∫_0^t B^k dB^j )

alone. In fact, there is a "nice" deterministic map, the Itô–Lyons map,

  (y_0; x) → π(0, y_0; x),

which yields, upon setting x = (B^i, A^{j,k} : i, j, k ∈ {1, ..., d}), a very pleasing version of the solution of (∗). Indeed, subject to sufficient regularity of the coefficients, we see that (∗) can be solved simultaneously for all starting points y_0, and even all coefficients! Clearly then, one can allow the starting point and coefficients to be random (even dependent on the entire future of the Brownian driving signals) without problems; in stark contrast to Itô's theory, which struggles with the integration of non-previsible integrands. Also, construction of stochastic flows becomes a trivial corollary of purely deterministic regularity properties of the Itô–Lyons map. This brings us to the (deterministic) main result of the theory: continuity of the Itô–Lyons map x → π(0, y_0; x) in "rough path" topology. When applied in a standard SDE context, it quickly gives an entire catalogue of limit theorems. It also allows us to
reduce (highly non-trivial) results, such as the Stroock–Varadhan support theorem or the Freidlin–Wentzell estimates, to relatively simple statements about Brownian motion and Lévy's area. Moreover, and at no extra price, all these results come at the level of stochastic flows. The Itô–Lyons map is also seen to be regular in certain perturbations of x which include (but are not restricted to) the usual Cameron–Martin space, and so there is a natural interplay with Malliavin calculus. At last, there is increasing evidence that rough path techniques will play an important role in the theory of stochastic partial differential equations and we have included some first results in this direction. All that said, let us emphasize that the rough path approach to (stochastic) differential equations does not set out to replace Itô's point of view. Rather, it complements Itô's theory in precisely those areas where the latter runs into difficulties. We hope that the topics discussed in this book will prove useful to anyone who seeks new tools for robust and flexible stochastic modelling.
The story in a nutshell

1 From ordinary to rough differential equations

Rough path analysis can be viewed as a collection of smart estimates for differential equations of type

  dy = V(y) dx  ⟺  ẏ = Σ_{i=1}^d V_i(y) ẋ^i.
Although a Banach formulation of the theory is possible, we shall remain in finite dimensions here. For the sake of simplicity, let us assume that the driving signal x ∈ C^∞([0, T], R^d) and that the coefficients V_1, ..., V_d ∈ C^{∞,b}(R^e, R^e), that is, bounded with bounded derivatives of all orders. We are dealing with a simple time-inhomogeneous ordinary differential equation (ODE) and there is no question about existence and uniqueness of an R^e-valued solution from every starting point y_0 ∈ R^e. The usual first-order Euler approximation, from a fixed time-s starting point y_s, is obviously

  y_t − y_s ≈ V_i(y_s) ∫_s^t dx^i.
(We now adopt the summation convention over repeated up–down indices.) A simple Taylor expansion leads to the following step-2 Euler approximation,

  y_t − y_s ≈ V_i(y_s) ∫_s^t dx^i + V_i^k ∂_k V_j(y_s) ∫_s^t ∫_s^r dx^i dx^j  =:  E(y_s, x_{s,t}),

with

  x_{s,t} = ( ∫_s^t dx , ∫_s^t ∫_s^r dx ⊗ dx ) ∈ R^d ⊕ R^{d×d}.   (1)
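To make the object in (1) concrete, here is a brief numerical sketch (our own illustration, not part of the book; the path and the step count are arbitrary choices) that approximates the increment and the second iterated integral of a smooth two-dimensional path by left-point Riemann sums.

import numpy as np

# A smooth R^2-valued driving path on [0, T]; the concrete choice is arbitrary.
def x(t):
    return np.array([np.sin(t), t ** 2])

def level1_and_level2(s, t, n_steps=20000):
    """Approximate x_{s,t} = (int_s^t dx, int_s^t int_s^r dx (x) dx) by Riemann sums."""
    times = np.linspace(s, t, n_steps + 1)
    pts = np.array([x(u) for u in times])   # sampled path, shape (n_steps + 1, 2)
    dx = np.diff(pts, axis=0)               # increments over each small step
    first = pts[-1] - pts[0]                # int_s^t dx = x_t - x_s
    # second[i, j] ~ sum_k (x_{t_k} - x_s)^i dx_k^j  approximates  int int dx^i dx^j
    second = (pts[:-1] - pts[0]).T @ dx
    return first, second

first, second = level1_and_level2(0.0, 1.0)
print("increment    :", first)
print("iterated int :\n", second)
# The gap below is of the order of the step size; it checks the product-rule
# identity for the symmetric part noted in remark (ii) below.
print("symmetry gap :", np.linalg.norm(0.5 * (second + second.T)
                                       - 0.5 * np.outer(first, first)))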
Let us now make the following Hölder-type assumption: there exist c_1 and α ∈ (0, 1] such that, for all s < t in [0, T] and all i, j ∈ {1, ..., d},

  (H_α):   | ∫_s^t dx^i | ∨ | ∫_s^t ∫_s^r dx^i dx^j |^{1/2} ≤ c_1 |t − s|^α.   (2)

Note that ∫_s^t ∫_s^r dx^i dx^j is readily estimated by ℓ^2 |t − s|^2, where ℓ = |ẋ|_{∞;[0,T]} is the Lipschitz norm of the driving signal, and so (H_α) holds, somewhat trivially for now, with c_1 = ℓ and α = 1. [We shall see later that (H_α) also holds for d-dimensional Brownian motion for any α < 1/2 and a random variable c_1(ω) < ∞ a.s., provided the double integral is understood in the sense of stochastic integration. Nonetheless, let us keep x deterministic and smooth for now.]

It is natural to ask exactly how good these approximations are. The answer is given by Davie's lemma, which says that, assuming (H_α) for some α ∈ (1/3, 1/2], one has the "step-2 Euler estimate"

  |y_t − y_s − E(y_s, x_{s,t})| ≤ c_2 |t − s|^θ

where θ = 3α > 1. The catch here is uniformity: c_2 = c_2(c_1) depends on x only through the Hölder bound c_1 but not on its Lipschitz norm. Since it is easy to see that (H_α) implies

  |E(y_s, x_{s,t})| ≤ c_3 |t − s|^α,   c_3 = c_3(c_1),

the triangle inequality leads to

  |y_t − y_s| ≤ c_4 |t − s|^α,   c_4 = c_4(c_1).   (3)
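The step-2 Euler scheme itself is equally easy to prototype. The following sketch (again our own illustration; vector fields, driving path and grid are hypothetical choices) advances y along a coarse dissection using E(y_s, x_{s,t}), with the directional derivative V_i^k ∂_k V_j approximated by finite differences.

import numpy as np

# Smooth driving path x : [0,1] -> R^2 and vector fields V_1, V_2 on R^2.
def x(t):
    return np.array([np.sin(3.0 * t), np.cos(2.0 * t) - 1.0])

def V(y):
    # V(y) is the 2x2 matrix whose columns are V_1(y) and V_2(y).
    return np.array([[1.0, 0.2 * np.sin(y[1])],
                     [0.0, y[0]]])

def signature2(s, t, m=400):
    # Level-1 and level-2 parts of x_{s,t} by Riemann sums (x is smooth).
    u = np.linspace(s, t, m + 1)
    p = np.array([x(v) for v in u])
    dxs = np.diff(p, axis=0)
    return p[-1] - p[0], (p[:-1] - p[0]).T @ dxs

def euler2_step(y, x1, x2, h=1e-6):
    # Step-2 Euler increment E(y, x_{s,t}) = V_i(y) x1^i + (V_i . grad) V_j (y) x2^{ij}.
    Vy = V(y)
    incr = Vy @ x1
    for i in range(2):
        for j in range(2):
            # directional derivative of V_j in the direction V_i, by finite differences
            dVj = (V(y + h * Vy[:, i])[:, j] - Vy[:, j]) / h
            incr += dVj * x2[i, j]
    return incr

# March along a coarse dissection of [0, 1].
y = np.array([1.0, 0.0])
grid = np.linspace(0.0, 1.0, 21)
for s, t in zip(grid[:-1], grid[1:]):
    x1, x2 = signature2(s, t)
    y = y + euler2_step(y, x1, x2)
print("step-2 Euler approximation of y(1):", y)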
As often in analysis, uniform bounds allow for passage to the limit. We therefore take x_n ∈ C^∞([0, T], R^d) with uniform bounds

  sup_n  | ∫_s^t dx_n^i | ∨ | ∫_s^t ∫_s^r dx_n^i dx_n^j |^{1/2} ≤ c_1 |t − s|^α

such that, uniformly in t ∈ [0, T],

  ( ∫_0^t dx_n^i , ∫_0^t ∫_0^r dx_n^i dx_n^j ) → x_t ≡ ( x_t^{(1)}, x_t^{(2)} ) ∈ R^d ⊕ R^{d×d}.
The limiting object x is a path with values in R^d ⊕ R^{d×d}, and the class of (R^d ⊕ R^{d×d})-valued paths obtained in this way is precisely what we call the α-Hölder rough paths.¹ Two important remarks are in order.

(i) The condition α ∈ (1/3, 1/2] in Davie's estimate is intimately tied to the fact that the condition (H_α) involves the first two iterated integrals.
(ii) The space R^d ⊕ R^{d×d} is not quite the correct state space for x. Indeed, the calculus product rule d(x^i x^j) = x^i dx^j + x^j dx^i implies that²

  Sym( ∫_0^t ∫_0^r dx ⊗ dx ) = (1/2) ∫_0^t dx ⊗ ∫_0^t dx.

¹ To be completely honest, we call this a weak geometric α-Hölder rough path.
² Sym(A) := (1/2)(A + A^T), Anti(A) := (1/2)(A − A^T) for A ∈ R^{d×d}.
Figure 1. We plot s → (x^i_s, x^j_s) and the chord which connects (x^i_0, x^j_0), on the lower left side, say, with (x^i_t, x^j_t) on the right side. The (signed) enclosed area (here positive) is precisely Anti(x_t^{(2)})^{i,j}.
This remains valid in the limit so that x(t) must take values in

  { x = (x^{(1)}, x^{(2)}) ∈ R^d ⊕ R^{d×d} : Sym(x^{(2)}) = (1/2) x^{(1)} ⊗ x^{(1)} }.

We can get rid of this algebraic redundancy by switching from x to³

  ( x^{(1)}, Anti(x^{(2)}) ) ∈ R^d ⊕ so(d).

At least for a smooth path x(·), this has an appealing geometric interpretation. Let (x^i_·, x^j_·) denote the projection to two distinct coordinates (i, j); basic multivariable calculus then tells us that

  Anti(x_t^{(2)})^{i,j} = (1/2) ( ∫_0^t (x^i_s − x^i_0) dx^j_s − ∫_0^t (x^j_s − x^j_0) dx^i_s )

is the area (with multiplicity and orientation taken into account) between the curve {(x^i_s, x^j_s) : s ∈ [0, t]} and the chord from (x^i_t, x^j_t) to (x^i_0, x^j_0). See Figure 1.

Example 1  Consider d = 2 and x_n(t) = ( (1/n) cos(2n²t), (1/n) sin(2n²t) ) ∈ R². Then (H_α) holds with α = 1/2, as may be seen by considering separately the cases where 1/n is less resp. greater than (t − s)^{1/2}. Moreover, the limiting rough path is

  x_t ≡ ( 0 , [ 0  t ; −t  0 ] ),   (4)

since we run around the origin essentially n²t/π times, sweeping out area π/n² at each round.

³ As will be discussed in Chapter 7, this is precisely switching from the step-2 free nilpotent Lie group (with d generators) to its Lie algebra.
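The area claim in Example 1 is easy to test numerically. A minimal sketch (ours; the time horizon and discretization are arbitrary) approximating Anti(x_t^{(2)})^{1,2} for x_n:

import numpy as np

def signed_area(n, t, m=200000):
    # 0.5 * int_0^t ( (x^1 - x^1_0) dx^2 - (x^2 - x^2_0) dx^1 ), by Riemann sums,
    # for the spiral x_n(s) = ( cos(2 n^2 s)/n , sin(2 n^2 s)/n ).
    u = np.linspace(0.0, t, m + 1)
    p = np.column_stack([np.cos(2 * n * n * u), np.sin(2 * n * n * u)]) / n
    d = np.diff(p, axis=0)
    rel = p[:-1] - p[0]
    return 0.5 * np.sum(rel[:, 0] * d[:, 1] - rel[:, 1] * d[:, 0])

t = 0.7
for n in (2, 5, 10, 20):
    print(n, signed_area(n, t))   # approaches t = 0.7 as n grows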
We are now ready for the passage to the limit on the level of ODEs. To this end, consider (y_n) ⊂ C([0, T], R^e), obtained by solving, for each n, the ODE dy_n = V(y_n) dx_n, y_n(0) = y_0. By Davie's lemma the sequence (y_n) has a uniform α-Hölder bound c_4 and by Arzela–Ascoli we see that (y_n) has at least one limit point in C([0, T], R^e). Each such limit point is called a solution to the rough differential equation (RDE) which we write as

  dy = V(y) dx,   y(0) = y_0.   (5)
The present arguments apply immediately for V ∈ C^{2,b}, that is, bounded with two bounded derivatives, and more precisely for V ∈ Lip^{γ−1}, γ > 1/α, in the sense of Stein.⁴ As in classical ODE theory, one additional degree of regularity (e.g. V ∈ Lip^γ, γ > 1/α) then gives uniqueness⁵ and we will write y = π_(V)(0, y_0; x) for such a unique RDE solution. At last, it should not be surprising from our construction that the RDE solution map (a.k.a. Itô–Lyons map) x → π_(V)(0, y_0; x) is continuous in x (e.g. under uniform convergence with uniform Hölder bounds).
Example 2  Assume x_t = ( ∫_0^t dx^i , ∫_0^t ∫_0^r dx^j dx^k )_{i,j,k ∈ {1,...,d}} with smooth x. Then y = π_(V)(0, y_0; x) is the classical ODE solution to dy = V(y) dx, y(0) = y_0.
Example 3  Assume x is given by (4) and V = (V_1, V_2). Then y = π_(V)(0, y_0; x) can be identified as the classical ODE solution to dy = [V_1, V_2](y) dt, where [V_1, V_2] = V_1^i ∂_i V_2 − V_2^i ∂_i V_1 is the Lie bracket of V_1 and V_2.
⁴ Writing γ = ⌊γ⌋ + {γ} with integer ⌊γ⌋ and {γ} ∈ (0, 1], this means that V is bounded and has up to ⌊γ⌋ bounded derivatives, the last of which is Hölder with exponent {γ}.
⁵ With more effort, uniqueness can be shown under Lip^{1/α}-regularity.
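Example 3 above can also be checked by brute force. Below is a hedged sketch (our own; the vector fields V_1(y) = (1, 0), V_2(y) = (0, y_1) are an arbitrary choice with non-zero bracket) which solves the ODE driven by the oscillatory path x_n of Example 1 with a classical Runge–Kutta method and watches the endpoint approach the time-1 flow of [V_1, V_2].

import numpy as np

# V_1(y) = (1, 0), V_2(y) = (0, y_1); then [V_1, V_2](y) = (0, 1).
def rhs(t, y, n):
    dx1 = -2 * n * np.sin(2 * n * n * t)      # derivative of cos(2 n^2 t)/n
    dx2 = 2 * n * np.cos(2 * n * n * t)       # derivative of sin(2 n^2 t)/n
    return np.array([1.0 * dx1,
                     y[0] * dx2])

def solve(n, T=1.0, steps=100000):
    # Classical RK4 for  dy = V_1(y) dx^1_n + V_2(y) dx^2_n,  y(0) = (0, 0).
    y = np.array([0.0, 0.0])
    h = T / steps
    for k in range(steps):
        t = k * h
        k1 = rhs(t, y, n)
        k2 = rhs(t + h / 2, y + h / 2 * k1, n)
        k3 = rhs(t + h / 2, y + h / 2 * k2, n)
        k4 = rhs(t + h, y + h * k3, n)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

for n in (5, 10, 20):
    print(n, solve(n))
# As n grows the endpoint approaches y(0) + (0, 1) = (0, 1),
# i.e. the time-1 flow of the bracket [V_1, V_2](y) = (0, 1).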
Example 4  Assume B = (B^1, ..., B^d) is a d-dimensional Brownian motion. Define enhanced Brownian motion by

  B_t = ( ∫_0^t dB^i , ∫_0^t B^j ◦ dB^k )_{i,j,k ∈ {1,...,d}}

(where ◦ indicates stochastic integration in the Stratonovich sense). We shall see that B is an α-Hölder rough path for α ∈ (1/3, 1/2) and identify Y_t(ω) := π_(V)(0, y_0; B) as a solution to the Stratonovich stochastic differential equation⁶

  dY = Σ_{i=1}^d V_i(Y) ◦ dB^i.
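One standard way to realize B numerically is to take a piecewise linear approximation of a sampled Brownian path and compute its level-2 iterated integrals exactly; the sketch below (ours, with arbitrary step count and seed) does precisely that and reads off the Lévy area.

import numpy as np

rng = np.random.default_rng(0)

def enhanced_bm(d=2, n_steps=10000, T=1.0):
    """Sample (B_T, int_0^T B (x) circ dB) via a piecewise linear approximation."""
    dt = T / n_steps
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_steps, d))
    B = np.vstack([np.zeros(d), np.cumsum(dB, axis=0)])
    # Level 2 of the piecewise linear path: sum_k ( B_{t_k} (x) dB_k + 0.5 dB_k (x) dB_k ),
    # which is consistent with Stratonovich (mid-point) integration.
    level2 = B[:-1].T @ dB + 0.5 * np.einsum('ki,kj->ij', dB, dB)
    return B[-1], level2

bT, lvl2 = enhanced_bm()
print("B_T       :", bT)
print("level 2   :\n", lvl2)
print("Levy area :", 0.5 * (lvl2[0, 1] - lvl2[1, 0]))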
2 Carnot–Caratheodory geometry

We now try to gain a better understanding of the results discussed in the last section. To this end, it helps to understand the more general case of Hölder-type regularity with exponent α = 1/p ∈ (0, 1]. As indicated in remark (i), this will require consideration of more iterated integrals and we need suitable notation: given x ∈ C^∞([0, T], R^d) we generalize (1) to⁷

  x_t := S_N(x)_{0,t} := ( 1, ∫_0^t dx, ∫_{∆²_{[0,t]}} dx ⊗ dx, ..., ∫_{∆^N_{[0,t]}} dx ⊗ ··· ⊗ dx ),   (6)

called the step-N signature of x over the interval [0, t], with values in

  T^N(R^d) := R ⊕ R^d ⊕ (R^d)^{⊗2} ⊕ ··· ⊕ (R^d)^{⊗N}.

Observe that we added a zeroth scalar component in our definition of x_t which is always set to 1. This is pure convention but has some algebraic advantages. To go further, we note that T^N(R^d) has the structure of a (truncated) tensor algebra with tensor multiplication ⊗. (Elements with scalar component equal to 1 are always invertible with respect to ⊗.) Computations are simply carried out by considering the standard basis (e_i) of R^d as non-commutative indeterminates; for instance,

  (a^i e_i) ⊗ (b^j e_j) = a^i b^j (e_i ⊗ e_j) ≠ a^i b^j (e_j ⊗ e_i).

⁶ A drift term V_0(y) dt can be trivially included by considering the time-space process (t, B).
⁷ ∆^k_{[0,t]} denotes the k-dimensional simplex over [0, t].
The reason we are interested in this sort of algebra is that the trivial

  x_{s,t} ≡ (−x_s) + x_t = ∫_s^t dx =: x_{s,t}

generalizes to

  x_{s,t} ≡ x_s^{−1} ⊗ x_t = ( 1, ∫_s^t dx, ∫_{∆²_{[s,t]}} dx ⊗ dx, ..., ∫_{∆^N_{[s,t]}} dx ⊗ ··· ⊗ dx ).

As a consequence, we have Chen's relation

  x_{s,u} = x_{s,t} ⊗ x_{t,u},

which tells us precisely how to "patch together" iterated integrals over adjacent intervals [s, t] and [t, u].

Let us now take on remark (ii) of the previous section. One can see that the step-N lift of a smooth path x, as given in (6), takes values in the free step-N nilpotent (Lie) group with d generators, realized as the restriction of T^N(R^d) to

  G^N(R^d) = exp( R^d ⊕ [R^d, R^d] ⊕ [R^d, [R^d, R^d]] ⊕ ... ) ≡ exp( g^N(R^d) ),

where g^N(R^d) is the free step-N nilpotent Lie algebra and exp is defined by the usual power series based on ⊗.

Example 5  [N = 2] Note that [R^d, R^d] = so(d). Then

  exp( R^d ⊕ [R^d, R^d] ) = { (1, v, (1/2) v ⊗ v + A) : v ∈ R^d, A ∈ so(d) },

which is precisely the algebraic relation we pointed out in remark (ii) of the previous section.

If the discussion above tells us that T^N(R^d) is too big a state space for lifted smooth paths, Chow's theorem tells us that G^N(R^d) is the correct state space. It asserts that for all g ∈ G^N(R^d) there exists γ : [0, 1] → R^d, which may be taken to be piecewise linear, such that S_N(γ)_{0,1} = g. One can then define the Carnot–Caratheodory norm

  ‖g‖ = inf { length( γ|_{[0,1]} ) : S_N(γ)_{0,1} = g },

where the infimum is achieved for some Lipschitz continuous path γ* : [0, 1] → R^d, some sort of geodesic path associated with g. The Carnot–Caratheodory distance is then simply defined by d(g, h) := ‖g^{−1} ⊗ h‖. A Carnot–Caratheodory unit ball is plotted in Figure 2.

Example 6  Take g = ( 0, [ 0  a ; −a  0 ] ) ∈ G²(R²). Then γ* is the shortest path which returns to its starting point and sweeps out area a. From basic isoperimetry, γ* must be a circle and ‖g‖ = 2√π a^{1/2}. See Figure 3.
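The truncated tensor product and Chen's relation introduced above are easy to make concrete. The following sketch (our own, for the step-2 case with d = 2; the sample points are arbitrary) multiplies the signatures of linear segments and checks x_{s,u} = x_{s,t} ⊗ x_{t,u} for a piecewise linear path.

import numpy as np

D = 2  # dimension d

def tensor_mult(a, b):
    """Truncated (step-2) tensor product in T^2(R^d); elements are (scalar, level1, level2)."""
    s = a[0] * b[0]
    l1 = a[0] * b[1] + b[0] * a[1]
    l2 = a[0] * b[2] + b[0] * a[2] + np.outer(a[1], b[1])
    return (s, l1, l2)

def seg_signature(increment):
    """Step-2 signature of one linear segment: exp(increment) = (1, v, 0.5 v (x) v)."""
    v = np.asarray(increment, dtype=float)
    return (1.0, v, 0.5 * np.outer(v, v))

def path_signature(points):
    sig = (1.0, np.zeros(D), np.zeros((D, D)))
    for p, q in zip(points[:-1], points[1:]):
        sig = tensor_mult(sig, seg_signature(q - p))
    return sig

pts = np.array([[0.0, 0.0], [1.0, 0.5], [0.3, 2.0], [2.0, 2.5], [1.0, 0.0]])
whole = path_signature(pts)                                       # x_{s,u}
left, right = path_signature(pts[:3]), path_signature(pts[2:])    # x_{s,t}, x_{t,u}
glued = tensor_mult(left, right)
print("Chen gap (level 1):", np.linalg.norm(whole[1] - glued[1]))
print("Chen gap (level 2):", np.linalg.norm(whole[2] - glued[2]))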
Figure 2. After identifying G²(R^d) with the 3-dimensional Heisenberg group, i.e. ( (x, y), [ 0  a ; −a  0 ] ) ≡ (x, y, a), we plot the (apple-shaped) unit ball with respect to the Carnot–Caratheodory distance. It contains (and is contained in) a Euclidean ball.
Figure 3. We plot the circle γ ∗ . The z-axis represents the wiped-out area and runs from 0 to a.
In practice, we rarely need to compute precisely the CC norm of an element g = (1, g^1, ..., g^N) ∈ G^N(R^d). Instead we rely on the so-called equivalence of homogenous norms, which asserts that

  ∃ κ > 0 :   (1/κ) |||g||| ≤ ‖(1, g^1, ..., g^N)‖ ≤ κ |||g|||,   where   |||g||| := max_{i=1,...,N} | g^i |_{(R^d)^{⊗i}}^{1/i}.

Here, both "norms" ‖·‖ and |||·||| are homogenous with respect to dilation on G^N(R^d),

  δ_λ : (1, g^1, ..., g^N) → (1, λ^1 g^1, ..., λ^N g^N),   λ ∈ R.

It is time to make the link to our previous discussion. Recall condition (H_α) from equation (2), which expressed a Hölder-type assumption of the form

  | ∫_s^t dx | ∨ | ∫_s^t ∫_s^r dx ⊗ dx |^{1/2} ≤ c_1 |t − s|^α.
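For the step-2 case, the homogeneous norm and the dilation defined above are one-liners; here is a quick numerical illustration of the scaling |||δ_λ g||| = |λ| |||g||| (our own sketch, with an arbitrary group element).

import numpy as np

def hom_norm(g1, g2):
    # |||g||| = max( |g1| , |g2|^(1/2) )  for g = (1, g1, g2)
    return max(np.linalg.norm(g1), np.linalg.norm(g2) ** 0.5)

def dilate(lam, g1, g2):
    # delta_lambda (1, g1, g2) = (1, lambda g1, lambda^2 g2)
    return lam * g1, lam ** 2 * g2

g1 = np.array([0.3, -1.2])
g2 = np.array([[0.0, 0.7], [-0.7, 0.1]])
for lam in (0.5, 2.0, -3.0):
    print(hom_norm(*dilate(lam, g1, g2)), abs(lam) * hom_norm(g1, g2))
# the two printed values agree for every lambda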
s
s
But this says exactly that, for all 0 ≤ s < t ≤ T , the corresponding “group” increment xs,t = S2 (x)s,t ∈ G2 Rd satisfies α
xs,t = d (xs , xt ) c1 |t − s| , where d is the Carnot–Caratheodory metric on G2 Rd , which is equivalent to d (xs , xt ) xα -H¨o l;[0,T ] ≡ sup α c1 . s,t∈[0,T ] |t − s| This regularity persists under passage to the limit and hence any (weak, geometric) α-H¨older rough path is a genuine α-H¨older path with values in G2 Rd . Conversely, given an abstract α-H¨older path in G2 Rd equipped with Carnot–Caratheodory distance, we can construct a path xn by con catenating geodesic paths associated with the increments xt i ,t i + 1 : i = 0, . . . , 2n } , (ti ) = (i2−n T ); the resulting sequence (xn ) then satisfies condition (Hα ) uniformly and converges uniformly, together with its iterated integrals, to the path x (·) with which we started. Nothing of all this is restricted to α ∈ (1/3, 1/2] ←→ N = 2: for any α = 1/p ∈ (0, 1] a weak, geometric 1/p-H¨ older rough path x is precisely a 1/p-H¨ older path in the metric space G[p] Rd , d where d denotes the Carnot–Caratheodory distance. Davie’s lemma extends to the step-[p] setting and we are led to a theory of (rough path) differential equations, formally written as dy = V (y) dx,
The story in a nutshell
12
where x is a (weak, geometric) 1/p-H¨older rough path. For V ∈ Lipγ −1 one has existence and V ∈ Lipγ with γ > p uniqueness.8 Once in possession of a unique solution y = π (V ) (0, y0 ; x) one may ask for regularity of the Itˆ o–Lyons map x → y. In fact, one can construct the RDE solution as a (weak, geometric) 1/pH¨ older rough path in its own right, say y = π (V ) (0, y0 ; x) with values in G[p] (Re ) and ask for regularity of the full Itˆo–Lyons map (y0 , V, x) → y. It turns out that this solution map is Lipschitz continuous on bounded sets, provided we measure the distance between two driving signals x, x ˜ with a (non-homogenous9 ) 1/p-H¨older distance given by ˜) := ρ1/p-H¨o l (x, x
max
sup
i=1,...,[p] s,t∈[0,T ]
i xs,t − x ˜is,t i/p
|t − s|
.
For most applications it is enough to have (uniform) continuity (on bounded sets), in which case one can work with the (homogenous10 ) 1/p-H¨older distance given by ˜) := d1/p-H¨o l (x, x
sup s,t∈[0,T ]
d (xs,t , x ˜s,t ) |t − s|
i/p
.
The latter often makes computations more transparent and can become indispensible in a probabilistic context (e.g. when studying “exponentially good” approximations in a large deviation context). But no matter which distance is more practical in a given context, both induce the lder rough path” topology on the rough path space same “1/p-H¨ o C 1/p-H¨o l [0, T ] , G[p] Rd . 8 With
more effort, uniqueness can be shown under Lip p -regularity. respect to dilation since, in general,
9 . . . with
ρ1 / p -H ¨o l (δ λ x, δ λ x ˜ ) = |λ| ρ1 / p -H ¨o l (x, x ˜) . 1 0 . . . again
with respect to dilation, d1 / p -H ¨o l (δ λ x, δ λ x ˜ ) = |λ| d1 / p -H ¨o l (x, x ˜) .
3 Brownian motion and stochastic analysis
13
Figure 4. A typical 2-dimensional Brownian sample path. The (signed) area between the straight cord and the sample path corresponds to a typical L´evy area increment.
3 Brownian motion and stochastic analysis Let B be a d-dimensional Brownian motion. Almost every realization of enhanced Brownian motion (EBM) t → Bt (ω) =
t
Bs ⊗ ◦dBs
1, Bt ,
= exp (Bt + A0,t )
0
t with so (d)-valued L´evy area As,t (ω) = 12 s (Bs,r ⊗ dBr − dBr ⊗ Bs,r ) is a (weak) geometric rough path, namely B· (ω) ∈ C α -H¨o l [0, T ] , G2 Rd , d , α ∈ (1/3, 1/2). In Figure 4 we plot a Brownian path with an associated L´evy area increment. Granted the usual α-H¨older regularity of Brownian motion, this statement is equivalent to the question “Is it true that for α < 1/2 :
sup s,t∈[0,1]
|As,t | |t − s|
2α
< ∞ a.s. ?”
The reader is encouraged to think about this before reading on! Perhaps the most elegant way to establish this “rough path regularity” of L´evy area relies on scaling properties of enhanced Brownian motion. Namely, D
D
Bs,t = B0,t−s = δ (t−s) 1 / 2 B0,1 ,
The story in a nutshell
14
so that
2q 2q q = E Bs,t ≤ (const) × |t − s| E d (Bs , Bt )
for any q < ∞. Kolmogorov’s criterion applies without any trouble and so B is indeed a.s. α-H¨ older, α < 1/2, with respect to d. QED. Let us also mention a convergence result: we have dα -H¨o l;[0,T ] (B, S2 (B n )) → 0 in probability where B n denotes a piecewise linear approximation to B based on dissections Dn = {tni : i} with the mesh of Dn tending to 0. We then have two important conclusions: (i) Thanks to α-H¨ older regularity of B, the (random) RDE dY = V (Y ) dB can be solved for a.e. fixed ω and yields a continuous stochastic process (7) Y· (ω) = π (V ) (0, y0 ; B (ω)) . (ii) By continuity of the Itˆo–Lyons map with respect to the rough path metric dα -H¨o l;[0,T ] it follows that π (V ) (0, y0 ; B n ) → π (V ) (0, y0 ; B (ω)) with respect to α-H¨older topology and in probability. Clearly, y n ≡ π (V ) (0, y0 ; B n ) is a solution to the (random) ODE dy n = V (y n ) dB n , y n (0) = y0 and the classical Wong–Zakai theorem11 allows us to identify (7) as the classical Stratonovich solution to dY = V (Y ) ◦ dB =
d
Vi (Y ) ◦ dB i .
i=1
But why is all this useful? The following list should give some idea ... • π (V ) (0, y0 ; B (ω)) is simultaneously defined for all starting points y0 and coefficient vector fields V of suitable regularity. In particular, the construction of stochastic flows is a triviality and this itself can be the starting point for the robust treatment of certain stochastic partial differential equations. 1 1 For
example, the books of Ikeda–Watanabe [88] or Stroock [160].
3 Brownian motion and stochastic analysis
15
• Every approximation in rough path topology implies a limit theorem (even on the level of flows). This includes classical piecewise linear approximations and non-standard variations a` la McShane, Sussmann. It also includes a variety of weak limit theorems such as a Donskertype invariance principle. • Various stochastic Taylor expansions (`a la Azencott, Platen, . . . ) can be obtained via deterministic rough path estimates. • Support descriptions a` la Stroock–Varadhan and large deviation estimates `a la Freidlin–Wentzell are reduced to the respective (relatively simple) statements about B in the rough path topology. • The Young integral allows us to perturb B simultaneously in all C q -var [0, 1] , Rd -directions with q < 2. Since Cameron–Martin ⊂ C 1-var ⊂ C q -var this implies in particular path space regularity of the SDE solution beyond Malliavin and there is a natural interplay with Malliavin calculus. • Starting points and vector fields can be fully anticipating. • At last, for the bulk of these results we can replace Brownian motion at little extra cost by martingales, Gaussian processes or Markov processes provided we can construct a suitable stochastic area and establish the correct rough path regularity!
Part I
Basics
1 Continuous paths of bounded variation We discuss continuous paths, defined on a fixed time horizon, with values in a metric space E. The emphasis is on paths with nice regularity properties and in particular on continuous paths of bounded variation.1 We then specialize to the case when E = Rd . Finally, we discuss simple Sobolev-type regularity of paths.
1.1 Continuous paths on metric spaces

We start by defining the supremum or infinity distance.

Definition 1.1  Let (E, d) be a metric space and [0, T] ⊂ R. Then C([0, T], E) denotes the set of all continuous paths x : [0, T] → E. The supremum or infinity distance of x, y ∈ C([0, T], E) is defined by

  d_{∞;[0,T]}(x, y) := sup_{t∈[0,T]} d(x_t, y_t).
For a single path x ∈ C ([0, T ] , E) , we set |x|0;[0,T ] :=
sup u ,v ∈[0,T ]
d (xu , xv ) ,
and, given a fixed element o ∈ E, identified with the constant path ≡ o, |x|∞;[0,T ] := d∞;[0,T ] (o, x) = sup d (o, xu ) . u ∈[0,T ]
If no confusion is possible we shall omit [0, T ] and simply write d∞ , |·|0 and |·|∞ . If E has a group structure such as Rd , + the neutral element is the usual choice for o. In the present generality, however, the definition of |·|∞ depends on the choice of o. Notation 1.2 Of course, [0, T ] can be replaced by any other interval [s, t] in which case one considers x : [s, t] → E. All notations adapt by replacing [0, T ] by [s, t]. Let us also agree that Co ([s, t] , E) denotes those paths in C ([s, t] , E) which start at o, i.e. Co ([s, t] , E) = {x ∈ C ([s, t] , E) : x (s) = o} . 1 Also
known as rectifiable paths.
Many familiar properties of real-valued functions carry over. For instance, any continuous mapping from [0, T ] into E is uniformly continuous.2 It is also fairly easy to see that C ([0, T ] , E) is a metric space under d∞ (the induced topology will be called the uniform or supremum topology). Also, if (E, d) is complete then (C ([0, T ] , E) , d∞ ) is complete.
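On sampled data the quantities of Definition 1.1 are straightforward to evaluate. A small sketch (ours, with E = R², o = 0 and an arbitrary pair of paths):

import numpy as np

t = np.linspace(0.0, 1.0, 1001)
x = np.column_stack([np.sin(2 * np.pi * t), t])                 # a path in R^2
y = np.column_stack([np.sin(2 * np.pi * t) + 0.1 * t, t ** 2])  # another path

d_inf = np.max(np.linalg.norm(x - y, axis=1))                   # d_{infty;[0,T]}(x, y)
x_osc = np.max(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2))   # |x|_{0;[0,T]}
x_sup = np.max(np.linalg.norm(x, axis=1))                       # |x|_{infty;[0,T]} with o = 0
print(d_inf, x_osc, x_sup)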
Definition 1.3 A set H ⊂ C ([0, T ] , E) is said to be equicontinuous if, for all ε > 0 there exists δ such that |t − s| < δ implies d (xs , xt ) < ε for all x ∈ H. It is said to be bounded if supx∈H |x|∞ < ∞.
Theorem 1.4 (Arzela–Ascoli) Let (E, d) be a complete metric space in which bounded sets have compact closure. Then a set H ⊂ C ([0, T ] , E) has compact closure if and only if H is bounded and equicontinuous. As a consequence, a bounded, equicontinuous sequence in C ([0, T ] , E) has a convergent subsequence and, conversely, any convergent sequence in C ([0, T ] , E) is bounded and equicontinuous.
Proof. Let us recall that a subset of a complete metric space has compact closure if and only if it is totally bounded, i.e. for all ε > 0, it can be covered by finitely many ε-balls. “⇐=”: We show that the assumption “H bounded and equicontinuous” implies total boundedness. We fix ε > 0, and then δ > 0 such that for every f ∈ H, |t − s| < δ =⇒ d (fs , ft ) < ε/4.
(1.1)
Cover [0, T ] with a finite number of neighbourhoods ti − 2δ , ti + 2δ , i = 1, . . . , m, and define Ht i = {ft i , f∈ H}; as Ht i ⊂ E is bounded, its closure is compact, and so is its union 1≤i≤m Ht i ; let c1 , . . . , cn ∈ 1≤i≤m Ht i be such that 1≤i≤m Ht i is covered by the union of the ε/4-balls centred around some cj . Then, consider Φ, the set of functions from {1, . . . , m} into {1, . . . , n}. For each ϕ ∈ Φ, denote by Lϕ,ε the set of all functions f ∈ C ([0, T ] , E) such that maxi d ft i , cϕ (i) ≤ 4ε . Observe that from the definition of cj it follows that H is covered by the union of the (Lϕ,ε )ϕ∈Φ . To end the proof, we need only show that the diameter of each Lϕ,ε is ≤ ε. 2 For
example, Dieudonn´e [43], (3.6.15).
1.2 Continuous paths of bounded variation on metric spaces
21
If f, g are both in Lϕ,ε , then d∞ (f, g) = ≤
sup d (ft , gt ) t∈[0,T ]
max d (ft i , gt i ) +
1≤i≤m
d (fs , ft i ) + d (gs , gt i )
sup s∈(t i − δ2 ,t i + δ2
)
ε from (1.1) 2 ε ≤ max d ft i , cϕ (i) + max d gt i , cϕ(i) + i i 2 ≤ ε by definition of Lϕ,ε .
≤
max d (ft i , gt i ) +
1≤i≤m
“=⇒”: Since compact sets are bounded, only equicontinuity needs proof. By assumption H has compact closure and therefore is totally bounded. Fix ε > 0 and pick h1 , . . . , hn such that H ⊂ 1≤i≤n B hi , ε/3 where B (h, ε) denotes the open ε-ball centred at h. By continuity of each hi (·), there exists δ = δ (ε) such that |t − s| < δ =⇒ max d his , hit < ε/3. i=1,...,n
But then, for every h ∈ H, d (hs , ht ) ≤ ε/3+maxi=1,...,n d his , hit +ε/3 ≤ ε provided |t − s| < δ and so H is equicontinuous. The consequences for sequences are straightforward and left to the reader.
1.2 Continuous paths of bounded variation on metric spaces 1.2.1 Bounded variation paths and controls Let us write D ([s, t]) for the set of all dissections of some interval [s, t] ⊂ R, thus a typical element in D ([s, t]) is written D = {s = t0 < t1 < · · · < tn = t} and consists of #D = n adjacent intervals [ti−1 , ti ]. The mesh of D is defined as |D| := maxi=1,...,n |ti − ti−1 | and we shall denote by Dδ ([s, t]) the set of all dissections of [s, t] with mesh less than or equal to δ. Definition 1.5 Let (E, d) be a metric space and x : [0, T ] → E. For 0 ≤ s ≤ t ≤ T , the 1-variation of x on [s, t] is defined as3 |x|1-var;[s,t] = sup d xt i , xt i + 1 . (t i )∈D([s,t])
3 Let
i
us agree that |x|1 -va r;[s , s ] = 0 for 0 ≤ s ≤ T .
22
Continuous paths of bounded variation
If |x|1-var;[s,t] < ∞, we say that x is of bounded variation or of finite 1variation on [s, t]. The space of continuous paths of finite 1-variation on [0, T ] is denoted by C 1-var ([0, T ] , E), its subset of paths started at o ∈ E is denoted by Co1-var ([0, T ] , E). In the discussion of 1-variation regularity (and later p-variation regularity for p ≥ 1), the notion of control or control function, defined on the simplex ∆ := ∆T = {(s, t) : 0 ≤ s ≤ t ≤ T } turns out to be extremely useful. Definition 1.6 A map ω : ∆T → [0, ∞) is called superadditive if for all s ≤ t ≤ u in [0, T ], ω(s, t) + ω(t, u) ≤ ω(s, u). If, in addition, ω is continuous and zero on the diagonal, i.e. ω (s, s) = 0 for 0 ≤ s ≤ T we call ω a control or, more precisely, a control function on [0, T ] . Definition 1.7 We say that the 1-variation of a map x : [0, T ] → E is dominated by the control ω, or controlled by ω, if there exists a constant C < ∞ such that for all s < t in [0, T ], d (xs , xt ) ≤ Cω (s, t) .
θ
Simple examples of controls are given by (s, t) → |t − s| for θ ≥ 1 or the integral of a non-negative function in L1 ([0, T ]) over the interval [s, t]. Trivially, a positive linear combination of controls yields another control. If ω is a control and x : [0, T ] → E a map controlled by ω then x is continuous. Exercise 1.8 Let φ ∈ C ([0, ∞), [0, ∞)) be increasing, convex with φ (0) = 0. Assuming that ω is a control, show that φ ◦ ω : (s, t) → φ (ω (s, t)) is also a control. Solution. Fix 0 < a < b and observe that by convexity φ (a) − φ (0) φ (a + b) − φ (b) ≥ a a so that φ (a + b) ≥ φ (a) + φ (b). Interchanging a, b if needed, this holds for all a, b ≥ 0 and we conclude that φ [ω (s, t)] + φ [ω(t, u)] ≤ φ [ω(s, t) + ω(t, u)] ≤ φ [ω(s, u)] . Exercise 1.9 Assume ω, ω ˜ are controls. (i) Show that ω.˜ ω is a control. (ii) Show that max (ω, ω ˜ ) need not be a control. ω β is a control. (iii) Given α, β > 0 with α + β ≥ 1, show that ω α .˜
1.2 Continuous paths of bounded variation on metric spaces
23
Solution. (iii) By Exercise 1.8, it is enough to consider the case α + β = 1. But this follows from H¨ older’s inequality, α 1 β 1 1 1 a ˜ β + ˜b β . ∀a, a ˜, b, ˜b ≥ 0 : a˜ a + b˜b ≤ a α + b α
Exercise 1.10 Let ω be a control on [0, T ] and consider s < u in [0, T ]. Show that there exists t ∈ [s, u] such that max {ω (s, t) , ω (t, u)} ≤ ω (s, u) /2. Solution. By continuity and monotonicity of controls, there exists t such that ω (s, t) = ω (t, u). By super-additivity, 2ω (s, t) = 2ω (t, u) = ω (s, t) + ω (t, u) ≤ ω (s, u)
and the proof is finished.
Proposition 1.11 Consider x : [0, T ] → E and ω = ω (s, t) super-additive, with s < t in [0, T ]. If d (xs , xt ) ≤ ω (s, t) for all s < t in [0, T ], then |x|1-var;[s,t] ≤ ω (s, t). Proof. Let D = (ti ) be a dissection of [s, t] . Then, by assumption, # D −1
≤ d xt i ,t i + 1
i=0
# D −1
ω (ti , ti+1 )
i=0
≤
ω (s, t) by super-additivity of ω.
Taking the supremum over all such dissections finishes the proof. Proposition 1.12 Let x ∈ C 1-var ([0, T ] , E). Then (s, t) → ω x (s, t) := |x|1-var;[s,t] defines a control on [0, T ] such that for all 0 ≤ s < t ≤ T d(xs , xt ) ≤ |x|1-var;[s,t] . This control is additive: for all 0 ≤ s ≤ t ≤ u ≤ T, |x|1-var;[s,u ] = |x|1-var;[s,t] + |x|1-var;[t,u ] . In particular, t ∈ [0, T ] → (t) := |x|1-var;[0,t] ∈ R is continuous, increasing and hence of finite 1-variation.
(1.2)
Continuous paths of bounded variation
24
Proof. Trivially, |x|1-var;[s,s] = 0 for all s ∈ [0, T ] . To see super-additivity it suffices to take dissections D1 , D2 of [s, t] and [t, u] respectively; noting that the union of D1 and D2 is a dissection of [s, u] we have d xt i , xt i + 1 + d xt j , xx j + 1 ≤ |x|1-var;[s,u ] t i ∈D 1
t j ∈D 2
and ω x (s, t) + ω x (t, u) ≤ ω x (s, u) follows from taking the supremum over all dissections D1 and D2 . For additivity of ω x we establish the reverse inequality. Let D = (vi ) be a dissection of [s, u] so that t ∈ [vj , vj +1 ] for some j. We then have # D −1
j −1 d xv i ,v i + 1 = d xv i , xv i + 1 +
i=0
i=0
+
# D −1
d xv j , xv j + 1
≤ d (x v j ,x t )+d (x t ,x v j + 1 )
d xv i , xv i + 1 .
i=j +1
But, as j −1 d xv i , xv i + 1 + d xv j , xt
≤
|x|1-var;[s,t] ,
≤
|x|1-var;[t,u ] ,
i=0 D −1 # d xv i , xv i + 1 d xt , xv j + 1 + i= j +1
we have
# D −1
d xv i ,v i + 1 ≤ |x|1-var;[s,t] + |x|1-var;[t,u ] .
i=0
Taking the supremum over all dissections shows additivity of ω x . It only remains to prove its continuity. To this end, fix s < t in [0, T ]. From monotonocity of ω x , we see that the limits |x|1-var;[s + ,t − ] := :=
lim
|x|1-var;[s+h 1 ,t−h 2 ] , |x|1-var;[s − ,t + ]
lim
|x|1-var;[s−h 1 ,t+ h 2 ]
h 1 ,h 2 0 h 1 ,h 2 0
exist and that |x|1-var;[s + ,t − ] ≤ |x|1-var;[s,t] ≤ |x|1-var;[s − ,t + ] .
(1.3)
We aim to show the inequalities in (1.3) are actually equalities. To establish “continuity from inside”, i.e. |x|1-var;[s + ,t − ] = |x|1-var;[s,t] , we define
1.2 Continuous paths of bounded variation on metric spaces
25
ω (s, t) = |x|1-var;[s + ,t − ] , and pick s < t < u in [0, T ] , and h1 , h2 , h3 , h4 four (small) positive numbers. If (ai ) is a dissection of [s + h1 , t − h2 ] , and (bj ) a dissection of [t + h3 , u − h4 ] , then by definition of the 1-variation of x, d xa i , xa i + 1 + d xb j , xb j + 1 ≤ |x|1-var;[s+ h 1 ,u −h 4 ] . i
j
Taking the supremum over all possible dissections (ai ) and (bi ) , we obtain |x|1-var;[s+ h 1 ,t−h 2 ] + |x|1-var;[t+ h 3 ,u −h 4 ] ≤ |x|1-var;[s+h 1 ,u −h 4 ] . Letting h1 , h2 , h3 , h4 go to 0 and using the continuity of x we obtain that (s, t) → |x|1-var;[s + ,t − ] is super-additive. We also easily see that for all s, t ∈ [0, T ] , d (xs , xt ) ≤ |x|1-var;[s + ,t − ] . Hence, using Proposition 1.11, we obtain |x|1-var;[s + ,t − ] ≥ |x|1-var;[s,t] , and hence we have proved that |x|1-var;[s + ,t − ] = |x|1-var;[s,t] for all s < t in [0, T ]. The remaining part of the proof is “continuity from outside”, i.e. |x|1-var;[s − ,t + ] = |x|1-var;[s,t] .Using additivity of |x|1-var;[.,.] it is easy to see that |x|1-var;[s − ,t + ]
=
|x|1-var;[0,T ] − |x|1-var;[0,s − ] − |x|1-var;[t + ,T ]
=
|x|1-var;[0,T ] − |x|1-var;[0,s] − |x|1-var;[t,T ] = |x|1-var;[s,t]
and this finishes the proof. Exercise 1.13 Assume ω is a control on [0, T ]. Assume f ∈ C (∆, [0, ∞)) where ∆ = {(s, t) : 0 ≤ s ≤ t ≤ T } , non-decreasing in the sense that [s, t] ⊂ [u, v] implies f (s, t) ≤ f (u, v) . Show that (s, t) → f (s, t) ω (s, t) is a control. As application, given x ∈ C 1-var ([0, T ] , E) and y ∈ C ([0, T ] , E), show that (s, t) → |y|∞;[s,t] |x|1-var;[s,t] is a control where |·|∞;[s,t] is defined with respect to some fixed o ∈ E. Proposition 1.14 Let x ∈ C ([0, T ] , E). Then for all δ > 0 and 0 ≤ s ≤ t ≤ T, |x|1-var;[s,t] = sup d xt i , xt i + 1 ∈ [0, ∞] . (t i )∈Dδ ([s,t])
i
Proof. Clearly, ω x,δ (s, t) :=
sup (t i )∈Dδ ([s,t])
d xt i , xt i + 1 ≤ ω x (s, t) = |x|1-var;[s,t] . i
Continuous paths of bounded variation
26
Super-addivitity of ω x,δ follows from the same argument as for ω x . Take any D = (ui ) ∈ Dδ ([s, t]) so that s = u0 < u1 < · · · < un = t with ui+1 − ui < δ. It follows that d (xs , xt ) ≤ d (xs , xu 1 ) + · · · + d xu n −1 , xt ≤ ω x,δ (s, t) . From Proposition 1.11, we conclude that |x|1-var;[s,t] ≤ ω x,δ (s, t) , which concludes the proof. We now observe lower semi-continuity of the function x → |x|1-var in the following sense. Lemma 1.15 Assume (xn ) is a sequence of paths from [0, T ] → E of finite 1-variation. Assume xn → x pointwise on [0, T ]. Then, for all s < t in [0, T ], |x|1-var;[0,T ] ≤ lim inf |xn |1-var;[0,T ] . n →∞
Proof. Let D = {0 = t0 < t1 < · · · < tK = T } be a dissection of [0, T ] . By assumption, xn → x pointwise and so K −1
d xt i , xt i + 1
i=0
= lim inf n →∞
d xnti , xnti + 1 i
≤ lim inf |xn |1-var;[0,T ] . n →∞
Taking the supremum over all the dissections of [s, t] finishes the 1-variation estimate. In general, the inequality in Lemma 1.15 can be strict. The reader is invited to construct an example in the following exercise. Exercise 1.16 Construct (xn ) ∈ C 1-var ([0, 1] , R) such that |xn |∞;[0,1] ≤ 1/n but so that |xn |1-var = 1 for all n. Conclude that the inequality in Lemma 1.15 can be strict.
1.2.2 Absolute continuity Definition 1.17 Let (E, d) be a metric space. A path x : [0, T ] → E is absolutely continuous if for all ε > 0, there existsδ > 0, such that for all s 1 < t1 ≤ s2 < t2 ≤ · · · < sn < tn in [0, T ] with i |ti − si | < δ, we have d (x , x ) < ε. s t i i i Proposition 1.18 Any absolutely continuous path is a continuous path of bounded variation. Proof. If x : [0, T ] → E is absolutely continuous it is obviously continuous. Furthermore, by definition there exists δ > 0, such that for all s1 < t1 ≤ s2 < t2 ≤ · · · < sn < tn ∈ [0, T ] with i |ti − si | ≤ δ, we have
1.2 Continuous paths of bounded variation on metric spaces
27
d (xs i , xt i ) ≤ 1. Pick D = (ti )1≤i≤n a dissection of [0, T ]. Then, define j0 = 1 and jk = max i, ti − tj k −1 ≤ δ , and observe that j[T /δ ]+1 = jk for all k ≥ [T /δ] + 1. i
n −1
[T /δ ]+ 1 j k + 1 −1
d xt i , xt i + 1 ≤
i=1
k =0
i=j k
d xt i , xt i + 1 .
j k + 1 −1 |ti+1 − ti | = tj k + 1 − tj k ≤ δ, hence By definition of the jk s, i= j k j k + 1 −1 n −1 d xt i , xt i + 1 ≤ 1, which implies that i=1 d xt i , xt i + 1 ≤ [T /δ]+ i= j k 1. Taking the supremum over all dissections finishes the proof. In general, the converse of the above is not true, as seen in the following. Example 1.19 (Cantor function) Each x ∈ [0, 1] has a base-3 decimal expansion x = j ≥1 aj 3−j where aj ∈ {0, 1, 2}. This expansion is unique unless x is of the form p3−k for some p, k ∈ N (we may assume p is not divisible by 3) and in this case x has two expansions: one with aj = 0 for j > k and one with aj = 2 for j > k. One of them has ak = 1, the other will have ak ∈ {0, 2}. If we agree always to use the latter, we see that a1 a1
= =
1 iff x ∈ (1/3, 2/3) 1, a2 = 1 iff x ∈ (1/9, 2/9) ∪ (7/9, 8/9)
and so forth. The Cantor set C is then defined as the set of all x ∈ [0, 1] that have a base-3 expansion x = aj 3−j with aj = 1 for all j. Thus C is obtained from K0,1 = [0, 1] by removing the open middle third, leaving us with the union of K1,1 = [0, 1/3] , K1,2 = [2/3, 1]; followed by removing all open middle thirds, leaving us with the union of K2,1 = [0, 1/9] , K2,1 = [2/9, 3/9] , K2,1 = [6/9, 7/9] , K2,1 = [8/9, 1] 2 and so forth, so that in the end C = ∩∞ n =1 ∪i=1 Kn ,i . Let us now define the Cantor function f on C by aj 2−j , x ∈ C. f (x) = 2 n
j ≥1
This series is the base-2 expansion of a number in [0, 1] and since any number in [0, 1] can be obtained this way we see that f (C) = [0, 1]. One readily sees that if x, y ∈ C and x < y, then f (x) < f (y) unless x and y are the endpoints of one of the open intervals removed from [0, 1] to obtain C. In this case, f (x) = p2−k for some p, k ∈ N and f (x) = f (y), given by the two base-2 expansions of this number. We can therefore extend f to a map from [0, 1] to itself by declaring it to be constant on the intervals missing from C. This extended f is still increasing, and since its range is all of [0, 1] it cannot have any jump discontinuities, hence it is continuous.
28
Continuous paths of bounded variation
Being increasing, f is obviously of bounded variation on [0, 1]. We now show that f is not absolutely continuous. Given any δ > 0 we can take si , ti as the boundary points of the intervals (Kn ,i )i=1,...,2 n with n chosen large 2 n enough so that i=1 (ti − si ) < δ. Then, since f is constant on [ti , si+1 ] for i = 1, . . . , 2n − 1, we have n
2
|f (ti ) − f (si )| = f (1) − f (0) = 1.
i=1
1.2.3 Lipschitz or 1-H¨ older continuity Definition 1.20 Let (E, d) be a metric space. A path x : [0, T ] → E is Lipschitz or 1-H¨older continuous4 if |x|1-H¨o l;[0,T ] :=
sup s,t∈[0,T ]
d (xs , xt ) < ∞. |t − s|
The space of all such paths is denoted by C 1-H¨o l ([0, T ] , E), the subset of paths started at o ∈ E is denoted by Co1-H¨o l ([0, T ] , E). We observe that every Lipschitz path is absolutely continuous. In particular, it is of bounded variation and we note |x|1-var;[s,t] ≤ |x|1-H¨o l;[s,t] × |t − s| . older if and only if it is conFurthermore, x ∈ C 1-var ([0, T ] , E) is 1-H¨ trolled by (s, t) → |t − s|. It is easy to construct examples which are of bounded variation but not Lipschitz (e.g. t → t1/2 ). On the other hand, every continuous bounded variation path is a continuous time-change (or reparametrization) of a Lipschitz path. Proposition 1.21 A path x ∈ C ([0, T ], E) is of finite 1-variation if and only if there exists a continuous non-decreasing function φ from [0, T ] onto [0, 1] and a path y ∈ C 1-H¨o l ([0, 1] , E) such that x = y ◦ φ. Proof. We may assume |x|1-var;[0,T ] = 0 (otherwise, x|[0,T ] is constant and there is nothing to show). By Propostion 1.12, φ(t) =
|x|1-var;[0,t] |x|1-var;[0,T ]
defines a continuous increasing function from [0, T ] onto [0, 1] . Then, there exists a function y such that (y ◦ φ) (t) = x (t) , as φ (t1 ) = φ (t2 ) =⇒ 4 . . . in view of the later definition of H¨ o lder continuity and in order to avoid redundant notation . . .
1.3 Continuous paths of bounded variation on Rd
29
x (t1 ) = x (t2 ) . Now, sup
0≤u < v ≤1
d (y (u) , y (v)) |u − v|
=
sup
0≤u < v ≤T
d (y (φ (u)) , x (y (v))) |φ (u) − φ (v)|
≤
|x|1-var;[u ,v ] |x|1-var;[0,T ] |x|1-var;[0,u ] − |x|1-var;[0,v ]
=
|x|1-var;[0,T ] .
This shows that y is in C 1-H¨o l ([0, 1] , E). The converse direction is an obvious consequence of the invariance of variation norms under reparametrization. Remark 1.22 The 1-variation (i.e. length) of a path is obviously invariant under reparametrization and so it is clear that |y|1-var;[0,1] = |x|1-var;[0,T ] . On the other hand, for the particular parametrization φ (·) used in the previous proof (essentially the arc-length parametrization) we saw that |y|1-H¨o l;[0,1] ≤ |x|1-var;[0,T ] . With the trivial |y|1-var;[0,1] ≤ |y|1-H¨o l;[0,1] we then see that |y|1-H¨o l;[0,1] = |x|1-var;[0,T ] . Lemma 1.23 Assume (xn ) is a sequence of paths from [0, T ] → E of finite 1-variation. Assume xn → x pointwise on [0, T ]. Then, for all s < t in [0, T ], |x|1-H¨o l; [s,t] ≤ lim inf |xn |1-H¨o l;[s,t] . n →∞
Proof. The H¨ older statement is a genuine corollary of Lemma 1.15: it suffices to note that for any u, v ∈ [s, t] , d (xu , xv )
≤
|x|1-var;[u ,v ]
≤
lim inf |xn |1-var;[u ,v ] .
≤
|v − u| lim inf |xn |1-H¨o l;[s,t] .
n →∞
n →∞
1.3 Continuous paths of bounded variation on Rd Unless otherwise stated, Rd shall be equipped with Euclidean structure. In particular, if a ∈ Rd has coordinates a1 , . . . , ad its norm is given by |a| =
2
2
|a1 | + · · · + |ad | .
30
Continuous paths of bounded variation
Given a map x : [0, T ] → Rd the group structure of Rd , + allows us to speak of the increments of x (·) and we write5 xs,t := xt − xs .
1.3.1 Continuously differentiable paths
differWe define inductively the set C k [0, T ] , Rd of k-times continuously 0 d d [0, T ] , R to be C [0, T ] , R , and then entiable paths by first defining C k d in C [0, T ] , R . C k +1 [0, T ] , Rd to be the set of paths with a derivative ∞ d [0, T ] , R to be the interFinally, we define the set of smooth paths C section of all C k [0, T ] , Rd , for k ≥ 0. For continuously differentiable paths, the computation of 1-variation is a simple matter. Proposition 1.24 Let x ∈ C 1 [0, T ] , Rd . Then t ∈ [0, T ] → (t) := |x|1-var;[0,t] ∈ R is continuously differentiable and ˙ (t) = |x˙ (t)| for t ∈ (0, T ). In particular, |x|1-var;[s,t] =
t
|x˙ u | du s
for all s < t in [0, T ]. Proof. We first note that |xt − xs | ≤ obtain that
t s
|x˙ u | du; using Proposition 1.11, we
(t) − (s) = |x|1-var;[s,t] ≤
t
|x˙ u | du. s
Equality in the above estimate will follow immediately from ˙ (t) = |x˙ (t)| and this is what we now show. Take t ∈ [0, T ) and h small enough (so that t + h ≤ T ). Clearly, | (t + h) − (t)| 1 |xt,t+h | ≤ ≤ h h h
t+ h
|x˙ u | du t
and upon sending h ↓ 0 we see that is differentiable at t from the right with derivative equal to |x˙ t |. The same argument applies “from the left” and so is indeed differentiable with derivative |x|. ˙ By assumption on x, this derivative is continuous and the proof is finished. 5 Later on, we shall replace Rd by a Lie group (G, ·) and increments will be defined as (xs )−1 · xt .
1.3 Continuous paths of bounded variation on Rd
31
1.3.2 Bounded variation The results of Section 1.2 applied to Rd equipped with Euclidean distance allow us in particular to consider the space C 1-var [0, T ] , Rd . Theorem 1.25 C 1-var [0, T ] , Rd is Banach with norm x → |x (0)| + |x|1-var;[0,T ] . The closed subspace of paths in C 1-var [0, T ] , Rd started at 0, denoted by C01-var [0, T ] , Rd , is also Banach under x → |x|1-var;[0,T ] . These Banach spaces are not separable. Proof. It is easy to see that C 1-var [0, T ] , Rd , C01-var [0, T ] , Rd are normed linear spaces under the given norms. We thus focus on completeness. Noting that sup |x (t)| ≤ |x (0)| + |x|1-var;[0,T ] ,
t∈[0,T ]
a Cauchy sequence (xn ) with respect to x → |x (0)| + |x|1-var;[0,T ] is also Cauchy in uniform topology and thus (uniformly) convergent to some continuous path x (·). By Lemma 1.15 it is clear that x has finite 1-variation and it only remains to see that xn → x in 1-variation norm. To this end, let D = {0 = t0 < · · · < tK = t} be an arbitrary dissection of [0, T ]. For every ε > 0 there exists N = N (ε) large enough so that for all n, m ≥ N (ε) sup D
K −1
< ε/2. d xnti ,t i + 1 , xm t i ,t i + 1
i=0
On the other hand, we can fix D and find m large enough so that K −1
< ε/2 d xm t i ,t i + 1 , xt i ,t i + 1
i=0
which implies that for n ≥ N (ε) large enough K −1
d xnti ,t i + 1 , xt i ,t i + 1 ≤ ε,
i=0
uniformly over all D. But this precisely says that xn → x in 1-variation. Non-separability follows from the example below. Example 1.26 (non-separability) We give an example of an uncountable family of functions (fα ) in C 1-var ([0, 1] , R) for which |fα − fα |1-var ≥ 1 if α = α . To this end, take α = (αn )n ≥1 to be a {0, 1}-sequence, and 1) as the write [0, union of the disjoint interval In , n ≥ 1, where In ≡ 1 − 2 n1−1 , 1 − 21n . If αn = 0 then define fα to be zero on In . Otherwise, define fα on In by s 1 fα (tn + s) = sin nπ −n 2n 2
Continuous paths of bounded variation
32
so that, using Proposition 1.24, |fα |1-var;I n = 1. By construction fα (tn ) = 0 for all n and hence fα is continuous on [0, 1). (Left) continuity at 1 is also clear: thanks to the decay factor 1/n we see that fα (t) → 0 as t 1. A simple approximation of a path x on Rd is given by its piecewise linear approximation.6 Definition 1.27 Let x : [0, T ] → Rd , and D = (ti )i a dissection of [0, T ] . We define the piecewise linear approximation to x by t − ti xt ,t if ti ≤ t ≤ ti+1 . xD t = xt i + ti+1 − ti i i + 1 Proposition 1.28 Let x ∈ C 1-var [0, T ] , Rd . Then, for any dissection D of [0, T ] and any s < t in [0, T ] , D x ≤ |x|1-var;[s,t] . (1.4) 1-var;[s,t] If (Dn ) is an arbitrary sequence of dissections with mesh |Dn | → 0, then xD n converges uniformly to x. (We can write this more concisely as xD → x uniformly on [0, T ] as |D| → 0.) Proof. The estimate (1.4) boils down to the fact that the shortest way to connect two points in Rd is via a straight line. The convergence result requires the remark that x (·) is uniformly continuous on [0, T ]. The easy details are left to the reader. The question arises if (or when) xD → x in 1-variation as |D| → 0. Since piecewise linear approximations are absolutely continuous, the following result tells us that there is no hope unless x is absolutely continuous. (We shall see later that xD → x in 1-variation as |D| → 0 indeed holds true provided x is absolutely continuous.) Proposition 1.29 The set of absolutely continuous functions from [0, T ] → Rd is closed in 1-variation and a Banach space under 1-variation norm. Proof. We prove that if xn is absolutely continuous and converges to x in 1-variation norm, then x is absolutely continuous. Fix ε > 0, and n ∈ N such that ε |x − xn |1-var + |x0 − xn0 | < . 2 Then, as xn is absolutely continuous, there existsδ > 0, such that for all t2 ≤ · · · < sn < tn in [0, T ] with i |ti − si | < δ, we have s 1 < tn1 ≤ s2 < ε i xs i ,t i < 2 . This implies that xns ,t + xs ,t − xns ,t sup |xs ,t | ≤ i
i
i
i
i
≤
i
i
D =(t i ) of [0,T ]
i
i
xns ,t + |x − xn | 1-var ≤ ε i i i
and the proof is finished. 6A
powerful generalization of this will be discussed in Section 5.2.
i
i
1.3 Continuous paths of bounded variation on Rd
33
Exercise 1.30 By Proposition 1.29 it is clear that piecewise linear approximations cannot converge (in 1-variation) to the Cantor function f : [0, 1] → [0, 1] given in Example 1.19. (By Proposition 1.29 any 1-variation limit point is absolutely continuous; but the Cantor function is not absolutely continuous as was seen in Exercise 1.19). Verify this by an explicit computation. More precisely, set Dn = {j3−n ; j = 0, . . . , 3n } and show that f − f D n = |f − I|1-var;[0,1] 1-var;[0,1] where I (x) = x and conclude that f D f in 1-variation as |D| → 0. Solution. f is self-similar, in the sense that for all n ≥ 1, k ∈ {0, . . . , 3n } , f k3−n + 3n x − f k3−n = 2n f (x) . Using self-similarity, we see that, if I denotes the identity function on [0, 1], f − f D n = 2n |f − I|1-var;[0,1] . 1-var; [ 3jn , j3+n 1 ] Hence, f − f D n 1-var;[0,1]
=
n 2 −1
f − f D n 1-var; [
j =0
=
j 3n
, j3+n 1 ]
|f − I|1-var;[0,1] > 0.
1.3.3 Closure of smooth paths in variation norm
Let us define C 0,1-var [0, T ] , Rd as the closure of smooth paths from [0, T ] → Rd in 1-variation norm. Obviously, C 0,1-var is a closed, linear subspace of C 1-var [0, T ] , Rd and thus a Banach space. Restricting to paths with x (0) = 0 yields a further subspace (also Banach) denoted by C00,1-var [0, T ] , Rd . By Proposition 1.29 any element of C 0,1-var must be absolutely continuous (a.c.) and so C 0,1-var [0, T ] , Rd ⊂ x : [0, T ] → Rd a.c. C 1-var [0, T ] , Rd . We shall show that the first inclusion is in fact an equality. Proposition 1.31 The map y →
· 0
yt dt is a Banach space isomorph from
L1 [0, T ] , Rd → C00,1-var [0, T ] , Rd .
Continuous paths of bounded variation
34
[0, T ] ,Rd if and only if there exists a As a consequence, x ∈ C 0,1-var (uniquely determined) x˙ ∈ L1 [0, T ] , Rd , see Remark 1.33, such that · x ≡ x0 + x˙ t dt 0
˙ L 1 holds. and in this case the Banach isometry |x|1-var = |x| Proof. Without lossof generality started at x0 = 0. we consider paths For · any smooth y ∈ C ∞ [0, T ] , Rd we have x = 0 yt dt ∈ C ∞ [0, T ] , Rd and so, by Proposition 1.24, |x|1-var = |y|L 1 . Obviously, this allows us to extend the map · yt dt ∈ C ∞ [0, T ] , Rd ι : y ∈ C ∞ [0, T ] , Rd → x = 0
to the respective closures. From the very definition of the space C00,1-var and by density of smooth paths in L1 it follows that i extends to (a Banach space isomorphism) ˆι : L1 [0, T ] , Rd → C00,1-var [0, T ] , Rd . To see that ˆι still has the simple representation as an indefinite integral, let y ∈ L1 [0, T ] , Rd , take smooth approximations y n in L1 and pass to the limit in t
ι (y n )t =
T
ysn ds = 0
ysn 1[0,t] (s) ds, 0
n 1 using the simple fact that y n → y in L1 implies y 1[0,t] → y1[0,t] in L for 1 d every fixed t ∈ [0, T ]. At last, given x ∈ ˆι(L [0, T ] , R we write x˙ rather than y for the uniquely determined ˆι−1 (x) ∈ L1 [0, T ] , Rd . The next proposition requires some background in basic measure theory (Lebesgue–Stieltjes measures, Radon–Nikodym theorem, . . . ).7 d Proposition 1.32 Let x : [0, T ] → Then it
R be absolutely continuous. can be written in the form x0 + 0 x˙ t dt with x˙ ∈ L1 [0, T ] , Rd . As a consequence, C 0,1-var [0, T ] , Rd = x : [0, T ] → Rd absolutely continuous .
Proof. It suffices to consider d = 1. The function x determines a signed Borel measure on R via µ ((−∞, t]) = x0,t ≡ xt − x0 for t ∈ [0, T ] 7 See
Folland and Stein’s book [55] for instance.
1.3 Continuous paths of bounded variation on Rd
35
and putting zero mass on R\ [0, T ]. The assumption of absolute continuity of x implies that µ is absolutely continuous (in the sense of measures) with respect to Lebesgue measure λ. By the Radon–Nikodym theorem, there exists an integrable density function y = dµ/dλ, an integrable function from [0, T ] to R, uniquely defined up to Lebesgue null sets, such that
t
xt = µ ((0, t]) =
ys ds. 0
Hence, using Proposition 1.31, x ∈ C 0,1-var [0, T ] , Rd . The converse inclusion follows directly from Proposition 1.29. Remark 1.33 Our notation for x˙ for the unique L1 -function with the property t x˙ t dt xt = x0 + 0
for absolutely continuous x is consistent with the fundamental theorem of calculus for Lebesgue integrals (e.g. [55], p. 106). It states that a real-valued function x on [0, T ] is absolutely continuous if and only if its derivative xt+ h − xt lim h→0 h exists for almost every t ∈ [0, T ] and gives an L1 -function whose indefinite integral is xt − x0 . We have not shown (and will not use) the fact that x˙ is the almost-sure limit of the above difference quotient. Corollary 1.34 Let x ∈ C 1-var [0, T ] , Rd . Then piecewise linear approximations converge in 1-variation, x − xD
1-var;[0,T ]
→ 0 as |D| → 0
if and only if x ∈ C 0,1-var [0, T ] , Rd . Proof. “=⇒” : Any 1-variation limit of piecewise linear approximation is . absolutely continuous and hence in C 0,1-var “ ⇐= ” : Fix ε > 0, and x ∈ C 0,1-var [0, T ] , Rd . From the very definition of this space there exists a smooth path y such that |x − y|1-var;[0,T ] ≤
ε . 3
We claim that for all dissections D with small enough mesh (depending on y and ε), ε |y − y D |1-var;[0,T ] < . 3
36
Continuous paths of bounded variation
Indeed, this follows from Proposition 1.24 and the computation t i + 1 y˙ − y˙ D 1 y˙ s − yt i ,t i + 1 ds for D = {ti } ⊂ [0, T ] = L [0,1] ti+1 − ti ti i ti + 1 = |y˙ (s) − y˙ (ξ i )| ds with ξ i ∈ (ti , ti+1 ) ti
i
≤
|¨ y |∞
ti + 1
ti
i
|s − ξ i | ds ≤ |¨ y |∞ |D| T. D
By the triangle inequality and the contraction property of (·) as linear map from C 1-var [0, T ] , Rd into itself, see (1.4), we have x − xD ≤ |x − y|1-var;[0,T ] + y − y D 1-var;[0,T ] 1-var;[0,T ] + xD − y D 1-var;[0,T ] ≤ 2 |x − y| + y − y D 1-var;[0,T ]
≤
1-var;[0,T ]
ε
and this finishes the proof. Corollary 1.35 The space C 0,1-var [0, T ] , Rd is a separable Banach space (and hence Polish). Proof. Let Dn be the dyadic dissection {T k/2n : i = k, . . . , 2n } and define Ωn to be the set of paths from [0, T ] to Rd , linear on the dyadic times in Qd . Then, Ω := n Ωn intervals of Dn with values at dyadic is a countable set. If x ∈ C 0,1-var [0, T ] , Rd and ε > 0, there exists n such that x − xD n 1-var < ε/2. It is then easy to find y ∈ Ωn such that D x n − y < ε/2, which proves that Ω is dense in C 0,1-var [0, T ] , Rd . 1-var This shows that C 0,1-var [0, T ] , Rd is separable.
1.3.4 Lipschitz continuity
older paths. We now turn to C 1-H¨o l [0, T] , Rd , the set of Lipschitz or 1-H¨ It includes, for instance, C 1 [0, T ] , Rd and elementary examples (e.g. t → |t|) show that this inclusion is strict. Proposition 1.36 C 1-H¨o l [0, T ] , Rd is Banach with norm x → |x (0)| + |x|1-H¨o l;[0,T ] . The closed subspace of paths in C 1-H¨o l [0, T ] , Rd started at 0, is also Banach under x → |x|1-H¨o l;[0,T ] . These Banach spaces are not separable. Proof. Non-separability follows from Example 1.26 together with |x|1-var;[0,T ] ≤ |x|1-H¨o l;[0,T ] or using the (well-known) non-separability of
1.3 Continuous paths of bounded variation on Rd
37
L∞ [0, T ] , Rd in conjunction with Proposition 1.37 below. All other parts of the proof are straightforward and left to the reader.
· Proposition 1.37 The map y → 0 yt dt is a Banach space isomorph from L∞ [0, T ] , Rd → C01-H¨o l [0, T ] , Rd . As a consequence, x ∈ C 1-H¨ol [0, T ] , Rd if and only if there exists a (uniquely determined) x˙ ∈ L∞ [0, T ] , Rd such that · x ≡ x0 + x˙ t dt 0
˙ L ∞ holds. and in this case the Banach isometry |x|1-H¨o l = |x| Proof. Similar to Proposition 1.31 and left to the reader. From general principles, any continuous path of finite 1-variation can be reparametrized to a 1-H¨older path. In the present context of Rd -valued paths this can be done so that the reparametrized path has constant speed. We have Proposition 1.38 Let x ∈ C 1-var [0, T ] , Rd , not constant. Define y (·) by y ◦ φ = x where φ(t) = |x|1-var;[0,t] / |x|1-var;[0,T ] . speed. More precisely, y is the Then y ∈ C 1-H¨o l [0, 1] , Rd has constant indefinite integral of some y˙ ∈ L∞ [0, 1] , Rd and |y˙ (t)| ≡ |x|1-var;[0,T ] = |y|1-H¨o l;[0,1] for a.e. t ∈ [0, 1]. Proof. By the precise argument of the proof of Proposition 1.21, y is well-defined and in C 1-H¨o l [0, 1] , Rd . From the very definition of y and invariance of 1-variation under reparametrization we have |y|1-var;[0,φ(t)] = |x|1-var;[0,t] = cφ(t) where c = |x|1-var;[0,T ] . On the other hand, by Propositions 1.37 and 1.31, y is the indefinite integral of some y˙ ∈ L∞ [0, 1] , Rd and |y|1-var;[0,φ(t)] =
φ(t)
|y˙ (s)| ds. 0
It follows that |y| ˙ ≡ c almost surely. At last, the equality c = |y|1-H¨o l;[0,1] was noted in Remark 1.22.
38
Continuous paths of bounded variation
Remark 1.39 More generally, the proof shows that x can be reparame˙ ≡ 1 almost trized to y ∈ C 1-H¨o l [0, c] , Rd with unit speed, i.e. |y| surely. The reader will notice that the continuous embedding C 1-H¨o l [0, T ] , Rd → C 1-var [0, T ] , Rd is a consequence of the trivial estimate |x|1-var;[0,T ] ≤ |x|1-H¨o l;[0,T ] T. ol [0, T ] , Rd , As in the previous section it is natural to consider C 0,1-H¨ defined as the closure of smooth paths in C 1-H¨o l [0, T ] , Rd . The resulting closure is a space we have already encountered. Proposition 1.40 The closure of smooth paths in C 1-H¨o l [0, T ] , Rd equals C 1 [0, T ] , Rd . Proof. Let us first observe that the norm x → |x0 | + supt∈[0,T ] |x˙ t | on C 1 [0, T ] , Rd makes C 1 [0, T ] , Rd a Banach space. To avoid trivialities (norms vs semi-norms), let us assume that all paths are null at 0. Using C 1 [0, T ] , Rd ∼ = ⊕di=1 C 1 ([0, T ] , R) and similar for C 1-H¨o l it suffices to consider d = 1. Given a smooth path x : [0, T ] → R with x (0) = 0 we first show that |x|1-H¨o l ≡
sup s,t∈[0,T ]
|x (t) − x (s)| |t − s|
equals
sup |x˙ t | . t∈[0,T ]
Indeed, from |x (t + h) − x (h)| ≤ |x|1-H¨o l h we see that |x˙ t | ≤ |x|1-H¨o l for all t ∈ [0, T ] while the converse estimate follows from the intermediate value theorem, |x (t) − x (s)| t−s
=
|x˙ (ξ)|
for ξ ∈ (s, t)
≤
|x| ˙ ∞;[0,T ] .
Any sequence (xn ) of smooth paths which converges (in 1-H¨ older norm) to some path x is also Cauchy in 1-H¨older. By the previous argument, it is ˜ ∈ C 1 ([0, T ] , R). also Cauchy in C 1 ([0, T ] , R) and so converges to some x 1 Since both 1-H¨ older and C -norm imply pointwise convergence we must have x = x ˜ ∈ C 1 ([0, T ] , R) and the proof is finished.
1.4 Sobolev spaces of continuous paths of bounded variation
39
1.4 Sobolev spaces of continuous paths of bounded variation 1.4.1
Paths of Sobolev regularity on Rd
We saw in Proposition 1.31 that a path x is in C 0,1-var [0, T ] , Rd if and only if · x0 + x˙ t dt
0
0
1
d
∞
d
˙ L 1 . We then saw, with x˙ ∈ L [0, T ] , R and in this case |x|1-var = |x| Proposition 1.37, that a path x is Lipschitz, in symbols x ∈ C 1-H¨o l ([0, T ] , Rd , if and only if · x0 + x˙ t dt ˙ L ∞ . This suggests with x˙ ∈ L [0, T ] , R and in this case |x|1-H¨o l = |x| considering the following path spaces. Definition 1.41 For p ∈ [1, ∞] , we define W 1,p [0, T ] , Rd to be the space of Rd -valued functions on [0, T ] of the form · x (·) = x0 + ydt (1.5) 0
with y ∈ Lp [0, T ] , Rd . Writing x˙ instead of y we further define ˙ L p ;[0,T ] = |x|W 1 , p ;[0,T ] := |x|
T
1/p p
|x| ˙ du
.
0
The set of such paths with x0 = o ∈ Rd is denoted by Wo1,p [0, T ] , Rd . As always, [0, T ] may be replaced by any other interval [s, t] ⊂ R. It is clear from the definition that W 1,1 = C 0,1-var and hence (Proposition 1.32) precisely the set of absolutely continuous paths, while W 1,∞ is precisely the set of Lipschitz or 1-H¨older paths. It is also clear from the usual inclusions of Lp -spaces that W 1,∞ ⊂ W 1,p ⊂ W 1,1 . In particular, any path in W 1,p is absolutely continuous (and then of course of bounded variation). Proposition 1.42 The space W 1,p [0, T ] , Rd is a Banach space under the norm x → |x0 | + |x|W 1 , p ;[0,T ] . The closed subspace of paths in W 1,p [0, T ] , Rd started at 0, is also Banach under x → |x|W 1 , p ;[0,T ] . These Banach spaces are separable if and only if p ∈ [1, ∞).
40
Continuous paths of bounded variation
Proof. Since Lp ⊂ L1 , we can use Proposition 1.31 to see that the map ˙ L p [0,T ] . The closed x → x˙ is well-defined, as is its norm x → |x0 | + |x| 1,p d subspace of paths in W [0, T ] , R started at 0 is isomorphic (as normed space) to Lp [0, T ] , Rd and hence Banach. The separability statement now follows from well-known facts about Lp -spaces. Exercise 1.43 Let p ∈ [1, ∞] and recall that we equipped W 1,p [0, T ] , Rd with Banach norm |x (0)| + |x| ˙ L p ;[0,T ] . ˙ L p ;[0,T ] , for all Show that an equivalent norm is given by |x|L q ;[0,T ] + |x| q ∈ [1, ∞]. Solution. Lp -control of x˙ gives a modulus for x and in particular |x0,t | ≤ |x|W 1 , p ;[0,T ] t1−1/p where 1/p = 0 for p = ∞.Using |xt | ≤ |x0 | + |x0,t | one controls the supremum of x over t ∈ [0, T ] and then any Lq -norm. Every path in W 1,p ⊂ W 1,1 is continuous and of finite 1-variation. For p = ∞, such paths are Lipschitz or 1-H¨ older continuous; more precisely |xs,t | ≤ |x|W 1 , ∞ ;[s,t] |t − s|. Observe that the right-hand side is a control so that |xs,t | in the above estimate can be replaced by |x|1-var;[s,t] . In the following theorem we see that a similar statement holds true for all p > 1. Theorem 1.44 Let p ∈ (1, ∞). Given x ∈ W 1,p [0, T ] , Rd , ω (s, t) = |x|W 1 , p ;[s,t] (t − s)
1−1/p
defines a control function on [0, T ] and we have |x|1-var;[s,t] ≤ ω (s, t) for all s < t in [0, T ]. In particular, we have the continuous embedding W 1,p [0, T ] , Rd → C 1-var [0, T ] , Rd . Proof. Without loss of generality x0 = 0. By Proposition 1.31, x is the older’s indefinite integral of some x˙ ∈ L1 . Define α = 1 − 1/p. Using H¨ inequality with conjugate exponents p and 1/α |xs,t | ≤
t
α
|x˙ r | dr ≤ (t − s) s
=
|x|W 1 , p ;[s,t] (t − s)
=
ω (s, t) .
t
1/p p
|x˙ r | dr s
α
We show that ω is a control. Continuity of ω is obvious from the fact that p p ˙ , over [s, t]. |x|W 1 , p ;[s,t] is the integral of an integrable function, namely |x| Only super-additivity, ω (s, t) + ω (t, u) ≤ ω (s, u) with s ≤ t ≤ u, remains
1.4 Sobolev spaces of continuous paths of bounded variation
41
to be shown. From H¨older’s inequality with conjugate exponents p and p/ (p − 1) = 1/α we obtain α
α
≤
|x|W 1 , p ;[s,t] (t − s) + |x|W 1 , p ;[t,u ] (u − t) 1/p ! "(p−1)/p p p α p α p (t − s) p −1 + (u − t) p −1 |x|W 1 , p ;[s,t] + |x|W 1 , p ;[t,u ]
=
|x|W 1 , p ;[s,u ] (u − t) .
α
By Proposition 1.11, we conclude that |x|1-var;[s,t] ≤ ω (s, t). In particular, |x|1-var;[0,T ] ≤ ω (0, T ) = |x|W 1 , p ;[0,T ] T 1−1/p which gives the continuous embedding. d Proposition 1.45 Let p ∈ (1, ∞). A function x : [0, T ] → R is in 1,p d [0, T ] , R if and only if Mp (x) < ∞ where W
Mp (x)
:
=
sup (t i )∈D([0,T ])
=
lim
xt
i ,t i + 1
p p−1
i
sup
|ti+1 − ti | xt i ,t i + 1 p
δ →0 (t i )∈Dδ ([0,T ])
|ti+1 − ti |
i
p−1
and in this case p
|x|W 1 , p ;[0,T ] = Mp (x) . Proof. Without loss of generality x0 = 0 and we assume x ∈ W 1,p is the older’s inequality gives indefinite integral of some x˙ ∈ Lp . Then, H¨ xt
≤ |ti+1 − ti |1/p i ,t i + 1
ti + 1
1/p p
|x˙ u | du
ti
where 1/p + 1/p = 1. It immediately follows that
T
Mp (x) ≤ 0
p
p
|x| ˙ du = |x|W 1 , p ;[0,T ] .
(1.6)
Conversely, suppose that Mp (x) < ∞; given s1 < t1 ≤ s2 < t2 ≤ · · · < older’s inequality yields sn < tn in [0, T ] , H¨ n
|xt i − xs i |
=
i=1
n |xt i − xs i | 1/p
i=1
≤
|ti+1 − ti | 1/p
(Mp (x))
i
|ti+1 − ti |
1/p
1/p |ti+1 − ti |
Continuous paths of bounded variation
42
which shows that x is absolutely continuous, hence precisely in C00,1-var , and (Proposition 1.32) the indefinite integral of some x˙ ∈ L1 [0, T ]. We
T p show that x˙ ∈ Lp [0, T ], with 0 |x˙ u | du bounded by Mp (x). Let Dn = i Dn → x in 1-variation norm, and n T : i = 0, . . . , n . By Corollary 1.34, x therefore we have the convergence Dn
x˙
n n = x ( i −1 ) T n T i=1
, inT
By passing to a subsequence
1[ ( i −1 ) T n
, inT
1 ) → x˙ ∈ L [0, T ] .
˜ k = (Dn ) we can achieve that D k
˜
x˙ tD k →k →∞ x˙ t for almost every t ∈ [0, T ] with respect to Lebesgue measure. By Fatou’s lemma we then see that T T D˜ k p p |x˙ u | du ≤ lim inf x˙ t du k →∞ 0 0 xt ,t p i i+ 1 = lim inf p−1 k →∞ ˜ k |ti+1 − ti | i:t i ∈D xt ,t p i i+ 1 ˜ ≤ lim sup p−1 =: Mp (x) . δ →0 |D |≤δ |t − ti | i:t i ∈D i+1 p ˜ p (x) and with the trivRecalling (1.6) we get Mp (x) ≤ |x|W 1 , p ;[0,T ] ≤ M ˜ p (x) ≤ Mp (x) we must have equality throughout. This finishes the ial M proof.
1.4.2
Paths of Sobolev regularity on metric spaces
We already remarked that W 1,1 (resp. W 1,∞ ) coincides with the set of absolutely continuous (resp. 1-H¨older) paths and this kind of regularity only requires paths with values in an abstract metric space (E, d). Proposition 1.45 suggests how to define W 1,p -regularity in a metric setting. Although we shall only need p = 2 in later chapters (in particular, in our discussions of large deviations), the case p ∈ (1, ∞) is covered without extra effort and has applications in large deviation-type results for diffusions on fractals (see comments below). Definition 1.46 For p ∈ (1, ∞) we define W 1,p ([0, T ] , E) as those paths x : [0, T ] → (E, d) for which 1/p d xt i , xt i + 1 p sup < ∞. |x|W 1 , p ;[0,T ] := p−1 |ti+1 − ti | (t i )∈D([0,T ]) i The subset of paths started at o ∈ E is denoted by Wo1,p ([0, T ] , E). As always, [0, T ] may be replaced by any other interval [s, t].
1.4 Sobolev spaces of continuous paths of bounded variation
43
We now give a generalization of Theorem 1.44. Theorem 1.47 For any x ∈ W 1,p ([0, T ] , E) we have for all s, t ∈ [0, T ] , 1−1/p
d (xs , xt ) ≤ |x|1-var;[s,t] ≤ |x|W 1 , p ;[s,t] (t − s)
.
(1.7)
In particular, W 1,p ([0, T ] , E) ⊂ C 1-var ([0, T ] , E) . Proof. From the very definition of |x|W 1 , p ;[0,T ] we have p
p
d (xs , xt ) ≤ |x|W 1 , p ;[s,t] |t − s|
p−1
and the estimate on d (xs , xt ) follows. We then show, exactly as in the proof of Theorem 1.44, that the map 1−1/p
(s, t) → |x|W 1 , p ;[s,t] (t − s)
(1.8)
is super-additive and the estimate on |x|1-var;[s,t] follows by Proposition 1.11. Remark 1.48 To see that (1.8) is actually a control function, one would have to undergo a similar continuity consideration as in Propo sition 1.12. As in the case of 1-variation (cf. Proposition 1.14) it is enough in the definition of |x|W 1 , p ;[0,T ] to look at dissections with small mesh. Proposition 1.49 For every x ∈ C ([0, T ] , E), d xt i , xt i + 1 p p |x|W 1 , p ;[0,T ] = lim sup p−1 ∈ [0, ∞] . δ →0 (t i )∈Dδ ([0,T ]) |ti+1 − ti | i:t i ∈D Proof. We assume |x|W 1 , p ;[0,T ] < ∞, leaving the case |x|W 1 , p ;[0,T ] = ∞ to the reader. It suffices to show that, for any s < t < u in [0, T ], p
|d (xs , xu )| |u − s|
p−1
p
≤
|d (xs , xt )| p−1
|t − s|
p
+
|d (xt , xu )| |u − t|
p−1
(1.9)
˜ as this will allow us to replace a given dissection D with a refinement D with ˜ D < δ. (We used a similar argument in the proof of Proposition 1.14.) To p
this end, recall the elementary inequality (θa + (1 − θ) b) ≤ θap +(1 − θ) bp for a, b > 0 and θ ∈ (0, 1). Replacing θa by a and (1 − θ) b by b gives p
(a + b) ≤
ap θp−1
+
bp (1 − θ)
p−1
and this implies (1.9) with θ = (t − s) / (u − s) and d (xs , xu ) ≤ d (xs , xt ) + d (xt , xu ) ≡ a + b.
44
Continuous paths of bounded variation
Exercise 1.50 As usual, let C ([0, T ] , E) be equipped with the uniform topology. Let p ∈ (1, ∞). (i) Show that p
x ∈ C ([0, T ] , E) → Mp (x) := |x|W 1 , p ;[0,T ] ∈ [0, ∞] is lower semi-continuous. (ii) Assume that E has the Heine–Borel property, i.e. bounded sets have compact closure. Show that the level sets {x ∈ Co ([0, T ] , E) : Mp (x) ≤ Λ} with Λ ∈ [0, ∞) and o ∈ E are compact. (Hint: Arzela–Ascoli.) Solution. (i) Assume xn → x uniformly (or even pointwise) on [0, T ] and fix a dissection D ⊂ [0, T ]. Then p d xnti , xnti + 1 d xt i , xt i + 1 p ≤ lim inf Mp (xn ) p−1 = lim ninf p−1 →∞ n →∞ |ti+1 − ti | |ti+1 − ti | i:t i ∈D i:t i ∈D and taking the sup over all dissections finishes the proof. (ii) By (i) it is clear that level sets are closed. Thanks to Theorem 1.47 we know that Mp (x) ≤ Λ implies 1−1/p
d (xs , xt ) ≤ Λ1/p (t − s)
with equicontinuity and boundedness of {x ∈ Co ([0, T ] , E) : Mp (x) ≤ Λ}. Conclude with Arzela–Ascoli.
1.5 Comments Continuous paths of finite variation, also known as rectifiable paths, arise in many areas of analysis and geometry. Ultimately the focus of this book is on non-rectifiable paths and so we avoid the notion of rectifiability altogether. Topics such as absolute continuity of real-valued functions on R, the fundamental theorem of calculus for Lebesgue integrals or the Radon– Nikodym theorem are found in many textbooks on real analysis such as Rudin [149], Folland [55] or Driver [45]. The interplay between variation, H¨ older and W 1,p -spaces was studied in Musielak and Semadeni [132]; in particular, the authors of [132] attribute Proposition 1.45 to Riesz. A nice martingale proof of this can be found in Revuz and Yor [143]. The extension of W 1,p -regularity to paths in metric spaces is not for the sake of generality but arises, for instance, in the context of sample path large deviation(-type) estimates for symmetric diffusions; see Bass and Kumagai [6] and more specifically our later discussion of large deviation for Markov processes lifted to rough paths, Section 16.7.
2 Riemann–Stieltjes integration In this chapter we give a brief exposition of the Riemann–Stieltjes integral and its basic properties.
2.1 Basic Riemann–Stieltjes integration We will use the notation L Rd , Re for the space of linear maps from Rd into Re . We will always equip this space with its operator norm, that is if f ∈ L Rd , Re , then |f | = sup |f x|Re . x∈Rd |x|Rd =1
Definition y be two functions from [0, T ] into Rd and d e 2.1 Let x and n L R , R . Let Dn = (ti : i) be a sequence of dissections of [0, T ] with # D −1 |Dn | → 0, and ξ ni some points in tni , tni+1 . Assume i=0 n y (ξ ni ) xt ni ,t ni+ 1 converges when n tends to ∞ to a limit I independent of the choice of ξ ni and the sequence (Dn ). Then we say that the Riemann–Stieltjes integral of y against x (on [0, T ]) exists and write T T ydx := yu dxu := I. 0
0
We call y the integrand and x the integrator. Of course, [0, T ] may be replaced by any other interval [s, t]. Proposition 2.2 Let x ∈ C 1-var [0, T ] , Rd and y : [0, T ] → L Rd , Re
T piecewise continuous.1 Then the Riemann–Stieltjes integral 0 ydx exists, is linear in y and x, and we have the estimate T ydx ≤ |y|∞;[0,T ] |x|1-var;[0,T ] . 0 Moreover,2 t yu dxu − 0 1 This
0
s
t
yu dxu for all 0 ≤ s < t ≤ T .
yu dxu =
(2.1)
s
will cover all our applications. integrals in (2.1) are understood in the sense of Definition 2.1 with [0, T ] replaced by the intervals [0, t] , [0, s] , [s, t] respectively. 2 All
Riemann–Stieltjes integration
46
Proof. Let us say that a real-valued function y is dx-integrable (on the
T fixed time interval [0, T ]) if the Riemann–Stieltjes integral 0 ydx exists. Step 1: Step-functions, i.e. functions of the form g (t) = a0 1[0,t 1 ] +
n −1
ai 1(t i ,t i + 1 ] (t) ,
i=1
with 0 < t1 < · · · < tn = T and ai ∈ L Rd , Re , are dx-integrable and
T
gdx = 0
n −1
ai xt i ,t i + 1 .
i=0
Step 2: The set of dx-integrable functions is a linear space, i.e. if g and h are dx-integrable, then so is αg + βh, with α, β ∈ R which readily implies that T
T
(αg + βh) dx = α 0
T
gdx + β 0
hdx. 0
Step 3: If y is dx-integrable then T ydx ≤ |y|∞;[0,T ] |x|1-var;[0,T ] . 0 Step 4: The space of dx-integrable functions is closed in supremum topology on [0, T ]. Indeed, assume (y n ) is a sequence of dx-integrable functions such that |y − y n |∞;[0,T ] → 0 as n → ∞. By steps 2 and 3, T T y n dx − y m dx ≤ |yn − ym |∞;[0,T ] |x|1-var;[0,T ] 0 0
T and so In = 0 y n dx defines a Cauchy sequence whose limit we denote by of [0, T ] with mesh |Dn | → 0 I. Let Dn = (tni )i be a sequence of dissections and ξ ni an arbitrary point in tni , tni+1 for all i, n. Then, # D −1 #D m −1 m m m m n y (ξ i ) xt i ,t i + 1 ≤ |I − In | + (y (ξ i ) − y (ξ i )) xt i ,t i + 1 I − i=0 i=0 #D m −1 y n (ξ m ) x + In − t i ,t i + 1 i i=0
≤ |I − In | + |y − y n |∞;[0,T ] |x|1-var;[0,T ] #D m −1 + In − y n (ξ m i ) xt i ,t i + 1 . i=0
(2.2)
2.1 Basic Riemann–Stieltjes integration
47
Fixing ε > 0 we can pick n large enough so that |I − In | + |y − y n |∞;[0,T ] |x|1-var;[0,T ] < ε/2. Then, since y n is dx-integrable, there exists M > 0 such that m > M implies that (2.2) < ε/2 and hence #D m −1 y (ξ m I − i ) xt i ,t i + 1 < ε. i=0
T But this shows precisely that the Riemann–Stieltjes integral 0 ydx exists and so y is dx-integrable. Step 5: Any y ∈ C [0, T ] , L Rd , Re is dx-integrable. Indeed, take ξ ni ∈ n n ti , ti+1 where Dn = (tni )i is as in the previous step and set
(# D n )−1
y n (t) := y (ξ n0 ) 1[0,t 1 ] (t) +
y (ξ ni ) 1(t i ,t i + 1 ] (t) .
i=1
It then suffices to observe that limn →∞ |y − y n |∞;[0,T ] = 0 because y is uniformly continuous on [0, T ] and we conclude with step 4. If y is only piecewise continuous (i.e. bounded with finitely many points of discontinuthat it contains all points of discontinuity. ity) it suffices to choose Dn such ˜ ∈ C 1-var [0, T ] , Rd , the last step shows that any y ∈ Step 6: Given x, x C [0, T ] , L Rd , Re is d (αx + β x ˜)-integrable, for any α, β ∈ R. This easily implies linearity of T ydx x ∈ C 1-var [0, T ] , Rd → 0
T and in conjunction with step 2 we obtain bilinearity of (x, y) → 0 ydx. Step 7: Fix s, t with 0 ≤ s < t ≤ T . If y is piecewise continuous then so is 1[0,t] yu and t T 1[0,t] (u) yu dx = yu dx. 0
0
Relation (2.1) then follows from 1[0,t] yu = 1[0,s] (u) yu + 1(s,t] (u) yu . The details are left to the reader. Exercise 2.3 Assume x ∈ C 1 [0, T ] , Rd and y ∈ C [0, T ] , L Rd , Re . Show that T T yu dxu = yu x˙ u du. 0
0
We then have the classical integration-by-parts formula. It can be obtained by a simple passage to the limit in an elementary partial summation formula for finite sums. The details are left to the reader.
Riemann–Stieltjes integration
48
Proposition 2.4 (integration by parts) Let x ∈ C 1-var [0, T ] , Rd and y ∈ C 1-var [0, T ] , L Rd , Re . Then
T
T
(dyu ) dxu = yT xT − y0 x0 .
yu dxu + 0
0
Exercise 2.5 Take (x, y) ∈ C 1-var [0, T1 ] , Rd × C [0, T1 ] , L Rd , Re and assume φ a continuous non-decreasing function φ from [0, T2 ] onto [0, T1 ]. Show that
t
yφ(·) d (x ◦ φ) = 0
φ(t)
ydx for all t ∈ [0, T2 ] . 0
Exercise 2.6 Let x ∈ C 1-var [0, T ] , Rd , φ a C ∞ function from R into
∞ R+ , compactly supported on [−1, 1] with −∞ φ (u) du = 1. Define Φt =
t φ (u) du and extend x to a continuous function from R into Rd by −∞ setting x ≡ x0 on (−∞, 0) and x ≡ xT on [T, ∞). Define for all ε > 0 the mollifier approximation to x by Φ(t−s)/ε dxs . xε : t ∈ [0, T ] → x0 + R
Show that (i) for all ε > 0, xε is infinitely differentiable; (ii) for all ε > 0, |xε |1-var;[0,T ] ≤ |x|1-var;[0,T ] , and also |xε |1-H¨o l;[0,T ] ≤ |x|1-H¨o l;[0,T ] ; (iii) xε converges to x in supremum topology when ε tends to 0. Solution. (i) One can easily see that, for n ≥ 1, the nth derivative of xε
−d dxs , where Φ(n ) is the nth derivative of Φ. (ii) the is t → R ε Φ(n ) t−s ε ε 1-variation of x is given by T 1 (1) t − s ε dxs dt |x |1-var = εΦ ε 0 R 1 ( 1 ) t − s ≤ Φ . |dxs | dt ε R R ε 1 (1) t − s Φ dt . |dxs | ≤ ε R ε R ≤ |dxs | = |x|1-var;[0,T ]. R
The 1-H¨ older bound follows from integration by parts, 1 t−s 1 ds = xs φ xt+ εs φ (−s) ds. xεt = ε ε −1 R
(2.3)
2.2 Continuity properties
49
(iii) As x is continuous (and hence uniformly continuous), sup
lim
ε→0 s,t∈[0,1]×[0,T ]
|xt+ εs − xt | = 0,
and (2.3) implies that limε→0 supt∈[0,T ] |xεt − xt | = 0.
2.2 Continuity properties
Proposition 2.2 obviously implies that (x, y) → ydx, viewed as a map from C 1-var [0, T ] , Rd × C [0, T ] , L Rd , Re → C 1-var ([0, T ] , Re ) , is a bounded, bilinear map and hence continuous (and even Fr´echet smooth) in the respective norms.3 In particular, |y n − y|∞;[0,T ] → 0, |xn − x|1-var;[0,T ] → 0 implies that
·
y dx → n
0
n
·
ydx 0
in 1-variation. However, this is not the last word on continuity. For instance, the seemingly harmless assumption that all xn are piecewise smooth would already force us to restrict attention to x absolutely continuous (cf. Proposition 1.29). We thus formulate continuity statements that are applicable under the weaker assumption of uniform convergence with uniform 1-variation bounds. functions Proposition 2.7 Let y n , y : [0, T ] → L Rd , Re be continuous n n 1-var [0, T ] , Rd and and assume y → y uniformly. Assume x , x ∈ C xn → x uniformly with sup |xn |1-var;[0,T ] < ∞. n
Then
t
y dx → n
0
t
ydx uniformly for t ∈ [0, T ] .
n
0
1 -va r [0, T ] , Rd but a that what we call |·|1d-va r is a only a semi-norm on C 1 -va r genuine norm on C 0 [0, T ] , R and we can obviously assume x (0) = 0 as only dx is of interest. Alternatively, define an equivalence relation on C 1 -va r [0, T ] , Rd by setting 1 -va r , is Banach x ∼ y iff t → xt − y t is constant; the resulting quotient space, say C
1 -va r. 1 -va r under |·|1 -va r and viewing (f, x) → f dx as a map C × C →C 3 Observe
Riemann–Stieltjes integration
50
Proof. Set c = supn |xn |1-var;[0,T ] . Then t t n n y dx − ydx ≤ 0
t t t n n n (y − y) dx + ydx − ydx 0 0 0 t t n n ydx − ydx ≤ c |y − y|∞;[0,T ] +
0
0
0
and so it is enough to show t t n ydx − ydx → 0 0
(2.4)
0
uniformly in t ∈ [0, T ] as n → ∞. Fix ε > 0 and pick m = m (ε) such that supn |xn |1-var;[0,T ] m
< ε/2.
Then, from uniform continuity of y on [0, T ], we can find a dissection D = (ti ) such that the step function
(# D ) D
y (t) :=
y (ti−1 ) 1[t i −1 ,t i ) (t)
i=1
satisifies y − y D ∞ ≤ 1/m. Observe that (using Lemma 1.15) |x|1-var;[0,T ] / m < ε/2. We estimate the
left-hand side of (2.4) by adding/subtracting the integrals y D dxn and y D dx. This leaves us with three terms, of which the first two are dealt with by t t D D n y − y dx + sup y − y dx ≤ ε. sup sup n t∈[0,T ]
t∈[0,T ]
0
0
On the other hand, y D is constant over the (finitely many) intervals [ti , ti+1 ). Fix t ∈ [0, T ] and let tD ∈ D be the largest point in D for which tD ≤ t. Then t D n n + yt D xt D ,t − xntD ,t y d (x − x ) = yt i −1 xt i −1 ,t i − xt i −1 ,t i 0
i
where the sum that
i
runs over all integers i ≥ 1 for which ti−1 ≤ t. It follows
t D n y d (x − x ) ≤ (#D) × 2 |x − xn |∞;[0,T ] sup
t∈[0,T ]
0
(2.5)
2.2 Continuity properties
51
where #D denotes the number of points in D, dependent on m and hence on ε. It follows that t t n ydx − ydx ≤ ε + (#D) × 2 |x − xn |∞;[0,T ] . 0
0
Using |x − xn |∞;[0,T ] → 0 as n → ∞ it follows that t t lim sup ydxn − ydx ≤ ε n →∞ 0
0
and we conclude by sending ε ↓ 0. Another useful property of Riemann–Stieltjes integration is uniform continuity on bounded sets. and x, x ∈ C 1-var 2.8 Let y, y ∈ C [0, T ] , L Rd , Re Proposition [0, T ] , Rd . Then, · · ydx − y dx 0
≤
0
1-var;[0,T ]
|x|1-var;[0,T ] . |y − y |∞;[0,T ] + |y |∞;[0,T ] . |x − x |1-var;[0,T ] .
In particular, the map (x, y) ∈ C 1-var [0, T ] , Rd ×C [0, T ] , L Rd , Re →
· ydx ∈ C 1-var ([0, T ] , Re ) is locally Lipschitz. 0 Proof. It suffices to insert and subtract inequality.
· 0
y dx, followed by the triangle
applications, integrands frequently come in the form ϕ (xt ) ∈ L In Rd , Re for ϕ : Rd → L Rd , Re or V (yt ) for an Re -valued path y and V : Re → L Rd , Re . With focus on the latter, we state the following uniform continuity property; the simple proof is left to the reader. Corollary 2.9 Let x, x ∈ C 1-var [0, T ] , Rd , y, y ∈ C ([0, T ] , Re ) and V : Re → L Rd , Re continuous. Assume |x|1-var;[0,T ] , |x |1-var;[0,T ] , |y|∞;[0,T ] , |y |∞;[0,T ] < R and let ε > 0. Then there exists δ = δ (ε, R, V ) so that |x − x |1-var;[0,T ] + |y − y |∞;[0,T ] < δ
Riemann–Stieltjes integration
52
implies
· · V (y) dx − V (y ) dx 0
0
< ε. 1-var;[0,T ]
2.3 Comments Riemann–Stieltjes integration is discussed in many elementary analysis texts, for example Rudin [148] or Protter and Morrey [139].
3 Ordinary differential equations We develop the basic theory of ordinary differential equations of the form dy dxi = Vi (y) dt dt i=1 d
on a fixed time horizon [0, T ]. Here, x and y are paths with values in Rd , Re respectively and we have coefficients Vi : Re → Re , often viewed as “driving” vector fields on Re . When the driving signal x is continuously differentiable we are dealing with an example of a (time-inhomogenous) ordinary differential equation. We give a direct existence proof, via Euler approximations, that applies to continuous, finite-variation driving signals and continuous vector fields; uniqueness holds for Lipschitz continuous vector fields.
3.1 Preliminaries Given a collection of (continuous) vector fields V = (V1 , . . . , Vd ) on Re and continuous, finite-variation paths x, y with values in Rd , Re we set
t
V (y) dx := 0
d i=1
t
Vi (y) dxi .
0
From the point of view of vector-valued Riemann–Stieltjes integration, this amounts precisely to viewing V as a map d Vi (y) ai } ∈ L Rd , Re , y ∈ Re → {a = a1 , . . . , ad → i=1
where L Rd , Re is equipped with the operator norm, so that d sup Vi (y) ai . |V (y)| := |V (y)|op := a∈Rd :|a|=1
(3.1)
i=1
Definition 3.1 A collection of vector fields V = (V1 , . . . , Vd ) on Re , viewed e d e as V : R → L R , R , is called bounded if |V |∞ := sup |V (y)| < ∞. y ∈Re
Ordinary differential equations
54
For any U ⊂ Re we define the 1-Lipschitz norm (in the sense of E. M. Stein) by # $ |V (y) − V (z)| , sup |V (y)| . sup |V |Lip 1 (U ) := max |y − z| y ∈U y ,z ∈U :y = z We say that V ∈ Lip1 (Re ) if |V |Lip 1 ≡ |V |Lip 1 (Re ) < ∞ and locally 1Lipschitz if |V |Lip 1 (U ) < ∞ for all bounded subsets U ⊂ Re . (The concept of Lip1 regularity will later be generalized to Lipγ in the sense of E. M. Stein.) Observe that 1-Lipschitz paths are Lipschitz continuous paths that are bounded. We now state a classical analysis lemma. Lemma 3.2 (Gronwall’s lemma) Let x ∈ C 1-var [0, T ], Rd , and φ : [0, T ] → R+ a bounded measurable function. Assume that for all t ∈ [0, T ]
t
φ (t) ≤ K + L
φs |dxs |,
(3.2)
0
for some K, L ≥ 0. Then, for all t ∈ [0, T ] φt ≤ K exp L|x|1-var;[0,t] . If t → Kt is a non-negative, non-decreasing function, K may be replaced by Kt . Proof. After n iterated uses of (3.2) φ (t)
≤
t t1 t t n −1 K + KL |dxs | + · · · + KLn ··· |dxt n | · · · |dxt 1 | 0 0 0 0 t t1 t n −1 t n ··· φ (tn +1 ) |dxt n + 1 | · · · |dxt 1 |. +Ln +1 0
0
0
0
Since x is continuous, t 0
t1
···
0
t n −1
|dxt n | · · · |dxt 1 | =
0
|x|n1-var;[0,t] n!
.
Then,
φ (t) ≤ K exp L|x|1-var;[0,t] + |φ|∞;[0,T ]
n +1 L|x|1-var;[0,t] (n + 1)!
and sending n → ∞ gives the required estimate. The last statement, replacing K by some non-decreasing Kt , comes from the obvious remark that the previous estimate can be applied on the interval [0, t] with K = Kt .
3.2 Existence
55
3.2 Existence Let us first define what we mean by solution of a (controlled, ordinary) differential equation: Definition 3.3 Given a collection of continuous vector fields V = (V1 , . . . , Vd ) on Re , a driving signal x ∈ C 1-var [0, T ] , Rd and an initial condition y0 ∈ Re , we write π (V ) (0, y0 ; x) for the set of all solutions to the ODE1 dyt =
d
Vi (yt ) dxit ≡ V (yt ) dxt
(3.3)
i=1
for t ∈ [0, T ] started at y0 . The above ODE is understood as a Riemann– Stieltjes integral equation, i.e. t V (ys ) dxs . y0,t := yt − y0 = 0
In case of uniqueness y = π (V ) (0, y0 ; x) denotes the solution. If necessary, π (0, y0 ; x) is only considered up to some explosion time. Similarly, π (V ) (s, ys ; x) stands for solutions of (3.3) started at time s from a point ys ∈ Re . We shall frequently describe π (V ) (0, y0 ; x) as an “ODE solution, driven by x along the vector fields V and started from y0 ”. Existence of a solution holds under minimal regularity conditions on the vector fields. Theorem 3.4 (existence) Assume that (i) V = (V1 , . . . , Vd ) is a collection of continuous, bounded vector fields on Re ; (ii) y0 ∈ Re is an initial condition; (iii) x is a path in C 1-var [0, T ] , Rd . Then there exists a (not necessarily unique) solution to the ODE (3.3). Moreover, for all 0 ≤ s < t ≤ T π (V ) (0, y0 ; x) ≤ |V |∞ |x|1-var;[s,t] . (3.4) 1-var;[s,t] Remark 3.5 In case of non-uniqueness, we abuse notation in the above estimate in the sense that π (V ) (0, y0 ; x) stands for an arbitrary solution to (3.3) started at y0 . Proof. Let D = (ti )i be a dissection of the interval [0, T ] , and define the Euler approximation y (D ) : [0, T ] → Re by # (D ) y 0 = y0 , (D )
yt 1 The
(D )
= yt i
(D )
+ V yt i
xt i ,t for t ∈ [ti , ti+1 ] .
ODE (3.3) is time-inhomogenous unless xt is proportional to t.
Ordinary differential equations
56
Then, it is easy to see that for all 0 ≤ s < t ≤ T (D ) (D ) ≤ |V |∞ |x|1-var;[s,t] . ys,t ≤ y 1-var;[s,t]
(3.5)
In particular, (D ) y
∞;[0,T ]
≤ |y0 | + |V |∞ |x|1-var;[0,T ] .
Moreover, if rD denotes the greatest real number in D less than r then t t! " (D ) ) (D ) V yr(D − V y dxr . V yr(D ) dxr = (3.6) y0,t − r D 0
0
of dissections, with mesh |Dn | → 0 as n tends to Now let (Dn) be a sequence ∞. Clearly y (D n ) is equicontinuous and bounded. From Arzela–Ascoli’s theorem we see that y (D n ) has at least one limit point y. After relabelling our sequence, we can assume that y (D n ) converges to y uniformly on [0, T ] . (D ) (D ) Fix r ∈ [0, T ]. From (3.5), limn →∞ yr n − yr D nn = 0. On the other hand, (D n )
yr
(D )
→ yr hence yr D nn → yr and by continuity of V, n) lim V yr(D n ) − V yr(D = 0. Dn n →∞
(3.7)
By dominated convergence,2 we can pass to the limit in (3.6) to see that t V (yr ) dxr = 0. y0,t − 0
Finally, for s, t ∈ [0, T ], for any solution y ∈ π (V ) (0, y0 ; x) , t |ys,t | = V (yu ) dxu
≤ s
=
s t
|V |∞ . |dxu |
|V |∞ |x|1-var;[s,t] .
The right-hand side being a control, we obtain inequality (3.4). If we only assume continuity of the vector fields (without imposing growth conditions) existence holds up to an explosion time: Theorem 3.6 Assume that (i) V = (V1 , . . . , Vd ) is a collection of continuous vector fields on Re ; 2 Which requires us to know that Riemann–Stieltjes integrals with continuous integrands coincide with Lebesgue–Stieltjes integrals.
3.2 Existence
57
(ii) y0 ∈ Re is an initial condition; (iii) x is a path in C 1-var [0, T ] , Rd . Then either there exists a (global) solution y : [0, T ] → Re to ODE (3.3) started at y0 or there exists τ ∈ [0, T ] and a (local) solution y : [0, τ ) → Re such that y is a solution on [0, t] for any t ∈ (0, τ ) and lim |y (t)| = +∞.
tτ
Proof. Without loss of generality take y0 = 0. Replace V by compactly supported vector fields V n which coincide with V on the ball {y : |y| ≤ n}. From the preceding existence theorem, there exists a (not necessarily unique) ODE solution y 1 := π (V 1 ) (0, y0 ; x) which we consider only up to time τ 1 = inf t ∈ [0, T ] : yt1 ≥ 1 ∧ T > 0. If τ 1 = T then y = y 1 is a solution on [0, T ] and we are done. Set τ 0 = 0. We now define τ n , yn inductively and assume τ n ∈ [0, T ] , y n ∈ C ([τ n −1 , τ n ] , Re ) have been defined. We then define y n +1 := π (V n + 1 ) τ n , yτnn ; x as an (again, not necessarily unique) ODE solution started from y n +1 (τ n ) = yτnn driven by x along the vector fields V n +1 up to time τ n +1 = inf t ∈ [τ n , T ] : ytn +1 ≥ n + 1 ∧ T > 0. If at any step in this induction, τ n = T , then (0, T ] = ∪nk=1 (τ k −1 , τ k ] and y (t) = ytk for t ∈ (τ k −1 , τ k ] defines a solution on [0, T ] and we find ourselves in case (i) of the statement of the theorem. Otherwise, we obtain an increasing sequence (τ n ) with τ = limn τ n →∞ ∈ (0, T ]. Any interval (0, t] ⊂ (0, τ ) can be covered by intervals (τ k −1 , τ k ] and a solution on (0, t] is constructed as above by setting y (t) = ytk for t ∈ (τ k −1 , τ k ]. Moreover, be definition of τ n we see that |y (τ n )| = n → ∞ as n → ∞ and the proof is finished. Theorem 3.7 Assume that (i) V = (V1 , . . . , Vd ) is a collection of continuous vector fields on Re of linear growth, i.e. ∃A ≥ 0 : |Vi (y)| ≤ A (1 + |y|) for all y ∈ Re ; (ii) y0 ∈ Re is an initial condition; (iii) x is a path in C 1-var [0, T ] , Rd , and ≥ A |x|1-var;[0,T ] . Then explosion cannot happen. Moreover, any solution y to (3.3) satisfies the estimates (3.8) |y|∞;[0,T ] ≤ (|y0 | + ) exp () ,
Ordinary differential equations
58
and, for all 0 ≤ s < t ≤ T , |y|1-var;[s,t] ≤ (1 + |y0 |) exp (2) A Proof. For s ∈ [0, min (τ (y) , T )), s V (yu ) dxu |ys | ≤ |y0 | + 0 s |dxu | + A ≤ |y0 | + A 0
t
|dxu | . s
s
|yu | . |dxu | . 0
Hence, by Gronwall’s inequality, for all s ∈ [0, min (τ (y) , T )), s s |dxu | exp A |dxu | . |ys | ≤ |y0 | + A 0
(3.9)
0
This implies in particular that explosion cannot happen in finite time, and that s s |dxu | exp A |dxu | . |y|∞;[0,s] ≤ |y0 | + A 0
0
Let us now take s, t ∈ [0, T ]. Clearly, for all u, v ∈ [s, t] , v v yu ,v = V (yr ) dxr = V (yu + yu ,r ) dxr u
u
so that
v
|yu ,v | ≤ A (1 + |yu |)
v
|dxu | + A u
|yu ,r | |dxr | . u
By Gronwall’s inequality, we obtain v |dxu | exp A |yu ,v | ≤ A (1 + |yu |) u
v
|dxr | .
u
v
v
u Now, using inequality (3.9) and 0 |dxr | + u |dxr | = 0 |dxr |, v v u |dxr | ≤ |y0 | + 1 + A |dxr | exp A |dxr | (1 + |yu |) exp A u 0 0 t t |dxr | + exp 2A |dxr | ≤ |y0 | exp A 0 0 t |dxr | , ≤ (1 + |y0 |) exp 2A 0
which gives
t
|yu ,v | ≤ A (1 + |y0 |) exp 2A 0
|dxr |
v
|dxu | , u
3.3 Uniqueness
59
and hence |y|1-var;[s,t] ≤ A (1 + |y0 |) exp 2A
t
t |dxr | |dxu | .
0
s
3.3 Uniqueness We now show uniqueness for ODEs driven along Lipschitz vector fields by establishing Lipschitz continuity of the flow. Theorem 3.8 Assume that (i) V = (V1 , . . . , Vd ) is a collection of Lipschitz continuous vector fields on Re such that, for some υ ≥ 0, υ ≥ sup
y ,z ∈Rd
|V (y) − V (z)| ; |y − z|
(ii) x ∈ C 1-var [0, T ] , Rd with, for some ≥ 0, υ |x|1-var;[0,T ] ≤ . Then, for every initial condition there exists a unique ODE solution to dy = V (y) dx on [0, T ]. Moreover, the associated flow is Lipschitz continuous in the following sense that, for any initial conditions y01 , y02 ∈ Re , π (V ) 0, y01 ; x − π (V ) 0, y02 ; x (3.10) ≤ y01 − y02 exp () . ∞;[0,T ] Moreover, for all s < t in [0, T ] we have π (V ) 0, y01 ; x − π (V ) 0, y02 ; x ≤ y01 − y02 exp (2) .υ |x|1-var;[s,t] . 1-var;[s,t] Proof. Lipschitz continuous vector fields are of linear growth and existence of solutions on [0, T ] is guaranteed by Theorem 3.7. Let us write y i ∈ π (V ) 0, y0i ; x , i = 1, 2 for an arbitrary solution started from y01 , y02 respectively and set y¯ = y 1 − y 2 . Then t 1 V ys − V ys2 dxs , y¯t = y¯0 + 0
and hence
|¯ yt | ≤ |¯ y0 | + υ
t
|¯ ys | . |dxs | . 0
60
Ordinary differential equations
Gronwall’s inequality then leads to the first stated estimate. Moreover, taking y01 = y02 shows that y 1 ≡ y 2 and there is indeed a unique solution. For the second estimate, we have t 1 2 V yr − V yr dxs |¯ ys,t | = s t |¯ yr | |dxr | ≤ υ s t t |dxr | + υ |¯ ys,r | . |dxr | . ≤ |¯ ys | .υ s
s
Applying Gronwall gives
t
|¯ ys,t | ≤ |¯ ys | .υ
|dxr | . exp () . s
Using the estimate (3.10), we obtain |¯ ys,t | ≤ |¯ y0 | .υ
t
|dxr | . exp (2) . s
Noting that the right-hand side is a control, we obtain our estimate. Since uniqueness is a local property we immediately have Corollary 3.9 Given x ∈ C 1-var [0, T ] , Rd , there is a unique solution to dy = V (y) dx started at y0 along locally 1-Lipschitz vector fields V = (V1 , . . . , Vd ) up to its possible explosion time. If explosion can be ruled out (e.g. under an additional linear growth condition, cf. Theorem 3.7) then there exists a unique solution on [0, T ].
3.4 A few consequences of uniqueness We first show that time-change commutes with solving differential equations. Proposition 3.10 Let x ∈ C 1-var [0, T1 ] , Rd and V = (V1 , . . . , Vd ) a collection of locally Lipschitz continuous vector fields on Re of linear growth. Assume φ is a continuous non-decreasing function from [0, T2 ] onto [0, T1 ] so that x ◦ φ ∈ C 1-var [0, T2 ] , Rd . Then π (V ) (0, y0 ; x)φ(·) ≡ π (V ) (0, y0 ; x ◦ φ) on [0, T2 ] .
3.4 A few consequences of uniqueness
61
Proof. Let y = π (V ) (0, y0 ; x) denote the (unique) ODE solution. For all t ∈ [0, T2 ] d φ(t) yφ(t) = y0 + Vi (yr ) dxir . 0
i=1
By a change of variable r = φ (s) for Riemann–Stieltjes integrals, we obtain yφ(t) = y0 +
d i=1
t
Vi yφ(s) dxiφ(s)
0
which says precisely that t → yφ(t) is an ODE solution driven x ◦ φ along vector fields V1 , . . . , Vd started at y0 . By uniqueness, we therefore have π (V ) (0, y0 ; x)φ(·) = π (V ) (0, y0 ; x ◦ φ) .
Definition 3.11 (concatenation, time-reversal) (i) Given x ∈ C [0, T ] , Rd and x ˜ ∈ C [T, U ] , Rd we define the concatenation x x ˜ as a d 3 path in C [0, U ] , R defined by (x x ˜) (t) = xt if t ∈ [0, T ] (x x ˜) (t) = (xS − x ˜S ) + x ˜t if t ∈ [T, U ] . (ii) Next, the time-inverse of a path x ∈ C [0, T ] , Rd is defined as the path x run backwards on [0, T ], i.e. ← −T : t ∈ [0, T ] → x d x T −t ∈ R . − for When [0, T ] is fixed and no confusion is possible, we simply write ← x the time-inverse of x. As a simple consequence of uniqueness we have the following two propositions. ˜ ∈ C 1-var [S, T ] , Rd and Proposition 3.12 Let x ∈ C 1-var [0, S] , Rd , x V = (V1 , . . . , Vd ) a collection of locally Lipschitz continuous vector fields on Re of linear growth. Then π (V ) (0, y0 ; x) ≡ π (V ) (0, y0 ; x x ˜) on [0, S] and π (V ) S, π (V ) (0, y0 ; x)S ; x ˜ ≡ π (V ) (0, y0 ; x x ˜) on [S, T ] . 3 Of course, x, x ˜ need not be defined on adjacent intervals but a simple reparametrization will bring things back to the above definition. Formally speaking, concatenation is an operation on paths modulo their parametrization.
62
Ordinary differential equations
Proof. Obvious. Proposition 3.13 Let x ∈ C 1-var [0, T ] , Rd and V = (V1 , . . . , Vd ) a collection of locally Lipschitz continuous vector fields on Re of linear growth so that there is a (unique) ODE solution y = π (V ) (0, y0 ; x). Then for all 0 ≤ t ≤ T, −T x = yt . π (V ) 0, yT ; ← T −t Proof. Same as for Proposition 3.10, just use φ (t) = T − t. We record a simple corollary. (As a preview to an application discussed later on: when applied to the left-invariant vector fields U1 , . . . , Ud on the −T over [0, T ] step-N nilpotent group it implies that the signature of x ← x is trivial.) Corollary 3.14 Let x ∈ C 1-var [0, T ] , Rd and V = (V1 , . . . , Vd ) a collection of locally Lipschitz continuous vector fields on Re of linear growth so − x that there is a (unique) ODE solution y = π (V ) (0, y0 ; x). Reparametrize ← ← − as a path on [T, 2T ], i.e. x (t) = x2T −t . Then, −T = y . x π (V ) 0, y0 ; x ← 0 2T
3.5 Continuity of the solution map

We now investigate continuity properties of the solution map, i.e. the map (y_0, x) ↦ y, the ODE solution to
dy = V(y) dx = Σ_{i=1}^d V_i(y) dx^i,
started at time 0 at y_0 ∈ R^e. In fact, it will not complicate things to consider the map (y_0, V, x) ↦ y.
3.5.1 Limit theorem for 1-variation signals

Let us recall that our notion of 1-Lipschitz regularity includes the assumption of boundedness, cf. Definition 3.1. We start with our first continuity statement for solutions of ordinary differential equations.

Theorem 3.15 We consider:
(i) two collections V^1 = (V^1_1, ..., V^1_d) and V^2 = (V^2_1, ..., V^2_d) of Lip^1 vector fields on R^e with, for some υ ≥ 0, max_{i=1,2} |V^i|_{Lip^1} ≤ υ;
(ii) y^1_0, y^2_0 ∈ R^e are initial conditions;
(iii) x^1 and x^2 are two paths in C^{1-var}([0,T], R^d) with, for some ℓ ≥ 0, max_{i=1,2} |x^i|_{1-var;[0,T]} ≤ ℓ.

Then, if y^i = π_{(V^i)}(0, y^i_0; x^i) for i = 1, 2, we have
|y^1 − y^2|_{∞;[0,T]} ≤ ( |y^1_0 − y^2_0| + υ |x^1 − x^2|_{0;[0,T]} + ℓ |V^1 − V^2|_∞ ) exp(2υℓ).

Proof. Without loss of generality, x^1_0 = x^2_0 = 0, so that (1/2)|x^1 − x^2|_{0;[0,T]} ≤ |x^1 − x^2|_{∞;[0,T]} ≤ |x^1 − x^2|_{0;[0,T]}. First note that, for i = 1, 2, we have |y^i|_{1-var;[0,T]} ≤ υℓ. Now write, for t ∈ [0,T],
|y^1_t − y^2_t| ≤ |y^1_0 − y^2_0| + ∫_0^t |V^1(y^1_r) − V^1(y^2_r)| |dx^1_r| + | ∫_0^t V^1(y^2_r) d(x^1_r − x^2_r) | + ∫_0^t |V^2(y^2_r) − V^1(y^2_r)| |dx^2_r|
≤ |y^1_0 − y^2_0| + υ ∫_0^t |y^1_r − y^2_r| |dx^1_r| + | ∫_0^t V^1(y^2_r) d(x^1_r − x^2_r) | + ℓ |V^1 − V^2|_∞.   (3.11)

We deduce from the integration-by-parts formula
∫_0^t V^1(y^2_r) d(x^1_r − x^2_r) = −∫_0^t (x^1_r − x^2_r) dV^1(y^2_r) + (x^1_t − x^2_t) · V^1(y^2_t)
the bound
| ∫_0^t V^1(y^2_r) d(x^1_r − x^2_r) | ≤ |x^1 − x^2|_{∞;[0,t]} |V^1(y^2_·)|_{1-var;[0,t]} + |V^1(y^2_t)| · |x^1 − x^2|_{∞;[0,t]}
≤ υ |x^1 − x^2|_{∞;[0,t]} ( 1 + |y^2|_{1-var;[0,t]} )
≤ υ |x^1 − x^2|_{∞;[0,t]} ( 1 + υℓ ).

This last inequality and inequality (3.11) give, for t ∈ [0,T],
|y^1_t − y^2_t| ≤ |y^1_0 − y^2_0| + υ(1 + υℓ) |x^1 − x^2|_{∞;[0,t]} + ℓ |V^1 − V^2|_∞ + υ ∫_0^t |y^1_r − y^2_r| |dx^1_r|,
which implies, using Gronwall's lemma, that
|y^1_t − y^2_t| ≤ ( |y^1_0 − y^2_0| + υ(1 + υℓ) |x^1 − x^2|_{∞;[0,t]} + ℓ |V^1 − V^2|_∞ ) exp( |V^1|_{Lip^1} ∫_0^t |dx^1_r| )
≤ ( |y^1_0 − y^2_0| + υ |x^1 − x^2|_{∞;[0,t]} + ℓ |V^1 − V^2|_∞ ) exp(2υℓ).
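The estimate of Theorem 3.15 can be observed numerically: shrinking the uniform distance between two drivers shrinks the uniform distance between the corresponding solutions at (at most) a proportional rate. The sketch below is ours and illustrative only; the vector fields and drivers are ad-hoc choices, and `euler` repeats the Euler loop from the earlier sketch.

```python
import numpy as np

def euler(V, y0, x):  # Euler scheme for dy = V(y) dx (see the earlier sketch)
    y = [np.asarray(y0, float)]
    for k in range(len(x) - 1):
        y.append(y[-1] + V(y[-1]) @ (x[k + 1] - x[k]))
    return np.array(y)

V = lambda y: np.array([[np.sin(y[1]), 0.2], [0.1, np.cos(y[0])]])  # bounded, Lipschitz
t = np.linspace(0.0, 1.0, 2001)
x1 = np.stack([t, np.sin(2 * np.pi * t)], axis=1)

for eps in (1e-1, 1e-2, 1e-3):
    x2 = x1 + eps * np.stack([np.sin(5 * t), np.cos(3 * t)], axis=1)  # perturbed driver
    y1 = euler(V, [1.0, 0.0], x1)
    y2 = euler(V, [1.0, 0.0], x2)
    # sup-norm distance of solutions scales at most linearly in the driver perturbation
    print(eps, np.abs(y1 - y2).max())
```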
From the above theorem we see in particular that the solution map (y_0, x) ↦ π_{(V)}(0, y_0; x) is uniformly continuous in the sense of "uniform convergence with uniform 1-variation bounds". By a localization argument we now weaken the boundedness assumption inherent in Lip^1 regularity.

Corollary 3.16 We consider
(i) two collections V^1 = (V^1_1, ..., V^1_d) and V^2 = (V^2_1, ..., V^2_d) of locally Lip^1 vector fields on R^e, with linear growth;
(ii) two initial conditions y^1_0, y^2_0 ∈ R^e with |y^i_0| ≤ R for some R ≥ 0;
(iii) two paths x^1 and x^2 in C^{1-var}([0,T], R^d) with max_{i=1,2} |x^i|_{1-var;[0,T]} ≤ ℓ for some ℓ ≥ 0.
Then, if y^i = π_{(V^i)}(0, y^i_0; x^i) for i = 1, 2, there exist constants C, M, depending only on R, ℓ and the vector fields, such that
|y^1 − y^2|_{∞;[0,T]} ≤ C ( |y^1_0 − y^2_0| + |x^1 − x^2|_{0;[0,T]} + |V^1 − V^2|_{∞;B(0,M)} ).

Proof. We saw in Corollary 3.9 that, under locally Lipschitz and linear growth assumptions on the vector fields, there is indeed a unique, non-exploding solution. In fact, thanks to the explicit estimate (3.8), there exists M = M(ℓ, R) so that max_{i=1,2} |y^i|_{∞;[0,T]} ≤ M. We now modify the vector fields V^i outside a ball of radius M so as to make them Lip^1-vector fields, say Ṽ^i, and note that y^i = π_{(V^i)}(0, y^i_0; x^i) = π_{(Ṽ^i)}(0, y^i_0; x^i). This allows us to use Theorem 3.15 to finish the proof.

Exercise 3.17 (change-of-variable formula) Assume f is C^1(R^e) and y = π_{(V)}(0, y_0; x) is the unique solution to dy = V(y) dx, y(0) = y_0 ∈ R^e, along locally 1-Lipschitz vector fields V = (V_1, ..., V_d) on R^e with linear growth, and x ∈ C^{1-var}([0,T], R^d). Show that
f(y_T) − f(y_0) = ∫_0^T (Vf)(y_s) dx_s,
where Vf = (V_1 f, ..., V_d f) and each V_i is identified with a first-order differential operator,
V_i f = Σ_{k=1}^e V_i^k ∂_k f.

Solution. For x ∈ C^1([0,T], R^d), this is just the fundamental theorem of calculus. For x ∈ C^{1-var}([0,T], R^d) we approximate (uniformly, with uniform 1-variation bounds) and use the limit theorem. One can also appeal to the direct change-of-variable formulae for Riemann–Stieltjes integrals ...
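As a numerical illustration of the change-of-variable formula (again a sketch of ours, not from the book), one can compare f(y_T) − f(y_0) with a Riemann–Stieltjes sum for ∫_0^T (Vf)(y_s) dx_s along an Euler-approximated solution; the test function f and the vector fields below are arbitrary choices.

```python
import numpy as np

def euler(V, y0, x):  # Euler scheme for dy = V(y) dx (see the earlier sketch)
    y = [np.asarray(y0, float)]
    for k in range(len(x) - 1):
        y.append(y[-1] + V(y[-1]) @ (x[k + 1] - x[k]))
    return np.array(y)

V = lambda y: np.array([[np.sin(y[1]), 0.2], [0.1, np.cos(y[0])]])
f = lambda y: y[0] ** 2 + np.sin(y[1])                      # a C^1 test function
grad_f = lambda y: np.array([2 * y[0], np.cos(y[1])])       # its gradient

t = np.linspace(0.0, 1.0, 4001)
x = np.stack([t, np.sin(2 * np.pi * t)], axis=1)
y = euler(V, [1.0, 0.0], x)

# Riemann-Stieltjes sum for sum_i (V_i f)(y_s) dx^i_s, where
# (V_i f)(y) = sum_k V_i^k(y) d_k f(y); grad_f(y) @ V(y) is the row (V_1 f, ..., V_d f)(y).
rhs = sum(grad_f(y[k]) @ V(y[k]) @ (x[k + 1] - x[k]) for k in range(len(x) - 1))
print(f(y[-1]) - f(y[0]), rhs)   # the two numbers agree up to discretization error
```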
3.5.2 Continuity under 1-variation distance

Given a collection V of Lip^1-vector fields, we first show that
(y_0, x) ∈ R^e × C^{1-var}([0,T], R^d) ↦ π_{(V)}(0, y_0; x) ∈ C^{1-var}([0,T], R^e)
is Lipschitz continuous on bounded sets. Again, it will not complicate things to include V in the following continuity result.
Theorem 3.18 We consider:
(i) two collections V^1 = (V^1_1, ..., V^1_d) and V^2 = (V^2_1, ..., V^2_d) of Lip^1 vector fields on R^e such that, for some υ ≥ 0, max_{i=1,2} |V^i|_{Lip^1} ≤ υ;
(ii) y^1_0, y^2_0 ∈ R^e, viewed as two time-0 initial conditions;
(iii) two paths x^1 and x^2 in C^{1-var}([0,T], R^d) such that, for some ℓ ≥ 0, max_{i=1,2} |x^i|_{1-var;[0,T]} ≤ ℓ.

Then, if y^i = π_{(V^i)}(0, y^i_0; x^i) for i = 1, 2, we have
|y^1 − y^2|_{1-var;[0,T]} ≤ 2 ( υℓ |y^1_0 − y^2_0| + υ |x^1 − x^2|_{1-var;[0,T]} + ℓ |V^1 − V^2|_∞ ) e^{3υℓ}.   (3.12)
Proof. Take s < t in [0,T] and observe that
|y^1_{s,t} − y^2_{s,t}| = | ∫_s^t V^1(y^1_r) dx^1_r − ∫_s^t V^2(y^2_r) dx^2_r |
≤ ∫_s^t |V^1(y^1_r) − V^1(y^2_r)| |dx^2_r| + | ∫_s^t V^1(y^1_r) d(x^1_r − x^2_r) | + ∫_s^t |V^1(y^2_r) − V^2(y^2_r)| |dx^2_r|
≤ ( υ |y^1 − y^2|_{∞;[0,T]} + |V^1 − V^2|_∞ ) |x^2|_{1-var;[s,t]} + υ |x^1 − x^2|_{1-var;[s,t]}.
As the right-hand side is a control, it follows that
|y^1 − y^2|_{1-var;[s,t]} ≤ ( υ |y^1 − y^2|_{∞;[0,T]} + |V^1 − V^2|_∞ ) |x^2|_{1-var;[s,t]} + υ |x^1 − x^2|_{1-var;[s,t]}.
Using Theorem 3.15, and replacing s, t by 0, T, we then obtain (3.12), as claimed.

Remark 3.19 The interval [0,T] in the above theorem is of course arbitrary. In particular, this means that we also have, for all s, t ∈ [0,T],
|y^1 − y^2|_{1-var;[s,t]} ≤ 2 ( υℓ |y^1_s − y^2_s| + υ |x^1 − x^2|_{1-var;[s,t]} + ℓ |V^1 − V^2|_∞ ) e^{3υℓ},   (3.13)
where ℓ is a bound on max( |x^1|_{1-var;[s,t]}, |x^2|_{1-var;[s,t]} ).

As before, we can relax the assumption on the vector fields and still keep a uniform Lipschitz bound on bounded sets.

Corollary 3.20 We consider
(i) two collections V^1 = (V^1_1, ..., V^1_d) and V^2 = (V^2_1, ..., V^2_d) of locally Lip^1 vector fields on R^e, with linear growth;
(ii) two initial conditions y^1_0, y^2_0 ∈ R^e with |y^i_0| ≤ R for some R ≥ 0;
(iii) two paths x^1 and x^2 in C^{1-var}([0,T], R^d) with max_{i=1,2} |x^i|_{1-var;[0,T]} ≤ ℓ for some ℓ ≥ 0.
Then, if y^i = π_{(V^i)}(0, y^i_0; x^i) for i = 1, 2, there exist constants C, M, depending only on R, ℓ and the vector fields, such that
|y^1 − y^2|_{1-var;[0,T]} ≤ C ( |y^1_0 − y^2_0| + |x^1 − x^2|_{0;[0,T]} + |V^1 − V^2|_{∞;B(0,M)} ).

The 1-variation estimates imply 1-Hölder estimates:

Exercise 3.21 Under the same assumptions as in Corollary 3.20, and assuming x^1 and x^2 to be 1-Hölder, prove the existence of constants
C, M, depending on max_i |y^i_0|, max_{i=1,2} |x^i|_{1-Höl;[0,T]} and the vector fields, such that
|y^1 − y^2|_{1-Höl;[0,T]} ≤ C ( |y^1_0 − y^2_0| + |x^1 − x^2|_{1-Höl;[0,T]} + |V^1 − V^2|_{∞;B(0,M)} ).

Solution. We will use |V|_{Lip^1} |x|_{1-var;[s,t]} ≤ |V|_{Lip^1} |x|_{1-Höl;[s,t]} |t − s|. We may take the vector fields to be 1-Lipschitz, as the general result then follows by a localization argument. Define ℓ = max_i |x^i|_{1-Höl;[0,T]}. From (3.13) we obtain
|y^1_{s,t} − y^2_{s,t}| ≤ 2 ( υℓ |y^1_s − y^2_s| + υ |x^1 − x^2|_{1-Höl;[0,T]} + ℓ |V^1 − V^2|_∞ ) e^{3υℓT} · (t − s).
Replacing |y^1_s − y^2_s| on the right-hand side by |y^1 − y^2|_∞, and then taking the supremum over all s < t in [0,T], leads to an estimate of the form
|y^1 − y^2|_{1-Höl;[0,T]} ≤ 2 ( υℓ |y^1 − y^2|_∞ + · · · ) exp(3υℓT).
We then conclude with Theorem 3.15.
3.6 Comments

There are many books on ODE theory, such as the authoritative Hartman [84]; for a concise treatment, see the relevant chapters of Driver [45]. The class of ODEs studied here, where the time-inhomogeneity factorizes in the form of a multidimensional driving signal, is particularly important in (non-linear) control theory; see the relevant contributions in Agrachev [1], for instance. Continuity in the starting point (the "flow") is well known, and its further regularity is discussed in Chapter 4. Continuity in the driving signal is harder to find in the literature but also well known; see Lyons and coworkers [120, 123] and the references cited therein.
4 ODEs: smoothness

We remain in the ODE setting of the previous chapter; that is, we consider differential equations of the form dy = V(y) dx, y(0) = y_0, where x = x(t) is an R^d-valued continuous path of bounded variation. In the present chapter we investigate various smoothness properties of the solution, in particular as a function of y_0 and x.
4.1 Smoothness of the solution map

We saw in the last chapter (cf. Theorem 3.18) that Lip^1-regularity of the vector fields leads to (local Lipschitz) continuity of the solution map π_{(V)}(0, y_0; x) as a function of the initial condition y_0, the driving signal x and the vector fields V = (V_1, ..., V_d). Under the slightly stronger regularity assumption of C^1-boundedness we now show that π_{(V)}(0, y_0; x) is differentiable in y_0 and x. (For simplicity, we do not discuss differentiability in V.) In fact, we shall see that C^k-boundedness allows for k derivatives of π_{(V)}(0, y_0; x) in y_0 and x. As earlier, in the following definition V = (V_1, ..., V_d) is regarded as a map from R^e to L(R^d, R^e), equipped with operator norm.

Definition 4.1 We say that V : R^e → L(R^d, R^e) is C^k-bounded if (i) it is k-times Fréchet differentiable and (ii) V, DV, ..., D^k V are bounded functions on R^e. We then set |V|_{C^k} := max_{i=0,...,k} |D^i V|_∞.
If only (i) holds, we write V ∈ C^k_loc.

4.1.1 Directional derivatives

Lemma 4.2 Let V = (V_1, ..., V_d) be a collection of continuously differentiable vector fields, that is, V ∈ C^1(R^e, L(R^d, R^e)). Then, for all ε > 0 and for all bounded sets U ⊂ R^e, there exists δ such that for all a, b ∈ U,
|b − a| < δ  ⟹  |V(b) − V(a) − DV(a)·(b − a)| ≤ ε |b − a|.
Proof. By the fundamental theorem of calculus and the chain rule,
|V(b) − V(a) − DV(a)·(b − a)| = | ∫_0^1 [ DV(a + t(b − a)) − DV(a) ] dt · (b − a) |
≤ |b − a| ∫_0^1 | DV(a + t(b − a)) − DV(a) | dt.
We conclude using the fact that DV is uniformly continuous on bounded sets in R^e.

Condition 4.3 (non-explosion) We say that a collection of vector fields V = (V_1, ..., V_d) on R^e satisfies the non-explosion condition if for all R > 0 there exists M > 0 such that if (y_0, x) ∈ R^e × C^{1-var}([0,T], R^d) with |x|_{1-var} + |y_0| ≤ R, then
|π_{(V)}(0, y_0; x)|_{∞;[0,T]} < M.
Following our usual convention, we agree that, in the case of non-uniqueness, π_{(V)}(0, y_0; x) stands for any ODE solution driven by x along vector fields V started at y_0. For example, a collection of continuous vector fields of linear growth satisfies the non-explosion condition.

Theorem 4.4 (directional derivatives in starting point and driving signal) We fix a collection of C^1_loc-vector fields on R^e, V = (V_1, ..., V_d), satisfying the non-explosion condition. Then,
(i) the map^1
(y_0, x) ∈ R^e × C^{1-var}([0,T], R^d) ↦ y ≡ π(0, y_0; x) ∈ C^{1-var}([0,T], R^e)
has directional derivatives^2
D_{(v,h)} π_{(V)}(0, y_0; x) := (d/dε) π_{(V)}(0, y_0 + εv; x + εh) |_{ε=0} ∈ C^{1-var}([0,T], R^e)
in all directions (v, h) ∈ R^e × C^{1-var}([0,T], R^d);
(ii) define the bounded variation paths
t ↦ M_t := M^{y_0,x}_t := Σ_{i=1}^d ∫_0^t DV_i(y_r) dx^i_r ∈ M^e(R)   (4.1)
(where M^e(R) denotes the real (e × e)-matrices) and also
t ↦ H_t := H^{y_0,x;h}_t := Σ_{i=1}^d ∫_0^t V_i(y_r) dh^i_r ∈ R^e;   (4.2)

^1 Since V remains fixed we write π instead of π_{(V)}.
^2 The derivative exists as a (strong) limit in the Banach space C^{1-var}([0,T], R^e).
then z = D_{(v,h)} π_{(V)}(0, y_0; x) is the (unique) solution of the linear ODE
dz_t = dM^{y_0,x}_t · z_t + dH^{y_0,x;h}_t,   z_0 = v.   (4.3)

Remark 4.5 Observe that (y, z) = ( π_{(V)}(0, y_0; x), D_{(v,h)} π_{(V)}(0, y_0; x) ) solves the ODE driven by (x, h) given by formal differentiation, namely
dy = V(y) dx,   dz = (DV(y) dx) · z + V(y) dh,
or, in more detail,
dy_t = Σ_{i=1}^d V_i(y_t) dx^i_t,
dz_t = Σ_{i=1}^d (DV_i(y_t) · z_t) dx^i_t + Σ_{i=1}^d V_i(y_t) dh^i_t,
started at (y_0, z_0) = (y_0, v). With V ∈ C^1_loc, DV is continuous, and so the vector fields of the ODE for (y, z) are continuous but in general not C^1_loc. Nonetheless, it has a unique solution (thanks to the specific structure: first solve for y, then M, H, then z) which satisfies the non-explosion condition. Indeed, this is a straightforward application of the estimates for ODE solutions and Riemann–Stieltjes integrals: estimate y in terms of (y_0, x_·), then M, H in terms of (x_·, h_·, y_·), and finally z in terms of (v, M_·, H_·).

Proof. We first notice that, by a localization argument, we can assume that V is compactly supported. With (y_0, x), (v, h) ∈ X ≡ R^e × C^{1-var}([0,T], R^d) fixed, write y^ε_t = π_{(V)}(0, y_0 + εv; x + εh)_t, y ≡ y^0, and also z^ε = (y^ε − y)/ε for ε > 0. Define z ∈ C^{1-var}([0,T], R^e) as the (unique) ODE solution to (4.3).
Step 1: We first establish that
lim_{ε→0} z^ε = z in Y_∞,   (4.4)
with Y_∞ := C([0,T], R^e), a Banach space under the ∞-norm. From the respective ODEs for y, y^ε and z,
z^ε_t − z_t = Σ_{i=1}^d ∫_0^t [ (1/ε)( V_i(y^ε_s) − V_i(y_s) ) − DV_i(y_s) · z_s ] dx^i_s + Σ_{i=1}^d ∫_0^t ( V_i(y^ε_s) − V_i(y_s) ) dh^i_s
= Σ_{i=1}^d [ Δ^i_1(0,t) + Δ^i_2(0,t) + Δ^i_3(0,t) ]
with
Δ^i_1(s,t) = ∫_s^t DV_i(y_r) · (z^ε_r − z_r) dx^i_r,
Δ^i_2(s,t) = ∫_s^t (1/ε) [ V_i(y^ε_r) − V_i(y_r) − DV_i(y_r) · (y^ε_r − y_r) ] dx^i_r,
Δ^i_3(s,t) = ∫_s^t ( V_i(y^ε_r) − V_i(y_r) ) dh^i_r.

First observe that Theorems 3.4 and 3.18 apply (as V ∈ C^1_c ⊂ Lip^1) and we have
R := sup_{t∈[0,T], ε∈[0,1]} |y^ε_t| < ∞,
|z^ε|_{1-var;[0,T]} ≤ c_1 ( |v| + |h|_{1-var;[0,T]} ) =: c_2.
Fix η > 0. From R < ∞ and Lemma 4.2, we see that there exists δ > 0 such that |y^ε_r − y_r| < δ implies
(1/ε) | V_i(y^ε_r) − V_i(y_r) − DV_i(y_r) · (y^ε_r − y_r) | ≤ η |z^ε_r|.
Using |y^ε_r − y_r| ≤ ε c_2, this means that there exists ε_0 > 0 such that ε < ε_0 implies
sup_{r∈[0,T]} (1/ε) | V_i(y^ε_r) − V_i(y_r) − DV_i(y_r) · (y^ε_r − y_r) | ≤ η |z^ε_r| ≤ c_2 η.
In particular, we obtain that
Σ_{i=1}^d | Δ^i_2(s,t) | ≤ c_3 η |x|_{1-var;[s,t]}.
Bounding Δ^i_3(s,t) is even easier; indeed,
Σ_{i=1}^d | Δ^i_3(s,t) | ≤ Σ_{i=1}^d |V_i|_{Lip^1} sup_{r∈[s,t]} |y^ε_r − y_r| · |h|_{1-var;[s,t]} ≤ c_4 ε |h|_{1-var;[s,t]}.
Finally, as the vector fields are Lipschitz, we have
Σ_{i=1}^d | Δ^i_1(s,t) | ≤ c_5 ∫_s^t |z^ε_r − z_r| |dx_r|.   (4.5)
Putting things together, we obtain that for ε < ε_0,
|z^ε_t − z_t| ≤ c_5 ∫_0^t |z^ε_s − z_s| |dx_s| + c_4 ε |h|_{1-var;[0,t]} + c_3 η |x|_{1-var;[0,t]}.
By Gronwall's lemma, we obtain that
sup_{t∈[0,T]} |z^ε_t − z_t| ≤ ( c_4 ε |h|_{1-var;[0,T]} + c_3 η |x|_{1-var;[0,T]} ) exp( c_5 |x|_{1-var;[0,T]} ),   (4.6)
so that lim sup_{ε→0} |z^ε − z|_{∞;[0,T]} ≤ c_6 η, and since η > 0 was arbitrary it follows that lim_{ε→0} |z^ε − z|_{∞;[0,T]} = 0.
Step 2: Define ẑ^ε to be the solution of
dẑ^ε_t = dM^{y_0+εv, x+εh}_t · ẑ^ε_t + dH^{y_0+εv, x+εh; h}_t,   ẑ^ε_0 = v.
As there was nothing special about ε = 0 in the first step, we have actually just shown that
ε ∈ [0,1] ↦ π_{(V)}(0, y_0 + εv; x + εh) ∈ Y_∞ := C([0,T], R^e)
is differentiable with derivative ẑ^ε. Now
ε ↦ ( M^{y_0+εv, x+εh}, H^{y_0+εv, x+εh; h} ) ↦ ẑ^ε ∈ Y_1 := C^{1-var}([0,T], R^e)
is continuous (from the continuity properties of the solution map and of Riemann–Stieltjes integration, respectively). Therefore, from Proposition B.1 in Appendix B,
ε ∈ [0,1] ↦ π_{(V)}(0, y_0 + εv; x + εh) ∈ Y_1
is differentiable; that is, the limit as ε → 0 of
ε^{-1} ( π_{(V)}(0, y_0 + εv; x + εh) − π_{(V)}(0, y_0; x) )
exists in Y_1. The proof is now finished.

Proposition 4.6 (higher-order directional derivatives) Let k ∈ {1, 2, ...}. Assume V = (V_1, ..., V_d) is a collection of C^k_loc-vector fields on R^e satisfying the non-explosion condition. Then (y_0, x) ↦ π_{(V)}(0, y_0; x) has (up to) kth-order directional derivatives in the sense that, for all (v_i, h_i)_{1≤i≤k} ∈ ( R^e × C^{1-var}([0,T], R^d) )^{×k},
D^k_{(v_i,h_i)_{1≤i≤k}} π_{(V)}(0, y_0; x) := (∂^k / ∂ε_1 ... ∂ε_k) π_{(V)}( 0, y_0 + Σ_{j=1}^k ε_j v_j ; x + Σ_{j=1}^k ε_j h_j ) |_{ε=0}
exists as a strong limit in the Banach space C^{1-var}([0,T], R^e). Furthermore, the directional derivatives satisfy the control ODEs obtained by formal differentiation.

Proof. This follows by simple induction: for j ≥ 1, a solution of an ODE driven by C^j_loc-vector fields satisfying the non-explosion condition admits a derivative in any direction in its starting point and driving signal, and the derivative in such a direction, together with the solution, satisfies an ODE driven along C^{j−1}_loc vector fields that satisfies the non-explosion condition.
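Remark 4.5 and Theorem 4.4 suggest a direct way to compute directional derivatives numerically: solve the joint system dy = V(y) dx, dz = (DV(y) dx)·z + V(y) dh with an Euler scheme and compare z with a finite-difference quotient of the solution map. The sketch below is ours and purely illustrative; the vector fields, the perturbation direction (v, h) and all helper names are ad-hoc assumptions, not the book's.

```python
import numpy as np

def euler_pair(V, DV, y0, v, x, h):
    """Euler scheme for the joint system of Remark 4.5:
       dy = V(y) dx,  dz = (DV(y) dx) . z + V(y) dh,  (y_0, z_0) = (y0, v)."""
    y, z = np.asarray(y0, float), np.asarray(v, float)
    for k in range(len(x) - 1):
        dx, dh = x[k + 1] - x[k], h[k + 1] - h[k]
        y_new = y + V(y) @ dx
        # DV(y) has shape (e, e, d) with DV(y)[a, b, i] = d_b V_i^a(y)
        z = z + np.einsum('abi,b,i->a', DV(y), z, dx) + V(y) @ dh
        y = y_new
    return y, z

def euler(V, y0, x):  # plain Euler scheme, as in the earlier sketches
    y = np.asarray(y0, float)
    for k in range(len(x) - 1):
        y = y + V(y) @ (x[k + 1] - x[k])
    return y

V = lambda y: np.array([[np.sin(y[1]), 0.2], [0.1, np.cos(y[0])]])
DV = lambda y: np.array([[[0.0, 0.0], [np.cos(y[1]), 0.0]],
                         [[0.0, -np.sin(y[0])], [0.0, 0.0]]])  # derivative of V in y

t = np.linspace(0.0, 1.0, 4001)
x = np.stack([t, np.sin(2 * np.pi * t)], axis=1)
h = np.stack([np.sin(3 * t), t ** 2], axis=1)                  # perturbation of the driver
y0, v = np.array([1.0, 0.0]), np.array([0.3, -0.5])

_, z = euler_pair(V, DV, y0, v, x, h)

eps = 1e-5                                                     # finite-difference check
fd = (euler(V, y0 + eps * v, x + eps * h) - euler(V, y0, x)) / eps
print(z, fd)   # the two vectors agree up to discretization error
```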
4.1.2 Fréchet differentiability

We now show that the solution map to dy = V(y) dx is continuously Fréchet differentiable in the starting point and driving signal.

Theorem 4.7 Let V = (V_1, ..., V_d) be a collection of C^1_loc-vector fields on R^e satisfying the non-explosion condition. Then the map
(y_0, x) ∈ R^e × C^{1-var}([0,T], R^d) ↦ y ≡ π_{(V)}(0, y_0; x) ∈ C^{1-var}([0,T], R^e)
is C^1 in the Fréchet sense.

Proof. From Corollary B.5, we only need to show that the map ((y_0, x), (v, h)) ↦ D_{(v,h)} π_{(V)}(0, y_0; x) from ( R^e × C^{1-var}([0,T], R^d) )^{×2} into C^{1-var}([0,T], R^e) is uniformly continuous on bounded sets. This follows from the uniform continuity on bounded sets of the maps
φ_1 : ((y_0, x), (v, h)) ↦ ( π_{(V)}(0, y_0; x), v, h ),
φ_2 : ( π_{(V)}(0, y_0; x), v, h ) ↦ ( M^{y_0,x}, H^{y_0,x;h}, v ),
φ_3 : ( M^{y_0,x}, H^{y_0,x;h}, v ) ↦ D_{(v,h)} π_{(V)}(0, y_0; x);
φ_1 and φ_3 because of Corollary 3.20, and φ_2 because of Corollary 2.8.

We now discuss C^k-Fréchet differentiability of the map (y_0, x) ↦ π_{(V)}(0, y_0; x).

Proposition 4.8 (higher-order Fréchet) Let k ≥ 1, and let V = (V_1, ..., V_d) be a collection of C^k_loc-vector fields on R^e satisfying the non-explosion condition. Then the map
(y_0, x) ∈ R^e × C^{1-var}([0,T], R^d) ↦ y ≡ π_{(V)}(0, y_0; x) ∈ C^{1-var}([0,T], R^e)
is C^k in the Fréchet sense.

Proof. The map ((y_0, x), (v_i, h_i)_{1≤i≤k}) ↦ D^k_{(v_i,h_i)_{1≤i≤k}} π_{(V)}(0, y_0; x) is uniformly continuous on bounded sets, because of the uniform continuity on bounded sets of the solution map and of the integral. This is enough to conclude the proof, using Corollary B.11 in Appendix B.
It can be convenient in applications to view π_{(V)}(0, ·; x) as a flow of C^k-diffeomorphisms, that is, an element of the space of all φ : [0,T] × R^e → R^e, (t, y) ↦ φ_t(y), such that
∀ t ∈ [0,T] : φ_t is a C^k-diffeomorphism of R^e,
∀ α with |α| ≤ k : ∂_α φ_t(y), ∂_α φ_t^{-1}(y) are continuous in (t, y).

Corollary 4.9 Under the assumptions of Proposition 4.8, the map (t, y_0) ↦ π_{(V)}(0, y_0; x)_t is a flow of C^k-diffeomorphisms.

Proof. It is clear from Proposition 4.8 that y_0 ∈ R^e ↦ π_{(V)}(0, y_0; x)_t is in C^k(R^e, R^e). Moreover, it follows from Proposition 3.13 that
( π_{(V)}(0, ·; x)_t )^{-1} = π_{(V)}(0, ·; ←x)_t,
where ←x(·) = x(t − ·) ∈ C^{1-var}([0,t], R^d); we see that π_{(V)}(0, ·; x)_t is a bijection whose inverse is also in C^k(R^e, R^e), and conclude that each π_{(V)}(0, ·; x)_t is indeed a C^k-diffeomorphism of R^e. At last, each ∂_α-derivative of π_{(V)}(0, ·; x)_t resp. ( π_{(V)}(0, ·; x)_t )^{-1} can be represented as a (non-explosive) ODE solution, which plainly implies joint continuity in t and y_0.
Exercise 4.10 Prove Proposition 4.8 with C^{1-var} replaced throughout by (i) C^{1-Höl} and (ii) W^{1,2}.

We finish this section with a representation formula for directional derivatives.

Proposition 4.11 (Duhamel's principle) Consider (y_0, x) ∈ R^e × C^{1-var}([0,T], R^d), a collection of C^1_loc-vector fields V = (V_1, ..., V_d) on R^e satisfying the non-explosion condition, and write y ≡ π_{(V)}(0, y_0; x) ∈ C^{1-var}([0,T], R^e) for the unique ODE solution. Define
M_t = Σ_{i=1}^d ∫_0^t DV_i(y_r) dx^i_r ∈ M^e(R)
and J_· as the M^e(R)-valued (unique) solution to the linear ODE
dJ_t = dM_t · J_t,   J_0 = I   (4.7)
(where · denotes matrix multiplication and I the identity matrix). More generally, given 0 ≤ s ≤ t ≤ T, write J_{t←s} for the solution of this ODE started at I at time s. Then J_{t←s} is the Jacobian of π_{(V)}(s, ·; x)_t : R^e → R^e at y_s, and we may write J_{t←s} =: J^{y_s,x}_{t←s} to indicate this. Moreover, the following representation formula holds:
D_{(v,h)} π_{(V)}(0, y_0; x)_t = D_{(v,h)} y_t = J^{y_0,x}_{t←0} · v + Σ_{i=1}^d ∫_0^t J^{y_s,x}_{t←s} · V_i(y_s) dh^i_s.   (4.8)
Proof. By Theorem 4.4, for 0 ≤ t ≤ T, the flow map y_0 ↦ π_{(V)}(0, y_0; x)_t from R^e to R^e admits partial derivatives in all directions. These are easily seen to be continuous (much more will be shown soon) and so π_{(V)}(0, ·; x)_t ∈ C^1(R^e, R^e). Its differential (the "Jacobian") at some point y_0, viewed as an R^{e×e}-matrix, is of the form J̃_t = (z_1 | ... | z_e), where z_i = z_i(t) = D_{(b_i,0)} π_{(V)}(0, y_0; x)_t and (b_i) denotes the canonical basis of R^e. From Theorem 4.4, z_i = z_i(t) is the solution of a linear ODE of the form dz_i(t) = dM_t · z_i(t) with z_i(0) = b_i. Equivalently, J̃_t is the solution of (4.7) started at I at time 0, and by ODE uniqueness, J̃_t = J_t.
The matrix J_t remains invertible for all t ∈ [0,T]. Indeed, its inverse is constructed explicitly as the (unique) ODE solution to dK_t = −K_t · dM_t with K_0 = I. To see this, we just observe that d(K_t J_t) = −K_t dM_t J_t + K_t dM_t J_t = 0.
Of course, there is nothing special about time 0, and the same reasoning shows that, for 0 ≤ s ≤ t ≤ T, the flow map π_{(V)}(s, ·; x)_t is in C^1(R^e, R^e) with Jacobian given by (the invertible matrix) J_{t←s}. The chain rule, in conjunction with
π_{(V)}(s, y_s; x)_u = π_{(V)}(t, π_{(V)}(s, y_s; x)_t; x)_u,   0 ≤ s ≤ t ≤ u ≤ T,
implies^3
J_{u←s} = J_{u←t} · J_{t←s},   0 ≤ s ≤ t ≤ u ≤ T,
and by defining J_{s←t} := (J_{t←s})^{-1}, 0 ≤ s ≤ t, this remains valid for all s, t, u ∈ [0,T]. The validity of (4.8) is nothing more than a variation-of-constants ODE argument (also known as Duhamel's principle), which represents the solution to the inhomogeneous equation
dz_t = dM_t · z_t + dH_t,   z_0 = v,
which is precisely D_{(v,h)} π_{(V)}(0, y_0; x), in terms of the solution of the homogeneous equation, i.e. the ODE satisfied by the Jacobian. More precisely, it suffices to observe that
J_{0←t} z_t − v = ∫_0^t d( J_{0←s} z_s ) = ∫_0^t J_{0←s} dH_s = Σ_{i=1}^d ∫_0^t J_{0←s} V_i(y_s) dh^i_s.
Using (J^x_{0←t})^{-1} · J^x_{0←s} = J^x_{t←s}, the representation formula (4.8) now follows from simple algebra.

^3 The notation J_{t←s} (rather than J_{s→t}) has the advantage of suggesting the right order of matrix multiplication.
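The Jacobian ODE (4.7) is also easy to check numerically: solve dJ = dM·J alongside the flow and compare J with a finite-difference Jacobian of the map y_0 ↦ y_T. The following sketch is illustrative only (our own helper names and example vector fields, not the book's).

```python
import numpy as np

def euler_flow_and_jacobian(V, DV, y0, x):
    """Jointly solve dy = V(y) dx and the linear ODE dJ = dM . J of (4.7),
    where dM_t = sum_i DV_i(y_t) dx^i_t."""
    y, J = np.asarray(y0, float), np.eye(len(y0))
    for k in range(len(x) - 1):
        dx = x[k + 1] - x[k]
        dM = np.einsum('abi,i->ab', DV(y), dx)   # dM = sum_i DV_i(y) dx^i, an e x e matrix
        y, J = y + V(y) @ dx, J + dM @ J
    return y, J

V = lambda y: np.array([[np.sin(y[1]), 0.2], [0.1, np.cos(y[0])]])
DV = lambda y: np.array([[[0.0, 0.0], [np.cos(y[1]), 0.0]],
                         [[0.0, -np.sin(y[0])], [0.0, 0.0]]])

t = np.linspace(0.0, 1.0, 4001)
x = np.stack([t, np.sin(2 * np.pi * t)], axis=1)
y0 = np.array([1.0, 0.0])
yT, J = euler_flow_and_jacobian(V, DV, y0, x)

# Compare J with a finite-difference Jacobian of the flow y0 -> y_T
eps, fd = 1e-5, np.zeros((2, 2))
for b in range(2):
    yb, _ = euler_flow_and_jacobian(V, DV, y0 + eps * np.eye(2)[b], x)
    fd[:, b] = (yb - yT) / eps
print(J)
print(fd)   # the two matrices agree up to discretization error
```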
Remark 4.12 The underlying geometry helps to "read" these equations. The flow π_{(V)}(s, ·; x)_t maps y_s ↦ y_t (where y is the solution of the ODE
driven by x) and its matrix-valued Jacobian should be viewed as a linear map between the respective tangent spaces, i.e.
J_{t←s} = J^{y_s,x}_{t←s} ∈ L( T_{y_s} R^e, T_{y_t} R^e ).
From the very nature of vector fields, V_i(y_s) = V_i|_{y_s} ∈ T_{y_s} R^e and J_{t←s} · V_i(y_s) ∈ T_{y_t} R^e. In particular, we should think of (4.8) as an equality between elements of T_{y_t} R^e rather than just R^e.
4.2 Comments

Although (or maybe because) the results are unsurprising, we are unaware of good references for the smoothness topics discussed here. In a more general Young context, related smoothness properties have been discussed by Li and Lyons [109]. Differential equations driven by W^{1,2}-paths (a special case of Exercise 4.10) were Bismut's starting point in [15]; the resulting Hilbert structure of the input signal is convenient in discussing non-degeneracy properties of the solution map. Differential equations driven by W^{1,2}-paths also arise naturally in support and large-deviation statements for stochastic differential equations, which we shall encounter in Part IV.
5 Variation and Hölder spaces

We return to the abstract setting of Section 1.1, where we introduced C([0,T], E), the space of continuous paths defined on [0,T] with values in a metric space (E, d), followed by a detailed discussion of continuous paths of finite 1-variation ("bounded variation"). The purpose of the present chapter is to carry out a similar discussion for p-variation and 1/p-Hölder regularity, p ∈ [1, ∞). In the later applications to rough paths, E will be a Lie group whose dimension depends on [p], the integer part of p.
5.1 Hölder and p-variation paths on metric spaces

5.1.1 Definition and first properties

We start by defining α-Hölder and p-variation distances.

Definition 5.1 Let (E, d) be a metric space. A path x : [0,T] → E is said to be
(i) Hölder continuous with exponent α ≥ 0, or simply α-Hölder, if
|x|_{α-Höl;[0,T]} := sup_{0≤s<t≤T} d(x_s, x_t) / |t − s|^α < ∞;   (5.1)
(ii) of finite p-variation for some p > 0 if
|x|_{p-var;[0,T]} := sup_{(t_i)∈D([0,T])} ( Σ_i d(x_{t_i}, x_{t_{i+1}})^p )^{1/p} < ∞.   (5.2)

We will use the notations C^{α-Höl}([0,T], E) for the set of α-Hölder paths x and C^{p-var}([0,T], E) for the set of continuous paths x : [0,T] → E of finite p-variation.
It is obvious from these definitions that a path x : [0,T] → E is constant, i.e. x_t ≡ o for some o ∈ E, if and only if |x|_{α-Höl;[0,T]} = 0, and if and only if |x|_{p-var;[0,T]} = 0. (In particular, if E = R^d our quantities (5.1), (5.2) are only semi-norms.) Observe that C^{0-Höl}([0,T], E) is nothing but the set of continuous paths from [0,T] into E and |x|_{0-Höl;[0,T]} = |x|_{0;[0,T]}, where the latter was defined in Section 1.1.
Any α > 0 can be written as α = 1/p, and it is obvious that any (1/p)-Hölder path is a continuous path of finite p-variation. Although
a path of finite p-variation need not be continuous (e.g. a step-function), our focus is on continuous paths. The following simple proposition then explains why our main interest lies in α ∈ [0,1] and p ≥ 1.

Proposition 5.2 Assume x : [0,T] → E is α-Hölder continuous with α ∈ (1,∞), or continuous of finite p-variation with p ∈ (0,1). Then x is constant, i.e. x(·) ≡ x_0.

Proof. Since α-Hölder paths have finite p-variation with p = 1/α, it suffices to consider the case when x is continuous of finite p-variation with p < 1. Consider a dissection D = (t_i) ∈ D([0,T]) with mesh |D|. Then
d(x_0, x_T) ≤ Σ_i d(x_{t_i}, x_{t_{i+1}}) ≤ M^p max_i d(x_{t_i}, x_{t_{i+1}})^{1−p},
where M = |x|_{p-var;[0,T]} < ∞. Using uniform continuity of x on [0,T], we can make max_i d(x_{t_i}, x_{t_{i+1}}) arbitrarily small by taking a dissection with small enough mesh |D| = max_i |t_{i+1} − t_i|.

The case p = 1 resp. α = 1 was already discussed in detail in Section 1.2 and heavily used in our discussion of ODEs driven by continuous paths of bounded variation. We now begin a systematic study of p-variation, generalizing much of the familiar p = 1 case.

Proposition 5.3 Let x ∈ C([0,T], E). Then, if 1 ≤ p ≤ p' < ∞,
|x|_{p'-var;[0,T]} ≤ |x|_{p-var;[0,T]}.
In particular, C^{p-var}([0,T], E) ⊂ C^{p'-var}([0,T], E).

Proof. This follows from the elementary inequality
( Σ |a_i|^{p'} )^{1/p'} ≤ ( Σ |a_i|^p )^{1/p}.

Exercise 5.4 Formulate and prove the Hölder version of Proposition 5.3.

Proposition 5.5 (interpolation) Let x ∈ C([0,T], E).
(i) For 1 ≤ p < p' < ∞, we have
|x|_{p'-var;[0,T]} ≤ |x|_{p-var;[0,T]}^{p/p'} |x|_{0;[0,T]}^{1−p/p'}.
(ii) For 1 ≥ α > α' ≥ 0, we have
|x|_{α'-Höl;[0,T]} ≤ |x|_{α-Höl;[0,T]}^{α'/α} |x|_{0;[0,T]}^{1−α'/α}.
Proof. (i) Observe
Σ_i d(x_{t_i}, x_{t_{i+1}})^{p'} = Σ_i d(x_{t_i}, x_{t_{i+1}})^p d(x_{t_i}, x_{t_{i+1}})^{p'−p} ≤ |x|_0^{p'−p} Σ_i d(x_{t_i}, x_{t_{i+1}})^p,
then pass to the respective suprema over all dissections (t_i), and raise to the power 1/p'.
(ii) Follows from
d(x_s, x_t) / |t − s|^{α'} = ( d(x_s, x_t) / |t − s|^α )^{α'/α} d(x_s, x_t)^{1−α'/α} ≤ ( d(x_s, x_t) / |t − s|^α )^{α'/α} |x|_0^{1−α'/α}
and passing to the respective suprema.

Proposition 5.6 Let p ≥ 1 and x ∈ C([0,T], E).
(i) x ∈ C^{p-var}([0,T], E) is equivalent to
lim_{δ→0} sup_{(t_i)∈D_δ([0,T])} Σ_i d(x_{t_i}, x_{t_{i+1}})^p < ∞.   (5.3)
(ii) If 1 ≤ q < p < ∞ and x ∈ C^{q-var}([0,T], E), then
lim_{δ→0} sup_{(t_i)∈D_δ([0,T])} Σ_i d(x_{t_i}, x_{t_{i+1}})^p = 0.   (5.4)
Remark 5.7 Proposition 5.9 implies that one can replace p The forthcoming p d xt i , xt i + 1 by |x|p-var;[t i ,t i + 1 ] in both (5.3) and (5.4). Proof. (i) If x is of finite p-variation then, trivially, (5.3) holds. Conversely, let us write ϕ (x) = xp ; it follows from (5.3) that we can find δ > 0 small enough and c < ∞ so that ϕ d xt i , xt i + 1 < c t i ∈D
for any dissection D = (ti ) of [0, T ] with |D| < δ. Then, for an arbitrary dissection D of [0, T ], the number of intervals of length atleast δ cannot be more than T δ −1 and each of these contributes at most ϕ |x|0;[0,T ] where, by continuity of x, |x|0;[0,T ] ≡
sup d (xs , xt ) < ∞. s,t∈[0,T ]
Hence, for any dissection D of [0, T ], ϕ d xt i , xt i + 1 < c + T δ −1 ϕ |x|0;[0,T ] < ∞ t i ∈D
which implies that x ∈ C p-var ([0, T ], E). (ii) Introduce the modulus of continuity, osc (x, δ) = sup {d (xs , xt ) : s, t ∈ [0, T ] , |t − s| ≤ δ} . By uniform continuity of x : [0, T ] → E we have osc (x, δ) → 0 as δ 0. The estimate p q p−q osc (x, |D|) d xt i , xt i + 1 ≤ d xt i , xt i + 1 t i ∈D
t i ∈D
then implies sup (t i )∈Dδ ([0,T ])
p q p−q d xt i , xt i + 1 ≤ |x|q -var;[0,T ] osc (x, δ) i
which converges to 0 with δ 0 as required. As in the discussion of 1-variation regularity, the notion of control or control function is extremely useful. Let us recall that a control (on [0, T ]) is a continuous map ω of s, t ∈ [0, T ] , s ≤ t, into the non-negative reals, 0 on the diagonal, and super-additive, i.e for all s ≤ t ≤ u ∈ [0, T ], ω(s, t) + ω(t, u) ≤ ω(s, u). p
The perhaps most important example of a control is given by |x|p-var;[s,t] for x ∈ C p-var ([0, T ], E). This is the content of the following proposition: Proposition 5.8 Let (E, d) be a metric space, p ≥ 1 and x : [0, T ] → E be a continuous path of finite p-variation. Then p
ω x,p (s, t) := |x|p-var;[s,t] defines a control. Proof. We dealt with the case p = 1 in Proposition 1.12 and thus can focus on p > 1. Step 1: The same argument which gave super-additivity in the case p = 1 gives super-additivity of ω x,p in the present setting. The proof of continuity of ω x,p splits up into showing (i) “continuity from inside” ω x,p (s+, t−) ≡
lim
h 1 ,h 2 0
ω x,p (s + h1 , t − h2 ) = ω x,p (s, t) ,
which follows from the same argument as in the case p = 1, and (ii) “continuity from outside” ω x,p (s−, t+) ≡
lim
h 1 ,h 2 0
ω x,p (s − h1 , t + h2 ) = ω x,p (s, t) ,
for all s < t. Remark that ω x,p (s, t+), ω x,p (s−, t), etc. are defined in the obvious way and that all limits here exist by monotonicty of ω x,p . In fact, this reduces the proof of (ii) to showing ω x,p (s−, t+) ≤ ω x,p (s, t) and this requires a careful analysis which is not covered by our previous “p = 1” discussion.1 As a further reduction, it is enough to establish “onesided continuity from outside”, i.e. ω x,p (s, t) ≥ ω x,p (s, t+) and ω x,p (s, t) ≥ ω x,p (s−, t) .
(5.5)
We only discuss ω x,p (s, t) ≥ ω x,p (s, t+), the other inequality following from the same argument, and show how to deduce it from continuity of ω x,p at the diagonal, i.e. (5.6) ω x,p (t, t+) = 0. (The proof of (5.6) is left to step 2 below.) Fixing s < t and h, ε > 0 we consider D = (s = t0 < t1 < · · · < tn −1 < tn = t + h) such that n −1
p d xt i , xt i + 1 > ω x,p (s, t + h) − ε;
i=0
splitting D = D1 ∪D2 so that all points in [s, t] are contained in D1 (clearly, D1 is a dissection of [s, t]) yields p d xt i , xt i + 1 + ω x,p (t, t + h) > ω x,p (s, t + h) − ε t i ∈D 1
and after sending h to 0, using ω x,p (t, t+) = 0, p ω x,p (s, t) ≥ d xt i , xt i + 1 > ω x,p (s, t+) − ε i:t i ∈D 1
and upon sending ε to 0 we see that it is indeed enough to prove rightcontinuity of ω x,p at the diagonal. Step 2: To see (5.6) we seek a contradiction to lim ω x,p (t, t + h) =: δ > 0.
h0
Observe that the limit exists by monotonicity. Keeping t fixed throughout, thanks to continuity of x, we can find h1 such that for all h ∈ [0, h1 ] , p
d (xt , xt+ h ) < δ/8.
(5.7)
1 In the case p = 1 we used additivity of ω x , 1 to obtain continuity from outside. In general, when p > 1 a control ω x , p is not additive.
Fix h0 ∈ [0, h1 ] and a dissection (t = τ 0 < τ 1 < · · · < τ k −1 < τ k = t + h0 ) of [t, t + h0 ] such that k −1
p d xτ i , xτ i + 1 > 7δ/8,
i=0
which is possible since ω x,p (t, t + h0 ) ≥ ω x,p (t, t+) = δ. Using (5.7), we have k −1 p d xτ i , xτ i + 1 > 7δ/8 − δ/8 = 3δ/4. i=1
Doing the same with τ 1 in place of t+h0 yields (t = σ 0 < σ 1 < · · · < σ l−1 < σ l = τ 1 ), a dissection of [t, τ 1 ], such that l−1 p d xσ j , xσ j + 1 > 3δ/4. j =1
Combining the previous two sums, over non-overlapping intervals of the form [σ j , σ j +1 ] , [τ i , τ i+1 ] ⊂ [t, t + h0 ], yields ω x,p (t, t + h0 ) ≥ 3δ/4 + 3δ/4 = 3δ/2, which implies ω (t, t+) ≥ 3δ/2, which contradicts limh0 ω x,p (t, t+h). That concludes the proof. Proposition 5.9 Let (E, d) be a metric space, p ≥ 1 and x : [0, T ] → E be a continuous path of finite p-variation and δ > 0. Then (i) p p sup d xt i , xt i + 1 ≤ |x|p-var;[s,t] ω x,δ ,p (s, t) := (t i )∈Dδ ([s,t])
defines a control. (ii) We have
i
p d xt i , xt i + 1 =
sup (t i )∈Dδ ([0,T ])
sup |t−s|< δ
(t i )∈Dδ ([0,T ])
i
as well as
sup
d(xs , xt ) 1/p
|t − s|
p
|x|p-var;[t i ,t i + 1 ]
i
= sup |x|1/p-H¨o l;[s,t] . |t−s|< δ
Proof. (i) The proof follows along the same lines as the proof of Proposition 5.8. (ii) In both cases, the ≤ part is obvious. Using the fact that ω x,δ ,p is a control, we obtain p p d xt i , xt i + 1 ≤ sup |x|p-var;[t i ,t i + 1 ] sup (t i )∈Dδ ([0,T ])
(t i )∈Dδ ([0,T ])
i
≤
sup (t i )∈Dδ ([0,T ])
≤
i
ω x,δ ,p ([0, T ])
i
ω x,δ ,p (ti , ti+1 )
and, by the very definition of ω x,δ ,p ([0, T ]), equality must hold throughout. The 1/p-H¨older statement is also simple to prove and left to the reader. The following proposition is extremely important. We shall use part (i) below (without further notice) throughout the book; part (ii) says that a modulus of continuity on small intervals gives quantitative control over large intervals. Proposition 5.10 Let (E, d) be a metric space, ω a control on [0, T ] , p ≥ 1, C > 0, and x : [0, T ] → E a continuous path. (i) The pointwise estimate 1/p
d (xs , xt ) ≤ C ω (s, t)
for all s < t in [0, T ]
implies the p-variation estimate |x|p-var;[s,t] ≤ C ω (s, t)
1/p
for all s < t in [0, T ] .
(We say that x is of finite p-variation controlled by ω.) (ii) Under the weaker assumption 1/p
d (xs , xt ) ≤ C ω (s, t)
for all s < t in [0, T ] such that ω (s, t) ≤ 1
we have |x|p-var;[s,t] ≤ 2C
1/p
ω (s, t)
∨ ω (s, t) for all s < t in [0, T ] .
Proof. (Remark that only the super-additivity of ω is used in the proof.) p Ad (i). By assumption, d (xs , xt ) ≤ C p ω (s, t). Then for any dissection D = {ti } of [s, t], super-additivity implies p d xt i , xt i + 1 ≤ C p ω (ti , ti+1 ) ≤ C p ω (s, t) . i
i
Taking the supremum over all such dissections finishes the proof of the first part. (ii) Defining φp (x) = x∨xp we see (cf. Exercise 1.8) that (s, t) → φp (ω (s, t)) is a control. In view of part (i) we only need to prove 1/p
d (xs , xt ) ≤ 2Cφp (ω (s, t))
.
If s, t are such that ω (s, t) ≤ 1, there is nothing to prove, so we fix s, t such that ω(s, t) > 1. Define t0 = s, and ti+1 = inf {u > ti , ω(ti , u) = 1} ∧ t. From super-additivity of ω it follows that tN = t for N ≥ ω(s, t). We
conclude with d (xs , xt )
≤
d xt i , xt i + 1
0≤i< ω (s,t)
≤
Cω (ti , ti+1 )
1/p
0≤i< ω (s,t)
≤ C (1 + ω (s, t)) ≤ 2Cω (s, t) .
Exercise 5.11 Let x ∈ C p-var ([0, T ], E) , p ≥ 1, with associated control function p ω x,p (s, t) = |x|p-var;[s,t] . Show that, for any s < t < u in [0, T ], ω x,p (s, t) + ω x,p (t, u) ≤ ω x,p (s, u) ≤ 2p−1 [ω x,p (s, t) + ω x,p (t, u)] . Solution. The first inequality is immediate. For the second, if s < s < t < t < u we have d (xs , xt ) ≤ d (xs , xt ) + d (xt , xt ) p
and since (a + b) ≤ 2p−1 (ap + bp ) for a, b ≥ 0 the conclusion follows.
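For a path known only through finitely many samples, the supremum in (5.2) over dissections through the sample points can be computed exactly by dynamic programming; for the piecewise-linear interpolation of the samples this value is precisely its p-variation, while for the underlying path it is a lower bound. The following Python sketch is ours (illustrative only, with ad-hoc example data), not part of the book.

```python
import numpy as np

def p_variation(x, p):
    """Sup over dissections through the sample points of sum d(x_{t_i}, x_{t_{i+1}})^p,
    computed by dynamic programming in O(n^2); returns its (1/p)-th root."""
    n = len(x)
    best = np.zeros(n)                       # best[j] = optimal sum over dissections of [t_0, t_j]
    for j in range(1, n):
        incr = np.linalg.norm(x[j] - x[:j], axis=1) ** p
        best[j] = np.max(best[:j] + incr)    # last dissection point before t_j is some t_i, i < j
    return best[-1] ** (1.0 / p)

# Example: a smooth closed curve has finite 1-variation, hence finite p-variation for all p >= 1,
# and p -> |x|_{p-var} is non-increasing, as in Proposition 5.3.
t = np.linspace(0.0, 1.0, 500)
x = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)
for p in (1.0, 1.5, 2.0, 3.0):
    print(p, p_variation(x, p))
```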
The very same argument used in Lemma 1.15 shows lower semi-continuity of x → |x|p-var in the following sense. Lemma 5.12 Let (xn ) be a sequence of paths from [0, T ] → E of finite p-variation. Assume xn → x pointwise on [0, T ]. Then, for all s < t in [0, T ], |x|p-var;[s,t] ≤ lim inf |xn |p-var;[s,t] . n →∞
In particular, |x|1/p-H¨o l; [s,t] ≤ lim inf |xn |1/p-H¨o l;[s,t] . n →∞
In a similar spirit, the following lemma says that |x|p-var is a rightcontinuous function of p. Lemma 5.13 Let x : [0, T ] → E be a continuous path of finite p-variation. Then, for all s < t in [0, T ] , the map p ∈ [p, ∞) → |x|p -var;[s,t] is nonincreasing and |x|p -var;[s,t] = |x|p-var;[s,t] . (5.8) lim p p
Proof. The non-increasing statement was proved in Proposition 5.3. In particular, it implies that ω (s, t)
1/p
:= lim |x|p -var;[s,t] p p
1/p
exists and satisfies ω (s, t) ≤ |x|p-var;[s,t] ; we only need to show the converse inequality. Clearly, d (xs , xt ) ≤ |x|p -var;[s,t] and sending p p we have 1/p (5.9) d (xs , xt ) ≤ ω (s, t) for all s < t in [0, T ]. Let us show that ω is super-additive. First observe that p p |x|p -var;[s,t] = lim |x|p -var;[s,t] . ω (s, t) = lim p →p
p p
p
Then, for s ≤ t ≤ u and using super-additivity of (s, t) → |x|p -var;[s,t] , ω (s, t) + ω (t, u)
= ≤
lim
p →p
p
p
|x|p -var;[s,t] + |x|p -var;[t,u ]
p
|x|p -var;[s,u ] = ω (s, u) . lim
p →p
But (5.9) and super-additivity of ω imply |x|p-var;[s,t] ≤ ω (s, t) 1/p
conclude that ω (s, t)
1/p
. We
= |x|p-var;[s,t] as required.
5.1.2 On some path-spaces contained in C p-var ([0, T ] , E) Observe that x is a path of finite p-variation controlled by (s, t) → |t − s| if and only if x is 1/p-H¨older. Hence, 1/p-H¨older paths are of finite pvariation. Conversely, we now show that every finite p-variation path is the time-change of a 1/p-H¨older path. Proposition 5.14 Let (E, d) be a metric space, and let x : [0, T ] → E be a continuous path. Then x is of finite p-variation if and only if there exists a continuous increasing function h from [0, T ] onto [0, 1] and a 1/p-H¨ older path g such that x = g ◦ h. Proof. Let x be of finite p-variation, non-zero. Then, h(t) =
ω x,p (0, t) ω x,p (0, T )
defines a continuous (from Proposition 5.8) increasing function from [0, T ] onto [0, 1] . Then, there exists a function g such that g ◦ h (t) = x (t) , as
h (t1 ) = h (t2 ) =⇒ x (t1 ) = x (t2 ) . Now, sup u ,v ∈[0,1]
|g (u) − g (v)|
=
1/p
|u − v|
|g (h (u)) − g (h (v))|
sup
1/p
u ,v ∈[0,T ]
|h (u) − h (v)|
ω x,p (0, T )
1/p
1/p
≤
ω x,p (u, v)
1/p
|ω x,p (0, u) − ω x,p (0, v)|
.
From the sub-additivity of ω x,p , |ω x,p (0, u) − ω x,p (0, v)| ≥ |ω x,p (u, v)| , so that |g (u) − g (v)| 1/p ≤ ω x,p (0, T ) , sup 1/p u ,v |u − v| i.e. g is 1/p-H¨ older. Exercise 5.15 (absolute continuity of order p) We say that x : [0, T ] → E is “absolutely continuous of order p” if, for all ε > 0, there exists δ> 0, such that for all s1 < t1 ≤ s2 < t2 ≤ · · · < sn < tn in [0, T ] with p i |ti − si | < δ, we have p d (xs i , xt i ) < ε. (5.10) i
(i) Assume p ≥ 1. Show that in the definition of absolute continuity of order p one can replace (5.10) by p |x|p-var;[s i ,t i ] < ε. i
(ii) Assume p > 1, and show that x is absolutely continuous of order p if and only if p limδ →0 sup d xt i , xt i + 1 = 0. (5.11) D ∈Dδ ([0,T ])
i
Solution. (i) Consider is1 < t1 ≤ s2 < t2 ≤ · · · < sn < tn in [0, T ] with p |t − s | < δ. Let uj ∈ D ([si , ti ]) be a dissection of [si , ti ] and observe i i i that 1/p uij +1 − uij p uij +1 − uij = ti − si . ≤ j
j
p p It follows that i j uij +1 − uij ≤ i |ti − si | < δ and so p d xu ij + 1 , xu ij <ε i
j
and we conclude by taking the supremum over all possible dissections of [si , ti ] , i = 1, . . . , n.
(ii) “ ⇐ ” : Condition (5.11) implies that ∀ε > 0 : ∃˜δ :
sup D ∈Dδ˜ ([0,T ])
p d xu i , xu i + 1 < ε, i
for any dissection D with |D| ≤ ˜δ. Fix ε > 0 and take s1 < t1 ≤ s2 < t2 ≤ · · · < sn < tn in [0, T ] such that
p
p
|ti − si | < δ := ˜δ
i
which plainly implies maxi |ti − si | ≤ ˜δ. Take D = (ui ) to be a refinement of {0 ≤ s1 < t1 ≤ · · · < sn < tn ≤ T } with mesh |D| ≤ ˜δ, without adding any (unnecessary) points in the intervals [si , ti ]. It then follows that
p
d (xs i , xt i ) ≤
p d xu i , xu i + 1 < ε,
i:u i ∈D
i
which shows that x is absolutely continuous of order p. “ =⇒ ” : Fix x an absolutely continuous path of order p, and ε > 0. We may write an arbitrary dissection D = (si ) of [0, T ] in the form D = {0 ≤ s1 < t1 = s2 < · · · < tn −1 = sn < tn ≤ T } p p−1 and furthermore assume |D| is small enough so that i |ti − si | ≤ T |D| < δ, where δ is chosen so that this implies, using the assumption of absolute continuity of order p of the path x, p d xs i , xs i + 1 < ε. i 1/(p−1)
This estimate is uniform over all dissections p < (δ/T ) D with |D| It follows that limδ →0 supD ∈Dδ ([0,T ]) i d xt i , xt i + 1 = 0.
= ˜δ.
Example 5.16 (Besov spaces) In Section 1.4 we introduced the (Sobolev) path spaces W 1,q ([0, T ] , E) which provided examples of finite 1−1/q . We now intro1-variation paths with precise H¨ older modulus |t − s| duce the fractional Sobolev – or Besov – spaces W δ ,q ([0, T ] , E) with δ < 1 whose elements are paths having finite p-variation with p = 1/δ > 1 and δ −1/q . More precisely, we make the following precise H¨older modulus |t − s| definition. Given q ∈ [1, ∞) and δ ∈ (1/q, 1), the space W δ ,q ([0, T ] , E) is the set of all x ∈ C ([0, T ] , E) for which |x|W δ , q ;[0,T ] :=
0
T
0
T
d (xu , xv ) δ +1/q
|v − u|
q
1/q dudv
< ∞.
Following Section A.2 in Appendix A, the Garsia–Rodemich–Rumsey estimate leads quickly to a Besov–H¨older resp. variation “embedding”, by which we mean |x|δ −1/q -H¨o l;[0,T .] ≤ |x|(1/δ )-var;[0,T ]
≤
(const) |x|W δ , q ;[0,T ] , (const) |x|W δ , q ;[0,T ]
5.2 Approximations in geodesic spaces For a continuous path x : [0, T ] → Rd , and a dissection D = (ti )i of [0, T ] , we constructed the piecewise linear approximation xD by defining d xD t i = xt i and connecting by straight lines in between. Straight lines in R are geodesics in the sense of the following definition. Definition 5.17 In a metric space (E, d) a geodesic (or geodesic path) joining two points a, b ∈ E is a continuous path Υa,b : [0, 1] → E such that Υa,b (0) = a, Υa,b (1) = b and a,b d Υa,b = |t − s| d (a, b) (5.12) s , Υt for all s < t in [0, 1]. If any two points in E are joined by a (not necessarily unique) geodesic, we call E a geodesic space. Equation (5.12) expresses that there are no shortcuts between any two points on the geodesic path. Even if E is complete and connected, it need not be a geodesic space; for example, the unit circle S 1 ⊂ R2 with metric induced from R2 is not geodesic. However, S 1 is a geodesic space under arclength distance. Readers with some background in Riemannian geometry will recall the Hopf–Rinow theorem;2 it says precisely that a complete connected Riemannian manifold is a geodesic space. The main example of a geodesic space to have in mind for our purposes is the free step-N nilpotent group equipped with Carnot–Caratheodory metric, to be discussed in detail later on. Geodesic spaces have exactly the structure that allows us to generalize the idea of piecewise linear approximations. To simplify, when considering a geodesic space E and two points a, b ∈ E, we will define Υa,b to be an arbitrary geodesic between a and b. Definition 5.18 (piecewise geodesic approximation) Let x be a continuous path from [0, T ] into some geodesic space (E, d) . Given a dissection D = {t0 = 0 < t1 < · · · < tn = T } of [0, T ] we define xD as the concatenation of geodesics connecting xt i and xt i + 1 for i = 1, . . . , n−1. More precisely, set xD t = xt for all t ∈ D 2 For
example, Bishop and Crittenden [14, p. 154].
5.2 Approximations in geodesic spaces
and for t ∈ (ti , ti+1 ),
xD t
x t i ,x t i + 1
=Υ
t − ti ti+1 − ti
89
.
Lemma 5.19 Let E be a geodesic space and x ∈ C([0, T ], E). Then, x converges to x uniformly on [0, T ]. That is, sup d xD t , xt → 0 as |D| → 0.
D
t∈[0,T ]
Proof. Fix two consecutive points ti < ti+1 in D and note that it is enough → 0 uniformly for t ∈ [ti , ti+1 ]. To see this, fix ε > 0 , x to show that d xD t t and pick δ = δ (ε) so that osc (x; δ) ≡
sup
d (xs , xt ) < ε/2
s< t in [0,T ]: t−s< δ
(which is possible since x is continuous on the compact [0, T ] and hence uniformly continuous). Then, for t ∈ [ti , ti+1 ] and provided that |D| < δ, we have ≤ d xD d xD t , xt t , xt i + d (xt i , xt ) t − ti d xt , xt + d (xt i , xt ) = i i+ 1 ti+1 − ti ≤ 2osc (x; δ) < ε which already finishes the proof. Proposition 5.20 Let E be a geodesic space and x ∈ C p-var ([0, T ], E), p ≥ 1 and D = {0 = t0 < t1 < · · · < tn = T } a dissection of [0, T ] . Then, D x ≤ 31−1/p |x| . (5.13) p-var;[0,T ]
p-var;[0,T ]
If x is 1/p-H¨ older, D x ≤ 31−1/p |x|1/p-H¨o l;[0,T ] . 1/p-H¨o l;[0,T ]
(5.14)
Remark 5.21 D induces a dissection of any interval [ti , tj ] with endpoints ti , tj ∈ D. It follows that [0, T ] in (5.13) may be replaced by any interval [ti , tj ] with ti , tj ∈ D. Proof. p-variation estimate: To prove (5.13) we use the control ω (s, t) = p |x|p-var;[s,t] and then define ω D first on the intervals of D by p t−s ω (ti , ti+1 ) for ti ≤ s ≤ t ≤ ti+1 , 1 ≤ i < #D ω D (s, t) = ti+1 − ti
and then for arbitary s < t, say 1 ≤ i < j < #D and ti ≤ s ≤ ti+1 ≤ tj ≤ t ≤ tj +1 , by ω D (s, t) = ω D (s, ti+1 ) + ω(ti+1 , tj ) + ω D (tj , t).
(5.15)
Clearly, if ti ≤ s ≤ t ≤ ti+1 , D D s − ti t − ti x t i ,x t i + 1 x t i ,x t i + 1 = d Υ ,Υ d xs , xt ti+1 − ti ti+1 − ti t−s d xt , xt using (5.12) = i i+ 1 ti+1 − ti ≤
ω D (s, t)1/p .
On the other hand, if ti ≤ s ≤ ti+1 ≤ tj ≤ t ≤ tj +1 , D D D D ≤ d x d xD , x , x s t s t i + 1 + d xt i + 1 , xt j + d(xt j , xt ) ≤ ω D (s, ti+1 )1/p + ω(ti+1 , tj )1/p + ω D (tj , t)1/p ≤
31−1/p (ω D (s, ti+1 ) + ω(ti+1 , tj ) + ω D (tj , t))
=
31−1/p ω D (s, t)1/p
using (5.15) in the last line. Hence, for all s < t in [0, T ] , D p d xD ≤ 3p−1 ω D (s, t). s , xt
1/p
(5.16)
It now suffices to show that ω D is a control (only super-additivity is nontrivial) to obtain the desired conclusion, namely D x ≤ 31−1/p ω D (0, T )1/p p-var;[0,T ] =
31−1/p |x|p-var;[0,T ] .
To see super-additivity, ω D (s, t) + ω D (t, u) ≤ ω D (s, u) for s ≤ t ≤ u in [0, T ] we first consider the case when s, t, u are contained in one interval, say ti ≤ s < t < u ≤ ti+1 . Then p p t−s u−t ω (ti , ti+1 ) + ω (ti , ti+1 ) ω D (s, t) + ω D (t, u) = ti+1 − ti ti+1 − ti p u−s ω (ti , ti+1 ) = ω D (s, u) . ≤ ti+1 − ti Consider the case that s, t are contained in one interval, say ti ≤ s < t ≤ ti+1 ≤ tk ≤ u ≤ tk +1 . Then ω D (s, t) + ω D (t, u) = ω D (s, t) + ω D (t, ti+1 ) + ω (ti+1 , tk ) + ω D (tk , u)
≤ω D (s,t i + 1 )
5.2 Approximations in geodesic spaces
91
(using the first case!) and conclude the defining equality ω D (s, u) = ω D (s, ti+1 )+ω (ti+1 , tk )+ω D (tk , u). The case that t, u are contained in one interval is similar. At last, if s, t, u are in three different intervals, say ti ≤ s ≤ ti+1 ≤ tj ≤ t ≤ tj +1 ≤ tk ≤ u ≤ tk +1 , then ω D (s, t) + ω D (t, u) equals ω D (s, ti+1 ) + ω(ti+1 , tj ) + ω D (tj , t) + ω D (t, tj +1 ) + ω(tj +1 , tk ) + ω D (tk , u)
≤ω D (t j ,t j + 1 )=ω (t j ,t j + 1 )
≤ω (t i + 1 ,t k )
and we conclude again with the defining equality for ω D (s, u). This covers all cases and we have established that ω D is a control. older then 1/p-H¨ older estimate: If x is actually 1/p-H¨ 1/p
ω (s, t)
= |x|p-var;[s,t] ≤ |x|1/p-H¨o l;[0,T ] |t − s|
1/p
and so for ti ≤ s ≤ t ≤ ti+1 p t−s p ω D (s, t) = ω (ti , ti+1 ) ≤ |x|1/p-H¨o l;[0,T ] |t − s| . ti+1 − ti For general s < t, say ti ≤ s ≤ ti+1 ≤ tj ≤ t ≤ tj +1 , we have ω D (s, t)
= ω D (s, ti+1 ) + ω(ti+1 , tj ) + ω D (tj , t) p ≤ |x|1/p-H¨o l;[0,T ] (|ti+1 − s| − |tj − ti+1 | − |t − tj |) p
= |x|1/p-H¨o l;[0,T ] |t − s| The claimed estimate (5.14) now follows immediately from 1/p 1/p D ≤ 31−1/p ω D (s, t) ≤ 31−1/p |x|1/p-H¨o l;[0,T ] |t − s| . d xD s , xt
Remark 5.22 The above proof actually shows that D p x
p-var;[0,T ]
≤ 3p−1
p
sup (t i )∈D|D | ([0,T ])
p
|x|p-var;[t i ,t i + 1 ] ,
(5.17)
i
and a slight extension shows that for |D| < δ one has the estimate p p xD sup ≤ 3p−1 sup |x|p-var;[t i ,t i + 1 ] . ] p-var;[t ,t (t j )∈Dδ ([0,T ])
j
i
j+1
(t i )∈Dδ ([0,T ])
i
(5.18)
Combining Lemma 5.19 and Proposition 5.20 gives immediately the following important approximation result. Theorem 5.23 Let E be a geodesic space and x ∈ C p-var ([0, T ], E), p ≥ 1. Let (Dn ) be a sequence of dissection of [0, T ] such that its mesh |Dn | converges to 0. Then, xD n converges to x “uniformly with uniform p-variation bounds”. That is, n →0 , x sup d xD t t t∈[0,T ]
and
sup xD n p-var;[0,T ] ≤ 31−1/p |x|p-var;[0,T ] . n
If x is 1/p-H¨ older then sup xD n 1/p-H¨o l;[0,T ] ≤ 31−1/p |x|1/p-H¨o l;[0,T ] . n
Exercise 5.24 Let E be a geodesic space, p ∈ (1, ∞) and x ∈ W 1,p ([0, T ], E) as defined in Section 1.4.2. Show that d xt i , xt i + 1 p D p x 1 , p = p−1 . ;[0,T ] W |ti+1 − ti | i:t i ∈D
5.3 H¨older and p-variation paths on Rd 5.3.1 H¨ older and p-variation Banach spaces We now turn to Rd (equipped with Euclidean distance) as our most familiar example of a metric (and geodesic) space. Theorem 5.25 (i) C p-var [0, T ] , Rd is Banach with normd x → |x (0)| + p-var [0, T ] , R started at 0 |x|p-var;[0,T ] . The closed subspace of paths in C is also Banach under x → |x|p-var;[0,T ] . (ii) C 1/p-H¨o l [0, T ] , Rd is Banach with norm x → |x (0)| + |x|1/p-H¨o l;[0,T ] . The closed subspace of paths in C 1/p-H¨o l [0, T ] , Rd started at 0 is also Banach under x → |x|1/p-H¨o l;[0,T ] . These Banach spaces are not separable. Proof. The case p = 1 was dealt with in Section 1.3. Leaving straightforward details to the reader, let us say that completeness in the case p > 1 is proved as in the case p = 1; non-separability follows from the following example.
5.3 H¨ older and p-variation paths on Rd
Example 5.26 We construct an uncountable family of functions so that the distance of any two f = f remains bounded below by a fixed positive real. An uncountable subset of C ([0, 1] , R) is given by εk 2−k /p sin 2k πt , t ∈ [0, 1] , fε (t) = k ≥1
where ε is a ±1 sequence, that is, εk ∈ {−1, 1} for all k. We show (i) that fε ∈ C 1/p-H¨o l [0, 1] , Rd ⊂ C p-var [0, 1] , Rd and (ii) if ε = ε then 2 < |fε − fε |p-var;[0,1] ≤ |fε − fε |1/p-H¨o l;[0,1] .
Proof. Ad (i). For 0 ≤ s < t ≤ 1 we have |fε (t) − fε (s)| ≤ εk 2−k /p sin 2k πt − sin 2k πs 1≤k ≤|log ( 2 ) (t−s) | εk 2−k /p sin 2k πt − sin 2k πs + k > |log ( 2 ) (t−s) | is the logarithm with base 2. Using |ε|l ∞ ≤ 1, we obtain where log k (2) sin 2 πt − sin 2k πs ≤ 2k π |t − s| for the first sum and |sin (· · · )| ≤ 1 for the second, and hence 2−k /p 2k + 2.2−k /p |fε (t) − fε (s)| ≤ π |t − s| 1≤k ≤|log ( 2 ) (t−s) | k > |log ( 2 ) (t−s) | 1/p
≤ c1 |t − s|
for some constant c1 = c1 (p), independent of s, t and ε. This proves (i). Ad (ii). Assume ε = ε and let j ≥ 1 be the first index for which εj = ε j , i.e. ε1 = ε 1 , . . . , εj −1 = ε j −1 but εj = ε j . Consider then a dissection D of [0, 1] given by ti = i2−j −1 : i = 0, . . . , 2j +1 . From j sin 2 πti+1 − sin 2j πti p = 1 it follows readily that sin 2j π· p-var;[0,1] ≥ 2j /p . Moreover, |(fε −fε ) (ti+1 ) − (fε −fε ) (ti )| = εj −ε j 2−j /p sin 2j πti+1 − sin 2j πti = 2.2−j /p . This shows that |fε − fε |p-var;[0,1] ≥ 2.
5.3.2 Compactness
Lemma 5.27 Consider (xn ) ⊂ C [0, T ] , Rd and assume xn → x ∈ C [0, T ] , Rd uniformly. (i) Assume supn |xn |p-var;[0,T ] < ∞. Then xn → x in p -variation for any p > p. older norm for (ii) Assume supn |xn |α -H¨o l;[0,T ] < ∞. Then xn → x in α -H¨ any α < α. Proof. By Lemma 5.12 we see that x is of finite p-variation. It then suffices to apply the interpolation result (Proposition 5.5) to the difference x − xn . Proposition 5.28 (compactness) Consider (xn ) ⊂ C [0, T ] , Rd . (i) Assume (xn ) is equicontinuous, bounded and supn |xn |p-var;[0,T ] < ∞. Then xn converges (in p > p variation, along a subsequence) to some p-var d [0, T ] , R . x∈C (ii) Assume (xn ) is bounded and supn |xn |α -H¨o l;[0,T ] < ∞. Then xn converges (in α < α H¨ older topology, along a subsequence) to some x ∈ C α -H¨o l [0, T ] , Rd . Proof. An obvious consequence of Arzela–Ascoli and the previous lemma. The following corollary will be useful, for example, in the proof of the forthcoming Theorem 6.8. Corollary 5.29 (i) Assume (xn ) , x are in C p-var [0, T ] , Rd such that supn |xn |p-var;[0,T ] < ∞ and xn → x uniformly, then for p > p, sup (s,t)∈∆ T
n |x |p -var;[s,t] − |x|p -var;[s,t] → 0 as n → ∞
where ∆T = {(s, t) : 0 ≤ s ≤ t ≤ T }. Furthermore, |xn |p -var; [·,·] : n ∈ N is equicontinuous in the sense that for every ε > 0 there exists δ such that |t − s| < δ implies (5.19) sup |xn |p -var; [s,t] < ε. n
(ii) If (x ) , x are in C ([0, T ] , E) such that supn |xn |α -H¨o l;[0,T ] < ∞ n and x → x uniformly, then for all s < t in [0, T ], then for α < α, sup |xn |α -H¨o l;[s,t] − |x|α -H¨o l;[s,t] → 0 as n → ∞ n
α -H¨o l
(s,t)∈∆ T
and
|xn |α -H¨o l; [·,·] : n ∈ N
is equicontinuous, similar to part (i).
5.3 H¨ older and p-variation paths on Rd
Proof. (i) Proposition 5.5, applied to xn − x, actually shows that lim sup |xn − x|p -var;[s,t] = 0.
n →∞ s,t
Hence, |xn |p -var;[·,·] converges uniformly on ∆T and, by Arzela–Ascoli’s theorem, is equicontinuous. That is, for any ε > 0 there exists δ such that |(s, t) − (s , t )| < δ implies sup |xn |p -var;[s,t] − |xn |p -var;[s ,t ] < ε. n
In particular, this applies to (s, t) ∈ ∆T with |t − s| < δ and s := t := s, and using |xn |p -var;[s ,s ] = 0 we see that sup |xn |p -var;[s,t] < ε n
which concludes the proof of (i). The proof of (ii) follows similar lines.
5.3.3 Closure of smooth paths in variation norm
For p ≥ 1 we define C 0,p-var [0, T ] , Rd resp. C 0,1/p-H¨o l [0, T ] , Rd as the closure of smooth paths from [0, T ] → Rd in p-variation resp. 1/p-H¨older norm. In symbols, C 0,p-var [0, T ] , Rd C 0,1/p-H¨o l [0, T ] , Rd
p-var
: = C ∞ ([0, T ] , Rd )
,
1/p-H¨o l
: = C ∞ ([0, T ] , Rd )
.
p-var
[0, T ] , Rd resp. Obviously, these are closed, linear subspaces of C C 1/p-H¨o l [0, T ] , Rd and thus Banach spaces and so is the restriction to 0,1/p-H¨o l paths with x (0) = 0, denoted by Co0,p-var [0, T ] , Rd resp. Co ([0, T ] , Rd . The case p = 1 was already discussed earlier in Section 1.3 where, 0,1-var among otherthings, we identified C d as absolutely continuous paths 0,1-H¨o l d 1 [0, T ] , R as C [0, T ] , R . For p > 1 we have and C Lemma 5.30 Let p > 1. (i) Let Ω be a set in C 1-var [0, T ] , Rd such that C 0,1-var [0, T ] , Rd ⊂ 1-var
Ω
. Then,
p-var
Ω
= C 0,p-var [0, T ] , Rd .
1-H¨o l (ii) Let Ω be a set in C 1-H¨o l [0, T ] , Rd such that C 1 [0, T ] , Rd ⊂ Ω . Then, 1/p-H¨o l Ω = C 0,1/p-H¨o l [0, T ] , Rd .
Variation and H¨ older spaces p-var
Proof. (i) First, C 0,p-var ⊂ Ω
follows immediately from
C ∞ ⊂ C 0,1-var ⊂ Ω
1-var
p-var
⊂Ω
.
The converse inclusion follows readily from C 1-var ⊂ C ∞ Ω ⊂ C 1-var ⊂ C ∞
p-var
p-var
=⇒ Ω
⊂ C∞
p-var
p-var
; indeed,
.
p-var
, recall from Exercise 2.6 that any x ∈ C 1-var can To see C 1-var ⊂ C ∞ be approximated by xn ∈ C ∞ in uniform norm with uniform 1-variation bounds, i.e. |x − xn |∞;[0,T ] → 0, sup |xn |1-var;[0,T ] < ∞; n
then interpolation (Proposition 5.5 applied to x − xn ) gives xn → x in p-variation, which is what we had to prove. (ii) Similar and left to the reader. Theorem 5.31 (Wiener’s characterization) Let x ∈ C p-var ([0, T ], Rd ), with p > 1. The following statements are equivalent. (i.1) x ∈ C 0,p-var ([0, T ], Rd ). p (i.2a) limδ →0 supD =(t i ),|D |< δ i |x|p-var;[t i ,t i + 1 ] = 0. p (i.2b) limδ →0 supD =(t i ),|D |< δ i d xt i , xt i + 1 = 0. D (i.3) lim|D |→0 dp-var x , x = 0. Secondly, let x ∈ C 1/p-H¨o l [0, T ], Rd , with p > 1. The following statements are equivalent. (ii.1) x ∈ C 0,1/p-H¨o l [0, T ], Rd . (ii.2a) limδ →0 sup|t−s|< δ |x|1/p-H¨o l;[s,t] = 0. (ii.2b) limδ →0 sup|t−s|< δ d(xs , xt )/|t − s|1/p = 0. (ii.3) lim|D |→0 d1/p-H¨o l xD , x = 0. Remark 5.32 From purely metric considerations, we have seen in Exercise 5.15 that (i.2b) is equivalent to “absolute continuity of order p”. Remark also that the case p = 1 requires special care: by Corollary 1.34 (i.1) ⇔ (i.3) holds true. On the other hand, Proposition 1.14 tells us that in the case p = 1 condition (i.2) is tantamount to saying x is constant; in particular, conditions (i.1), (i.3) do not imply (i.2). Similar comments apply in the H¨older case. Proof. We only prove the p-variation statements, as the 1/p-H¨ older ones follow the same logic. From Lemma 5.30, the dp-var -closure of C 1-var [0, T ] , Rd is C 0,p-var ([0, T ], Rd ), which implies (i.3) ⇒ (i.1). The reverse proof of (i.1) ⇒ (i.3) follows the same lines as the proof in the case p = 1, i.e. the proof of Corollary 1.34.
We already proved in Proposition 5.9 that (i.2a) ⇔ (i.2b) and now turn to (i.1) ⇒ (i.2b). Let us fix ε > 0 and a smooth path y such that d_{p-var}(x,y)^p ≤ ε 2^{-p}. For a dissection D, we obtain from the triangle inequality
Σ_{t_i∈D} d(x_{t_i}, x_{t_{i+1}})^p ≤ 2^{p−1} Σ_{t_i∈D} d(y_{t_i}, y_{t_{i+1}})^p + 2^{p−1} d_{p-var;[0,T]}(y,x)^p.
Since y is smooth, there exists δ > 0 (depending on y) such that |D| < δ implies
Σ_{t_i∈D} d(y_{t_i}, y_{t_{i+1}})^p < ε 2^{-p}.
Hence, for all dissections D with |D| < δ,
Σ_{t_i∈D} d(x_{t_i}, x_{t_{i+1}})^p ≤ ε.
We finish by proving (i.2a) ⇒ (i.3). First, if x and y are two paths and δ is some fixed positive real, observe that for all dissections D = (t_i),
Σ_i d(x_{t_i,t_{i+1}}, y_{t_i,t_{i+1}})^p
 ≤ Σ_{i: |t_{i+1}−t_i| ≤ δ} d(x_{t_i,t_{i+1}}, y_{t_i,t_{i+1}})^p + Σ_{i: |t_{i+1}−t_i| > δ} d(x_{t_i,t_{i+1}}, y_{t_i,t_{i+1}})^p
 ≤ 2^{p−1} ( Σ_{i: |t_{i+1}−t_i| ≤ δ} |x|^p_{p-var;[t_i,t_{i+1}]} + Σ_{i: |t_{i+1}−t_i| ≤ δ} |y|^p_{p-var;[t_i,t_{i+1}]} ) + (T/δ) d_0(x,y)^p.
Taking the supremum over all dissections, we obtain
d_{p-var}(x,y)^p ≤ 2^{p−1} sup_{(t_i)∈D_δ([0,T])} Σ_i |x|^p_{p-var;[t_i,t_{i+1}]} + 2^{p−1} sup_{(t_i)∈D_δ([0,T])} Σ_i |y|^p_{p-var;[t_i,t_{i+1}]} + (T/δ) d_0(x,y)^p.
Applying this with y = x^D, the piecewise linear approximation of x along a dissection D with |D| < δ, we obtain, using inequality (5.18),
d_{p-var}(x, x^D)^p ≤ c_p sup_{(t_i)∈D_δ([0,T])} Σ_i |x|^p_{p-var;[t_i,t_{i+1}]} + (T/δ) d_0(x, x^D)^p.
First fix δ > 0 such that c_p sup_{(t_i)∈D_δ([0,T])} Σ_i |x|^p_{p-var;[t_i,t_{i+1}]} < ε/2. Then, as x^D converges to x in uniform topology when |D| → 0, there exists δ_2 < δ such that for all dissections D with |D| < δ_2,
(T/δ) d_0(x, x^D)^p < ε/2.
Hence, for all dissections D with |D| < δ_2, we have d_{p-var}(x, x^D)^p < ε and the proof is finished.

Corollary 5.33 For p > 1, we have the set inclusions
⋃_{1≤q<p} C^{q-var}([0,T], R^d) ⊂ C^{0,p-var}([0,T], R^d) ⊂ C^{p-var}([0,T], R^d) ⊂ ⋂_{q>p} C^{q-var}([0,T], R^d).
Proof. Recalling the basic inclusions between p- and q-variation spaces (Proposition 5.3), only the inclusion
⋃_{1≤q<p} C^{q-var}([0,T], R^d) ⊂ C^{0,p-var}([0,T], R^d)
requires an argument. Thanks to Proposition 5.6,
x ∈ ⋃_{1≤q<p} C^{q-var}([0,T], R^d) ⟹ lim_{δ→0} sup_{D=(t_i), |D|<δ} Σ_i d(x_{t_i}, x_{t_{i+1}})^p = 0,
and we conclude using Theorem 5.31.

Example 5.34 An example of a function in C^{1/2-Höl}([0,1], R) but not in C^{0,1/2-Höl}([0,1], R) is given by t ↦ t^{1/2}, as follows immediately from Wiener's characterization, Theorem 5.31.

Exercise 5.35 (i) Define g(x) = Σ_{i=1}^∞ c^{−i/p} sin(c^i x). If c is a sufficiently large positive integer, show that g ∈ C^{p-var}([0,1], R) but g ∉ C^{0,p-var}([0,1], R).
(ii) Define h(x) = x^{1/p} cos^2(π/x) / log x for x > 0, h(0) = 0. Show that h ∈ C^{0,p-var}([0,1], R) and h ∉ ⋃_{q<p} C^{q-var}([0,1], R).

Proposition 5.36 Let p ≥ 1. The spaces C^{0,p-var}([0,T], R^d) and C^{0,1/p-Höl}([0,T], R^d) are separable Banach spaces (and hence Polish).
Proof. From Proposition 1.35, there is a countable set Ω that is dense in C^{0,1-var}([0,T], R^d). We conclude using Lemma 5.30.
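As a quick numerical illustration of the dissection suprema appearing in Theorem 5.31, the following Python/NumPy sketch (the helper name p_variation is purely illustrative, not from any library) computes sup_D Σ_i |x_{t_i,t_{i+1}}|^p over all dissections through finitely many sample points by dynamic programming; for a continuous path this uses only one particular finite point set and is therefore a lower bound for |x|_{p-var}.

import numpy as np

def p_variation(samples, p):
    # sup over all dissections through the sample points of sum |increment|^p,
    # computed by dynamic programming; returns its p-th root
    samples = np.asarray(samples, dtype=float)
    best = np.zeros(len(samples))
    for j in range(1, len(samples)):
        best[j] = np.max(best[:j] + np.abs(samples[j] - samples[:j]) ** p)
    return best[-1] ** (1.0 / p)

t = np.linspace(0.0, 1.0, 201)
print(p_variation(np.sqrt(t), 2.0))             # = 1.0: for a monotone path the trivial dissection wins
print(p_variation(np.sin(8 * np.pi * t), 2.0))  # oscillation is picked up by the 2-variation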
5.4 Generalized variation
5.4.1 Definition and basic properties
The concept of variation (and then p-variation) allows for an obvious generalization.

Definition 5.37 Let (E,d) be a metric space and let φ ∈ C([0,∞), [0,∞)) be 0 at 0, strictly increasing and onto. A path x : [0,T] → E is said to be of finite φ-variation on [0,T] if
|x|_{φ-var;[0,T]} := inf{ M > 0 : sup_{D∈D([0,T])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / M ) ≤ 1 } < ∞.
We will use the notation C^{φ-var}([0,T], E) for the set of continuous paths x : [0,T] → E of finite φ-variation. The set of paths pinned at time zero to some fixed element o ∈ E is denoted by C_o^{φ-var}([0,T], E).

For φ(x) = x^p, the definition of |x|_{φ-var;[0,T]} coincides with |x|_{p-var;[0,T]}. If (E,d) is a normed space and φ is (globally) convex, |·|_{φ-var;[0,T]} is a semi-norm. Several variation functions of interest are not convex (including the class ψ_{p,q} to be introduced in the forthcoming Definition 5.45, which will be convenient for our later applications).

A first interest in φ-variation comes from the fact that (sharp) sample path properties of stochastic processes are often available in this form. For example, a classical result of Taylor (cf. the forthcoming Theorem 13.15) states that Brownian motion has a.s. finite ψ_{2,1}-variation on any compact interval [0,T], and this is optimal (cf. Theorem 13.69). A wide class of (enhanced) Gaussian processes have a.s. finite ψ_{p,p/2}-variation, while (enhanced) Markov processes (with uniformly elliptic generator in divergence form) have the "Brownian" ψ_{2,1}-variation regularity. The other reason for our interest in φ-variation is that it is intimately related to the uniqueness of solutions to rough differential equations under minimal regularity assumptions, as will be discussed in Section 10.5.

Lemma 5.38 Let x ∈ C^{φ-var}([0,T], E). Then, for all M ≥ |x|_{φ-var;[0,T]} we have
sup_{D∈D([0,T])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / M ) ≤ 1.   (5.20)

Proof. Only the case M = |x|_{φ-var;[0,T]} requires a proof. By definition, there exists a sequence M_n ↓ M such that (5.20) holds with M replaced by M_n. In particular, for a fixed dissection D,
Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / M_n ) ≤ 1, and hence Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / M ) ≤ 1
by continuity of φ. Taking the supremum over all D ∈ D([0,T]) finishes the proof.

Just as in the common case of p-variation, controls are a very useful concept.

Proposition 5.39 Let (E,d) be a metric space and let φ ∈ C([0,∞), [0,∞)) be 0 at 0, strictly increasing and onto. Then the following are equivalent.
(i) x ∈ C^{φ-var}([0,T], E) with |x|_{φ-var;[0,T]} ≤ M for some M ≥ 0.
(ii) There exists a control ω with ω(0,T) ≤ 1 such that for all s < t in [0,T],
d(x_s, x_t) ≤ M φ^{-1}(ω(s,t)).

Proof. Define
ω_{x,φ}(s,t) := sup_{D∈D([s,t])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / M ).
Working as in the proof of Proposition 5.8, we see that ω_{x,φ} is a control with ω_{x,φ}(0,T) ≤ 1. We then have, by definition of ω_{x,φ},
d(x_s, x_t) ≤ M φ^{-1}(ω_{x,φ}(s,t)).
Conversely, assume that for all s < t in [0,T], d(x_s, x_t) ≤ M φ^{-1}(ω(s,t)) for some M > 0 and some control ω with ω(0,T) ≤ 1. Then, for a dissection D, we have
Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / M ) ≤ Σ_{t_i∈D} φ ∘ φ^{-1}[ω(t_i, t_{i+1})] ≤ ω(0,T) ≤ 1,
and hence |x|_{φ-var;[0,T]} ≤ M.

For simplicity, we will only look at the φ-variation of paths for functions φ satisfying the following condition.

Condition 5.40 [∆_c] Assume φ ∈ C([0,∞), [0,∞)) is 0 at 0, strictly increasing and onto. We say φ satisfies condition ∆_c if for all c > 0 there exists ∆_c ≥ 0 such that
∀x ∈ [0,∞): φ(cx) ≤ ∆_c φ(x), and lim_{c→0} ∆_c = 0.

The condition ∆_c leads to the following convenient equivalences.

Proposition 5.41 Let (E,d) be a metric space, and let x : [0,T] → E be a continuous path. Assume the variation function φ satisfies condition (∆_c). Then the following conditions are equivalent.
(i) The path x is of finite φ-variation.
(ii) There exists M > 0 such that
sup_{D∈D([0,T])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / M ) < ∞.
(iii) For all K > 0,
sup_{D∈D([0,T])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / K ) < ∞.
Proof. Trivially, (i) ⟹ (ii) and (iii) ⟹ (ii). We show that (ii) implies (i) and (iii). For any K > 0, using condition ∆_c, we have
sup_{D∈D([0,T])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / K ) ≤ ∆_{M/K} sup_{D∈D([0,T])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / M ) < ∞,
which proves (ii) ⟹ (iii). Similarly,
sup_{D∈D([0,T])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / K ) ≤ ∆_{M/K} sup_{D∈D([0,T])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / M ) ≤ 1
for K large enough (as ∆_{M/K} → 0 when K → ∞), and so (ii) ⟹ (i). The proof is finished.

Recall that C^{p-var}([0,T], E) ⊂ C^{p̃-var}([0,T], E) for p̃ ≥ p. This generalizes to

Lemma 5.42 Assume φ, φ̃ satisfy condition (∆_c) and φ̃ = O(φ) at 0+. Then
C^{φ-var}([0,T], E) ⊂ C^{φ̃-var}([0,T], E).

Proof. Let x ∈ C^{φ-var}([0,T], E); then for all K > 0, and in particular for K := |x|_{0;[0,T]}, we have
sup_{D∈D([0,T])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / K ) < ∞.
For all i, d(x_{t_i}, x_{t_{i+1}})/K ≤ 1 and, by assumption, there exists a finite constant c such that φ̃(u) ≤ c φ(u) for all u ∈ [0,1]. Hence
sup_{D∈D([0,T])} Σ_{t_i∈D} φ̃( d(x_{t_i}, x_{t_{i+1}}) / K ) ≤ c sup_{D∈D([0,T])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / K ) < ∞,
and so x ∈ C^{φ̃-var}([0,T], E), using Proposition 5.41.

We now make some quantitative relations between φ-variation and p-variation. As will be seen in Corollary 5.44 below, p-variation estimates often imply φ-variation estimates with no extra work.

Theorem 5.43 Fix p ≥ 1, and assume that φ satisfies condition ∆_c. Assume also that φ^{-1}(·)^p is convex on [0,δ] for some δ ∈ (0,1]. Let
x ∈ C^{φ-var}([0,T], E). Then the control (with the convention 0/0 = 0)
ω(s,t) := sup_{D∈D([s,t])} Σ_{t_i∈D} φ( d(x_{t_i}, x_{t_{i+1}}) / |x|_{φ-var;[0,T]} )
satisfies ω(0,T) ≤ 1. Moreover, for C = C(φ,p) and all s < t in [0,T],
|x|_{p-var;[s,t]} ≤ C |x|_{φ-var;[0,T]} φ^{-1}(ω(s,t)).

Proof. It suffices to consider the case of non-constant x(·), so that |x|_{φ-var;[0,T]} > 0. By Lemma 5.38, ω(0,T) ≤ 1. Then, as φ satisfies condition ∆_c, there exists κ > 0 such that ∆_κ ≤ δ, and we have
φ( κ d(x_s, x_t) / |x|_{φ-var;[0,T]} ) ≤ ∆_κ φ( d(x_s, x_t) / |x|_{φ-var;[0,T]} ) ≤ δ ω(s,t),
from which
d(x_s, x_t) ≤ κ^{-1} |x|_{φ-var;[0,T]} φ^{-1}(δ ω(s,t)).
Note that δω(s,t) ∈ [0,δ] for any s < t in [0,T] and so, from convexity of φ^{-1}(·)^p on [0,δ],
(s,t) ↦ φ^{-1}(δ ω(s,t))^p
is a control. It then follows from basic super-additivity properties of controls that
|x|^p_{p-var;[s,t]} ≤ (|x|^p_{φ-var;[0,T]} / κ^p) φ^{-1}(δ ω(s,t))^p ≤ (|x|^p_{φ-var;[0,T]} / κ^p) φ^{-1}(ω(s,t))^p,
where we used δ ≤ 1 in the final step, and the proof is finished.

We now consider a second path y with values in some metric space (Ẽ, d̃), whose p-variation is dominated by the p-variation of x. (This situation will be typical for solutions of (rough) differential equations.)

Corollary 5.44 Fix p ≥ 1, and assume that φ satisfies condition ∆_c. Assume also that φ^{-1}(·)^p is convex on [0,δ] for some δ ∈ (0,1]. Let x ∈ C([0,T], E), y ∈ C([0,T], Ẽ) be such that for all s < t in [0,T],
|y|_{p-var;[s,t]} ≤ K |x|_{p-var;[s,t]}.
Then, for some constant C = C(p,φ) and all s < t in [0,T],
|y|_{φ-var;[s,t]} ≤ CK |x|_{φ-var;[s,t]}.

Proof. From Theorem 5.43, we have
d(y_s, y_t) ≤ |y|_{p-var;[s,t]} ≤ K |x|_{p-var;[s,t]} ≤ CK |x|_{φ-var;[0,T]} φ^{-1}(ω(s,t)).   (5.21)
Hence, if D = (t_i) is a dissection of [0,T], we have
Σ_{t_i∈D} φ( d(y_{t_i}, y_{t_{i+1}}) / (CK |x|_{φ-var;[0,T]}) ) ≤ Σ_{t_i∈D} ω(t_i, t_{i+1}) ≤ ω(0,T) ≤ 1,
which implies that |y|_{φ-var;[0,T]} ≤ CK |x|_{φ-var;[0,T]}. There is nothing special about the interval [0,T], and by a simple reparametrization argument we see that |y|_{φ-var;[s,t]} ≤ CK |x|_{φ-var;[s,t]}.
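The infimum in Definition 5.37 is easy to approximate numerically, at least along one fixed dissection. The following Python/NumPy sketch (function names purely illustrative) bisects over M using the full grid dissection only, which yields a lower bound for |x|_{φ-var;[0,T]}; for φ(t) = t^p it reproduces the grid p-variation sum exactly.

import numpy as np

def phi_var_grid(samples, phi, tol=1e-10):
    # inf{ M > 0 : sum_i phi(|x_{t_{i+1}} - x_{t_i}| / M) <= 1 }, full grid only,
    # hence a lower bound for the phi-variation norm of Definition 5.37
    incr = np.abs(np.diff(np.asarray(samples, dtype=float)))
    energy = lambda M: float(np.sum(phi(incr / M)))
    lo, hi = tol, 1.0
    while energy(hi) > 1.0:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if energy(mid) > 1.0 else (lo, mid)
    return hi

t = np.linspace(0.0, 1.0, 200)
x = np.sin(5 * t) + 0.3 * np.cos(17 * t)
p = 2.0
print(phi_var_grid(x, lambda u: u ** p), np.sum(np.abs(np.diff(x)) ** p) ** (1 / p))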
5.4.2 Some explicit estimates for ψ_{p,q}
We now apply all these abstract considerations to the following class of variation functions.

Definition 5.45 For any (p,q) ∈ R^+ × R set ψ_{p,q}(0) = 0 and
ψ_{p,q}(t) := t^p / (ln ln 1/t)^q for t ∈ (0, e^{-e}),   ψ_{p,q}(t) := t^p for t ≥ e^{-e},
or, equivalently, ψ_{p,q}(t) = t^p / (ln* ln* 1/t)^q where ln* := max(1, ln).

Exercise 5.46 Show that, for any (p,q) ∈ R^+ × R, the function ψ_{p,q}(·) satisfies condition ∆_c.
Solution. A possible choice is ∆_c = 4/ψ_{p,q}(1/c). The details are left to the reader.

Exercise 5.47 For (p_1,q_1), (p_2,q_2) in R^+ × R, we say that (p_1,q_1) ≤ (p_2,q_2) if p_1 < p_2, or if p_1 = p_2 and q_1 ≤ q_2. Show that for (p_1,q_1) ≤ (p_2,q_2),
C^{ψ_{p_1,q_1}-var}([0,T], E) ⊂ C^{ψ_{p_2,q_2}-var}([0,T], E).
Solution. In all cases, we have
lim sup_{t→0+} ψ_{p_2,q_2}(t)/ψ_{p_1,q_1}(t) = lim sup_{t→0+} t^{p_2−p_1} (1/(ln ln 1/t))^{q_2−q_1},
and this limit is bounded as t → 0+ if (p_1,q_1) ≤ (p_2,q_2). The result follows from Lemma 5.42.

The following estimates on the inverse ψ^{-1}_{p,q} will be useful to us later on.

Lemma 5.48 There exists C = C(p,q) such that for all t ∈ [0,∞),
(1/C) ψ_{1/p,−q/p}(t) ≤ ψ^{-1}_{p,q}(t) ≤ C ψ_{1/p,−q/p}(t).
Proof. For t large enough, ψ_{p,q}(t) = t^p and ψ_{1/p,−q/p}(t) = t^{1/p}, and there is nothing to show. For t small it suffices to observe that ψ_{1/p,−q/p} is the asymptotic inverse of ψ_{p,q} at 0+.

Finally, the following proposition will allow us to use Theorem 5.43 with the functions ψ_{p,q}.

Proposition 5.49 For any p' > p > 0 and q ∈ R, the function ψ^{-1}_{p,q}(·)^{p'} is convex in a positive neighbourhood of 0.

Proof. Obvious from an explicit computation of the second derivative of ψ^{-1}_{p,q}(·)^{p'} near 0+.
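For concreteness, the functions ψ_{p,q} of Definition 5.45 and their scaling behaviour (condition ∆_c, Exercise 5.46) can be spot-checked numerically; the Python/NumPy sketch below evaluates sup_t ψ_{p,q}(ct)/ψ_{p,q}(t) over a finite grid only, so it is merely indicative.

import numpy as np

def ln_star(s):
    return np.maximum(1.0, np.log(s))

def psi(t, p, q):
    # psi_{p,q}(t) = t^p / (ln* ln* (1/t))^q for t > 0 (and 0 at t = 0)
    return t ** p / ln_star(ln_star(1.0 / t)) ** q

p, q = 2.0, 1.0
t = np.logspace(-12, 2, 2000)
for c in (0.5, 0.1, 0.01, 0.001):
    print(c, (psi(c * t, p, q) / psi(t, p, q)).max())   # shrinks as c -> 0, as Delta_c requires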
5.5 Higher-dimensional variation
5.5.1 Definition and basic properties
We now discuss p-variation regularity of a function
f : [0,T]^2 → (R^d, |·|),  (s,u) ↦ f(s,u).
The generalization to [0,T]^n with n > 2 follows the same arguments but will not be relevant to us. Given a rectangle R = [s,t] × [u,v] ⊂ [0,T]^2 we write
f(R) := f([s,t] × [u,v]) := f(t,v) + f(s,u) − f(s,v) − f(t,u).   (5.22)
If d = 1 and f is smooth, this is precisely ∫_R (∂²f/∂a∂b)(a,b) da db. Also, if f(s,t) = g_s ⊗ h_t, then
f([s,t] × [u,v]) = g_{s,t} ⊗ h_{u,v}.
We will also use the following notation, consistent with our one-dimensional increment notation:
f({t} × [u,v]) := f(t,v) − f(t,u),   f([s,t] × {v}) := f(t,v) − f(s,v).
We will also frequently use the notation |R| for the area of R.
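The rectangular increment (5.22) and the product example f(s,t) = g_s ⊗ h_t are easily checked in code; here is a scalar (d = 1) Python sketch, with purely illustrative function names.

import numpy as np

def rect_increment(f, s, t, u, v):
    # rectangular increment f(R) of (5.22) for R = [s,t] x [u,v]
    return f(t, v) + f(s, u) - f(s, v) - f(t, u)

g, h = np.sin, np.cos
f = lambda s, t: g(s) * h(t)          # scalar analogue of g_s (x) h_t
s, t, u, v = 0.1, 0.7, 0.2, 0.9
print(rect_increment(f, s, t, u, v), (g(t) - g(s)) * (h(v) - h(u)))   # agree up to rounding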
Definition 5.50 Let f : [0,T]^2 → (R^d, |·|) and p ∈ [1,∞). We say that f has finite (2D) p-variation if |f|_{p-var;[0,T]^2} < ∞, where
|f|_{p-var;[s,t]×[u,v]} := ( sup_{(t_i)∈D([s,t]), (t'_j)∈D([u,v])} Σ_{i,j} | f([t_i,t_{i+1}] × [t'_j,t'_{j+1}]) |^p )^{1/p},
and write f ∈ C^{p-var}([0,T]^2, R^d).

In the 1-dimensional (1D) case, i.e. for functions defined on [0,T], the notion of control is fundamental. In the 2-dimensional (2D) case, controls are defined on ∆_T × ∆_T, where we recall that ∆ := ∆_T = {(s,t) : 0 ≤ s ≤ t ≤ T}. We think of elements in ∆_T × ∆_T as rectangles contained in the square [0,T]^2 and write [s,t] × [u,v] rather than ((s,t),(u,v)) for a generic element.

Definition 5.51 Let ∆_T = {(s,t) : 0 ≤ s ≤ t ≤ T}. A 2D control (more precisely, a 2D control function on [0,T]^2) is a continuous map ω : ∆_T × ∆_T → [0,∞) which is super-additive in the sense that for all rectangles R_1, R_2 and R with R_1 ∪ R_2 ⊂ R and R_1 ∩ R_2 = ∅,
ω(R_1) + ω(R_2) ≤ ω(R),
and such that ω(R) = 0 for all rectangles of zero area. A 2D control ω is said to be Hölder-dominated if there exists a constant C such that for all s < t in [0,T],
ω([s,t]^2) ≤ C |t − s|.

The proof of the following lemma is a straightforward adaptation of the 1D case treated in Section 5.1 and is left to the reader.

Lemma 5.52 Let f ∈ C([0,T]^2, R^d). Then
(i) if f is of finite p-variation for some p ≥ 1, the map R ↦ |f|^p_{p-var;R}, defined on rectangles R in [0,T]^2, is a 2D control;
(ii) f is of finite p-variation on [0,T]^2 if and only if there exists a 2D control ω such that for all rectangles R ⊂ [0,T]^2,
|f(R)|^p ≤ ω(R),
in which case we say that "ω controls the p-variation of f".
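To get a feeling for Definition 5.50, one can evaluate the double sum Σ_{i,j} |f(R_{ij})|^p for one particular grid dissection (a lower bound for |f|^p_{p-var}). The Python/NumPy sketch below does this for f(s,t) = min(s,t), the covariance function of Brownian motion: only diagonal cells contribute, and the grid 1-variation sums here equal T = 1.

import numpy as np

def grid_p_energy(F, p):
    # sum_{i,j} |F(R_ij)|^p over the full grid dissection, with F[i, j] = f(t_i, t_j);
    # a lower bound for |f|^p_{p-var} of Definition 5.50
    rect = F[1:, 1:] + F[:-1, :-1] - F[1:, :-1] - F[:-1, 1:]
    return np.sum(np.abs(rect) ** p)

t = np.linspace(0.0, 1.0, 101)
F = np.minimum.outer(t, t)
print(grid_p_energy(F, p=1.0))   # = 1.0: only the diagonal cells contribute for f = min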
Remark 5.53 If f : [0,T]^2 → R^d is symmetric (i.e. f(s,u) = f(u,s) for all s,u) and of finite p-variation, then [s,t] × [u,v] ↦ |f|^p_{p-var;[s,t]×[u,v]} is symmetric. In fact, one can always work with symmetric controls: it suffices to replace a given ω by [s,t] × [u,v] ↦ ω([s,t] × [u,v]) + ω([u,v] × [s,t]).

Lemma 5.54 A function f ∈ C([0,T]^2, R^d) is of finite p-variation if and only if
sup_{(t_j)∈D([0,T])} Σ_{i,j} | f([t_i,t_{i+1}] × [t_j,t_{j+1}]) |^p < ∞.
Moreover, the p-variation of f is controlled by 3^{p−1} times
ω([s,t] × [u,v]) := sup_{(t_j)∈D([0,T])} Σ_{i,j: [t_i,t_{i+1}]⊂[s,t], [t_j,t_{j+1}]⊂[u,v]} | f([t_i,t_{i+1}] × [t_j,t_{j+1}]) |^p.

Proof. Assuming that ω([0,T]^2) is finite, it is easy to check that ω is a 2D control. Then, for any given [s,t] and [u,v] which do not intersect, or such that [s,t] = [u,v],
| f([s,t] × [u,v]) |^p ≤ ω([s,t] × [u,v]).
Take now s ≤ u ≤ t ≤ v; then
f([s,t] × [u,v]) = f([s,u] × [u,v]) + f([u,t] × [u,v])
               = f([s,u] × [u,v]) + f([u,t] × [u,t]) + f([u,t] × [t,v]).
Hence
| f([s,t] × [u,v]) |^p ≤ 3^{p−1} ( ω([s,u] × [u,v]) + ω([u,t]^2) + ω([u,t] × [t,v]) ) ≤ 3^{p−1} ω([s,t] × [u,v]).
The other cases are dealt with similarly, and we find at the end that for all s ≤ t, u ≤ v,
| f([s,t] × [u,v]) |^p ≤ 3^{p−1} ω([s,t] × [u,v]).
This concludes the proof.
Example 5.55 Given two functions g, h ∈ C^{p-var}([0,T], R^d) we can define
(g ⊗ h)(s,t) := g(s) ⊗ h(t) ∈ R^d ⊗ R^d
and g ⊗ h has finite 2D p-variation (here R^d ⊗ R^d is equipped with a compatible tensor norm). More precisely,
| (g ⊗ h)([s,t] × [u,v]) |^p ≤ |g|^p_{p-var;[s,t]} |h|^p_{p-var;[u,v]} =: ω([s,t] × [u,v]),
and since ω is indeed a 2D control function (as a product of two 1D control functions!) we see that
|g ⊗ h|_{p-var;[s,t]×[u,v]} ≤ |g|_{p-var;[s,t]} |h|_{p-var;[u,v]}.

Exercise 5.56 Given f ∈ C^{p-var}([0,T]^2, R^d) and a fixed [s,t] × [u,v] ⊂ [0,T]^2, prove that the (1-dimensional) p-variation of r ∈ [s,t] ↦ f({r} × [u,v]) is bounded by |f|_{p-var;[s,t]×[u,v]}. Similarly, prove that the (1-dimensional) p-variation of r ∈ [u,v] ↦ f([s,t] × {r}) is bounded by |f|_{p-var;[s,t]×[u,v]}.

Remark 5.57 If ω is a 2D control function, then (s,t) ↦ ω([s,t]^2) is a 1D control function: ω([s,t]^2) + ω([t,u]^2) ≤ ω([s,u]^2), and (s,t) ↦ ω([s,t]^2) is continuous and zero on the diagonal.

Remark 5.58 A function f ∈ C([0,T]^2, R^d) of finite p-variation can also be considered as a path t ↦ f(t,·) with values in the space C^{p-var}([0,T], R^d) equipped with the p-variation (semi-)norm. It is instructive to observe that t ↦ f(t,·) has finite p-variation if and only if f has finite 2D p-variation.

5.5.2 Approximations to 2D functions
Piecewise linear-type approximations
Recall from Section 5.3 that a continuous path of finite p-variation can be approximated by smooth and/or piecewise linear paths in the sense of "uniform convergence with uniform p-variation bounds", but not, in general, in p-variation norm. The same is true in the 2D case, and the approximations defined below are the natural 2D analogue of piecewise linear approximations.
˜ Definition 5.59 (D, linear-type approximation of a 2D func D piecewise 2 tion) Assume f ∈ C [0, T ] , E where E is a normed space. Let
˜
A function f D , D f
˜ = (˜ τ j ) ∈ D [0, T ] . D = (τ i ) , D 2 ∈ C [0, T ] , E with the property that
˜ D ,D
s t
=f
s t
˜ for all (s, t) ∈ D × D
is uniquely defined by requiring that ˜
f (D , D ) (·, 0) ˜ f (D , D ) (0, ·)
= f D (·, 0) , ˜
= f D (0, ·) ,
τ j , τ˜j +1 ), and, for (s, t) × (u, v) ⊂ (τ i , τ i+1 ) × (˜ v−u t−s τ i , τ i+1 ˜) s, t D ,D ( f . × f = τ˜j , τ˜j +1 u, v τ i+1 − τ i τ˜j +1 − τ˜j
2 ˜ ∈ D [0, T ]. Then, Proposition 5.60 Let f ∈ C ρ-var [0, T ] , E and D, D ˜ we have for all s < t in D and u < v in D ρ D , D˜ ρ ≤ 9ρ−1 |f |ρ-var;[s,t]×[u ,v ] . (5.23) f ρ-var;[s,t]×[u ,v ]
˜ ˜ Moreover, f D , D → f uniformly as |D| , D → 0. ˜ ˜ Remark 5.61 It need not be true that f D , D → f as |D| , D → 0 in ρvariation. However, by interpolation this holds true when ρ is replaced by ρ > ρ. 2
Proof. Without loss of generality, [s, t] × [u, v] = [0, 1] . Given D = 2 ˜ = (˜ τ j ) ∈ D [0, 1] we now define ω D , D˜ on [0, 1] as follows: for (τ i ) , D τ j , τ˜j +1 ] we set small rectangles [s, t] × [u, v] ⊂ I × J ≡ [τ i , τ i+1 ] × [˜ ω D ([s, t] × [u, v]) :=
(t − s) (v − u) ρ |f |ρ-var;I ×J |I × J|
(with s, t ∈ I; u, v ∈ J);
then, for vertical “strips” of the form [s, t] × (J1 ∪ · · · ∪ Jn ) with s, t ∈ I ≡ [τ i , τ i+1 ] and Jl = [τ j + l−1 , τ j + l ], ω D ([s, t] × (J1 ∪ · · · ∪ Jn )) := for s, t ∈ I; u, v ∈ J.
(t − s) ρ |f |ρ-var;I ×(J 1 ∪···∪J n ) |I|
We use a similar definition for horizontal strips; at last, for a (possibly) large rectangle with endpoints in D we set ρ
ω D ((I1 ∪ · · · ∪ Im ) × (J1 ∪ · · · ∪ Jn )) := |f |ρ-var;(I 1 ∪···∪I m )×(J 1 ∪···∪J n ) . 2
Now, an arbitrary rectangle A = [a, b] × [c, d] ⊂ [0, 1] decomposes uniquely into (at most) 9 rectangles A1 , . . . , A9 of the above type (4 small rectangles in the corners, 2 vertical and 2 horizontal 9 strips and 1 rectangle with endpoints in D) and we define ω D , D˜ (A) = i=1 ω D , D˜ (Ai ). We leave it to the reader to check that ω D , D˜ is indeed a 2D control function on ˜
2
D ,D that [0, 1] . On ρthe other hand, it is clear from the definition of f D , D˜ (Ai ) ≤ ω D , D˜ (Ai ) for i = 1, . . . , 9 and so f
9 ρ 9 ρ ρ ˜ D , D˜ D , D˜ D ,D ρ−1 (A) = f (Ai ) ≤ 9 (Ai ) = 9ρ−1 ω D , D˜ (A) . f f i=1
i=1
2 The proof of (5.23) is then finished with the remark that ω D , D˜ [0, 1] = ˜ ˜ ρ |R|ρ-var;[0,1] 2 . At last, uniform convergence of f D , D → f as |D| , D → 0 is 2
a simple consequence of (uniform) continuity of f on [0, 1] . Mollifier approximations We now turn to another class of well-known smooth approximations: mollifier approximations. Notation 5.62 (continuous extension of 2D functions) Whenever neces2 sary, we shall extend a continuous function f defined on [0, T ] to a con2 tinuous function f = f (s, t) defined on R by setting f (0, 0) for s, t f (T, T ) for s, t
< 0, f (0, T ) for s < 0, t > T, > T, f (T, 0) for s > T, t < 0
and, for s ∈ [0, T ] resp. t ∈ [0, T ], f (s, t) =
f (s, 0) if t < 0 f (s, T ) if t > T
resp. f (s, t) =
f (0, t) if s < 0 . f (T, t) if s > T
Note that, as a consequence of this definition, we have 2 f (R) = f R ∩ [0, T ] for all rectangles in R2 .
Definition 5.63 (µ, µ ˜ mollifier approximation of a 2D function) Assume 2 ˜ be two compactly f ∈ C [0, T ] , E where E is a normed space. Let µ, µ 2 supported probability measures on R. We define f µ, µ˜ ∈ C [0, T ] , E by f
µ, µ ˜
s u
=
f
s−a u−b
dµ (a) d˜ µ (b) ,
noting that the same relation remains valid for rectangular increments, f
µ, µ ˜
s, t u, v
=
f
s − a, t − a u − b, v − b
dµ (a) d˜ µ (b) .
Proposition 5.64 Let µ, µ ˜ be two compactly supported probability mea 2 sures on R and f ∈ C ρ-var [0, T ] , E , extended to a continuous function on R2 (cf. notation above) with ρ-variation controlled by ω
s, t u, v
ρ
= |f |ρ-var;[s,t]×[u ,v ] .
Then f µ, µ˜ is also of finite ρ-variation, controlled by the 2D control ω µ, µ˜ and
s, t u, v
=
µ, µ˜ ρ f
ω
ρ-var;[0,T ]
2
s − a, t − a u − b, v − b
dµ (a) d˜ µ (b) ,
2 ρ ≤ ω µ, µ˜ [0, T ] ≤ |f |ρ-var;[0,T ] 2 .
(5.24)
2
˜ n converge weakly Moreover, f µ n , µ˜ n → f uniformly on [0, T ] whenever µn , µ to the Dirac measure at zero.5 Remark 5.65 There is nothing special about the interval [0, T ]. However, we cannot deduce from (5.24) that µ, µ˜ ρ f
ρ-var;[s,t] 2
ρ
= |f |ρ-var;[s,t] 2 for all s < t in [0, T ] . 2
The reason is that f µ, µ˜ depends on our extension of f from [0, T ] to R2 . Thus, if we construct fˆµ, µ˜ from fˆ = f |[s,t] 2 , extended to R2 , it will not, in general, coincide with f µ, µ˜ . 5 That
is,
ϕdµ n → ϕ (0) for all continuous, bounded ϕ.
Proof. Given (si ) ∈ D [0, T ],(tj ) ∈ D [0, T ] we have, using Jensen’s inequality, ρ ρ µ, µ˜ si , si+1 si − a, si+1 − a f dµ (a) d˜ µ (b) = f tj , tj +1 tj − b, tj +1 − b ρ si − a, si+1 − a f ≤ dµ (a) d˜ µ (b) tj − b, tj +1 − b ρ si − a, si+1 − a dµ (a) d˜ f µ (b) = tj − b, tj +1 − b ≤
ω µ, µ˜ ([si , si+1 ] × [tj , tj +1 ]) ,
µ, µ ˜ which shows that f has finite ρ-variation controlled by ω . Moreover, 2 since f (R) = f R ∩ [0, T ] for all rectangles, ρ
ρ
|f |ρ-var;[0−a,T −a]×[0−b,T −b] ≤ |f |ρ-var;[0,T ] 2 2 ρ and so ω µ, µ˜ [0, T ] ≤ |f |ρ-var;[0,T ] 2 , which concludes the proof.
5.6 Comments Proposition 5.8 appears in Lyons and Qian [120]; our (complete) proof partially follows Dudley and Norvaiˇsa [47], p. 93. Continuity properties of the type discussed in Lemma 5.13 appear in Musielak and Semadeni [132]. The notion of paths which are “absolutely continuous of order p”, cf. Exercise 5.15, is due to Love [114]. Fractional Sobolev spaces (discussed in Example 5.16) are also known as Besov or Slobodetzki spaces and arise in many areas of analysis. The notion of “geodesic space” and its variations (length space, etc.) is now well understood, for example Gromov [72] or Burago et al. [21] and the references cited therein. Exercise 5.35 is taken from Dudley and Norvaiˇsa [48], p. 28. An almost complete list of references for generalized variation, or ϕ-variation, is found in Dudley and Norvaiˇsa [47]. Comments on higher-dimensional p-variation will be given in Chapter 6.
6 Young integration
We construct ∫_0^· y dx, the Young integral of y against x, where x ∈ C^{p-var}([0,T], R^d) and y ∈ C^{q-var}([0,T], L(R^d,R^e)) with 1/p + 1/q > 1. Although the results here are well known, our approach is novel and extends – without much conceptual effort! – to rough path estimates for ordinary – and then rough – differential equations.
6.1 Young–Lóeve estimates
We start with two elementary analysis lemmas, tailor-made for obtaining the Young–Lóeve estimate in Proposition 6.4 below.

Lemma 6.1 Let ξ > 0 and θ > 1. Consider Λ : [0,T] → R with
Λ(r) ≤ 2 Λ(r/2) + ξ r^θ for all r ∈ [0,T]   (6.1)
and such that Λ(r) = o(r) as r → 0+, i.e.
lim_{r→0+} Λ(r)/r = 0.   (6.2)
Then, for all r ∈ [0,T],
Λ(r) ≤ ξ r^θ / (1 − 2^{1−θ}).

Proof. We define φ(r) = Λ(r)/(ξ r^θ) and note that (6.1) implies
φ(r) ≤ 1 + 2^{1−θ} φ(r/2).
Iterated use of this inequality shows that for all n ∈ N,
φ(r) ≤ 1 + Σ_{k=1}^{n−1} (2^{1−θ})^k + (2^{1−θ})^n φ(r/2^n).   (6.3)
We now send n → ∞. The last term on the right-hand side above tends to zero since
(2^{1−θ})^n φ(r/2^n) = 2^{(1−θ)n} Λ(r 2^{−n}) / (ξ r^θ 2^{−θn}) = (r^{1−θ}/ξ) · Λ(r 2^{−n})/(r 2^{−n}) → 0 by assumption (6.2),
and the proof is finished with
φ(r) ≤ Σ_{k=0}^{∞} (2^{1−θ})^k = 1/(1 − 2^{1−θ}).
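The constant 1/(1 − 2^{1−θ}) is just the value of the geometric series obtained by iterating (6.3); a two-line numerical sanity check in Python/NumPy:

import numpy as np

theta = 1.5
partial = np.cumsum((2.0 ** (1.0 - theta)) ** np.arange(50))
print(partial[-1], 1.0 / (1.0 - 2.0 ** (1.0 - theta)))   # essentially equal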
Lemma 6.2 Let Γ : ∆ ≡ {0 ≤ s < t ≤ T} → R^e and assume
(i) there exists a control ω̂ such that
lim_{r→0} sup_{(s,t)∈∆: ω̂(s,t)≤r} |Γ_{s,t}| / r = 0;   (6.4)
(ii) there exist a control ω and θ > 1, ξ > 0 such that
|Γ_{s,u}| ≤ |Γ_{s,t}| + |Γ_{t,u}| + ξ ω(s,u)^θ   (6.5)
holds for 0 ≤ s ≤ t ≤ u ≤ T. Then, for all 0 ≤ s < t ≤ T,
|Γ_{s,t}| ≤ ξ ω(s,t)^θ / (1 − 2^{1−θ}).

Remark 6.3 It is important to notice that the control ω̂ does not appear in the conclusion.

Proof. At the cost of replacing Γ by Γ/ξ, we can and will take ξ = 1. We assume that ω̂ ≤ ε^{-1} ω for some ε > 0; if this is not the case, we replace ω by ω + ε ω̂ and let ε tend to 0 at the end. Define, for all r ∈ [0, ω(0,T)],
Λ(r) = sup_{s,t such that ω(s,t)≤r} |Γ_{s,t}|.
Consider any fixed pair (s,u) with 0 ≤ s < u ≤ T such that ω(s,u) ≤ r, and pick t such that ω(s,t), ω(t,u) ≤ ω(s,u)/2. (This is possible thanks to basic properties of a control function, see Exercise 1.10.) By definition of Λ,
|Γ_{s,t}| ≤ Λ(r/2),  |Γ_{t,u}| ≤ Λ(r/2),
and it follows from assumption (6.5) that
|Γ_{s,u}| ≤ 2 Λ(r/2) + r^θ.
Taking the supremum over all s < u for which ω(s,u) ≤ r yields, for r ∈ [0, ω(0,T)],
Λ(r) ≤ 2 Λ(r/2) + r^θ.
On the other hand, assumption (6.4) implies that
lim_{r→0} Λ(r)/r = 0.
It then suffices to apply Lemma 6.1 to see that for all r ∈ [0, ω(0,T)],
Λ(r) ≤ r^θ / (1 − 2^{1−θ}),
and this readily translates to the statement that, for all 0 ≤ s < t ≤ T,
|Γ_{s,t}| ≤ ω(s,t)^θ / (1 − 2^{1−θ}).
Proposition 6.4 (Young–Lóeve estimate) Assume x ∈ C^{1-var}([0,T], R^d), y ∈ C^{1-var}([0,T], L(R^d,R^e)) and p, q ≥ 1 with θ := 1/p + 1/q > 1. With the definition
Γ_{s,t} := ∫_s^t y_u dx_u − y_s x_{s,t} = ∫_s^t y_{s,u} dx_u
we have
|Γ_{s,t}| ≤ (1/(1 − 2^{1−θ})) |x|_{p-var;[s,t]} |y|_{q-var;[s,t]}.   (6.6)

Remark 6.5 It is instructive to think of y_s x_{s,t} as a first-order Euler approximation to the Riemann–Stieltjes integral ∫_s^t y dx, so that (6.6) is nothing but a "first-order Euler error" estimate. The point of this estimate is its uniformity: although 1-variation was assumed in order to have a well-defined Riemann–Stieltjes integral, the final estimate only depends on the respective p- and q-variation norms.

Proof. From Exercise 1.9,
ω(s,t) := |x|^{1/θ}_{p-var;[s,t]} |y|^{1/θ}_{q-var;[s,t]}
is a control. For all s < t in [0,T] we define
Γ_{s,t} = ∫_s^t y_u dx_u − y_s x_{s,t} = ∫_s^t y_{s,u} dx_u.
Then, for fixed s < t < u in [0,T], we have
Γ_{s,u} − Γ_{s,t} − Γ_{t,u} = ∫_s^u y_{s,r} dx_r − ∫_s^t y_{s,r} dx_r − ∫_t^u y_{t,r} dx_r = y_{s,t} x_{t,u},
and hence
|Γ_{s,u}| ≤ |Γ_{s,t}| + |Γ_{t,u}| + |y|_{q-var;[s,t]} |x|_{p-var;[t,u]} ≤ |Γ_{s,t}| + |Γ_{t,u}| + ω(s,u)^θ.
Defining ω̃(s,t) = |x|_{1-var;[s,t]} + |y|_{1-var;[s,t]}, elementary Riemann–Stieltjes estimates show that
|Γ_{s,t}| ≤ |y_{s,·}|_{∞;[s,t]} |x|_{1-var;[s,t]} ≤ ω̃(s,t)^2.
It only remains to apply Lemma 6.2 and the proof is finished.

In the following section we shall use the Young–Lóeve estimate to define the Young integral for x ∈ C^{p-var} and y ∈ C^{q-var}.

Remark 6.6 We could have assumed y ∈ C^{q-var} right away in Proposition 6.4. Indeed, as long as x ∈ C^{1-var}, Γ_{s,t} remains a well-defined Riemann–Stieltjes integral and the only change in the argument is to use
ω̃(s,t) := |x|_{1-var;[s,t]} + |y|^q_{q-var;[s,t]}  ⟹  |Γ_{s,t}| ≤ ω̃(s,t)^{1+1/q}
6.2 Young integrals The Young–L´ oeve estimate clearly implies that · (x, y) → ydx 0
is bilinear (as a function of smooth Rd resp. L Rd , Re -valued paths x, y) and continuous in the respective p- and q-variation norm. The (unique, continuous) extension of this map to x ∈ C 0,p-var [0, T ] , Rd , y ∈ C 0,q -var [0, T ] , L Rd , Re is immediate from general principles and by squeezing p, q to p + ε, q + ε so that 1/ (p + ε) + 1/ (q + ε) > 1 one covers genuine p-variation and qvariation regularity, i.e. x ∈ C p-var and y ∈ C q -var . That said, we shall proceed in a slightly different way which will motivate our later definition of p-var ([0, T ] , rough differential equations. To this end, recall that any x ∈ C d R can be approximated “uniformly with uniform variation bounds” by bounded variation paths xn , i.e. d∞;[0,T ] (xn , x) → 0 and sup |xn |p-var;[0,T ] < ∞. n
(For instance, piecewise geodesic=linear approximations will do.)
Definition 6.7 (Young integral) Given x ∈ C^{p-var}([0,T], R^d) and y ∈ C^{q-var}([0,T], L(R^d,R^e)), we say that z ∈ C([0,T], R^e) is an (indefinite) Young integral of y against x if there exists a sequence (x^n, y^n) ⊂ C^{1-var}([0,T], R^d) × C^{1-var}([0,T], L(R^d,R^e)) which converges uniformly with uniform variation bounds in the sense
|x^n − x|_{∞;[0,T]} → 0 and sup_n |x^n|_{p-var;[0,T]} < ∞,
|y^n − y|_{∞;[0,T]} → 0 and sup_n |y^n|_{q-var;[0,T]} < ∞,
and
∫_0^· y^n dx^n → z uniformly on [0,T] as n → ∞.
If z is unique we write ∫_0^· y dx instead of z and set ∫_s^t y dx := ∫_0^t y dx − ∫_0^s y dx.

Theorem 6.8 (Young–Lóeve) Given x ∈ C^{p-var}([0,T], R^d) and y ∈ C^{q-var}([0,T], L(R^d,R^e)) with θ = 1/p + 1/q > 1, there exists a unique (indefinite) Young integral of y against x, denoted by ∫_0^· y dx, and the Young–Lóeve estimate
∀ 0 ≤ s ≤ t ≤ T:  | ∫_s^t y dx − y_s x_{s,t} | ≤ (1/(1 − 2^{1−θ})) |x|_{p-var;[s,t]} |y|_{q-var;[s,t]}
remains valid. Moreover, the indefinite Young integral has finite p-variation and
| ∫_0^· y dx |_{p-var;[s,t]} ≤ C |x|_{p-var;[s,t]} ( |y|_{q-var;[s,t]} + |y|_{∞;[s,t]} ) ≤ 2C |x|_{p-var;[s,t]} ( |y|_{q-var;[0,T]} + |y_0| ),   (6.7)
where C = C(p,q).
· Proof. Let us first argue that any limit point z of 0 y n dxn (in uniform topology on [0, T ]) satisfies the Young–L´ oeve estimate. For every ε > 0 small enough so that θε := 1/ (p + ε) + 1/ (q + ε) > 1 the Young–L´ oeve estimate of Proposition 6.4 gives t 1 n n n n yr dxr − ys xs,t ≤ |xn |(p+ε)-var;[s,t] |y n |(q + ε)-var;[s,t] . (6.8) 1 − 21−θ ε s By Corollary 5.29, the right-hand side above can be made arbitrarily small, uniformly in n, provided t−s is small enough; this readily leads to equicontinuity of the indefinite Riemann–Stieltjes integrals · n n yr dxr : n ∈ N . 0
Boundedness is clear and so, by Arzela–Ascoli, we have uniform convergence along a subsequence to some z ∈ C ([0, T ] , Re ), which proves existence of the Young integral. Using the first part of Corollary 5.29, we let n tend to ∞ in (6.8) and obtain |zs,t − ys xs,t | ≤
1 |x|(p+ε)-var;[s,t] |y|(q + ε)-var;[s,t] . 1 − 21−θ ε
Then, an application of Lemma 5.13 justifies the passage ε 0 which shows validity of the Young–L´ oeve estimate, |zs,t − ys xs,t | ≤
1 |x|p-var;[s,t] |y|q -var;[s,t] . 1 − 21−θ 1/θ
(6.9) 1/θ
To prove uniqueness we use the control ω (s, t) := |x|p-var;[s,t] |y|q -var;[s,t]
· (cf. Exercise 1.9). Assume z, z˜ are two limit points of 0 y n dxn so that z0 = z˜0 = 0. Fix a dissection (ti ) of [0, T ] and observe zt ,t − xt yt ,t + xt yt ,t − z˜t ,t |zT − z˜T | ≤ i i+ 1 i i i+ 1 i i i+ 1 i i+ 1 i
≤
2 θ ω (ti , ti+1 ) 1 − 21−θ i
≤
2 θ −1 ω (0, T ) max ω (ti , ti+1 ) . i 1 − 21−θ
Applying this to a sequence of dissections with mesh (=maxi |ti+1 − ti |) tending to zero we see that |zT − z˜T | can be made arbitrarily small and hence must be zero. This shows zT = z˜T and, as T was arbitrary, z ≡ z˜. At last, set c = 1/ 1 − 21−θ and observe that (6.9) implies p θp p p |zs,t | ≤ 21−1/p cp ω (s, t) + |y|∞;[s,t] |x|p-var;[s,t] . We observe that the right-hand side above is super-additive in (s, t) and so p · θp p p 1−1/p p ydx c ≤ 2 ω (s, t) + |y| |x| ∞;[s,t] p-var;[s,t] 0
p-var;[s,t]
and the proof is easily finished.

Exercise 6.9 The purpose of this exercise is to show that our Definition 6.7 is consistent with the "usual" definition of Young integrals as limits of Riemann–Stieltjes sums. To this end, let x ∈ C^{p-var}([0,T], R), y ∈ C^{q-var}([0,T], R) with θ := 1/p + 1/q > 1, let D_n = (t^n_i)_i be a sequence of dissections of [0,T] with |D_n| → 0, and let ξ^n_i be some points in [t^n_i, t^n_{i+1}]. Show that Σ_{i=0}^{|D_n|−1} y(ξ^n_i) x_{t^n_i, t^n_{i+1}} converges, as n tends to ∞, to a limit I independent of the choice of ξ^n_i and of the sequence (D_n). Identify I as the Young integral ∫_0^T y dx in the sense of Definition 6.7.
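A small Python/NumPy experiment in the spirit of Exercise 6.9: left-point Riemann–Stieltjes sums along refining dissections. Smooth test paths are used here, so the Young condition 1/p + 1/q > 1 certainly holds and the limit can be computed in closed form.

import numpy as np

def left_point_sum(y_vals, x_vals):
    # sum_i y(t_i) (x(t_{i+1}) - x(t_i)) along a dissection
    return np.sum(y_vals[:-1] * np.diff(x_vals))

# The sums converge to int_0^1 cos(2 pi t) d sin(2 pi t) = pi.
for n in (10, 100, 1000, 10000):
    t = np.linspace(0.0, 1.0, n + 1)
    print(n, left_point_sum(np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)))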
Exercise 6.10 Let ϕ = (ϕi )i=1,...,d be a collection of maps from Rd to Re and assume x ∈ C p-var [0, T ] , Rd . Show that · ϕ (x) dx 0
is a well-defined Young integral provided ϕ, viewed as a map Rd → d e older provided L R , R is (γ − 1)-H¨ γ>p and γ − 1 ∈ (0, 1], to avoid trivialities.(We shall encounter this type of regularity assumption in our forthcoming discussion of rough integrals.) Solution. The path ϕ (x· ) has finite q = p/ (γ − 1)-variation and we see that the Young integral is well-defined since γ 1 1 + = > 1. q p p
6.3 Continuity properties of Young integrals p-var [0, T ] , Rd and y ∈ C q -var ([0, T ] , Proposition d e 6.11 Given x ∈ C with 1/p + 1/q > 1 the map L R ,R · (x, y) → ydx
d
q -var
0
[0, T ] , R × C [0, T ] , L Rd , Re → C p-var ([0, T ] , Re ) from C equipped with the respective p , q-variation norms is a bilinear and continuous map. As a consequence, it is Lipschitz continuous on bounded sets and Fr´echet smooth.
Proof. Bilinearity of (x, y) → ydx follows from bilinearity of the approximations in the definition and uniqueness; continuity in the sense of bilinear maps is immediate from Young–L´ oeve and Fr´echet smoothness for bilinear, continuous maps is trivial. The following property deals with continuity with respect to “uniform convergence with uniform bounds”. n p-var d [0, T ] , R and y n , y ∈ Proposition 6.12 Assume given x , x ∈ C d e q -var [0, T ] , L R , R such that C p-var
lim_{n→∞} |x^n − x|_{∞;[0,T]} = 0 and sup_n |x^n|_{p-var;[0,T]} < ∞,
lim_{n→∞} |y^n − y|_{∞;[0,T]} = 0 and sup_n |y^n|_{q-var;[0,T]} < ∞,
and 1/p + 1/q > 1. Then
lim_{n→∞} | ∫_0^· y^n dx^n − ∫_0^· y dx |_{∞;[0,T]} = 0 and sup_n | ∫_0^· y^n dx^n |_{p-var;[0,T]} < ∞.
Proof. Increase p, q by ε small enough so that 1/ (p + ε)+1/ (q + ε) > 1. By interpolation, xn → x in (p + ε)-variation and similarly y n → y in (q + ε)· · variation. By the preceding proposition, 0 y n dxn converges to 0 ydx in (p + ε)-variation and hence in ∞-norm. The uniform p-variation bounds · on 0 y n dxn follow immediately from the estimate in the Young–L´oeve theorem. Exercise 6.13 Using continuity properties, establish an integration-byparts formula for Young integrals. Exercise 6.14 Fix p, q ≥ 1 with > 1, R > 0, and fix x ∈ 1/p + 1/q p-var d q -var C [0, T ], R . Define BR = y ∈ C [0, T ] , L Rd , Re , |y|q -var;[0,T ] ≤ R} . Prove that the map BR y
→
C p-var ([0, T ] , Re ) · ydx → 0
is Lipschitz using the dq -var -metric. Prove it is uniformly continuous with respect to dq -var -metric with q > q. Prove also it is uniformly continuous with respect to d∞ -metric. oeve esti[Hint: Lipschitz with respect to dq -var -metric is just the Young–L´ mate. For q > q, use the first case plus interpolation.]
6.4 Young–L´oeve–Towghi estimates and 2D Young integrals Young integrals extend naturally to higher dimensions but only the 2D case will be relevant to us. 2D statements which extend line by line from the 1D statements will not be discussed in detail. In particular, we shall
use the Riemann–Stieltjes integral [0,T ] 2 ydx of a continuous function y ∈ 2 with respect to a bounded variation function x ∈ C [0, T ] , L Rd , Re 2 C 1-var [0, T ] , Rd . We will also show that the estimate ydx ≤ |y| ∞;R |x|1-var;R R
2
is valid for any rectangle R ⊂ [0, T ] . We first need to extend Lemma 6.2 to its 2-dimensional version.
2
Lemma 6.15 Let Γ : {0 ≤ s < t ≤ T } → Re be such that1 (i) for some control ω ˆ, sup
lim
r →0 rectangle R s.t. ω ˆ (R )≤r
|Γ (R)| = 0; r
(6.10)
(ii) for some control ω and some real θ > 1, for all rectangles R being the union of two (essentially) disjoint rectangles R1 , R2 , θ
|Γ (R)| ≤ |Γ (R1 )| + |Γ (R2 )| + ξω (R) .
(6.11)
2
Then, for all rectangles R ∈ [0, T ] , |Γ (R)| ≤
ξ θ ω (R) . 1 − 21−θ
Proof. The proof follows exactly the same lines as the 1-dimensional proof. We now proceed as in the 1D case and prove uniform Young–L´ oeve-type estimates, for integrand and driving signal which are assumed to be of bounded variation.
Proposition 6.16 Let 2 2 x ∈ C 1-var [0, T ] , Rd , y ∈ C 1-var [0, T ] , L Rd , Re . 2
Given R = [s, t] × [u, v] ⊂ [0, T ] , define
y
Γ (R) = R
s, . u, .
dx.
2
Then, for all such rectangles R ⊂ [0, T ] and p, q ≥ 1 such that θ := p−1 + q −1 > 1, |Γ (R)| ≤
1 1 − 21−θ
2 |y|q -var;R |x|p-var;R . 1
1
θ |y|qθ -var;R . We consider a generic Proof. Define the control ω (R) = |x|p-var;R rectangle R = [r, t] × [u, v] cut into two non-intersecting rectangles, say R1 = [r, s] × [u, v] and R2 = [s, t] × [u, v] . Observe that from the definition
1 We
identify once again {0 ≤ s < t ≤ T }2 with rectangles contained in [0, T ]2 .
y
r, . u, .
121
dx r, . r, . y dx + y dx = u, . u, . R1 R2 r, . s, . y −y dx = Γ (R1 ) + Γ (R2 ) + u, . u, . R 2 r, s = Γ (R1 ) + Γ (R2 ) + y dx u, . R2 r, s s, t = Γ (R1 ) + Γ (R2 ) + y d x . u, . . [u ,v ] But from Exercise 5.56, the 1-dimensional q-variation of y r ,. s over [u, v] is bounded by |y|q -var;[r,s],[u ,v ] and the 1-dimensional p-variation of x s ,. t oeve 1 diover [u, v] is bounded by |x|p-var;[s,t],[u ,v ] . Hence, using Young–L´ mensional estimates, we obtain Γ (R) =
R
|Γ (R)| ≤ |Γ (R1 )| + |Γ (R2 )| +
1 θ ω (R) . 1 − 21−θ
Defining ω ˜ to be a 2D control dominating the 1-variation of x and y, we see that 2 |Γ (R)| ≤ ω ˜ (R) . It only remains to apply Lemma 6.15. With the notation of the above theorem, we see s, . ydx = y dx + y u, . R R R s, . y dx + y + u R
that
s dx u, . s x (R) . u
We see that the second and third integrals will be well defined when y and x are of finite q- and p-variation if their 1D projections y s. and y u. are of finite (1D) q-variation. This is actually satisfied if y 0. and y 0. are of finite (1D) q-variation and y is of finite (2D) q-variation. To . simplify, 0 = y = 0. In we therefore restrict ourselves to paths y such that y . 0 d e 2 q -var the set of functions y such [0, T ] , L R , R particular, we define C0,0 that y0, .= y.,0 = 0. 2 Definition 6.17 (Young integral) Given x ∈ C p-var [0, T ] , Rd , y ∈ 2 2 q -var we say that z ∈ C [0, T ] , Re is an (indef[0, T ] , L Rd , Re C0,0 inite) Young integral of y against x if there exists a sequence (xn , y n ) ⊂
2 2 1-var [0, T ] , L Rd , Re which converges uniformly C 1-var [0, T ] , Rd ×C0,0 with uniform variation bounds in the sense (xn , x)
=
0 and sup |xn |p-var;[0,T ] 2 < ∞,
lim d∞;[0,T ] 2 (y n , y)
=
0 and sup |y n |q -var;[0,T ] 2 < ∞,
2 lim d n →∞ ∞;[0,T ]
n
n →∞
and
n
2
y n dxn = z uniformly on (s, u) ∈ [0, T ] as n → ∞.
lim
n →∞
[0,s]×[0,u ]
If z is unique we write
· 0
ydx instead of z.
Following the same lines as the 1-dimensional case (which involves generalizing a few analysis and p-variation lemmas from 1D to 2D), we obtain 2 Theorem 6.18 (Young–L´ oeve–Towghi) Given x ∈ C p-var [0, T ] , Rd , 2 q -var with θ = 1/p + 1/q > 1, there exists a [0, T ] , L Rd , Re y ∈ C0,0
· unique (indefinite) Young integral of y against x, denoted by 0 ydx and we have 2 1 y s , u dx ≤ |x|p-var;R |y|q -var;R (6.12) · · 1 − 21−θ R 2
for all rectangles R = [s, t] × [u, v] ⊂ [0, T ] .
· One can also check, just as in the 1D case, that (x, y) → 0 yu dxu is a bilinear continuous map from 2 2 2 q -var [0, T ] , L Rd , Re → C p-var [0, T ] , Re C p-var [0, T ] , Rd × C0,0
(and hence Lipschitz on bounded sets and Fr´echet smooth).
6.5 Comments Young integration goes back to Young [177]. The higher-dimensional case was partially discussed in Young [177] and then in Towghi [170] in the form which is relevant to us.
Part II
Abstract theory of rough paths
7 Free nilpotent groups
Motivated by simple higher-order Euler schemes for ODEs we give a systematic and self-contained account of the "algebra of iterated integrals". Tensor algebras play a natural role. However, thanks to algebraic relations between iterated integrals the "correct" state-space will be seen to be a (so-called) free nilpotent Lie group, faithfully represented as a subset of the tensor algebra. It becomes a metric (and even geodesic) space under the so-called Carnot–Caratheodory metric and will later serve as a natural state-space for geometric rough paths.
7.1 Motivation: iterated integrals and higher-order Euler schemes Let x be an Rd -valued continuous path of bounded variation and define the kth iterated integrals of the path segment x|[s,t] as t g
k ;i 1 ,...,i k
:=
uk
... s
s
s
u2
dxiu11 . . . dxiukk .
The collection of all such iterated integrals, g = gk ;i 1 ,...,i k : 1 ≤ k ≤ N ; i1 , . . . , ik ∈ {1, . . . , d}
(7.1)
(7.2)
is called the step-N signature of the path segment x|[s,t] and is denoted by SN (x)s,t . Postponing (semi-obvious) algebraic formalities to the next section, let us consider (higher-order) Euler schemes for the ODE dy = V (y) dx =
d
Vi (y) dxi
i=1
with V ∈ C Re , L Rd , Re and recall that π (V ) (0, y0 ; x) stands for any (not necessarily unique) solution started at y0 , possibly only defined up to some explosion time. Let I denote the identity function on Re and recall T the identification of a vector field W = W 1 , . . . , W e : Re → Re with the first-order differential operator
e k =1
W k (y)
∂ . ∂y k
Granted sufficient regularity of V , a simple Taylor expansion suggests, at least for 0 < t − s << 1, a step-N approximation of the form yt
≈
ys +
d
Vi (ys ) xis,t
i=1
+... +
t Vi 1 · · · Vi N I (ys )
u2
... s
i 1 ,...,i N ∈{1,...,d}
uN
s
s
dxiu11 . . . dxiuNN .
Having made plain the importance of iterated integrals in higher-order Euler schemes, let us observe the presence of non-linear constraints between the iterated integrals (7.1). This happens already at the “second” level of iterated integrals. Example 7.1 Let x ∈ C01-var [0, T ] , Rd and write x = S2 (x) for its step2 lift. (i) Using integration by parts it readily follows that, for all i, j ∈ {1, . . . , d} t t 2;j,i i j + x = x dx + xjs,r dxir = xis,t xjs,t . (7.3) x2;i,j s,t s,t s,r r s
(ii) As a trivial consequence of
s
x1s,t
:= xt − xs we have
1;i 1;i x1;i s,t + xt,u = xs,u , i = 1, . . . , d.
(7.4)
More interestingly, an elementary computation1 gives 2;i,j 2;i,j 1;i 1;j x2;i,j s,u = xs,t + xt,u + xs,t xt,u , i, j = 1, . . . , d.
(7.5)
This (matrix) equation can be expressed in terms of equations of the respective symmetric and anti-symmetric parts. Adding the equation obtained by interchanging i, j we see, using (7.3) xis,u xjs,u = xis,t xjs,t + xit,u xjt,u + xis,t xjt,u + xjs,t xit,u which just (re-)expresses the additivity of vector increments (7.4). On the other hand, subtracting the equation obtained by interchanging i, j, followed by multiplication with 1/2, yields i,j i,j Ai,j s,u = As,t + At,u +
1 . . . to
1 i j xs,t xt,u − xjs,t xit,u 2
be compared with the forthcoming Theorem 7.11.
(7.6)
Figure 7.1. We plot (xi· , xj· ). The triangle connects the points: (xis , xjs ) on the lower left side, (xit , xjt ) in the middle and (xiu , xju ) on the right side. Note that the respective area increments (signed area between the path and a linear chord) are not additive. In fact, Ais,,ju = Ais,,jt + Ait ,, ju + ∆is,,jt , u , where ∆is,,jt , u is the area of the triangle as indicated in the figure.
where Ai,j s,t
1 1 2;i,j xs,t − x2;j,i = := s,t 2 2
t
xis,r dxjr s
−
t
xjs,r dxir s
has an appealing geometric interpretation as seen in Figure 7.1. Let us draw some first conclusions. 2
N
(i) The naive state space Rd+d +···+d for iterated integrals of the form (7.2) is too big. For instance, there is no need to store the symmetric part of x2s,t .
(ii) Any analysis which is based on higher-order Euler approximations over finer and finer intervals, e.g. starting with some interval [s, u] then [s, t] , [t, u], etc., must acknowledge the non-linear nature of “higher-order increments” as seen in (7.5) and (7.6).
7.2 Step-N signatures and truncated tensor algebras 7.2.1 Definition of SN We have seen that expressions of iterated integrals of type dxiu11 . . . dxiukk , x ∈ C 1-var [0, T ] , Rd s< u 1 < ...< u k < t
appear naturally when considering Euler schemes. A first-order Euler scheme over the interval [s, t] involves
↔
dxiu s< u < t
i=1,...,d
d i=1
dxiu
ei ∈ Rd
s< u < t
where (ei )i=1,...,d denotes the canonical basis of Rd . For a second-order scheme one needs additionally dxiu dxjv , s< u < v < t
↔
d
i,j =1,...,d
dxiu dxjv ,
(ei ⊗ ej ) ∈ Rd ⊗ Rd
s< u < v < t
i,j =1
where Rd ⊗ Rd with basis (ei ⊗ ej )i,j =1,...,d can be viewed as the set of realvalued d × d matrices with canonical basis.2 The (obvious) generalization to k ≥ 2 reads dxiu11 . . . dxiukk s< u 1 < ...< u k < t
↔
i 1 ,...,i d
s< u 1 < ...< u k < t
i 1 ,...,i k
dxiu11
. . . dxiukk
(ei 1 ⊗ · · · ⊗ ei k )
(7.7)
where (ei 1 ⊗ · · · ⊗ ei k ), i1 , . . . , ik ∈ {1, . . . , d}, is the canonical basis of d ⊗k R , the space of k-tensors over Rd . Life is easier without many indices and we shall write the right-hand side of (7.7) simply as ⊗k dxu 1 ⊗ · · · ⊗ dxu k ∈ Rd . s< u 1 < ...< u k < t 2 e ⊗ e corresponds to the matrix with entry 1 in the ith line, jth column and 0 i j everywhere else.
⊗k k ∼ We note that, as vector spaces Rd = Rd , and it is a convenient convention to set d ⊗0 R := R. To any Rd -valued path γ of finite length defined on some interval [s, t], we may associate the collection of its iterated integrals. We have Definition 7.2 The step-N signature of γ ∈ C 1-var [s, t] , Rd is given by SN (γ)s,t ≡ 1, dxu , . . . , dxu 1 ⊗ · · · ⊗ dxu k ⊕N k =0
∈
s< u < t d ⊗k
R
s< u 1 < ...< u k < t
.
The path u → SN (γ)s,u is called the (step-N ) lift of x.
Remark 7.3 This notation is further justified by Chen’s theorem below: the step-N lift of some γ ∈ C 1-var [0, T ] , Rd , t → SN (γ)t takes values in a group so that SN (γ)s,t is the natural increment of this path, i.e. the −1 product of (SN (γ)s ) with SN (γ)t . Given two vectors a, b ∈ Rd with coordinates ai i=1,...,d and bi i=1,...,d one can construct the matrix ai bj i,j =1,...,d and hence the 2-tensor a ⊗ b :=
d
ai bj ei ⊗ ej ∈ Rd ⊗ Rd .
i,j =1
(In fact, this is the linear extension of the map ⊗ : Rd × Rd → Rd ⊗ Rd which maps the pair (ei , ej ) to the (i, j) th basis element of Rd ⊗ Rd , for which we already used the suggestive notation ei ⊗ ej .) More generally, given ⊗k ai 1 ,...,i k ei 1 ⊗ · · · ⊗ ei k ∈ Rd (7.8) a= i 1 ,...,i k
⊗l then, with similar notation b ∈ Rd , we agree that a ⊗ b is defined by a⊗b = ∈ We now define
ai 1 ,...,i k bj 1 ,...,j l ei 1 ⊗ · · · ⊗ ei k ⊗ ej 1 ⊗ . . . ej l d ⊗k d ⊗l d ⊗(k + l) ∼ ⊗ R . R = R
(7.9)
d ⊗k , (7.10) T N Rd := ⊕N k =0 R ⊗k for the projection to the kth tensor and write π k : T N Rd → Rd level. We shall also use the projection (7.11) π 0,k : T N Rd → T k Rd , for k ≤ N ,
which maps g = g 0 , . . . , g N ∈ T N Rd into g 0 , . . . , g k ∈ T N Rd . ⊗k Here, g k = π k (g) ∈ Rd is sometimes refered to as the kth level of g. Given g, h ∈ T N Rd , one extends (7.9) to T N Rd by setting g⊗h=
g ⊗ h ⇔ ∀k ∈ {0, . . . , N } : π k (g ⊗ h) = i
j
i+ j ≤N i,j ≥0
k
g k −i ⊗ hi .
i=0
The vector space T N Rd becomes an (associative) algebra under ⊗. More precisely, we have Proposition 7.4 The space T N Rd , +, .; ⊗ is an associative algebra with neutral element 1 := (1, 0, . . . , 0) = 1 + 0 + · · · + 0 ∈ T N Rd . (The unit element for + is 0 = (0, 0, . . . , 0), of course.) We will call T N Rd the truncated tensor algebra of level N . Proof. Straightforward and left to the reader. Remark 7.5 We shall see below that the set of all g ∈ T N Rd with π 0 (g) = 1 forms a (Lie) group ⊗ with unit element 1. When N = 1 under this group is isomorphic to Rd , + with the usual unit element 0 and this persuades us to set o := (1, 0, . . . , 0) . Similar to the algebra of square matrices, the algebra product is not commutative (unless N = 1 or d = 1), indeed 1 1 1 1 1 1 1 1 ⊗ = = ⊗ = . 0 1 0 0 1 0 1 0 d ⊗k Let us now define a norm on T N Rd := ⊕N . To this end, we k =0 R d ⊗k equip each tensor level R with Euclidean structure, which amounts to declaring the canonical basis {ei 1 ⊗ · · · ⊗ ei k : i1 , . . . , ik ∈ {1, . . . , d}} to ⊗k be orthonormal so that for any a ∈ Rd of form (7.8) 1 2 |ai 1 ,...,i k | |a|(Rd ) ⊗k = i 1 ,...,i k
and when no confusion is possible we shall simply write |a|. Let us also observe that for 0 ≤ i ≤ k ≤ N , ⊗i d ⊗k −i × R , |a ⊗ b| (a, b) ∈ Rd
( Rd ) ⊗k
= |a|
( Rd ) ⊗i
|b|
( Rd ) ⊗( k −i )
which is a compatibility relation between the tensornorms on the respective N tensor levels. Then for any g = k =0 π k (g) ∈ T N Rd we set |g|T N (Rd ) :=
max |π k (g)|
k =0,...,N
which makes T N Rd a Banach space (of finite dimension 1 + d + d2 + · · · + dN ); again we shall write |g| if no confusion is possible. We remark that there are other choices of norms on T N Rd , of course all equivalent, but this one will turn out to be convenient later on (cf. definition of ρp−ω later). Exercise 7.6 Consider the (infinite) tensor algebra T ∞ Rd := ⊕∞ k =0 d ⊗k R . Show thatT N Rd is the algebra obtained by factorization by the ideal g ∈ T ∞ Rd : π i (g) = 0 for 0 ≤ i ≤ N . Exercise 7.7 Identify T ∞ Rd ≡ R !e1 , . . . , ed " with the algebra of polynomials in d non-commutative indeterminants e1 , . . . , ed . Conclude that T N Rd can be viewed as an algebra of polynomials in d non-commutative indeterminants for which ei 1 . . . ei n ≡ 0 whenever n ≥ N .
7.2.2 Basic properties of SN Given x : [0, T ] → Rd , continuous of bounded variation, and a fixed s ∈ N [0, T ) the path SN (x)s,· takes values in T N Rd ∼ = R1+d+···+d , as vector spaces. Almost by definition, the path SN (x)s,t then satisfies an ODE on T N Rd driven by x. Proposition 7.8 Let x : [0, T ] → Rd be a continuous path of bounded variation. Then, for fixed s ∈ [0, T ),
dSN (x)s,t = SN (x)s,t ⊗ dxt , SN (x)s,s = 1.
d Remark 7.9 If we write x (·)= i=1 xi (·) ei and define the (linear!) vec tor fields Ui : T N Rd → T N Rd by g → g ⊗ ei , i ∈ {1, . . . , d} this ODE can be rewritten in the more familiar form
dSN (x)s,t =
d i=1
Ui SN (x)s,· dxit .
Proof. Let us look at level k, k ≥ 1, of SN (x)s,t , dxr 1 ⊗ · · · ⊗ dxr k s< r 1 < ...< r k < t t
= rk =s
t
= r=s
s< r 1 < ...< r k −1 < r k
dxr 1 ⊗ · · · ⊗ dxr k −1
⊗ dxr k
π k −1 SN (x)s,r ⊗ dxr .
Hence, we see that SN (x)s,t = 1 +
s
t
SN (x)s,r ⊗ dxr .
Proposition 7.8 tells us that SN (x) satisfies an ODE of the type discussed in Part I of this book. A number of interesting properties of signatures are then direct consequences of the corresponding ODE statements. We first describe what happens under reparametrization. Proposition 7.10 Let x : [0, T ] → Rd be a continuous path of bounded variation, φ : [0, T ] → [T1 , T2 ] a non-decreasing surjection, and write xφt := xφ(t) for the reparametrization of x under φ. Then, for all s, t ∈ [0, T ] , SN (x)φ(s),φ(t) = SN xφ s,t . Proof. A consequence of Propositions 3.10 and 7.8. Simple as it is, Proposition 7.10 has an appealing If [x] interpretation. denotes the equivalence class of some x ∈ C 1-var [0, T ] , Rd obtained by all possible reparametrizations, then the signature-map x ∈ C 1-var [0, T ] , Rd → SN (x)0,T is really a function of [x]. We now discuss the signature of the con 1-var [0, T ] , Rd , η ∈ catenation of two paths. Recall that, given γ ∈ C 1-var d [T, 2T ] , R , we set C γ (·) on [0, T ] γη ≡ η (·) − η (0) + γ (T ) on [T, 2T ] so that γ η ∈ C 1-var [0, 2T ] , Rd . If η was defined on [0, T ] , rather than [T, 2T ], one may prefer to reparametrize such that γ η too is defined on [0, T ]. Whatever parametrization one chooses, thanks to the previous proposition the signature of the entire path γ η is intrinsically defined. The following theorem says the signature of γ η is precisely the tensor product of the respective signatures of γ and η.
Theorem 7.11 (Chen) Given γ ∈ C 1-var [0, T ] , Rd , η ∈ C 1-var ([T, 2T ] , Rd , SN (γ η)0,2T = SN (γ)0,T ⊗ SN (η)T ,2T . Equivalently, given x ∈ C 1-var [0, T ] , Rd and 0 ≤ s < t < u ≤ T we have SN (x)s,u = SN (x)s,t ⊗ SN (x)t,u . Proof. By induction on N . For N = 0, it just reads 1 = 1.1. Assume it is true for N and all s < t < u ∈ [0, T ], and let us prove it is true for N + 1. First observe, in T (N +1) Rd , SN +1 (x)s,u = 1 +
u
s
SN +1 (x)s,r ⊗ dxr = 1 +
u
s
SN (x)s,r ⊗ dxr ,
where the second equality follows from truncation beyond level (N + 1). For similar reasons, u u SN (x)t,r ⊗ dxr = SN +1 (x)s,t ⊗ SN (x)t,r ⊗ dxr . SN (x)s,t ⊗ t
t
Hence, using the induction hypothesis to split up SN (x)s,r when s < t < r < u, t u SN (x)s,r ⊗ dxr + SN (x)s,t ⊗ SN (x)t,r ⊗ dxr SN +1 (x)s,u = 1 + s t u = SN +1 (x)s,t + SN +1 (x)s,t ⊗ SN (x)t,r ⊗ dxr t = SN +1 (x)s,t ⊗ 1 + SN +1 (x)t,u − 1 =
SN +1 (x)s,t ⊗ SN +1 (x)t,u .
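Chen's theorem is also a convenient recipe for computing signatures in practice: the step-2 signature of a piecewise linear path is the ⊗-product of the signatures exp(∆) of its segments. The following Python/NumPy sketch (all names purely illustrative) builds step-2 signatures this way and checks both the concatenation rule of Theorem 7.11 and the relation (7.3) for the symmetric part of the second level.

import numpy as np

def seg_sig2(delta):
    # step-2 signature of a straight segment with increment delta: levels (delta, delta (x) delta / 2)
    delta = np.asarray(delta, dtype=float)
    return delta, 0.5 * np.outer(delta, delta)

def tensor_mult(g, h):
    # truncated tensor product on 1 + t^2(R^d): level 1 adds, level 2 picks up the cross term a (x) b
    a, A = g
    b, B = h
    return a + b, A + B + np.outer(a, b)

def sig2(points):
    # step-2 signature of the piecewise linear path through the given points (Chen-style product)
    d = points.shape[1]
    g = (np.zeros(d), np.zeros((d, d)))
    for delta in np.diff(points, axis=0):
        g = tensor_mult(g, seg_sig2(delta))
    return g

rng = np.random.default_rng(0)
pts = rng.standard_normal((7, 3))                  # a piecewise linear path in R^3
a, A = sig2(pts)
b, B = tensor_mult(sig2(pts[:4]), sig2(pts[3:]))   # concatenation split at the 4th point
print(np.allclose(a, b), np.allclose(A, B))        # Chen's theorem: True, True
print(np.allclose(A + A.T, np.outer(a, a)))        # relation (7.3): symmetric part = a (x) a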
We now show that the inverse (with respect to ⊗) of the signature of a path is precisely the signature of that path with time reversed. − denotes the path x Proposition 7.12 Let x ∈ C 1-var [0, T ] , Rd . Then, if ← d x : t ∈ [0, T ] → xT −t ∈ R , −) −) ⊗ S (x) S (x) ⊗ S (← x = S (← x = 1. N
0,T
N
0,T
N
0,T
N
0,T
Proof. Using the fact that t → SN (x)0,t is the solution to an ODE driven by x (Proposition 7.8) this follows immediately from the corresponding results on ODEs with time-reversed driving signal (Proposition 3.13). Definition 7.13 For λ ∈ R, we define the dilation map δ λ : T N Rd → T N Rd such that π k (δ λ (g)) = λk π k (g) .
Exercise 7.14 Check that if λ is a real, x : [0, 1] → Rd is a continuous path of bounded variation, λx the path x scaled by λ, then SN (λx)s,t = δ λ SN (x)s,t . Proposition 7.15 Let (xn ) ⊂ C 1-var [0, 1] , Rd with supn |xn |1-var;[0,1] < ∞, uniformly convergent to some x ∈ C 1-var [0, 1] , Rd . Then, SN (xn )0,· converges uniformly to SN (x)0,· . In particular, lim SN (xn )0,1 = SN (x)0,1 .
n →∞
Proof. Using the fact that t → SN (x)0,t is the solution to an ODE driven by x (Proposition 7.8) this follows immediately from the ODE results on continuity of the solution map (cf. Corollary 3.16).
7.3 Lie algebra tN Rd and Lie group 1 + tN Rd The space T N Rd , +, ., ⊗ is an associative algebra. We introduce two simple subspaces (linear and affine-linear, respectively) which will guide us towards the (crucial) free nilpotent Lie algebra and group and their Lie algebra. Let us set tN Rd ≡ g ∈ T N Rd : π 0 (g) = 0 so that
7.3.1
1 + tN Rd = g ∈ T N Rd : π 0 (g) = 1 .
The group 1 + tN Rd
We first show that elements in 1 + tN Rd are invertible with respect to the tensor product ⊗. Lemma 7.16 Any g = 1 + a ∈ 1 + tN Rd has an inverse with respect to ⊗ given by N −1 k (−1) a⊗k , g −1 = (1 + a) = k =0
that is, g ⊗ g
−1
=g
−1
⊗ g = 1.
Proof. We have (1 + a)
N k =0
k
(−1) ak
=
N +1
k =1 N +1
= a
k +1
(−1)
ak +
N k =0
+ 1.
k
(−1) ak
⊗n Now because we set to zero any elements in Rd for n > N , we see that aN +1 = 0. It is also obvious that if g and h are in 1 + tN Rd , then g⊗ h ∈ 1 + tN Rd . Finally, 1 + tN Rd is an affine-linear subspace of T N Rd hence N ˙ a smooth manifold, trivially diffeomorphic to tN Rd ∼ = Rd +···+ d . Noting that the group operations ⊗, −1 are smooth maps (in fact, polynomial when written out in coordinates) we have Proposition 7.17 The space 1 + tN Rd is a Lie group3 with respect to tensor multiplication ⊗. Let us remark that the manifold topology of 1 + tN Rd is, of course, induced by the metric ρ (g, h) := |g − h|T N (Rd ) = |g − h| = max |π i (g − h)| , i=1,...,N
(7.12)
which arises from the norm on T N Rd ⊃ 1 + tN Rd . We end this subsection with a simple observation. d N Proposition 7.18 Assume −1gn , g ⊂ 1 + t R . Then, limn →∞ |gn − g| = 0 if and only if limn →∞ gn ⊗ g − 1 = 0. Proof. All group operations in the Lie group 1 + tN Rd are continuous in the manifold topology of 1 + tN Rd . In particular, gn → g ⇔ gn−1 → g −1 ⇔ gn−1 ⊗ g → g −1 ⊗ g = 1 and we conclude with the remark that, as n → ∞, gn → g ⇔ |gn − g| → 0 and gn−1 ⊗ g → 1 ⇔ gn−1 ⊗ g − 1 → 0.
7.3.2 The Lie algebra tN Rd and the exponential map
The vector space tN Rd , +, . becomes itself an algebra under ⊗. As in every algebra, the commutator, in our case (g, h) → [g, h] := g ⊗ h − h ⊗ g ∈ tN Rd for g, h ∈ tN Rd , defines a bilinear map which is easily seen to be anticommutative, i.e. [g, h] = − [h, g] , for all g, h ∈ tN Rd 3 Recall that a Lie group is by definition a group which is also a smooth manifold and in which the group operations are smooth maps.
Free nilpotent groups
136
and to satisfy the Jacobi identity for all g, h, k ∈ tN Rd ; that is, [g, [h, k]] + [h, [k, g]] + [k, [g, h]] = 0 for all g, h, k ∈ tN Rd . Recalling that a vector space V = (V, +, .) equipped with a bilinear, anticommutative map [·, ·] : V × V → V which satisfies the Jacobi identity is called a Lie algebra (the map [·, ·] is called the Lie bracket), this can be summarized as Proposition 7.19 tN Rd , +, ., [·, ·] is a Lie algebra. We now define the exponential and logarithm maps via their power series: Definition 7.20 The exponential map is defined by exp : tN Rd → 1 + tN Rd a →
1+
N a⊗k
k!
k =1
while the logarithm map is defined by log (1 + a)
: →
1 + tN Rd → tN Rd N
(−1)
k =1
k +1
a⊗k . k
The definitions of exp and log are precisely via their classical power series with (i) usual powers replaced by “tensor powers” and (ii) the infinite N sums replaced by k =1 , thanks to working within the tensor algebra with truncation beyond level N . A direct calculation shows that exp (log (1 + a)) = a, log (exp (a)) = a, for all a ∈ tN Rd . We emphasize that log (·) is globally defined and there are no convergence issues whatsoever. Example 7.21 Fix a ∈ Rd ∼ = π 1 tN Rd . The step-N signature of x (·) given by t ∈ [0, 1] → ta computes to N dxr 1 ⊗ · · · ⊗ dxr k SN (x)0,1 = 1 + 0< r 1 < ...< r k < 1
k =1
=
1+
N
⊗k
a
1+
N a⊗k k =1
dr1 . . . drk 0< r 1 < ...< r k < 1
k =1
=
k!
= exp (a) .
(7.13)
7.3 Lie algebra and Lie group
7.3.3
137
The Campbell–Baker–Hausdorff formula
Recall that exp maps tN Rd one-to-one and onto 1 + tN Rd . In general, ea eb = ea+ b , but one has 1 1 1 (7.14) ∀a, b ∈ tN Rd : ea eb = ea+b+ 2 [a,b]+ 1 2 [a,[a,b]]+ 1 2 [b,[b,a]]+... where . . . stands for (a linear combination, with universal coefficients, of) higher iterated brackets of a and b. Thanks to truncation beyond tensorlevel N , all terms involving N or more iterated brackets must be zero. For N = 2, this formula is obtained by simple computation; indeed, given a,b ∈ t2 Rd we have b⊗2 a⊗2 ⊗ 1+b+ exp (a) ⊗ exp (b) = 1+a+ 2 2 = =
1 (a + b) 1 + a + b + (a ⊗ b − b ⊗ a) + 2 2 1 exp a + b + [a, b] . 2
⊗2
The same computation is possible, if tedious, for N = 3 and allows us to recover the next set of bracket terms as seen in (7.14). The general case, however, requires a different argument and we shall give a proof based on ordinary differential equations. Given a linear operator A : tN Rd → tN Rd define A0 as identity map and set An = A ◦ · · · ◦ A (n times). An arbitrary real-analytic function n ≥0 γ n z n gives rise to the operator n ≥0 γ n An provided this series converges in operator norm. Fix a ∈ tN Rd and define the linear map tN Rd $ d → (ad a) d ≡ [a, d] ∈ tN Rd . n
Observe that (ad a) ≡ 0 for n > N. Lemma 7.22 For all a, d ∈ tN Rd we have exp (a) ⊗ d ⊗ exp (−a) = ead a d where ead a ≡
N 1 1 n n (ad a) = (ad a) . n! n! n =0
n ≥0
As a consequence we have the following operator identity on tN Rd : ead c = ead a ◦ ead b where c = log (exp (a) ⊗ exp (b)) .
Free nilpotent groups
138
Proof. Define the linear map ρt from tN Rd into itself by ρt : d → exp (t a) ⊗ d ⊗ exp (−t a) . Obviously, t → ρt (d) ∈ tN Rd is differentiable and d ρ (d) = [a, ρt (d)] = (ad a) ρt (d) . dt t Noting ρ0 (d) = d the solution is given by ρt (d) = et ad a d and we conclude by setting t = 1. The last statement is immediately verified by considering the image of some d ∈ tN Rd under both ead c and ead a ◦ ead b . Lemma 7.23 Assume t → ct ∈ tN Rd is continuously differentiable. Then d exp (−ct ) = −G (ad ct ) c˙t exp (ct ) ⊗ dt where c˙t = dct /dt and G (z) =
ez − 1 1 = zn . z (n + 1)! n ≥0
Proof. Set
d exp (−s ct ) . dt Then b0,t = 1 ⊗ 0 = 0. Taking derivatives with respect to s, a short computation shows that bs,t = exp (s ct ) ⊗
d bs,t ds
=
[ct , bs,t ] − c˙t
=
(ad ct ) bs,t − c˙t
where c˙t = dct /dt. Keeping t fixed, the solution is given by bs,t
= =
es ad c t b0,t − f (s, ad ct ) c˙t −f (s, ad ct ) c˙t
where f (s, z) = (es z − 1) /z, entire in z. Setting s = 1 finishes the proof. Theorem 7.24 (Campbell–Baker–Hausdorff ) Let a, b ∈ tN Rd . Then
1
log [exp (a) ⊗ exp (b)] = b +
H et
ad a
◦ ead b a dt
0
where H (z) =
(−1)n ln z n = (z − 1) . z−1 n+1 n ≥0
(7.15)
7.3 Lie algebra and Lie group
139
In particular, log (exp (a) ⊗ exp (b)) equals a sum of iterated brackets of a and b, with universal4 coefficients. Proof. First observe that the linear operator H et ad a ◦ ead b is in fact given by a finite series. Indeed, et ad a ◦ ead b minus the identity map is a finite sum of N or less iterated applications of ad a, ad b. Consequently, only the first N terms of the expansion of H are needed. We also observe that both sides of (7.15) of degree less than or equal to are polynomials N in the coordinates an ;i 1 ,...,i n , bn ,;i 1 ,...,i n , 1 ≤ n ≤ N . Therefore, it is sufficient to check (7.15) for a,b in a neighbourhood of 0. For t ∈ [0, 1] define ct = log et a ⊗ e b . Then
d −c t d −b e e ⊗ e−t a = −a = et a ⊗ eb ⊗ dt dt which, combined with Lemma 7.23, shows that ec t ⊗
a = G (ad ct ) c˙t .
(7.16)
On the other hand, Lemma 7.22 implies the operator identity ad ct = ln et ad a ◦ ead b which certainly holds when a,b are small enough. Since G(ln z)H(z) = 1, at least near z = 1, the operator identity −1 G (ad ct ) = G ln et ad a ◦ ead b = H et ad a ◦ ead b holds for a,b small enough. This allows us to rewrite (7.16) as c˙t = H et ad a ◦ ead b a and by integration c1 = b+
1
H et ad a ◦ ead b a dt
0
which is exactly what we wanted to show. d ⊂ tN Rd as the smallest sub-Lie algebra Definition 7.25 Define gN R of tN Rd which contains π 1 tN Rd ∼ = Rd . That is, gN Rd = Rd ⊕ Rd , Rd ⊕ · · · ⊕ Rd , . . . , Rd , Rd .
(N −1) brackets
We call it the free step-N nilpotent Lie algebra.
4 The coefficients are given by numerical constants, computable from H . In particular, they do not depend on N or d = dim Rd .
140
Free nilpotent groups
Remark 7.26 (universal property of free Lie algebras) Let i denote the (linear) inclusion map i : π 1 tN Rd ∼ = Rd → gN Rd . Then g = gN Rd has the property that for any (step-N nilpotent) Lie algeba a and any linear map f : Rd → a there is a Lie algebra homomorphism φ : g → a so that f = φ ◦ i. Indeed, we can take a basis of g where each element is of the form eα = eα 1 , eα 2 , . . . eα k −1 , eα k d d ⊗k ∈ R , . . . , Rd , Rd ⊂ R ,k ≤ N and one checks that cα f (eα 1 ) , f (eα 2 ) , . . . f eα k −1 , f (eα k ) φ cα eα := has the desired properties.
Corollary 7.27 Let a, b ∈ gN Rd . Then log ea ⊗ eb ∈ gN Rd . In other words, exp gN Rd is a subgroup of 1 + tN Rd , ⊗, 1 Proof. An obvious corollary of the Campbell–Baker–Hausdorff formula.
7.4 Chow’s theorem As was seen in Example 7.21, the step-N signature of the path t ∈ [0, 1] → vt, v ∈ Rd , is precisely exp (v) ∈ 1 + tN Rd . A piecewise linear path is just the concatenation of such paths (up to irrelevant reparametrization) and by Chen’s theorem its step-N signature is of the form ev 1 ⊗ · · · ⊗ ev m ∈ 1 + tN Rd with v1 , . . . , vm ∈ Rd . Conversely, any element of this form arises as the step-N signature of a piecewise linear path, e.g. of x : [0, m] → Rd with xi−1,i = vi , i = 1, . . . , m and linear between integer times. (If one prefers, the trivial reparametrization x ˜ (t) = x (tm) defines a piecewise linear path on [0, 1] with identical signature.) Theorem 7.28 (Chow) Let g ∈ exp gN Rd . Then, there exist v1 , . . . , vm ∈ Rd such that g = ev 1 ⊗ · · · ⊗ ev m . Equivalently, there exists a piecewise linear path x : [0, 1] → Rd with signature g, by which we mean g = SN (x)0,1 .
7.4 Chow’s theorem
141
Proof. It isenough to show this for log (g) in an open neighbourhood of A of the 0 ∈ gN Rd or, equivalently, for g in an open neighbourhood unit element 1 = exp (0). Indeed, given g ∈ exp gN Rd , it is clear that δ ε g ∈ A for ε small enough and if x denotes the (piecewise linear) path whose signature equals δ ε g, the scaled path x (·) /ε, which is still piecewise linear, has signature g. d With i: i = 1, . . . , d) denoting the standard basis in R , let us define the (e N d exp g R -valued paths φit = exp (tei ) . An easy application of the Campbell–Baker–Hausdorff formula shows that5 φij t
: = exp (−tej ) ⊗ exp (−tei ) ⊗ exp (tej ) ⊗ exp (tei ) = exp t2 [ei , ej ] + o t2 .
For a multi-index I = iJ, we define by induction φIt
= φJ−t ⊗ exp (−tei ) ⊗ φJt ⊗ exp (tei ) exp t|I | eI + o t|I |
: =
k where eI = ei 1 , . . . ei k −1 , ei k for I = (i1 , . . . , ik ) ∈ {1, . . . , d} . We I I then define ψ t := φt 1 / |I | for t ≥ 0 and for t < 0, # ψ It
:=
φI−|t|1 / |I |
φJ−|t|1 / |I | ⊗
1/|I |
exp − |t|
ei ⊗
φJ|t|1 / |I |
1/|I |
⊗ exp |t|
ei
for |I| odd for |I| even
so that ψ It = exp (teI + o (t)) . We now choose a vector space basis (eI k : k = 1, . . . , n) of gN define ϕ (t1 , . . . , tn ) = log ψ It nn ⊗ · · · ⊗ ψ It 11 ∈ gN Rd
(7.17) Rd and
and note that ϕ(0) = log (1) = 0. We also observe that, thanks to (7.17), ϕ : Rn → gN Rd ∼ = Rn (as vector spaces) is continuously differentiable near 0, with non-degenerate derivative at the origin.6 This implies that the (one-to-one) image under ϕ of a small enough neighbourhood of 0 ∈ Rn , say Ω, contains an open neighbourhood of 0 ∈ gN Rd . On 5 Observe
⊗3 ⊗N . that the o t2 term is an element in g N Rd ∩ Rd ⊕ · · · ⊕ Rd
6 The differential of log at 1 is the identity map and hence plays no role in the nondegeneracy at 0.
142
Free nilpotent groups
the other hand, unwrapping the very definition of ϕ shows any element of the form exp ϕ (t1 , . . . , tn ) ∈ exp gN Rd , (t1 , . . . , tn ) ∈ Ω, is the concatenation of piecewise linear path segments in basis directions (ei : i = 1, . . . , d) and so the proof is finished. N d converges to 1 ∈ exp Corollary N d 7.29 Assume (gk ) ⊂ exp g R as k → ∞. Let xk denote the piecewise linear path with sigg R nature gk as constructed in the previous theorem. Then the length of xk converges to 0 as k → ∞. Proof. proof of the previous theorem shows that any element g ∈ The exp gN Rd , close enough to 1, can be written as g = exp ϕ (t1 , . . . , tn ) and the length of the associated (piecewise linear) path x = xg is bounded by 1/|I | 1/|I | K |t1 | 1 + · · · + |tn | n where K counts the maximal number of concatenations involved in the I ψ t j ’s. Observe that K depends on the space gN Rd but can be chosen independent of g. Since g → 1 is equivalent to (t1 , . . . , tn ) → 0, we obtain the desired continuity statement length (xg ) → 0 as g → 1.
7.5 Free nilpotent groups 7.5.1 Definition and characterization Let us consider (i) the set of all step-N signatures of continuous paths of finite length, GN Rd := SN (x)0,1 : x ∈ C 1-var [0, 1] , Rd ; (ii) the image of the sub-Lie algebra gN Rd ⊂ tN Rd , cf. Definition 7.25, under the exponential map exp gN Rd ⊂ 1 + tN Rd ;
7.5 Free nilpotent groups
143
2 3 of 1 + tN Rd generated by elements in (iii) the subgroup exp Rd exp Rd , i.e. 2
exp R
d
3
:=
# m 4
$ exp (vi ) : m ≥ 1, v1 , . . . , vm ∈ R
d
.
i=1
Theorem 7.30 (and definition) We have 2 3 GN Rd = exp gN Rd = exp Rd and GN Rd is a (closed) sub-Lie group of 1 + tN Rd , ⊗ , called the free nilpotent group of step N over Rd . 2 3 Proof. Step 1: We show exp gN Rd = exp Rd . Obviously, exp Rd N d and it follows from the CBH formula that ⊂ exp g R 2
3 exp Rd ⊂ exp gN Rd .
Conversely, Chow’s theorem tells us that any element of exp gN Rd can be expressed in the form ev 1 ⊗ · · · ⊗ ev m with vi ∈ Rd , which plainly implies 3 2 exp gN Rd ⊂ exp Rd . 2 3 Step 2: We show that exp Rd is a closed subset of 1 + tN Rd . By the N d previous step, it suffices to check that exp this g R is closed. But follows from (obvious!) closedness of gN Rd ⊂ tN Rd , exp gN Rd = log−1 gN Rd and continuity 2 of log.3 Step 3: We show GN Rd = exp Rd . The inclusion 2
3 exp Rd ⊂ GN Rd
is clear from Example 7.21 and Chen’s theorem. Indeed, any ev 1 ⊗ · · · ⊗ ev m is the step-N signature of x ˜ : [0, 1] → Rd where x ˜ (t) = x (tm) and x : d [0, m] → R with xi−1,i = vi , i = 1, . . . , m and linear between integer times. (This was already pointed out in our discussion of Theorem 7.28.) 2 3 By step 2, !exp (Rd )" = exp Rd and so the other inclusion will follow from GN Rd ⊂ !exp (Rd )". To prove this inclusion, take g = SN (x)0,1 ∈ GN Rd , for some x ∈ C 1-var [0, 1] , Rd . We know that piecewise linear approximations (xn ) satisfy supn |xn |1-var;[0,1] < ∞ and converge to x, uniformly on [0, 1]. By Chen’s 2 3 theorem, gn = SN (xn )0,1 ∈ exp Rd and from Proposition 7.15, gn converges to g. This shows that g ∈ !exp (Rd )", which is what we wanted.
Free nilpotent groups
144
2 3 Step 4: We show that the set G := GN Rd = exp gN Rd = exp Rd is a closed sub-Lie group of 1 + tN Rd , ⊗ . Topological 2 d 3closedness was is an abstract already seenin step 2. It is also clear that G = exp R sub-Lie group subgroup of 1 + tN Rd , ⊗ . To see that we have an actual we have to check that G is a submanifold of 1 + tN Rd . It is not hard to deduce this from d (log ◦ exp) = Id, so that by the chain rule, d exp is one-to-one at every point. Alternatively, one can appeal to a standard theorem in Lie group theory7 which asserts that a closed abstract subgroup is automatically a (closed) Lie subgroup. Remark 7.31 (manifold topology of GN Rd ) In equation (7.12) we defined the metric ρ on 1 + tN Rd , induced from the norm |·|T N (Rd ) . By trivial restriction, ρ is also a metric on GN Rd , given by ρ (g, h) = max |π i (g − h)| , i=1,...,N
which induces the (sub)manifold topology on GN Rd .
7.5.2 Geodesic existence
Chow’s theorem tells us that for all elements g ∈ GN Rd , there exists a continuous path x of finite length such that SN (x)0,1 = g. One may ask for the shortest path (and its length) which has the correct signature. For instance, given a > 0, we can ask for the shortest path with signature 0 0 a exp + ∈ G2 R2 , 0 −a 0 or, equivalently, the shortest path in R2 which ends where it starts and wipes out area a. As is well known from basic isoperimetry, the shortest such path √ is given by a circle (with area a) with easily computed length given by 2 πa. With this motivating example in mind we now state Theorem 7.32 (geodesic existence) For every g ∈ GN Rd , the socalled “Carnot–Caratheodory norm”8 1 g := inf |dγ| : γ ∈ C 1-var [0, 1] , Rd and SN (γ)0,1 = g 0
is finite and achieved at some minimizing path γ ∗ , i.e. 1 g = |dγ ∗ | and SN (γ ∗ )0,1 = g. 0
7 See,
for example, Warner [175].
d usual, with Euclidean structure so that 01 |dγ| is the length of R isd equipped 1 -va r d [0, 1] , R , based on the Euclidean distance on R . γ∈C 8 As
7.5 Free nilpotent groups
145
Moreover, this minimizer can (and will) be parametrized to be Lipschitz (i.e. 1-H¨ older) continuous and of constant speed, i.e. |γ˙ ∗ (r)| ≡ (const) for a.e. r ∈ [0, 1]. Remark 7.33 By invariance of length and signatures under reparametrization, γ ∗ need not by defined on [0, 1] but may be defined for any interval [s, t] with non-empty interior. Proof. From Chow’s theorem, the inf is taken over a non-empty set so that g < ∞. By definition of inf, there is a sequence (γ n ) with signature g and we can assume (by reparametrization, cf. Proposition 1.38) that each γ n = γ n (t) has a.s. constant speed |γ˙ n | ≡ |γ n |1-H¨o l;[0,1] = cn where cn is the length of the path γ n· and cn ↓ g . Clearly, sup |γ n |1-H¨o l;[0,1] = sup cn < ∞ n
n
and from Arzela–Ascoli, after relabelling the sequence, γ n converges uniformly to some (continuous) limit path γ ∗ . By Lemma 1.23, |γ ∗ |1-H¨o l;[0,1] ≤ lim inf |γ n |1-H¨o l;[0,1] n
(7.18)
which shows in particular that γ ∗ itself is 1-H¨older, hence absolutely continuous, so that 1 1 |dγ ∗ | = |γ˙ ∗t | dt. 0
0
From basic continuity properties of the signature (Proposition 7.15) g ≡ SN (γ n )0,1 → SN (γ ∗ )0,1 which shows that SN (γ ∗ )0,1 = g. It remains to see that g =
1
|γ˙ ∗t | dt.
0
1 First, g ≤ 0 |γ˙ ∗t | dt is obvious from the definition of g . On the other hand, using (7.18) we have 0
1
|γ˙ ∗t | dt = |γ ∗ |1-H¨o l;[0,1] ≤ lim inf cn = g
and the proof is finished.
n
Free nilpotent groups
146
7.5.3 Homogenous norms
Let us now define the important concept of a homogenous norm on GN Rd . Definition 7.34 A homogenous norm is a continuous map |||.||| : GN Rd → R+ which satisfies (i) |||g||| = 0 if and only if g equals the unit element 1 ∈ GN Rd , (ii) homogeneity with respect to the dilation operator δ λ , |||δ λ g||| = |λ| . |||g||| for all λ ∈ R.
A homogenous norm is said to be symmetric if |||g||| = g −1 , and sub additive if |||g ⊗ h||| ≤ |||g||| + |||h|||. Remark 7.35 If |||.||| is a non-symmetric homogenous norm, then g → |||g||| + g −1 is a symmetric homogenous norm. Sub-additivity is pre served under such symmetrization. Proposition 7.36 Every symmetric sub-additive homogenous norm |||.||| leads to a genuine metric GN Rd via (g, h) → g −1 ⊗ h . Moreover, this metric is left-invariant.9 Proof. We write d (g, h) = g −1 ⊗ h . Property (i) in Definition 7.34 implies d (g, h) = 0 iff g = h. Sub-additivity of |||.||| implies the triangle inequality for d and symmetry |||.||| implies d (g, h) = d (h, g). At last, left-invariance of d, i.e. d (g ⊗ h, g ⊗ k) = d (h, k) , follows from (g ⊗ h)
−1
⊗ (g ⊗ k) = h−1 ⊗ k.
Example 7.37 The simplest example of a homogenous norm is the map 1/i g ∈ GN Rd → |||g||| = max |π i (g)| . i=1,...,N
In general, it is neither symmetric nor sub-additive.
Exercise 7.38 Prove that −1/N 1/i maxi=1,...,N (i! |π i (g)|) is a sub(i) |||.|||1 : g ∈ GN Rd → (N !) additive homogenous norm. 1/i is a symmetric (ii) |||.|||2 : g ∈ GN Rd → maxi=1,...,N |π i (log g)| homogenous norm. 9 in general not right-invariant. A right-invariant metric could be defined by . . . but g ⊗ h −1 .
7.5 Free nilpotent groups
147
Exercise 7.39 Compute the minimal length of all paths with signature x 0 a exp + . y −a 0 (This is precisely the Carnot–Caratheodory norm of (x, y, a) ∈ H, the 3dimensional Heisenberg group.) Check that this gives 5 x2 + y 2 when a = 0, 5 2 π |a| when x = y = 0. (See [157] for instance.)
7.5.4 Carnot–Caratheodory metric We now check that the Carnot–Caratheodory norm ·, which we introduced in Theorem 7.32, defines a homogenous norm in the sense of Definition 7.34. The geodesic existence result came with a map, the Carnot–Caratheodory d N R → [0, ∞). In conjunction with the group structure norm, from G of GN Rd , it is only a small step to define a genuine metric on GN Rd . Proposition 7.40 Let g, h ∈ GN Rd . We have (i) g = 0 if and only if g = 1, the unit element in GN Rd ; (ii) homogeneity δ λ g g for all λ ∈ R; = |λ| (iii) symmetry g = g −1 ; (iv) sub-additivity g ⊗ h ≤ g + h; (v) continuity: g → g is continuous. Proof. Notation: for g ∈ G let γ ∗g = γ ∗ denote an arbitrary minimizer from the geodesic existence theorem. (i) If g = 0, γ ∗g has almost everywhere zero derivative, hence g = SN γ ∗g 0,1 = 1. If g = 1, it is obvious that g = 0. (ii) The case λ = 0 is easy, so we assume λ = 0. The path λγ ∗g satis fies SN λγ ∗g 0,1 = δ λ g. Hence δ λ g ≤ length λγ ∗g = |λ| × length γ ∗g = |λ| g. The opposite inequality follows from replacing λ by 1/λ and g by δ λ g. ← − (iii) Using the fact that SN γ ∗g = g −1 we obtain 0,1
− −1 g ≤ length ← γ ∗g = length γ ∗g = g . The opposite inequality follows from replacing g by g −1 . (iv) If γ ∗g , γ ∗h denote the resp. geodesics then, from Chen’s theorem, g ⊗ h = SN γ ∗g ,h 0,1
Free nilpotent groups
148
where γ ∗g ,h is the (Lipschitz continuous) concatenation of γ ∗g and γ ∗h with obvious length g + h. Hence, g ⊗ h must be less than or equal to the length of γ ∗g ,h . (v) Consider a sequence gn such that |gn − g| →n →∞ 0. (Here |·| denotes a norm topology on on the tensor algebra which induces the “original” −1 GN Rd .) By continuity of the group operations ⊗ and (·) , all of which are polynomial in the coordinates, gn−1 ⊗ g → 1 is an obvious consequence. From sub-additivity, |gn − g| ≤ gn−1 ⊗ g and since gn−1 ⊗ g is dominated by the length of any path with correct signature (namely gn−1 ⊗ g), it follows from Corollary 7.29 that −1 gn ⊗ g → 0. As a consequence, |gn − g| → 0 which implies continuity. Definition 7.41 The Carnot–Caratheodory norm on GN Rd induces (viaProposition 7.36) a genuine (left-invariant, continuous)10 metric d on N d G R , called the Carnot–Caratheodory metric. N d The space G R , d is not only a metric but a geodesic space (in the sense of Definition 5.17). To this end recall that, given g ∈ GN Rd , Theorem 7.32 provides us with an associated Lipschitz path γ ∗ : [0, 1] → Rd of minimal length11 equal to g such that t ∈ [0, 1] → SN (γ ∗ )0,t ∈ GN Rd connects the unit element in GN Rd with g. Proposition 7.42 GN Rd equipped with Carnot metric d is a geodesic space. Given g,h ∈ GN Rd , a connecting geodesic is given by t ∈ [0, 1] → Υt := g ⊗ SN (γ ∗ )0,t where γ ∗ is the geodesic associated with g −1 ⊗ h. Proof. Obviously, Υ is continuous and Υ0 = g, Υ1 = h. For any s < t in [0, 1] , d (Υs , Υt ) := SN (γ ∗ )s,t t |dγ ∗ | (7.19) ≤ s
1
|dγ ∗ |
=
(t − s)
=
(t − s) g −1 ⊗ h = |t − s| d (g, h) . 0
1 0 By 1 1 By
Proposition 7.40, part (v). reparametrization, the speed |γ˙ ∗t | may be taken constant for a.e. t ∈ [0, 1].
7.5 Free nilpotent groups
149
In fact, the inequality cannot be strict; there would be a strict inequality in d (g, h)
≤ d (Υ0 , Υs ) + d (Υs , Υt ) + d (Υt , Υ1 ) ≤ (|s| + |t − s| + |1 − t|) d (g, h) = d (g, h)
which is not possible. We conclude that equality holds in (7.19), which shows that Υ is the desired connecting geodesic. Remark 7.43 [sub-Riemannian structure of GN Rd ] The geodesic constructed above satisfies the differential equation dΥt =
d
Ui (Υt ) dγ i
i=1
where the Ui (g) = g ⊗ei , i = 1, . . . , d are easily seen to be left-invariant vector fields on GN Rd . In fact, Lie [U1 , . . . , Ud ] |g = Tg GN Rd Lie algefor all g ∈ GN Rd , where Lie[. . . ] stands for the generated bra and Tg GN Rd denotes the tangent space to GN Rd at the point g. Chow’s theorem can now be understood as the statement that any two tangent to the points in GN Rd can be joined by a path that remains {U1 , . . . , Ud }. A sub-Riemannian metric on Tg GN Rd is given by declaring the {U1 , . . . , Ud } to be orthonormal. This induces a natural length for any path which remains tangent to span{U1 , . . . , Ud }. This applies in particular to the geodesic Υ of the previous proposition (Υ0 = g, Υ1 = h) and this natural length is precisely the Carnot–Caratheodory distance d (g, h).
7.5.5 Equivalence of homogenous norms
Similar to the case of norms on Rd , all homogenous norms on GN Rd are equivalent. The proof relies crucially on the continuity of homogenous norms (which was part of their definition). Theorem 7.44 All homogenous norms on GN Rd are equivalent. More precisely, if .1 and .2 are two homogenous norms, there exists C ≥ 1 such that for all g ∈ GN Rd , we have 1 g1 ≤ g2 ≤ C g1 . C
(7.20)
Free nilpotent groups
150
Proof. It is enough to consider the case when g1 is given by 1/i
|||g||| := max |π i (g)| i=1,...,N
.
Let B = g ∈ GN Rd , |||g||| = 1 . Clearly, B is a compact set by continuity and .2 attains a (positive) minimum m and maximum M, i.e. for all g ∈ B, m ≤ g2 ≤ M. Since (7.20) holds trivially true when g = 1, the unit element of GN Rd , we only need to consider g = 1. We define ε = 1/ |||g||| so that |||δ ε g||| = 1. In particular, m ≤ ||δ ε g||2 ≤ M and by using homogeneity of .2 we obtain m ≤ ||g||2 / |||g||| ≤ M and (7.20) follows. Let us recall that the metric ρ on GN Rd , induced from the norm |·|T N (Rd ) , is given by ρ (g, h) = |g − h| = max |π i (g) − π i (h))| . i=1,...,N
(7.21)
When h = 1 this reduces to |g − 1| = max |π i (g)| . i=1,...,N
d N R . Then, Proposition 7.45 Let |||.||| be a homogenous norm on G there exists a constant C > 0 such that for all g ∈ GN Rd 1 N N min |||g||| , |||g||| ≤ |g − 1| ≤ C max |||g||| , |||g||| C and
1 1/N 1/N min |g − 1| , |g − 1| ≤ |||g||| ≤ C max |g − 1| , |g − 1| . C Proof. By equivalence of homogenous norms, it suffices to consider the case when 1/i |||g||| := max |π i (g)| . i=1,...,N 1/N which implies that But then, obviously, |||g||| ≤ max |g − 1| , |g − 1| N min |||g||| , |||g||| ≤ |g − 1| . On the other hand
i N |g − 1| = max |π i (g)| ≤ max |||g||| = max |||g||| , |||g||| i=1,...,N
i=1,...,N
and together these imply all the stated inequalities.
7.5 Free nilpotent groups
151
Corollary 7.46 The topology on GN Rd induced by Carnot–Caratheodory distance (in fact, by any metric associated with a symmetric sub-additive homogenous norm) coincides with the original12 topology of GN Rd . Proof. homogenous norm on −1 sub-additive Let |||.||| be any symmetric g ⊗ h for the associated metric. Given GN Rd and write d (g,h) = d N d N a sequence −1 (gn ) ⊂ G R and g ∈ G R , Proposition 7.45 implies that gn ⊗ g − 1 → 0 if and only if d (gn , g) = gn−1 ⊗ g → 0. On the other hand, we saw in Proposition 7.18 that gn−1 ⊗ g − 1 → 0 if and only if |gn − g| → 0. Remark 7.47 There are more geometric arguments for this. A Riemannian taming argument easily gives that convergence with respect to the CC distance implies convergence in the original topology. For the converse, continuity of the CC norm implies gn−1 ⊗ g → 0 as gn−1 ⊗ g → 1, which (by Proposition 7.18) is equivalent to gn → g in the original topology. We can improve Corollary 7.46 towards a quantitative comparison of the Carnot–Caratheodory distance with the “Euclidean” distance on GN Rd as given in (7.21). To this end, we need Lemma 7.48 Let g, h ∈ GN Rd , of form g = 1 + g 1 + · · · + g N , g i ∈ d ⊗i ⊗k R and similarly for h. The following equations then hold in Rd k=1, . . . , N : k −1 k −i −1 k g g ⊗h = ⊗ (hi − g i ) (7.22) i=1
and hk − g k =
k
i g k −i ⊗ g −1 ⊗ h .
(7.23)
i=1
Proof. Set g 0 = h0 = 1. By definition of the tensor product in GN Rd ⊂ k k k −i T N Rd , g −1 ⊗ h = i=0 g −1 ⊗hi . The result follows by subtract −1 k −i i k k ing from the previous expression 0 = g ⊗ g = i=0 g −1 ⊗g . The other equality follows from h − g = g ⊗ g −1 ⊗ h − 1 . Proposition 7.49 (ball-box estimate) Consider g, h ∈ GN Rd . There exists a constant C = C (N ) > 0 such that 1/N 1− 1 (7.24) d(g, h) ≤ C max |h − g| , |h − g| max 1, g N 1 2 Cf.
Remark 7.31.
Free nilpotent groups
152
and
N −1 N , d (g, h) . |h − g| ≤ C max d (g, h) max 1, g
(7.25)
In particular, recalling from (7.21) that ρ (g, h) ≡ |h − g|, Id : GN Rd , d GN Rd , ρ is Lipschitz on bounded sets in → direction and 1/N -H¨ older on bounded sets in ← direction. Proof. Equation (7.22) implies k −1 g ⊗h
≤ c1
k −1 k −i i g . h − g i i=1
=
c1
k
k −i
g
. hi − g i
by symmetry of ·
i=1
k −1 ≤ c2 |h − g| max 1, g . Hence, k 1/k max g −1 ⊗ h
! " 1− 1 1/k max max 1, g k |h − g| k =1,...,N 1/N 1− 1 . max 1, g N ≤ c4 max |h − g| , |h − g|
≤
k
c3
Conversely, from (7.23), k k k −i g −1 ⊗ hi . h − g k ≤ c5 g i=1
Hence, |h − g| ≤ c6
k N
k −i
g
−1 g ⊗ hi
k =1 i=1
≤
c7
N
i N −i d (g, h) max 1, g
i=1
N −1 N , d (g, h) . ≤ c8 max d (g, h) max 1, g
Corollary the Carnot–Caratheodory distance on 7.50 Let d denote GN Rd . Then GN Rd , d is a Polish space in which closed bounded sets are compact.
7.5 Free nilpotent groups
153
Proof. Completeness, and compactness of closed, bounded sets separability are obvious for GN Rd under ρ, the metric induced from |·|T N (Rd ) . It then suffices to apply the previous proposition. Exercise 7.51 (i) Let x ∈ Rd . Show that exp (x) = |x|, the Euclidean length of x. (ii) Assume .1 is a sub-additive, homogenous norm on GN Rd such that for all x ∈ Rd , (∗) : exp (x)1 = |x| . Show that g1 ≤ g for all g ∈ GN Rd . This says that the Carnot– Caratheodory norm is the largest sub-additive, homogenous norm which satisfies (∗). Solution. Let g ∈ GN Rd , γ a geodesic associated with g, and (γ n ) a sequence of piecewise linear approximations. Then, if (tni ) are the discontinuity points of the derivative of γ n , 4 SN (γ n )t n ,t n SN (γ n )0,1 = i i+ 1 1 i 1 n ≤ SN (γ )t ni ,t ni+ 1 1
i
=
exp γ ntni ,t ni+ 1
=
γ ntni ,t ni+ 1
i
1
i 1
|dγ nu | .
= 0
Letting n tend to ∞, we have by continuity of the map SN (.)0,1 , which follows from Theorem 3.15, 1 |dγ u | , SN (γ)0,1 ≤ 1
0
which reads g1 ≤ g .
7.5.6 From linear maps to group homomorphisms Linear maps from Rn to Rd can always be written as x → Ax where A is a n × d matrix. With a slight abuse of notation, we will call A the linear n map itself. It isobvious that A is a homomorphism from the group (R , +) d into the group R , + (in fact, the linear maps describe the set of all such homomorphisms), i.e. that for all x, y ∈ Rn , A (x + y) = Ax+Ay. It will be useful to extend A to a homomorphism from GN (Rn ) to GN Rd . To this
Free nilpotent groups
154
end, we recall that tN (Rn ) is generated by Rn in the sense that a vector space basis of tN (Rn ) is given by # m $ 4 N ∪m =1 ej i : ji ∈ {1, . . . , n} i=1 n where e1 , . . . , en is the canonical basisof R . We can then (uniquely) extend N n N d A to a homomorphism t (R ) → t R by requiring that it is compatible with ⊗, i.e. m m 4 4 ⊗m A ej i := (Aej i ) ∈ Rd ⊂ tN Rd i=1
i=1
and then extend A by linearity to all of tN (Rn ). On the other hand, tN (Rn ) is a Lie algebra with bracket [a, b] = a ⊗ b − b ⊗ a and so A is clearly compatible with the bracket, which is to say a Lie algebra homomorphism. From the Campell–Baker–Hausdorff formula, A (·) := exp (A (log (·))) is thena group homomorphism between the Lie groups 1 + tN (Rn ) and N d 1 + t R . Equivalently, one can define directly A 1 + a1 + · · · + aN := 1 + Aa1 + A⊗2 a2 + · · · + A⊗N aN , ⊗k ⊗k ⊗k where ak = π k (a) ∈ (Rn ) and A⊗k : (Rn ) → Rd is defined by linearity from A⊗k (ej 1 ⊗ · · · ⊗ ej k ) := Aej 1 ⊗ · · · ⊗ Aej k with ji ∈ {1, . . . , n} , and check that this defines a group homomorphism. By sheer restriction, this yields the group homomorphism A between GN (Rn ) and GN Rd . That said, we will find it convenient in the sequel to have a direct construction of A based on step-N signatures. We have n d Proposition 7.52 Let A be a linear map from dR into R . There exists a N n N unique homomorphism from G (R ) to G R , denoted by A, such that for all x ∈ Rd , A exp (x) = exp (Ax) .
For all g ∈ GN (Rn ) we have13 Ag ≤ |A|op g and if g ∈ GN (Rn ) is written as the step-N signature of some x ∈ C 1-var ([0, 1] , Rn ), i.e. g = SN (x)0,1 , then Ag = ASN (x)0,1 = SN (Ax)0,1 . 1 3 |·| op
denotes the operator (matrix) norm.
7.5 Free nilpotent groups
155
Proof. Let x, x ˜ be continuous paths of bounded variation such that g = x)0,1 . Then, writing SN (Ax)0,1 and SN (A˜ x)0,1 in coorSN (x)0,1 = SN (˜ dinates shows that they are equal. Hence, it is possible to define Ag = SN (Ax)0,1 . We establish is a homomorphism, which may be done by checking that A−1 A g −1 ⊗ h = (Ag) ⊗ Ah for arbitrary elements g, h ∈ GN (Rn ) , which we may assume to be of form g = SN (x)0,1 , h = SN (y)0,1 (x, y are continuous paths of bounded variation). We recall that g −1 = −) , the signature of ← − = x (1 − ·), and define z to be the concatex x SN (← 0,1 − and y. Then, we have nation of ← x A g −1 ⊗ h = SN (Az)0,1 . ←− − and Ay and the On the other hand, Az is the concatenation of Ax = A← x proof is finished by observing that ←− −1 SN (Az)0,1 = SN Ax ⊗ SN (Ay)0,1 = (Ag) ⊗ Ah. 0,1
Finally, we discuss the estimate on Ag. Let γ : [0, 1] → Rd be a geodesic
1 path associated with g, i.e. a path such that SN (γ)0,1 = g and 0 |dγ| = g . Then, Ag = SN (Aγ)0,1 ≤
0
1
|d (Aγ)| ≤ |A|op
0
1
|dγ| = |A|op g
and the proof is finished. Example 7.53 One simple linear map from Rn ⊕Rn into Rn is the addition map, i.e. plus (x, y) = x+y where x, y ∈ Rn . It extends to a homomorphism plus from GN (Rn ⊕ Rn ) into GN (Rn ) .
Example 7.54 Another simple linear map from Rd ⊕ Rd onto Rd is the projection ponto the first It then extends to a homomorphism d coordinates. d N d d N R ⊕R into G R . For example, if (x, h) is a Rd ⊕ Rd p from G valued path, p ◦ SN (x, h)0,1 = SN (x)0,1 .
Exercise 7.55 Another simple linear map from Rd to Rd is the map x → λx for a given λ ∈ R. Prove that its homomorphism extension is the re striction of the dilation map δ λ to GN Rd .
Free nilpotent groups
156
Exercise 7.56 Consider for λ = (λi )1≤i≤d ∈ Rd , the map δ λ : 1 + tN Rd → 1 + tN Rd defined by N δ λ 1 + xi 1 ,...,i k ei 1 ⊗ . . . ⊗ ei k k =1 1≤i 1 ,...,i k ≤d
=1+
N
xi 1 ,...,i k λi 1 . . . λi k ei 1 ⊗ . . . ⊗ ei k .
k =1 1≤i 1 ,...,i k ≤d
d d (i) Prove that δ λ is the extension of the linear map i=1 xi ei → i=1 xi λi ei . (ii) Provethat if all the λi are equal to some scalar, then the restriction of δ λ to GN Rd is the dilation map from Exercise 7.55. Exercise 7.57 Show that sup g ∈G N (Rn ),g > 0
Ag / g = |A|op .
Solution. ≤ is clear from Proposition 7.52 and equality is achieved at g = exp (x) where x ∈ Rn , non-zero, is such that |Ax| / |x| = |A|op . Exercise 7.58 Prove that for all (λ, g) ∈ Rd+ × GN Rd , δ λ (g) ≤ (maxi=1,...,d λi ) g and g ≤ (maxi=1,...,d 1/λi ) δ λ (g) .
7.6 The lift of continuous bounded variation paths on Rd 7.6.1 Quantitative bound on SN Recall from Section 7.2.1 that SN maps a continuous path x of finite 1variation with values in Rd to a path {t → SN (x)t ≡ SN (x)0,t } simply by computing all iterated (Riemann–Stieltjes) integrals up to order N . Recall also that SN (x) wasNseen d to take values in the (free, step-N nilpotent) N d R ⊂ T R . We call SN (x) the canonical lift of x to a group G GN Rd -valued path, since14 π 1 SN (x)0,t = x0,t for all t ∈ [0, T ]. As we shall now see, SN (x) is not only of finite length (i.e. 1-variation) with respect to Carnot–Caratheodory metric on GN Rd but has the same length as the Rd -valued path x. 14 π
1
S N (x)t = xt for all t ∈ [0, T ] only holds if x (0) = 0.
7.6 The lift of continuous bounded variation paths
157
Proposition 7.59 Let x ∈ C 1-var [0, T ] , Rd . Then, SN (x)1-var;[0,T ] = |x|1-var;[0,T ] . Proof. From the very definition of the Carnot–Caratheodory norm, ≥ |x (x) SN s,t | for all 0 ≤ s < t ≤ T and thus s,t d (SN (x)s , SN (x)t ) = SN (x)s,t ≥ |xs,t | . Clearly then, SN (x)1-var;[s,t] ≥ |x|1-var;[s,t] for all 0 ≤ s < t ≤ T . Conversely, t |dx| = |x|1-var;[s,t] SN (x)s,t ≤ s
and since (s, t) → |x|1-var;[s,t] is a (super-additive) control function, we immediately obtain that for all 0 ≤ s < t ≤ T , SN (x)1-var;[s,t] ≤ |x|1-var;[s,t] and the proof is finished. Exercise 7.60 The purpose of this exercise is to replace C 1-var -regularity in Proposition 7.59 by W 1,p -regularity with p ∈ (1, ∞). Following Section 1.4.2, the space of all x : [0, T ] → GN Rd with xW 1 , p ;[0,T ] :=
sup
D ⊂[0,T ] i:t ∈D i
xt
i ,t i + 1
p 1/p p−1
|ti+1 − ti |
<∞
is denoted by W 1,p [0, T ] , GN Rd , a subset of C 1-var [0, T ] , GN Rd . Let x ∈ W 1,p [0, T ] , Rd and recall from Section 1.4.1 that its W 1,p -(semi-) norm is given by 1/p T
|x|W 1 , p ;[0,T ] =
p
|x˙ t | dt
.
0
Show that SN (x)W 1 , p ;[0,T ] = |x|W 1 , p ;[0,T ] . [0, T ] , G2 Rd is important as it allows an intrinsic (The case of W definition of the rate function of enhanced Brownian motion viewed as a symmetric diffusion process on G2 Rd .) 1,2
Solution. From the results in Section 1.4.1 we know that x is the indefinite integral of some x˙ ∈ Lp [0, T ] , Rd and that T xt ,t p p p i i+ 1 = |x˙ t | dt. |x|W 1 , p ;[0,T ] = sup p−1 D ⊂[0,T ] i:t ∈D |ti+1 − ti | 0 i
Free nilpotent groups
158
Next, SN (x)W 1 , p ;[0,T ] ≥ |x|W 1 , p ;[0,T ] follows readily from SN (x)s,t ≥ |xs,t |. For the converse inequality, we first observe that t 1−1/p |x˙ u | du ≤ |x|W 1 , p ;[s,t] |t − s| . SN (x)s,t ≤ s
Hence p SN (x)s,t |t − s|
p−1
p
p
p
≤ |x|W 1 , p ;[s,t] =⇒ SN (x)W 1 , p ;[s,t] ≤ |x|W 1 , p ;[s,t] , p
where we used super-additivity of (s, t) → |x|W 1 , p ;[s,t] . We aim now to show that continuous GN Rd -valued paths of bounded 1-var variation d are in one-to-one correspondence with elements of C [0, T ] , R . We first need a simple lemma; recall that o ≡ 1 stands for the unit element in GN Rd . We note that the projection map πd 0,k (cf. k(7.11)) N may be restricted to yield a projection map π 0,k : G R → G Rd , whenever k ≤ N . Lemma 7.61 Let x, y be two elements of Co [0, T ] , GN Rd such that π 0,N −1 (x) = π 0,N −1 (y) . Then, the path h defined by d N R ht := log x−1 t ⊗ yt ∈ g
is such that for some constant C depending only on N, for all s, t ∈ [0, T ] , N
|hs,t | ≤ C (xs,t + ys,t ) . d ⊗N is non-zero so that Proof. Note that only the projection of h to R exp (ht ) commutes with all elements in GN Rd . In particular, ys,t
= = = =
ys−1 ⊗ yt exp (−hs ) ⊗ x−1 s ⊗ xt ⊗ exp (ht ) −1 xs ⊗ xt ⊗ exp (−hs ) ⊗ exp (ht ) xs,t ⊗ exp (hs,t ) .
Using the equivalence of homogenous norms, we obtain |hs,t |
N
≤ c exp (hs,t ) N ≤ c x−1 ⊗ y s,t s,t ≤
N
c (xs,t + ys,t ) .
7.6 The lift of continuous bounded variation paths
159
Theorem 7.62 Let N ≥ 1 and x ∈ Co1-var [0, T ] , Rd . Then x = SN (x) is the unique “lift” ofx in the sense that π 1 (x) = x and such that x ∈ Co1-var [0, T ] , GN Rd . Moreover, SN : Co1-var [0, T ] , Rd → Co1-var [0, T ] , GN Rd is a bijection with inverse π 1 and, for all 0 ≤ s < t ≤ T , x1-var;[s,t] = |x|1-var;[s,t] . Proof. It is obvious that x = SN (x) has the lifting property, i.e. that 1-var d [0, T ] , R , and from Proposition 7.59 on C π 1 ◦SN is the identity map o it enough to we see that x ∈ Co1-var [0, T ] , GN Rd . To see uniqueness is show that SN ◦ π 1 is the identity map on Co1-var [0, T ] , GN Rd . This is trivially true when N = 1. By induction, we now assume the statement is true at level N − 1, for N ≥ 2. Define y = SN ◦ π 1 (x) and the path h by d N R . ht := log x−1 t ⊗ yt ∈ g N
N
From Lemma 7.61, |hs,t | ≤ c (xs,t + ys,t ) ≤ cω (s, t) where ω (s, t) is the super-additive (control) function x1-var;[s,t] + y1-var;[s,t] . In particular, h is of finite N1 -variation. As 1/N < 1 and h0 = 0, this implies that h ≡ 0, i.e. that y = x.
7.6.2 Modulus of continuity for the map SN
Proposition 7.63 Let x1 , x2 ∈ C 1-var [0, T ] , Rd , and ≥ maxi=1,2 i x . Then, for all N ≥ 1, 1-var;[s,t] ∃CN : ∀0 ≤ s < t ≤ T : π N SN x1 s,t − SN x2 s,t ≤ CN N −1 x1 − x2 1-var;[s,t] .
(7.26)
In particular, if ω is a fixed control and ε is a positive real such that for all s, t ∈ [0, T ] , max xi 1-var;[s,t] ≤ ω (s, t) and x1 − x2 1-var;[s,t] ≤ εω (s, t) , i=1,2
we have15 max
sup
k =1,...,N 0≤s< t≤T
|π k (xs,t − ys,t )| k
ω (s, t)
≤ CN ε.
1 5 In the terminology of the forthcoming Definition 8.6, this is equivalent to ρ1 , ω S N x1 , S N x2 ≤ C N ε.
Free nilpotent groups
160
Proof. Obviously, (7.26) holds true with C1 = 1 for N = 1. We proceed by induction, assuming (7.26) holds. Then t π N +1 SN +1 x1 s,t −SN +1 x2 s,t = π N SN x1 s,r −SN x2 s,r ⊗dx1r s t + π N SN x2 s,r ⊗d x1r −x2r . s
From the induction hypothesis, sup π N SN x1 s,r − SN x2 s,r r ∈[s,t]
≤ CN
max xi
i=1,2
N −1 1-var;[s,t]
1 x − x2 , 1-var;[s,t]
hence the first integral on the right-hand side above is estimated by t 1 2 1 π N SN x s,r − SN x s,r ⊗ dxr s
≤ CN
max xi 1-var;[s,t]
i=1,2
N
1 x − x2 . 1-var;[s,t]
N On the other hand, supr ∈[s,t] π N SN x2 s,r ≤ (1/N !) x2 1-var;[s,t] so that we can estimate the second integral on the right-hand side above: t 2 1 1 2 N 2 x 1-var;[s,t] x1 − x2 1-var;[s,t] . x S ⊗ d x π − x N N r r ≤ s,r N! s Combining the two estimates finishes the induction step and thus concludes the proof of the first part of the proposition. The second part is an obvious corollary of the first part. The previous proposition implies in particular that if x1 and x2 are two continuous paths of length bounded by 1, and such that |x1 − x2 |1-var;[0,1] ≤ ε, then for all N ∈ {1, 2, 3, . . . }, max |π k (g1 − g2 )| ≤ CN ε
k =1,...,N
where gi := SN (xi )0,1 ∈ GN Rd , i = 1, 2. We now prove in some sense the converse statement; the basic idea of the proof of the following proposition is illustrated in Figure 7.2. Proposition 7.64 Let g1 , g2 ∈ GN Rd , with g1 , g2 ≤ C1 and max |π k (g1 − g2 )| ≤ ε, k = 1, . . . , N.
k =1,...,N
7.6 The lift of continuous bounded variation paths
161
Then, there exists xi ∈ C 1-var [0, 1] , Rd , such that SN (xi )0,1 = gi , i = 1, 2, and a constant C2 = C2 (C1 , N ), such that max |xi |1-var;[0,1] ≤ C2 ,
i=1,2
|x1 − x2 |1-var;[0,1] ≤ εC2. Proof. First case: Assume that π 1,N −1 (g1 ) = π 1,N −1 (g2 ) = 0. In such a case, we can write gi = exp (i ) = 1 + i ; the hypothesis implies that |i | ≤ c1 , and that 2 = 1 + εm, with |m| ≤ c2 . We write g1 and g2 in the following way: g1 g2
= exp (1 − m) ⊗ exp (m) , = exp (1 − m) ⊗ exp ((1 + ε) m) = exp (1 − m) ⊗ δ (1+ε) 1 / N exp (m) .
Define z : [0, 1] → Rd to be a geodesic associated with the group element exp (1 − m) (observe that the length of z is bounded by a constant independent of ε). Define also y : [0, 1] → Rd to be a geodesic associated with the group element exp (m) (the length of y is bounded by a constant independent of ε). Define x1 : [0, 2] → Rd to be the concatenation of z and y, and x2 : [0, 2] → Rd to be the concatenation of z and 1/N (1 + ε) ! y. Observe that " the 1-variation distance between x1 and x2 is 1/N equal to (1 + ε) − 1 times the length of y, i.e. it is bounded by a con 1/N − 1 ≤ ε. Reparametrizing the paths x1 and x2 to stant times (1 + ε) be from [0, 1] into Rd finishes the first case. General case: We prove the general case by induction. The case N = 1 can be solved using the first case (or more simply with d straight lines). N R we now prove Assuming that the proposition holds for elements in G d N +1 R . To this end, take two arbitrary that it also holds for elements in G d N +1 R with elements g1 , g2 ∈ G max
k =1,...,N +1
|π k (g1 ) − π k (g2 )| < ε.
Set c3 := g1 ∨ g2 and define hi ∈ GN Rd by projection of gi to the first N levels, i.e. so that π 0,N (gi ) = π 0,N (hi ). Obviously hi ≤ c3 for i = 1, 2 and max |π k (h1 ) − π k (h2 )| < ε. k =1,...,N
By the induction hypothesis, there exist two paths z1 , z2 (which we may take to be defined on [0, 1]) of length bounded by c4 , with the length of
Free nilpotent groups
162
4 z 2
10
0 0 5
y
5 x 10 0
Figure 7.2. We illustrate the basic idea of the proof of Proposition 7.64. Two points of G2 R2 , identified with the 3-dimensional Heisenberg group, are given as g1 = (9, 9, 1.5) and g2 = (8, 8.5, 2). The corresponding paths x1 , x2 are the concatenation of straight lines, connecting the origin with (9, 9) and (8, 8.5) respectively, followed by a circle which wipes out the prescribed area, 1.5 and 2 respectively.
z1 − z2 bounded by c4 ε, where c4 = c4 (N, c3 ), and with the property that SN (zi )0,1 = hi , i = 1, 2. We now define −1 ki = SN +1 (zi )0,1 ⊗ gi ∈ GN +1 Rd ; i = 1, 2. Clearly, ki ≤ c4 + c3 =: c5 and π 1,N (ki ) = 0. Also, from Proposition 7.63, we have, for all j ≤ N + 1, −1 −1 z−1 )0,1 − SN +1 (← z−2 )0,1 π j SN +1 (z1 )0,1 − SN +1 (z2 )0,1 = π j SN +1 (← ≤
c6 ε. It is easy to see that, for any a, b ∈ T N +1 Rd , π 0,N (a) = π 0,N (b) =⇒ a−1 ⊗ b = 1 + π N +1 a−1 + π N +1 (b) . −1 +π N +1 (gi ) , Applied in our context this gives ki = 1+π N +1 SN +1 (zi ) and hence, for all j ≤ N + 1, we have |π j (k1 − k2 )| ≤ (1 + c6 ) ε. Therefore, using the first case, there exist two paths y1 , y2 (defined on [0, 1]) of length bounded by c7 , with length of y1 −y2 bounded by c7 ε, and with the property that SN +1 (yi ) = ki ; i = 1, 2. We conclude the proof by observing that the paths xi = zi yi , (re)parametrized to [0, 1], satisfy SN +1 (xi ) = gi , are of length bounded by c8 , and with length of x1 − x2 bounded by c8 ε. The proof is now finished.
7.7 Comments
163
As a first corollary of this powerful lemma, we prove a modulus of continuity for homomorphism on GN Rd that extends the linear map on Rd (see Section 7.5.6). Proposition 7.65 Let A be a linear map from Rn into Rd . Then, for g, h ∈ GN (Rn ) with g and h bounded by 1, we have for some constant C = C (N ) , |Ag − Ah| ≤ C |A|op |g − h| . Proof. Using Lemma 7.64, we take g = SN (x)0,1 and h = SN (y)0,1 , where x and y are some bounded variation paths of length bounded by some constant c1 and such that |x − y|1-var;[0,1] ≤ c2 |g − h| . Then, using Proposition 7.63, |Ag − Ah|
= |SN (Ax)0,1 −SN (Ay)0,1 | ≤ c3 |A (x − y)|1−var;[0,1] ≤
c3 |A|op |(x − y)|1−var;[0,1]
≤
c4 |g − h| .
7.7 Comments Section 7.2: The signature map goes back to the classical work of Chen [28, 29] and it was clear from his work that truncation at step N leads to (a presentation) of the step-N free nilpotent Lie group. This point of view was in particular adopted by Lyons [115, 116]; see Lyons et al. [123] for references. Section 7.3: All this is standard, see Lyons et al. [123] for references. The proof of the Campbell–Baker–Hausdorff formula via differential equations is also well-known and appears, for instance, in Strichartz [157]. If one only cares about the qualitative statement Corollary 7.27, an algebraic proof is possible via Friedrich’s criterion, see Reutenauer [142] for instance. Section 7.4: In our setting, Chow’s theorem plays the role of a converse to the Campbell–Baker–Hausdorff formula. See Baudoin [7], Folland and Stein [54] for related points of view. It can be formulated in sub-Riemannian geometry (Montgomery [131] and the references cited therein), see also Varopouols et al. [174]. Section 7.5: Again, geodesics are essentially a sub-Riemannian concept but the details are simpler in our setting. Equivalence of homogenous
164
Free nilpotent groups
norms, on the other hand, is a group concept and allows for considerable simplifcation when it comes to topological consistency of the Carnot– Caratheodory metric with the original topology. Exercise 7.39 is taken from Strichartz [157]. Section 7.6 contains some “lifting” estimates which correspond, in essence, to the case p = 1 in the forthcoming estimates for the Lyons lift (see Section 9.1). Proposition 7.64 appears to be new; the construction of “almost” geodesic paths associated with a pair of “nearby” group elements will be a key ingredient (cf. the proof of the forthcoming Theorem 10.26) in establishing local Lipschitzness of the Itˆ o–Lyons map.
8 Variation and H¨ older spaces on free groups In the general setting of a (continuous) path with values in a metric space, say x : [0, T ] → E, we defined its p-variation “norm”over [0, T ] , in symbols |x|p-var;[0,T ] . This applies in particular to a E = GN Rd -valued path x (·), where GN Rd is the free step-N nilpotent group discussed at length in the previous chapter. As a constant reminder that the (Carnot–Caratheodory) metric d on GN Rd was derived from ·, the Carnot–Caratheodory norm, we shall then use the notation 1/p p sup d xt i , xt i + 1 (8.1) xp-var;[0,T ] = (t i )⊂[0,T ]
=
sup
i
1/p xt i ,t i + 1 p
(t i )⊂[0,T ]
i
and, thanks to homogeneity of · with respect to dilation, speak of a homogenous p-variation norm. As a special case of Definition 5.1, C p-var [0, T ] , GN Rd = x ∈ C [0, T ] , GN Rd : xp-var;[0,T ] < ∞ and we shall assume p ≥ 1 unless otherwise stated. When E = Rd , (8.1) is precisely the usual p-variation (semi-)norm and (x, y) → |x0 − y0 | + |x − y|p-var;[0,T ] defines a genuine metric, the p-variation metric, on path space. Recalling p that |x − y|p-var;[0,T ] is of the form sup (t i )⊂[0,T ]
xt
i ,t i + 1
p − yt i ,t i + 1
i
(with xs,t ≡ xt − xs ∈ Rd ), a convenient extension to a GN Rd -valued path is to replace |xs,t − ys,t | by d (xs,t , ys,t ), where now d N R . xs,t ≡ x−1 s ⊗ xt ∈ G p p/k Alternatively, we may replace xt i ,t i + 1 − yt i ,t i+ 1 by |π k (xs,t − ys,t )| for all k = 1, . . . , N, using the fact that GN Rd ⊂ T N Rd ; recalling that
166
Variation and H¨ older spaces on free groups
⊗k π k : T N Rd → Rd denotes projection to the kth tensor level. When N = 1, the two notations coincide; for N ≥ 2, they do not. However, both resulting p-variation distances remain “locally uniformly” comparable (and in particular induce the same topology and the same notion of Cauchy sequences) and both are first allows us to discuss the properties useful. The of the space C p-var [0, T ] , GN Rd with often identical arguments as in the Rd case. The second arises naturally in Lipschitz estimates of the Lyons lift and the Itˆ o–Lyons map, discussed in later chapters. Of course, everything said here applies in a H¨ older context. In particular, the homogenous 1/p-H¨ older norm is given by x1/p-H¨o l;[0,T ] =
d (xs , xt )
sup 0≤s< t≤T
1/p
|t − s|
=
xs,t
sup
|t − s|
0≤s< t≤T
1/p
(8.2)
and
C 1/p-H¨o l [0, T ] , GN Rd = x ∈ C [0, T ] , GN Rd : x1/p-H¨o l;[0,T ] < ∞ .
8.1 p-Variation and 1/p-H¨older topology 8.1.1 Homogenous p-variation and H¨ older distances As usual in the discussion of p-variation, we assume p ≥ 1. We then have Definition variation and H¨ older distance) Given 8.1 (homogenous x, y ∈ C [0, T ] , GN Rd we define dp-var;[0,T ] (x, y) :=
sup D
d xt i ,t i + 1 , yt i ,t i + 1
p
1/p
t i ∈D
and d1/p-H¨o l;[0,T ] (x, y) :=
d (xs,t , ys,t )
sup 0≤s< t≤T
1/p
|t − s|
,
where it is understood that d0;[0,T ] (x, y) := d0-H¨o l;[0,T ] (x, y) :=
sup
d (xs,t , ys,t ) .
0≤s< t≤T
With o = 1, the unit element in GN Rd , we note that dp-var;[0,T ] (x, o) = xp-var;[0,T ] , the homogenous p-variation norm of x and similarly in the H¨ older case. It is obvious that (x, y) → dp-var;[0,T ] (x, y) resp. d1/p-H¨o l;[0,T ] (x, y)
(8.3)
8.1 p-Variation and 1/p-H¨ older topology
167
is non-negative, symmetric and one sees (precisely as the case of Rd -valued paths) that the triangle inequality is satisfied. However, when the right hand side above is 0 this only tells us that xt = c ⊗ yt with c = x−1 0 ⊗ y0 . If attention is restricted to paths with fixed starting point, which is the 1/p-H¨o l [0, T ] , GN Rd , then (8.3) case for Cop-var [0, T ] , GN Rd resp. Co defines a genuine metric. Otherwise, it suffices to add the distance of the starting points, in which case (x, y) → d(x0 , y0 ) + dp-var;[0,T ] (x, y) resp. d(x0 , y0 ) + d1/p-H¨o l;[0,T ] (x, y) resp. C 1/p-H¨o l ([0, T ] , gives a genuine metric on C p-var [0, T ] , GN Rd N d G R . Let us observe that for any λ ∈ R, dp-var;[0,T ] (δ λ x, δ λ y) = |λ| dp-var;[0,T ] (x, y), d1/p-H¨o l;[0,T ] (δ λ x, δ λ y) = |λ| d1/p-H¨o l;[0,T ] (x, y), where δ λ denotes dilation by λ on GN Rd , which explains the terminology homogenous p-variation (resp. 1/p-H¨older) distance; we also note that (s, t) → dp-var;[s,t] (x, y)p is a control function. Definition 8.2 (Homogenous p-ω distance) (i) Given a control function ω on [0, T ] and x ∈ C [0, T ] , GN Rd we define1 the homogenous p-ω norm xp-ω ;[0,T ] :=
sup 0≤s< t≤T
d (xs , xt ) 1/p
ω (s, t)
=
sup 0≤s< t≤T
xs,t 1/p
(8.4)
ω (s, t)
and
C p-ω [0, T ] , GN Rd = x ∈ C [0, T ] , GN Rd : xp-ω ;[0,T ] < ∞ .
(ii) Given x, y ∈ C p-ω [0, T ] , GN Rd and a control function ω on [0, T ], we define d (xs,t , ys,t ) dp-ω ;[0,T ] (x, y) = sup . 1/p 0≤s< t≤T ω (s, t) 1 The definition of · p -ω would make sense for paths with values in p -ω ;[0 , T ] and C an abstract metric space.
168
Variation and H¨ older spaces on free groups
Remark 8.3 Whenever ω (s, t) = 0 for some s < t, the right-hand side of (8.4) is infinity unless xs,t = 0, in which case we use the convention 0/0 = 0. Equivalently, one can define 1/p for all 0 ≤ s < t ≤ T . xp-ω ;[0,T ] = inf M ≥ 0 : xs,t ≤ M ω (s, t)
Similar remarks apply to our definition of dp-ω ;[0,T ] (x, y). It is clear from the definition that
(x, y) → d (x0 , y0 ) + dp-ω ;[0,T ] (x, y) is a metric on C p-ω [0, T ] , GN Rd and, as above, one can omit the term d (x0 , y0 ) if attention is restricted to paths pinned at time 0. Let us also observe, as an elementary consequence of super-additivity of controls, dp-var;[0,T ] (x, y) ≤ ω (0, T )
1/p
dp-ω ;[0,T ] (x, y);
(8.5)
in the special case ω (s, t) = t − s this reads dp-var;[0,T ] (x, y) ≤ T 1/p d1/p-H¨o l;[0,T ] (x, y). Proposition 8.4 For all x1 , x2 ∈ C p-var [0, T ] , GN Rd there exists a control ω with ω (0, T ) = 1 such that i x , ≤ 31/p xi p-var;[0,T ] , i = 1, 2; p-ω ;[0,T ] dp-ω ;[0,T ] (x1 , x2 ) ≤ 31/p dp-var;[0,T ] (x1 , x2 ). Proof. Given x, y ∈ C p-var [0, T ] , GN Rd define the control ω x,y (s, t) :=
dp-var;[s,t] (x, y) dp-var;[0,T ] (x, y)
p
p
(using the convention 0/0 = 0 if necessary). Note ω x,o (s, t) = xp-var;[s,t] / p xp-var;[0,T ] where o denotes the trivial path constant equal to the unit element in GN Rd . Then ω (s, t) :=
1 1 1 ω x 1 ,x 2 (s, t) + ω x 1 ,o (s, t) + ω x 2 ,o (s, t) 3 3 3
has the desired properties. For instance, 1/p
d(x1s,t , x2s,t ) ≤ dp-var;[s,t] (x1 , x2 ) ≤ dp-var;[0,T ] (x1 , x2 )ω x 1 ,x 2 (s, t)
≤(3ω (s,t)) 1 / p
and similar for x1 , o and x2 , o.
8.1 p-Variation and 1/p-H¨ older topology
169
We now show that a map which is uniformly continuous on bounded sets in the metric dp-ω for all possible control functions ω, is also uniformly continuous on bounded sets in the metric dp-var . To this end, for paths x ∈ C [0, T ] , GM Rd we define the (homogenous) balls x : dp-ω ;[0,T ] (x, o) < R , Bp-var (R) = x : dp-var;[0,T ] (x, o) < R . Bp-ω (R) =
Corollary 8.5 Consider a map φ : C p-var [0, T ] , GM Rd → C p-var [0, T ] , GN (Re ) . Assume that for any control function ω, any ε > 0 and R > 0 there exists δ = δ (ε, R; ω) such that x, y ∈ Bp-ω (R) , dp,ω (x, y) < δ =⇒ dp-ω (φ (x) , φ (y)) < ε. Then, for any ε > 0 and R > 0 there exists η = η (ε, R) such that x, y ∈ Bp-var (R) , dp-var (x, y) < η =⇒ dp-var (φ (x) , φ (y)) < ε. In fact, we can choose η = δ (ε, CR; ω) /C with ω defined as in Proposition older on bounded 8.4 and C = 31/p . (This shows that if φ is (locally) γ-H¨ sets in the metric dp-ω , for all possible control functions ω, it is also (locally) γ-H¨ older on bounded sets in the metric dp-var .) Proof. Given ε > 0 and R > 0 and x, y ∈ Bp-var (R) we take the corresponding with the properties as stated in Proposition 8.4. Taking control ω η = δ ε, 31/p R; ω /31/p then shows that dp-var (x, y) < η =⇒ dp-ω (x, y) < δ so that dp-ω (φ (x) , φ (y)) < ε and we conclude with (8.5).
8.1.2 Inhomogenous p-variation and H¨ older distances Our definition of homogenous (variation, H¨ older, ω-modulus) distance was based on measuring the distance of increments xs,t , ys,t ∈ GN Rd using the Carnot–Caratheodory distance. Alternatively, recalling that GN Rd ⊂ T N Rd we can use the (vector space) norm defined on the latter which leads to the distance of increments given by |xs,t − ys,t |T N (Rd ) =
max |π k (xs,t − ys,t )| .
k =1,...,N
Observe that, for N > 1, this distance is not homogenous with respect to dilation on GN Rd .
Variation and H¨ older spaces on free groups
170
Definition 8.6 (inhomogenous older distance) variation and H¨ Given x, y ∈ C [0, T ] , GN Rd we define (i) for k = 1, . . . , N, (k ) ρp-var;[0,T ]
(x, y) =
π k xt
sup (t i )⊂[0,T ]
i ,t i + 1
− yt i ,t i + 1
p/k
k /p
i
and ρp-var;[0,T ] (x, y) =
(k )
max ρp-var;[0,T ] (x, y) ;
k =1,...,N
(ii) for any control function ω on [0, T ], for k = 1, . . . , N, (k )
ρp-ω ;[0,T ] (x, y) =
sup
|π k (xs,t − ys,t )|
0≤s< t≤T
k /p
ω (s, t)
and ρp-ω ;[0,T ] (x, y) =
(k )
max ρp-ω ;[0,T ] (x, y) ;
k =1,...,N
(iii) for k = 1, . . . , N, (k )
ρ1/p-H¨o l;[0,T ] (x, y) =
sup
|π k (xs,t − ys,t )| |t − s|
0≤s< t≤T
k /p
and ρ1/p-H¨o l;[0,T ] (x, y) =
(k )
max ρ1/p-H¨o l;[0,T ] (x, y) .
k =1,...,N
Some remarks are in order. By taking y ≡ o we have a notion of an inhomogenous (variation, H¨ older, ω-modulus) “norm”, but this will play no role in the sequel. Let us also remark that xp-var;[0,T ] = dp-var;[0,T ] (x, o) < ∞ iff ρp-var;[0,T ] (x, o) < ∞ so that C p-var [0, T ] , GN Rd is precisely the set of paths x with finite ρp-var;[0,T ] distance between x and o. The map (x, y) → ρp-var;[0,T ] (x, y) is obviously non-negative, symmetric and one easily sees that the triangle inequality is satisfied. Then, precisely as in the previous section, adding |x0 − y0 |T N (Rd ) gives rise to a genuine metric on C p-var [0, T ] , GN Rd ; if attention is restricted to a path with pinned starting point, ρp-var;[0,T ]
8.1 p-Variation and 1/p-H¨ older topology
171
is already a metric. Of course, all this applies mutatis mutandis in a 1/pH¨older resp. p-ω context and ρ1/p-H¨o l resp. ρp-ω gives rise to metrics on the C 1/p-H¨o l resp. C p-ω spaces. Super-additivity of controls leads easily to 1/p N /p , ρp-var;[0,T ] x1 , x2 ≤ ρp-ω ;[0,T ] x1 , x2 max ω (0, T ) , ω (0, T ) (8.6) which should be compared with (8.5).
Proposition 8.7 For all $x^1, x^2 \in C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ there exists a control $\omega$ with $\omega(0,T) = 1$ and a constant $C = C(p,N)$ such that
\[
\rho_{p\text{-}\omega;[0,T]}(x^i, o) \le C \rho_{p\text{-var};[0,T]}(x^i, o), \quad i = 1, 2;
\qquad
\rho_{p\text{-}\omega;[0,T]}(x^1, x^2) \le C \rho_{p\text{-var};[0,T]}(x^1, x^2).
\]

Proof. Given $x, y \in C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$, we define for convenience, for $k = 1, \dots, N$,
\[
\omega^{(k)}_{x,y}(s,t) = \rho^{(k)}_{p\text{-var};[s,t]}(x,y)^{p/k}.
\]
We then define
\[
\bar\omega^{(k)}(s,t) = \tfrac13\, \omega^{(k)}_{x^1,x^2}(s,t)\big/\omega^{(k)}_{x^1,x^2}(0,T)
 + \tfrac13\, \omega^{(k)}_{x^1,o}(s,t)\big/\omega^{(k)}_{x^1,o}(0,T)
 + \tfrac13\, \omega^{(k)}_{x^2,o}(s,t)\big/\omega^{(k)}_{x^2,o}(0,T),
\]
so that $\bar\omega^{(k)}$ is a control with $\bar\omega^{(k)}(0,T) = 1$, and finally
\[
\omega(s,t) = \frac1N \sum_{k=1}^N \bar\omega^{(k)}(s,t).
\]
To see that this definition of $\omega$ does the job, observe that for all $k = 1, \dots, N$ and all $0 \le s < t \le T$,
\[
\bigl|\pi_k\bigl(x^1_{s,t} - x^2_{s,t}\bigr)\bigr|^{p/k} \le \omega^{(k)}_{x^1,x^2}(s,t)
\le 3\,\bar\omega^{(k)}(s,t) \times \omega^{(k)}_{x^1,x^2}(0,T)
\le 3N\,\omega(s,t) \times \omega^{(k)}_{x^1,x^2}(0,T),
\]
from which we see that
\[
\bigl|\pi_k\bigl(x^1_{s,t} - x^2_{s,t}\bigr)\bigr| \le (3N)^{k/p}\, \omega(s,t)^{k/p} \times \omega^{(k)}_{x^1,x^2}(0,T)^{k/p}
\le (3N)^{k/p}\, \omega(s,t)^{k/p} \times \max_{k=1,\dots,N}\omega^{(k)}_{x^1,x^2}(0,T)^{k/p}
\le (3N)^{N/p}\, \omega(s,t)^{k/p}\, \rho_{p\text{-var};[0,T]}\bigl(x^1, x^2\bigr),
\]
which says precisely that
\[
\rho_{p\text{-}\omega;[0,T]}\bigl(x^1, x^2\bigr) \le (3N)^{N/p}\, \rho_{p\text{-var};[0,T]}\bigl(x^1, x^2\bigr).
\]
The same argument applies to $(x^1, o)$ and $(x^2, o)$ and the proof is finished.

We now show that a map uniformly continuous on bounded sets in the $\rho_{p\text{-}\omega}$ metric, for all $\omega$, is uniformly continuous on bounded sets in the $\rho_{p\text{-var}}$ metric. To this end, for paths $x \in C([0,T], G^M(\mathbb{R}^d))$ we define the (inhomogenous) balls
\[
\beta_{p\text{-}\omega}(R) = \bigl\{x : \rho_{p\text{-}\omega;[0,T]}(x,o) < R\bigr\}, \qquad \beta_{p\text{-var}}(R) = \bigl\{x : \rho_{p\text{-var};[0,T]}(x,o) < R\bigr\}.
\]

Corollary 8.8 Consider a map $\phi : C^{p\text{-var}}([0,T], G^M(\mathbb{R}^d)) \to C^{p\text{-var}}([0,T], G^N(\mathbb{R}^e))$ such that for any control function $\omega$, any $\varepsilon > 0$ and $R > 0$ there exists $\delta = \delta(\varepsilon, R; \omega)$ such that
\[
x, y \in \beta_{p\text{-}\omega}(R), \ \rho_{p\text{-}\omega}(x,y) < \delta \implies \rho_{p\text{-}\omega}(\phi(x), \phi(y)) < \varepsilon.
\]
Then, for any $\varepsilon > 0$ and $R > 0$ there exists $\eta = \eta(\varepsilon, R)$ such that
\[
x, y \in \beta_{p\text{-var}}(R), \ \rho_{p\text{-var}}(x,y) < \eta \implies \rho_{p\text{-var}}(\phi(x), \phi(y)) < \varepsilon.
\]
In fact, we can choose $\eta = \delta(\varepsilon, CR; \omega)/C$ with $\omega$ defined as in Proposition 8.7 and $C = C(N,p)$. (This shows that if $\phi$ is (locally) $\gamma$-Hölder on bounded sets in the $\rho_{p\text{-}\omega}$ metric, for all $\omega$, it is also (locally) $\gamma$-Hölder on bounded sets in the $\rho_{p\text{-var}}$ metric.)

Proof. Obvious from Proposition 8.7.
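For orientation, the simplest instance of this normalization concerns a single path (a special case, recorded here for the reader's convenience): if $x \in C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ with $\|x\|_{p\text{-var};[0,T]} > 0$, then
\[
\omega(s,t) := \frac{\|x\|^p_{p\text{-var};[s,t]}}{\|x\|^p_{p\text{-var};[0,T]}}
\]
is a control with $\omega(0,T) = 1$, and $\|x\|_{p\text{-}\omega;[0,T]} \le \|x\|_{p\text{-var};[0,T]}$ since $d(x_s, x_t) \le \|x\|_{p\text{-var};[s,t]} = \omega(s,t)^{1/p}\|x\|_{p\text{-var};[0,T]}$.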
8.1.3 Homogenous vs inhomogenous distances

Proposition 8.9 Let $\omega$ be a control function on $[0,T]$. For all paths $x, y$ in $C^{p\text{-}\omega}([0,T], G^N(\mathbb{R}^d))$,
\[
d_{p\text{-}\omega}(x,y) \le C \max\Bigl(\rho_{p\text{-}\omega}(x,y),\ \rho_{p\text{-}\omega}(x,y)^{1/N} \max\bigl(1, \|x\|_{p\text{-}\omega}\bigr)^{1-\frac1N}\Bigr) \tag{8.7}
\]
and
\[
\rho_{p\text{-}\omega}(x,y) \le C \max\Bigl(d_{p\text{-}\omega}(x,y)\max\bigl(1, \|x\|_{p\text{-}\omega}\bigr)^{N-1},\ d_{p\text{-}\omega}(x,y)^N\Bigr), \tag{8.8}
\]
where $C = C(N)$. The corresponding Hölder estimates are obtained by taking $\omega(s,t) = t-s$.
Proof. First we see that²
\[
d_{p\text{-}\omega;[0,T]}(x,y) = \sup_{0\le s<t\le T} d\Bigl(\delta_{1/\omega(s,t)^{1/p}}\, x_{s,t},\ \delta_{1/\omega(s,t)^{1/p}}\, y_{s,t}\Bigr),
\qquad
\rho_{p\text{-}\omega;[0,T]}(x,y) = \sup_{0\le s<t\le T} \Bigl|\delta_{1/\omega(s,t)^{1/p}}\, x_{s,t} - \delta_{1/\omega(s,t)^{1/p}}\, y_{s,t}\Bigr|_{T^N(\mathbb{R}^d)},
\]
so that these definitions indeed only differ in how to measure distance in $G^N(\mathbb{R}^d)$. A quantitative comparison of these distances was given in Proposition 7.49, an application of which finishes the proof.

For a concise formulation of the next theorem, let us set
\[
\tilde d_{p\text{-}\omega;[0,T]}(x,y) := d_{p\text{-}\omega;[0,T]}(x,y) + d(x_0, y_0), \qquad
\tilde\rho_{p\text{-}\omega;[0,T]}(x,y) := \rho_{p\text{-}\omega;[0,T]}(x,y) + \rho(x_0, y_0),
\]
and similarly for $d_{1/p\text{-Höl}}, d_{p\text{-var}}$ and $\rho_{1/p\text{-Höl}}, \rho_{p\text{-var}}$, where we have already started to omit $[0,T]$ in the notation when no confusion is possible. We have

Theorem 8.10 Let $\omega$ be an arbitrary control function on $[0,T]$. Each identity map
\[
\mathrm{Id} : \bigl(C^{p\text{-}\omega}([0,T], G^N(\mathbb{R}^d)), \tilde d_{p\text{-}\omega}\bigr) \to \bigl(C^{p\text{-}\omega}([0,T], G^N(\mathbb{R}^d)), \tilde\rho_{p\text{-}\omega}\bigr),
\]
\[
\mathrm{Id} : \bigl(C^{1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d)), \tilde d_{1/p\text{-Höl}}\bigr) \to \bigl(C^{1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d)), \tilde\rho_{1/p\text{-Höl}}\bigr),
\]
\[
\mathrm{Id} : \bigl(C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d)), \tilde d_{p\text{-var}}\bigr) \to \bigl(C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d)), \tilde\rho_{p\text{-var}}\bigr)
\]
is Lipschitz on bounded sets in the $\to$ direction and $1/N$-Hölder on bounded sets in the $\leftarrow$ direction.

Proof. The relevant estimates between "$d(x_0,y_0)$ and $\rho(x_0,y_0)$" follow directly from Proposition 7.49 and we focus on the path-space distances without tilde: the case of the identity map from $C^{p\text{-}\omega}([0,T], G^N(\mathbb{R}^d))$ into itself (equipped with homogenous resp. inhomogenous distance $d_{p\text{-}\omega}$ resp. $\rho_{p\text{-}\omega}$) is covered directly by Proposition 8.9, and the case of $1/p$-Hölder paths is a special case, namely $\omega(s,t) = t-s$. We thus turn to $p$-variation.

$\to$ direction: Let $x^1, x^2 \in B_{p\text{-var}}(R)$, the "homogenous" $p$-variation ball of radius $R$ as defined before Corollary 8.5, and let $\omega$ denote the corresponding control constructed in Proposition 8.4. Then, with constants $c_1, c_2$ which may depend on $N, R, p$, we have
\[
\rho_{p\text{-var}}\bigl(x^1, x^2\bigr) \le \rho_{p\text{-}\omega}\bigl(x^1, x^2\bigr) \ \text{by (8.6)}
\ \le c_1\, d_{p\text{-}\omega}\bigl(x^1, x^2\bigr) \ \text{by (8.8)}
\ \le c_2\, d_{p\text{-var}}\bigl(x^1, x^2\bigr) \ \text{by the very choice of } \omega.
\]

² If $\omega(s,t) = 0$ and $x \in C^{p\text{-}\omega}$ then $x_{s,t} = o$, the unit element in $G^N(\mathbb{R}^d)$; we agree that $\delta_{1/0}\, o = o$.
The $\leftarrow$ direction follows the same logic, but now we rely on (8.5), (8.7) and $\omega$ as constructed in Proposition 8.7.

We finish this section with a simple proposition (it will serve as a technical ingredient in our discussion of RDE smoothness later on).

Proposition 8.11 Let $A$ denote the canonical lift of $A \in L(\mathbb{R}^n, \mathbb{R}^d)$ to $\mathrm{Hom}\bigl(G^N(\mathbb{R}^n), G^N(\mathbb{R}^d)\bigr)$. Then, for fixed $x \in C^{p\text{-var}}([0,T], G^N(\mathbb{R}^n))$, the map
\[
A \in L(\mathbb{R}^n, \mathbb{R}^d) \mapsto Ax := \{t \in [0,T] \mapsto Ax_t\} \in C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))
\]
is continuous.

Proof. We first prove this for $x \in C^{p\text{-}\omega}([0,T], G^N(\mathbb{R}^n))$. Recall that $d_{p\text{-}\omega}$ and the inhomogenous distance $\rho_{p\text{-}\omega}$ are locally Hölder equivalent (and in particular induce the same topology). Consider $A, B \in L(\mathbb{R}^n, \mathbb{R}^d)$ with operator norm bounded by some $M$. Controlling $\rho_{p\text{-}\omega}(Ax, Bx)$ amounts to controlling $(Ax)^i - (Bx)^i$ for $i = 1, \dots, N$. Every $(Ax)^i$ can be written out as a contraction of $A \otimes \dots \otimes A$ ($i$ times) against the $i$-tensor $x^i$. It is then easy to see that
\[
\bigl|(Ax)^i - (Bx)^i\bigr| \le C_M\, |A-B|\, |x^i|.
\]
This implies that $A \mapsto Ax$ is even Lipschitz when $C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ is equipped with $\rho_{p\text{-}\omega}$. Switching to $d_{p\text{-}\omega}$ we still have continuity, and hence $d_{p\text{-var}}$ continuity.
8.2 Geodesic approximations

Our interest in $G^N(\mathbb{R}^d)$-valued paths comes from the fact (cf. Section 7.2.1) that any continuous path $x$ of finite 1-variation with values in $\mathbb{R}^d$ can be lifted to a path $\{t \mapsto S_N(x)_t \equiv S_N(x)_{0,t}\}$ with values in $G^N(\mathbb{R}^d)$, simply by computing all iterated (Riemann–Stieltjes) integrals up to order $N$. It is natural to ask for some sort of converse: how can an abstract path $x : [0,T] \to G^N(\mathbb{R}^d)$ be approximated by a sequence $(S_N(x^n))$? We have

Proposition 8.12 Let $x \in C_o^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$, $p \ge 1$. Then there exists $(x^n) \subset C^{1\text{-Höl}}([0,T], \mathbb{R}^d)$ such that $d_{\infty;[0,T]}(x, S_N(x^n)) \to 0$ as $n \to \infty$ and
\[
\sup_n \|S_N(x^n)\|_{p\text{-var};[0,T]} \le 3^{1-1/p} \|x\|_{p\text{-var};[0,T]} < \infty.
\]
For $x \in C_o^{1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$ we have
\[
\sup_n \|S_N(x^n)\|_{1/p\text{-Höl};[0,T]} \le 3^{1-1/p} \|x\|_{1/p\text{-Höl};[0,T]} < \infty.
\]
Proof. Given the fact that $G^N(\mathbb{R}^d)$ is a geodesic space under the Carnot–Caratheodory distance, this follows readily from the approximation results in geodesic spaces, Section 5.2.
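The following small numerical sketch (ours; the sample path, mesh and dissections are arbitrary choices) illustrates the statement in the simplest case $N = 1$, where $S_1(x^n) = x^n$ and the geodesic approximation over a dissection is just piecewise-linear interpolation; the $p$-variation of a discrete path is computed exactly by dynamic programming. At level one the $p$-variation of the interpolation is in fact bounded by that of $x$ itself; the factor $3^{1-1/p}$ is the general bound provided by Proposition 8.12.

```python
import numpy as np

def p_var(xs, p):
    # exact p-variation of the discrete path xs (shape (m+1, d)), by dynamic programming
    best = np.zeros(len(xs))
    for j in range(1, len(xs)):
        best[j] = np.max(best[:j] + np.linalg.norm(xs[j] - xs[:j], axis=1) ** p)
    return best[-1] ** (1.0 / p)

T, m, p = 1.0, 512, 2.5
t = np.linspace(0.0, T, m + 1)
x = np.stack([np.cos(8 * np.pi * t) - 1.0, np.sin(8 * np.pi * t)], axis=1)  # a smooth R^2 path, x_0 = 0

bound = 3 ** (1 - 1 / p) * p_var(x, p)
for n in (4, 8, 16, 32):
    idx = np.linspace(0, m, n + 1).astype(int)          # dissection of mesh T/n
    xn = x.copy()
    for k in range(n):                                   # replace x by its chord on each interval
        a, b = idx[k], idx[k + 1]
        lam = ((t[a:b + 1] - t[a]) / (t[b] - t[a]))[:, None]
        xn[a:b + 1] = (1 - lam) * x[a] + lam * x[b]
    d_inf = np.max(np.linalg.norm(x - xn, axis=1))       # sup-distance to the interpolation
    print(n, d_inf, p_var(xn, p), bound)                 # d_inf -> 0, p-variation stays below the bound
```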
8.3 Completeness and non-separability

By Theorem 8.10 we can equip the space $C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ with either homogenous or inhomogenous $p$-variation distance and not only obtain the same topology but also the same "metric" notions of bounded sets or Cauchy sequences. The same holds for Hölder paths, of course, and we have

Theorem 8.13 (i) Let $p \ge 1$. The space $C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ is a complete, non-separable metric space (with respect to either homogenous or inhomogenous $p$-variation distance).
(ii) The space $C^{1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$ is a complete, non-separable metric space (with respect to either homogenous or inhomogenous $1/p$-Hölder distance).

Proof. (i) It suffices to consider the homogenous $p$-variation distance, and more precisely $\tilde d_{p\text{-var}}(x,y) \equiv d_{p\text{-var}}(x,y) + d(x_0, y_0)$ for $x, y \in C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$. The completeness proof follows exactly the arguments used to establish completeness of $C^{p\text{-var}}([0,T], \mathbb{R}^d)$. Then, if $C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ were separable for some $N = 1, 2, \dots$, the same would be true for its projection to $N = 1$; but we know from Theorem 5.25 that $C^{p\text{-var}}([0,T], \mathbb{R}^d)$ is not separable.
(ii) Similar and left to the reader.
8.4 The $d_0/d_\infty$ estimate

Lemma 8.14 Let $g, h \in G^N(\mathbb{R}^d)$. Then there exists $C = C(N,d)$ such that
\[
\bigl\|g^{-1} \otimes h \otimes g\bigr\| \le C \max\bigl(\|h\|,\ \|h\|^{1/N}\|g\|^{1-1/N}\bigr).
\]
Proof. Viewing $g, h$ as elements in $T^N(\mathbb{R}^d)$, and writing $(\cdot)^n \equiv \pi_n(\cdot)$ for projection to the $n$th tensor level, we have for $n = 1, \dots, N$
\[
\bigl(g^{-1} \otimes h \otimes g\bigr)^n = \sum_{\substack{i+j+k=n \\ j>0}} \bigl(g^{-1}\bigr)^i \otimes h^j \otimes g^k \ \in \bigl(\mathbb{R}^d\bigr)^{\otimes n}.
\]
Since every tensor level $(\mathbb{R}^d)^{\otimes n}$ is equipped with Euclidean structure, we easily see that
\[
\bigl|\bigl(g^{-1} \otimes h \otimes g\bigr)^n\bigr| \le \sum_{\substack{i+j+k=n \\ j>0}} \bigl|\bigl(g^{-1}\bigr)^i\bigr|\,\bigl|h^j\bigr|\,\bigl|g^k\bigr|.
\]
Using $|g^k| \le c_1 \|g\|^k$ and similar estimates for $g^{-1}, h$, also recalling $\|g^{-1}\| = \|g\|$, we find
\[
\bigl|\bigl(g^{-1} \otimes h \otimes g\bigr)^n\bigr| \le c_2 \sum_{j=1}^n \|g\|^{n-j}\|h\|^j,
\]
which implies
\[
\bigl|\bigl(g^{-1} \otimes h \otimes g\bigr)^n\bigr|^{1/n} \le c_3 \max_{j=1,\dots,n} \|g\|^{1-j/n}\|h\|^{j/n}
\le c_3 \sup_{1/N \le \theta \le 1} \|g\|^{1-\theta}\|h\|^{\theta}.
\]
By equivalence of homogenous norms,
\[
\bigl\|g^{-1} \otimes h \otimes g\bigr\| \le c_4 \sup_{1/N \le \theta \le 1} \|g\|^{1-\theta}\|h\|^{\theta}.
\]
The proof is now easily finished, since $\theta \mapsto \|g\|^{1-\theta}\|h\|^{\theta}$ is log-linear in $\theta$ and therefore attains its supremum over $[1/N, 1]$ at an endpoint.
Proposition 8.15 ($d_0/d_\infty$ estimate) On the path-space $C_0([0,1], G^N(\mathbb{R}^d))$ the distances $d_\infty$ and $d_0 \equiv d_{0\text{-Höl}}$ are locally $1/N$-Hölder equivalent. More precisely, there exists $C = C(N,d)$ such that
\[
d_\infty(x,y) \le d_0(x,y) \le C \max\Bigl(d_\infty(x,y),\ d_\infty(x,y)^{1/N}\bigl(\|x\|_\infty + \|y\|_\infty\bigr)^{1-1/N}\Bigr).
\]

Proof. Only the second inequality requires a proof. We write $gh$ instead of $g \otimes h$. For any $s < t$ in $[0,1]$,
\[
x_{s,t}^{-1} y_{s,t} = \bigl[x_{s,t}^{-1}\bigl(y_s^{-1}x_s\bigr)x_{s,t}\bigr]\,\bigl[\bigl(y_t^{-1}x_t\bigr)^{-1}\bigl(x_t^{-1}y_t\bigr)\bigl(y_t^{-1}x_t\bigr)\bigr].
\]
By sub-additivity,
\[
\bigl\|x_{s,t}^{-1} y_{s,t}\bigr\| \le \bigl\|x_{s,t}^{-1}\bigl(y_s^{-1}x_s\bigr)x_{s,t}\bigr\| + \bigl\|\bigl(y_t^{-1}x_t\bigr)^{-1}\bigl(x_t^{-1}y_t\bigr)\bigl(y_t^{-1}x_t\bigr)\bigr\|
= \bigl\|v^{-1}\bigl(y_s^{-1}x_s\bigr)v\bigr\| + \bigl\|w^{-1}\bigl(x_t^{-1}y_t\bigr)w\bigr\|
\]
with $v = x_{s,t}$ and $w = y_t^{-1}x_t$. Note that
\[
\bigl\|x_t^{-1}y_t\bigr\| = \bigl\|y_t^{-1}x_t\bigr\| = d(x_t, y_t) \quad\text{and}\quad \|v\|, \|w\| \le \|x\|_\infty + \|y\|_\infty.
\]
The conclusion now follows from Lemma 8.14.
8.5 Interpolation and compactness

This interpolation result will be used extensively.

Lemma 8.16 Assume $p' > p \ge 1$.
(i) Let $x, y$ be two elements of $C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$. Then,
\[
d_{p'\text{-var}}(x,y) \le \bigl(\|x\|_{p\text{-var}} + \|y\|_{p\text{-var}}\bigr)^{p/p'}\, d_0(x,y)^{1-p/p'}. \tag{8.9}
\]
(ii) Let $x, y$ be two elements of $C^{p\text{-}\omega}([0,T], G^N(\mathbb{R}^d))$. Then, for all $p' > p$,
\[
d_{p'\text{-}\omega}(x,y) \le \bigl(\|x\|_{p\text{-}\omega} + \|y\|_{p\text{-}\omega}\bigr)^{p/p'}\, d_0(x,y)^{1-p/p'}. \tag{8.10}
\]
Proof. (i) Consider a dissection $(t_i) \subset [0,T]$. Then (8.9) follows from
\[
d\bigl(x_{t_i,t_{i+1}}, y_{t_i,t_{i+1}}\bigr)^{p'} \le d\bigl(x_{t_i,t_{i+1}}, y_{t_i,t_{i+1}}\bigr)^{p}\, d_{0;[0,T]}(x,y)^{p'-p}
\le \bigl(\|x_{t_i,t_{i+1}}\| + \|y_{t_i,t_{i+1}}\|\bigr)^{p}\, d_{0;[0,T]}(x,y)^{p'-p},
\]
followed by summation over $i$, the elementary inequality $\bigl(\sum |a_i + b_i|^p\bigr)^{1/p} \le \bigl(\sum |a_i|^p\bigr)^{1/p} + \bigl(\sum |b_i|^p\bigr)^{1/p}$, and taking the supremum over all such dissections.
(ii) Left to the reader.

As a consequence of the above interpolation result, Lemma 5.12, and the Arzela–Ascoli theorem, we obtain the following compactness result.

Proposition 8.17 Let $(x^n)$ be a sequence in $C([0,T], G^N(\mathbb{R}^d))$.
(i) Assume $(x^n)_n$ is equicontinuous, bounded and $\sup_n \|x^n\|_{p\text{-var};[0,T]} < \infty$. Then $x^n$ converges (in $p' > p$ variation, along a subsequence) to some $x \in C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$.
(ii) Assume $(x^n)_n$ is bounded and $\sup_n \|x^n\|_{\alpha\text{-Höl};[0,T]} < \infty$. Then $x^n$ converges (in $\alpha' < \alpha$ Hölder topology, along a subsequence) to some $x \in C^{\alpha\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$.

The following corollary will also be useful.
Corollary 8.18 (i) If $(x^n), x$ are in $C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ such that $\sup_n \|x^n\|_{p\text{-var};[0,T]} < \infty$ and $\lim_{n\to\infty} d_{\infty;[0,T]}(x, x^n) = 0$, then for $p' > p$,
\[
\sup_{(s,t)\in\Delta_T} \Bigl|\|x^n\|_{p'\text{-var};[s,t]} - \|x\|_{p'\text{-var};[s,t]}\Bigr| \to 0 \ \text{as } n \to \infty,
\]
where $\Delta_T = \{(s,t) : 0 \le s \le t \le T\}$. Furthermore, $\bigl\{\|x^n\|_{p'\text{-var};[\cdot,\cdot]} : n \in \mathbb{N}\bigr\}$ is equicontinuous in the sense that for every $\varepsilon > 0$ there exists $\delta$ such that $|t-s| < \delta$ implies
\[
\sup_n \|x^n\|_{p'\text{-var};[s,t]} < \varepsilon. \tag{8.11}
\]
(ii) If $(x^n), x$ are in $C^{\alpha\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$ so that $\sup_n \|x^n\|_{\alpha\text{-Höl};[0,T]} < \infty$ and $\lim_{n\to\infty} d_{\infty;[0,T]}(x, x^n) = 0$, then for $\alpha' < \alpha$,
\[
\sup_{(s,t)\in\Delta_T} \Bigl|\|x^n\|_{\alpha'\text{-Höl};[s,t]} - \|x\|_{\alpha'\text{-Höl};[s,t]}\Bigr| \to 0 \ \text{as } n \to \infty
\]
and $\bigl\{\|x^n\|_{\alpha'\text{-Höl};[\cdot,\cdot]} : n \in \mathbb{N}\bigr\}$ is equicontinuous, similar to part (i).
Proof. The argument given in the proof of Corollary 5.29 for the case of $\mathbb{R}^d$-valued paths extends line-by-line to the case of $G^N(\mathbb{R}^d)$-valued paths.
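The interpolation inequality (8.9) proved above is easy to test numerically in the scalar case $N = 1$, where $d(x_{s,t}, y_{s,t}) = |x_{s,t} - y_{s,t}|$. The following sketch is ours (the paths are arbitrary choices); the $p$-variation of a discrete path is again computed by dynamic programming, and the same chain of estimates as in the proof shows that the inequality also holds for all quantities computed on the grid.

```python
import numpy as np

def p_var(xs, p):
    # sup over sub-dissections of the grid of (sum |increments|^p)^(1/p), by dynamic programming
    best = np.zeros(len(xs))
    for j in range(1, len(xs)):
        best[j] = np.max(best[:j] + np.abs(xs[j] - xs[:j]) ** p)
    return best[-1] ** (1.0 / p)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 201)
x = np.cumsum(rng.normal(size=t.size)) * t[1] ** 0.5   # a rough-looking scalar path
y = x + 0.05 * np.sin(20 * np.pi * t)                   # a nearby perturbation
x, y = x - x[0], y - y[0]

p, p_prime = 2.2, 3.0
lhs = p_var(x - y, p_prime)                             # d_{p'-var}(x, y) at level N = 1
d0 = np.max(np.abs((x[None, :] - x[:, None]) - (y[None, :] - y[:, None])))  # sup_{s,t} |x_{s,t}-y_{s,t}|
rhs = (p_var(x, p) + p_var(y, p)) ** (p / p_prime) * d0 ** (1 - p / p_prime)
print(lhs <= rhs + 1e-12, lhs, rhs)                     # inequality (8.9) on the grid
```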
8.6 Closure of lifted smooth paths

We now define $C^{0,p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ resp. $C^{0,1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$ as the closure of step-$N$ lifted smooth paths from $[0,T] \to \mathbb{R}^d$ in $p$-variation resp. $1/p$-Hölder topology. A little care is needed since, by convention, $S_N(x)_0 = 1 \equiv o$, the unit element in $G^N(\mathbb{R}^d)$.

Definition 8.19 (i) We define $C_o^{0,p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ as the set of continuous paths $x : [0,T] \to G^N(\mathbb{R}^d)$ for which there exists a sequence of smooth $\mathbb{R}^d$-valued paths $(x^n)$ such that $d_{p\text{-var}}(x, S_N(x^n)) \to 0$ as $n \to \infty$, and $C^{0,p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ as the set of paths $x$ with $x_{0,\cdot} = x_0^{-1} \otimes x_\cdot \in C_o^{0,p\text{-var}}([0,T], G^N(\mathbb{R}^d))$.
(ii) Similarly, $C_o^{0,1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$ is the set of paths $x$ for which there exists a sequence of smooth $\mathbb{R}^d$-valued paths $(x^n)$ such that $d_{1/p\text{-Höl}}(x, S_N(x^n)) \to 0$ as $n \to \infty$, and $C^{0,1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$ are those paths $x$ with $x_{0,\cdot} \in C_o^{0,1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$.
Obviously, all these spaces are closed subsets of $C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ resp. $C^{1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$, and thus complete. Proposition 7.63 implies a fortiori continuity of $S_N$, as a map from $C_o^{1\text{-var}}([0,T], \mathbb{R}^d)$ to $C_o^{1\text{-var}}([0,T], G^N(\mathbb{R}^d))$. Clearly then
\[
C^{0,1\text{-var}}([0,T], G^N(\mathbb{R}^d)) = S_N\bigl(C^{0,1\text{-var}}([0,T], \mathbb{R}^d)\bigr) \tag{8.12}
\]
and also $C^{0,1\text{-Höl}}([0,T], G^N(\mathbb{R}^d)) = S_N\bigl(C^{1}([0,T], \mathbb{R}^d)\bigr)$. The reader will recall from Proposition 1.32 that $C^{0,1\text{-var}}([0,T], \mathbb{R}^d)$, defined as the 1-variation closure of smooth paths, turned out to be precisely the space of absolutely continuous paths (with respect to the Euclidean metric on $\mathbb{R}^d$); in Exercise 8.20 it is seen that the same is true in the case of $G^N(\mathbb{R}^d)$-valued paths.

Exercise 8.20 Show that $C^{0,1\text{-var}}([0,T], G^N(\mathbb{R}^d))$ is precisely the space of absolutely continuous paths (with respect to the Carnot–Caratheodory metric on $G^N(\mathbb{R}^d)$).

Solution. It suffices to consider $x \in C_o^{0,1\text{-var}}([0,T], G^N(\mathbb{R}^d))$. Any such $x(\cdot)$ is of the form $S_N(x)_\cdot$ where $x$ is an $\mathbb{R}^d$-valued absolutely continuous path. Hence, for every $\varepsilon > 0$ and $s_1 < t_1 \le s_2 < t_2 \le \dots < s_n < t_n$ in $[0,T]$ there exists $\delta$ so that $\sum_i |t_i - s_i| < \delta$ implies $\sum_i |x_{s_i,t_i}| < \varepsilon$ and in fact (cf. Exercise 5.15)
\[
\sum_i |x|_{1\text{-var};[s_i,t_i]} < \varepsilon.
\]
But then, thanks to Proposition 7.59,
\[
\sum_i d\bigl(x_{s_i}, x_{t_i}\bigr) \le \sum_i \|x\|_{1\text{-var};[s_i,t_i]} = \sum_i |x|_{1\text{-var};[s_i,t_i]} < \varepsilon,
\]
where the first norm refers to the lifted path.
Lemma 8.21 We fix $p > 1$.
(i) If $\Omega \subset C^{1\text{-var}}([0,T], G^N(\mathbb{R}^d))$ and if $C^{0,1\text{-var}}([0,T], G^N(\mathbb{R}^d))$ is included in the $d_{1\text{-var}}$-closure of $\Omega$, then the $d_{p\text{-var}}$-closure of $\Omega$ is equal to $C^{0,p\text{-var}}([0,T], G^N(\mathbb{R}^d))$.
(ii) If $\Omega \subset C^{1\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$ and if $C^{0,1\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$ is included in the $d_{1\text{-Höl}}$-closure of $\Omega$, then the $d_{1/p\text{-Höl}}$-closure of $\Omega$ is equal to $C^{0,1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$.

Proof. Same proof as the $N = 1$ case (cf. Lemma 5.30).

We can now extend "Wiener's characterization" from the $\mathbb{R}^d$ setting to the group setting (Theorem 8.22 below). Recall in particular that $x^D$ denotes the geodesic approximation to $x$ based on some dissection $D = (t_i)$ of $[0,T]$. That is, $x^D_{t_i} = x_{t_i}$ for all $i$ and $x^D|_{[t_i,t_{i+1}]}$ is a geodesic connecting $x_{t_i}$ and $x_{t_{i+1}}$, as in Proposition 7.42. The proof of the $\mathbb{R}^d$ case then extends without any changes and we have
Theorem 8.22 (Wiener's characterization) Let $x \in C^{p\text{-var}}([0,T], G^N(\mathbb{R}^d))$, with $p > 1$. The following statements are equivalent:
(i.1) $x \in C^{0,p\text{-var}}([0,T], G^N(\mathbb{R}^d))$;
(i.2a) $\lim_{\delta\to 0} \sup_{D=(t_i), |D|<\delta} \sum_i \|x\|^p_{p\text{-var};[t_i,t_{i+1}]} = 0$;
(i.2b) $\lim_{\delta\to 0} \sup_{D=(t_i), |D|<\delta} \sum_i d\bigl(x_{t_i}, x_{t_{i+1}}\bigr)^p = 0$;
(i.3) $\lim_{|D|\to 0} d_{p\text{-var}}\bigl(x^D, x\bigr) = 0$.
Secondly, let $x \in C^{1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$, with $p > 1$. The following statements are equivalent:
(ii.1) $x \in C^{0,1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$;
(ii.2a) $\lim_{\delta\to 0} \sup_{|t-s|<\delta} \|x\|_{1/p\text{-Höl};[s,t]} = 0$;
(ii.2b) $\lim_{\delta\to 0} \sup_{|t-s|<\delta} d(x_s, x_t)/|t-s|^{1/p} = 0$;
(ii.3) $\lim_{|D|\to 0} d_{1/p\text{-Höl}}\bigl(x^D, x\bigr) = 0$.

Exercise 8.23 Let $p \ge 1$. Show that $C^{0,p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ is precisely the space of paths which are absolutely continuous of order $p$.

Solution. The case $p = 1$ was dealt with in Exercise 8.20. For $p > 1$ we simply combine the result of Exercise 5.15 with Wiener's characterization.

Corollary 8.24 For $p > 1$, we have the following set inclusions:
\[
\bigcup_{1\le q<p} C^{q\text{-var}}\bigl([0,T], G^N(\mathbb{R}^d)\bigr)
\ \subset\ C^{0,p\text{-var}}\bigl([0,T], G^N(\mathbb{R}^d)\bigr)
\ \subset\ C^{p\text{-var}}\bigl([0,T], G^N(\mathbb{R}^d)\bigr)
\ \subset\ \bigcap_{q>p} C^{0,q\text{-var}}\bigl([0,T], G^N(\mathbb{R}^d)\bigr).
\]
Proof. Similar to Corollary 5.33.

Proposition 8.25 Let $p \ge 1$. The spaces $C^{0,p\text{-var}}([0,T], G^N(\mathbb{R}^d))$ and $C^{0,1/p\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$ are Polish with respect to either homogenous or inhomogenous $p$-variation resp. $1/p$-Hölder distance.

Proof. As remarked in Section 8.3, either choice (homogenous, inhomogenous) of $p$-variation distance leads to the same topology and notion of Cauchy sequence. Clearly then, $C^{0,p\text{-var}}$ is complete under either distance. It remains to discuss separability. From Corollary 1.35, there exists a countable space $\Omega$ dense in $C^{0,1\text{-var}}([0,T], \mathbb{R}^d)$; by continuity of $S_N$, $S_N(\Omega)$ is dense in $C^{0,1\text{-var}}([0,T], G^N(\mathbb{R}^d))$. We conclude using Lemma 5.30. Similar arguments for $1/p$-Hölder spaces are left to the reader.
8.7 Comments

Section 8.1 introduces the basic path-space distances for $G^N(\mathbb{R}^d)$-valued paths (homogenous such as $d_{p\text{-var}}$, inhomogenous such as $\rho_{p\text{-var}}$). As will be discussed in detail in the next chapter, if one chooses $N = [p]$ these distances are "rough path" distances. Noting that both $d_{p\text{-var}}$ and $\rho_{p\text{-var}}$ induce the same topology, both notions are useful. Typical rough path continuity statements are locally Lipschitz in the inhomogenous distance (which is also the distance put forward by Lyons; see, e.g., Lyons and Qian [120] and the references cited therein). The homogenous distance, on the other hand, comes in handy when establishing large deviation results via exponentially good approximations (as seen in Lemma 13.40, for instance); not to mention its general convenience, which often allows us to write arguments in the same way as for paths in Euclidean space. The geodesic approximation result of Section 8.2 appeared in Friz and Victoir [63]; here it is derived as a special case of a general approximation result on geodesic spaces. Section 8.3 follows Friz and Victoir [63]. The $d_0/d_\infty$ estimate in Section 8.4 is taken from Friz and Victoir [61]; Sections 8.5 and 8.6 follow Friz and Victoir [63].
9 Geometric rough path spaces

We have studied $G^N(\mathbb{R}^d)$-valued paths of finite $p$-variation, for any $N \in \mathbb{N}$ and $p \ge 1$. If one thinks of $G^N(\mathbb{R}^d)$ as the correct state-space that allows us not only to keep track of the spatial position in $\mathbb{R}^d$ but also of the accumulated area (and higher indefinite iterated integrals up to order $N$), then it should not be surprising that $N$ and $p$ stand in some canonical relation. To wit, if $p = 1$, knowledge of the $\mathbb{R}^d$-valued path allows us to compute iterated integrals by Riemann–Stieltjes theory and there is no need to include area and other iterated integrals in the state-space. The same remark applies, more generally, to $p \in [1,2)$ using iterated Young integration, and in this case we should take $N = [p] = 1$. When $p \ge 2$, this is not possible and knowledge of higher indefinite iterated integrals up to order $N = [p]$ must be a priori information, i.e. assumed to be known.¹ However, we shall establish that integrals of order greater than $[p]$ are still canonically determined. More precisely, we shall see that for $N \ge [p]$ there exists a canonical bijection²
\[
S_N : C_o^{p\text{-var}}\bigl([0,T], G^{[p]}(\mathbb{R}^d)\bigr) \to C_o^{p\text{-var}}\bigl([0,T], G^N(\mathbb{R}^d)\bigr)
\]
such that for all $x \in C_o^{p\text{-var}}([0,T], G^{[p]}(\mathbb{R}^d))$ we have
\[
\|x\|_{p\text{-var};[0,T]} \le \|S_N(x)\|_{p\text{-var};[0,T]} \le C_N \|x\|_{p\text{-var};[0,T]}.
\]
The analogous $1/p$-Hölder estimate also holds, and is a consequence of the $p$-variation estimate. Indeed, by reparametrization, $[0,T]$ may be replaced by $[s,t]$, so that the Hölder statement follows trivially from $\|x\|_{p\text{-var};[s,t]} \le \|x\|_{1/p\text{-Höl}} |t-s|^{1/p}$. This gives a first hint of the importance of these so-called (weak) geometric rough paths, whose regularity ($p$-variation) is in relation to their state-space $G^{[p]}(\mathbb{R}^d)$.

¹ In a typical probabilistic situation (cf. Part III of this book), $N = 2$ or $3$ and the required iterated integrals will be constructed via some stochastic integration procedure.
² Recall that $o$ in $C_o^{p\text{-var}}$ indicates that all paths start at the unit element of $G^{[p]}(\mathbb{R}^d)$.
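Before turning to the construction proper, here is a small computational sketch (ours; the path and the truncation level are arbitrary choices) of the lift in the smooth, here piecewise-linear, case: for a linear segment with increment $v$, the step-$N$ signature is $\exp(v)$ truncated at level $N$, and signatures of concatenated segments multiply (Chen's relation). This is the elementary mechanism behind the higher levels of $S_N(x)$.

```python
# Build the step-N signature of a piecewise-linear R^2-valued path by multiplying
# truncated tensor exponentials of its increments (Chen's relation).
import numpy as np

d, N = 2, 3

def tensor_exp(v, N):
    # exp(v) in the truncated tensor algebra T^N(R^d); level k is v^{(x)k}/k!
    sig = [np.ones(()), v.copy()]
    for k in range(2, N + 1):
        sig.append(np.multiply.outer(sig[-1], v) / k)
    return sig

def tensor_mult(a, b, N):
    # truncated tensor product: (a (x) b)_n = sum_{i+j=n} a_i (x) b_j
    return [np.asarray(sum(np.multiply.outer(a[i], b[n - i]) for i in range(n + 1)))
            for n in range(N + 1)]

pts = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.5], [2.0, 2.0]])   # vertices of the path
sig = tensor_exp(np.zeros(d), N)                                   # identity element of T^N
for v in np.diff(pts, axis=0):
    sig = tensor_mult(sig, tensor_exp(v, N), N)

print("level 1 =", sig[1])                                   # total increment of the path
print("area (antisymmetric level 2) =", 0.5 * (sig[2] - sig[2].T))
```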
9.1 The Lyons-lift map $x \mapsto S_N(x)$

9.1.1 Quantitative bound on $S_N$

We start with two simple technical lemmas.

Lemma 9.1 Let $g_1, g_2 \in T^{N+1}(\mathbb{R}^d)$. Then,
\[
g_1 \otimes g_2 - (g_1 + g_2) = \pi_{0,N}(g_1) \otimes \pi_{0,N}(g_2) - \pi_{0,N}(g_1 + g_2).
\]
In particular, if $h_1, h_2 \in T^{N+1}(\mathbb{R}^d)$ are such that $\pi_{0,N}(g_1) = \pi_{0,N}(h_1)$ and $\pi_{0,N}(g_2) = \pi_{0,N}(h_2)$, we have
\[
g_1 \otimes g_2 - (g_1 + g_2) = h_1 \otimes h_2 - (h_1 + h_2).
\]

Proof. Simple algebra.

Lemma 9.2 Let $x^1, x^2 \in C^{1\text{-var}}([s,u], \mathbb{R}^d)$, for some $N \ge 1$, such that $S_N(x^1)_{s,u} = S_N(x^2)_{s,u}$. Then, if $\ell \ge \max\bigl(\int_s^u |dx^1|, \int_s^u |dx^2|\bigr)$, we have for some constant $C$ depending only on $N$,
\[
\bigl|S_{N+1}(x^1)_{s,u} - S_{N+1}(x^2)_{s,u}\bigr| \le C\,\ell^{N+1}. \tag{9.1}
\]

Proof. By assumption,
\[
S_{N+1}(x^1)_{s,u} - S_{N+1}(x^2)_{s,u} = \pi_{N+1}\bigl(S_{N+1}(x^1)_{s,u} - S_{N+1}(x^2)_{s,u}\bigr),
\]
and (9.1) follows from
\[
\bigl|\pi_{N+1}\bigl(S_{N+1}(x^i)_{s,u}\bigr)\bigr| \le \frac{1}{(N+1)!}\Bigl(\int_s^u |dx^i|\Bigr)^{N+1}, \quad i = 1, 2.
\]

Alternative proof of (9.1), which introduces the useful idea of "reducing two paths to one path": without loss of generality, assume $(s,u) = (0,1)$ and observe that $S_N(x^1)_{0,1} = S_N(x^2)_{0,1}$ implies
\[
S_{N+1}(x^1)_{0,1} - S_{N+1}(x^2)_{0,1} = S_{N+1}(x^1)_{0,1} \otimes \bigl(S_{N+1}(x^2)_{0,1}\bigr)^{-1} - 1.
\]
Define $x = x^1 \sqcup \overleftarrow{x^2}$, i.e. the concatenation of $x^1(\cdot)$ and $x^2(1-\cdot)$, and assume $x$ is (re-)parametrized on $[0,1]$. It follows that
\[
S_{N+1}(x^1)_{0,1} - S_{N+1}(x^2)_{0,1} = S_{N+1}(x)_{0,1} - 1,
\]
with $S_N(x)_{0,1} = 1$ and $|x|_{1\text{-var};[0,1]} = |x^1|_{1\text{-var};[0,1]} + |x^2|_{1\text{-var};[0,1]} \le 2\ell$. But then, from $S_N(x)_{0,1} = 1$,
\[
\bigl|S_{N+1}(x)_{0,1} - 1\bigr| = \bigl|\pi_{N+1}\bigl(S_{N+1}(x)_{0,1}\bigr)\bigr| \le \frac{1}{(N+1)!}(2\ell)^{N+1}
\]
and (9.1) follows.
We are now ready for the crucial quantitative estimate on $\|S_N(x)\|_{p\text{-var}}$ for $N \ge p$.

Proposition 9.3 Let $x \in C^{1\text{-var}}([0,T], \mathbb{R}^d)$. Then, for all $N \ge [p]$, there exists a constant $C$ depending only on $N$ and $p$ (and not depending on the 1-variation norm of $x$, nor on its $p$-variation) such that for all $s < t$ in $[0,T]$,
\[
\|S_N(x)\|_{p\text{-var};[s,t]} \le C\, \|S_{[p]}(x)\|_{p\text{-var};[s,t]}. \tag{9.2}
\]
The constant $C$ can be chosen to be right-continuous with respect to $p$.

Proof. It is enough to show that for all $N \ge [p]$,
\[
\|S_{N+1}(x)\|_{p\text{-var};[s,t]} \le c_1(p)\, \|S_N(x)\|_{p\text{-var};[s,t]}, \tag{9.3}
\]
where $p \mapsto c_1(p)$ is right-continuous. Explicit dependency on $p$ in our constants will be written in this proof.

We define $\mathbf{x} = S_N(x)$, $\mathbf{y} = S_{N+1}(x)$ and $\omega(s,t) = \|\mathbf{x}\|^p_{p\text{-var};[s,t]}$. By Theorem 7.32 we can find, for all $s < t$ in $[0,T]$, a "geodesic" path $x^{s,t} : [s,t] \to \mathbb{R}^d$ associated with $\mathbf{x}_{s,t} \in G^N(\mathbb{R}^d)$, which is the shortest (Lipschitz) path whose step-$N$ signature equals $\mathbf{x}_{s,t}$. Then, define
\[
\Gamma_{s,t} = \mathbf{y}_{s,t} - S_{N+1}\bigl(x^{s,t}\bigr)_{s,t}.
\]
For $s < t < u$, we also define $x^{s,t,u}$ to be the concatenation of $x^{s,t}$ and $x^{t,u}$, and observe that
\[
S_{N+1}\bigl(x^{s,t,u}\bigr)_{s,u} = S_{N+1}\bigl(x^{s,t}\bigr)_{s,t} \otimes S_{N+1}\bigl(x^{t,u}\bigr)_{t,u}.
\]
Then, as $\mathbf{y}_{s,t} \otimes \mathbf{y}_{t,u} = \mathbf{y}_{s,u}$,
\[
\Gamma_{s,u} - (\Gamma_{s,t} + \Gamma_{t,u})
= \bigl[\mathbf{y}_{s,t} \otimes \mathbf{y}_{t,u} - (\mathbf{y}_{s,t} + \mathbf{y}_{t,u})\bigr]
- \bigl[S_{N+1}(x^{s,t})_{s,t} \otimes S_{N+1}(x^{t,u})_{t,u} - \bigl(S_{N+1}(x^{s,t})_{s,t} + S_{N+1}(x^{t,u})_{t,u}\bigr)\bigr]
+ \bigl[S_{N+1}(x^{s,t})_{s,t} \otimes S_{N+1}(x^{t,u})_{t,u} - S_{N+1}(x^{s,u})_{s,u}\bigr].
\]
By construction, for all $k \le N$ and all $s,t \in [0,T]$, $\pi_k(\mathbf{y}_{s,t}) = \pi_k\bigl(S_{N+1}(x^{s,t})_{s,t}\bigr)$, hence we can apply Lemma 9.1 to see that
\[
\mathbf{y}_{s,t} \otimes \mathbf{y}_{t,u} - (\mathbf{y}_{s,t} + \mathbf{y}_{t,u}) = S_{N+1}(x^{s,t})_{s,t} \otimes S_{N+1}(x^{t,u})_{t,u} - \bigl(S_{N+1}(x^{s,t})_{s,t} + S_{N+1}(x^{t,u})_{t,u}\bigr).
\]
Hence, we are left with
\[
\Gamma_{s,u} - (\Gamma_{s,t} + \Gamma_{t,u}) = S_{N+1}\bigl(x^{s,t,u}\bigr)_{s,u} - S_{N+1}\bigl(x^{s,u}\bigr)_{s,u},
\]
which we bound using Lemma 9.2:
\[
|\Gamma_{s,u} - (\Gamma_{s,t} + \Gamma_{t,u})| \le c_1 \max\Bigl(\int_s^u |dx^{s,t,u}_r|, \int_s^u |dx^{s,u}_r|\Bigr)^{N+1} \le c_2\, \omega(s,u)^{\frac{N+1}{p}}. \tag{9.4}
\]
Secondly, using $\int_s^t |dx^{s,t}| \le \int_s^t |dx|$, Lemma 9.2 gives
\[
|\Gamma_{s,t}| = \bigl|\mathbf{y}_{s,t} - S_{N+1}(x^{s,t})_{s,t}\bigr| \le c_3\, |x|^{N+1}_{1\text{-var};[s,t]} =: \tilde\omega(s,t)^{N+1}.
\]
The last two inequalities allow us to apply the same (analysis) Lemma 6.2 (which we have already used for establishing the Young–Lóeve estimate). We get, for all $0 \le s < t \le T$,
\[
\bigl|\mathbf{y}_{s,t} - S_{N+1}(x^{s,t})_{s,t}\bigr| \le c_4\, C_{N,p}\, \omega(s,t)^{\frac{N+1}{p}},
\]
where $C_{N,p}$ can be taken to be $1/\bigl(1 - 2^{1 - \frac{N+1}{p}}\bigr)$.
≤ c4 CN ,p ω (s, t)
N +1 p
N +1 p
). This implies by the triangle
+ π N +1 SN +1 xs,t s,t
≤ c5 (1 + CN ,p ) ω (s, t)
N +1 p
,
and hence, using the equivalence of homogenous norms, that for all s, t ∈ [0, T ] , 1/p 1/p ω (s, t) . ys,t ≤ c6 1 + (1 + CN ,p ) This means that
1/p yp-var;[s,t] ≤ c6 1 + (1 + CN ,p ) xp-var;[s,t]
for all 0 ≤ s < t ≤ T and the proof is finished.
9.1.2 Definition of the map SN on Cop-var [0, T ] , G[p] Rd
d p-var [p] [0, T ] , G R . A path Definition 9.4 Let N ≥ [p] ≥ 1 and x ∈ C o d p-var N [0, T ] , G R that projects down onto x is said to be a pin Co Lyons lift of x of order N . When p is fixed, we will simply speak of a Lyons lift. We now show that there exists a unique Lyons lift of order N , for all x ∈ Cop-var [0, T ] , G[p] Rd . In the terminology of the forthcoming Definition 9.15 this says precisely that a weak geometric p-rough path admits a unique Lyons lift.
Geometric rough path spaces
186
Theorem 9.5 Let N ≥ [p] ≥ 1 and x ∈ Cop-var [0, T ] , G[p] Rd . Then there exists a unique Lyons lift of order N of x. By writing SN (x) for this path, we define the map SN on Cop-var [0, T ] , G[p] Rd . Moreover, (i) the map SN : Cop-var [0, T ] , G[p] Rd → Cop-var [0, T ] , GN Rd is a bijection with inverse π 0,[p] ; (ii) we have for some constant C = C (N, p), which may be taken rightcontinuous in p, xp-var;[s,t] ≤ SN (x)p-var;[s,t] ≤ C xp-var;[s,t]
(9.5)
for all s < t in [0, T ]. d q -var [q ] [0, T ] , G R for q ≤ p, SN Remark 9.6 Observe that if x ∈ C S[p] (x) = SN (x). This justifies using the same notation for all p, and in particular the same notation for p > 1 and p =1. For further convenience, we will define for N ≤ [p] and for x ∈ C p-var [0,T ] ,G[p] Rd the path SN (x) which is just the projection of x onto GN Rd . In particular, the estimate SN (x)p-var;[s,t] ≤ C xp-var;[s,t] still holds for N ≤ [p] . Proof. First step, existence: Let xn be a sequence in C 1-var [0, T ] , Rd such that sup S[p] (xn )p-var;[0,T ] < ∞ and lim d∞ S[p] (xn ) , x = 0. n →∞
n
By Proposition 9.3, for all s < t in [0, T ] and ε small enough (namely, such that [p + ε] = [p]), SN (xn )s,t ≤ cp+ ε S[p] (xn )(p+ε)-var;[s,t] . By Corollary 8.18 the right-hand side above can be made arbitrarily small, uniformly in n, provided t − s is small enough; this implies readily that SN (xn ) is equicontinuous. Boundedness is clear and so, by Arzela– Ascoli and switching toa subsequence if necessary, we have the existence of a continuous GN Rd -valued path z such that SN (xn ) → z uniformly on [0, T ] as n → ∞. From the very choice of (xn ) it then follows that the projection of z to a G[p] Rd -valued path must be equal to x. Then, for all 0 ≤ s < t ≤ T, zs,t = lim SN (xn )s,t n →∞
≤ ≤
lim SN (xn )(p+ε)-var;[s,t] cp+ε lim S[p] (xn )
n →∞
n →∞
(p+ε)-var;[s,t]
9.1 The Lyons-lift map x → SN (x)
187
where we used (9.2), Proposition 9.3, for the last estimate. On the other hand, from the first part of Corollary 8.18, lim S[p] (xn )(p+ε)-var;[s,t] = x(p+ε)-var;[s,t] n →∞
and hence, for all 0 ≤ s < t ≤ T, zs,t ≤ cp+ ε x(p+ε)-var;[s,t] . Using “right-continuity” of p → cp (Proposition 9.3) and also right-continuity of the homogenous p-variation norm with respect to p (Lemma 5.13) we may send ε → 0 to obtain zs,t ≤ cp xp-var;[s,t] . p
Super-additivity of (s, t) → xp-var;[s,t] implies that zp-var;[s,t] ≤ cp xp-var;[s,t] ; the converse estimate xp-var;[s,t] ≤ zp-var;[s,t] is trivial since we know that z lifts x. In particular, we found a (Lyons) lift of x in Cop-var [0, T ] , GN Rd , which satisfies (9.5). with z ∈ Cop-var [0, T ] , GM +1 Rd Second step, uniqueness: Given z, ˜ π 0,M (z) ≡ π 0,M (˜ z) we show (by induction in M ) that M ≥ [p] implies zt defines a path in gM +1 Rd ∩ z≡˜ z. From Lemma 7.61, ht = log z−1 t ⊗˜ d ⊗(M +1) R , and for all s, t ∈ [0, T ] , M +1
|hs,t | ≤ c1 zs,t
+ c1 ˜ zs,t
p
M +1
.
p
We define the control ω (s, t) = zp-var;[s,t] + ˜ zp-var;[s,t] . The previous inequality now reads (M +1)/p
|hs,t | ≤ c2 ω (s, t)
.
In particular, h is of finite M p+1 -variation. As M p+1 < 1, we deduce that h z, which is what we wanted to is constant equal to h0 = 0, i.e. that z = ˜ show. Third step: It remains to see that, as stated in (i), SN is a bijection with inverse on Cop-var [0, T ] , G[p] N isthe identity map d π 0,[p] . Obviously, π 0,[p] ◦Sp-var R . Conversely, given x ∈ Co [0, T ] , GN Rd it is clear from (9.5) that SN ◦ π 0,[p] (x) has finite p-variation. By uniqueness, we see that SNN p-var [0, T ] , G ◦ π (x) = x so that S ◦ π acts as identity map on C N 0,[p] o d0,[p] R . This completes the proof.
Exercise 9.7 Let N ≥ [p] ≥ 1 and x∈ Cop-var [0, T ] , G[p] Rd . Prove N d such that for all sequences (xn ) ⊂ that there existsdz ∈ Co [0, T ] , G R 1-var [0, T ] , R such that C d∞;[0,T ] S[p] (xn ) , x → 0 and sup S[p] (xn ) p-var;[0,T ] < ∞, n
we have SN (xn ) → z uniformly on [0, T ] as n → ∞. (This exercise shows that we could have defined the Lyons lift as a limit, similar to our definition of the Young integral and the forthcoming definition of the solution to a rough differential equation.)
9.1.3 Modulus of continuity for the map SN We shall now establish that the Lyons-lifting map is locally Lipschitz continuous with respect to inhomogenous rough path distances on pathspace. To this end, we need the following two lemmas. Lemma 9.8 Let N ≥ [p] ≥ 1 and x ∈ Cop-var [s, t] , G[p] Rd , and xs,t ∈
t C 1-var [s, t] , Rd with s |dxs,t | ≤ K xp-var;[s,t] such that SN (xs,t )s,t = SN (xs,t ) . Then, for some constant C depending only on K, N and p, N +1 SN +1 (x)s,t − SN +1 xs,t s,t ≤ c xp-var;[s,t] . Proof. This is fairly obvious. As SN (x)s,t = SN (xs,t )s,t , we have SN +1 (x)s,t − SN +1 xs,t s,t = π N +1 SN +1 (x)s,t − π N +1 SN +1 xs,t s,t ≤ π N +1 SN +1 (x)s,t + π N +1 SN +1 xs,t s,t . Using equivalence of homogenous norms and the quantitative estimates on the Lyons lift obtained in Theorem 9.5, we have SN +1 (x)s,t − SN +1 xs,t s,t N +1 s,t N +1 + SN +1 x s,t ≤ c1 SN +1 (x)s,t N +1 N +1 ≤ c2 xp-var;[s,t] + xs,t 1-var;[s,t] N +1
≤ c3 xp-var;[s,t] . The following result generalizes Lemma 9.2.
Lemma 9.9 Let x1 , x2 , x ˜1 , x ˜2 ∈ C 1-var [s, u] , Rd such that SN x1 s,u = SN x2 s,u , 1 2 SN x ˜ s,u = SN x ˜ s,u , and assume there exist ≥ 0, ε > 0 such that u u u u 1 2 1 2 max |dx | + |dx |, |d˜ x |+ |d˜ x | s s s u s u 1 2 1 2 dxr − d˜ dxr − d˜ xr , xr max s
≤ , ≤
ε.
s
Then, for some constant C depending only on N , 1 2 ˜ s,u −SN +1 x ˜ s,u ≤ CεN +1 . SN +1 x1 s,u −SN +1 x2 s,u − SN +1 x Proof. Working as in the proof of Lemma 9.2, we see that we can assume ˜2 = 0 and (s, u) = (0, 1) . Then, scaling x1 and x ˜1 by 1 , we can x2 = x assume = 1. The lemma then follows from Proposition 7.63. We can now prove local Lipschitzness of the Lyons-lifting map SN . Theorem 9.10 Let x1 , x2 ∈ C p-var [0, T ] , G[p] Rd and ω a control such that for all 0 ≤ s < t ≤ T and i = 1, 2, i p x ≤ ω (s, t) , p-var;[s,t] 1 2 ≤ ε. ρp,ω x , x Then, for all N ≥ [p] , there exists a constant C depending only on N and p such that for all s < t in [0, T ] , ρp,ω SN x1 , SN x2 ≤ Cε. 1 2 Proof. Itis enough to show that for all 1N ≥2 [p] , if x and x are two paths p-var N d [0, T ] , G R with ρp,ω x , x ≤ ε, then, for some constant in C cN , ρp,ω SN +1 x1 , SN +1 x2 ≤ cN ε. Let x1,s,t , x2,s,t ∈ C 1-var [s, t] , Rd such that SN xi,s,t s,t = xis,t ,
and such that
s
d xi,s,t
≤
c1 ω (s, t)
d x1,s,t − x2,s,t
≤
c1 εω (s, t)
t
1/p
,
s
t
1/p
.
This is possible thanks to Proposition 7.64, applied to δ λ x1s,t and δ λ x2s,t 1/p
with λ = 1/ω (s, t) . We define similarly xi,s,u and xi,t,u , and then xi,s,t,u to be the concatenation of xi,s,t and xi,t,u . Observe in particular that u 1/p d x1,s,t,u − x2,s,t,u ≤ 21−1/p c1 εω (s, u) . s
Following the proof of Proposition 9.3, we define for s < t, Γis,t = SN +1 xi,s,t s,t − SN +1 xi s,t , i = 1, 2, and Γs,t = Γ1s,t − Γ2s,t . It is clear from the proof of Proposition 9.3 that i c2 N +1 Γs,t ≤ ω (s, t) p , i = 1, 2, 2 and, by the triangle inequality, N +1 Γs,t ≤ c2 ω (s, t) p .
(9.6)
On the other hand, employing the same logic as in the proof of Proposition 9.3, we see that Γis,u − Γis,t + Γit,u = SN +1 xi,s,t,u s,u − SN +1 xi,s,u s,u . We can therefore use Lemma 9.9 to see that N +1 Γs,u − Γs,t + Γt,u ≤ c3 εω (s, u) p .
(9.7)
Inequalities (9.6) and (9.7) allow us to use the (analysis) Lemma 6.2, and we learn that for all 0 ≤ s < t ≤ T , N +1 Γs,t ≤ c4 εω (s, t) p . From Proposition 7.63, N +1 π N +1 SN +1 x1,s,t s,t − SN +1 x2,s,t s,t ≤ c5 εω (s, t) p . This implies by the triangle inequality that N +1 π N +1 SN +1 x1 s,t − SN +1 x2 s,t ≤ c6 εω (s, t) p , i.e. that ρp,ω SN +1 x1 , SN +1 x2 ≤ ε. From Theorem 8.10 we immediately deduce Corollary 9.11 Let N ≥ [p] . (i) The map SN : Cop-var [0, T ] , G[p] Rd → Cop-var [0, T ] , GN Rd
is uniformly continuous on bounded sets, using the dp-var -metric. (ii) The map SN : Co1/p-H¨o l [0, T ] , G[p] Rd → Co1/p-H¨o l [0, T ] , GN Rd is uniformly continuous on bounded sets, using the d1/p-H¨o l -metric.
9.2 Spaces of geometric rough paths Theorem 9.12 Let p ≥ 1 and N ≥ [p] . Let Ω denote either Cop-var , 1/p-H¨o l 0,1/p-H¨o l or Co . Then Co0,p-var , Co (i) the map x ∈ Ω([0, T ], GN Rd ) → π 0,[p] (x) ∈ Ω([0, T ], G[p] Rd ) is a bijection, with inverse the map SN ; (ii) for p ≥ 2 and d ≥ 2, the map x ∈ Ω([0, T ], GN Rd ) → π 0,[p]−1 (x) ∈ Ω([0, T ], G[p]−1 Rd ) is not a bijection. Remark 9.13 The proof of part (ii) will show that: (case (ii a)) π 0,[p]−1 is not an injection, when p is not an integer. It is proven in [122] that it is a surjection when p is not an integer. (case (ii b1)) π 0,[p]−1 is not an injection when p is an integer, and Ω = p-var C or Ω = C 1/p-H¨o l . (case (ii b2)) π 0,[p]−1 is not a surjection, when p is an integer, and Ω = C 0,p-var or Ω = C 0,1/p-H¨o l ; we leave it as an exercise for the reader to prove that it is not an injection in this case. Proof. (i) The case Ω = C p-var follows from Theorem 9.5 and C 1/p-H¨o l is an obvious corollary from the C p-var case. The case Ω = C 0,p-var (resp. C 0,1/p-H¨o l ) follows from the case Ω = C p-var (resp. C 1/p-H¨o l ) by Wiener’s characterization (Theorem 8.22). (ii) (a) We first assume that Ω = C p-var or C 1/p-H¨o l . Let h be a non-zero ⊗[p] -valued path which is 1/q-H¨older with q = N/p. As in g[p] Rd ∩ Rd the proof of Lemma 7.61, we see that if x ∈ Ω([0, T ], G[p] Rd ) and y is defined by yt = xt ⊗ exp (ht ) , d [p] R ) and π 0,[p−1] (y) = π 0,[p−1] (x) . This means then y ∈ Ω([0, T ], G that π 0,[p]−1 (y) = π 0,[p]−1 (x), and as y = x, π 0,[p]−1 is not an injection from Ω([0, T ], GN Rd ) into Ω([0, T ], G[p−1] Rd ).
(b) We now assume that Ω = C 0,p-var or C 0,1/p-H¨o l . (b1) We deal with the case N > p and again take a non-zero ⊗[p] , h ∈ C 0,1/q -H¨o l [0, T ] , g[p] Rd ∩ Rd with q = N/p. Define the path y as above, yt = xt ⊗ exp (ht ). We have already seen that y ∈ Ω([0, T ], G[p] Rd ). Using Wiener character ization we actually see that y ∈ Ω([0, T ], G[p] Rd ) (it is at this point that we need q > 1, that is N > p). Once again, π 0,[p]−1 (y) = π0,[p]−1 (x), and as y = x, π 0,[p]−1 is not an injection from Ω([0, T ], GN Rd ) into Ω([0, T ], G[p]−1 Rd ). (b2) It only remains to deal with the case Ω = C 0,p-var or C 0,1/p-H¨o l , and N = p (which implies that p ∈ {2, 3, . . . }). We aim to prove in this case that π 0,[p]−1 is not a surjection or, in other words, that there exists a path x ∈ Ω([0, T ], G[p]−1 Rd ) which admits no lift to Ω([0, T ], G[p] Rd ). To this end, assume T = 1 for simplicity of notation, and assume we have a path y ∈ Ω([0, 1], G[p] Rd ) that projects down onto x. Let ω (s, t) = p yp-var;[s,t] , which is finite by assumption. By definition of the increment of y, we have n 24 −1 y in , i +n1 . y0,1 = 2
2
i=0
˜ in , i +n1 = log x in , i +n1 , Define x ˜ in , i +n1 ∈ G[p] Rd by π 0,[p]−1 log x 2 2 2 2 2 2 ˜ in , i +n1 = 0. Observe that x ˜ in , i +n1 are not the increments and π [p] log x 2 2 2 2 ˜ as a map that associates to every of a G[p] Rd -valued path; we view x d i i+1 [p] an element in G R . Hence, as we also dyadic interval of form n , 2n 2 have π 0,[p]−1 y
i 2n
, i2+n1
y0,1 =
=x
n 24 −1
x ˜
i 2n
i 2n
,
, i2+n1
i+ 1 2n
i=0
, a short computation gives
+
n 2 −1
π [p] log y
i 2n
, i2+n1
.
i=0
Then, by equivalence of homogenous norms, there exists c > 0 such that π [p] log y 2 in
,
i+ 1 2n
[p] c y in , i +n1 2 2 i i+1 by definition of ω. ≤ cω , 2n 2n
≤
In particular, we obtain that n −1 24 x ˜ in , i +n1 ≤ ω (0, 1) + |y0,1 | < ∞. 2 2 i=0
Therefore, we proved that a necessary condition for a path x ∈ Ω([0, T ], G[p]−1 Rd ) to admit a lift y ∈ Ω([0, T ], G[p] Rd ) is given by 2 n −1 4 sup x ˜ in , i +n1 < ∞. 2 2 n ≥0 i=0
We therefore aim to provide a path x such that the above expression is infinite. To this end, using d ≥ 2, we let e1 , e2 be the first two vectors in the ⊗i standard basis of Rd . Define v1 = e2 , and vi+1 = [e1 , vi ] so that vi ∈ Rd . 0, p −1 ol p -H¨
0,1/p-H¨o l
For two paths, f ∈ C0 ([0, T ] , R) and g ∈ C0 define x ∈ Ω([0, T ], G[p]−1 Rd by
([0, T ] , R) , we
x (t) = ef t e 1 + g t v [ p −1 ] = ef t e 1 ⊗ eg t v [ p −1 ] ∈ G[p]−1 Rd . We note that x is indeed in Ω = C 0,p-var , from the definition of f, g and 1 xs,t = ef s , t e 1 ⊗ eg s , t v [ p −1 ] ≤ (const) × |fs,t | + |gs,t | p −1 . Defining x ˜ in , i +n1 ∈ G[p] Rd as explained above, the Campbell–Baker– 2 2 Hausdorff formula yields n 24 −1
i=0
x ˜
i 2n
, i2+n1
= exp f0,1 e1 + g0,1 v[p−1] + 6 n 2 −1 In particular, supn i=0 x ˜
n 2 −1
f 2 in g
i 2n
, i2+n1
− g 2 in f
i 2n
, i2+n1
v[p]
.
i=0 i 2n
2 n −1 f 2 in g in sup 2 n ≥0
, i2+n1
, i2+n1
< ∞ if and only if − g 2 in f
i=0
which itself is equivalent to
2 n −1 f 2 in g in sup 2 n ≥0 i=0
i 2n
i + 1 < ∞, , 2n
i + 1 < ∞. , 2n
The following Exercise 9.14 shows that for any p, q ≥ 1 with 1/p + 1/q = 1 there exist paths 0,1/p-H¨o l
0,1/q -H¨o l
([0, T ] , R) , g ∈ C0 ([0, T ] , R) f ∈ C0 n 2 −1 such that supn ≥0 i=0 f 2 in g in , i +n1 = ∞. In particular, we now see that 2 2 there exists a path x ∈ Ω([0, T ], G[p−1] Rd ) which does not admit a lift to Ω([0, T ], G[p] Rd ).
Exercise 9.14 Assume [0, 1] is a continuous increasing 1] → ∞ that−1φ, ϕ−k: [0, ∞ −1 −k 2 and 2 are convergent φ ϕ bijection such that k =1 k =1 series. We then define the functions xφ : t ∈ [0, 1] → y φ : t ∈ [0, 1] →
∞ k =1 ∞
φ−1 2−k sin 2k +1 πt , φ−1 2−k 1 − cos 2k +1 πt .
k =1
(i) Prove that if φ is such that limx→0 xp /φ (x) = 0, then xφ and y φ belong 0,1/p-H¨o l ([0, T ] , R) . to C0 (ii) Prove that n 2 −1
i=0
1 j −1 −j −1 −j 2 ϕ 2 . 2 φ 4 j =1 n
xφi y ϕi 2n
2n
,
i+ 1 2n
≥
0,1/p-H¨o l
(iii) Provide an example of functions f, g ∈ C0 ([0, 1] , R) , for 1/p + 1/q = 1, such that 2 n −1 sup f 2 in g in , i +n1 = ∞. 2 2 n ≥0
0,1/q -H¨o l
([0, 1] , R)×C0
i=0
Solution. (i) First case. We first assume that φ−1 (x) ≤ K |x| for all s, t ∈ [0, 1] , φ x (t) − xφ (s) ∞ ≤ 2−k /p sin 2k +1 πt − sin 2k +1 πs k =1
≤ 2K
1
log 2
t −s
k =1
k =log 2
≤ 2K π |t − s|
1 log 2 t −s
2k (1−1/p) +
k =1 1/p
≤ cK |t − s|
+∞
. Then,
φ−1 2−k
+∞
φ−1 2−k 2k +1 π |t − s| +
1/p
1 t −s
2−k /p
1 k =log 2 t −s
.
1/p . Hence, for all = 0, φ−1 (x) = o |x| −k −1 2 ≤ ε2−k /p . We ε > 0, there exists n ≥ 0 such that k ≥ n implies φ write xφ = xφ,0,n + xφ,n ,∞ , Second case. Now, if limx→0
xp φ(x)
where xφ,i,j = we have
φ−1 2−k sin 2k +1 πt . Clearly, as xφ,0,n is smooth, φ,0,n x (t) − xφ,0,n (s) sup = 0. lim 1/p h→0 s,t,|t−s|≤h |t − s|
j
i+1
Moreover, working as in the first case, we have φ x (t) − xφ (s) ≤ cε |t − s|1/p . Hence, we proved that for all ε > 0, φ x (t) − xφ (s) sup ≤ cε, lim 1/p h→0 s,t,|t−s|≤h |t − s| which concludes the proof of (i). − cos 2k +1 π 2in and sin 2k +1 π 2in are equal to 0, (ii) As cos 2k +1 π i+1 2n we obtain that n 2 −1
i=0
xφi y ϕi 2n
2n
,
i+ 1 2n
=
n
n φ−1 2−k ϕ−1 2−j .Ij,k
j,k =1
where n Ij,k
=
n 2 −1
i=0
% & i+1 i − cos 2j +1 π n . sin 2πi2k −n cos 2j +1 π n 2 2
n n Trigonometric exercises show that Ij,k = 0 if j = k, and that Ij,j ≥ 2j −2 . As φ−1 and ϕ−1 are positive, we therefore obtain that n 2 −1
i=0
1 j −1 −j −1 −j 2 ϕ 2 . 2 φ 4 j =1 n
xφi y ϕi 2n
2n
, i2+n1
≥
(iii) Just take f, g = xφ , xϕ with φ (x) = xp log x1 and ϕ (x) = xq log x1 . We therefore showed that the sets Cop-var ([0, T ], G[p] Rd ), Co0,p-var ([0, T ], 1/p-H¨o l 0,1/p-H¨o l G[p] Rd ), Co ([0, T ], G[p] Rd ) and Co ([0, T ], G[p] Rd ) are quite fundamental! We therefore give their elements names: Definition 9.15 (i) A weak geometric p-rough path is a continuous path of finite p-variation with values in the free nilpotent group of step [p] over Rd , i.e. an element of C p-var ([0, T ], G[p] Rd ). (ii) A geometric p-rough path is a continuous path with values in the free nilpotent group of step [p] over Rd which is in the p-variation closureof the set of bounded variation paths, i.e. an element of C 0,p-var ([0, T ], G[p] Rd ). (iii) A weak geometric 1/p-H¨older rough path is a 1/p-H¨older path with values in the free nilpotent group of step [p] over Rd , i.e. an element of
$C^{1/p\text{-Höl}}([0,T], G^{[p]}(\mathbb{R}^d))$.
(iv) A geometric $1/p$-Hölder rough path is a continuous path with values in the free nilpotent group of step $[p]$ over $\mathbb{R}^d$ which is in the $1/p$-Hölder closure of the set of $1$-Hölder paths, i.e. an element of $C^{0,1/p\text{-Höl}}([0,T], G^{[p]}(\mathbb{R}^d))$.

Recall from the interpolation results of the previous chapter that
\[
C^{0,p\text{-var}}\bigl([0,T], G^{[p]}(\mathbb{R}^d)\bigr) \subset C^{p\text{-var}}\bigl([0,T], G^{[p]}(\mathbb{R}^d)\bigr) \subset C^{0,(p+\varepsilon)\text{-var}}\bigl([0,T], G^{[p]}(\mathbb{R}^d)\bigr),
\]
and for this reason the difference between weak and genuine geometric $p$-rough paths is important only when we care about very precise results.

Exercise 9.16 Identify $G^2(\mathbb{R}^2)$ with the 3-dimensional Heisenberg group $\mathbb{H} \cong \mathbb{R}^3$. Verify that the "pure-area path" $t \mapsto (0,0;t)$ is a weak geometric 2-rough path but not a genuine geometric 2-rough path.

Exercise 9.17 Assume $g \in \mathfrak{g}^n(\mathbb{R}^d) \cap (\mathbb{R}^d)^{\otimes n}$. Show that $t \mapsto \exp(tg) \in G^n(\mathbb{R}^d)$ is a weak Hölder geometric $n$-rough path and compute explicitly $S_N(\exp((\cdot)\,g)) \subset G^N(\mathbb{R}^d)$, $n \le N$.

Solution. We claim that $S_N(\exp_n(\cdot\, g))_{0,t} = \exp_N(tg)$, where we write $\exp_k$ for the exp-map in $T^k(\mathbb{R}^d)$ in order to distinguish $\exp_n$ from $\exp_N$. To see this, note that $d(\exp_N(sg), \exp_N(tg)) = \|\exp_N(-sg) \otimes \exp_N(tg)\|$ and, from the Campbell–Baker–Hausdorff formula, this clearly equals $\|\exp_N((t-s)g)\|$, which is bounded by a constant times $(t-s)^{1/n}|g|^{1/n}$. It follows that $\exp_N(\cdot\, g)$ is $1/n$-Hölder, and by uniqueness of the Lyons lift the claim is proved.
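A hint for Exercise 9.16 (one possible route, using results of the previous chapter; details are left to the reader): in the Heisenberg identification, the increments of the pure-area path are $x_{s,t} = (0,0;\, t-s)$, and by equivalence of homogenous norms $\|x_{s,t}\| \asymp |t-s|^{1/2}$, so that $x \in C^{1/2\text{-Höl}}([0,1], G^2(\mathbb{R}^2))$ is indeed a weak geometric 2-rough path. On the other hand, for every dissection $(t_i)$ of $[0,1]$ one has
\[
\sum_i d\bigl(x_{t_i}, x_{t_{i+1}}\bigr)^2 \asymp \sum_i (t_{i+1} - t_i) = 1,
\]
so criterion (i.2b) of Wiener's characterization (Theorem 8.22) fails and $x \notin C^{0,2\text{-var}}([0,1], G^2(\mathbb{R}^2))$.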
9.3 Invariance under Lipschitz maps The content of this section is not used directly in the sequel and depends on techniques of the forthcoming Section 10.6 on rough integration. The probabilistic motivation here is the fact that Φ ◦ M , the image of a semimartingale M under a C 2 -map Φ, is again a semi-martingale; a manifest consequence of Itˆo’s lemma. Having defined what wemean by a weak geometric p-rough path, say x ∈ C p-var [0, T ] , G[p] Rd , it is natural to ask whether the (yet to be
defined!) image of x under a sufficiently smooth map Φ : Rd → Re is also a weak geometric p-rough path. The following result can then be summarized by saying that the image of a weak geometric p-rough path under a Lipγlo c map, γ > p, is indeed another weak geometric p-rough path. Theorem 9.18 Assume d [p] R ; (i) x ∈ C p-var [0, T ] , G (ii) Φ ∈ Lipγlo c Rd , Re , γ > p. Then there exists a unique continuous (in fact, uniformly continuous on bounded sets) map Φ∗ : C p-var [0, T ] , G[p] Rd → C p-var [0, T ] , G[p] (Re ) with the property that, whenever x = S[p] (x) for some x ∈ C 1-var [0, T ] , Rd then Φ∗ x = S[p] (Φ ◦ x) . Proof. The proof relies on the forthcoming Theorem d e 10.47 in Section 10.6. R , R satisfies the assumpIndeed, ϕ := DΦ = (∂1 Φ, . . . , ∂d Φ) ⊂ Lipγlo−1 c tion of that theorem, so that · Φ∗ x := ϕ (xs ) dxs (rough integral) 0
is a well-defined geometric p-rough path; more precisely, an element of C p-var [0, T ] , G[p] (Re ) and the rough integral Φ∗ x is a (uniformly) continuous (on bounded sets) function of the integrator x in p-variation met“push-forward” behaviour whenever ric. To see that Φ∗ x has the claimed x = S[p] (x) for some x ∈ C 1-var [0, T ] , Rd it suffices to note that, by the fundamental theorem of calculus, t Φ (x)0,t = ϕ (xs ) dxs (classical Riemann–Stieltjes integral). 0
By the basic consistency properties of a rough integral Φ∗ x is the precisely the step-[p] lift of the indefinite Riemann–Stieltjes integral · ϕ (xs ) dxs 0
and so Φ∗ x = S[p] (Φ ◦ x) as was claimed.
9.4 Young pairing of weak geometric rough paths Throughout this section, we fix p > q ≥ 1 such that q −1 +p−1 > 1. Observe that this implies q ∈ [1, 2).
9.4.1 Motivation
Consider a path x ∈ C p-var [0, T ] , Rd . It is natural, e.g. in the context of differential equations with drift term, to consider the “space-time” path t → (x (t) , t), plainly an element of C p-var [0, T ] , Rd+1 . It can also be important to replace x by x + h where h is a suitable perturbation.3 Let us now move to a genuine geometric rough path setting and consider x ∈ C p-var [0, T ] , G[p] Rd . Recall the intuition that x contains the a priori information of up to [p] iterated integrals which are not defined, in general, in Riemann–Stieltjes or Young sense; there is, however, enough regularity built into the definition of such a geometric p-rough path that higher iterated integrals, i.e. beyond level [p], are canonically defined, as was seen in our discussion of the Lyons-lifting map.
Now, even if x = π 1 (x) has not sufficient
regularity
to form the integral x ⊗ dx, one surely can form the integral xdt or x ⊗ dh for sufficiently regular h, e.g. when the last integral is a well-defined Young integral as discussed in Section 6.2. In particular, one would hope that, given a sufficiently regular h : [0, T ] → Rd there is a canonically defined geometric rough path, say S[p] (x, h), with values in G[p] Rd ⊕ Rd which coincides with S[p] (z)
where z = (x, h) : [0, T ] → Rd ⊕ Rd whenever x = S[p] (x) for some nice path x. By the same token, given a sufficiently regular h : [0, T ] → Rd one would hope that there is a canonically defined geometric rough path, say Th (x), with values in G[p] Rd which coincides with S[p] (x + h) whenever x = S[p] (x) for some nice path x. Moreover, such constructions should be robust, for example so that x → Th (x) is continuous in (e.g. p-variation) rough path distance. When p ∈ (2, 3), so that x takes values in G2 Rd , the reader will have no difficulties in deriving such results by making use of the Young–L´ oeve estimates of Section 6.1. The remainder of this chapter is devoted to handling the general case.
9.4.2
The space C (p,q)-var [0, T ] , Rd ⊕ Rd
We recall from Section 7.5.6 that for fixed λ ∈ R, the dilation map δ λ : GN Rd → GN Rd is the unique group homomorphism which extends scalar mutiplication a ∈ Rd → λa ∈ Rd . Similarly, for fixed γ 1 , γ 2 ∈ R, the map (a, b) ∈ Rd ⊕ Rd → (γ 1 a, γ 2 b) ∈ Rd ⊕ Rd lifts to a group homomorphism δ γ 1 ,γ 2 : GN Rd ⊕ Rd → GN Rd ⊕ Rd . (Elements in GN Rd ⊕ Rd arise as the step-N signature of a path in
Rd ⊕ Rd . Scaling the first d (resp. last d ) coordinates of the path by γ 1 3 For
example, when adding a Cameron–Martin path to Brownian motion.
(resp. γ 2 ) gives rise precisely to the δ γ 1 ,γ 2 -dilation of the original signature.) Observe that almost by definition of δ γ 1 ,γ 2 , we have for a path (x, h) ∈ 1-var d 1-var d [0, 1] , R × C C [0, 1] , R , δ γ 1 ,γ 2 SN (x ⊕ h)0,1 = SN (γ 1 x ⊕ γ 2 h)0,1 . is the set of continuous GN Rd Noting that C p-var [0, T ] , GN Rd valued paths x such that for some control function ω (e.g. (s, t) → xp-var;[s,t] ) one has δ 1 < ∞, sup x s,t 1/p ω (s ,t )
s< t in [0,T ]
we are led to the following definition.
Definition 9.19 We say that a continuous path x ∈ C [0, T ] , GN Rd ⊕ Rd is of finite mixed (p, q)-variation, and write x ∈ C (p,q )-var [0, T ] , GN Rd ⊕ Rd if, for some control ω, xp,q -ω ;[0,T ] := sup δ 1 1 / p , 1 1 / q (xs,t ) < ∞. ω (s ,t )
s< t in [0,T ]
ω (s ,t )
(A convention of type 0/0 = 0 is in place, to deal with s < t such that ω (s, t) = 0.) If we can take ω (s, t) = |t − s| , we say that x is p1 , 1q 1 1 . H¨ older and write x ∈ C ( p , q )-H¨o l [0, T ] , GN Rd ⊕ Rd (p,q )-var As usual, we write Co [0, T ] , GN Rd ⊕ Rd , etc. if we only con sider paths started at o, the unit element in GN Rd ⊕ Rd . Definition 9.20 For a pair of controls ω 1 , ω 2 , we also define 1 1 sup (x ) xp,q -ω 1 ,ω 2 ;[0,T ] = δ s,t , ω 1 (s ,t )1/ p , ω 2 (s ,t )1/ q s< t in [0,T ]
, and, for two paths x1 , x2 ∈ C (p,q )-var [0, T ] , GN Rd ⊕ Rd 1 2 ρp,q -ω 1 ,ω 2 ;[0,T ] x , x = 1 2 1 1 1 1 sup δ xs,t − δ xs,t . , , ω 1 (s ,t )1/ p ω 2 (s ,t )1/ q ω 1 (s ,t )1/ p ω 2 (s ,t )1/ q s< t in [0,T ]
Exercise 9.21 Let x ∈ C
(p,q )-var
[0, T ] , G
N
d
R ⊕R d
and assume ω 1
and ω 2 are control functions. Show that, for all s < t in [0, T ] , 1/p 1/q . xs,t ≤ c xp,q -ω 1 ,ω 2 ;[0,T ] ω 1 (s, t) + ω 2 (s, t)
9.4.3 Quantitative bounds on SN
For (x, h) ∈ C 1-var [0, T ] , Rd ⊕ Rd , we now aim to show that for every N ∈ {1, 2, . . . }, SN (x ⊕ h)s,t ≤ (const) × S[p] (x)p-var;[s,t] + |h|q -var;[s,t] . As in earlier chapters, the constant here depends only on the p-variation of x and the q-variation of h, allowing for a subsequent passage to the limit. The argument is similar to proving the Lyons-lift estimate SN (x)s,t ≤ (const) × S[p] (x)p-var;[s,t] , although in the latter the case N ∈ {1, . . . , [p]} is trivial. To start, we need a lemma replacing the use of geodesics. Lemma 9.22 Let N ≥ 1, and (x, h) ∈ C 1-var [s, t], Rd × C 1-var [s, t], Rd such that, for fixed α, β > 0, SN +1 (αx)s,t + SN (αx ⊕ βh)s,t ≤ C1 . Then there exists a path (xs,t , hs,t ) such that (i) SN (x ⊕ h)s,t = SN xs,t ⊕ hs,t s,t SN +1 (x)s,t = SN +1 xs,t s,t ;
(9.8) (9.9)
(ii) there exists a constant CN which depends only on C1 and N such that
t
α
s,t dxu + β
s
t
s,t dhu ≤ CN .
s
Proof. Observe that if SN (x ⊕ h)s,t = SN (xs,t ⊕ hs,t )s,t , then SN (αx ⊕ βh)s,t = SN (αxs,t ⊕ βhs,t )s,t and SN +1 (αx)s,t = SN +1 (αxs,t )s,t . Hence, we can assume without loss of generality that α = β = 1. By definition of the Carnot–Caratheodory homogenous norm, there exist two paths x1,s,t , h1,s,t such that SN (x ⊕ h)s,t = SN x1,s,t ⊕ h1,s,t s,t , with
s
t
1,s,t dxu +
s
t
1,s,t dhu ≤ c1 SN (x ⊕ h)s,t .
−1 Then, define g = SN +1 x1,s,t s,t ⊗ SN +1 (x)s,t , and observe that
g
≤
1,s,t dxu + SN +1 (x)s,t s c3 SN (x ⊕ h)s,t + SN +1 (x)s,t
≤
c4 .
t
≤ c2
Define a path x2,s,t such that SN +1 x2,s,t s,t = g, with
t
2,s,t dxu = g ≤ c4 .
s
We have
and
SN +1 x1,s,t s,t ⊗ SN +1 x2,s,t s,t = SN +1 (x)s,t SN x1,s,t ⊕ h1,s,t ⊗ SN x2,s,t ⊕ 0 = SN (x ⊕ h)s,t .
Therefore, concatenating x1,s,t ⊕ h1,s,t and x2,s,t ⊕ 0 gives us a path that satisfies the required conditions of the lemma. We then need a slight generalization of Lemma 9.2. Lemma 9.23 Let x1 ⊕h1, x2 ⊕h2 be two paths in C 1-var [s, u], Rd ⊕Rd . Assume that (i) SN x1 ⊕ h1 s,u = SN x2 ⊕ h2 s,u , 1 2 (ii) SN +1 x s,u = SN +1 x s,u , u u
u
u (iii) s dx1r + s dx2r ≤ 1 and s dh1r + s dh2r ≤ 2 . Then, there exists a constant C = C (N ) such that N +1 +1−k k N 2 . SN +1 x1 ⊕ h1 s,u − SN +1 x2 ⊕ h2 s,u ≤ C 1 k =1
Proof. Working as in Lemma 9.2, we can assume without loss of generality that x2 ⊕ h2 = 0, and (s, u) = (0, 1) . Define for convenience y = x1 ⊕ h1 . We have, by definition of SN +1 and the triangle inequality SN +1 (y)s,u − 1 ≤
i 1 ,...,i N + 1
i1 iN + 1 dyr 1 . . . dyr N + 1 . s< r 1 < ...< r N + 1 < u
Because SN +1 x1 s,u = 1, we have SN +1 (y)s,u − 1 ≤ i 1 ,...,i N + 1 {i 1 ,...,i N + 1 }∩{d+1,...,d+ d } = 0
i1 iN + 1 dyr 1 . . . dyr N + 1 . s< r 1 < ...< r N + 1 < u
i By (iii), s< r 1 < ...< r N + 1 < u dyri 11 . . . dyrNN ++ 11 is bounded by a constant times +1−k k N 2 , where k is the cardinal of {i1 , . . . , iN +1 } ∩ {d + 1, . . . , d + d }. 1 That concludes the proof. We can now generalize lemma 9.3, to give a quantitative estimate on SN (x ⊕ h)p-var for N ≥ 1. Lemma 9.24 Let (x, h) be a path in C 1-var [0, T ] , Rd ⊕ Rd , and p q ω 1 = S[p] (x)p-var;[.,.] , ω 2 = |h|q -var;[.,.] .
For all N ≥ 1, there exists a constant C = C (N, p, q) (and not depending on the 1-variation norm of x or h) such that (9.10) SN (x ⊕ h)p,q -ω 1 ,ω 2 ;[0,T ] ≤ C. Proof. Define (HN ): for all paths x ⊕ h ∈ C 1-var [0, T ] , Rd ⊕ Rd with S[p] (x) and |h|q -var;[0,T ] bounded above by 1, we have for some p-var;[0,T ] constant cN , for all s < t in [0, T ] , SN (x ⊕ h)p,q -ω 1 ,ω 2 ;[0,T ] ≤ cN . First observe that, for N = 1, we have for s < t ∈ [0, T ] , hs,t xs,t ⊕ ≤ 2, 1/q ω 1 (s, t)1/p ω 2 (s, t) i.e. (H1 ) is satisfied. Then, to prove that (HN ) implies for all x ⊕ h ∈ notice it is enough C 1-var [0, T ] , Rd ⊕ Rd with S[p] (x)p-var;[0,T ] and |h|q -var;[0,T ] bounded above by 1, we have SN (x ⊕ h)0,T ≤ cN +1 . Indeed, for an arbitrary x ⊕ h ∈ C 1-var [0, T ] , Rd ⊕ Rd , applying the x h would imply that , above to S [ p ] (x) p - v a r ; [ 0 , T ] |h|q - v a r ; [ 0 , T ] h x ⊕ SN ≤ cN . S[p] (x) |h| q -var;[0,T ] p-var;[0,T ]
By time change, the above would also hold for all s, t, when replacing [0, T ] by [s, t] . This is precisely saying that SN (x ⊕ h)p,q -ω 1 ,ω 2 ;[0,T ] ≤ C. Let us therefore fix some path x ⊕ h ∈ C 1-var [0, T ] , Rd ⊕ Rd with S[p] (x) and |h|q -var;[0,T ] less than 1. We define the control ω by p-var;[0,T ] ω (s, t) =
1 q S[p] (x)p + |h| q -var;[0,T ] , p-var;[0,T ] 2
which satisfies ω (0, T ) ≤ 1. The induction the hypothesis tells us that h x SN ≤ c1 . ⊕ 1/p 1/q ω (s, t) ω (s, t) s,t x ≤ c2 . If If N + 1 ≤ [p] , the hypothesis tells us that SN +1 ω (s,t) 1 / p s,t x ≤ c2 . N + 1 > [p] , then Theorem 9.5 tells us that S N +1 ω (s,t) 1 / p s,t
s,t
Hence, following Lemma 9.22, we can define the path x that SN (x ⊕ h)s,t = SN xs,t ⊕ hs,t s,t SN +1 (x)s,t = SN +1 xs,t s,t and such that
1 ω (s,t) 1 / p
t s
|dxs,t u |+
⊕ hs,t , such
t 1 |dhs,t u | is bounded ω (s,t) 1 / q s t,u t,u s,u s,u
above by a
constant c2 . Define similarly the paths x , h , x , h , and define xs,t,u (resp. hs,t,u ) to be the concatenation of xs,t and xt,u (resp. hs,t and ht,u ). Then, define Γs,t = SN +1 (x ⊕ h)s,t − SN +1 xs,t ⊕ hs,t s,t . Working as in Proposition 9.3, |Γs,u − (Γs,t + Γt,u )| ≤ SN +1 xs,t,u ⊕ hs,t,u s,u − SN +1 (xs,u ⊕ hs,u )s,u , which we bound using Lemma 9.23: |Γs,u − (Γs,t + Γt,u )| ≤ c3
N +1
ω (s, u)
k =1
≤
c4 ω (s, u)
θ
N + 1 −k p
k
ω (s, u) q
where θ = min1≤k ≤N +1 N +1−k + kq > 1. Using, as in the proof of Prop postion 9.3, that x and h are actually of bounded variation, we obtain using the (analysis) Lemma 6.2 that for all s, t, θ
|Γs,t | ≤ c5 ω (s, t) . We then deduce from the triangle inequality that π N +1 SN +1 (x ⊕ h)0,T ≤ c6 , which concludes the proof.
9.4.4 Definition of Young pairing map We define of (i) pN to be the extension of the projection onto the first d coordinates Rd ⊕ Rd to a homomorphism from GN Rd ⊕ Rd onto GN Rd ; (ii) p N to be the extension of the projection of onto the last d coordinates d d N d d N d R ⊕R onto G R . R ⊕ R to a homomorphism from G
Definition 9.25 Let (x, h) ∈ Cop-var [0, T ] , G[p] Rd ×Coq -var [0, T ] , Rd . such that pN (z) = SN (x) A path z in Cop,q -var [0, T ] , GN Rd ⊕ Rd and p N (z) = SN (h) is said to be a (p, q)-Lyons lift or Young pairing of (x, h) of order N . We shall see in the following theorem that such a Young pairing is unique, and will denote it by SN (x, h) or SN (x ⊕ h). Theorem 9.26 Let (x, h) ∈ Cop-var [0, T ] , G[p] Rd ×Coq -var [0, T ] , Rd , and N ≥ 1. Then there exists a unique (p, q)-Lyons lift of order N of p-var (x, h). By writing SN (x, h) for this path, we define the map SN on Co d [p] q -var d [0, T ] , G R × Co [0, T ] , R . Moreover, if ω 1 = S[p] (x)p-var;[.,.] , ω 2 = |h|q -var;[.,.] , then, for some constant C = C (N, p, q), SN (x, h)p,q -ω 1 ,ω 2 ;[s,t] ≤ C. Proof. The existence of such a lift follows the same lines as the homogenous case, but using Lemma 9.24 rather than Lemma 9.3. Uniqueness follows from the following lemma. Lemma 9.27 Let y and z be two elements of Cop,q -var [0, T ], GN Rd ⊕Rd such that pN (z) = pN (y) and p N (z) = p N (y). Then we have z = y.
Proof. Let (HN ) be the induction hypothesis that the lemma holds true for level-N paths, i.e. as written in the statement. For N = 1, there is nothing to prove, hence (H1 ) is true. Assume now that (HN −1 ) is true, and p q let us prove that (HN ) is true. Define ω 1 = yp-var;[.,.] , ω 2 = zq -var;[.,.] .
We fix e1 , . . . , ed a basis of Rd and ed+1 , . . . , ed+ d a basis of Rd , so that e1 , . . . , ed+ d is a basis of Rd ⊕ Rd . ⊗N ∩ gN Rd ⊕ Rd -valued path f by Define the Rd ⊕ Rd f (t) = log z−1 t ⊗ yt . With f (s, t) = f (t) ≡ f (s) we have exp (f (s, t)) = z−1 s,t ⊗ ys,t ; we may also write f (t) =
fi 1 ,...,i N (t) e1 ⊗ . . . ⊗ ed+ d .
1≤i 1 ,...,i N ≤d+d
Hypothesis $(H_{N-1})$ implies that $f_{i_1,\dots,i_N}=0$ if $\{i_1,\dots,i_N\}\cap\{d+1,\dots,d+d'\}=\emptyset$, and from Lemma 7.61,
$$\big|f_{i_1,\dots,i_N}(s,t)\big| \le c_1\,\omega_1(s,t)^{\frac{N-a}{p}}\,\omega_2(s,t)^{\frac{a}{q}},$$
where $a$ is the cardinality of the set $\{i_1,\dots,i_N\}\cap\{d+1,\dots,d+d'\}$. Set $\omega=\omega_1+\omega_2$, so that
$$\big|f_{i_1,\dots,i_N}(s,t)\big| \le c_1\,\big[2\,\omega(s,t)\big]^{\frac{N-a}{p}+\frac{a}{q}}.$$
But for all i1 , . . . , iN such that the cardinal of {i1 , . . . , iN }∩{d+1, . . . , d+d } is greater than or equal to 1, N p−a + aq > 1, which implies that for the respective component fi 1 ,...,i N is of finite r-variation for some r < 1, i.e. it is equal to fi 1 ,...,i N (0) = 0.
9.4.5 Modulus of continuity for the map SN We go quickly in this section as there are no new ideas required. First, we need to generalize Lemmas 9.22 and 9.23 to handle the “difference of paths”. Lemma 9.28 Let N ≥ 1, and xi , hi i=1,2 be two paths in C 1-var [s, t], Rd × C 1-var [s, t], Rd such that, for fixed α, β > 0, SN +1 (αx)s,t + SN (αx ⊕ βh)s,t ≤ C.
Then there exist paths $x^{i,s,t},h^{i,s,t}$, $i=1,2$, such that
(i) we have $S_N\big(x^i\oplus h^i\big)_{s,t}=S_N\big(x^{i,s,t}\oplus h^{i,s,t}\big)_{s,t}$ and $S_{N+1}\big(x^i\big)_{s,t}=S_{N+1}\big(x^{i,s,t}\big)_{s,t}$;
(ii) there exists a constant $C_N$ depending only on $C$ and $N$ such that
$$\alpha\int_s^t\big|dx^{i,s,t}_u\big| + \beta\int_s^t\big|dh^{i,s,t}_u\big| \le C_N;$$
(iii) with $\varepsilon=\big\|S_{N+1}(\alpha x^1)_{s,t}-S_{N+1}(\alpha x^2)_{s,t}\big\|+\big\|S_N(\alpha x^1\oplus\beta h^1)_{s,t}-S_N(\alpha x^2\oplus\beta h^2)_{s,t}\big\|$, we have
$$\alpha\int_s^t\big|dx^{1,s,t}_u-dx^{2,s,t}_u\big| + \beta\int_s^t\big|dh^{1,s,t}_u-dh^{2,s,t}_u\big| \le C_N\,\varepsilon.$$
Proof. As in the proof of Lemma 9.22,we can assume α = β = 1. Using Proposition 7.64, there exist two paths xi,1,s,t , hi,1,s,t i=1,2 such that SN xi ⊕ hi s,t = SN xi,1,s,t ⊕ hi,1,s,t s,t , with
$$\int_s^t\big|dx^{i,1,s,t}_u\big| + \int_s^t\big|dh^{i,1,s,t}_u\big| \le c_1,$$
$$\int_s^t\big|dx^{1,1,s,t}_u-dx^{2,1,s,t}_u\big| + \int_s^t\big|dh^{1,1,s,t}_u-dh^{2,1,s,t}_u\big| \le c_1\,\big\|S_N\big(x^1\oplus h^1\big)_{s,t}-S_N\big(x^2\oplus h^2\big)_{s,t}\big\|.$$
−1 Then, define g i = SN +1 xi,1,s,t s,t ⊗SN +1 xi s,t . Observe that g i ≤ c2 . Then, from Lemma 9.1 1 g − g 2 = SN +1 x1,1,s,t s,t −SN +1 x2,1,s,t s,t − SN +1 x1 s,t −SN +1 x2 s,t ≤ SN +1 x1 s,t − SN +1 x2 s,t + SN +1 x1,1,s,t s,t − SN +1 x2,1,s,t s,t ≤ c3 ε using Proposition 7.63. Using Proposition 7.64 one more time, we define the paths xi,2,s,t i=1,2 by SN +1 xi,2,s,t s,t = g i ,
with
$$\int_s^t\big|dx^{i,2,s,t}_u\big| = \big\|g^i\big\| \le c_2 \quad\text{and}\quad \int_s^t\big|dx^{1,2,s,t}_u-dx^{2,2,s,t}_u\big| = \big\|g^1-g^2\big\| \le c_3\,\varepsilon.$$
Concatenating the paths xi,1,s,t ⊕ hi,1,s,t and xi,2,s,t ⊕ 0 gives us two paths that satisfy the required conditions of the lemma. We leave the proof of the next lemma, extending Lemma 9.23, to the reader. ˜i ˜i , h be four pairs in C 1-var Lemma 9.29 Let xi , hi i=1,2 , x i=1,2 [s, u] , Rd ⊕ Rd . Assume that ˜1 ˜2 ˜1 ⊕ h ˜2 ⊕ h (i) SN x1 ⊕ h1 s,u =SN x2 ⊕h2 s,u and SN x =SN x ; s,u s,u 1 2 1 2 ˜ s,u = SN +1 x ˜ s,u ; (ii) SN +1 x s,u = SN +1 x s,u and SN +1 x (iii) u u u u 1 2 1 2 dxr + dxr ≤ 1 and dhr + dhr ≤ 2 , s s s s u u u u 1 2 ˜1 ˜2 d˜ d˜ xr + xr ≤ 1 and dhr + dhr ≤ 2 ; s
(iv) $\int_s^u\big|dx^1_r-d\tilde{x}^1_r\big| + \int_s^u\big|dx^2_r-d\tilde{x}^2_r\big| \le \varepsilon_1$ and $\int_s^u\big|dh^1_r-d\tilde{h}^1_r\big| + \int_s^u\big|dh^2_r-d\tilde{h}^2_r\big| \le \varepsilon_2$.
Then, SN +1 x1 s,u − SN +1 x2 s,u − SN +1 x3 s,u − SN +1 x4 s,u ≤ Cε
N +1
+1−k k N 2 . 1
k =1
Similar to the proof of Theorem 9.10 we are also led to
Theorem 9.30 Let 1 ≤ q ≤ p so that 1/p + 1/q > 1. Assume xi , hi i=1,2 are two pairs of elements in Cop-var [0, T ] , G[p] Rd × Coq -var [0, T ] , Rd , and ω a control such that for all s, t ∈ [0, T ] , for i = 1, 2, i p q x + hi q -var;[s,t] ≤ ω (s, t) p-var;[s,t] ≤ ε. ρp-ω x1 , x2 + ρq -ω h1 , h2
Then, for all N ≥ 1, there exists a constant C depending only on N , p, and q such that ρp,q -ω SN x1 , h1 , SN x2 , h2 ≤ Cε. Corollary 9.31 Let ω be a control, 1 ≤ q ≤ q , 1 ≤ p ≤ p , 1/p + 1/q > 1.Then, for fixed R > 0, the maps4 xp-ω ≤ R , dp -ω × |h|q -ω ≤ R , dq -ω , dp -ω → C p-ω [0, T ] , G[p] Rd ⊕ Rd (x, h)
→
and xp-var ≤ R , dp -var
× →
(x, h) → and
xp-var ≤ R , d∞
× →
(x, h) →
SN (x, h)
|h|q -var ≤ R , dq -var , dp -var C p-var [0, T ] , G[p] Rd ⊕ Rd SN (x, h)
|h|q -var ≤ R , d∞ , d∞ C p-var [0, T ] , G[p] Rd ⊕ Rd SN (x, h)
are uniformly continuous. Proof. A consequence of the previous theorem and an interpolation argument. Remark 9.32 As a typical application, we see that the Young pairing (x, h) → SN (x, h) is also continuous in the sense of “uniform convergence with uniform bounds”. Indeed, take any sequence of paths (xn , hn ) ∈ C 1-var [0, T ] , Rd such that < ∞ sup S[p] (xn )p-var;[0,T ] + |hn |q -var;[0,T ] n lim d∞ S[p] (xn ) , x + d∞ S[p] (hn ) , h = 0. n →∞
Then, by Theorem 9.26 and the last part of the previous corollary above it, it follows that sup S[p] (xn ⊕ hn )p-var;[0,T ] < ∞, n lim d∞ S[p] (xn ⊕ hn ) , S[p] (x ⊕ h) = 0. n →∞
$^4$ $d_{q\text{-}\omega}\big(h^1,h^2\big) := \big|h^1-h^2\big|_{q\text{-}\omega}$.
9.4.6 Translation of rough paths
In Section 7.5.6, we defined the map plus from GN Rd ⊕ Rd to GN Rd d to be the unique homomorphism such that d for d all x, y ∈ R , N plus (exp (x ⊕ y)) = exp (x + y) . If x is a G R ⊕ R -valued path, we N d can therefore define d the G R -valued path plus (x) : t ∈ [0, T ] → N plus (xt ) ∈ G R . When x is the weak geometric rough path equal to S[p] (y ⊕ h) , where y is a weak geometric p-rough path and h a weak geometric q-rough path, plus (x) is then a canonical notion of addition of two paths. Theorem 9.33 Let (x, h) ∈ C p-var [0, T ] , G[p] Rd × C q -var [0, T ] , Rd . The translation of x by h, denoted Th (x) ∈ C p-var [0, T ] , G[p] Rd , is defined by Th (x)t = plus S[p] (x ⊕ h)t . (i) We have for some constant C1 depending only on p and q, Th (x)p-var;[0,T ] ≤ C1 xp-var;[0,T ] + |h|q -var;[0,T ] .
(9.11)
(ii) Let xi , hi i=1,2 ∈ C p-var [0, T ] , G[p] Rd × C q -var [0, T ] , Rd , and ω a control. If we have for all s, t ∈ [0, T ], i p q x + hi ≤ ω (s, t) , p-var;[s,t]
q -var;[s,t]
then 1/q −1/p ρp-ω Th 1 x1 , Th 2 x2 ≤ C2 ρp-ω x1 , x2 + ω (0, T ) ρq -ω h1 , h2 for some constant C2 depending only on p and q. i variation and Remark 9.34 If xi =S[p] xi where xi is of bounded i if h i i is also of bounded variation, then Th i (x) = S[p] h + x , i.e. Th i x is just the canonical lift of the sum of the paths xi and hi . i Proof. We first prove the quantitative bound on Th i x p-var;[0,T ] Th i xi s,t
$= \big\|\operatorname{plus}\big(S_{[p]}\big(x^i\oplus h^i\big)_{s,t}\big)\big\| \le c_1\,\big\|S_{[p]}\big(x^i\oplus h^i\big)_{s,t}\big\|.$
From Exercise 9.21, defining $\omega_1=\big\|S_{[p]}(x)\big\|^p_{p\text{-var};[\cdot,\cdot]}$ and $\omega_2=|h|^q_{q\text{-var};[\cdot,\cdot]}$, we have
$$\big\|S_{[p]}\big(x^i\oplus h^i\big)_{s,t}\big\| \le c_2\,\big\|S_{[p]}\big(x^i\oplus h^i\big)\big\|_{p,q\text{-}\omega_1,\omega_2}\Big(\omega_1(s,t)^{1/p}+\omega_2(s,t)^{1/q}\Big).$$
From Theorem 9.26, S[p] xi ⊕ hi p,q -ω ,ω is bounded, which proves 1 2 (9.11). Then, for s, t ∈ [0, T ] , defining εs,t = δ 1 1 / p Th 1 x1 s,t − δ 1 1 / p Th 2 ω (s ,t ) ω (s ,t ) 2 x s,t and using in the third line Proposition 7.65, we have |εs,t | = 1 2 1 2 δ 1 ω ( s , t ) 1 / p plus S[p] x ⊕ h s,t − δ ω ( s , t1) 1 / p plus S[p] x ⊕ h s,t 1 2 1 2 = plus δ 1 1 / p S[p] x ⊕ h s,t − plus δ 1 1 / p S[p] x ⊕ h s,t ω (s ,t ) ω (s ,t ) 1 1 2 2 ≤ c1 δ 1 1 / p S[p] x ⊕ h s,t − δ 1 1 / p S[p] x ⊕ h s,t ω (s ,t ) ω (s ,t ) 1/q −1/p 1 h c1 δ 1 1 / p , 1 1 / q S[p] x1 ⊕ ω (s, t) ω (s ,t ) ω (s ,t ) s,t ≤ . 1/q −1/p 2 2 −δ 1 1 / p , 1 1 / q S[p] x ⊕ ω (s, t) h ω (s ,t )
Using Theorem 9.30, we then obtain
$$|\varepsilon_{s,t}| \le c_2\Big(\rho_{p\text{-}\omega}\big(x^1,x^2\big)+\rho_{q\text{-}\omega}\big(\omega(s,t)^{1/q-1/p}h^1,\ \omega(s,t)^{1/q-1/p}h^2\big)\Big) \le c_2\Big(\rho_{p\text{-}\omega}\big(x^1,x^2\big)+\omega(s,t)^{1/q-1/p}\,\rho_{q\text{-}\omega}\big(h^1,h^2\big)\Big).$$
Hence, as q < p, taking supremum over all s, t ∈ [0, T ] , we have 1/q −1/p ρp-ω Th 1 x1 , Th 2 x2 ≤ c2 ρp-ω x1 , x2 + ω (0, T ) ρq -ω h1 , h2 . As a corollary, interpolation provides us with the following uniform continuity on bounded sets result. Corollary 9.35 The rough path translation (x, h) → Th (x) as a map from Cop-var [0, T ] , G[p] Rd × Coq -var [0, T ] , Rd → Cop-var [0, T ] , G[p] Rd is uniformly continuous on bounded sets, using the dp-var -metric. This is also true as a map from × Co1/p-H¨o l [0, T ] , Rd Co1/p-H¨o l [0, T ] , G[p] Rd → Co1/p-H¨o l [0, T ] , G[p] Rd . Exercise 9.36 Assume T−x n (x) → 0. Show that this is, in general, not equivalent to S2 (xn ) → x and neither implies the other.
Exercise 9.37 This exercise will demonstrate (again!) the power of p-variation estimates in the sense that they immediately imply non-trivial estimates in terms of Hölder and Besov norms. Recall from Example 5.16 that for $\delta\in(1/2,1]$ and $q=1/\delta$,
$$|h|_{q\text{-var};[s,t]} \le (\mathrm{const})\times|h|_{W^{\delta,2};[s,t]}\,|t-s|^{\delta-1/2}.$$
Assume x ∈ C α -H¨o l [0, T ] , G[1/α ] Rd , h ∈ W δ ,2 [0, T ] , Rd with α ∈ (1/4, 1/2) and δ := α + 1/2. Show that Th (x)α -H¨o l;[s,t] ≤ (const) × xα -H¨o l;[s,t] + |h|W δ , 2 ;[s,t] . Explain the restriction α > 1/4. Solution. Set q = 1/δ = 1/ (α + 1/2) and p = 1/α so that h ∈ C q -var and x ∈ C p-var . To apply the above corollary we need 1/p + 1/q > 1 ⇐⇒ α + (α + 1/2) > 1, which explains the restriction α > 1/4. The actual estimate then immediately follows from Th (x)s,t
$$\le c\,\|x\|_{p\text{-var};[s,t]} + c\,|h|_{q\text{-var};[s,t]} \le c\,\|x\|_{\alpha\text{-H\"ol};[s,t]}\,|t-s|^{\alpha} + c\,|h|_{W^{\delta,2};[s,t]}\,|t-s|^{\alpha}.$$
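For instance, taking $\alpha=1/3$ gives $\delta=\alpha+1/2=5/6$, $q=1/\delta=6/5$ and $p=1/\alpha=3$, so that
$$\frac1p+\frac1q=\frac13+\frac56=\frac76>1,$$
whereas $\alpha=1/5$ would give $1/p+1/q=1/5+7/10=9/10<1$; this is exactly the role of the restriction $\alpha>1/4$.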
9.5 Comments The main results of this chapter can be found in Lyons [116], see also Lyons and Qian [120] and Lyons et al. [123], although some of our proofs are new. Let us note that the estimate for the Lyons lift (Theorem 9.5) can be made independent of N , a consequence of Lyons’ “neo-classical inequality”, see Lyons [116] and, for a sharpened version, forthcoming work by Hara and Hino. The necessity to distinguish between geometric rough paths and weak geometric rough paths was recognized in Friz and Victoir [63]. Exercise 9.37 is taken from Friz and Victoir [64].
10 Rough differential equations Our construction of Young’s integral was based on estimates for classical Riemann–Stieltjes integrals with constants depending only on the p- and q-variation of integrand and integrator respectively, followed by a limit argument. The same approach works for ordinary differential equations: in this chapter we establish estimates for ordinary differential equations with constants only depending on a suitable p-variation bound of the driving signal. A limiting procedure then leads us naturally to “rough differential equations”.
10.1 Preliminaries
As was pointed out in Section 7.1, for a fixed starting time $s$, a natural step-$N$ approximation for the solution of the ODE
$$dy = V(y)\,dx = \sum_{i=1}^{d} V_i(y)\,dx^i,\qquad y_s\in\mathbb{R}^e,$$
is given by
$$y_t \approx y_s + \mathcal{E}_{(V)}\big(y, S_N(x)_{s,t}\big) \tag{10.1}$$
where
Definition 10.1 (Euler scheme) Let $N\in\mathbb{N}$. Given $(N-1)$ times continuously differentiable vector fields $V=(V_1,\dots,V_d)$ on $\mathbb{R}^e$, $g\in T^{(N)}(\mathbb{R}^d)$ and $y\in\mathbb{R}^e$ we call
$$\mathcal{E}_{(V)}(y,g) := \sum_{k=1}^{N}\ \sum_{i_1,\dots,i_k\in\{1,\dots,d\}} V_{i_1}\cdots V_{i_k}\,I\,(y)\ g^{k,i_1,\dots,i_k}$$
the (increment of the) step-$N$ Euler scheme.
When g = SN (x)s,t , the (step-N ) signature of a path segment x|[s,t] , we call E(V ) ys , SN (x)s,t the (increment of the) step-N Euler scheme for dy = V (y) dx over the time interval [s, t].
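The sum in Definition 10.1 is straightforward to evaluate numerically. The following sketch (ours, not the text's) computes the step-2 increment; the names euler_step2_increment, V, DV and the linear vector fields in the toy usage are illustrative choices, with $V_iV_jI(y)$ realized as the Jacobian of $V_j$ applied to $V_i(y)$.

```python
import numpy as np

def euler_step2_increment(y, V, DV, g1, g2):
    """Increment E_(V)(y, g) of the step-2 Euler scheme for dy = V(y) dx.

    y  : (e,)    current state
    V  : callable, V(y) -> (e, d) array whose columns are V_1(y), ..., V_d(y)
    DV : callable, DV(y) -> (e, e, d) array, DV(y)[:, :, j] = Jacobian of V_j at y
    g1 : (d,)    level-1 part of g = S_2(x)_{s,t}, i.e. the increment x_{s,t}
    g2 : (d, d)  level-2 part of g, the iterated integrals g^{2,i,j}
    """
    Vy, DVy = V(y), DV(y)
    first = Vy @ g1                                   # sum_i V_i(y) g^{1,i}
    # V_i V_j I (y) = DV_j(y) V_i(y), contracted against g^{2,i,j}
    second = np.einsum('abj,bi,ij->a', DVy, Vy, g2)
    return first + second

# toy usage: two linear vector fields on R^2 and the signature of a line segment
A1, A2 = np.array([[0.0, -1.0], [1.0, 0.0]]), np.eye(2)
V  = lambda y: np.column_stack([A1 @ y, A2 @ y])
DV = lambda y: np.stack([A1, A2], axis=-1)
g1 = np.array([0.1, -0.2])
g2 = 0.5 * np.outer(g1, g1)       # level 2 of S_2 of a straight-line segment
y_new = np.array([1.0, 0.0]) + euler_step2_increment(np.array([1.0, 0.0]), V, DV, g1, g2)
```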
We now prove a simple error estimate for the step-N scheme. To this end, it is convenient to assume Lipschitz regularity of the vector fields in the sense of E. Stein. To prepare for the following definition, given a real γ > 0, we agree that %γ& is the largest integer strictly smaller than γ so that γ = %γ& + {γ} with %γ& ∈ N and {γ} ∈ (0, 1]. Definition 10.2 (Lipschitz map) A map V : E → F between two normed spaces E, F is called γ-Lipschitz (in the sense of E. Stein), in symbols V ∈ Lipγ (E, F ) or simply V ∈ Lipγ (E) if E = F , if V is %γ& times continuously differentiable and such that there exists a constant 0 ≤ M < ∞ such that the supremum norm of its kth derivatives, k = 0, . . . , %γ&, and the {γ}-H¨older norm of its %γ&th derivative are bounded by M . The smallest M satisfying the above conditions is the γ-Lipschitz norm of V , denoted |V |Lip γ . (It should be noted that LipN maps have (N − 1) bounded derivatives, with the (N − 1)th derivative being Lipschitz, but need not be N times continuously differentiable.) This definition applies in particular to a collection of vector fields V = (V1 , . . . , Vd ) on Re , which we can view as a map d e d Re into L R , R , equipped y → {a = a1 , . . . , ad → i=1 V (y) ai } from with operator norm. Saying that V ∈ Lipγ Re , L Rd , Re is equivalent to V1 , . . . , Vd ∈ Lipγ (Re ), but it is usually the γ-Lipschitz norm of V which comes up naturally in estimates. We are now ready to state a first error estimate for the Euler approximation in (10.1). Proposition 10.3 (Euler ODE estimate) Let γ > 1, V = (Vi )1≤i≤d be a collection of vector fields in Lipγ −1 (Re ) and x ∈ C 1-var [s, t] , Rd . Then there exists a constant C = C (γ) such that γ t |dxr | . π (V ) (s, ys ; x)s,t − E(V ) ys , Sγ (x)s,t ≤ C |V |Lip γ −1 s
(10.2) Proof. At the cost of replacing x by |V |Lip γ −1 x and Vi by
1 |V |L i p γ −1
Vi , we
can and will assume that |V |Lip γ −1 = 1. We set n := %γ& and first show that ys,t − E(V ) ys , Sn (x)s,t = [Vi 1 · · · Vi n I (yr 1 ) − Vi 1 · · · Vi n I (ys )] dxir11 · · · dxirnn . i 1 ,...,i n ∈{1,...,d}
s< r 1 < ···< r n < t
To this end, consider a smooth function $f$ and note that for any $k\le n-1$, $V_{i_1}\cdots V_{i_k}f\in C^1$. By iterated use of the change-of-variable formula (cf. Exercise 3.17),
$$f(y_t) = f(y_s) + \sum_{k=1}^{n-1}\ \sum_{i_1,\dots,i_k\in\{1,\dots,d\}} V_{i_1}\cdots V_{i_k}f(y_s)\int_{s<r_1<\cdots<r_k<t} dx^{i_1}_{r_1}\cdots dx^{i_k}_{r_k}$$
$$\qquad + \sum_{i_1,\dots,i_n\in\{1,\dots,d\}}\int_{s<r_1<\cdots<r_n<t} V_{i_1}\cdots V_{i_n}f(y_{r_1})\,dx^{i_1}_{r_1}\cdots dx^{i_n}_{r_n}$$
and the claim follows from specializing to f = I, the identity function. Clearly, t t |ys,t | = V (y) dx ≤ c1 |dxr | . s
s
Lipγ −1 -regularity of the vector fields implies that Vi 1 · · · Vi n I (·) is H¨older continuous with exponent {γ} ≡ γ − n. Hence, for all r ∈ [s, t], t {γ } |dxr | |Vi 1 · · · Vi n I (yr ) − Vi 1 · · · Vi n I (ys )| ≤ c2 s
and after integration, using γ = n + {γ}, i1 in [V · · · V I (y ) − V · · · V I (y )] dx · · · dx i1 in r1 i1 in s r1 rn s< r 1 < ···< r n < t t γ |dxr | . ≤ c3 s
Summation over the indices finishes the estimate. Remark 10.4 The proof also showed that, keeping the notation n = %γ&, π (V ) (0, y0 ; x)0,T − E(V ) y0 , Sγ (x)0,T = [Vi 1 · · · Vi n I (yr 1 ) − Vi 1 · · · Vi n I (y0 )] dxir11 · · · dxirnn i 1 ,...,i k ∈{1,...,d}
=
0< r 1 < ···< r n < T
[Vi 1 · · · Vi n I (yr ) − Vi 1 · · · Vi n I (y0 )]
i 1 ,...,i k ∈{1,...,d}
0< r < T
dxir1
r < r 2 ···< r n < T
dxir22 · · · dxirnn
n ;i , . . . , i n
≡d (x r , T 1
)
(we underlined integration variables here) [V n (yr ) − V n (y0 )] d xnr,T . ≡ 0< r < T
10.2 Davie’s estimate The main result in this section will require a quantitative understanding of (A) the difference of ODE solutions started at the same point, with different driving signals (but with common iterated integrals up to a given order); (B) the difference of ODE solutions started at different points but with identical driving signals. This is the content of the following two lemmas. Lemma 10.5 (Lemma A) Assume that (i) V = (Vi )1≤i≤d is a collection of vector fields in Lipγ −1 (Re ), with γ > 1; (ii) s < u are some elements in [0, T ]; (iii) ys ∈ Re (thought of as a “time-s” initial condition); (iv) x and x ˜ are some paths in C 1-var [s, u] , Rd such that Sγ (x)s,u = x)s,u ; Sγ (˜ u
u x| . (v) ≥ 0 is a bound on |V |Lip γ −1 s |dx| + s |d˜ Then we have, for some constant C = C (γ) , π (V ) (s, ys ; x)s,u − π (V ) (s, ys ; x ˜)s,u ≤ Cγ . Proof. We do not give the most straightforward proof (which would be to insert the Euler approximation of order %γ& and use the triangle inequality), but provide a (still simple) proof that will be more instructive later on. By reparametrization of time, we can assume (s, u) = (0, 1) . Define the concatenation of x ˜ (1 − ·) and x (·), in symbols ← − z := x ˜ x, reparametrized so that z : [0, 1] → Rd . Then, π (V ) (0, y0 ; x)0,1 −π (V ) (0, y0 ; x ˜)0,1 = π (V ) (0, y0 ; x)1 − π (V ) (0, y0 ; x ˜)1 = π (V ) (0, π (V ) (0, y0 ; x ˜)1 ;z)1 −π (V )(0,y0 ;˜ x)1 = π (V ) (0, π (V ) (0, y0 ; x ˜)1 ; z)0,1 . By assumption (iv) and Chen’s theorem, −1
Sγ (z)0,1 = Sγ (˜ x)0,1 ⊗ Sγ (x)0,1 = 1. Hence, E(V ) ·, Sγ (z)0,1 ≡ 0 and the proof is finished with the ODE
Euler estimates from Proposition 10.3:
$$\big|\pi_{(V)}(0,\cdot;z)_{0,1}\big| = \Big|\pi_{(V)}(0,\cdot;z)_{0,1} - \mathcal{E}_{(V)}\big(\cdot,S_\gamma(z)_{0,1}\big)\Big| \le c_1\Big(|V|_{\mathrm{Lip}^{\gamma-1}}\int_0^1|dz|\Big)^{\gamma} = c_1\Big(|V|_{\mathrm{Lip}^{\gamma-1}}\Big(\int_0^1|dx|+\int_0^1|d\tilde{x}|\Big)\Big)^{\gamma} \le c_1\,\ell^{\gamma},$$
where $\ell$ is the bound from assumption (v).
Lemma 10.6 (Lemma B) Assume that (i) V = (Vi )1≤i≤d is a collection of vector fields in Lip1 (Re ); (ii) t < u are some elements in [0, T ]; initial conditions); (iii) yt , y˜t ∈ Re (thought of as “time-t” (iv) x is a path in C 1-var [t, u] , Rd ;
u (v) ≥ 0 is a bound on |V |Lip 1 t |dx|. Then, if π (V ) (t, ·; x) denotes the unique solution to dy = V (y) dx from some time-t initial condition, we have π (V ) (t, yt ; x)t,u − π (V ) (t, y˜t ; x)t,u ≤ |yt − y˜t | . exp () . In particular, the flow associated with dy=V (y) dx is Lipschitz continuous. Proof. This was seen in Theorem 3.8. Equipped with these two simple lemmas, and the technical Lemma 10.59 in Appendix 10.8, we are now ready to provide the crucial p-variation estimate of an ODE solution in terms of the p-variation of the driving signal. Lemma 10.7 (Davie’s lemma) Let γ > p ≥ 1. Assume that (i) V = (Vi )1≤i≤d is a collection of vector fields in Lipγ −1 (Re ); (ii) x is a path in C 1-var [0, T ] , Rd , and x := S[p] (x) is its canonical lift to a G[p] Rd -valued path; (iii) y0 ∈ Re is an initial condition. Then there exists a constant C1 depending on p, γ (and not depending on the 1-variation norm of x) such that for all s < t in [0, T ] , p p π (V )(0, y0 ; x) |V | ≤ C γ −1 xp-var;[s,t] ∨ |V| γ −1 xp-var;[s,t] . 1 Lip Lip p-var;[s,t] (10.3) Moreover, if xs,t ∈ C 1-var [s, t] , Rd is a path such that t s,t s,t dx ≤ K x Sγ x s,t = Sγ (x)s,t and (10.4) p-var;[s,t] s
for some K ≥ 1 then, for any time-s initial condition ys ∈ Re , γ π (V ) (s, ys ; x)s,t − π (V ) s, ys ; xs,t s,t ≤ C2 K |V |Lip γ −1 xp-var;[s,t] (10.5) where C2 depends on p and γ.
Remark 10.8 In case of non-uniqueness we abuse notation in the sense that π (V ) (0, y0 ; x) resp. π (V ) (s, ys ; x) in the above estimates stands for any choice of ODE solution to dy = V (y) dx with the indicated initial conditions at times 0, s respectively. Remark 10.9 From Proposition 10.3, inequality (10.5) is equivalent to the Euler estimate γ π (V ) (s, ys ; x)s,t −E(V ) s, ys ; Sγ xs,t s,t ≤C3 |V |Lip γ −1 xp-var;[s,t] . (10.6)
Remark 10.10 A finite variation path (and even Lipschitz continuous) with the properties (10.4) always exists. Indeed, it suffices to take xs,t as geodesic associated with the element g = Sγ (x)s,t ∈ Gγ Rd , parametrized on the interval [s, t]. The length of this curve is precisely equal to g, the Carnot–Caratheodory norm of g ∈ Gγ Rd , and so s
t
s,t dx
= Sγ (x)s,t ≤ Sγ (x)p-var;[s,t] ≤ C(γ ,p) S[p] (x) p-var;[s,t]
where in the last step we used the estimates for the Lyons-lift map, Proposition 9.3, applicable as γ > p and so %γ& ≥ [p]. Let us also note that t t s,t dx ≤ K |dx| (10.7) s
since
t s
s
|dxs,t | ≤ K xp-var;[s,t] ≤ K x1-var;[s,t] = K |x|1-var;[s,t] .
Proof. The case p < γ < 2 is discussed in Exercises 10.12, 10.13. We assume here γ ≥ 2, so that dy = V (y) dx with time-s initial condition ys has a unique solution, denoted as usual by π (s, ys ; x). We define x = S[p] (x), and p ω (s, t) = K |V |Lip γ −1 xp-var;[s,t] .
t Thanks to s |dxs,t | ≤ K xp-var;[s,t] and an elementary ODE estimate (Theorem 3.4), we have 1 (10.8) π (V ) s, ys ; xs,t s,t ≤ c1 ω (s, t) p . Then, for all s < t in [0, T ], we define Γs,t = yt − π s, ys ; xs,t t = ys,t − π s, ys ; xs,t s,t .
Then, for fixed s < t < u in [0, T ], we have Γs,u − Γs,t − Γt,u = −π (s, ys ; xs,u )s,u + π s, ys ; xs,t s,t + π t, yt ; xt,u t,u . Define xs,t,u to be the concatenation of xs,t and xt,u and set, for better readability, A := π s, ys ; xs,t,u s,u − π (s, ys ; xs,u )s,u B := π t, yt ; xt,u t,u − π t, π s, ys ; xs,t t ; xt,u t,u = π t, yt ; xt,u t,u − π t, yt − Γs,t ; xt,u t,u which then allows us to write Γs,u − Γs,t − Γt,u = A + B.
(10.9)
The term A is estimated by – nomen est omen – Lemma A, noting that t u u s,t,u s,t t,u dx = dx + dx ≤ 2K x p-var;[s,u ] . s
s
t
Similarly, Lemma B was tailor-made to estimate B and we are led to γ /p 1/p + c2 π s, ys ; xs,t t − yt ω (t, u) |Γs,u − Γs,t − Γt,u | ≤ c1 ω (s, u) 1/p exp c2 ω (t, u) γ /p 1/p 1/p . + c2 |Γs,t | ω (t, u) exp c2 ω (t, u) ≤ c1 ω (s, u) The elementary inequality 1 + c2 ω (t, u)
${}^{1/p}\exp\big(c_2\,\omega(t,u)^{1/p}\big)\le\exp\big(2c_2\,\omega(s,u)^{1/p}\big)$, combined with the triangle inequality, then gives
$$|\Gamma_{s,u}| \le |\Gamma_{s,t}|\exp\big(2c_2\,\omega(s,u)^{1/p}\big) + |\Gamma_{t,u}| + c_1\,\omega(s,u)^{\gamma/p}. \tag{10.10}$$
On the other hand, using again Lemma A,
$$|\Gamma_{s,t}| = \big|y_{s,t}-\pi\big(s,y_s;x^{s,t}\big)_{s,t}\big| \le c_3\Big(|V|_{\mathrm{Lip}^{\gamma-1}}\Big(\int_s^t|dx|+\int_s^t\big|dx^{s,t}\big|\Big)\Big)^{\gamma} \le c_3\Big(|V|_{\mathrm{Lip}^{\gamma-1}}(K+1)\int_s^t|dx|\Big)^{\gamma} =: \tilde{\omega}(s,t),$$
thanks to (10.7).
Obviously, $\tilde{\omega}$ is a control function whose finiteness depends crucially on the a priori assumption that $x$ has finite 1-variation, and we summarize the previous estimate in writing:
$$|\Gamma_{s,t}| = O\big(\tilde{\omega}(s,t)^{\gamma}\big)\quad\text{where }\gamma>1. \tag{10.11}$$
The two estimates (10.10), (10.11) are precisely what is needed to apply (the elementary analysis) Lemma 10.59 (found in the appendix of this chapter): it follows that, for all s < t in [0, T ], γ /p 1/p exp c4 ω (s, t) |Γs,t | ≤ c4 ω (s, t) and we emphasize that c4 does not depend on ω ˜ and in particular not on the 1-variation of x. From (10.8) and the triangle inequality, we therefore have for all s < t in [0, T ] , 1/p γ /p 1/p exp c4 ω (s, t) |ys,t | ≤ c1 ω (s, t) + c4 ω (s, t) and if attention is restricted to s, t such that ω (s, t) ≤ 1, we obviously have 1/p
$$|y_{s,t}| \le (c_1+c_4e^{c_4})\,\omega(s,t)^{1/p}.$$
But then it follows from Proposition 5.10 that for all $s<t$ in $[0,T]$, $|y|_{p\text{-var};[s,t]} \le c_5\big(\omega(s,t)^{1/p}\vee\omega(s,t)\big)$. That also leads to
$$|\Gamma_{s,t}| = \big|y_{s,t}-\pi\big(s,y_s;x^{s,t}\big)_{s,t}\big| \le c_5\big(\omega(s,t)^{1/p}\vee\omega(s,t)\big) + c_1\,\omega(s,t)^{1/p}$$
and hence
$$|\Gamma_{s,t}| \le \min\Big\{c_4\,\omega(s,t)^{\gamma/p}\exp\big(c_4\,\omega(s,t)^{1/p}\big),\ c_5\big(\omega(s,t)^{1/p}\vee\omega(s,t)\big)+c_1\,\omega(s,t)^{1/p}\Big\} \le c_6\,\omega(s,t)^{\gamma/p}.$$
The proof is now finished. Exercise 10.11 Prove that we can take the constants C1 and C2 in Lemma 10.7 to be continuous in p, for p ∈ [1, γ). The following exercise deals with the case p < γ < 2 in Lemma 10.7.
Exercise 10.12 (i) for 1 < γ < 2, and a, b > 0, prove that aγ −1 b ≤ (γ − 1) abγ −1 + (2 − γ) bγ .
(10.12)
(ii) Under the assumption of Lemma 10.7 for p < γ < 2, prove that if Γs,t = ys,t − V (ys ) xs,t and ω (s, t) = |V |Lip γ −1 |x|p-var,[s,t] , we have for s < t < u, |Γs,u − Γs,t − Γt,u | ≤ c (γ − 1) |Γs,t | ω (t, u)
γ −1 p
γ /p
+ c (2 − γ) ω (t, u)
.
(iii) Prove Lemma 10.7 in the case γ < 2. Solution. (i) Write x = a/b, and dividing by bγ , we see that (10.12) is equivalent to xγ −1 ≤ (γ − 1) x + (2 − γ) for x > 0, which is checked by basic calculus. (ii) For s < t < u, |Γs,u − Γs,t − Γt,u | =
|[V (yt ) − V (ys )] xt,u | γ −1
1/p
≤
c |Γs,t |
≤
c (γ − 1) |Γs,t | ω (t, u) using (i).
ω (t, u)
γ −1 p
γ /p
+ c (2 − γ) ω (t, u)
(iii) Exactly the same argument as in the proof of Lemma 10.7.
Exercise 10.13 (i) Under the assumption of Lemma 10.7 for p < γ < 2, prove using Young estimates that for all s, t |y|p-var;[s,t] ≤ c1 |V |Lip γ −1 |x|p-var;[s,t] 1 + |y|p-var;[s,t] . (ii) Using proposition 5.10, prove inequality (10.3) of Lemma 10.7. Solution. From Young’s inequality, t |ys,t | = V (y) dx s γ −1 ≤ c1 |x|p-var;[s,t] |V |∞ + |V |Lip γ −1 |y|p-var;[s,t] γ −1 ≤ c1 |V |Lip γ −1 |x|p-var;[s,t] 1 + |y|p-var;[s,t] ≤ c2 |V |Lip γ −1 |x|p-var;[s,t] 1 + |y|p-var;[s,t] 1/p p ≤ c3 |V |Lip γ −1 |x|p-var;[s,t] 1 + |y|p-var;[s,t] .
The p-variation of x is controlled by ω x;p ≡ |x|p-var;[·,·] . Using similar notap tion for y we see that |ys,t | is estimated by a constant times ω x;p +ω x;p ω y ;p which is a control (cf. Exercise 1.9) and from the basic super-additivity property of controls, 1/p p |y|p-var;[s,t] ≤ c3 |V |Lip γ −1 |x|p-var;[s,t] 1 + |y|p-var;[s,t] ≤ c3 |V |Lip γ −1 |x|p-var;[s,t] 1 + |y|p-var;[s,t] . (ii) For s, t such that 2c2 |V |Lip γ −1 |x|p-var;[s,t] < 1, we obtain |y|p-var;[s,t]
$$\le \frac{c_2\,|V|_{\mathrm{Lip}^{\gamma-1}}\,|x|_{p\text{-var};[s,t]}}{1-c_2\,|V|_{\mathrm{Lip}^{\gamma-1}}\,|x|_{p\text{-var};[s,t]}} \le 2\,c_2\,|V|_{\mathrm{Lip}^{\gamma-1}}\,|x|_{p\text{-var};[s,t]}.$$
We then obtain estimates on |y|p-var;[s,t] for s, t such that 2c2 |V |Lip γ −1 |x|p-var;[s,t] ≥ 1 using Proposition 5.10.
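As an aside, the p-variation functionals appearing throughout these estimates are easy to evaluate for discretely sampled paths. The sketch below (ours, not the text's) uses dynamic programming over partitions made of grid points; for $p\ge 1$ a convexity argument shows this already gives the p-variation of the piecewise-linear interpolation of the samples.

```python
import numpy as np

def p_variation(points, p):
    """Grid-restricted p-variation of a sampled path (rows of `points`).

    dp[j] = max_{i<j} ( dp[i] + |x_{t_j} - x_{t_i}|^p );  returns dp[n-1]^(1/p).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    dp = np.zeros(n)
    for j in range(1, n):
        incr = np.linalg.norm(points[j] - points[:j], axis=1) ** p
        dp[j] = np.max(dp[:j] + incr)
    return dp[-1] ** (1.0 / p)
```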
10.3 RDE solutions Davie’s lemma gives us uniform estimates for ODE solutions which depend only on the rough path regularity (e.g. p-variation or 1/p-H¨ older) of the canonical lift of a “nice” driving signal x ∈ C 1-var [0, T ] , Rd . It should therefore come as no surprise that a careful passage to the limit will yield a sensible notion of differential equations driven by a “generalized” driving signal, given as a limit of nice driving signals (in p-variation or 1/p-H¨older rough path sense . . . ). This class of generalized driving signals is precisely the class of weak geometric p-rough paths introduced in the previouschap ter. Indeed, we saw in Section 8.2 that for any x ∈ C p-var [0, T ] , G[p] Rd , there exist (xn ) ⊂ C 1-var [0, T ] , Rd which approximate x uniformly with uniform p-variation bounds, lim d0;[0,T ] S[p] (xn ) , x = 0 and sup S[p] (xn )p-var;[0,T ] < ∞. n →∞
n
(10.13)
10.3.1 Passage to the limit with uniform estimates Our aim is now to make precise the meaning of the rough differential equation (RDE) (10.14) dy = V (y) dx, y (0) = y0 ∈ Re where V = (Vi )1≤i≤d is a family of sufficiently nice vector fields and x : [0, T ] → G[p] Rd is a weak geometric p-rough path. The following
is essentially an existence result for such RDEs; since our precise definition is very much motivated by this result, the precise definition of an RDE solution (together with remarks on alternative definitions) is postponed till the next section. Theorem 10.14 (existence) Assume that (i) V = (Vi )1≤i≤d is a collection of vector fields in Lipγ −1 (Re ), where γ > p; (ii) (xn ) is a sequence in C 1-var [0, T ] , Rd , and x is a weak geometric p-rough path such that < ∞; lim d0;[0,T ] S[p] (xn ) , x and sup S[p] (xn ) n →∞
p-var;[0,T ]
n
(iii) y0n ∈ Re is a sequence converging to some y0 . y0n ; xn ) converges in uniform Then, at least along a subsequence, π (V ) (0, topology to some limit, say y ∈ C [0, T ] , Rd . Any such limit point satisfies the following estimates: there exists a constant C depending on p, γ such that for all s < t in [0, T ], p p |y|p-var;[s,t] ≤ C |V |Lip γ −1 xp-var;[s,t] ∨ |V |Lip γ −1 xp-var;[s,t] . (10.15) Moreover, if xs,t : [s, t] → Rd is any continuous bounded variation path such that t s,t s,t dx ≤ K x Sγ x s,t = Sγ (x)s,t and p-var;[s,t] s
for some constant K ≥ 1 then, again for all s < t in [0, T ], γ ys,t − π (V ) s, ys ; xs,t s,t ≤ C K |V |Lip γ −1 xp-var;[s,t] ,
(10.16)
where C depends on p, γ. Proof. Let ε > 0 be small enough such that p + ε < γ, and let us define for convenience the function φp (x) = x ∨ xp . By Davie’s Lemma (i.e. Lemma 10.7) for all s, t ∈ [0, T ], there exists c1 = c1,p+ε such that π (V ) (0, y0n ; xn )s,t ≤ c1,p+ε φp+ |V |Lip γ −1 S[p] (xn )(p+ε)-var;[s,t] , (10.17) where c1 is independent of n. From Corollary 5.29, S[p] (xn ) (p+ε)-var;[.,.]
is equicontinuous in the sense that for all ε > 0, there exists δ such that for all s, t with |t − s| < δ, S[p] (xn ) < ε. (p+ε)-var;[s,t]
This implies that π (V ) (0, y0n ; xn ) is equicontinuous and hence converges (along a subsequence) to a path y. We therefore obtain that |ys,t | ≤ c1,p+ε φp+ |V |Lip γ −1 lim S[p] (xn )(p+ε)-var;[s,t] n →∞ ≤ c1,p+ε φp+ |V |Lip γ −1 x(p+ε)-var;[s,t] . By Exercise 10.11, limε→0 c1,p+ ε = c1,p , and by Lemma 5.13, limε→0 x(p+ ε)-var;[s,t] = xp-var;[s,t] . Hence, for all s, t ∈ [0, T ] , |ys,t | ≤ c1,p φp |V |Lip γ −1 xp-var;[s,t] . The right-hand side of the last expression raised to power p defines a control, hence we obtain (10.15). From Remark 10.9, proving inequality (10.16) is equivalent to the proof of the following inequality: γ ys,t − E(V ) s, ys ; Sγ xs,t s,t ≤ C |V |Lip γ −1 xp-var;[s,t] . By Davie’s Lemma (i.e. Lemma 10.7), for all s, t ∈ [0, T ], π (V ) (0, y0n ; xn )s,t − E(V ) s, ys ; Sγ xs,t n s,t γ ≤ c1,p+ ε |V |Lip γ −1 S[p] (xn )(p+ε)-var;[s,t] , where c1 is independent of n. Letting n tend to infinity (along the subsequence that allows us to have convergence of π (V ) (0, y0n ; xn ) to y), we obtain γ ys,t − E(V ) s, ys ; Sγ (x)s,t ≤ c1,p+ε |V |Lip γ −1 x(p+ε)-var;[s,t] . Letting ε converge to 0 finishes the proof. Let us point out explicitly the error estimate for the Euler scheme, which was established in the final step of the previous proof. Corollary 10.15 (Euler RDE estimates) Under the assumptions of Theorem 10.14 (in particular x ∈ C p-var [0, T ] , G[p] Rd , V ∈ Lipγ −1 (Re ) , γ > p) we have γ ys,t − E(V ) ys , Sγ (x)s,t ≤ C |V |Lip γ −1 xp-var;[s,t] where C only depends on p, γ. If x is 1/p-H¨ older, γ γ γ /p . ys,t − E(V ) ys , Sγ (x)s,t ≤ C |V |Lip γ −1 x1/p-H¨o l;[0,T ] × |t − s| Remark 10.16 Note %γ& ≥ [p] so that Sγ (x) is the Lyons lift of x. For γ close enough to p, Sγ (x) ≡ x.
10.3.2 Definition of RDE solution and existence Perhaps the simplest way to turn Theorem 10.14 into a sensible definition of what we mean by dy = V (y) dx, (10.18) started at y0 ∈ Re , is the following: Definition 10.17 Let x ∈ C p-var [0, T ] , G[p] Rd be a weak geometric p-rough path. We say that y ∈ C ([0, T ] , Re ) is a solution to the rough differential equation (short: RDE solution) driven by x along the collection of Re -vector fields V = (Vi )i=1,...,d and started at y0 if there exists a sequence (xn )n in C 1-var [0, T ] , Rd such that (10.13) holds, and ODE solutions yn ∈ π (V ) (0, y0 ; xn ) such that yn → y uniformly on [0, T ] as n → ∞ . The (formal) equation (10.18) is referred to as a rough differential equation (short: RDE). This definition generalizes immediately to other time intervals such as [s, t], and we define π (V ) (s, ys ; x) ⊂ C ([s, t] , Re ) to be the set of all solup-var ([s, t] , tionsto the above RDE starting at ys at time s driven by x ∈ C [p] d R . In case of uniqueness, π (V ) (s, ys ; x) is the solution of the RDE. G Let us note that Theorem 10.14 is now indeed an existence result for RDE solutions. That said, there are some possible variations on the theme of the RDE definition on which we wish to comment. Remark 10.18 (RDE, Davie definition) Theorem 10.14 and Corollary 10.15 allow us to pick other “defining” properties of RDE solutions. For instance, A. M. Davie [37] defines y to be an RDE solution if there exists a control function ω ˜ and a function θ (δ) = o (δ) as δ → 0 such that for all s < t in [0, T ], ys,t − E(V ) (ys , xs,t ) ≤ θ (˜ ω (s, t)) . (10.19) (Note that (10.19) contains an implicit regularity assumption on V so that the Euler scheme E is well-defined.) Applying Corollary 10.15 (take γ small enough so that %γ& ≥ [p] , hence Sγ (x) ≡ x, then θ (δ) = δ γ /p and p ω ˜ (s, t) = (const) × xp-var;[s,t] ) shows that any RDE solution in the sense of Definition 10.17 is also a solution in Davie’s sense. With either definition, let us note that (10.19) leads immediately to the statement that y satisfies some sort of compensated Riemann–Stieltjes integral equation, E(V ) yt i , xt i ,t i + 1 , yt − y0 = lim n →∞
t i ∈D n
for any sequence of dissection (Dn ) of [0, t] with mesh tending to zero.
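Condition (10.19) is also easy to test numerically along a fixed dissection: given a candidate solution sampled at the grid points and the level-1/level-2 data of x over each interval, one simply inspects the residuals. The sketch below (ours, not the text's) assumes the step-2 increment routine euler_step2_increment from the sketch following Definition 10.1.

```python
import numpy as np

def davie_residuals(y_samples, sig_pieces, V, DV):
    """Residuals y_{t_k,t_{k+1}} - E_(V)(y_{t_k}, x_{t_k,t_{k+1}}) of (10.19).

    y_samples  : (n+1, e) array, candidate solution at the dissection points
    sig_pieces : list of n pairs (g1, g2), level-1/level-2 pieces of x over
                 each interval of the dissection
    """
    res = []
    for k, (g1, g2) in enumerate(sig_pieces):
        incr = y_samples[k + 1] - y_samples[k]
        res.append(incr - euler_step2_increment(y_samples[k], V, DV, g1, g2))
    return np.array(res)
```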
Remark 10.19 (RDE, Lyons definition) As one expects, RDE solutions can also be defined as the solution to a “rough” integral equation and this is Lyons’ original approach [116, 120, 123]. To this end, one first needs a notion of rough integration (cf. Section 10.6) which allows, for sufficiently smooth ϕ = (ϕ1 , . . . , ϕd ), defined on Rd , the definition of an (indefinite) rough integral · ϕ (z) dz with z = π 1 (z) 0 such that, in the case when z = S[p] (z) for some z ∈ C 1-var [0, T ] , Rd , it coincides with S[p] (ξ) where ξ is the classical Riemann–Stieltjes integral
· ϕ (z) dz. 0 Note that (10.18) cannot be rewritten as an integral equation of the above form (for y is not part of the integrating signal x). Nonetheless, the “enhanced” differential equation (in which the input signal is carried along to the output) dx = dx dy = V (y) dx can be written in the desired form · ϕ (z) dz z0,· = 0
provided we set $z=(x,y)$ and
$$\varphi(z) = \begin{pmatrix} 1 & 0\\ V(y) & 0 \end{pmatrix}.$$
The above integral equation indeed makes sense as a rough integral equation (with an implicit regularity assumption on $V$ so that the rough integral is well-defined), replacing $z$ by a genuine geometric $p$-rough path $\mathbf{z}\in C^{p\text{-var}}\big([0,T],G^{[p]}(\mathbb{R}^d\oplus\mathbb{R}^e)\big)$, and solutions can be constructed, for instance, by a Picard iteration [116, 120, 123]. An $\mathbb{R}^e$-valued solution is then recovered by the projection $\mathbf{z}\mapsto\pi_1(\mathbf{z})=z=(x,y)\mapsto y$, and again one can see that an RDE solution in the sense of Definition 10.17 is also a solution in this sense.
10.3.3 Local existence As in ODE theory, if the vector fields have only locally the necessary regularity for existence, we get local solutions. Exercise 10.20 We keep the notation of Theorem 10.14. Fix s < t in [0, T ] and assume that for some open set Ω, we have for all u ∈ [s, t] , yu ∈ Ω. Prove that p p |y|p-var;[s,t] ≤ C |V |Lip γ −1 (Ω) xp-var;[s,t] ∨ |V |Lip γ −1 (Ω) xp-var;[s,t] where C = C (p, γ).
As a consequence of this result, we obtain the following:
Theorem 10.21 (local existence) Assume that (i) $V=(V_i)_{1\le i\le d}$ is a collection of vector fields in $\mathrm{Lip}^{\gamma-1}_{\mathrm{loc}}(\mathbb{R}^e)$, with $\gamma>p$; (ii) $x:[0,T]\to G^{[p]}(\mathbb{R}^d)$ is a weak geometric $p$-rough path; (iii) $y_0\in\mathbb{R}^e$ is an initial condition. Then either there exists a (global) solution $y:[0,T]\to\mathbb{R}^e$ to $dy=V(y)\,dx$ with initial condition $y_0$, or there exists $\tau\in[0,T]$ and a (local) solution $y:[0,\tau)\to\mathbb{R}^e$ such that $y$ is a solution on $[0,t]$ for any $t\in(0,\tau)$ and $\lim_{t\uparrow\tau}|y(t)|=+\infty$.
Proof. It simplifies the argument to have unique ODE solutions, we thus assume γ ≥ 2. (Otherwise p < γ < 2 and some care, similar to the proof of Theorem 3.6, is needed; we leave this extension to the reader.) Without loss of generality, y0 = 0. Pick (xn ) ⊂ C 1-var [0, T ] , Rd so that S[p] (xn ) → x uniformly with supn S[p] (xn )p-var;[0,T ] < ∞. Replace V by compactly supported Lipγ −1 -vector fields V n which coincide with V on the ball {y : |y| ≤ n}. From the preceding existence theorem, π (V ( 1 ) ) (0, y0 ; xn ) → y (1) ∈ π (V 1 ) (0, y0 ; x) , where strictly speaking we have replaced xn by a subsequence along which this convergence holds. If (1) ≤1 y ∞;[0,T ]
then we can replace V1 by V and hence find a global solution, in which case we are done. Otherwise, (1) τ 1 = inf t ≥ 0 : yt ≥ 1 ∈ [0, T ] and we switch to another subsequence so that π (V ( 2 ) ) (0, y0 ; xn ) → y (2) ; observing that y (1) ≡ y (2) on [0, τ 1 ]. Again, if y (2) ∞;[0,T ] ≤ 2 we find a global solution, otherwise we define another τ 2 ≥ τ 1 by τ 2 = inf t ≥ 0 : y (2) ≥ 2 ∈ [0, T ] and so on. Iterating this either yields a global solution or a family of RDE solutions y (n ) ∈ π (V n ) (0, y0 ; x) , consistent in the sense that y (n ) ≡ y (n +1) on [0, τ n ] . Moreover, by definition of τ n we see that |y (τ n )| = n → ∞ as n → ∞ and the proof is finished.
10.3.4 Uniqueness and continuity For ordinary differential equations, we saw that existence is guaranteed for continuous vector fields, while uniqueness requires Lipschitz vector fields; that is, one additional degree of smoothness. In essence, this remains true for RDEs driven by p-rough paths: we saw that RDE solutions exist for Lipγ −1 -vector fields when γ > p and we will now see that uniqueness holds, still assuming γ > p, for Lipγ -vector fields (or Lipγloc since uniqueness is a local issue!). We will show uniqueness by establishing Lipschitz regularity of the RDE flows. Later, we will see that uniqueness also holds for γ = p (in this case, we will only prove uniform continuity on bounded sets of the RDE flow rather than Lipschitzness), and that uniqueness still holds when we relax the rough path regularity of the driving signal in a way that gives uniqueness under optimal regularity assumptions for RDEs driven by Brownian motion and L´evy’s area. The reader may find it useful to quickly revise the proof of Davie’s lemma, which is similar to arguments in this section. In particular, the following Lemmas A and B are essentially straightforward generalizations of what we called Lemmas A and B in Section 10.2. We start with the elementary yet useful f , Re ) with β ∈ [1, 2], and a, b, a ˜, ˜b : Lemma 10.22 Let g, g˜ ∈ Lipβ Re , L(R e 1-var f [0, 1], R . Then, with ˜∈C [0, 1] → R , and x, x ∆≡
$$\int_0^1\big(g(a_r)-g(b_r)\big)\,dx_r - \int_0^1\big(\tilde{g}(\tilde{a}_r)-\tilde{g}(\tilde{b}_r)\big)\,d\tilde{x}_r,$$
we have$^1$
$$|\Delta| \le |g|_{\mathrm{Lip}^{\beta}}\int_0^1\big|(a_r-b_r)-(\tilde{a}_r-\tilde{b}_r)\big|\,|dx_r|$$
$$\quad + \big(|a-b|_{\infty;[0,1]}+|\tilde{a}-\tilde{b}|_{\infty;[0,1]}\big)^{\beta-1}\,|x|_{1\text{-var};[0,1]}\Big(|g|_{\mathrm{Lip}^{\beta}}\,\big|b-\tilde{b}\big|_{\infty;[0,1]}+|g-\tilde{g}|_{\mathrm{Lip}^{\beta-1}}\Big)$$
$$\quad + |\tilde{g}|_{\mathrm{Lip}^{\beta}}\,\big|\tilde{a}-\tilde{b}\big|_{\infty;[0,1]}\,|x-\tilde{x}|_{1\text{-var};[0,1]}.$$
$^1$ In the case $\beta=1$, $|g-\tilde{g}|_{\mathrm{Lip}^{\beta-1}}$ has to be replaced by $2\,|g-\tilde{g}|_{\infty}$.
Proof. Step 1: Fix a, b, a ˜, ˜b ∈ Re . When β > 1, g ∈ C 1 and we can write g (a) − g (˜ a) − g (b) − g ˜b as
1 0 1
= 0
g (ta + (1 − t) a ˜) (a − a ˜) dt −
1
g tb + (1 − t) ˜b b − ˜b dt
0
g (ta + (1 − t) a ˜) a − a ˜ − b − ˜b dt
1
+
!
" g (ta + (1 − t) a b − ˜b dt ˜) − g tb + (1 − t) ˜b
0
to obtain, using |g |∞ ≤ |g|
L ip β
for β > 1, ˜b (a − b) − a ˜ − L ip β β −1 a − ˜b + |g| β |a − b| + ˜ b − ˜b .
a) − g ˜b ≤ |g| (g (a) − g (b)) − g (˜
L ip
In fact, this argument remains valid for β = 1 since g ∈ Lip1 implies absolute continuity of t → g (ta + (1 − t) a ˜). β −1 ˜ a − ˜b a)−˜ g b − g (˜ a)−g ˜b ≤ |g − g˜|Lip β −1 ˜ Step 2: Obviously g˜ (˜ so that, by the triangle inequality, ˜ − ˜b a) − g˜ ˜b ≤ |g| β (a − b) − a (g (a) − g (b)) − g˜ (˜ L ip β −1 |g| β b − ˜b + |g − g˜|Lip β −1 . + |a − b| + ˜ a − ˜b L ip
Step 3: At last, we write ∆ = ∆1 + ∆2 where ∆1
1
!
1
!
0
∆2
" g (ar ) − g (br ) − g˜ (˜ dxr , ar ) − g˜ ˜br
= =
" g˜ (˜ ar ) − g˜ ˜br d (xr − x ˜r ) .
0
Using the elementary . . . dxr ≤ |. . . | |dxr| ≤ |. . . |∞ |x|1-var we bound ˜ g (˜ ar ) −˜ g ˜br ≤ |˜ g | β . ˜ a−b ∆1 using the step-2 estimate, while ˜ ∞
L ip
∞;[0,1]
allows us to bound ∆2 . Together, they imply the claimed estimate. We now turn to Lemma A and note the assumption of Lipγ -regularity, γ ≥ 1, in contrast to Lemma A which was formulated for Lipγ −1 vector fields, γ > 1. Lemma A) Assume that 10.23 (Lemma (i) Vi1 1≤i≤d and Vi2 1≤i≤d are two collections of vector fields in Lipγ (Re ) vector fields, with γ ≥ 1;
(ii) s < u are two elements of [0, T ]; (iii) ys1 , ys2 ∈ Re (thought of as “time-s” initial conditions); ˜1 and x2 , x ˜2 are driving signals in C 1-var [s, u] , Rd such that (iv) x1 , x S[γ ] x1 s,u S[γ ] x2 s,u
= =
1 ˜ s,u , S[γ ] x 2 ˜ s,u ; S[γ ] x
(v) ≥ 0, δ > 0 are such that u u u u 1 1 2 2 max |dx | + |d˜ x |, |dx | + |d˜ x | s s s u s u 1 1 2 2 dx − dx , d˜ x − d˜ x max s
≤ , ≤
δ;
s
(vi) υ ≥ 0 is a bound on V 1 Lip γ and V 2 Lip γ . Then, for some constant C depending only on γ, we have2 ˜1 s,u − π (V 2 ) s, ys2 ; x2 s,u π (V 1 ) s, ys1 ; x1 s,u − π (V 1 ) s, ys1 ; x −π (V 2 ) s, ys2 ; x ˜2 s,u γ ≤ C ys1 − ys2 + υ1 V 1 − V 2 Lip γ −1 . (υ) exp (Cυ) [γ ]
+ Cδυ. (υ)
exp (Cυ) .
Proof. First, as in the proof of Lemma A, we take (s, u) = (0, 1) and observe ˜i 0,1=π (V i ) 0, π (V i ) 0, y0i; x ˜i 1 ; z i 0,1; i=1, 2, π (V i ) 0, y0i ; xi 0,1−π (V i ) 0, y0i ; x ← − ← − where z 1 = x ˜1 x1 and z 2 = x ˜2 x2 are reparametrized in the same way to a path from [0, 1] to Rd . By assumption (iv) and Chen’s theorem z 1 has trivial step-[γ] signature, i.e. 1 −1 ˜ 0,1 ⊗ S[γ ] x1 0,1 = 1, S[γ ] z 1 0,1 = S[γ ] x
(10.20)
1 and similarly for z 2 . Next, by assumption (v) we have 0 dzr1 − dzr2 ≤ 2δ. Using the Lipschitzness of the flow of ODEs, in the quantitative form of Theorem 3.8, we see that it is enough to prove the above lemma with ˜1 = 0. We can thus assume z 1 = x1 and z 2 = x2 . To simplify the x ˜2 = x notation, define yui = π (V i ) 0, y0i ; xi u , i = 1, 2, 2 For
γ = 1, V 1 − V 2 L ip γ −1 is replaced by 2 V 1 − V 2 ∞ .
Rough differential equations
230
and, for N := [γ], = xi,N u ,1
⊗N dxir 1 ⊗ · · · ⊗ dxir N ∈ Rd ,
u < r 1 < ···< r n < 1
Observe that
1
max
i=1,2
i = 1, 2.
i,N dxu ,1 ≤ N .
0
From (10.20) and Remark 10.4 concerning the remainder representation of an Euler approximation,3 we have 1 i,N i i i V yu −V i,N y0i dxi,N =y0,1 −E(V i ) y0 , SN xi 0,1 = y0,1 u ,1 , i=1, 2, 0
and so we have 1 2 y0,1 − y0,1
1
= 0
1,N 1 V yu − V 1,N y01 dx1,N u ,1 ,
1
−
2,N 2 V yu − V 2,N y02 dx2,N u ,1 .
0
Note that V i,N ∈ Lipβ where β := γ − N + 1 ∈ [1, 2) since N = [γ]. 1 1 2 2 Using Lemma 10.22 to the paths u → yu , y0 , yu , y0 , we then obtain, using i,N γ −N + 1 ≤ c1 υ N , the bound maxi=1,2 V Lip 1 2 y0,1 − y0,1
1 1,N 2 ≤ c1 υ N y0,. − y0,. x .,1 ∞;[0,1]
1-var;[0,1]
γ −1 2 1,N 1 y0,. x + c1 y0,. + .,1 ∞;[0,1] ∞;[0,1] 1-var;[0,1] 1,N N 1 2 2,N γ −N × υ y0 − y0 + V −V Lip 1,N 2,N 2 . + c1 υ N . y0,. x.,1 − x.,1 ∞;[0,1] 1-var;[0,1]
We first observe, by Theorem 3.18, 1 2 y0,1 − y0,1 ≤ c2 y01 − y02 υ + δυ + V 1 − V 2 ∞ ec 2 υ ∞;[0,1] and the ODE estimate of Theorem 3.4 gives 1 2 y0,. + y0,. ≤ c3 υ. ∞;[0,1] ∞;[0,1] V = (V 1 , . . . , V d ) we think of V N ≡ V j 1 . . . V j N j , . . . , j ∈{1 , . . . , d } as an element 1 N ⊗N of the (Euclidean) space Rd which contracts naturally with elements of the form ⊗N . x iu,,N1 ∈ Rd 3 If
Moreover, we easily see that 1,N V − V 2,N Lip γ −N ≤ c4 υ N −1 V 1 − V 2 Lip γ −1 2,N and, from Proposition 7.63, we have x1,N − x ≤ c5 δN −1 . We .,1 .,1 1-var;[0,1] N also have x.,1 1-var;[0,1] ≤ N of course. Putting all these inequalities together gives the desired estimate. ¯ detail how all Exercise 10.24 In the final step of the proof of Lemma A, the estimates are put together. Solution. Assume V 1 = V 2 at first. Using 1 2 y0,. − y0,. ≤ c2 y01 − y02 υ + δυ ec 2 υ ∞;[0,1] we get 1 2 y0,1 − y0,1
υ N y01 − y02 υ + δυ ecυ N γ −1 N N 1 + (υ) υ y0 − y02 + υ N . υ.δN −1 ≡ y01 − y02 ∆1 + δ∆2
with ∆1
= ≤
γ −1
υ N υecυ N + (υ) N υ N " ! N +1 N +γ −1 ecυ . (const) × (υ) + (υ)
From N = [γ] it is clear that min (N + 1, N + γ − 1) ≥ γ and so γ
∆1 ≤ (const) × (υ) ecυ . Similarly, N
∆2 = υ N +1 N ecυ + υ N +1 N υ (υ) ecυ . When V 1 = V 2 we have 1 1 2 y0,1 − y0,1 ≤ y0 − y02 ∆1 + δ∆2 + V 1 − V 2 Lip γ −1 ∆3 with ∆3
=
γ −1 N
γ −1
≤ (υ) N ecυ 1 γ −1 1 cυ γ (υ) e = (υ) ecυ . υ (υ)
Lemma B) Assume that 10.25 (Lemma (i) Vi1 1≤i≤d and Vi2 1≤i≤d are two collections of vector fields in Lipγ (Re ) with γ ≥ 1; (ii) t < u are some elements of [0, T ]; (iii) yt1 , yt2 , y˜t1 , y˜t2 ∈ Re (thought of as “time-t” initial conditions); (iv) x1 , x2 are two driving signals in C 1-var [t, u] , Rd ; (v) ≥ 0, δ > 0 are such that
u
u
|dx1 |,
max t
|dx2 | ≤ and
t
u
1 dxr − dx2r ≤ δ;
t
(vi) υ is a bound on V 1 Lip γ and V 2 Lip γ . Then we have, for some constant C = C (γ), π (V 1 ) t, yt1 ; x1 t,u − π (V 1 ) t, y˜t1 ; x1 t,u − π (V 2 ) t, yt2 ; x2 t,u − π (V 2 ) t, y˜t2 ; x2 t,u ≤ Cυ exp (Cυ) yt1 − y˜t1 − yt2 − y˜t2 1 yt − y˜t1 + yt2 − y˜t2 m in(2,γ )−1 + Cυ exp (Cυ) × y˜t1 − y˜t2 + υ1 V 1 − V 2 Lip γ −1 + υδ + Cδυ exp (Cυ) yt2 − y˜t2 . Proof. At the price of replacing γ by min (2, γ) ∈ [1, 2], we can and will assume that γ ∈ [1, 2]. Define for r ∈ [t, u] , yri = π (V i ) t, yti ; xi r
and
y˜ri = π (V i ) t, y˜ti ; xi r
with i = 1, 2.
We define 1 2 1 2 − yt,r − y˜t,r − y˜t,r er = yt,r
with r ∈ [t, u],
and have to estimate $e_u$. From Lemma 10.22, applied with $\gamma\in[1,2]$, we see that for all $r\in[t,u]$, $e_r$
r r 1 1 1 1 2 2 2 2 1 2 V ys − V y˜s dxs − V ys − V y˜s dxs = t t r 1 υ t ys − y˜s1 − ys2 − y˜s2 . dx1s γ −1 + y 1 − y˜1 + y 2 − y˜2 ∞;[t,r ] ∞;[t,r ] ≤ 1 1 1 y˜ − y˜2 x V − V 2 γ −1 υ × + 1-var;[t,r ] Lip ∞;[t,r ] + υ y 2 − y˜2 ∞;[t,r ] x1 − x2 1-var;[t,r ] .
From Theorem 3.15 we have
$$\big|y^i-\tilde{y}^i\big|_{\infty;[t,r]} \le c_1\big|y^i_t-\tilde{y}^i_t\big|\exp(c_1\upsilon),\qquad i=1,2,$$
$$\big|\tilde{y}^1-\tilde{y}^2\big|_{\infty;[t,r]} \le c_1\big|\tilde{y}^1_t-\tilde{y}^2_t\big|\exp(c_1\upsilon) + c_1\big|V^1-V^2\big|_{\mathrm{Lip}^{\gamma-1}}\exp(c_1\upsilon) + c_1\upsilon\delta\exp(c_1\upsilon).$$
Hence, we obtain er
≤
υ
r
es dx1s + υ yt1 − y˜t1 − yt2 − y˜t2
t
γ −1 υ exp (c2 υ) + c2 yt1 − y˜t1 + yt2 − y˜t2 1 y˜t − y˜t2 + 1 V 1 − V 2 γ −1 + υδ Lip υ 2 + c1 yt − y˜t2 υδ exp (c1 υ) . The proof is then finished by an application of (Gronwall’s) Lemma 3.2. Equipped with these two lemmas, we can prove (under some regularity assumptions) that the map x → π (V ) (0, y0 ; x) is well-defined and locally Lipschitz continuous in all its parameters (vector fields, initial condition, and driving signal). Theorem 10.26 (Davie) 4 Assume that 1 1 (i) V = Vi 1≤i≤d and V 2 = Vi2 1≤i≤d are two collections of Lipγ -vector fields on Re for γ > p ≥ 1; (ii) ω is a fixed control;5 (iii) x1 , x2 are two weak geometric p-rough paths in C p-var [0, T ] , G[p] Rd , with xi p-ω ≤ 1; conditions; (iv) y01 , y02 ∈ Re thought initial of as time-0 (v) υ is a bound on V 1 Lip γ and V 2 Lip γ . Then there exists a unique RDE solution starting at y0i along V i driven by xi , denoted by y i ≡ π (V i ) 0, y0i ; xi , for i = 1, 2. Moreover, there exists C = C (γ, p) such that6 ! " ρp,ω ;[0,T ] y 1 , y 2 ≤ C υ y01 − y02 + V 1 − V 2 Lip γ −1 + υρp,ω ;[0,T ] x1 , x2 · exp (Cυ p ω (0, T )) . 4 The present theorem stands in a similar relation to Davie’s lemma as Lemma A ¯ to ¯ to B. A, or Lemma B i p 5 In view of (iii) one can take ω (s, t) = i = 1 , 2 x p -va r;[s , t ] . 6 Here: ρ 1 2 1 2 = y − y p , ω ;[0 , T ] . p , ω ;[0 , T ] y , y
Proof. The present regularity assumptions on the vector fields, Lipγ with γ > p, are more than enough to guarantee existence of RDE solutions. More precisely, let us pick solutions y i ∈ π (V i ) 0, y0i ; xi , i = 1, 2. We may assume, without loss of generality, p < γ < [p] + 1 so that [γ] = [p] and also set ε := ρp,ω ;[0,T ] x1 , x2 . For all s < t in [0, T ] we can find paths x1,s,t and x2,s,t such that S[p] xi,s,t s,t = xis,t , i = 1, 2, and such that, for a constant c1 = c1 (p),
i,s,t dxr
≤
c1 ω (s, t)
1,s,t dxr − dx2,s,t r
≤
c1 εω (s, t)
t
1/p
,
s
t
1/p
;
s
indeed, this is possible thanks to Proposition 7.64 applied to gi := δ 1 1 / p xis,t ∈ G[p] Rd , i = 1, 2, ω (s ,t )
noting that g1 , g2 ≤ 1 and |g1 − g2 |T [ p ] (Rd ) ≤ ε. After these preliminaries, let us now fix s < t < u in [0, T ] and define xi,s,t,u := xi,s,t xi,t,u , the concatenation of xi,s,t and xi,t,u . Following Davie’s lemma, we define i − π (V i ) s, ysi ; xi,s,t s,t , i = 1, 2, Γis,t = ys,t and set Γs,t := Γ1s,t − Γ2s,t . From the estimates in (the existence) Theorem 10.14, using the fact that the vector fields are [p]-Lipschitz, we have [p]+ 1 i 1 Γs,t ≤ c1 υω (s, t)1/p , i = 1, 2, 2 and hence
(10.21)
[p]+ 1 Γs,t ≤ c1 υω (s, t)1/p .
We now proceed similarly as in the proof of Davie’s lemma, cf. (10.9). Namely, for i = 1, 2, we define Ai ≡ π (V i ) s, ysi ; xi,s,t,u s,u − π (V i ) s, ysi ; xi,s,u s,u ,
noting that S[p] xi,s,t,u = S[p] xi,s,u , and B i ≡ π (V i ) t, yti ; xi,t,u t,u − π (V i ) t, π (V i ) s, ysi ; xi,s,t t ; xi,t,u t,u = π (V i ) t, yti ; xi,t,u t,u − π (V i ) t, yti − Γis,t ; xi,t,u t,u . ¯ := B 1 − B 2 so that Γs,u − Γs,t − Γt,u = A¯ + B ¯ We also set A¯ := A1 − A2 , B and hence ¯ . Γs,u − Γs,t − Γt,u ≤ A¯ + B 1/p
We now apply Lemmas A and B with parameters := c1 ω (s, u) , δ := ε. Lemma A was tailor-made to give the estimate ! "γ A¯ ≤ c2 ys1 −ys2 + 1 V 1 −V 2 γ −1 υω (s, u)1/p exp c2 υω (s, u)1/p Lip υ "[p]+ 1 ! 1/p 1/p . + c2 ε υω (s, u) exp c2 υω (s, u) Re-observing that [p]+ 1 1/p max π (V i ) s, ysi ; xs,t t − yti ≤ c1 υω (s, t) ,
i=1,2
(10.22)
Lemma B tells us that ¯ = B 1 − B 2 B 1 1/p ≤ c3 Γs,t υω (s, u) p exp c3 υω (s, u) 1 1/p + c3 yt1 − yt2 + V 1 − V 2 Lip γ −1 + ευω (s, t) υ ! "1+ ([p]+1)(m in(2,γ )−1) 1/p 1/p exp c3 υω (s, u) · υω (s, u) "[p]+ 2 ! 1/p 1/p . + c3 ε υω (s, u) exp c3 υω (s, u) Observe that 1+([p] + 1) (min (2, γ) − 1) ≥ min ([p] + 2, 1 + γ (γ − 1)) ≥ γ, and obviously [p] + 2 ≥ γ. Putting things together, we obtain Γs,u ≤ Γs,t exp c4 υω (s, u)1/p + Γt,u (10.23) 1 +c4 max yr1 − yr2 + ε + V 1 − V 2 Lip γ −1 υ r ∈{s,t} "γ ! 1/p 1/p . · υω (s, u) exp c4 υω (s, u) We also have, from Theorem 3.18, that 1 y − y 2 s,t − Γs,t = π (V 1 ) s, ys1 ; x1,s,t s,t − π (V 2 ) s, ys2 ; x2,s,t s,t
is bounded above by 1 1/p 1/p . c5 ys1 − ys2 + V 1 − V 2 Lip γ −1 + ε υω (s, t) exp c5 υω (s, t) υ (10.24) Thanks to estimates (10.23), (10.24) and (10.21), we can apply Lemma Γs,t and t → yt1 − yt2 with the ε parameter in that 10.63 to (s, t) → lemma set to υ1 V 1 − V 2 Lip γ −1 + ε. We therefore see that 1 2 1 1 1 2 2 y −y V y + ≤c −y −V +ε exp (c6 υ p ω (0, T )) , 6 0 0 ∞;[0,T ] Lip γ −1 υ and that for all s < t in [0, T ] , with θ = γ/p > 1, Γs,t ≤c7 y01 −y02 +ε+ 1 V 1 −V 2 γ −1 υ pθ ω (s, u)θ exp (c7 υ p ω (0, T )) . Lip υ These estimates plus (10.24) easily give that, for all s < t in [0, T ] , 1 1/p 2 ys,t −ys,t ≤c8 υ ys1 −ys2 + V 1−V 2 Lip γ −1 +υε ω (s, t) exp (c8 υ p ω (0, T )) and this implies the claimed Lipschitz estimate. Obtaining uniqueness of the RDE is then easy: take two solutions in π (V 1 ) 0, y01 , x1 . The above estimate tells us the supremum distance between these two solutions is 0. We now discuss some corollaries. First observe that a locally Lipschitz estimate in ρ1/p-H¨o l;[0,T ] -metric follows immediately from setting ω (s, t) proportional to (t − s). The corresponding result for ρp-var;[0,T ] -metric is the content of Corollary Assume that 10.27 (i) V 1 = Vi1 1≤i≤d and V 2 = Vi2 1≤i≤d are two collections of Lipγ -vector fields on Re for γ > p ≥ 1; (ii) x1 , x2 are two weak geometric p-rough paths in C p-var [0, T ] , G[p] Rd ; (iii) y01 , y02 ∈ Re thought initial conditions; of as time-0 (iv) υ is a bound on V 1 Lip γ and V 2 Lip γ and p a bound on x1 p-var;[0,T ] and x2 p-var;[0,T ] . Then, if y i = π (V i ) 0, y0i ; xi , we have for some constant C depending only on γ and p, ρp-var;[0,T ] y 1 , y 2 % 1 y0 − y02 + 1 V 1 − V 2 γ −1 ≤ Cvp Lip v & + ρp-var;[0,T ] δ 1/ p x1 , δ 1/ p x2 exp Cυ p pp .
Proof. Follows from Theorem 10.26 with control ω given by the sum of the controls constructed in Propositions 8.4 and 8.7. (Note that for p = 1 we essentially obtain the ODE estimate from Theorem 3.18.) Thanks to Theorem 8.10 we can “locally uniformly” switch from the inhomogenous path-space metrics (ρp,ω ,ρp-var ) to the homogenous ones (dp,ω , dp-var ). In fact, we can state the following. Corollary 10.28 Let V = (Vi )1≤i≤d be a collection of Lipγ -vector fields on Re for γ > p ≥ 1. If ω is a control, p ≥ p and R > 0, the maps7 → (C p-ω ([0, T ] , Re ) , dp ,ω ) Re × xp,ω ≤ R , dp ,ω
and Re ×
(y0 , x)
→
π (V ) (0, y0 ; x)
xp-var ≤ R , dp -var
→
(C p-var ([0, T ] , Re ) , dp -var )
(y0 , x)
→
π (V ) (0, y0 ; x)
xp-var ≤ R , d∞
→
(C p-ω ([0, T ] , Re ) , d∞ )
(y0 , x)
→
π (V ) (0, y0 ; x)
and Re ×
are also uniformly continuous. Proof. For p = p, this follows from the above Theorem 10.26/Corollary 10.27 combined with the remark that inhomogenous “ρ” path-space metrics are “H¨older equivalent” on bounded sets to the homogenous ones (Theorem 8.10). Recalling the d0 /d∞ -estimate (Proposition 8.15) we only have to consider the metrics dp -var for p > p and d0 and since d0 x1 , x2 ≤ dp -var x1 , x2 it suffices to consider the d0 case. and But this is easy: take p˜ ∈ (p, γ) 1 2 1 2 consider two paths x and x in xp,ω ≤ R such that d0 x , x < ε. From interpolation, 1 2 p/p dp-ω x , x ≤ ε1−p/p (2R) ˜ and noting xp-ω ≤ cR where c may depend on ω (0, T ) it only remains to ˜ use (uniform) continuity (on bounded sets) of the Itˆ o–Lyons map, applied with p˜ instead of p. 7 All
sets refering to x are subsets of C p -va r [0, T ] , G [p ] Rd .
Rough differential equations
238
10.3.5 Convergence of the Euler scheme Consider an RDE of the form dy = V (y) dx.
(10.25)
As usual, x denotes a geometric p-rough path and we assume sufficient regularity on the collection of vector fields V (i.e. that they are Lipγ −1 , γ > p) to ensure existence of a solution y. An Euler scheme of order ≥ [p] is a natural way to approximate such solutions, at least locally on some (small) time interval [s, t], and we have already seen the error estimate (cf. Corollary 10.15) ys,t − E(V ) (ys , xs,t ) ω (s, t)γ /p with ω (s, t) = xp p-var;[s,t] which was closely related to our existence proof of RDE solutions. We may rewrite the above as yt = ys + ys,t ≈ ys + E(V ) (ys , xs,t ) . Iteration of this leads to an approximate solution over the entire time horizon [0, T ]. We formalize this in Definition 10.29 (Euler scheme for RDEs) Given g ∈ GN Rd and V = (Vi )1≤i≤d , a collection of vector fields in C N−1 (Re ), we write Eg y := y + E(V ) (y, g) . p-var Then, ([0, T ] , given D = {0 = t0 < t1 < · · · < tn = T } and x ∈ Co [p] d R and a fixed integer N ≥ p we define the “step-N Euler apG proximation” to (10.25) at time tk ∈ D by8
ytEuler;D := Et k ←−t 0 y0 := E k
S N (x) t
k −1 , t k
◦ · · · ◦ ES N (x) t 1 , t 0 y0 .
Theorem 10.30 Assume that (i) x ∈ C p-var [0, T ] , G[p] Rd ; (ii) V = (Vi )1≤i≤d is a collection of vector fields in Lipγ (Re ); γ > p; (iii) D = {0 = t0 < t1 < · · · < tn = T } is a fixed dissection of [0, T ] and write #D = n. Set N := %γ& ≥ [p] , and define the control p
p
ω (s, t) = |V |Lip γ xp-var;[s,t] . Then there exists C = C (p, γ) so that, with y = π (V ) (0, y0 ; x), #D θ Euler;D ω (tk −1 , tk ) , θ = (N + 1) /p > 1. yT − yT ≤ CeC ω (0,T ) k =1
8x
is a geometric p-rough path and S N (x) its unique step-N Lyons lift.
10.3 RDE solutions
239
If x ∈ C 1/p-H¨o l [0, T ] , G[p] Rd and tk +1 − tk ≡ |D| for all k then Euler;D yT − yT
N +1
θ −1
N +1
≤
CT |V |Lip γ x1/p-H¨o l;[0,T ] |D|
∼
|D|
θ −1
.
Proof. With D = (tk ) let z k := π (V ) (tk , Et k ←−t 0 y0 ; x) ∈ C ([tk , T ] , Re ) be the unique RDE solution to dy = V (y) dx, with time tk initial condition given by Et k ←−t 0 y0 . Note that z 0 = yT and z n = ET ←−t 0 y0 = yTEuler;D . Hence n k Euler;D zT − z k −1 ≤ yT − yT T k =1
and since zTk
=
zTk −1
= =
π (V ) tk , Et k −1 ←−t 0 y0 + E(V ) Et k −1 ←−t 0 y0 , SN (x)t k −1 ,t k ; x , T t k −1 ←−t 0 π (V ) tk −1 , E y0 ; x T π (V ) tk , π (V ) tk −1 , Et k −1 ←−t 0 y0 ; x t ; x , k
T
we can use Lipschitzness of RDE flows (implied a fortiori by theorem 10.26; here c1 = c1 (γ, p)) to estimate k zT − z k −1 ≤ c1 ztk − ztk −1 exp (c1 ω (0, T )) . T k k On the other hand, k zt − ztk −1 = E(V ) Et k −1 ←−t 0 y0 , SN (x) t k −1 ,t k k k t k −1 ←−t 0 y0 ; x t ,t − π (V ) tk −1 , E k −1 k ≤ π (V ) (tk −1 , ·; x)t k −1 ,t k − E(V ) ·, SN (x)t k −1 ,t k
∞
and using Euler RDE estimates this is bounded from above by (N +1)/p
c2 ω (tk −1 , tk )
;
indeed, V ∈ Lipγ =⇒ V ∈ Lip(N +1)−1 with N + 1 = %γ& + 1 ≥ γ > p and we apply corollary 10.15 with N + 1 (instead of γ). A similar result holds for “geodesic” approximations in the sense of the following definition. Definition 10.31 (geodesic scheme forRDEs) Given D = {0 = t0 < t1 < · · · < tn = T }, x ∈ Cop-var [0, T ] , G[p] Rd , V = (Vi )1≤i≤d a collection of Lipschitz vector fields on Re , and a fixed integer N ≥ p we define
Rough differential equations
240
the “step-N geodesic approximation” to (10.25) via any xD ∈ C 1-var [0, T ] , Rd such that SN xD t ,t = xt k −1 ,t k for all k = 1, . . . , n; k −1 k D sup SN x p-var;[0,T ] ≤ K xp-var;[0,T ] . D ∈D[0,T ]
The “step-N geodesic approximation” to (10.25) at time t ∈ [0, T ] is then simply defined as ytgeo;D := π (V ) 0, y0 ; xD t . Remark 10.32 Such xD always exist and can be constructed as concatenations of geodesics associated to SN (x)t k −1 ,t k ∈ GN Rd . From SN (x)p-var;[0,T ] ≤ Cp,N xp-var;[0,T ] we can then take K = 31−1/p Cp,N . Proposition 10.33 Under the assumptions of Theorem 10.30, there exists C = C (p, γ, K) such that we have the error estimate (for the step-N geodesic approximation), #D θ geo;D C ω (0,T ) ≤ Ce − y ω (tk −1 , tk ) , θ = (N + 1) /p > 1. y T T k =1
Proof. With D = (tk ) let z k := π tk , yt k , xD ∈ C ([tk , T ] , Re ) be the unique ODE solution to dy = V (y) dxD , with time tk initial condition given by yt k = π (V ) (0, y0 ; x)t k . Observe that D −1 # k +1 geo;D z − zTk . ≤ yT − yT T k =0
Since zTk +1
= π tk +1 , yt k + 1 , xD T and zTk = π tk +1 , π tk , yt k , xD t
k+1
, xD
T
we can use Lipschitzness of RDE flows (implied a fortiori by Theorem 10.26) p p to see that, with ω (s, t) = |V |Lip γ xp-var;[s,t] as earlier, k +1 k D z t ≤ c − z exp (c ω (t , T )) y − π , y , x 1 1 k +1 t k t (V ) T k+1 k T tk + 1 = c1 exp (c1 ω (tk +1 , T )) yt k ,t k + 1 − π (V ) tk , yt k , xD t ,t k k+1
with c1 = c1 (p, γ, K) , uniformly over D simply because SN xD ≤ K xp-var;[t k + 1 ,T ] for all k. From the geodesic p-var;[t k + 1 ,T ] error estimate it is clear that, with c2 = c2 (p, γ), k +1 θ z − zTk ≤ c2 exp (c2 ω (0, T )) × ω (tk , tk +1 ) T for θ > 1. In fact, the same argument as in Theorem 10.30 shows that θ can be taken to be (N + 1) /p. The proof is now easily finished.
10.4 Full RDE solutions
241
10.4 Full RDE solutions The RDEs we considered in the previous subsection map weak geometric p-rough paths to Re -valued paths of bounded p-variation. We shall now see that one can construct a “full” solution as a weak geometric p-rough path in its own right. This will allow us to use a solution to a first RDE to be the driving signal for a second RDE. Relatedly, RDE solutions can then be used (as integrators) in rough integrals, cf. Section 10.6 below. This is not only for “functorial” beauty of the theory! It is precisely this reasoning that will enable us later to deal with various derivatives of RDEs such as the Jacobian of an RDE flow. Let us also remark that in Lyons’ orginal work [116], existence and uniqueness was established by Picard iteration and so it was a necessity to work with “full” RDE solutions throughout.
10.4.1 Definition
Definition 10.34 Let x ∈ C p-var [0, T ] , G[p] Rd be a weak geometric p-rough path. We say that y ∈ C [0, T ] , G[p] (Re ) is a solution to the full rough differential equation (short: a full RDE solution) driven by x [p] e along the vector fields (Vi )i and started at y0 ∈ G (R ) if there exists n 1-var d [0, T ] , R such that (10.13) holds, and ODE a sequence (x )n in C solutions yn ∈ π (V ) (0, π 1 (y0 ) ; xn ) such that y0 ⊗ S[p] (y n ) converges uniformly to y when n → ∞. The (formal) equation dy = V (y) dx is referred to as a full rough differen tial equation (short: full RDE). This definition generalizes immediately to time intervals [s, T ] and we define π (V ) (s, ys ; x) ⊂ C [s, T ] , G[p] (Re ) to be the set of all solutions to the above full RDE starting at ys at time s, and in case of uniqueness, π (V ) (s, ys ; x) is the solution of the full RDE.9 A key remark about full RDEs (driven by x along vector fields V ) is that they are just RDEs (in the sense of the previous section) driven by x but along different vector fields as made precise in the next theorem. We have Theorem 10.35 Assume that (i) V = (Vi )1≤i≤d is a collection of vector fields in Lipγ −1 (Re ), where γ > p; (ii) x : [0, T ] → G[p] Rd is a weak geometric p-rough path; ∼ R1+e+···+ e [ p ] is an initial condition; (iii) y0 ∈ G[p] (Re ) ⊂ T [p] (Re ) = (iv) y is a solution of the full RDE driven by x, along (V ) , started at y0 . 9 Make sure to distinguish between Re -valued RDE solutions denoted by π (. . . ) and G [p ] (Re )-valued full RDE solutions denoted by the bold greek letter π (. . . ) .
Rough differential equations
242
∼ R1+e+···+ e [ p ] is a solution to Then, u → zu := yu ∈ G[p] (Re ) ⊂ T [p] (Re ) = the RDE dzu = W (z) dxu [p ] e [p] driven by x along Lipγlo−1 (Re ) ∼ = R1+e+···+ e c (R )-vector fields on T given by Wi (z) = z ⊗ Vi (π 1 (z)) , i = 1, . . . , d.
Proof. By definition of full RDEs and RDEs, we can assume that x =S[p] (x) where x is of bounded variation. Then, if y = π (V ) (0, y0 ; x) , dyu
=
d y0 ⊗ S[p] (y)0,u
=
y0 ⊗ S[p] (y)0,u ⊗ dy0,u
=
yu ⊗ V (yu ) dxu .
10.4.2 Existence Theorem 10.35 is useful as it immediately implies a local existence theorem for full RDEs. For global existence we need to rule out explosion and we do this via the following quantitative estimates for full RDEs. Theorem 10.36 Assume that (i) V = (Vi )1≤i≤d is a collection of vector fields in Lipγ −1 (Re ), where γ > p; (ii) (xn ) is a sequence in C 1-var [0, T ] , Rd , and x is a weak geometric p-rough path such that lim d0;[0,T ] S[p] (xn ) , x = 0 and sup S[p] (xn )p-var;[0,T ] < ∞; n →∞
n
(iii) y0n ∈ G[p] (Re ) is a sequence converging to some y0 . Then, at least along a subsequence, y0n ⊗ S[p] π (V ) (0, π 1 (y0n ) ; xn ) converges in uniform topology. Moreover, there exists a constant C1 depending on p, γ, such that for any such limit point y, we have p p yp-var;[s,t] ≤ C1 |V |Lip γ −1 xp-var;[s,t] ∨ |V |Lip γ −1 xp-var;[s,t] . (10.26) Then, if xs,t : [s, t] → Rd is any continuous bounded variation path such that t s,t s,t dxu ≤ K x Sγ x s,t = Sγ (x)s,t and p-var;[s,t] s
10.4 Full RDE solutions
243
for some constant K ≥ 1, we have for all s < t in [0, T ] and all k ∈ {1, . . . , %γ&} , (10.27) π k Sγ (y)s,t − Sγ π (V ) s, π 1 (ys ) ; xs,t s,t pk γ +k −1 ∨ K |V |Lip γ −1 xp-var;[s,t] ≤ C2 K |V |Lip γ −1 xp-var;[s,t] γ +k −1
where C2 depends on p and γ. (For t − s small, the term (. . . ) inates.)
dom-
Proof. Observe first that it is enough to prove the quantitative estimates for x = S[p] (x) , wherex is a bounded variation path, so that y = y0 ⊗ S[p] π (V ) (0, π 1 (y0 ) ; x) . Define the control ω by10 1/p
ω (s, t)
= K |V |Lip γ −1 xp-var;[s,t]
and consider the hypothesis (HN ) : 1/p
∃c1,N > 0 : ∀ s < t in [0, T ] : SN (y)p-var;[s,t] ≤ c1,N ω (s, t)
∨ ω (s, t) .
We aim to prove by induction that (HN ) holds for all N = 1, . . . , [p]. For N = 1, (HN ) follows from (Davie’s) Lemma 10.7. We now assume (HN ) for a fixed N < [p] , and aim to prove (HN +1 ) . Fix s and t in 1/p [0, T ] , define α := ω (s, t) ∨ ω (s, t) and observe that SN (y)p-var;[s,t] ≤ 1/p ≤ c1,N . c1,N ω (s, t) ∨ ω (s, t) = c1,N α is equivalent to SN y α
For u ∈ [0, t − s], define zu = α.SN +1
1 y α
p-var;[s,t]
s,s+u
so that (we could write dxs+ u instead of dxs+u in the next line) dzu = Wα ,y s (zu ) dxs+ u ,
z0 = (α, 0, . . . , 0) ∈ T N +1 (Re )
(10.28)
with vector fields Wα ,y s on T N +1 (Re ) given by (cf. notation of Theorem 10.35)
Wα ,y s
1 0 For
(ys + π 1 (z)) for z ∈ T N +1 (Re ) . ... (z) = 1 α z⊗Vd (ys + π 1 (z)) 1 α z⊗V1
the proof of the first estimate, (10.26), we take K = 1.
Rough differential equations
244
Observe that Wα ,y s (z) only depends on z through π 0,N (z); that is, it does not depend on π N +1 (z) . Then, for k = 0, . . . , N, 1 |π k (zu )| = sup π k ◦ SN y sup α α s,s+ u u ∈[0,t−s] u ∈[0,t−s] k 1 y ≤ c2,N sup SN α s,s+ u u ∈[0,t−s] ≤
c3,N from the induction hypothesis. Define Ω = z ∈ T N +1 (Re ) , α1 |π 0,N (z)| < c3,N + 1 and observe that {zu : u ∈ [0, t − s]} ⊂ Ω. On the other hand, for z ∈ Ω, ! 1 1 π 0,N (z) ⊗Vi (. . . ) ≤ (c3,N + 1) |Vi (. . . )| z⊗Vi (. . . ) = α α so that |Wα ,y s |Lip γ −1 (Ω) ≤ c5,N |V |Lip γ −1 . We can then find compactly ˜ α ,y which coincide with Wα ,y on Ω, and such supported vector fields W s s that ˜ Wα ,y s γ −1 ≤ c4 |Wα ,y s |Lip γ −1 (Ω) ≤ c5,N |V |Lip γ −1 . Lip
Moreover, since z|[0,t−s] remains in Ω, we see that z actually solves ˜ α .y (zu ) dxs+u , dzu = W s
z0 = (α, 0, . . . , 0) .
From Lemma 10.7, we have ˜ ˜ p p |zt−s −α| ≤ c6,N Wy s γ −1 xp-var;[s,t] ∨ Wy s γ −1 xp-var;[s,t] Lip (Ω) (Ω) Lip p p ≤ c7,N |V |Lip γ −1 xp-var;[s,t] ∨ |V |Lip γ −1 xp-var;[s,t] =c7,N α. This reads α δ α1 SN +1 (y)s,t − 1 ≤ c7,N α and, using Proposition 7.45, we have δ α1 SN +1 (y)s,t ≤ c7,N , which is equivalent to 1/p SN +1 (y)s,t ≤ c7,N α = c1,N +1 ω (s, t) ∨ ω (s, t) . This finishes the induction step and thus the proof of (10.26).
10.4 Full RDE solutions
245
For the proof of (10.27) we proceed similarly. We first write α.Sγ α1 y |[0,t−s] as solution to a differential equation of the form (10.28), but now with vector fields Wα ,y s on T γ (Re ). Applying the “geodesic approximation” error estimate from (Davie’s) Lemma 10.7 we see that 1 1 γ /p s,t y π (V ) s, ys ; x − αSγ , ≤ c1 ω (s, t) α.Sγ α s,t α s,t which means that for all k ∈ {1, . . . , %γ&} , γ /p . π k Sγ (y)s,t − Sγ π (V ) s, ys ; xs,t s,t ≤ c1 αk −1 ω (s, t) (10.29) 1/p Recalling that, by definition, α = ω (s, t) ∨ ω (s, t) leads to an estimate of the form (10.27) with right-hand side given by a constant times !
1/p
ω (s, t)
"γ +k −1
"γ +p(k −1) ! 1/p ∨ ω (s, t) .
"γ +k −1 ! 1/p Obviously, the term ω (s, t) dominates for ω (s, t) ≤ 1. For ω (s, t) ≥ 1 it is in fact better to estimate each term on the left-hand side of (10.29) separately, using in particular k k π k Sγ (y)s,t ≤ Sγ (y)s,t ≤ yp-var;[s,t] , thanks to estimates for the Lyons lift and (10.26). Remark 10.37 Observe that the (first part of the) above proof shows that 1/p 1/p = |V |Lip γ −1 for any fixed s < t, and α ≥ ω (s, t) ∨ ω (s, t) with ω (s, t) xp-var;[s,t] r ∈ [0, t − s] → zr = α δ 1/α y s,s+ r ∈ T [p] Rd ˜ α ,y (zr ) dxs+ r started from z0 = (α, 0, . . . , 0) along comsatisfies dzr = W s ˜ α ,y on T [p] (Re ) which satisfy pactly supported vector fields W s ˜ Wα ,y s γ −1 ≤ c5,N |V |Lip γ −1 Lip
1
˜ α ,y (z) ≡ z⊗Vj (ys + π 1 (z)) and W on z ∈ T [p] (Re ) , α1 |z| < c s α j =1,...,d for some c dependent on p.
10.4.3 Uniqueness and continuity Theorem 10.26 states uniqueness of RDE solutions, but ignores full RDEs. We observe that full RDE solutions are RDE solutions driven by different
246
Rough differential equations
vector fields. In particular, if we put ourselves under the same conditions as Theorem 10.26, we automatically obtain uniqueness of full RDEs. What is less obvious is that we have a similar Lipschitz bound for the (full) Itˆ o– Lyons map. Just as in the existence discussion, this estimate on full RDE solutions is actually a consequence of our earlier estimate on RDE solutions. Theorem 10.38 Assume that (i) V 1 = Vi1 1≤i≤d and V 2 = Vi2 1≤i≤d are two collections of Lipγ -vector fields on Re for γ > p ≥ 1; (ii) ω is a fixed control; (iii) x1 , x2 ∈ C p-var [0, T ] , G[p] Rd , with xi p−ω ≤ 1; of as initial conditions; (iv) y01 , y02 ∈ G[p] (Re ) thought time-0 (v) υ is a bound on V 1 Lip γ and V 2 Lip γ . For i = 1, 2 we set yi = π (V i ) 0, y0i ; xi ; that is, the full RDE solutions driven by xi , starting at y0 , along the vector fields V1i , . . . , Vdi . Then we have the following Lipschitz estimate on the (full) Itˆ o–Lyons map:11 ρp,ω y1 , y2 ! " ≤ C υ y01 − y02 + V 1 − V 2 Lip γ −1 + υρp,ω x1 , x2 exp (Cυ p ω (0, T )) , where C = C (γ, p) and y0i = π 1 y0i ∈ Re . 1/p
Proof. Fix s < t in [0, T ] , and define α = c2 ω (s, t) i exp (c2 ω (0, T )) ∈ R. . As noticed in Define for i = 1, 2 and r ∈ [0, t − s] , zri = α δ 1/α ys,s+r Remark 10.37, z i is the solution of the RDE ˜ i i zri dxis+r , dzri = W z0i = α, α ,y s ˜i ˜ αi ,y (z) ≡ 1 z⊗V i (ys + π 1 (z)) ≤ cυ, and W on where W α ,y s j α s j =1,...,d γ Lip i 1 [p] e i z ∈ T (R ) : α |z| < c for some c dependent on p. Writing yt = π 1 yt it is also easy to see that 2 2 ˜2 ˜ 1 1 V ys + . − V 1 ys1 + . γ −1 ≤ c Wα ,y s2 −W 1 α ,y s Lip Lip γ −1 2 ≤ c1 V − V 1 Lip γ −1 + c1 V 1 ys2 + . − V 1 ys1 + . Lip γ −1 ≤ c2 V 2 − V 1 Lip γ −1 + ys2 − ys1 .υ (10.30) 1 1 By definition of a full RDE solution y, any increment y s , t depends on the starting point y 0 only through y 0 = π 1 (y 0 ). This explains why we don’t have y 01 − y 02 = 1 y − y 2 [ p ] e on the right-hand side. 0 0 T (R )
10.4 Full RDE solutions
247
where we used crucially V 1 ∈ Lipγ in the last estimate. Therefore, we obtain from Theorem 10.26 that, with ε := ρp,ω ;[0,T ] x1 , x2 , 1 1 2 zt−s − zt−s ≤ c3 υ ys1 − ys2 + V 1 − V 2 Lip γ −1 + ευ ω (s, t) p exp (c3 υ p ω (0, T )) . But this says precisely that for all k = 1, . . . , [p] and s < t in [0, T ] , 1 k 2 ≤ c3 υ ys1 − ys2 + V 1 − V 2 γ −1 + ευ ω (s, t) p π k ys,t − ys,t Lip exp (c3 υ p ω (0, T )) .
(10.31)
∈T (R ) on the [We insist that one does not have the norm of right-hand side above, but ys1 − ys2 coming from (10.30).] But from Theo1/p rem 10.26, noting that ω (0, s) exp (Cυ p ω (0, T )) υ −1 exp (c4 υ p ω(0, T )), we have ! " υ ys1 − ys2 ≤ c4 υ y01 − y02 + V 1 − V 2 Lip γ −1 + ευ exp (c4 υ p ω (0, T )) . ys1
− ys2
N +1
e
Plugging this inequality into (10.31) gives us the desired result. The following corollaries can then be proved with almost identical arguments as in the RDE case. Corollary Assume that 10.39 (i) V 1 = Vi1 1≤i≤d and V 2 = Vi2 1≤i≤d are two collections of Lipγ -vector fields on Re for γ > p ≥ 1; (ii) x1 , x2 ∈ C p-var [0, T ] , G[p] Rd ; (iii) y01 , y02 ∈ G[p] (R e ) thought of time-0 initial conditions; (iv) υ is a bound on V 1 Lip γ and V 2 Lip γ and p a bound on x1 p-var;[0,T ] and x2 p-var;[0,T ] . Then, if yi = π (V i ) 0, y0i ; xi , we have % 1 y0 − y02 + 1 V 1 − V 2 γ −1 ρp-var;[0,T ] y1 , y2 ≤ Cvp Lip v " + ρp-var;[0,T ] δ 1 /p x1 , δ 1 /p x2 exp Cυ p pp , where C = C (γ, p) and y0i = π 1 y0i ∈ Re . For simplicity of the statement, we once again remove the dependency in the vector fields. Corollary 10.40 Let V = (Vi )1≤i≤d be a collection of Lipγ -vector fields on Re for γ > p ≥ 1. If ω is a control, p ≥ p and R > 0, the maps → C p-ω [0, T ] , G[p] (Re ) , dp ,ω Re × xp,ω ≤ R , dp ,ω (y0 , x)
→
π (V ) (0, y0 ; x)
Rough differential equations
248
and Re ×
and Re ×
xp-var ≤ R , dp -var
→
(y0 , x)
→
xp-var ≤ R , d∞
→
(y0 , x)
→
C p-var [0, T ] , G[p] (Re ) , dp -var
π (V ) (0, y0 ; x)
C p-var [0, T ] , G[p] (Re ) , d∞
π (V ) (0, y0 ; x)
are also uniformly continuous.
10.5 RDEs under minimal regularity of coefficients We now show the uniqueness of solutions to RDEs driven by geometric prough paths along Lipp -vector fields. In fact, the driving signal only needs to be of finite ψ p,1 -variation where we recall from Section 5.4 that ψ p,1 (t) = tp / (ln∗ ln∗ 1/t) where ln∗ ≡ max (1, ln) . For instance, with probability one, (enhanced) Brownian sample paths have finite ψ 2,1 -variation but are not geometric 2-rough paths (i.e. don’t have finite 2-variation), as will be discussed in the later Sections 13.2 and 13.9. The main interest in the refined regularity assumption is that it shows that RDE solutions driven by (enhanced) Brownian motion have unique solutions under Lip2 -regularity assumptions. Theorem Assume thatp ≥ 1, and that 10.41 (i) V 1 = Vj1 1≤j ≤d and V 2 = Vj2 1≤j ≤d are two collections of Lipp -vector fields on Re ; (ii) x1 , x2 ∈ C ψ p , 1 -var [0, T ] , G[p] Rd ; e (iii) y01 , y02 ∈ G[p] of as time-0 initial conditions; (Ri ) thought i (iv) y ∈ π (V i ) 0, y0 ; xi for i = 1, 2 (that is, they are full RDE solutions i driven by xi , starting at y0i , along i the vector fields V ); i (v) assume V Lip p ≤ υ and x ψ -var;[0,T ] ≤ R for i = 1, 2. p ,1 Then, π (V i ) 0, y0i ; xi is a singleton; that is, there exists a unique full RDE solution yi = π (V i ) 0, y0i ; xi starting at y0i driven by xi along V i . Moreover, for all ε > 0, there exists µ = µ (ε; p, υ, R) > 0 such that12 1 y0 − y02 + V 1 − V 2 p −1 + d∞ x1 , x2 < µ Lip implies d∞ y1 , y2 < ε. 1 2 For
p = 1, V 1 − V 2 L ip p −1 is replaced by 2 V 1 − V 2 ∞ .
10.5 RDEs under minimal regularity of coefficients
249
Remark 10.42 The theorem applies in particular to geometric p-rough paths, i.e. when x1 , x2 ∈ C p-var [0, T ] , G[p] Rd . It is also clear that under Lipp -regularity, existence of (full) is not an issue here so that RDE solutions the set of RDE solutions π (V i ) 0, y0i ; xi is not empty. Proof. Since we only deal with supremum distance d∞ and without quantitative estimates, it is enough to prove the above result for RDE rather than full RDE. Running constants c1 , c2 , . . . may depend on p, υ, R which 1 2 y , taken , y are kept fixed in this proof. Let ε(µ) be the supremum of d ∞ over all RDE solutions yi ∈ π (V i ) 0, y0i ; xi , i = 1, 2 such that V i , xi satisfy (v) and such that 1 y0 − y02 + V 1 − V 2 p −1 + d∞ x1 , x2 < µ; Lip we show that ε = ε (µ) → 0 as µ → 0. Construction of xis,t : For i = 1, 2, we define ω i (s, t) =
sup
D ∈D([s,t]) t ∈D i
ψ p,1
i xt i ,t i + 1 xi ψ p , 1 -var;[0,T ]
;
Proposition 5.39, ω i is a control with ω i (0, T ) ≤ 1. Define ω = from 1 2 ω +ω . 1/p Using the fact that ψ −1 with δ (z) = p,1 (·) ≤ c1 ψ 1/p,−1/p (·) = c1 δ (·) ∗ ∗ 1 z ln ln z (see Lemma 5.48), it follows (again using Proposition 5.39) that for all s < t in [0, T ], i xs,t ≤ c2 δ (ω (s, t))1/p . 1/p 1/p As δ (z) ≤ z 1/p for all p > p, we see that xi p -var;[s,t] ≤ c3 ω (s, t) . By interpolation, we obtain that for all p > p , there exists g1 , a continuous function 0 at 0 such that, for all s < t in [0, T ] and k = 1, . . . , [p] , 1 π k xs,t − x2s,t ≤ g1 (µ)ω (s, t)k /p . This immediately implies that for α < 1 (we can take it as close to 1 as we want, and in particular we take it greater than {p}) and for all s < t in [0, T ] , and for k = 1, . . . , [p] , 1 k + α −1 π k xs,t − x2s,t ≤ c3 g1 (µ) ω (s, t) p As δ δ (ω (s,t)) −1 / p xis,t ≤ c2 , and
≤
c3 g1 (µ) δ (ω (s, t))
k + α −1 p
.
(α −1)/p , δ δ (ω (s,t)) −1 / p x1s,t − δ δ (ω (s,t)) −1 / p x2s,t ≤ c3 g1 (µ) δ (ω (s, t))
Rough differential equations
250
Proposition 7.64 provides us with two paths x1,s,t and x2,s,t such that S[p] xi,s,t s,t = xis,t , and
t
≤ d xi,s,t r
1/p
c4 δ (ω (s, t))
,
s
t
≤ d x1,s,t − x2,s,t r r
α /p
c4 g1 (µ) δ (ω (s, t))
.
s
We define similarly xi,s,u and xi,t,u , and then xi,s,t,u to be the concatenation of xi,s,t and xi,t,u . Estimates on Γ: Following closely the pattern of proof of Theorem 10.26, we set for s < t, i − π s, ysi ; xi,s,t s,t , i = 1, 2, Γis,t = ys,t and Γs,t := Γ1s,t − Γ2s,t . Theorem 10.14 gives us for p ∈ (p, [p] + 1), i 1 Γs,t ≤ c5 xi [p]+
p -var;[s,t]
.
Using Proposition 5.49 and Theorem 5.43, we see that i i x ≤ c6 ψ −1 p,1 ω (s, t) p -var;[s,t] ≤
1/p
c7 δ (ω (s, t))
.
In particular, we have with θ = ([p] + 1) /p > 1, Γs,t ≤ c7 δ (ω (s, t))θ ≤ c8 δ (ω (s, t)) .
(10.32)
Define for i = 1, 2, Ai := π (V i ) s, ysi ; xi,s,t,u s,u − π (V i ) s, ysi ; xi,s,u s,u B i := π (V i ) t, yti ; xi,t,u t,u − π (V i ) t, π (V i ) s, ysi ; xi,s,t t ; xi,t,u t,u ¯ := B 1 − B 2 , we obtain Γs,u − Γs,t − Γt,u = A¯ + B ˜ so and A¯ := A1 − A2 , B that ¯ . Γs,u − Γs,t − Γt,u ≤ A¯ + B 1/p
From Lemma A, applied with parameters := c4 δ (ω (s, u)) α /p c4 g1 (µ) δ (ω (s, u)) , it follows that A¯ ≤ c9 ys1 − ys2 + V 1 − V 2 p −1 δ (ω (s, u)) Lip α /p
≤
[p ]+ α
+ c9 g1 (µ) δ (ω (s, u)) .δ (ω (s, u)) p c9 ys1 − ys2 + V 1 − V 2 Lip p −1 + g1 (µ) δ (ω (s, u)) .
, δ :=
10.5 RDEs under minimal regularity of coefficients
251
On the other hand, maxi=1,2 π (V ) s, ysi ; xs,t t − yti = maxi=1,2 Γis,t and with (10.32), Γs,t ≤ c8 δ (ω (s, t))θ ≤ c8 δ (ω (s, t)) .
(10.33)
From Lemma B it then follows that 1 ¯ ≤ c3 Γs,t δ (ω (s, u)) p + c3 g1 (µ) δ (ω (s, u))1+α /p B α /p + c3 yt1 − yt2 + V 1 − V 2 Lip γ −1 + g1 (µ) δ (ω (s, u)) 1
·δ (ω (s, u)) p
(1+p(m in(2,p)−1))
.
Observe that p1 (1 + p (min (2, p) − 1)) ≥ p1 min p + 1, p2 − 2p + 1 + p ≥ 1, and obviously 1 + α/p ≥ 1. Putting things together, we obtain 1 ¯ ≤ c12 Γs,t δ (ω (s, u)) p B + c12 yt1 − yt2 + V 1 − V 2 Lip p −1 + g1 (µ) .δ (ω (s, u)) . ¯ , we obtain that Adding the inequalities on A¯ and B 1 Γs,u − Γs,t − Γt,u ≤ c13 Γs,t δ (ω (s, u)) p (10.34) + c13 max yr1 − yr2 + V 1 − V 2 Lip p −1 + g1 (µ) δ (ω (s, u)) . r ∈{s,t}
We also have, from Theorem 3.18, that π (V 1 ) s, ys1 ; x1,s,t − π (V 2 ) s, ys2 ; x2,s,t 1/p α /p ≤ c14 ys1 − ys2 + V 1 − V 2 Lip p −1 δ (ω (s, t)) + g1 (µ) ω (s, t) α /p . (10.35) ≤ c14 ys1 − ys2 + V 1 − V 2 p −1 + g1 (µ) ω (s, t) Lip
Conclusion: Thanks to estimates (10.34), (10.35) and (10.33), we can apply Proposition 10.70 (found in Appendix 10.8) to obtain our result. By interpolation, we obtain the following corollary. Corollary 10.43 Let V = (V1 , . . . , Vd ) be a collection of Lipp -vector fields on Re . Fix p > p > p > 1, a control ω, R ∈ (0, ∞] and set13 Ωp -ω (p, R) = x : xp -ω < R and xψ p , 1 -var < R . Ω (p, R) = x : xψ p , 1 -var < R . 1 3 In
general, C ψ p , 1 -va r C p
-ω .
Rough differential equations
252
Then the maps Re × (Ωp -ω (p, R) , dp -ω ) → C ψ p , 1 -var ∩ C p -ω [0, T ] , G[p] (Re ) , dp -ω (y0 , x) → π (V ) (0, y0 ; x) and Re × (Ω (p, R) , dp -var )
→
(y0 , x) →
C ψ p , 1 -var [0, T ] , G[p] (Re ) , dp -var
π (V ) (0, y0 ; x)
are uniformly continuous for R ∈ (0, ∞) and continuous for R = +∞. (This also holds when dp -var is replaced by d∞ .) Proof. Since continuity is a local property, it suffices to consider the case R < ∞. Observe that for all x ∈ Ω (p, R), we have π (V ) (0, y0 ; x)
ψ p , 1 -var
< c(R),
as follows from Corollary 5.44 and Proposition 5.49. Then, we know from Theorem 10.41 that Re × (Ω (p, R) , d∞ ) → C ψ p , 1 -var [0, T ] , G[p] (Re ) , d∞ (y0 , x) →
π (V ) (0, y0 ; x)
is uniformly continuous. In particular, this that for implies some fixed ε, R > 0, there exists µ > 0 such that if y01 − y02 + dp -var x1 , x2 < µ, with x1 , x2 ∈ Ω (p, R), then d∞ π (V ) 0, y01 ; x1 , π (V ) 0, y02 ; x2 < ε. Using π (V ) 0, y0i ; xi ψ -var < c(R) we can use interpolation to obtain p ,1 uniform continuity of Re × (Ω (p, R) , dp -var )
→
(y0 , x) →
C ψ p , 1 -var [0, T ] , G[p] (Re ) , dp -var
π (V ) (0, y0 ; x) .
The modulus ω case follows from the same argument, except that for the interpolation step, we need to assume that the p -ω norm of xi , i = 1, 2 is bounded by R (since, for a given ω, we cannot be sure that the ψ p,1 variation of xi is controlled by this ω).
10.6 Integration along rough paths
253
10.6 Integration along rough paths With our main interest in (rough) differential equations we constructed RDEs directly as limits of ODEs. In the same spirit, we now define “rough integrals” as limits of Riemann–Stieltjes integration. Given the work already done, we can take a short-cut and derive existence, uniqueness and continuity properties quickly from the previous RDE results. Definition 10.44 (rough integrals) Let x ∈ C p-var [0, T ] , G[p] Rd be a weak geometric p-rough path, and ϕ = (ϕi )i=1,...,d a collection of maps from Rd to Re . We say that y ∈ C [0, T ] , G[p] (Re ) is a rough path integral of ϕ along x, if there exists a sequence (xn ) in C 1-var [0, T ] , Rd such that ∀n : xn0 = π 1 (x0 ) lim d0;[0,T ] S[p] (xn ) , x = 0 n →∞ sup S[p] (xn )p-var;[0,T ] < ∞ n
. ϕ (xnu ) dxnu , y = 0. lim d∞ S[p]
and
n →∞
0
We will write ϕ (x) dx for the set of rough path integrals of ϕ along x. If this set is a singleton, it will denote the rough path integral of ϕ along x.
· We first note that a classical indefinite Riemann–Stieltjes integral 0 ϕ (x) dx can be written as a (projection to the y-component of the) ODE solution to dz = dx, dy = ϕ (z) dx, (z, y) = (x0 , 0). This leads to the key remark that rough integrals can be viewed as (projections of) a solution to (full) RDEs driven by x along vector fields V = (V1 , . . . , Vd ) given by (10.36) Vi (x, y) = (ei , ϕi (x)) , where (e1 , . . . , ed ) is the standard basis of Rd . Obviously V has the same amount of Lip-regularity as ϕ = (ϕi )i=1,...,d , viewed as a map ϕ : Rd → L Rd , Re ; in fact, |V |Lip γ −1 1 + |ϕ|Lip γ −1 . Thus, if one is happy with rough integration under Lipγ -regularity, γ > p, existence, uniqueness and uniform
254
Rough differential equations
continuity on bounded sets (in fact: Lipschitz continuity on bounded sets with respect to the correct rough path metric ρp-var ) of → C p-var [0, T ] , G[p] (Re ) Lipγ × C p-var [0, T ] , G[p] Rd (ϕ, x) → ϕ (x) dx is immediate from the corresponding results on full RDEs in Section 10.4. The point of the forthcoming theorem is that for rough path integration, one gets away with Lipγ −1 regularity. As can be seen already from
the next lemma, there is no hope for local Lipschitz continuity of (ϕ, x) → ϕ (x) dx, but uniqueness and uniform continuity on bounded sets hold true. Lemma 10.45 Let (i) ϕ1j j =1,...,d , ϕ2j j =1,...,d be two collections of Lipα Rd , Re -maps, α ∈ (0, 1], so that maxi=1,2 ϕi Lip α ≤ 1; (ii) xi ∈ C 1-var [s, t] , Rd such that, for some ≥ 0 and ε ∈ [0, 1] , i x ≤ , 1-var;[s,t] 2 1 x − x ≤ ε and x2s − x1s ≤ ε. 1-var;[s,t] Then, for k ∈ {1, 2, . . . } , 2 2 1 1 2 1 ϕ x dx ϕ x dx − π k Sk π k Sk s,t s,t 2 ≤ ϕ − ϕ1 ∞ + εα k . Proof. From Proposition 2.8, we easily see that ϕ2 x2 dx2 − ϕ1 x1 dx1 ≤ ϕ2 − ϕ1 ∞ + εα . 1-var;[s,t]
The conclusion follows then from Proposition 7.63. Lemma that 10.46 (Lemma Aintegral ) Assume (i) ϕj j =1,...,d is a collection of Lipγ −1 Rd , Re -maps where γ > 1; (ii) s < u are some elements in [0, T ]; (iii) x and x ˜ are some paths in C 1-var [s, u] , Rd such that Sγ (x)s,u = x)s,u and xs = x ˜s ; Sγ (˜
u
u x|. (iv) ≥ 0 is a bound on s |dx| + s |d˜ Then we have, for some constant C = C (γ) and for all k ∈ {1, . . . , %γ&} , k ϕ (x) dx ϕ (˜ x) d˜ x −π k ◦ Sk ≤ C ϕLip γ −1 γ +k −1 . π k ◦ Sk s,u s,u
10.6 Integration along rough paths
255
· Proof. The indefinite Re -valued integral s ϕ (x) dx is also a (projection of the) solution to the ODE y· = π (V ) (s, (xs , 0) ; x) with V given by (10.36). By homogeneity, we may assume ϕLip γ −1 = 1 so that |V |Lip γ −1 ≤ c1 . We leave it to the reader to provide a self-contained “Riemann–Stieltjes/ODE” proof of the claimed estimate. Somewhat shorter, if more fanciful, we can think of y as a solution (again, strictly speaking, a projection thereof) of an RDE driven by a weak geometric 1-rough path x. The claimed estimate is now a consequence of the “geodesics error-estimate” of Theorem 10.36 with p = 1 and K = 1. Indeed, we take a geodesic xs,u associated with Sγ (x)s,u such that x)s,u and Sγ (xs,u )s,u = Sγ (x)s,u = Sγ (˜ u |dxs,u | ≤ Sγ (x)1-var;[s,u ] = |x|1-var;[s,u ] . s
˜s we see that both y· = π (V ) (s, (xs , 0) ; x) and y˜· = π (V ) Since xs = x (s, (˜ xs , 0) ; x ˜) have the same geodesic approximation given by π (V )(s, (xs , 0) ; xs,u ). On the “projected” level of the integral this amounts to consid· s,u ering s ϕ (xs + xs,u s,· ) dx· ) and so, for all s < u in [0, T ] and for all k ∈ {1, . . . , %γ&} , · · s,u s,u ϕ (x) dx − Sγ ϕ xs + xs,· dx· π k Sγ s s s,u s,u γ + k −1 ≤ c2 γ + k −1 . ≤ c2 |x|1-var;[s,u ] The same estimate holds with x replaced by x ˜ and an application of the triangle inequality finishes the proof. Theorem Existence: Assume that 10.47 (i) ϕ = ϕj j =1,...,d is a collection of Lipγ −1 Rd , Re -maps where γ > p ≥ 1; (ii) x is a geometric p-rough path in C p-var [0, T ] , G[p] Rd . Then, for all s < t ∈ [0, T ] , there
exists a unique rough-path integral of ϕ along x. The indefinite integral ϕ (x) dx is a geometric rough path: there exists a constant C1 depending only on p and γ such that for all s < t in [0, T ], p ϕ (x) dx x . ≤ C ϕ ∨ x γ −1 1 Lip p-var;[s,t] p-var;[s,t] p-var;[s,t]
Also, if xs,t : [s, t] → Rd is any continuous bounded variation path such that t s,t s,t s,t dxu ≤ K x xs = π 1 (xs ) , Sγ x s,t = Sγ (x)s,t and p-var;[s,t] s
Rough differential equations
256
for some constant K, we have for all s < t in [0, T ] with K xp-var;[s,t] < 1, and all k ∈ {1, . . . , %γ&} , # $ s,t s,t ϕ xu dxu ϕ (x) dx − Sγ π k s,t s,t γ +k −1 k ≤ C2 ϕLip γ −1 K xp-var;[s,t] , (10.37) where C2 depends on p and γ.
Uniqueness, continuity: There exists a unique element in ϕ (x) dx. More precisely, if ω is a fixed control, max ϕi Lip γ −1 , xi p-ω ;[0,T ] < R, i=1,2
and
ε = x10 − x20 + ρp-ω ;[0,T ] x1 , x2 + ϕ2 − ϕ1 Lip γ −1 ,
then for some constant β = β (γ, p) > 0 and C = C (R, γ, p) ≥ 0, 1 1 2 2 1 2 ϕ x dx , ϕ x dx ≤ Cεβ . ρp-ω ;[0,T ] Proof. Existence, first proof: For some fixed r > 0, consider the collection of vector fields V (r ) defined by (r )
Vi
(x, y) = (ei , rϕi (x)) for x, y ∈ Rd × Re ,
where ei is the ith standard basis vector of Rd . Observe that V (r ) is (γ − 1)Lipschitz, and that for a bounded variation path we have . rϕ (xu ) dxu . π (V ( r ) ) (0, (x0 , 0) ; x) = x, 0
In view of (γ − 1)-Lipschitz regularity of the vector fields we can apply Theorem 10.36; we obtain existence of a rough integral of x along ϕ, and that every element y(r ) of rϕ (x) dx satisfies (r ) y
p-var;[s,t]
≤ c1
(r ) V
Lip γ −1
p xp-var;[s,t] ∨ V (r )
Lip γ −1
p xp-var;[s,t] .
Now, setting y = y(1) it is easy to see that y(r ) = δ r y and V (r ) Lip γ −1 ≤ c2 1 + r |ϕ|Lip γ −1 ; thus, picking r = 1/ |ϕ|Lip γ −1 , we obtain p yp-var;[s,t] ≤ c3 |ϕ|Lip γ −1 xp-var;[s,t] ∨ xp-var;[s,t] .
10.6 Integration along rough paths
257
The error estimate (10.37) is also a consequence of the corresponding estimate in Theorem 10.36, again applied to full RDEs along vector fields V (r ) with r = 1/ |ϕ|Lip γ −1 . Existence, second proof: We provide a second existence proof, based on by now standard arguments, which is (notationally) helpful for the forthcoming uniquess proof. Similar to the proof of RDE existence, it suffices to establish uniform estimates for x of the form x = S[p] (x), followed by a straightforward limiting argument. Let us thus assume x = S[p] (x) and 1/p xs,t ≤ ω (s, t) for some control ω. By assumption, there exists xs,t , which we think of as an “almost” geodesic path associated with xs,t , and we define ϕ (x) dx ϕ xs,t dxs,t − Sγ . Γs,t = Sγ s,t
s,t
Then, for s < t < u, we have Γs,u = ∆1 − ∆2 + ∆3 where ∆1
=
∆2
=
Sγ
∆3
=
Sγ
⊗ Sγ
ϕ (x) dx
Sγ
s,t
ϕ xs,t dxs,t
ϕ (x) dx
⊗ Sγ
ϕ x
s,t
ϕ xs,t,u dxs,t,u
, t,u
t,u
dx
t,u
− Sγ
, t,u
ϕ (xs,u ) dxs,u
.
s,u
s,u
First, using Lemma Aintegral (Lemma 10.46), we obtain that for all N ∈ {1, . . . , %γ&} |π N (∆3 )| ≤ c1 ω (s, u) We also see that ϕ (x) dx ∆1 − ∆2 = Sγ
γ + N −1 p
.
(10.38)
⊗ Γt,u + Γs,t ⊗ Sγ
ϕ x
t,u
dx
t,u
s,t
t,u
so that Γs,u − Γs,t ⊗ Γt,u equals ϕ xs,t dxs,t ϕ xt,u dxt,u Sγ ⊗ Γt,u + Γs,t ⊗ Sγ s,t
+ ∆3 . t,u
(10.39) We now prove by induction in N ∈ {1, . . . , %γ&} that ∀s < u in [0, T ] : |π N (Γs,u )| ≤ c2 ω (s, u)
γ + N −1 p
1/p . exp c2 ω (s, u)
Rough differential equations
258
For N = 0, this is obvious as π 0 (Γs,t ) = 0. Assume now that γ + k −1 1/p ∀s < u in [0, T ] : |π k (Γs,u )| ≤ c2 ω (s, u) p exp c2 ω (s, u) holds for all k < N . Then, from (10.39), we have π N (Γs,u ) − π N (Γs,t ) − π N (Γt,u ) =
π N (∆3 ) +
N −1 k =1
+
N −1
π k ◦ Sγ
π k (Γs,t ) ⊗ π N −k (Γt,u )
ϕ x
s,t
dx
N −1
⊗ π N −k (Γt,u )
s,t s,t
k =1
+
(10.40)
π k (Γs,t ) ⊗ π N −k ◦ Sγ
ϕ xt,u dxt,u
, t,u
k =1
so that, using the induction hypothesis, (10.38) and bounds of the type s,t s,t k /p dx ϕ x ≤ c2 ω (s, t) , π k ◦ Sγ s,t we have14
1/p . exp c2 ω (s, u) (10.41) We can then classically use Lemma 10.59 to obtain that γ + N −1 1/p , |π N (Γs,u )| ≤ c4 ω (s, u) p exp c4 ω (s, u) |π N (Γs,u ) − π N (Γs,t ) − π N (Γt,u )| ≤ c3 ω (s, u)
γ + N −1 p
which concludes the induction proof. The triangle inequality then leads to, for k ∈ {1, . . . , %γ&}, k /p 1/p , ϕ (x) dx ≤ c4 ω (s, t) exp c4 ω (s, t) π k Sγ s,t which is equivalent to saying 1/p 1/p . ϕ (x) dx Sγ ≤ c5 ω (s, t) exp c5 ω (s, t) s,t 1 4 As we shall need (10.41) in the uniqueness/continuity part below, let us point out that (10.41) also follows from the “first” existence proof, namely from the error estimate (10.37).
10.6 Integration along rough paths
259
An application of Proposition 5.10 then gives 1/p ϕ (x) dx Sγ ≤ c6 ω (s, t) ∨ ω (s, t) . s,t Continuity/uniqueness: Without loss of generality, we assume %γ& = [p] , and also, by simple scaling, that maxi=1,2 ϕi Lip γ −1 ≤ 1 (so that we can use Lemma 10.45). We set ε = x10 − x20 + ρp-ω ;[0,T ] x1 , x2 + ϕ2 − ϕ1 Lip γ −1 and agree that, in this part of the proof, constants may depend on ω (0, T ) and R. Then we define (exactly as in the proof of Theorem 10.26) two paths x1,s,t and x2,s,t such that S[p] xi,s,t s,t = xis,t , i = 1, 2, and such that
i,s,t dxr
≤
c7 ω (s, t)
1,s,t dxr − dx2,s,t r
≤
c7 εω (s, t)
t
1/p
, i = 1, 2,
s
t
1/p
.
s
We also define (as usual!) Γis,t =
ϕi xir dxir
− S[p]
i,s,t dxr ϕi xi,s,t r
s,t
, i = 1, 2, s,t
and set Γs,t := Γ1s,t − Γ2s,t . From the existence part, we have Γs,t ≤ γ /p
c8 ω (s, t)
. Define
# ∆3 =
Sγ #
−
ϕ1 x1 , s , t , u dx1 , s , t , u
− Sγ s,u
Sγ
ϕ
2
x
2,s,t,u
dx
2,s,t,u
$
s,u
− Sγ
s,u
ϕ1 x1 , s , u dx1 , s , u
ϕ
2
x
2,s,u
dx
2,s,u
$ .
s,u
Rough differential equations
260
Observe that by continuity of the Riemann–Stieltjes integral and the map Sγ , we have for all integers k ∈ {1, . . . , [p]}
≤
π k ∆3 ϕ2 x2,s,t,u dx2,s,t,u π k ◦ Sγ s,u ϕ1 x1,s,t,u dx1,s,t,u − π k ◦ Sγ
s,u
2,s,u 2,s,u 2 dx ϕ x + π k ◦ Sγ s,u ϕ1 x1,s,u dx1,s,u − π k ◦ Sγ s,u m in{γ −1,1} ≤ 2 ϕ2 − ϕ1 Lip γ −1 + ρp,ω x1 , x2 k /p
ω (s, t) ≤
using Lemma 10.45
k /p
α
ε ω (s, t)
for some α ∈ (0, 1].
(10.43)
We now prove by induction on N ≤ [p] that ∀k ∈ {1, . . . , N } : ∃β k > 0 : π k Γs,t ≤ εβ k ω (s, t) . For N = 0, it is obvious, as π 0 Γs,t = 0. Assume now the induction hypothesis is true for all k < N. From equation (10.40), we see that π N Γs,u − π N Γs,t − π N Γt,u is equal to ∆3 + D1 + D2 + D3 , where
D1
=
N −1
−1 N π k Γs,t ⊗ π N −k Γ2t,u − π k Γ1s,t ⊗ π N −k Γt,u
k =1
D2
=
N −1
Sγ
πk
k =1
+
k =1
πk
ϕ x
2,s,t
ϕ x1,s,t dx1,s,t
− Sγ N −1
k =1
Sγ
dx
2,s,t
s,t
⊗ π N −k Γ2t,u
s,t
ϕ x
1,s,t
dx
1,s,t s,t
⊗ π N −k Γt,u
10.6 Integration along rough paths
D3
=
N −1
π k Γs,t ⊗ π N −k ◦ Sγ
k =1
−
N −1 k =1
−Sγ
πk
Γ1t,u
ϕ x2,t,u dx2,t,u
t,u
⊗ π N −k
261
ϕ x
Sγ
ϕ x1,t,u dx1,t,u
2,t,u
dx
t,u
2,t,u
. t,u
We easily see from the induction hypothesis and Lemma 10.45 that N /p
|D1 | + |D2 | + |D3 | ≤ c9 εα N ω (s, t)
, for some αN ∈ (0, α].
In particular, with (10.43), we obtain that π N Γs,u − Γs,t − Γt,u ≤ c10 εα N ω (s, t)N /p . From the existence part, equation (10.41), we also have π N Γs,u −Γs,t −Γt,u ≤ π N Γ2s,u −Γ2s,t −Γ2t,u + π N Γ1s,u −Γ1s,t −Γ1t,u ≤ c11 ω (s, t)
γ + N −1 p
θ
≤ c12 ω (s, t) with θ = γ/p > 1.
These estimates show that assumption (ii) of Lemma 10.61 is satisfied (checking assumption (i) is easy and left to the reader); it then follows that for some β N ∈ (0, 1], π N Γs,t ≤ εβ N ω (s, t) and so the induction step is completed. Putting this last inequality and Lemma 10.45 together, we see that for k ≤ [p] , we have from the triangle inequality 2 2 1 1 k /p 2 1 ϕ x dx − ϕ x dx ≤ εm in(β N ,γ −1) ω (s, t) . π k s,t s,t The proof is now finished. As a consequence of Theorem 10.47, we have Corollary 10.48 Let ϕi : Rd → Re , i = 1, . . . , d be some (γ − 1)-Lipschitz maps where γ > p. For any fixed control ω, R > 0 and p ≤ p , the maps → C p-ω [0, T ] , G[p] (Re ) , dp -ω Lipγ −1 × xp-ω ≤ R , dp ,ω (ϕ, x) → ϕ (x) dx
Rough differential equations
262
and Lipγ −1 ×
and Lipγ −1 ×
xp-var ≤ R , dp -var → C p-var [0, T ] , G[p] (Re ) , dp -var (ϕ, x) → ϕ (x) dx
xp-var ≤ R , d∞
→
(ϕ, x)
→
C p-var [0, T ] , G[p] (Re ) , d∞ ϕ (x) dx
are uniformly continuous.
d e R ,R , i = Exercise 10.49 Extend Theorem 10.47 to ϕi ∈ Lipγlo−1 c 1, . . . , d. ˜ ˜ i ∈ Lipγ −1 Rd , Re such that ϕ ≡ ϕ Solution. It suffices to replace ϕi by ϕ on a ball with radius |x|∞;[0,T ] + 1.
10.7 RDEs driven along linear vector fields In various applications, for instance when studying the “Jacobian of the flow”, one encouters RDEs driven along linear vector fields. Since linear vector fields are unbounded, we can, at this stage, only assert uniqueness and local existence (cf. Theorem 10.21). The aim of the present section is then to establish global existence with some precise quantitative estimates. Exercise 10.50 (linear vector fields) Let us consider a collection V = (Vi )i=1,...,d of linear vector fields of the form Vi (z) = Ai z + bi for e × e matrices Ai and elements bi of Re . Prove (by induction) that V N ;i 0 ,...,i N (z) ≡ Vi 0 . . . Vi N (z) = Ai N . . . Ai 0 z + Ai N . . . Ai 1 bi 0 . Lemma 10.51 (Lemma Alinear ) Assume that (i) V = (Vi )1≤i≤d with Vi (z) = Ai z are a collection of linear vector fields, and fix N ∈ N; (ii) s, t are some elements of [0, T ]; x)s,t ; (iii) x, x ˜ are some paths in C 1-var [s, t] , Rd such that SN (x)s,t = SN (˜
t
t (iv) is a bound on s |dxu | and s |d˜ xu | and υ is a bound on maxi (|Ai |) . Then, N +1 π (V ) (s, ys ; x)s,t − π (V ) (s, ys ; x ˜)s,t ≤ C [υ] exp (Cυ) .
10.7 RDEs driven along linear vector fields
263
Proof. Define for all r ∈ [s, t] , ys,r = π (V ) (s, ys ; x)s,r ; we saw in Theorem 3.7 that |ys,r | ≤ c (1 + |ys |) υ exp (cυ) . From Proposition 10.3, we have π (V ) (s, ys ; x)s,t − E(V ) ys , SN (x)s,t t i1 iN (V . . . V I (y ) − V . . . V I (y )) dx . . . dx ≤ i1 iN r i1 iN s r r s
i 1 ,...,i N ∈{1,...,d}
t i1 iN ys,r dxr . . . dxr ≤ Ai N . . . Ai 1 s
≤
c (1 + |ys |) [υ]
N +1
exp (cυ) .
˜)s,t and π (V ) (s, ys ; x)s,t share the same Euler approximation, As π (V ) (s, ys ; x the triangle inequality finishes the proof. Equipped with this result and the Lipschitzness of the flow for ordinary differential equations (Lemma 12.5), we are ready to obtain a version of Lemma 10.7 for linear vector fields. Lemma 10.52 (Davielinear ) Let p ≥ 1. Assume that (i) V = (Vi )1≤i≤d is a collection of linear vector fields defined by Vi (z) = Ai and elements of Re bi ; Ai z + bi , for some e × e matrices 1-var [0, T ] , Rd , and x := S[p] (x) its canonical lift to (ii) x is a path in C a G[p] Rd -valued path; (iii) y0 ∈ Re is an initial condition; (iv) υ is a bound on maxi (|Ai | + |bi |). Then there exists a constant C depending on p (but not on the 1-variation norm of x) such that for all s < t in [0, T ] , p π (V ) (0, y0 ; x)s,t ≤ C (1 + |y0 |) υ xp-var;[s,t] exp Cυ p xp-var;[0,T ] .
Proof. To simplify the notation we set y = π (V ) (0, y0 ; x) and N := [p], a control function on [0, T ] is defined by p
ω (s, t) := υ p xp-var;[s,t] . For every s < t in [0, T ] we define xs,t as a geodesic path associated with xs,t = SN (x)s,t , i.e.
s,t
SN x
s,t
= SN (x)s,t
and s
t
s,t dx = x
p-var;[s,t]
Rough differential equations
264
and we note that, as in (10.7), we have t t s,t dx ≤ |dx| . s
(10.45)
s
Let us now fix s < t < u in [0, T ] and define xs,t,u := xs,t xt,u , the concatenation of xi,s,t and xi,t,u . Following the by now classical pattern of proof, first seen in (Davie’s) Lemma 10.7, we set Γs,t = π (V ) (s, ys ; x)s,t − π (V ) s, ys ; xs,t s,t and observe Γs,u − Γs,t − Γt,u = A + B where A = π s, ys ; xs,t,u s,u − π (s, ys ; xs,u )s,u B = π t, yt ; xt,u t,u − π t, yt + Γs,t ; xt,u t,u . Lemma 10.51 (Lemma Alinear ) was tailor-made to estimate A and gives (N +1)/p 1/p |A| ≤ c1 (1 + |ys |) ω (s, u) . exp c1 ω (s, u) On the other hand, from Theorem 3.8, we have 1/p 1/p |B| ≤ c2 |Γs,t | ω (t, u) exp c2 ω (t, u) and so |Γs,u | ≤ |Γs,t | e2c 2 ω (s,u )
1/p
(N +1)/p
+ |Γt,u | + c1 (1 + |ys |) ω (s, u)
Another application of Lemma Alinear with ω ˜ (s, t) := |x|1-var;[s,t] we have lim
sup
1/p
ec 1 ω (s,u ) . (10.46) combined with (10.45) shows that
r →0 s,t s.t ω ˜ (s,t)≤r
|Γs,t | = 0. r
(10.47)
Also, ODE estimates give that for all s, t ∈ [0, T ] , we have 1/p 1/p π (V ) s, ys ; xs,t s,t ≤ c3 (1 + |ys |) ω (s, t) ec 2 ω (s,t) .
(10.48)
Inequalities (10.46), (10.47) and (10.48) allow us to use (the analysis) Lemma 10.63 (from the appendix to this chapter) to see that |y|∞;[0,T ] ≤ c4 (1 + |y0 |) exp (c4 ω (0, T )) , and that for all s, t ∈ [0, T ], θ
|Γs,t | ≤ c4 (1 + |y0 |) ω (s, t) exp (c4 ω (0, T )) .
(10.49)
10.7 RDEs driven along linear vector fields
265
The triangle inequality then provides that for all s, t ∈ [0, T ], 1/p
|ys,t | ≤ c5 (1 + |y0 |) ω (s, t)
exp (c5 ω (0, T )) .
We now give the appropriate extension to (full) RDEs. Theorem 10.53 Assume that (i) V = (Vi )1≤i≤d is a collection of linear vector fields defined by Vi (z) = e Ai z + bi , for some e × e matrices Ai and elements of R [p]bi; d p-var [0, T ] , G R ; (ii) x is a weak geometric rough path in C [p] e (iii) y0 ∈ G (R ) is an initial condition; (iv) υ is a bound on maxi (|Ai | + |bi |). Then there exists a unique full RDE solution π (V ) (0, y0 ; x) on [0, T ]. Moreover, there exists a constant C depending on p such that for all s < t in [0, T ] , we have p π (V ) (0, y0 ; x)s,t ≤ C (1 + |y0 |) υ xp-var;[s,t] exp Cυ p xp-var;[0,T ] . Proof. We only have to prove the above estimate for a bounded variation path, the final result following a now classical limiting procedure. 1 Write y0 = π 1 (y0 ) , and define y˜ = 1+|y π (V ) (0, y0 ; x). Observe that 0| using the linearity of the vector fields, we have y0 y˜ = π (V ) 0, ;x 1 + |y0 | = π (V ) (0, y˜0 ; x) . From (the proof of) Lemma 10.52 it follows that p y0 |) exp c1 υ p xp-var;[0,T ] |˜ y |∞;[0,T ] ≤ c1 (1 + |˜ p ≤ 2c1 exp c1 υ p xp-var;[0,T ] ≡ R. For any vector field V˜ such that V ≡ V˜ on Ω = {y ∈ Re : |y| < R + 1} we have π (V ) (0, y˜0 ; x) ≡ π (V˜ ) (0, y˜0 ; x) . Moreover, we can (and will) take V˜ such that ˜ V γ ≤ c1 |V |Lip γ (Ω) ≤ |V |∞;Ω + |V |∞;Ω ≤ υ (R + 2) . Lip
It then suffices to use the estimate of Theorem 10.36, applied to the full RDE with vector fields V˜ driven by S[p] (x), to see that p p S[p] π (V ) (0, y˜0 ; x) s,t ≤ c2 V˜ γ −1 xp-var;[s,t] ∨ V˜ γ −1xp-var;[s,t] Lip Lip p p ≤ c3 υ xp-var;[s,t] exp c3 υ xp-var;[0,T ] .
266
Rough differential equations
This implies that S[p] π (V ) (0, y0 ; x) s,t = (1 + |y0 |) S[p] π (V ) (0, y˜0 ; x) s,t p ≤ C(1+ |y0 |) υxp-var;[s,t] exp Cυ p xp-var;[0,T ] .
Remark 10.54 This estimate shows in particular that (full) solutions to linear RDEs have growth controlled by p exp (const) × xp-var;[0,T ] which has implications on the integrability of such a solution when the driving signal x is random. It is therefore interesting to know that this estimate cannot be improved and the reader can find a construction of the relevant examples in [58]. Exercise 10.55 Assume x ∈ C p-var [0, T ] , G[p] Rd drives linear vector fields as in Theorem 10.53 above. If x is controlled by a fixed control ω in the sense that ∀0 ≤ s < t ≤ T : υ xp-var;[s,t] ≤ ω (s, t)
1/p
,
the conclusion of Theorem 10.53 can be written as 1/p π (V ) (0, y0 ; x)s,t ≤ C (1 + |y0 |) ω (s, t) exp (Cω (0, T )) , valid for all s < t in [0, T ], where any dependence on p, υ has been included in the constant C. Show that an estimate of this exact form remains valid, if the assumption on x is relaxed to 1/p
∀0 ≤ s < t ≤ T : υ xp-var;[s,t] ≤ ω (s, t)
∨ ω (s, t) .
(10.50)
The importance of this exercise comes from the fact (cf. Theorem 10.36) that (10.50) is a typical estimate for solutions of full RDEs, i.e. when x itself arises as the solution to a full RDE along Lipγ -vector fields, γ > p. 1/p
1/p
= ω (s, t) ∨ ω (s, t). For Solution. Assume (10.50) and write ω ˜ (s, t) s, t : ω (s, t) ≤ 1, Theorem 10.53 then gives 1/p π (V ) (0, y0 ; x)s,t ≤ C (1 + |ys |) ω (s, t) exp (Cω (s, t)) 1/p ≤ C 1 + |y|∞;[0,T ] ω (s, t) exp (Cω (0, T )) and we are done if we can show that |y|∞;[0,T ] ≤ c (1 + |y0 |) exp (cω (0, T )) .
10.7 RDEs driven along linear vector fields
267
From equation (10.49) (now applied with ω ˜ !) this estimate follows from (the p analysis) Lemma 10.63 but since ω ˜ (0, T ) = ω (0, T ) for large ω (0, T ), this is not good enough. However, from Remark 10.64 we can do a little better and get |y|∞;[0,T ] ≤ c (|y0 | + ε) exp c
sup D =(t i )⊂[0,T ] such that ω (t i ,t i + 1 )≤1 for all i
ω ˜ (ti , ti+1 ) .
i
But since ω ˜ ≡ ω, when ω ≤ 1 we can replace ω ˜ by ω, and by superadditivity, |y|∞;[0,T ] ≤ c (|y0 | + ε) exp (cω (0, T ))
as required.
Exercise 10.56 (non-explosion) Consider V = (Vi )1≤i≤d , a collection of locally Lipγ −1 -vector fields on Re for γ ∈ (p, [p] + 1), such that (i) Vi are Lipschitz continuous; older (ii) the vector fields V [p] = Vi 1 . . . Vi [ p ] i ,...,i ∈{1,...,d} are (γ − [p])-H¨ 1 [p ] continuous. Show that if x is a geometric p-rough path, and if y0 , then π (V ) (0, y0 ; x) does not explode. Provide a quantitative bound. Solution. The argument is the same as for linear RDEs. We only need to extend Lemma Alinear , i.e. we need to prove the following: if (a) s, t are some elements of [0, T] ; x)s,t ; (b) x, x ˜ are some paths in C 1-var [s, t] , Rd such that SN (x)s,t = SN (˜ [p] 1/[p]
t
t (c) is a bound on s |dxu | and s |d˜ xu | and υ is a bound on V (γ −[p])-H¨o l and supy ,z |V (y) − V (z)| / |y − z| . Then, γ π (V ) (s, ys ; x)s,t − π (V ) (s, ys ; x ˜)s,t ≤ C [υ] exp (Cυ) . To do so, define for all r ∈ [s, t] , ys,r = π (V ) (s, ys ; x)s,r ; as the vector fields Vi are Lipschitz continuous, Theorem 3.7 gives |ys,r | ≤ c1 (1 + |ys |) υ exp (cυ) .
Rough differential equations
268
From Proposition 10.3, π (V ) (s, ys ; x)s,t − E(V ) ys , SN (x)s,t t i1 i[ p ] Vi 1 . . . Vi [ p ] I (yr ) − Vi 1 . . . Vi [ p ] I (ys ) dxr . . . dxr ≤ i 1 ,...,i N ∈{1,...,d}
≤
s
t
γ −[p]
|ys,r |
υ [p]
i i dxr1 . . . dxrN
s
i 1 ,...,i N ∈{1,...,d}
γ −[p]
γ
≤ c2 (1 + |ys |) [υ] exp (cυ) γ ≤ c2 (1 + |ys |) [υ] exp (cυ) . ˜)s,t and π (V ) (s, ys ; x)s,t share the same Euler approximation, As π (V ) (s, ys ; x the triangle inequality finishes the proof. Equipped with this result, we then prove the exercise by going through the proof of Theorem 10.53.
10.8 Appendix: p-variation estimates via approximations Our discussion of the Young–L´ oeve inequality was based on some elementary analysis considerations; Lemmas 6.1 and 6.2. We now give the appropriate extensions, still elementary, upon which we base our discussion of rough differential equations. Lemma 10.57 Let θ > 1, K, ξ ≥ 0, α > 0. Assume : [0, R] → R+ satisfies (i) (r) = 0; lim r →0 r (ii) for all r ∈ [0, R] , r + ξrθ exp (Krα ) . (r) ≤ 2 2 Then, for all r ∈ [0, R], ξrθ exp (r) ≤ 1 − 21−θ
2K rα 1 − 2−α
.
10.8 Appendix: p-variation estimates via approximations
269
Proof. Note that it is enough to prove the final estimate for r = R; indeed, given any other r ∈ [0, R], it suffices to replace the interval [0, R] by [0, r]. Assumption (ii) implies that for all r ∈ [0, R], r exp (Krα ) + ˆξrθ (r) ≤ 2 2 with ˆξ = ξ exp (KRα ). By induction, we obtain that (r) ≤2n
r 2n
exp Krα
n −1
2−k α
+ ˆξrθ
k =0
n −1
2k (1−θ ) expKrα
k −1
2−j α .
j =0
k =0
n −1 We bound exp Krα k =0 2−k α ≤ exp (Krα / (1 − 2−α )) and so obtain (r) exp
−Krα 1 − 2−α
n −1 r θ ˆ ≤ 2 n + ξr 2k (1−θ ) . 2 n
k =0
By assumption (i), sending n to ∞ yields ˆξrθ Krα . exp (r) ≤ 1 − 21−θ 1 − 2−α As ˆξ = ξ exp (KRα ) ≤ ξ exp (KRα / (1 − 2−α )), we then obtain ξRθ 2KRα (r) ≤ . exp 1 − 21−θ 1 − 2−α As a variation on the theme, let us give Lemma 10.58 Let θ > 1, K, ξ ≥ 0, α > 0 and β ∈ [0, 1). Assume : [0, R] → R+ satisfies (i) (r) = 0; lim r →0 r (ii) for all r ∈ [0, R] , r (r) ≤ 2 + ξrθ ∧ εrβ exp (Krα ) . 2 Then, for all r ∈ [0, R], for some constant C depending on θ and β, we have 1 −β θ −1 2K α . r (r) ≤ Crε θ −β ξ θ −β exp 1 − 2−α
Rough differential equations
270
Proof. Just as in the previous proof, we only prove the estimate for r = R. Defining ˆξ = ξ exp (KRα ) and ˆε = ε exp (KRα ), we have for all r ∈ [0, R], r exp (Krα ) + ˆξrθ ∧ ˆεrβ . (r) ≤ 2 2 By induction, we obtain that n −1 r (r) ≤ 2n n exp Krα 2−k α 2 k =0 n −1 k −1 ˆξrθ 2k (1−θ ) ∧ ˆεrβ 2k (1−β ) exp Krα 2−j α . + j =0
k =0
n −1 We bound exp Krα k =0 2−k α ≤ exp (Krα / (1 − 2−α )) and then let n tend to ∞ to obtain ∞ −Krα ˆξrθ 2k (1−θ ) ∧ ˆεrβ 2k (1−β ) (r) exp ≤ −α 1−2 k =0 2k (1−β ) + ˆξrθ ≤ ˆεrβ 1 0≤k ≤ θ −β ln 2
ˆ ξ ˆε
1 k > θ −β ln 2
ˆ ξ ˆε
2k (1−θ ) +ln 2 r
1 β ≤ c1 ˆεr 2 θ −β
+ ˆξrθ 2 ≤
θ −1
1 θ −β
+ln 2 r
ln 2
ln 2
ˆ ξ ˆε
ˆ ξ ˆε
+ln 2 r (1−β )
+ln 2 r (1−θ )
1 −β
c2 rˆε θ −β ˆξ θ −β ,
where c1 , c2 are constants which depend on θ and β. This estimate finishes the proof. An important consequence of Lemma 10.57 is the following estimate. Typically (e.g. in the proof of Davie’s estimate, Lemma 10.7), Γ is the difference between a path y (for which we are trying to bound its p-variation) and a “local” approximation of y which is easier to control. Lemma 10.59 Let ξ > 0, θ > 1, K ≥ 0, α > 0 and Γ : {0 ≤ s < t ≤ T } ≡ ∆T → Re be such that: (i) for some control ω ˆ, lim
sup
r →0 (s,t)∈∆ : ω ˆ (s,t)≤r T
|Γs,t | = 0; r
(10.51)
10.8 Appendix: p-variation estimates via approximations
271
(ii) for some control ω we have that, for all s < t < u in [0, T ], θ α |Γs,u | ≤ |Γs,t | + |Γt,u | + ξω (s, u) exp (Kω (s, u) ) .
(10.52)
Then, for all s < t in [0, T ] , θ
ξω (s, t) exp |Γs,t | ≤ 1 − 21−θ
2K α ω (s, t) 1 − 2−α
.
Remark 10.60 It is important to notice that the control ω ˆ is not used in the final estimate. Proof. We assume that ω ˆ ≤ 1ε ω for some ε > 0; otherwise we can replace ω by ω + εˆ ω and let ε tend to 0 at the end. Define for all r ∈ [0, ω (0, T )] , (r) =
sup
|Γs,t | .
(s,t)∈∆ T :ω (s,t)≤r
Consider any fixed pair (s, u) with 0 ≤ s < u ≤ T such that ω (s, u) ≤ r. From basic properties of control functions we can then pick t such that ω (s, t) and ω (t, u) is bounded above by ω (s, u) /2. It follows that |Γs,t | ≤ (r/2) ,
|Γs,u | ≤ (r/2) ,
and by assumption (ii), r + ξrθ exp (Krα ) . |Γs,u | ≤ 2 2 Taking the supremum over all s < u in [0, T ] for which ω (s, u) ≤ r yields that for all r ∈ [0, ω (0, T )], r + ξrθ exp (Krα ) . (r) ≤ 2 2 Assumption (i) implies that limr →0 (r) /r = 0 and by (the previous) Lemma 10.57, we see that for all r ∈ [0, ω (0, T )] , ξrθ 2K α exp r . (r) ≤ 1 − 21−θ 1 − 2−α Obviously, |Γs,t | ≤ (r) for r = ω (s, t) and so the proof is finished. The same argument, but using Lemma 10.58 instead of Lemma 10.57, leads to the following estimate which we use in the proof of Theorem 10.47 where we establish continuity of rough integration. Lemma 10.61 Let θ > 1, K, ξ ≥ 0, α > 0, β ∈ [0, 1) and Γ : {0 ≤ s < t ≤ T } ≡ ∆T → Re
Rough differential equations
272
be such that: (i) for some control ω ˆ, lim
sup
r →0 (s,t)∈∆ : ω ˆ (s,t)≤r T
|Γs,t | = 0; r
(ii) for some control ω we have that, for all s < t < u in [0, T ], θ β α |Γs,u | ≤ |Γs,t | + |Γt,u | + ξω (s, u) ∧ εω (s, u) exp (Kω (s, u) ) . Then, for all s < t in [0, T ] , for some constant C depending on θ and β, we have 1 −β θ −1 2K α |Γs,t | ≤ Cω (s, t) ε θ −β ξ θ −β exp . ω (s, t) 1 − 2−α Remark 10.62 It is worth noting that (ii) is equivalent to saying that, for all η ∈ [0, 1], θ (1−η )+β η α exp (Kω (s, u) ) , |Γs,u | ≤ |Γs,t | + |Γt,u | + ξ 1−η εη ω (s, u) using |(∗)| ≤ a ∧ b ⇔ |(∗)| ≤ aη b1−η ∀η ∈ [0, 1], and thus renders Lemma 10.59 applicable. In fact, for any 0 ≤ η < (θ − 1) / (θ − β) we have ˜θ = βη + θ (1 − η) > 1; setting also ˜ξ = ξ 1−η εη , we can apply Lemma 10.59 to get 1 2K α θ˜ η 1−η ω (s, t) ε ξ exp ω (s, t) . |Γs,t | ≤ 1 − 2−α 1 − 21−θ˜ Although this would be sufficient for our application (namely, the proof of Theorem 10.47) we see that our direct analysis showed that we can take η = (θ − 1) / (θ − β) in the above estimate. As mentioned right above Lemma 10.59, such estimates are typically used when Γ is the difference between a path y and a “local” approximation of y which is easier to control. Sometimes (e.g. in the proof of uniqueness/continuity result for RDEs, Theorem 10.26) the path y itself comes into play; in the sense that (10.52) above has to be replaced by (10.53) below. With some extra information, such as (10.54) below, a similar analysis is possible. Lemma 10.63 Let K ≥ 0, ε > 0, θ > 1, 1/p > 0 and Γ y
: {0 ≤ s < t ≤ T } ≡ ∆T → Re , : [0, T ] → Re
10.8 Appendix: p-variation estimates via approximations
273
be such that: (i) for some control ω ˆ, |Γs,t | = 0; r
sup
lim
r →0 (s,t)∈∆ : ω ˆ (s,t)≤r T
(ii) for some control ω we have that, for all s < t < u in [0, T ], θ 1/p ; |Γs,u | ≤ |Γs,t | + |Γt,u | +K ε + sup |yr | ω (s, u) exp Kω (s, u) 0≤r ≤u
(10.53) (iii) for all s < t in [0, T ] , 1 1/p . |ys,t − Γs,t | ≤ K ε + sup |yr | ω (s, t) p exp Kω (s, t) r ≤t
(10.54)
Then we have, for some constant C depending only on K, p and θ, |y|∞;[0,T ] ≤ C exp (Cω (0, T )) (|y0 | + ε) , and for all s < t in [0, T ] , θ
|Γs,t | ≤ C (|y0 | + ε) ω (s, t) exp (Cω (0, T )) . Proof. Fix v < v in [0, T ] and s < t < u ∈ [0, v ]. By assumption (ii) we have, θ 1/p . |Γs,u | ≤ |Γs,t | + |Γt,u | + K ε + |y|∞;[0,v ] ω (s, u) exp Kω (s, u) We may thus apply Lemma 10.59 (on the interval [0, v ] rather than [0, T ] and with parameters ξ = K ε + |y|∞;[0,v ] and α = 1/p). It follows that for all s < t in [0, v ] , θ 1/p |Γs,t | ≤ c1 ε + |y|∞;[0,v ] ω (s, t) exp c1 ω (s, t)
and together with assumption (iii) we see that 1/p 1/p . sup |ys,t | ≤ c2 ε + |y|∞;[0,v ] ω (v, v ) exp c2 ω (v, u) s,t∈[v ,v ]
This in turn implies |y|∞;[0,v ]
≤
|y|∞;[0,v ] +
≤
1/p 1/p . |y|∞;[0,v ] + ε + |y|∞;[0,v ] c2 ω (s, t) exp c2 ω (v, v )
sup s,t∈[v ,v ]
|ys,t |
Rough differential equations
274
We now pick v0 = 0 and set for i ∈ {0, 1, 2, . . . }, 1 1/p 1/p ∧T ≤ vi+1 = sup c2 ω (vi , r) exp c2 ω (vi , r) 2 r> vi 1 = sup ω (vi , r) ≤ ∧ T, c3 r> vi 1/p 1/p where c3 was determined from c2 (1/c3 ) exp c2 (1/c3 ) = 1/2. It follows that 1 ε + |y|∞;[0,v i + 1 ] , |y|∞;[0,v i + 1 ] ≤ |y|∞;[0,v i ] + 2 which implies |y|∞;[0,v i + 1 ] ≤ 2 |y|∞;[0,v i ] + ε and then, by induction, |y|∞;[0,v i ] ≤ 2i (|y0 | + ε) . We claim that vN = T where N = [c3 ω (0, T )]+1, the first integer strictly greater than c3 ω (0, T ) . Indeed, vN < T would imply ω (vi , vi+1 ) = 1/c3 for all i < N , and hence lead to the contradiction c3 ω (0, T ) ≥ c3
N −1
ω (vi , vi+1 ) = N.
i=0
We are now able to say that |y|∞;[0,T ]
≤
2c 3 ω (0,T )+1 (|y0 | + ε)
≤
c4 exp (c4 ω (0, T )) (|y0 | + ε) .
Coming back to inequality (10.53), we obtain that for all s < t in [0, T ] , θ |Γs,u | ≤ (|Γs,t | + |Γt,u |) + exp (c5 ω (0, T )) (|y0 | + ε) ω (s, u) 1/p . exp Kω (s, u) We may thus apply Lemma 10.59 (with parameters ξ = ec 5 ω (0,T ) (|y0 | + ε) and α = 1/p) once again to obtain that, for all s < t in [0, T ] , θ
|Γs,t | ≤ c6 (|y0 | + ε) ω (s, t) exp (c6 ω (0, T )) . The proof is now finished. Remark 10.64 The conclusion of the above lemma can be slightly sharpened to15 |y|∞;[0,T ] ≤ C (|y0 | + ε) exp C ω (ti , ti+1 ) sup D =(t i )⊂[0,T ] such that ω (t i ,t i + 1 )≤1 for all i
i
≤ C (|y0 | + ε) exp (Cω (0, T )) . . . by super-additivity of controls. 1 5 The
interest in the sharpening is explained in Exercise 10.55.
10.8 Appendix: p-variation estimates via approximations
275
To see this, the above arguments remain unchanged until the definition 1 vi+1 = sup ω (vi , r) ≤ ∧T c3 r> vi 1/p 1/p = 1/2. Clearly with c3 determined from c2 (1/c3 ) exp c2 (1/c3 ) then, by making the preceding constant c2 bigger if necessary, we may assume that 1/c3 ≤ 1. As in the proof of Lemma 10.63 we have |y|∞;[0,v i ] ≤ 2i (|y0 | + ε) . For any integer N chosen such that vN = T one then has the conclusion |y|∞;[0,T ] ≤ 2N (|y0 | + ε) . We claim that
N = c3
sup
D =(t i )⊂[0,T ] such that ω (t i ,t i + 1 )≤1 for all i
ω (ti , ti+1 ) + 1
i
is a valid choice. Assume vN < T . Then ω (vi , vi+1 ) = 1/c3 ≤ 1 for all i < N and so N = c3
N −1
ω (vi , vi+1 ) ≤ c3
i=0
sup D =(t i )⊂[0,T ] such that ω (t i ,t i + 1 )≤1 for all i
which is a contradiction.
ω (ti , ti+1 ) = N − 1
i
In the remainder of this appendix we make the appropriate extensions which are used in Section 10.5 to establish the uniqueness of an RDE solution under minimal regularity assumptions. Condition 10.65 (Ψ (C, β)) We say that an increasing function δ : R+ → (C, β) if R+ belongs to the class Ψ (i) for all σ > 0, we have n ≥0 rn (σ) = ∞, where $ # 2n r dx ≥ σ ; rn (σ) = inf r > 0, δ x 1 (ii) there exists C > 0, β ∈ (0, 1) such that δ (x) ≤ Cxβ for all x ∈ [0, 1]. Exercise 10.66 Prove that δ ∈ Ψ (C, β) for (i) δ (x) = xθ for θ > 1; (ii) δ (x) = x; (iii) δ (x) = x ln∗ ln∗ x1 . Exercise 10.67 Prove that δ ∈ / Ψ (C, β) for (i) δ (x) = x ln∗ x1 ; (ii) δ (x) = xθ for θ < 1.
Rough differential equations
276
Lemma 10.68 Let K > 0, α > 0. Assume : [0, R] → R+ satisfies for all r ∈ [0, R] , r α exp (Kδ (r) ) + δ (r) , (r) ≤ 2 2 where δ ∈ Ψ (C, β) for some constants C, β > 0, and α > 0. Then, for all n ≥ 0, for some constant C1 depending on K, α, β and C, 0 / r 2n r n dx exp C1 rα β . δ (r) ≤ 2 n + 2 x 1 Proof. By induction, we obtain that (r) ≤ 2n
r 2n
exp K
n −1
δ
r α
k =0
We then bound exp K exp crα β to obtain
+
2k
n −1 k =0
r α r 2k expK δ j δ k . 2 2 j =0
k −1
n −1 r α ∞ by exp KC α rα β k =0 2−k α β = k =0 δ 2 k
−1 r n r (r) exp −crα β ≤ 2n n + 2k δ k . 2 2 k =0
By assumption, x → δ have
r x
is a non-increasing function, hence for all k, we
r 2k + 1 r dx. 2 δ k ≤ δ 2 x 2k k
Summing up over k, we obtain that for all n ≥ 0, we have
(r) exp −cr
αβ
r 2n r dx. ≤2 n + δ 2 x 1 n
Lemma 10.69 Let ω be a control, C > 0, θ > 1 and Γ : {0 ≤ s < t ≤ T } → Re a continuous map such that (i) for all s < t in [0, T ] , θ
|Γs,t | ≤ Cδ (ω (s, t)) ;
(10.55)
(ii) for all s < t < u in [0, T ] , and some δ ∈ Ψ (C, β) with constants C, β > 0. 1/p +δ (ω (s, u)) . |Γs,u | ≤ (|Γs,t | + |Γt,u |)exp Cω (s, u)
10.8 Appendix: p-variation estimates via approximations
277
Then, for all s < t in [0, T ] , and some constant C1 depending on C, β and ω (0, T ) , we have / 0 θ 2 n ω (s, t) ω (s, t) n dx . + δ |Γs,t | ≤ C1 inf 2 δ n ≥0 2n x 1 Proof. Define (r) =
|Γs,t | ,
sup s,t such that ω (s,t)≤r
and observe that, as in the proof of Lemma 10.59, we have r 1/p exp Cδ (r) + δ (r) . (r) ≤ 2 2 We conclude using Lemma 10.68 and (10.55). Proposition 10.70 Let ω be a control, C > 0, µ > 0, θ > 1, α ∈ (0, 1), β ∈ 1 d θ , 1 and assume that δ ∈ Ψ (C, β) . Let y : [0, T ] → R be a continuous e path, and Γ : {0 ≤ s < t ≤ T } → R a continuous map such that (i) for some θ > 1, for all s < t in [0, T ] , θ
|Γs,t | ≤ Cδ (ω (s, t)) ;
(10.56)
(ii) for all s < t < u in [0, T ] , 1/p +C µ+ sup |yr | δ (ω (s, u)) ; |Γs,u | ≤ (|Γs,t | + |Γt,u |)exp Cδ (ω (s, u)) r ≤u
(10.57) (iii) for all s < t in [0, T ] ,
|ys,t
α /p − Γs,t | ≤ C µ + sup |yr | δ (ω (s, t)) . r ≤t
Then, for all ε > 0, there exists δ = δ (ω (0, T ) , C, L, K, θ, p) > 0 such that |y0 | + µ ≤ δ implies |y|∞;[0,T ] < ε and |y|p-ω ;[0,T ] < ε. Proof. At the price of replacing p by p/α, we can and will assume α = 1. We allow the constant in this proof to depend on ω (0, T ) , C, L, K, θ and p. Define τ γ = inf t > 0, |y|∞;[0,t] > γ . Using Lemma 10.69, we have for all u, v < τ γ and all n ≥ 0, / 0 θ 2n ω (u, v) ω (u, v) n dx + (µ + γ) δ |Γu ,v | ≤ c1 2 δ 2n x 1 / 0 2n ω (u, v) θβ n (1−θ β ) ≤ c1 2 dx . ω (u, v) + (µ + γ) δ x 1
Rough differential equations
278
From the triangle inequality, we therefore obtain that for all u, v < τ α and all n ≥ 0, 2n ω (u, v) 1/p dx δ |yu ,v | ≤ c2 (µ + γ) δ (ω (u, v)) + x 1 θβ
+ c2 ω (u, v)
2n (1−θ β ) .
As |y|∞;[0,t] −|y|∞;[0,s] ≤ supu ,v ∈[s,t] |yu ,v | , we have that |y|∞;[0,t] −|y|∞;[0,s] is less than or equal to 2n ω (s, t) 1/p θβ dx +c2 ω (s, t) 2n(1−θ β) . c2 µ+ |y|∞;[0,t] δ (ω (s, t)) + δ x 1 (10.58) Fix b ∈ 1, 2θ β −1 and define $ # 2n 1 rk 1/p rk := inf r > 0, c2 δ (rk ) + dx = 1 − ∧ ω (0, T ) . δ x b 1 ∞ By assumption on δ, k = n 0 rk = +∞. In particular, for a fixed n0 that will be chosen later, we can define n1 to be the first integer such that n 0 +n 1
rk ≥ ω (0, T ) .
k=n0
We then define the times (ti )i=0,...,n 1 by t0 = 0 and ti+1 = inf {t > ti , ω (ti , ti+1 ) = rn 0 +n 1 −i } ∧ T. Observe that by construction, tn 1 = T. Also, inequality (10.58) gives for all n ≥ 0, |y|∞;[0,t i + 1 ] ≤ |y|∞;[0,t i ] + c2 rnθ β0 + n 1 −i 2n (1−θ β ) 1/p + µ + |y|∞;[0,t i + 1 ] c2 δ (rn 0 +n 1 −i ) +
1
2n
δ
r
n 0 +n 1 −i
x
dx .
We will take n = n0 + n1 − i. By definition of rn 0 +n 1 −i , 2 n 0 + n 1 −i 1 rn 0 +n 1 −i 1/p dx = 1 − , c2 δ (rn 0 + n 1 −i ) + δ x b 1 so that |y|∞;[0,t i + 1 ] ≤ b |y|∞;[0,t i ] + (b − 1) µ + bc2 rnθ β0 +n 1 −i 2(1−θ β )(n 0 +n 1 −i) .
10.9 Comments
279
As all ri are bounded by ω (0, T ), we obtain (b − 1) (1−θ β )(n 0 +n 1 −i) µ + c3 2 . |y|∞;[0,t i + 1 ] ≤ b |y|∞;[0,t i ] + b An easy induction then gives us that |y|∞;[0,t k ]
≤
b |y0 | + k
b
k −j
(b − 1) µ + c3 2(1−θ β )(n 0 +n 1 −j ) b
j =0
≤
k −1
bk |y0 | +
bk − 1 µ b−1
+ c3 2(1−θ β )n 0
k −1
2(1−θ β )(n 1 −j ) bk −j .
j =0
Applying this to k = n1 , we see that bn 1 − 1 µ bn 1 |y0 | + |y|∞;[0,T ] ≤ b−1 n 1 −1 (n 1 −j ) b2(1−θ β ) + c3 2(1−θ β )n 0 j =0
≤
bn 1 − 1 µ bn 1 |y0 | + b−1 c3 + 2(1−θ β )n 0 . 1 − b2(1−θ β )
For a given ε > 0, we pick n0 large enough so that 1−b2c(31 −θ β ) 2(1−θ β )n 0 ≤ ε, to obtain that bn 1 − 1 n1 µ + ε. |y|∞;[0,T ] ≤ b |y0 | + b−1 Observe that n1 depends on n0 , which depends on ε. Nonetheless, we see that for n ε > 0, there exists δ > 0 such that |y0 | + µ ≤ δ implies 1 −1 bn 1 |y0 | + b b−1 µ < ε and hence |y|∞;[0,T ] ≤ 2ε. That concludes the proof.
10.9 Comments The main result of this chapter, the continuity of the RDE solutions as a function of the driving signal, also known as the universal limit theorem, is
280
Rough differential equations
due to T. Lyons [116, 120], nicely summarized in Lejay [104] and the Saint Flour notes [123]. There have been a number of (re)formulations of rough path theory by other authors, including Davie [37], Feyel and de la Pradelle [50], Gubinelli [74] and Hu and Nualart [86]. Our presentation builds on Friz and Victoir [67] and combines Davie’s approach [37] with geometric ideas. It seems to lead to essentially sharp estimates. In particular, we can extend Davie’s uniqueness result under Lipp -regularity, p < 3 (compared to Lipp+ ε in Lyons’ uniqueness proof via Picard iteration) to the case of arbitrary p ≥ 1. In this case, the flow need not be Lipschitz continuous (A. M. Davie, personal communication). Convergence of Euler schemes for rough differential equations is established in Davie [37] for [p] = 1, 2; the general case, Section 10.3.5, is new. Some of our estimates appear as special cases in previous works, for instance in Hu and Nualart [87] in the “Young” case of 1/p-H¨older paths with p ∈ [1, 2). Lipschitz estimates for rough integration or differential equations, at least for p ∈ [2, 3), appear in Hu and Nualart [86] and Lyons and Qian [120]; see also Gubinelli [74].
11 RDEs: smoothness We remain in the RDE setting of the previous chapter; that is, we consider rough differential equations of the form dy = V (y) dx, y (0) = y0 , where x = x (t) is a weak geometric p-rough path. In the present chapter we investigate various smoothness properties of the solution, in particular as a function of y0 and x. In particular, we shall see that RDE solutions induce flows of diffeomorphisms which depend continuously on x. As an application, we consider a class of parabolic partial differential equations with “rough” coefficients in a transport term.
11.1 Smoothness of the Itˆo–Lyons map Assuming x ∈ C 1-var [0, T ] , Rd we saw in Chapter 4 (cf. Remark 4.5) that the Re -valued ODE solution y = π (V ) (0, y0 , x) together with its directional d π (V ) (0, y0 + εv; x + εh) ε=0 satisfies the system derivative z = dε dy = V (y) dx, dz = (DV (y) dx) · z + V (y) dh started at (y0 , v) ∈ Re ⊕ Re . In particular, we may write (y, z) = π (W ) (0, (y0 , v) ; (x, h)) where (W ) are the induced vector fields on Re ⊕ Re . In this formulation, the extension to a rough path setting is easy. Assume at first that V ∈ Lipγ +1 with γ > p so W ∈ Lipγloc , and assume furthermore that there exists a G[p] Rd ⊕ Rd -valued geometric p-rough path χ which projects onto the G[p] Rd -valued geometric p-rough paths x and h. After a localization argument (exploiting the structure of (y, z) and in particular the fact that linear RDEs do not explode; cf. the argument in Section 11.1.1 below) we may assume W ∈ Lipγ and have existence/uniqueness/continuity properties of the RDE π (W ) (0, (y0 , v) ; χ) with values in Re ⊕ Re . Projection to the second component gives (at least a candidate for) the directional derivative z=
d π (V ) (0, y0 + εv; plus ◦ δ 1,ε (χ)) ε=0 . dε
282
RDEs: smoothness
(Recall from Section 7.5.6 that plus resp. δ 1,ε is defined as the unique extension of (x, h) ∈ R2e → (x + h) ∈ Re resp. (x, h) ∈ R2e → (x, εh) ∈ R2e to a homomorphism between the respective free nilpotent groups.) When h enjoys complementary Young regularity to x, say h ∈ C q -var [0, T ] , Rd with 1/p + 1/q > 1, we naturally take χ =S[p] (x, h), the Young pairing of x and h, in which case plus ◦ δ 1,ε (χ) = Tεh x. (The translation operator T was introduced in Section 9.4.6.) The proof that z is not only a candidate but indeed is the directional derivative can then be done by passing to limit in the corresponding ODE statements, using both continuity of the Lyons–Itˆ o maps and “closedness of the derivative operator” (Proposition B.7). Unfortunately, this reasoning requires one degree too much regularity (our discussion above started with V ∈ Lipγ +1 ). With a little extra effort we can prove differentiability for V ∈ Lipγ , γ > p. The argument exploits, of course, the specific structure of (y, z) and in particular the fact that DV ∈ Lipγ −1 only appears in a rough integration procedure. (Recall from Section 10.6 that existence/uniqueness/continuity for rough integrals holds under Lipγ −1 regularity, γ > p.)
11.1.1 Directional derivatives All smoothness properties under consideration will be local. On the other hand, the differential equation satisfied by these derivatives (and higher derivatives) naturally exhibits growth beyond the standard conditions for global existence. To make (iterated) localization arguments transparent, we make the following Definition 11.1 Let V = (V1 , . . . , Vd ) be a collection of vector fields on Re . We say that V satisfies the p non-explosion condition if for all R > 0, there exists M > 0 such that if (y0 , x) ∈ Re × C p-var [0, T ] ; G[p] Rd with xp-var;[0,T ] + |y0 | < R, π (V ) (0, y0 , x)
∞;[0,T ]
< M.
Following our usual convention we agree that, in the case of non-uniqueness, π (V ) (0, y0 ; x) stands for any full RDE solutions driven by x along vector fields V started at y0 . For example, a collection of Lipγ −1 (Re )-vector fields, with γ > p, satisfies the p non-explosion condition.
11.1 Smoothness of the Itˆ o–Lyons map
283
In what follows, we fix a collection V = (V1 , . . . , Vd ) of Lipγlo c (Re )-vector fields that satisfies the p non-explosion condition. Motivated by the presentation of directional derivatives of ODE solutions established in Theorem 4.4, we make the following definitions. 1. Consider the ODE x x dx ≡ V˜ h dχ, dh d h = y y V (yu ) dxu
(11.1)
where χt = (xt , ht ) ∈ Rd ⊕ Rd and V˜ ∈ Lipγloc is defined by the last equality. We then define the map f1,(V ) : Re × C p-var [0, T ] , Gd Rd ⊕ Rd → C p-var [0, T ] , G[p] Rd ⊕ Rd ⊕ Re by1 f1,(V ) : (y0 , χ) → π (V˜ ) (0, exp ((0, 0, y0 )) ; χ) . 2. Consider the (Riemann–Stieltjes) integral · · DV xt M (yt ) d ≡ ϕ (wt ) dwt , = V ht H 0 0
(11.2)
where w = (x, h, y) ∈ Rd ⊕ Rd ⊕ Re and ϕ ∈ Lipγloc−1 is defined by the last equality. We then define the map f2,(V ) : C p-var [0, T ] , G[p] Rd ⊕ Rd ⊕ Re → C p-var [0, T ] , G[p] Re×e ⊕ Re as the rough integral f2,(V ) : w →
·
ϕ (w) dw. 0
3. Consider the linear ODE dzt = dMt · zt + dHt ≡ A (zt ) d
Mt Ht
(11.3)
1 There is some irrelevant freedom in choosing the starting point. Any y 0 ∈ G [p ] Rd ⊕ Rd ⊕ Re with the property that the last component of π 1 (y 0 ) ∈ Rd ⊕Rd ⊕Re is equal to y 0 ∈ Re will do.
RDEs: smoothness
284
where A is a collection of linear (strictly speaking: affine-linear) vector fields. We then define the map f3,(V ) : Re × C p-var [0, T ] , G[p] Re×e ⊕ Re → C p-var ([0, T ] , Re ) as a solution to the corresponding (linear) RDE, namely f3,(V ) : (z0 , ξ) → π (A ) (0, z0 ; ξ) . Remark 11.2 Observe that if (y0 , x) , (v, h) ∈ Re × C 1-var [0, T ] , Rd , then we proved in Theorem 4.4 that the derivative of the ODE map π (V ) (0, y0 ; x) in (y0 , x) is given by D(v ,h) π (V ) (0, y0 ; x) = f3,(V ) v, f2,(V ) ◦ f1,(V ) y0 , S[p] (x ⊕ h) .
We are now ready for our main theorem: Theorem 11.3 (directional derivatives in starting point and perturbation) (i) γ > p, and V = (V1 , . . . , Vd ) is a collection of Lipγlo c (Re )-vector fields that satisfies the p non-explosion condition; (ii) x ∈ C p-var [0, T ] , G[p] Rd is a weak geometric p-rough path; (iii) v ∈ Re and h ∈ C q -var [0, T ] , Rd with 1/p + 1/q > 1 and q ≤ p. Then ε → π (V ) (0, y0 + εv; Tεh (x)) is differentiable in C p-var ([0, T ] , Re ), and its derivative at 0 is given by f3,(V ) v, f2,(V ) ◦ f1,(V ) y0 , S[p] (x ⊕ h) . Proof. Without loss of generality, we can assume that the vector fields are γ e 1-var in Lip (R ) . We then consider any sequence of paths (xn , hn ) ∈ C d [0, T ] , R such that sup S[p] (xn )p-var;[0,T ] + hn q -var;[0,T ] < ∞ n lim d∞ S[p] (xn ) , x + d∞ S[p] (hn ) , h = 0. n →∞
From basic continuity properties of the Young pairing of such rough paths (cf. Remark 9.32) this implies that sup S[p] (xn ⊕ hn )p-var;[0,T ] < ∞ n lim d∞ S[p] (xn ⊕ hn ) , S[p] (x ⊕ h) = 0. n →∞
Let us use the notations Yp ≡ C p-var [0, T ] , Rd and Y∞ ≡ C [0, T ] , Rd ; they are Banach spaces when equipped with p-variation and ∞-norm. For
11.1 Smoothness of the Itˆ o–Lyons map
285
any fixed n ∈ N, the map θ ∈ R → π (V ) 0, y0 + θv; S[p] (xn + θhn ) ∈ Y1 is continuously differentiable in Y1 with derivative given by gn (θ) ≡ f3,(V ) v, f2,(V ) ◦ f1,(V ) y0 + θv, S[p] (xn + θhn ) , a consequence of our smoothness results on ODE solution maps, Theorem 4.4. By the fundamental theorem of calculus (in a Banach setting, cf Section B.1 in Appendix B), for all ε ∈ [0, 1] and n ∈ N we then have ε gn (θ) dθ, π (V ) 0, y0 + εv; S[p] (xn + εhn ) − π (V ) 0, y0 ; S[p] (xn ) = 0
(11.4) as an equation in Y1 (by which we mean in particular that the integral appearing on the right-hand side is the limit in Y1 of its Riemann-sum approximations). From the continuous embedding Y1 → Y∞ we can view (11.4) as an equation in Y∞ and as such we now try to send n → ∞ in (11.4). By continuity of the translation operator and the Itˆ o–Lyons map (Theorem 9.33) we have π (V ) 0, y0 + εv; S[p] (xn + εhn ) → π (V ) (0, y0 + εv; Tεh (x)) in Y∞ (even with uniform p-variation bounds) for any ε (including ε = 0) which justifies the passage to the limit in the left-hand side of (11.4) to π (V ) (0, y0 + εv; Tεh (x)) − π (V ) (0, y0 ; x) (which is actually an element of Yp ≡ C p-var [0, T ] , Rd , not only Y∞ ). On the other hand, from Theorem 10.47, we see that sup |gn (θ) − g (θ)|Y∞ → 0 as n → ∞
θ ∈[0,1]
where g (θ) ≡ f3,(V ) v, f2,(V ) ◦ f1,(V ) (y0 + θv, Tθ h (x)) . Clearly then
0
ε
(gn (θ) − g (θ)) dθ
Y∞
≤ 0
ε
|gn (θ) − g (θ)|Y∞ dθ → 0 as n → ∞
which justifies the passage to the limit in the right-hand side of (11.4) and we obtain ε g (θ) dθ, π (V ) (0, y0 + εv; Tεh (x)) − π (V ) (0, y0 ; x) = 0
as an equation in Y∞ . Now, π (V ) (0, y0 ; x) , π (V ) (0, y0 + εv; Tεh (x)) ∈ Yp Y∞ and for the integrand on the right-hand side we even have {θ → g (θ)} ∈ C ([0, 1] , Yp ) .
286
RDEs: smoothness
To prove this, from the continuity of the Itˆ o map (Theorem 10.26 and its corollaries), it is enough to prove that θ → Tθ h (x) = plus ◦ δ 1,θ S[p] (x ⊕ h) is a continuous function from [0, 1] into Yp . But this is easily implied by Proposition 8.11. By a simple fact of Banach calculus (Proposition B.1 in Appendix B) it then follows that ε → π (V ) (0, y0 + εv; Tεh (x)) is continuously differentiable in Yp and the proof is then finished. We can generalize the previous theorem by perturbing the driving rough path in a more general way. Indeed, after replacing in the proof above Tεh (x) by plus◦δ 1,ε S[p] (x, h) (the two are equal), we observe that all we need to translate a rough path x is a path χ ∈ C p-var [0, T ] , G[p] Rd ⊕ Rd that projects onto x, i.e. such that plus ◦ δ 1,0 (χ) = x. We obtain the following result. Proposition 11.4 (i) γ > p, and V = (V1 , . . . , Vd ) is a collection of lofields that satisfies cally Lipγ (Re )-vector the p non-explosion condition; (ii) χ ∈ C p-var [0, T ] , G[p] Rd ⊕ Rd ; (iii) v ∈ Re . Then ε → π (V ) (0, y0 + εv; plus ◦ δ 1,ε (χ)) is differentiable in C p-var ([0, T ] , Re ), and its derivative at 0 is given f3,(V ) v, f2,(V ) ◦ f1,(V ) (y0 , χ) . If χ = S[p] (x, h) , the proposition is exactly the previous theorem. If we want to differentiate π (V ) (0, y0 ; x) in the direction of a path h ∈ C p-var [0, T ] , G[p] Rd which is not of finite q-variation with q −1 + p−1 > 1, then the previous proposition tells us to construct a rough path χ that projects on both x and h (in the ∈ C p-var [0, T ] , G[p] Rd ⊕ Rd sense that plus ◦ δ 1,0 (χ) = x and plus ◦ δ 0,1 (χ) = h). This would allow us, for instance, to differentiate the solution of an SDE in the direction of another Brownian motion, or in the direction of (L´evy-)area perturbations. That said, we shall not pursue these directions here and return to the Young perturbation setting of Theorem 11.3. We now address the question of higher directional derivatives. Proposition 11.5 Assume that (Re )(i) γ > p, k ≥ 1, and V = (V1 , . . . , Vd ) is a collection of Lipklo−1+γ c vector fields that satisfies the p non-explosion condition; (ii) x ∈ C p-var [0, T ] , G[p] Rd is a weak geometric p-rough path; (iii) v1 , . . . vk ∈ Re and h1 , . . . , hk ∈ C q -var [0, T ] , Rd with 1/p + 1/q > 1 and q ≤ p. Then the following directional derivatives exist in C p-var ([0, T ] , Re ) for all
11.1 Smoothness of the Itˆ o–Lyons map
287
j ∈ {1, . . . , k} , D(v 1 ,...,v j ;h 1 ,...,h j ) π (V ) (0, y0 ; x) $ # j ∂j π (V ) 0, y0 + εi vi , T( j ε i h i ) (x) = i= 1 ∂ε1 . . . ∂εj i=1
ε 1 =...ε j =0
and the ensemble of these derivatives satisfies the RDE obtained by formal differentiation. Proof. The argument is the same as in the ODE case (cf. Proposition 4.6). All we need to observe is that f3,(V ) v, f2,(V ) ◦ f1,(V ) y0 , S[p] (x, h) can be obtained by the projection of the solution of an RDE driven along locally Lipk −2+γ vector fields satisfying the p non-explosion condition.
11.1.2 Fr´echet differentiability Theorem 11.6 Assume that (i) γ > p, k ≥ 1, and V = (V1 , . . . , Vd ) is a collection of Lipk −1+γ (Re )vector fields that satisfies the pnon-explosion condition; (ii) x ∈ C p-var [0, T ] , G[p] Rd is a geometric p-rough path. Then, the map (y0 , h) ∈ Re × C q -var [0, T ] , Rd → π (V ) (0, y0 ; Th (x)) ∈ C p-var ([0, T ] , Re ) is C k -Fr´echet. Proof. Once again, the proof is identical to the ODE case (cf. Propok π (V ) (0, y0 ; x) is sition 4.8). The map (y0 , x) , (vi , hi )1≤i≤k → D(v i ,h i ) 1 ≤i ≤k
uniformly continuous on bounded sets because of (i) uniform continuity on bounded sets of the Itˆo–Lyons map and (ii) uniform continuity on bounded sets of rough integration. We can now appeal to Corollary B.11 in Appendix B. The proof is then finished. Corollary 11.7 (Li–Lyons) Let k ∈ {1, 2, . . . } and p ∈ [1, 2) and consider the (Young) differential equation dy = V (y) dx along Lipγ -vector fields on Re for γ > p − 1 + k with (unique) solution y = π (V ) (0, y0 ; x). Then (y0 , x) ∈ Re × C p-var [0, T ] , Rd → π (V ) (0, y0 ; x) ∈ C p-var ([0, T ] , Re ) is C k in the Fr´echet sense.
RDEs: smoothness
288
Proof. Apply the previous theorem with p = q ∈ [1, 2). In this case, the driving signal is Rd -valued, say x, and since then Th (x) = x + h we can take x ≡ 0. The previous theorem shows C k -Fr´echetness of the map from (y0 , h) to the C p-var ([0, T ] , Re )-valued solution of dy = V (y) d (0 + h) . Replacing the letter h by x leads to the claimed statement. Exercise 11.8 (Kusuoka) (i) Assume that α ∈ (1/4, 1/2) and set p = 1/α. Let γ > p, k ≥ 1, and V = (V1 , . . . , Vd ) be a collection of Lipk −1+γ (Re )vector fields that non-explosion condition; satisfies the p older rough path. Take (ii) x ∈ C α -H¨o l [0, T ] , G[p] Rd is a geometric α-H¨ δ ∈ (1/2, 1/2 + α) and let us consider the fractional Sobolev (or Besov) space W0δ ,2 (which is strictly bigger than the usual Cameron–Martin space W01,2 ). Show that the map (y0 , k) ∈ Re × W0δ ,2 [0, T ] , Rd → π (V ) (0, y0 ; Tk (x)) ∈ C α -H¨o l ([0, T ] , Re ) is C k -Fr´echet. Solution. Using the p-variation properties of Besov spaces, as discussed in Example 5.16, the Young pairing (x, h) → S[p] (x ⊕ h) is continuous from C α -H¨o l ×W0δ ,2 → C α -H¨o l and this is the only modification needed in the arguments of this section. y s ,x Exercise 11.9 (Duhamel’s principle) Write Jt←s for the derivative e e (“Jacobian”) of yt ≡ π (V ) (s, ·; x)t : R → R at some point ys ∈ Re . Establish the formula
D(v ,h) π (V ) (0, y0 ; x)t
=
y 0 ,x Jt←0 ·v d t y ,x s Jt←s · Vi π (V ) (0, y0 ; x)s dhis . + i=1
0
Detail all assumptions. y 0 ,x Exercise 11.10 Assume V ∈ Lipγ (Re ) and write J·←0 for the (Fr´echet) derivative of yt ≡ π (V ) (0, ·; x) : Re → C p-var [0, T ] , Rd at some point y0 ∈ y 0 ,x can be viewed as an element in C p-var ([0, T ] , Re×e ), Re . Noting that J·←0 show that p y 0 ,x |p-var;[0,T ] ≤ C exp C xp-var;[0,T ] |J·←0
with a suitable constant C depending on p, γ and |V |Lip γ . y 0 ,x Solution. One proceeds as in Exercise 10.55, noting that J·←0 satisfies e a linear RDE starting at I, the identity map in R . Note the constant C can be chosen independent of y0 thanks to translation invariance of the Lipγ -norm, i.e. |V (y0 + ·)|Lip γ = |V |Lip γ .
11.2 Flows of diffeomorphisms
289
11.2 Flows of diffeomorphisms We saw that Lipγ + k −1 -regularity on the vector fields V implies that y0 ∈ Re → π (V ) (0, y0 ; x) ∈ C p-var ([0, T ] , Re ) is C k -Fr´echet. Relatedly, under the same regularity assumptions, we now show that the map (t, y0 ) → π (0, y0 ; x)t is a flow of C k -diffeomorphisms, i.e. an element in the space Dk (Re ) defined as φ : [0, T ] × Re → Re : (t, y) → φt (y) such that ∀t ∈ [0, T ] : φt is a C k -diffeomorphism of Re Dk (Re ) := . ∀α : |α| ≤ k : ∂α φt (y) , ∂α φ−1 (y) are continuous in (t, y) t (11.5) Proposition 11.11 Let p ≥ 1 and k ∈ {1, 2, . . .} and assume V = (V1 , . . . ,Vd ) + k −1 -vector fields on Re for γ > p. Assume x ∈ is a collection of Lipγ p-var [p] d [0, T ] , G R . Then, the map C φ : (t, y) ∈ [0, T ] × Re → π (V ) (0, y; x)t ∈ Re is a flow of C k -diffeomorphisms. Moreover, for any multi-index α with 1 ≤ |α| ≤ k, the maps (t, y) ∈ [0, T ] × Re → ∂α φt (y) , ∂α φ−1 t (y) are bounded by a constant only depending on p, γ, k, xp-var;[0,T ] and |V |Lip γ + k −1 . Proof. We proceed as in the ODE case (Corollary 4.9). Clearly, y0 ∈ −1 Re → π (V ) (0, y0 ; x)t is in C k (Re , Re ). We then argue that π (V ) (0, ·; x)t = − , where ← − (·) = x (t − ·) ∈ C p-var [0, t] , G[p] Rd . Indeed, x π (V ) 0, ·; ← x t we have seen that this holds (cf. the proof of Corollary 4.9) in the ODE case, i.e. when x is replaced by some continuous, bounded variation path x. A simple limit argument (in fact: our definition of an RDE solution combined with uniqueness) then shows that this identity remains valid in the RDE setting. It follows that π (V ) (0, ·; x)t is a bijection whose inverse is also in C k (Re , Re ). This finishes the proof that π (V ) (0, ·; x)t is a C k -diffeomorphism of Re . At last, each ∂α -derivative of π (V ) (0, ·; x)t resp. −1 π (V ) (0, ·; x)t can be represented via (non-explosive) RDE solutions which plainly implies joint continuity in t and y0 . This also yields the claimed boundedness, since for a fixed y say ∂α π (V ) (0, y; x) = ∂α π (V˜ ) (0, 0; x)
RDEs: smoothness
290
where V˜ = V (y + ·); it is then clear that supt∈[0,T ] |∂α π (V˜ ) (0, 0; x)t | will be bounded by a constant depending only on k, p, γ, xp-var;[0,T ] and ˜ V
Lip γ + k −1
= |V |Lip γ + k −1 ,
thanks to translation invariance of Lip-norms. The following statement is a first limit theorem for RDE flows. The uniformity in y0 ∈ Re is a consequence of the invariance of the Lipγ -norm under translation, ∀y0 ∈ Re , γ ≥ 1 : |V |Lip γ = |V (y0 + ·)|Lip γ .
(11.6)
Theorem 11.12 Let p ≥ 1 and k ∈ {1, 2, . . . } and assume V = (V1 , . . . , Vd ) is a collection of Lipγ + k −1 -vector fields on Re for γ > p. Write α = (α1 , . . . , αe ) ∈ Ne and |α| = α1 + · · · + αe ≤ k. Then the ensemble ∂α π (V ) (0, y0 ; x) : |α| ≤ k depends continuously on x ∈ C p-var [0, T ] , G[p] Rd . More precisely, for all ε, R > 0 there exists δ (depending also on p, γ, k and |V |Lip γ + k −1 ) such that for all x1 , x2 with maxi=1,2 xi p-var;[0,T ] ≤ R and dp-var;[0,T ] x1 , x2 < δ we have sup ∂α π (V ) 0, y0 ; x1 − ∂α π (V ) 0, y0 ; x2 p-var;[0,T ] < ε.
y 0 ∈Re
(11.7)
If x is a geometric 1/p-H¨ older rough path, we may replace p-variation by 1/p-H¨ older throughout. Proof. We show for all ε > 0 that there exists δ such that dp-var;[0,T ] x1 , x2 < δ implies sup ∂α π (V ) 0, y0 ; x1 − ∂α π (V ) 0, y0 ; x2 p-var;[0,T ] < ε y 0 ∈Re
where α is an arbitrary multi-index with |α| = α1 + · · · + αe ≤ k. The main observation is that we can take y0 = 0 at the price of replacing V by V (y0 + ·). Thus, thanks to (11.6), uniformity in y0 ∈ Re will come for free provided our choice of δ depends on V only through |V |Lip γ + k −1 . Case 1: Assume k = 1 so that V ∈ Lipγ . In this case, ∂α π (V ) (0, y0 ; x) corresponds to a directional derivative in one of the basis directions of Re , say ej . From Theorem 11.3 we can write ∂α π (V ) (0, 0; x) as a composition of the form f3,(V ) ej , f2,(V ) ◦ f1,(V ) (0, x) .
11.2 Flows of diffeomorphisms
291
Inspection of the respective definitions of these maps shows continuous dependence in x with modulus of continuity only depending on |V |Lip γ , as required. More precisely, f1,(V ) was defined as a full RDE solution and the continuity estimate for full RDE solutions, Corollary 10.39, clearly shows that the modulus of continuity only depends on |V |Lip γ . Similar remarks apply to f2,(V ) and f3,(V ) after inspection of the continuity estimates for rough integrals and solutions of RDEs with linear vector fields. Case 2: Now assume V ∈ Lipγ +k −1 for k > 1. We have already pointed out that the ensemble ∂α π (V ) (0, 0; x) : |α| ≤ k − 1 can be written as a solution to an RDE along Lipγloc -vector fields (satisfying the p non-explosion condition). After localization, it can be written as an RDE solution along genuine Lipγ -vector fields where we insist that the Lipγ -norm of these localized vector fields only depends on xp-var;[0,T ] and |V |Lip γ + k −1 . We can now appeal to case 1 and the proof is finished. (The adaptation to the H¨ older case is left to the reader as a simple exercise.) Theorem 11.13 The conclusion of Theorem 11.12 holds with (11.7) replaced by2 −1 −1 sup ∂α π (V ) 0, y0 ; x1 − ∂α π (V ) 0, y0 ; x2 < ε. (11.8) y 0 ∈Re
p-var;[0,T ]
If x is a geometric 1/p-H¨ older rough path, we may replace p-var by 1/p-H¨ol throughout. Proof. We proceed as in the proof of Theorem 11.12 and observe that we can take y0 = 0 at the price of replacing V by V (y0 + ·). Secondly, we only consider the case |α| = k = 1 so that V ∈ Lipγ . (The general case is reduced to this one as in the proof of Theorem 11.12, case 2.) It helps to note that with yˆ = π (s, y; x)t we have π (V ) (0, y; x (s − ·))s = π (V ) (0, yˆ; x (t − ·))t . We now consider the (inverse) flow of two RDEs driven by xi , i = 1, 2 respectively, so that −1 π (V ) 0, y i ; xi s,t i i = π (V ) 0, y ; x (t − ·) t − π (V ) 0, yˆi ; xi (t − ·) t 1 i i = y − yˆ · ∇π 0, yˆi + τ y i − yˆi ; xi (t − ·) t dτ .
0 = π (s,y i ;x i ) s,t
2∂
α π (V )
=:F (x i )
(0, y 0 ; x)−1 denotes the path t → ∂α π (V ) (0, ·; x)−1 |·= y 0 ∈ Re . t
RDEs: smoothness
292
We can write
2 2 −1 π (V ) 0, y 1 ; x1 −1 0, y − π ; x (V ) s,t s,t ≤ π s, y 1 ; x1 s,t − π s, y 2 ; x2 s,t |F x1 | + π s, y 2 ; x2 s,t F x1 − F x2 ,
which leaves us with estimating four terms. First, from Corollay 10.27, ! " π s, y 1 ; x1 s,t − π s, y 2 ; x2 s,t ≤ c3 ρp-var;[s,t] x1 , x2 + y 1 − y 2 where c3 may depend on R ≥ maxi=1,2 xi p-var;[0,T ] and |V |Lip γ . Second, it is easy to see that 1 F x ≤ sup sup ∇π 0, y0 ; xi (t − ·) t t∈[0,T ] y 0 ∈Re
≤ c4 = c4 R, |V |Lip γ . Third, for c5 = c5 R, |V |Lip γ we have π s, y 2 ; x2 s,t ≤ c5 x2 p-var;[s,t] by Theorem 10.14. At last, we easily see that F x1 − F x2 is bounded from above by sup ∂α π (V ) 0, y0 ; x1 − ∂α π (V ) 0, y0 ; x2 < ε, α :|α |=1
p-var;[0,T ]
y 0 ∈Re
for an arbitrary fixed ε> 0 which is possible by Theorem 11.12provided x1 , x2 satisfy maxi=1,2 xi p-var;[0,T ] ≤ R and dp-var;[0,T ] x1 , x2 < δ for some δ = δ (ε, R). Putting things together, and taking y 1 = y 2 = y0 = 0, we see that 2 −1 π (V ) 0, 0; x1 −1 − π (V ) 0, 0; x s,t s,t ≤ c3 c4 ρp-var;[s,t] x1 , x2 + c5 x2 p-var;[s,t] F x1 − F x2 ≤ c6 dp-var;[s,t] x1 , x2 + c6 x2 p-var;[s,t] ε (thanks to Theorem 8.10) and, by super-additivity of dp-var;[·,·] x1 , x2 resp. x2 p-var;[·,·] , this becomes −1 −1 − π (V ) 0, 0; x2 ≤ c6 δ + c6 Rε π (V ) 0, 0; x1 p-var;[0,T ]
with c6 = c6 R, |V |Lip γ . This estimate is more than enough to finish the proof; e.g. replace δ by min (δ, Rε) and start the argument with ε/ (c6 R)
11.2 Flows of diffeomorphisms
293
instead of ε. (The adaption to the H¨ older case is left to the reader as a simple exercise.) We already pointed out that π (V ) (0, ·; x) can be viewed as an element in Dk (Re ), the space of flows of C k -diffeomorphisms. For any bounded set K ⊂ Re one can define |φ|(0,k );K :=
sup |∂α φt (y)| t∈[0,T ], α :|α |≤k , y ∈K
and, setting Kn := {y ∈ Re : |y| ≤ n}, dDk (Re ) (φ, ψ) =
∞ 1 |φ − ψ|(0,k );Kn 2n 1 + |φ − ψ|(0,k );Kn n =1
and also d˜Dk (Re ) (φ, ψ) = dDk (Re ) (φ, ψ)+dDk (Re ) φ−1 , ψ −1 . One can check that Dk (Re ) is a Polish space under d˜Dk (Re ) and that convergence d˜Dk (Re ) (φn , φ) → 0 is equivalent to −1 −1 sup |∂α φnt (y) − ∂α φt (y)| + ∂α (φnt ) (y) − ∂α (φt ) (y) → 0
t∈[0,T ], y ∈K, α :|α |≤k
for all compact subsets K ⊂ Re . We then have the following “limit theorems for flows of diffeomorphisms”, as an immediate consequence of Theorems 11.12 and 11.13. Corollary 11.14 Under the assumptions of Theorem 11.12, the map x ∈ C p-var [0, T ] , G[p] Rd → π (V ) (0, ·; x) ∈ Dk (Re ) is (uniformly) continuous (on bounded sets). Exercise 11.15 Establish continuity of x ∈ C p-var [0, T ] , G[p] Rd → π (V ) (0, ·; x) ∈ Dkp-var (Re ) where Dkp-var is constructed as Dk (Re ) but with the semi-norm |φ|(0,k );K replaced by sup
sup
α :|α |≤k , y ∈K
(t i )⊂[0,T ]
p ∂α φt i ,t i + 1 (y)
1/p
Conduct a similar dicussion in the 1/p-H¨ older context.
.
RDEs: smoothness
294
11.3 Application: a class of rough partial differential equations Let S n denote the set of symmetric n × n matrices and consider the partial differential equations of parabolic type u˙ u (0, ·)
F t, x, Du, D2 u , = u0 ∈ BUC (Rn ) ,
=
(11.9) (11.10)
where F = F (t, x, p, X) ∈ C ([0, T ] , Rn , Rn , S n ) is assumed to be degenerate elliptic 3 and u = u (t, x) ∈ BUC ([0, T ] × Rn ) is a real-valued function of time and space.4 Equation (11.9) will be interpreted in viscosity sense and we recall5 that this means that u is a viscosity sub- (and super-) solu¯) is a tion to ∂t − F = 0; that is, if ψ ∈ C 2 ((0, T ) × Rn ) is such that (t¯, x maximum (resp. minimum) of u − ψ then ¯) . ψ˙ (t¯, x ¯) ≤ (resp. ≥ ) F t¯, x ¯, Dψ (t¯, x ¯) , D2 ψ (t¯, x The aim of this section is to allow for some “rough” perturbation of the form Du · V (x) dzt ≡ ∂i u (t, x) Vji (x) dztj i,j
d
: [0, T ] → Rd and, as usual, V = (V1 , . . . , Vd ) where z = z 1 , . . . , z denotes a collection of sufficiently nice vector fields on Re . As pointed out in [111], classical (deterministic) second-order viscosity theory can deal at best with z ∈ W 1,1 [0, T ] , Rd , i.e. measurable dependence in time of dz/dt. Any “rough” partial differential equation of the form du = F t, x, Du, D2 u dt − Du (t, x) · V (x) dzt where z enjoys only “Brownian” regularity of z (i.e. just below 1/2-H¨ older), or less, falls dramatically outside the scope of the deterministic theory. However, one can give meaning to this equation (and then establish existence, uniqueness, stability, etc.) via ideas from rough path theory; that is, by accepting that z should be replaced by a geometric p-rough path z. The main result of this section is6
3 This means F (. . . , X ) ≥ F (. . . , Y ) whenever X ≥ Y in the sense of symmetric matrices. 4 BUC denotes the space of bounded, uniformly continuous functions, equipped with local-uniform topology. 5 See the “User’s guide” [34] or Fleming and Soner’s textbook [53]. 6 Unless otherwise stated, BUC-spaces will be equipped with the topology of locally uniform convergence.
11.3 Application
295
Theorem 11.16 Let p ≥ 1 and (z ε ) ⊂ C ∞ [0, T ] , Rd so that S[p] (z ε ) → z ∈ C 0,p-var [0, T ] , G[p] Rd . Assume that uε0 ∈ BUC (Rn ) → u0 ∈ BUC (Rn ) , locally uniformly as ε → 0. Let F = F (t, x, p, X) be continuous, degenerate elliptic, and assume that ∂t − F satisfies Φ(3) -invariant comparison (cf. definition 11.19 below). Assume that V = (V1 , . . . , Vd ) is a collection of Lipγ +2 (Rn ) vector fields with γ > p. Consider (necessarily unique7) viscosity solutions uε ∈ BUC ([0, T ] × Rn ) to duε = F t, x, Duε , D2 uε dt − Duε · V (x) dz ε (t) = 0, (11.11) uε (0, ·) = uε0 , (11.12) and assume that the resulting family (uε : ε > 0) is locally uniformly bounded 8 . Then (i) there exists a unique u ∈ BUC ([0, T ] × Rn ), only dependent on z and u0 but not on the particular approximating sequences, such that uε → u locally uniformly. We write (formally) du = F t, x, Du, D2 u dt − Du · V (x) dz (t) = 0, (11.13) (11.14) u (0, ·) = u0 ; and also u = uz when we want to indicate the dependence on z; (ii) we have the contraction property ˆz |∞;Rn ×[0,T ] ≤ |u0 − u ˆ0 |∞;Rn |uz − u where u ˆz is defined as the limit of u ˆη , defined as in (11.11) with uε replaced η by u ˆ throughout; (iii) the solution map (z,u0 ) → uz from C p-var [0, T ] , G[p] Rd × BUC (Rn ) → BUC ([0, T ] × Rn ) is continuous. Let us recall that comparison (for BUC-solutions of ∂t − F = 0 ) means that, whenever u, v ∈ BUC ([0, T ] × Rn ) are viscosity sub- (resp. super-) solutions to (11.9) with respective BUC-initial datas u0 ≤ v0 , then u ≤ v on [0, T ] × Rn . 7 This
follows from the first five lines in the proof of this theorem. simple sufficient condition is boundedness of F (·, ·, 0, 0) on [0, T ] × Rn , and the assumption that u ε0 → u 0 uniformly, as can be seen by comparison. 8A
RDEs: smoothness
296
Given F ∈ C ([0, T ] , Rn , Rn , S n ), a sufficient condition for comparison9 is the following technical10 Condition 11.17 There exists a function θ : [0, ∞] → [0, ∞] with θ (0+) = 0, such that for each fixed t ∈ [0, T ], 2 ˜| F (t, x, α (x − x ˜) , X) − F (t, x ˜, α (x − x ˜) , Y ) ≤ θ α |x − x ˜| + |x − x (11.15) for all α > 0, x, x ˜ ∈ Rn and X, Y ∈ S n (the space of n × n symmetric matrices) satisfy I 0 X 0 I −I −3α ≤ ≤ 3α . 0 I 0 −Y −I I Furthermore, we require F = F (t, x, p, X) to be uniformly continuous on sets BR := {(t, x, p, X) ∈ [0, T ] × Rn × Rn × S n : |p| ≤ R, |X| ≤ R} ∀R < ∞. Remark 11.18 Using the elementary inequalities, |sup (.) − sup (∗)| , |inf (.) − inf (∗)| ≤ sup |. − ∗| one immediately sees that if Fγ , Fγ ,β satisfy (11.15) for γ, β in some index set with a common modulus θ, then inf γ Fγ , supβ inf γ Fβ ,γ , etc. again satisfy (11.15). Similar remarks apply to the uniform continuity property; provided there exists, for any R < ∞, a common modulus of continuity σ R for all Fγ , Fγ ,β restricted to BR . To state our key assumption on F we need some preliminary remark on the transformation behaviour of Du = (∂1 u, . . . , , ∂n u) , D2 u = (∂ij u)i,j =1,...,n under change of coordinates on Rn where u = u (t, ·), for fixed t. Let us allow the change of coordinates to depend on t, say v (t, ·) := u (t, φt (·)) where φt : Rn → Rn is a diffeomorphism. Differentiating v t, φ−1 t (·) = u (t, ·) twice, followed by evaluation at φt (y), we have, with summation over repeated indices, ∂i u (t, φt (x)) = ∂ij u (t, φt (x)) =
∂k v (t, x) ∂i φ−1;k |φ t (x) t ∂k l v (t, x) ∂i φ−1;k |φ t (x) ∂j φ−1;l |φ t (x) t t + ∂k v (t, x) ∂ij φ−1;k |φ t (x) . t
9 ...
which, en passant, implies degenerate ellipticity, cf. page 18 in [34]. e.g. [34, (3.14) and Section 8], [53, Section V.7, V.8] or the appendix of [24].
1 0 See
11.3 Application
297
We shall write this, somewhat imprecisely11 but conveniently, as Du|φ t (x)
=
2
D u|φ t (x)
=
3 Dv|x , Dφ−1 t |φ t (x) ,
2 2
D
2
v|x , Dφ−1 t |φ t (x)
⊗
Dφ−1 t |φ t (x)
3
(11.16) 2 3 2 −1 + Dv|x , D φt |φ t (x) .
Let us now introduce Φ(k ) as the class of all flows of C k -diffeomorphisms of Rn , φ = (φt : t ∈ [0, T ]), such that φ0 = Id ∀φ ∈ Φ(k ) and such that have k bounded derivatives, uniformly in t ∈ [0, T ]. Since φt and φ−1 t Φ(k ) ⊂ Dk (Rn ) we inherit a natural notion of convergence: φ (n) → φ in Φ(k ) iff for all multi-indices α with |α| ≤ k, ∂α φ (n) → ∂α φt , ∂α φ (n)
−1
→ ∂α φ−1 locally uniformly in [0, T ] × Rn . t
Definition 11.19 (Φ(k ) -invariant comparison) Let k ≥ 2 and 3 2 F φ ((t, x, p, X)) := F t, φt (x) , p, Dφ−1 t |φ t (x) , 3 2 3 2 −1 2 −1 . X, Dφ−1 t |φ t (x) ⊗ Dφt |φ t (x) + p, D φt |φ t (x)
(11.17)
We say that ∂t = F satisfies Φ(k ) -invariant comparison if, for every φ ∈ Φ(k ) , comparison holds for BU C solutions of ∂t − F φ = 0. Example 11.20 (F linear) Suppose that σ (t, x) : [0, T ] × Rn → Rn ×n and b (t, x) : [0, T ] × Rn → Rn are bounded, continuous in t and! Lipschitz continuous in x, uniformly in t ∈ [0, T ]. If F (t, x, p, X) = " T Tr σ (t, x) σ (t, x) X + b (t, x) · p, then Φ(3) -invariant comparison holds. Although this is a special case of the following example, let us point out that F φ is of the same form as F with σ, b replaced by k
σ φ (t, x)m k
bφ (t, x)
= σ im (t, φt (x)) ∂i φ−1;k |φ t (x) , k = 1, . . . , n; m = 1, . . . , n t " ! = bi (t, φt (x)) ∂i φ−1;k |φ t (x) t σ im σ jm ∂ij φ−1;k | + φ t (y ) , k = 1, . . . , n. t i,j
By defining properties of flows of diffeomorphisms, t → ∂i φ−1;k |φ t (x) , t −1;k 3 ∂ij φt |φ t (y ) is continuous and the C -boundedness assumption inherent in our definition of Φ(3) ensures that σ φ , bφ are Lipschitz in x, uniformly in t ∈ [0, T ]. It is then easy to see (cf. the argument of [53, Lemma 7.1]) that F φ satisfies condition 11.17 for every φ ∈ Φ(3) . This implies that Φ(3) -invariant comparison holds for BU C solutions of ∂t − F φ = 0. speaking, one should view Du, D 2 u |· as a second-order cotangent vector, the pull-back of Dv, D 2 v |x under φ −1 t . 1 1 Strictly
RDEs: smoothness
298
Exercise 11.21 (F quasi-linear) Let ! " T F (t, x, p, X) = Tr σ (t, x, p) σ (t, x, p) X + b (t, x, p) .
(11.18)
(i) Assume that b = b (t, x, p) : [0, T ]×Rn ×Rn → R is bounded, continuous and Lipschitz continuous in x and p, uniformly in t ∈ [0, T ]. (ii) Assume that σ = σ (t, x, p) : [0, T ] × Rn × Rn → Rn ×n is a continuous, bounded map such that σ (t, ·, p) is Lipschitz continuous, uniformly in (t, p) ∈ [0, T ] × Rn ; assume also existence of a constant c > 0, such that ∀p, q ∈ Rn : |σ (t, x, p) − σ (t, x, q)| ≤ c
|p − q| 1 + |p| + |q|
(11.19)
for all t ∈ [0, T ] and x ∈ Rn . Show that under these assumptions Φ(3) invariant comparison holds for ∂t = F . Example 11.22 (F of Hamilton–Jacobi–Bellman type) From Example 11.20 and Remark 11.18, we see that Φ(3) -invariant comparison holds when F is given by ! " T F (t, x, p, X) = inf Tr σ (t, x; γ) σ (t, x; γ) X + b (t, x; γ) · p , γ ∈Γ
the usual non-linearity in the Hamilton–Jacobi–Bellman equation, whenever the conditions in Example 11.20 are satisfied uniformly with respect to γ ∈ Γ. More generally, one can take the infimum of quasi-linear Fγ , provided the conditions in Exercise 11.21 are satisfied uniformly. Example 11.23 (F of Isaac type) Similarly, Φ(3) -invariant comparison holds for ! " T F (t, x, p, X) = sup inf Tr σ (t, x; β, γ) σ (t, x; β, γ) X + b (t, x; β, γ) · p β
γ
(such non-linearities arise in the Isaac equation in the theory of differential games), and more generally F (t, x, p, X) ! " T = sup inf Tr σ (t, x, p; β, γ) σ (t, x, p; β, γ) · X + b (t, x, p; β, γ) β
γ
whenever the conditions in Examples 11.20 and 11.21 are satisfied uniformly with respect to β ∈ B and γ ∈ Γ, where B and Γ are arbitrary index sets. Lemma 11.24 Let z : [0, T ] → Rd be smooth and assume that we are given Lipγ -vector fields V = (V1 , . . . , Vd ) with γ > 3. Then the ODE dyt = V (yt ) dzt , t ∈ [0, T ] has a unique solution flow φ = φz ∈ Φ(3) .
11.3 Application
299
Proof. This follows directly from Proposition 11.11 applied with p = 1. A direct ODE proof, building on Corollary 4.9 and then arguing as in the proof of Proposition 11.11 is also not difficult (and actually shows that C 3 -boundedness of V is enough here). Proposition 11.25 Let z, V and φ be as in Lemma 11.24. Then u is a viscosity sub- (resp. super-) solution (always assumed BUC) of (11.20) u˙ (t, x) = F t, x, Du, D2 u − Du (t, x) · V (x) z˙t if and only if v (t, x) := u (t, φt (x)) is a viscosity sub- (resp. super-) solution of (11.21) v˙ (t, x) = F φ t, x, Dv, D2 v where F φ was defined in (11.17). Proof. Set y = φt (x). When u is a classical sub-solution, it suffices to use the chain rule and definition of F φ to see that v˙ (t, x)
= u˙ (t, y) + Du (t, y) · φ˙ t (x) = u˙ (t, y) + Du (t, y) · V (y) z˙t ≤ F t, y, Du (t, y) , D2 u (t, y) = F φ t, x, Dv (t, x) , D2 v (t, x) .
The case when u is a viscosity sub-solution of (11.20) is not much harder: 2 n suppose that (t¯, x ¯) is a maximum of v − ξ, where −1ξ ∈ C ((0, T ) × R ) and 2 n x) so define ψ ∈ C ((0, T ) × R ) by ψ (t, y) = ξ t, φt (y) . Set y¯ = φt¯ (¯ that ¯, Dξ (t¯, x ¯) , D2 ξ (t¯, x ¯) . F t¯, y¯, Dψ (t¯, y¯) , D2 ψ (t¯, y¯) = F φ t¯, x Obviously, (t¯, y¯) is a maximum of u − ψ, and since u is a viscosity subsolution of (11.20) we have ψ˙ (t¯, y¯) + Dψ (t¯, y¯) V (¯ y ) z˙ (t¯) ≤ F t¯, y¯, Dψ (t¯, y¯) , D2 ψ (t¯, y¯) . ¯) = ψ˙ (t¯, y¯) + On the other hand, ξ (t, x) = ψ (t, φt (x)) implies ξ˙ (t¯, x Dψ (t¯, y¯) V (¯ y ) z˙ (t¯) and putting things together we see that ¯, Dξ (t¯, x ¯) , D2 ξ (t¯, x ¯) , ξ˙ (t¯, x ¯) ≤ F φ t¯, x which says precisely that v is a viscosity sub-solution of (11.21). Replacing maximum by minimum and ≤ by ≥ in the preceding argument, we see that if u is a super-solution of (11.20), then v is a super-solution of (11.21). Conversely, the same arguments show that if v is a viscosity sub- (resp. super-) solution for (11.21), then u (t, y) = v t, φ−1 (y) is a sub- (resp. super-) solution for (11.20). We can now give the proof of the main result.
300
RDEs: smoothness ε
Proof. (Theorem 11.16) Using Lemma 11.24, we see that φε ≡ φz , the ε solution flow to dy = V (y) dz ε , is an element of Φ ≡ Φ(3) . Set F ε := F φ . ε From Proposition 11.25, we know that u is a solution to duε = F t, y, Duε , D2 uε dt − Duε (t, y) · V (y) dztε , uε (0, ·) = uε0 if and only if v ε is a solution to ∂t − F ε = 0. By assumption of Φ-invariant comparison, |v ε − vˆε |∞;Rn ×[0,T ] ≤ |v0 − vˆ0 |∞;Rn , where v ε , vˆε are viscosity solutions to ∂t − F ε = 0. Let φz denote the solution flow to the rough differential equation dy = V (y) dz. Thanks to Lipγ +2 -regularity of the vector fields φz ∈ Φ, and in particular a z flow of C 3 -diffeomorphisms. Set F z = F φ . The “universal” limit theorem [120] holds; in fact, on the level of flows of diffeomorphisms (see [119] and the rest of this chapter for more details) it tells us that, since z ε tends to z in rough path sense, φε → φz in Φ so that, by continuity of F (more precisely: uniform continuity on compacts), we easily deduce that F ε → F z locally uniformly. From the “Barles–Perthame” method of semi-relaxed limits (Lemma 6.1 and Remarks 6.2, 6.3 and 6.4 in [34], see also [53]), the pointwise (relaxed) limits v¯ := lim sup ∗ v ε , v := lim inf ∗ v ε are viscosity (sub- resp. super-) solutions to ∂t − F z = 0, with identical initial data. As the latter equation satisfies comparison, one has trivially uniqueness and hence v := v¯ = v is the unique (and continuous, since v¯, v are upper resp. lower semi-continuous) solution to ∂t v = F z v , v (0, ·) = u0 (·) . Moreover, using a simple Dini-type argument (e.g. [34, p. 35]) one sees that this limit must be uniform on compacts. It follows that v is the unique solution to ∂t v = F z v , v (0, ·) = u0 (·) (hence does not depend on the approximating sequence to z) and the proof of (i) is finished by setting −1 uz (t, x) := v t, (φzt ) (x) .
11.4 Comments
301
(ii) The comparison |uz − u ˆz |∞;[0,T ]×Rn ≤ |u0 − u ˆ0 |∞;Rn is a simple consequence of comparison for v, vˆ (solutions to ∂t v = F z v). At last, to see (iii), we argue in the very same way as in (i), starting with F z n → F z locally uniformly to see that v n → v locally uniformly, i.e. uniformly on compacts.
11.4 Comments Flows of RDE solutions were first studied by Lyons and Qian [119], see also [120]; perturbations in the driving signal in [118]. Theorem 11.6 appears to be new; Corollary 11.7 was established by Li and Lyons in [109]. Exercise 11.8 is the rough path generalization of an SDE regularity result of Kusuoka [98]; his result is recovered upon taking the driving rough path to be enhanced Brownian motion. The limit Theorem 11.12 is the rough path generalization of the corresponding limit theorems for stochastic flows as discussed in Ikeda and Watanabe [88], Kusuoka [98] or Malliavin [124]. Our definition of Dk (Re ), equation (11.5), follows Ben Arous and Castell [11]; see also Kusuoka [98]. Corollary 11.14 is somewhat cruder than Theorem 11.12 but helpful in making the link to various works on stochastic flows, including Kusuoka [98] and Li and Lyons [109]. Section 11.3 on rough partial differential equations du = F t, x, Du, D 2 u dt + H (x, Du) dz, with F fully non-linear but H = (H1 , . . . , Hd ) linear in Du is taken from Caruana et al. [24]; the case when F and H are both linear (with respect to the derivatives of u) was considered in Caruana and Friz [23]. From the works of Lions and Souganidis [111–113] we conjecture that the present results extend to sufficiently smooth but non-linear H. Other classes of rough partial differential equations have been studied; in Gubinelli et al. [77] the authors consider the evolution problem dY = −AY dt + B (Y ) dX where −A is the generator of an analytic semi-group; the solution is understood in mild sense, with the integrals involved being of Young type. An extension to a genuine rough setting (i.e. beyond the Young setting) is discussed in Gubinelli and Tindel [76].
12 RDEs with drift and other topics In the last two chapters we discussed various properties of rough differential equations of the form dy = V (y)dx, where V = (V1 , . . . , Vd ) denotes, as usual, a collection of vector fields. In applications, the term V (y)dx may model a state-dependent perturbation of the classical ODE y˙ = W (y). This leads to differential equations of the form dy = V (y)dx+W (y) dt where W (dy) dt is viewed as a drift term. To some extent, no new theory is required here. It suffices to replace V by V˜ = (V1 , . . . , Vd , W ) and the geometric p-rough path x by the “space-time” rough path x ˜ : t → S[p] (x,t), as discussed in Section 9.4. The downside of this approach is that one has to impose the same regularity assumptions on V and W , which is wasteful.1 We shall see in this chapter that the regularity assumptions on W can be significantly weakened. Moreover, our estimates will be important in their own right as they will lead us to a deterministic understanding of McShane-type approximation results.2
12.1 RDEs with drift terms It is helpful to consider drift terms of the more general form W (y) dh where q -var d [0, T ] , R . It is natural to assume W = (W1 , . . . , Wd ) and h ∈ C that the drift term signal h has better regularity than x; which is to say that q ≤ p. A well-defined Young pairing S[p] (x, h) is still necessary and so we assume that 1/p + 1/q > 1. It follows that q ∈ [1, 2) and hence [q] = 1. This implies that h is actually a geometric q-rough path, say . h ∈ C q -var [0, T ] , G[q ] Rd 1 From ODE theory, one expects that W ∈ Lip 1 will suffice for uniqueness, in contrast to V ∈ Lip p needed for RDE uniqueness . . . 2 To be discussed in Sections 12.2 and 13.3.4.
12.1 RDEs with drift terms
303
In fact, we shall find it convenient not to impose that q ≤ p, since this will allow us to keep the symmetry between x and h. The object of study is then the rough differential equation of the form dy = V (y)dx + W (y)dh,
(12.1)
where x is a weak geometric p-rough path, and h is a weak geometric qrough path.3 In many applications, ht = t or h is of bounded variation, i.e. q = 1. We note again that (12.1) can be rewritten as a standard RDE driven by the geometric p-rough path S[p] (x ⊕ h), along the vector fields (V, W ), at the price of suboptimal regularity assumptions on V and W . Our direct analysis of (12.1) starts with the following p-var Definition 12.1 Let p, q ≥ 1 such that 1/p + 1/q > 1. Let x ∈ C q -var [p] d [0, T ] , G R be a weak geometric p-rough path, and h ∈ C
[0, T ] , G[q ] Rd be a weak geometric q-rough path. We say that y ∈ e C ([0, T ] , R ) is a solution to the rough differential equation (short: an RDE solution) driven by (x, h) along the collection of Re -vector fields (Vi )1≤i≤d , (Wj )1≤j ≤d and started at y0 if there exists a sequence (xn , hn ) ⊂ C 1-var [0, T ] , Rd × C 1-var [0, T ] , Rd such that sup S[p] (xn )p-var;[0,T ] + S[q ] (hn )q -var;[0,T ] < ∞, n lim d0;[0,T ] (S[p] (xn ) , x) = 0 and lim d0;[0,T ] S[q ] (hn ) , h = 0
n →∞
n →∞
and ODE solutions y ∈ π (V ,W ) (0, y0 ; (xn , hn )) such that n
y n → y uniformly on [0, T ] as n → ∞ . The (formal) equation dy = V (y) dx + W (y) dh is referred to as a rough differential equation with drift (short: an RDE with drift). This definition generalizes immediately to time intervals [s, T ] and we define π (V ,W ) (s, ys ; (x, h)) ⊂ C ([s, T ] , Re ) to be the set of all solutions to the above RDE with drift starting at ys at time s, and in case of uniqueness, π (V ,W ) (s, ys ; (x, h)) is the solution of the RDE with drift. We will also be interested in full RDE solutions with drift. Let us define this concept. p-var ([0, T ] , Definition d 12.2 Let p, q ≥ 1 such that 1/p+1/q > 1. Let x ∈ C [p] q -var ([0, T ] , G R be a weak geometric p-rough path, and h ∈ C
G[q ] Rd
be a weak geometric q-rough path. We say that y ∈ C ([0, T ] ,
3 In the case that q > p, one sees that 1/p + 1/q > 1 implies p ∈ [1, 2) and so V (y)dx plays the role of the drift term.
RDEs with drift and other topics
304
G[m ax(p,q )] (Re ) is a solution to the full rough differential equation (short: a full RDE solution) driven by (x, h) along the collection of Re -vector fields (Vi )1≤i≤d , (Wj )1≤j ≤d and started at y0 if there exists a sequence (xn , hn )n in C 1-var [0, T ] , Rd × C 1-var [0, T ] , Rd such that sup S[p] (xn )p-var + S[q ] (hn )q -var < ∞, n lim d0 (S[p] (xn ) , x) = 0 and lim d0 S[q ] (hn ) , h = 0
n →∞
n →∞
and ODE solutions yn ∈ π (V ,W ) (0, π 1 (y0 ) ; (xn , hn )) such that y0 ⊗ S[m ax(p,q )] (y n ) → y uniformly on [0, T ] as n → ∞ . The (formal) equation dy = V (y) dx + W (y) dh is referred to as a full rough differential equation with drift (short: a full RDE with drift). This definition generalizes immediately to time intervals [s, T ] and we define π (V ,W ) (s, ys ; (x, h)) ⊂ C [s, T ] , G[m ax(p,q )] (Re ) to be the set of all solutions to the above full RDE with drift starting at ys at time s, and in case of uniqueness, π (V ,W ) (s, ys ; (x, h)) is the solution of the full RDE with drift.
12.1.1 Existence We start by comparing ODE solutions with drift with their counterparts where we remove the drift. Lemma 12.3 Assume that (i) V = (Vi )1≤i≤d is a collection of vector fields in Lipγ −1 (Re ) with γ > 1; (i bis) W = (Wj )1≤j ≤d is a collection of vector fields in Lipβ −1 (Re ) with β > 1; (ii) s, t are some elements of [0, T ]; (iii) ys ∈ Re is an initial condition; (iv) x and h are two paths in C 1-var [s, t] , Rd and C 1-var [s, t] , Rd ;
t (v) x ≥ 0 is a bound on |V |Lip γ −1 s |dxr | and h ≥ 0 is a bound on
t |W |Lip β −1 s |dhr | . Then we have, for some constant C = C (γ, β) , π (V ,W ) (s, ys ; (x, h))s,t − π (V ) (s, ys ; x)s,t − π (W ) (s, ys ; h)s,t ≤ C βh + x γh −1 + x h + βx −1 h + γx exp (C (x + h )) .
12.1 RDEs with drift terms
305
In the case when γ ≥ 2 and β ≥ 2, we have π (V ,W ) (s, ys ; (x, h))s,t − π (V ) (s, ys ; x)s,t − π (W ) (s, ys ; h)s,t ≤ Cx h exp (C (x + h )) .
Proof. Without loss of generality, we assume (s, t) = (0, 1). Case 1: We first assume that γ ≥ 2 and β ≥ 2. Define, for u ∈ [0, 1], zu = π (V ,W ) (0, y0 ; (x, h))u , yux = π (V ) (0, y0 ; x)u and yuh = π (W ) (0, y0 ; h)u . Define also x h . + y0,u eu = z0,u − y0,u
First observe that x h y0,u ≤ cx and y0,u ≤ ch . Then, by definition of y and z, for u ∈ [0, 1] ,
eu
u
u
dhr
W (zr ) − W {V (zr ) − V + 0 u u x zr − yrh . |dhr | |zr − yr | . |dxr | + c2 |W |Lip β −1 ≤ c2 |V |Lip γ −1 0 0 u x h z0,r − y0,r ≤ c2 . |V |Lip γ −1 |dxr | + |W |Lip β −1 |dhr | − y0,r 0 u u h x y0,r . |dxr | + c2 |W | β −1 y0,r . |dhr | + c2 |V |Lip γ −1 Lip 0 0 u ≤ c2 er |V |Lip γ −1 |dxr | + |W |Lip β −1 |dhr | + c4 x h . =
(yrx )} dxr
0
yrh
0
We conclude the proof by using Gronwall’s inequality. Case 2: We still assume γ ≥ 2, but β < 2. As π (W ) (0, y0 ; h)s,t − W (y0 ) hs,t ≤ c1 βh , we see that we can replace π (W ) (s, ys ; h)s,t by W (y0 ) hs,t . Define this time eu = |zu − yu − W (y0 ) h0,u | , and observe that |z0,1 | ≤
RDEs with drift and other topics
306
c1 (x + h ) . Then, by definition of y and z, for u ∈ [0, 1] , u u eu = {V (zr ) − V (yr )} dxr + {W (zr ) − W (z0 )} dhr 0 0 u u β −1 |zr − yr | . |dxr | + c2 |W |Lip β −1 sup |z0,r | |dhr | ≤ c2 |V |Lip γ −1 r 0 0 u u β −1 ≤ c2 |V |Lip γ −1 er |dxr | + c2 |W |Lip β −1 sup |z0,r | |dhr | r 0 0 u u |dxr | |W |Lip β −1 |dhr | + c2 |V |Lip γ −1 0 u 0 β −1 ≤ c3 |V |Lip γ −1 er |dxr | + c3 h (x + h ) + x 0 u er |dxr | + c4 βh + h βx −1 exp (c4 x ) . ≤ c3 |V |Lip γ −1 0
We conclude this case with Gronwall’s lemma once again. The case γ < 2 and β ≥ 2 is of course the symmetric case. Case 3: We finally consider the case γ < 2 and β < 2, which is the simplest case. As β π (W ) (0, y0 ; h)0,1 − W (y0 ) h0,1 ≤ c1 h , π (V ) (0, y0 ; x)0,1 − V (y0 ) x0,1 ≤ c1 γx , we see that we can replace π (W ) (0, y0 ; h)0,1 by W (y0 ) h0,1 and π (V ) (0, y0 ; x)0,1 by V (y0 ) x0,1 . But π (V ,W ) (0, y0 ; (x, h))0,1 − V (y0 ) x0,1 − W (y0 ) h0,1 1 [V (zu ) − V (z0 )] dxu ≤ 0 1 [W (zu ) − W (z0 )] dhu + 0
≤ ≤
γ −1
β −1
x + c4 (x + h ) c4 (x + h ) β γ −1 c5 h + x h + βx −1 h + γx .
h
That concludes the proof. It is now easy to give the following generalization of Lemma 10.5. Lemma 12.4 (Lemma Adrift ) Assume that (i) V = (Vi )1≤i≤d is a collection of vector fields in Lipγ −1 (Re ) with γ > 1; (i bis) W = (Wj )1≤j ≤d is a collection of vector fields in Lipβ −1 (Re ) with β > 1; (ii) s < u are some elements of [0, T ];
12.1 RDEs with drift terms
307
(iii) x, x ˜ are two paths in C 1-var [s, u] ,Rd such that x)s,u ; Sγ (x)s,u =Sγ (˜ 1-var d ˜ are two paths in C [s, u] , R such that Sβ (h)s,u = (iii bis) h, h ˜ ; Sβ h s,u u
u (iv) x ≥ 0 is a bound on |V |Lip γ −1 max s |dx|, s |d˜ x| and h ≥ 0 is a
u u ˜ . bound on |W |Lip β −1 max s |dh| , s dh We then have, for some constant C = C (γ, β) , ˜ )s,u ˜, h π (V ,W ) (s, ys ; (x, h))s,u − π (V ,W ) (s, ys ; x ≤ C βh + x γh −1 + x h + βx −1 h + γx exp (C (x + h )) . ˜ )s,u as ∆1 + ∆2 + ˜, h Proof. Write π (V ,W ) (s, ys ; (x, h))s,u − π (V ,W ) (s, ys ; x ∆3 , where ∆1 = π (V ,W ) (s, ys ; (x, h))s,u − π (V ) (s, ys ; x)s,u − π (W ) (s, ys ; h)s,t ˜ , − π (V ,W ) (s, ys ; (x, h))s,u −π (V ,W ) (s, ys ; x ˜)s,u −π (W ) s, ys ; h s,t
and ∆2
=
∆3
=
π (V ) (s, ys ; x)s,u − π ( V˜ ) (s, ys ; x ˜)s,u ˜ π (W ) (s, ys ; h)s,t − π (W ) s, ys ; h
s,t
.
Lemma 12.3 gives |∆1 | ≤ c1 βh + x γh −1 + x h + βx −1 h + γx exp (c1 x ) , Lemma 10.5 (which we called “Lemma A”) gives |∆2 | ≤ c2 γx exp (c2 x ) , and |∆3 | ≤ c3 βh exp (c3 h ) . The triangle inequality then finishes the proof. Lemma 12.5 (Lemma Bdrift ) Assume that (i) V = (Vi )1≤i≤d is a collection of vector fields in Lipγ −1 (Re ) with γ > 1; (i bis) W = (Wj )1≤j ≤d is a collection of vector fields in Lipβ −1 (Re ) with β > 1; (ii) t < u are some elements of [0, T ]; initial conditions); (iii) yt , y˜t ∈ Re (thought of as “time-t” (iv) x is a path in C 1-var [t, u] , Rd ;
RDEs with drift and other topics
308
(iv bis) h is a path in C 1-var [t, u] , Rd ;
u (v) x ≥ 0 is a bound on |V |Lip γ −1 t |dxr | and h ≥ 0 is a bound on
u |W |Lip β −1 t |dhr |. Then, if π (V ) (t, ·; x) denotes the unique solution to dy = V (y) dx from some time-t initial condition, we have for some C = C (γ, β) , π (V ,W ) (t, yt ; (x, h))t,u − π (V ,W ) (t, y˜t ; (x, h))t,u C |yt − y˜t | . γx −1 + βh −1 exp (C (x + h )) ≤ + βh + x γh −1 + x h + βx −1 h + γx exp (C (x + h )) . Proof. Write once again π (V ,W ) (t, yt ; (x, h))t,u − π (V ,W ) (t, y˜t ; (x, h))t,u as ∆1 + ∆2 + ∆3 , where ∆1
=
π (V ,W ) (t, yt ; (x, h))t,u − π (V ) (t, yt ; x)t,u − π (W ) (t, yt ; h)t,u − π (V ,W ) (t, y˜t ; (x, h))t,u − π (V ) (t, y˜t ; x)t,u − π (W ) (t, y˜t ; h)t,u ,
and ∆2
=
π (V ) (t, yt ; x)t,u − π (V ) (t, y˜t ; x)t,u
∆3
=
π (W ) (t, yt ; h)t,u − π (W ) (t, y˜t ; h)t,u .
Lemma 12.3 gives
|∆1 | ≤ c1 βh + h βx −1 + γh −1 β + γx exp (c1 x ) .
Then remark that Lemma 12.5 (Lemma B) in the case γ ≥ 2 gives |∆2 | ≤ c2 (|yt − y˜t | .x ) exp (c2 x ) , while inequality (10.12) in Exercise 10.12 easily leads to |∆2 | ≤ c2 |yt − y˜t | .γx −1 + γx exp (c2 x ) in the case γ < 2. We obtain similarly |∆2 | ≤ c2 |yt − y˜t | .βh −1 + βh exp (c2 h ) . We are now ready for our existence theorem: Theorem 12.6 Assume that p, q, γ, β ∈ [1, ∞) are such that 1/p + 1/q > 1 γ > p and β > q 1 β−1 γ−1 1 + > 1 and + > 1. q p q p
(12.2) (12.3) (12.4)
12.1 RDEs with drift terms
309
(i) V = (Vi )1≤i≤d is a collection of vector fields in Lipγ −1 (Re ); (i bis) W = (Wi )1≤i≤d is a collection of vector fields in Lipβ −1 (Re ); (ii) (xn ) is a sequence in C 1-var [0, T ] , Rd , and x is a weak geometric p-rough path such that lim d0;[0,T ] S[p] (xn ) , x and sup S[p] (xn )p-var;[0,T ] < ∞; n →∞
n
(ii bis) (hn ) is a sequence in C 1-var [0, T ] , Rd , and h is a weak geometric q-rough path such that lim d0;[0,T ] S[q ] (hn ) , h and sup S[q ] (hn )q -var;[0,T ] < ∞; n →∞
n
(iii) y0n ∈ G[m ax(p,q )] (Re ) is a sequence converging to some y0 ; (iv) ω is the control defined by p q ω (s, t) = |V |Lip γ −1 xp-var;[s,t] + |W |Lip β −1 hq -var;[s,t] . Then, at least along a subsequence, y0n ⊗ S[m ax(p,q )] π (V ,W ) (0, π 1 (y0n ) ; (xn , hn ))) converges in uniform topology, and there exists a constant C1 depending on p, q, γ, and β such that for any limit point y, and all s < t in [0, T ] , 1/ m ax(p,q ) ∨ ω (s, t) . ym ax(p,q )-var;[s,t] ≤ C1 ω (s, t)
Finally, if xs,t : [s, t] → Rd and hs,t : [s, t] → Rd are two continuous paths of bounded variation such that t s,t dxu ≤ K x S[p] xs,t s,t = xs,t and p-var;[s,t] s t s,t dhu ≤ K h = hs,t and S[q ] hs,t q -var;[s,t] s
for some constant K then, for all s < t : ω (s, t) ≤ 1, there exists C2 and θ > 1, θ ys,t − S[m ax(p,q )] π (V ,W ) s, π 1 (ys ) ; xs,t , hs,t s,t ≤ C2 ω (s, t) , (12.5) where C2 and θ depend on p, q, γ, β and K. Proof. The argument follows exactly the line of the proof of Lemma 10.7, Theorem 10.14, and Theorem 10.36. We adapt Lemma 10.7 by first assuming x and h to be the lift of smooth paths and concentrating on RDEs
rather than full RDEs. We then define geodesics h^{s,t}, x^{s,t} corresponding to the elements h_{s,t} and x_{s,t}, and Γ_{s,t} to be the difference of the ODE solution driven by h^{s,t}, x^{s,t} and the one driven by h, x. Then we bound Γ_{s,u} − Γ_{s,t} − Γ_{t,u} using Lemma A_drift and Lemma B_drift, and conclude this first part with Lemma 10.59. When using Lemma A_drift and Lemma B_drift, the variables corresponding to x and h are set to ω(s,u)^{1/p} and ω(s,u)^{1/q}, respectively. To be able to use Lemma 10.59, all the powers in the expression
\[
\omega(s,u)^{\beta/q} + \omega(s,u)^{\frac{\gamma-1}{q}+\frac{1}{p}} + \omega(s,u)^{\frac{1}{q}+\frac{1}{p}} + \omega(s,u)^{\frac{1}{q}+\frac{\beta-1}{p}} + \omega(s,u)^{\gamma/p}
\]
(which comes to βh + x γh −1 + x h + βx −1 h + γx ) must be strictly greater than 1. That explains the Conditions 12.2, 12.3, and 12.4. Equipped with the equivalent of Lemma 10.7, a limit argument similar to the one in Theorem 10.14 allows us to prove the current theorem, for the case of RDEs (rather than full RDEs). We then use similar arguments as the one in the proof of Theorem 10.36 to conclude the proof. Remark 12.7 The conditions on p, q, γ, β in Theorem 12.6 may look surprisingly complicated and it helps to play through a few special cases. To this end, let us note that the conditions have been stated in such a way as to emphasize the symmetric roles of these parameters. We may break this symmetry by assuming, without loss of generality, that p ≥ q in which case the first condition in (12.4) is seen to be redundant since γ−1 1 γ γ−1 1 + ≥ + = > 1. q p p p p The conditions on p, q, γ, β then reduce to
\[
1/p + 1/q > 1, \qquad \gamma > p \quad\text{and}\quad \beta > \max\Bigl(q,\; 1 + p\bigl(1 - \tfrac{1}{q}\bigr)\Bigr).
\]
Let us look at three special cases: (i) The case q = 1, frequently encountered in applications. The condition reduces to γ > p and β > q = 1, which is natural when compared with the regularity assumptions for RDE existence. (ii) The case q = p. The conditions now reduce to p < 2, γ > p and β > p. and we effectively consider RDEs (or better: Young ODEs) driven by t → (xt , ht ) of finite p-variation. (iii) A good understanding of the term “max q, 1 + p 1 − 1q ” is the content of the forthcoming Remark 12.12.
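To make the interplay of these parameter constraints concrete, here is a small Python check of the reduced conditions; it is a sketch of our own (the function name `admissible` and the numerical values are illustrative, not taken from the text).

```python
# Hedged illustration (not from the text): a checker for the parameter constraints of
# Theorem 12.6 in the reduced form of Remark 12.7 (which assumes p >= q).

def admissible(p, q, gamma, beta):
    """Check 1/p + 1/q > 1, gamma > p and beta > max(q, 1 + p*(1 - 1/q)) for p >= q >= 1."""
    assert p >= q >= 1
    return (1/p + 1/q > 1) and (gamma > p) and (beta > max(q, 1 + p * (1 - 1/q)))

# Case (i): q = 1 (bounded-variation drift); the condition reduces to gamma > p, beta > 1.
print(admissible(p=2.5, q=1.0, gamma=2.6, beta=1.1))   # True
# Case (ii): q = p < 2, a Young-type regime; gamma > p and beta > p.
print(admissible(p=1.8, q=1.8, gamma=1.9, beta=1.9))   # True
# Case (iii): q = p/[p], beta = gamma - [p] + 1 (cf. Remark 12.12), here with p in (2, 3).
p, q, gamma = 2.5, 2.5 / 2, 2.7
beta = gamma - 2 + 1
print(admissible(p, q, gamma, beta))                    # True: beta = 1.7 > p - [p] + 1 = 1.5
```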
The following corollary will be important as it often allows us to identify RDEs with drift. Examples will be seen in Section 12.2 below. Corollary 12.8 (Euler estimate, RDE with drift) In the setting of the previous Theorem 12.6, but focusing for simplicity on RDE solutions rather than full RDE solutions, we have for M > 0, for all s, t such that ω (s, t) ≤ M , θ π (V ,W ) (s, ys ; ((x, h)))s,t −π (V ) s, ys ; xs,t s,t −π (W ) s, ys ; hs,t s,t ≤ C1 ω (s, t) for some θ > 1 and C1 = C (M, p, q, γ, β). We also have for some constant C2 = C (M, p, q, γ, β) , θ π (V ,W ) (s, ys ; ((x, h)))s,t − E(V ) (ys , xs,t ) − E(W ) (ys , hs,t ) ≤ C2 ω (s, t) . (12.6) Proof. We assume M = 1; the general case follows the same lines. By the triangle inequality and inequality (12.5), it suffices to prove that π (V ,W ) s, ys ; xs,t , hs,t s,t − π (V ) s, ys ; xs,t s,t − π (W ) s, ys ; hs,t s,t θ
is bounded by Cω (s, t) . But this follows from Lemma 12.3. The second inequality follows easily from the first one.
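As a numerical illustration of the Euler-type estimate (12.6), the following Python sketch (our own; it treats smooth drivers only and keeps just the first-order term of the Euler schemes E_(V), E_(W)) compares the one-step increment V(y_s)x_{s,t} + W(y_s)h_{s,t} with a finely resolved solution of dy = V(y)dx + W(y)dh. The vector fields and driving signals are arbitrary choices.

```python
# Minimal sketch (ours), smooth drivers only: the first-order Euler increment behind (12.6)
# versus a finely resolved solution of dy = V(y) dx + W(y) dh on a small interval [0, dt].
import numpy as np

def V(y): return np.array([np.sin(y[1]), np.cos(y[0])])   # vector fields for dx (our choice)
def W(y): return np.array([-y[1], y[0]])                   # vector fields for dh (our choice)
x = lambda t: np.sin(2 * t)                                # smooth driving signals (our choice)
h = lambda t: t ** 2

def solve(s, t, ys, n):
    """Explicit Euler for dy = V(y) dx + W(y) dh on [s, t] with n steps."""
    ts, y = np.linspace(s, t, n + 1), ys.copy()
    for a, b in zip(ts[:-1], ts[1:]):
        y = y + V(y) * (x(b) - x(a)) + W(y) * (h(b) - h(a))
    return y

ys = np.array([0.3, 0.1])
for dt in [0.2, 0.1, 0.05]:
    fine = solve(0.0, dt, ys, 1000)
    euler = ys + V(ys) * (x(dt) - x(0.0)) + W(ys) * (h(dt) - h(0.0))
    print(dt, np.linalg.norm(fine - euler))   # one-step error decays faster than dt
```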
12.1.2 Uniqueness and continuity For existence of RDE with drift, we started by comparing π (V ,W ) (s, ys ; (x, h))s,t to π (V ) (s, ys ; x)s,t + π (W ) (s, ys ; h)s,t . We now look at the continuity of the difference between those two terms. Lemma 12.9 Assume that (i) V = (Vi )1≤i≤d and V˜ = (Vi )1≤i≤d are two collections of vector fields in Lipγ (Re ), γ ≥ 1; ˜ = (Wi ) (i bis) W = (Wj )1≤j ≤d and W 1≤i≤d are two collections of vector β e fields in Lip (R ), β ≥ 1; (ii) s < t are some elements of [0, T ]; (iii) ys ∈ Re is an initial condition; ×2 ˜ (iv) (x, x ˜) and h, h are two pairs in C 1-var [s, t] , Rd and C 1-var ×2 [s, t] , Rd ; (v) x and h are such that t t max |V |Lip γ |dxr | , V˜ γ |d˜ xr | ≤ x , Lip s s t t ˜ ˜ max |W |Lip β |dhr | , W dhr ≤ h ; β s
Lip
s
(vi) δ x and δ h are such that t max |V |Lip γ , V˜ γ |dxr − d˜ xr | ≤ δ x , Lip s t ˜ ˜ r ≤ δ h ; max |W |Lip β −1 , W dhr − dh β −1 Lip
s
(vii) ε_V and ε_W are such that
\[
\frac{1}{\max\bigl(|V|_{\mathrm{Lip}^{\gamma}},\,|\tilde V|_{\mathrm{Lip}^{\gamma}}\bigr)}\,\bigl|V-\tilde V\bigr|_{\mathrm{Lip}^{\gamma-1}} \le \varepsilon_V,
\qquad
\frac{1}{\max\bigl(|W|_{\mathrm{Lip}^{\beta}},\,|\tilde W|_{\mathrm{Lip}^{\beta}}\bigr)}\,\bigl|W-\tilde W\bigr|_{\mathrm{Lip}^{\beta-1}} \le \varepsilon_W.
\]
Then, if ∆ is defined by ∆ = π (V ,W ) (s, ys ; (x, h))s,t − π (V ) (s, ys ; x)s,t − π (W ) (s, ys ; h)s,t ˜ ˜ , ˜, h −π (V˜ ) (s, y˜s ; x ˜)s,t −π (W˜ ) s, y˜s ; h − π (V˜ , W˜ ) s, y˜s ; x s,t s,t we have for some constant C = C (γ, β) , ∆ ≤ C (|y0 − y˜0 | + εV + εW ) h x + h βx −1 + γh −1 x exp (C (x + h )) + C δ x h + γh −1 x + δ h c + βx −1 h exp (C (x + h )) . Proof. Without loss of generality, we assume (s, t) = (0, 1) . Define for u ∈ [0, 1], zu = π (V ,W ) (0, y0 ; (x, h))u , yux = π (V ) (0, y0 ; x)u and yuh = π (W ) (0, y0 ; h)u ; ˜ ˜ ˜ . z˜u = π (V˜ , W˜ ) 0, y˜0 ; x ˜, h , y˜ux˜ =π (V˜ ) (0, y˜0 ; x ˜)u and yuh =π (W˜ ) 0, y˜0 ; h u
We also set
\[
e_u = \bigl(z_{0,u} - y^{x}_{0,u} - y^{h}_{0,u}\bigr) - \bigl(\tilde z_{0,u} - \tilde y^{\tilde x}_{0,u} - \tilde y^{\tilde h}_{0,u}\bigr).
\]
We obtain by definition of y, z, y˜, z˜ for u ∈ [0, 1] that eu = ∆1u + ∆2u , where u u 1 x xr , V˜ (˜ zr ) − V˜ y˜rx˜ d˜ ∆u = {V (zr ) − V (yr )} dxr − 0 0 u u ˜r . ˜ (˜ ˜ y˜rh˜ W (zr ) − W yrh dhr − ∆2u = W zr ) − W dh 0
0
Lemma 10.22 implies that 1 ∆u ≤ |V | γ Lip
u
zr − yrx − z˜r − y˜rx˜ . |dxr |
0
γ −1 y x − y˜x˜ x + ε + |z − y x |∞;[0,1] + z˜ − y˜x˜ ∞;[0,1] V ∞;[0,1] x ˜ + z˜ − y˜ .δ x .
There are a few terms in here we know how to bound: first from Lemma 12.3 (that we have to use with Lipschitz parameters γ + 1 and β + 1 which are greater than 2), we have h h z − y x − y0,. + |z − y x |∞;[0,1] ≤ y0,. ∞;[0,1] ∞;[0,1] ≤ c1 h + c1 x h exp (c1 (x + h )) ≤ c2 h exp (c2 (x + h )) . Similarly, we have z˜ − y˜x˜
∞;[0,1]
≤ c2 h exp (c2 (x + h )) .
Then, Theorem 3.15 provides x y − y˜x˜ ≤ c3 (|y0 − y˜0 | + δ x + εV x ) . ∞;[0,1] Hence, we obtain 1 ∆u
≤
|V |Lip γ
0
u
|er | . |dxr | + |V |Lip γ
u
˜ h h y0,u − y˜0,u . |dxr |
0
+ c4 (|y0 − y˜0 | + εV (1 + x )) .γh −1 x exp (c4 (x + h )) + c4 δ x h + γh −1 x exp (c4 (x + h )) . Theorem 3.18 also provides ˜ h h ≤ c5 (|y0 − y˜0 | h + δ h + εW h ) exp (c5 (x + h )) . y0,. − y˜0,. ∞;[0,1]
That gives our final bound on ∆1u , namely 1 ∆u ≤
u |V |Lip γ |er | . |dxr | 0 + c6 |y0 − y˜0 | h x + γh −1 x + εV γh −1 x + εW h x exp (c6 (x + h )) + c6 δ h x + δ x h + γh −1 x exp (c6 (x + h )) .
By symmetry, we obtain u 2 ∆u ≤ |W | β |er | . |dhu | Lip 0 + c7 |y0 − y˜0 | h x + h βx −1 + εW h βx −1 + εV h x exp (c7 (x + h )) + c7 δ x h + δ h c + βx −1 h exp (c6 (x + h )) . In particular, we obtain that u |eu | ≤ |er | . |W |Lip β |dhr | + |V |Lip γ |dxr | 0 y0 | +εV +εW ) h x +h βx −1 +γh −1 x exp (c6 (x +h )) + c8 (|y0 −˜ + c8 δ x h + γh −1 x + δ h c + βx −1 h exp (c6 (x + h )) . We conclude with Gronwall’s lemma. In the existence part, we used Lemma 12.3 to extend Lemma A and Lemma B to be able to generalize the RDE existence theorem to the RDE with drift existence theorem. Here, with Lemma 12.9, we can do the same, and generalize the RDE continuity theorems to RDE with drift continuity theorems. Without further details, we therefore present the uniqueness/continuity theorem for RDE with drift: Theorem 12.10 Assume that p, q, γ, β ∈ [1, ∞) are such that 1/p + 1/q > 1 γ > p and β > q 1 β−1 γ−1 1 + > 1 and + > 1; q p q p (i) V 1 = Vi1 1≤i≤d and V 2 = Vi2 1≤i≤d are two collections of vector e fields in Lipγ (R );1 1 (i bis) W = Wi 1≤i≤d and W 2 = Wi2 1≤i≤d are two collections of vector fields in Lipβ (Re ); (ii) ω is a fixed control; (iii) x1 , x2 are two weak geometric p-rough paths in C p-var [0, T ] , G[p] Rd , with xi p-ω ≤ 1; (iii h1 , h2 are two weak geometric q-rough paths in C q -var [0, T ] , G[q ] bis) Rd , with hi q -ω ≤ 1; )] e initial (iv) y01 , y02 ∈ G[m ax(p,q ) thought of as time-0 conditions; 1 (R (v) υ is a bound on V Lip γ , V 2 Lip γ , W 1 Lip β and W 2 Lip β . Then, π (V i ,W i ) 0, y0i ; xi , hi is a singleton; that is, there exists a unique full RDE solution yi = π (V i ,W i ) 0, y0i ; xi , hi started at y0i driven by
(xi , hi ) along (V i , W i ). Moreover, ρm ax(p,q )-ω y1 , y2 ≤ Cε exp (Cυω (0, T )) where C = C (γ, β, p, q) and ε = υ y01 − y02 + V 1 − V 2 Lip γ −1 + W 1 − W 2 Lip β −1 +υ ρp-ω x1 , x2 + ρq -ω h1 , h2 . Remark that the metric ρm ax(p,q )-ω , unlike d∞;[0,T ] in the next statement, only measures the distance between the of two paths. Then, in 1increments y0 − y02 = y01 − y02 e rather than the above definition of ε, it is really R 1 y0 − y02 [ p ∨q ] e . We now state the refined uniqueness theorem, which T (R ) also extends to the drift case without difficulties. Theorem 12.11 Assume that p, q, γ, β ∈ [1, ∞) are such that 1/p + 1/q > 1 γ ≥ p and β ≥ q 1 β−1 γ−1 1 + ≥ 1 and + ≥ 1. q p q p (i) Vj1 1≤j ≤d and Vj2 1≤j ≤d are two collections of vector fields in Lipp (Re ); (i bis) Wj1 1≤j ≤d and Wj2 1≤j ≤d are two collections of vector fields in Lipq (Re ); (ii) x1 , x2 ∈ C ψ p -var [0, T ] , G[p] Rd , with xi ψ -var ≤ R; p , with hi ψ -var ≤ R; (ii bis) h1 , h2 ∈ C ψ q -var [0, T ] , G[q ] Rd q
(iii) y01 , y02 ∈ G[p∨q ] (Re ) thought of as time-0 initial conditions; (iv) yi are some arbitrary elements of π (V i ) 0, y0i ; xi (that is, they are i i i RDE solutions driven the vector x , starting fields V ); 2 at y0 ,1 along by 1 2 (v) υ is a bound on V Lip p , V Lip p , W Lip q and W Lip q . is a singleton; that is, there exists a Then, π (V i ,W i ) 0, y0i ; xi , hi unique full RDE solution yi = π (V i ,W i ) 0, y0i ; xi , hi started at y0i driven by (xi , hi ) along (V i , W i ). Moreover, for all ε > 0, there exists µ = µ (ε; p, q, υ, R) > 0 such that 1 y0 − y02 + V 1 − V 2 p −1 + W 1 − W 2 q −1 + d∞ x1 , x2 < µ Lip Lip implies that
d∞;[0,T ] y1 , y2 < ε.
Remark 12.12 The conditions on γ, β, p, q above already appeared in the existence theorem for RDEs with drift and all comments made then (Remark 12.7) remain valid. In particular, assuming p ≥ q without loss of
generality, the conditions reduce to
\[
1/p + 1/q > 1, \qquad \gamma > p \quad\text{and}\quad \beta > \max\Bigl(q,\; 1 + p\bigl(1 - \tfrac{1}{q}\bigr)\Bigr).
\]
In Section 12.2 we shall see that q := p/[p] ≥ 1 and β := γ − [p] + 1 arise naturally when perturbing the (centre of the) driving geometric p-rough path. (In fact, the drift vector fields then consist of [p] − 1 iterated Lie brackets of the original Lip^γ vector fields, which explains the choice of β.) An elementary computation then gives
\[
\max\Bigl(q,\; 1 + p\bigl(1 - \tfrac{1}{q}\bigr)\Bigr) = p - [p] + 1 < \beta,
\]
which shows that this condition is natural after all.
Of course, Corollaries 10.39 and 10.40 also extend to the drift case and we leave the details to the reader. We conclude this section with an exercise in which the reader is invited to implement the so-called Doss–Sussmann method for RDEs with drift. For simplicity, we only deal with R^e-valued RDE solutions. Let us also note that it does not (seem to) lead to optimal regularity assumptions.
Exercise 12.13 (Doss–Sussmann) Let x ∈ C^{p-var}([0,T], G^{[p]}(R^d)), V_0 ∈ Lip^1(R^e), V = (V_1, …, V_d) ∈ Lip^{γ+1}(R^e) with γ > p. Let J^{x}_{0←t} be the Jacobian of (π_{(V)}(0,·;x)_t)^{-1} : R^e → R^e and set
\[
W(t,y) \equiv J^{x}_{0\leftarrow t}(y)\cdot V_0\bigl(\pi_{(V)}(0,y;x)_t\bigr). \tag{12.7}
\]
(i) Show that the ordinary, time-inhomogenous ODE
\[
\dot z_t = W(t, z_t), \qquad z(0) = y_0 \tag{12.8}
\]
admits a unique, non-explosive solution on [0, T].
(ii) Show that the solution to the RDE with drift dy = V(y) dx + V_0(y) dt, started at y_0, is given by
\[
y_t = \pi_{(V)}(0, z_t; x)_t, \qquad t \in [0, T]. \tag{12.9}
\]
(iii) Deduce an Euler estimate for RDEs with drift of form (12.6).
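The following Python sketch illustrates the Doss–Sussmann reduction in the simplest possible situation: a smooth driver and a linear vector field, for which the flow π_(V)(0,·;x)_t and its Jacobian are explicit. It is our own toy example (the constants a, b and the driver are arbitrary), not an implementation of the exercise in full generality.

```python
# Minimal sketch (ours) of the Doss-Sussmann reduction for a SMOOTH driver and a linear
# vector field: V(y) = a*y, V0(y) = b, x_t smooth. Then pi_(V)(0, y; x)_t = y*exp(a*x_t),
# its Jacobian is exp(a*x_t), so W(t, y) = exp(-a*x_t)*b and y_t = exp(a*x_t)*z_t.
import numpy as np

a, b, y0, T, n = 0.7, 1.3, 0.5, 1.0, 20000
t = np.linspace(0.0, T, n + 1)
x = np.sin(3 * t)                        # smooth driving path (our choice)

# Step 1: solve dz/dt = W(t, z) = exp(-a*x_t)*b by explicit Euler.
z = np.empty(n + 1); z[0] = y0
for k in range(n):
    z[k + 1] = z[k] + np.exp(-a * x[k]) * b * (t[k + 1] - t[k])
y_doss = np.exp(a * x) * z               # candidate solution of dy = a*y dx + b dt

# Step 2: compare with a direct Euler scheme for dy = a*y dx + b dt.
y = np.empty(n + 1); y[0] = y0
for k in range(n):
    y[k + 1] = y[k] + a * y[k] * (x[k + 1] - x[k]) + b * (t[k + 1] - t[k])

print(np.max(np.abs(y - y_doss)))        # small, and -> 0 as n grows
```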
12.2 Application: perturbed driving signals and impact on RDEs
12.2.1 (Higher-)area perturbations and modified drift terms
We consider a driving rough path x and ask what happens if we perturb it on some "higher-area" level such as
\[
V^{N}\bigl(\mathbb{R}^{d}\bigr) := \mathfrak{g}^{N}\bigl(\mathbb{R}^{d}\bigr) \cap \bigl(\mathbb{R}^{d}\bigr)^{\otimes N},
\]
the centre of the Lie algebra gN Rd ; for example, V 2 Rd = so (d), the space of anti-symmetric d × d matrices. Unless otherwise stated, V N Rd will be equipped with the Euclidean metric. Theorem 12.14 (centre perturbation) Let p, r ≥ 1 and N ∈ N such that [r] = N ≥ [p] . Given a weak geometric p-rough path x : [0, T ] → G[p] Rd and r ϕ ∈ C [ r ] -var [0, T ] , V N Rd we define the perturbation xϕ := exp (log (SN (x)) + ϕ) .
(12.10)
Then x^φ is a weak geometric max(p,r)-rough path. Assume V ∈ Lip^γ with γ > max(p,r), so that dy = V(y) dx^φ, y(0) = y_0 has a unique RDE solution. Then there is a unique solution to the RDE with drift,
\[
dz = V(z)\,dx + W(z)\,d\varphi, \qquad z(0) = y_0,
\]
where W is the collection of vector fields given by
\[
\Bigl(\bigl[V_{i_1},\bigl[\ldots,\bigl[V_{i_{N-1}},V_{i_N}\bigr]\ldots\bigr]\bigr]\Bigr)_{i_1,\ldots,i_N\in\{1,\ldots,d\}},
\]
and y = π_{(V)}(0, y_0; x^φ) ≡ π_{(V,W)}(0, y_0; (x, φ)) = z. We prepare the proof with
Lemma 12.15 Let k ∈ N. Given a multi-index α= (α1 , . . . ,αk ) ∈ {1, . . . , d} and Lipk −1 vector fields V1 , . . . , Vk on Re , define Vα = Vα k , Vα k −1 , . . . , [Vα 2 , Vα 1 ] . Further, let e1 , . . . , ed denote the canonical basis of Rd . Then gn Rd , the step-n free Lie algebra, is generated by elements of the form ⊗k , k≤n eα = eα k , eα k −1 , . . . , [eα 2 , eα 1 ] ∈ Rd with [u, v] = u ⊗ v − v ⊗ u and 4 i ,...,i 1 Vi k · · · Vi 1 (eα ) k = Vα . i 1 ,...,i k ∈{1,...,d}
⁴ A k-tensor u ∈ (R^d)^{⊗k} is written as u = Σ_{i_1,…,i_k ∈ {1,…,d}} u^{i_k,…,i_1} e_{i_k} ⊗ … ⊗ e_{i_1}.
Proof. It is clear that gn Rd is generated by the eα . We prove the second statement by induction: a straightforward calculation shows that it holds for k = 2. Now suppose it holds for k − 1 and denote Vα˜ = Vα k −1 , . . . , [Vα 2 , Vα 1 ] . Then (using summation convention), i ,...,i 1
Vi k . . . Vi 1 (eα ) k
i ,...,i 1 Vi k . . . Vi 1 eα k ⊗ eα k −1 , . . . , [eα 2 , eα 1 ] k i ,...,i 1 − Vi k . . . Vi 1 eα k −1 , . . . , [eα 2 , eα 1 ] ⊗ eα k k i ,...,i 1 = Vi k . . . Vi 1 δ α k ,i k ⊗ eα k −1 , . . . , [eα 2 , eα 1 ] k −1 i ,...,i 2 − Vi k . . . Vi 1 eα k −1 , . . . , [eα 2 , eα 1 ] k ⊗ δ α k ,i 1 i ,...,i 1 = Vα k Vi k −1 . . . Vi 1 eα k −1 , . . . , [eα 2 , eα 1 ] k −1 i ,...,i 2 − Vi k . . . Vi 2 eα k −1 , . . . , [eα 2 , eα 1 ] k Vα k = Vα k Vα˜ − Vα˜ Vα k = Vα k , Vα k −1 , . . . , [Vα 2 , Vα 1 ] , =
where we set α ˜ = (αk −1 , . . . , α1 ) and use the induction hypothesis that Vα˜ equals i ,...,i 1 Vi k −1 . . . Vi 1 eα k −1 , . . . , [eα 2 , eα 1 ] k −1 i ,...,i 2 = Vi k . . . Vi 2 eα k −1 , . . . , [eα 2 , eα 1 ] k . Proof of Theorem 12.14. Remark that W, ϕ satisfy the regularity condition of Theorem 12.10 (cf. Remark 12.12), and so RDEs of type dz = V (z) dx + W (z) dϕ have unique solutions. It suffices to show that yT = zT . Take a dissection D = (ti ) of [0, T ] and define zti = π (V ,W ) (ti , yt i ; (x,ϕ))t for t ∈ [ti , T ] . |D |
Note that z^{0}_{T} = z_T and z^{|D|}_{T} = y_T, hence
\[
|z_T - y_T| \le \sum_{i=1}^{|D|} \bigl|z^{i}_{T} - z^{i-1}_{T}\bigr|.
\]
Now, i zT − z i−1 T
= π (V ,W ) (ti , yt i ; (x,ϕ))T − π (V ,W ) ti−1 , yt i −1 ; (x,ϕ) T = |π (V ,W ) (ti , yt i ; (x,ϕ))T −π (V ,W ) ti , π (V ,W ) ti−1 , yt i −1 ; (x,ϕ) t i ; (x,ϕ) | T yt i − π (V ,W ) ti−1 , yt i −1 ; (x,ϕ) t i
thanks to Lipschitzness of the flow (which was established in Theorem 12.10). By subtracting/adding E(V ) yt i −1 , xt i −1 ,t i + E(W ) yt i −1 , ϕt i −1 ,t i
we estimate zTi − zTi−1 ≤ ∆1 + ∆2 where ∆1 = yt i − E(V ) yt i −1 , xt i −1 ,t i − E(W ) yt i −1 , ϕt i −1 ,t i , ∆2 = π (V ,W ) ti−1 , yt i −1 ; (x,ϕ) t − E(V ) yt i −1 , xt i −1 ,t i − E(W ) yt i −1 , ϕt i −1 ,t i . i
Thanks to Lemma 12.15 and E(W ) yt i −1 , ϕt i −1 ,t i = W yt i −1 · ϕt i −1 ,t i we have E(V ) yt i −1 , xt i −1 ,t i + E(W ) yt i −1 , ϕt i −1 ,t i = E(V ) yt i −1 , xϕti −1 ,t i and hence, from the Euler estimate for RDEs, Corollary 10.15, ∆1 ≤ c1 ω θ (ti−1 , ti ) for some control ω and some θ > 1. On the other hand, our Euler estimates for RDEs with drift as stated in Corollary 12.8 imply that, θ similarly, ∆2 ≤ c2 ω (ti−1 , ti ) . It follows that, with c3 = c1 + c2 , i zT − z i−1 ≤ c3 ω (ti−1 , ti )θ T |D | θ and so |zT − yT | ≤ c3 i=1 ω (ti−1 , ti ) → 0 as |D| → 0. In Theorem 12.14 we have studied the impact of level-N perturbation of a driving signal. More precisely, given a weak geometric p-rough path x with [p] ≤ N and a sufficiently regular map ϕ(N ) : [0, T ] → V N Rd we defined in (12.10) a perturbation of x, which we now denote by Tϕ ( N ) x := exp[log SN (x) + 0, . . . , 0, ϕ(N ) ], and then saw that RDEs driven along vector fields V = (V1 , . . . , Vd ) by dx versus d Tϕ ( N ) x effectively differ by a drift term of the form Vi 1 , . . . , Vi N −1 , Vi N . . . dϕN ;i 1 ,...,i N with summation over iterated indices. On the other hand, the “natural” level-1 perturbation of a geometric p-rough path x, in the direction of a sufficiently regular path ϕ(1) : [0, T ] → Rd , is given by (cf. Section 9.4.6) the translation operator Tϕ ( 1 ) acting on x. The respective RDEs driven by dx and dxϕ
(1)
obviously differ by a drift term of the form Vi dϕ1;i .
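For linear vector fields V_i(y) = A_i y the iterated Lie brackets appearing in these drift terms reduce to matrix commutators, which makes the drift fields easy to write down explicitly. The following Python sketch is our own toy example (N = 2, with arbitrary matrices A_1, A_2); it is only meant to make the bracket field driven by an area perturbation concrete.

```python
# Hedged illustration (ours) of the drift vector fields of Theorem 12.14 for LINEAR fields
# V_i(y) = A_i y. With the convention [V, W](y) = DW(y) V(y) - DV(y) W(y) one gets
# [V_i, V_j](y) = (A_j A_i - A_i A_j) y; for N = 2 this is the field driven by phi^{(2);i,j}.
import numpy as np

A = [np.array([[0.0, 1.0], [0.0, 0.0]]),    # A_1  (toy choice)
     np.array([[0.0, 0.0], [1.0, 0.0]])]    # A_2  (toy choice)

def bracket_matrix(Ai, Aj):
    """Matrix of the vector field [V_i, V_j] when V_i(y) = A_i y and V_j(y) = A_j y."""
    return Aj @ Ai - Ai @ Aj

W12 = bracket_matrix(A[0], A[1])
print(W12)    # [[-1, 0], [0, 1]]: the level-2 drift field associated with the (1,2) area
```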
These two perturbations can be thought of as special cases of a general perturbation. To this end, we now consider a general perturbation ϕ = ϕ(1) , . . . , ϕ(N ) : [0, T ] → gN Rd ,
assumed (for simplicity) to be of bounded variation with respect to the Euclidean metric on gN Rd . Let us also assume, at first, x = SN (x) where x ∈ C 1-var [0, T ] , Rd . We then define, inductively, Tϕ ( 1 ) x : = x + ϕ(1) ∈ C 1-var [0, T ] , Rd (12.11) T(ϕ ( 1 ) ,ϕ ( 2 ) ) x : = exp[log S2 Tϕ ( 1 ) x + 0, ϕ(2) ∈ C 2-var [0, T ] , G2 Rd .. . T(ϕ ( 1 ) ,...,ϕ ( N ) ) x : = exp[log SN T(ϕ ( 1 ) ,...,ϕ ( N −1 ) ) x + 0, . . . , 0, ϕ(N ) ∈ C N -var [0, T ] , GN Rd and note that, even though x was assumed to be of bounded variation, T(ϕ ( 1 ) ,...,ϕ ( N ) ) x is a genuine (weak) geometric N -rough path. Theorem 12.16 (general perturbation) (i) Let p ≥ 1 and [p] ≤ N . Given x ∈ Cop-var [0, T ] , G[p] Rd and ϕ : [0, T ] → gN Rd of bounded variation with respect to the Euclidean metric on gN Rd , there exists a unique p-var C [0, T ] , GN Rd if [p] = N Tϕ x := T(ϕ ( 1 ) ,...,ϕ ( N ) ) x ∈ if [p] > N C N -var [0, T ] , GN Rd with the property that, whenever S[p] (xn ) → x uniformly and supn S[p] (xn ) < ∞ then p-var T(ϕ ( 1 ) ,...,ϕ ( N ) ) xn → T(ϕ ( 1 ) ,...,ϕ ( N ) ) x uniformly and with uniform p- (resp. N -) variation bounds. (ii) Assume V ∈ Lipγ , γ > max (p, N ). Then y ≡ π (V ) (0, y0 ; Tϕ x) equals z ≡ π (V ,∗V ) (0, y0 ; (x;ϕ)) where y is the RDE solution to dy = V (y) d T(ϕ ( 1 ) ,...,ϕ ( N ) ) x and z the solution of the following RDE with drift, dz = V (z) dx + (∗V ) (z) dϕ, z (0) = y0 where (∗V ) (·) dϕ =
\[
\sum_{k=1}^{N}\;\sum_{i_1,\ldots,i_k\in\{1,\ldots,d\}} \bigl[V_{i_1},\bigl[\ldots,\bigl[V_{i_{k-1}},V_{i_k}\bigr]\ldots\bigr]\bigr]\Big|_{\cdot}\; d\varphi^{(k);i_1,\ldots,i_k}.
\]
Proof. When x = SN (x) for x ∈ C 1-var [0, T ] , Rd , we can use (12.11) and apply iteratively Theorem 12.14 to see that π (V ) (0, y0 ; Tϕ x) equals z ≡ π (V ,∗V ) (0, y0 ; (x;ϕ)) .
For the general case, we need to properly define Tϕ x. To this end, let C = (∂1 , . . . , ∂d ) be the collection of coordinate vector fields on Rd . The full RDE solution y = π (C ) (0, x0 ; x) is identically equal to the input signal x, which suggests defining Tϕ x := π (V ,∗V ) (0, o; (x; ϕ)) as an RDE with drift. We can now use continuity results for RDEs with drift to see that x → Tϕ x has the required continuity properties, as stated in part (i).
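For d = 2 and N = 2 the perturbation (12.10)/(12.11) can be made very concrete: the step-2 log-signature of a path consists of its increment and its (Lévy-type) area, and T_φ simply shifts the area component. The following Python sketch is our own illustration (the sample path and the value of φ are arbitrary choices).

```python
# A small sketch (ours) of the level-2 perturbation for d = 2, N = 2: compute the
# step-2 log-signature (increment, area) of a piecewise linear path and add an
# so(2)-valued perturbation phi to the area component, as in (12.10)/(12.11).
import numpy as np

def increment_and_area(points):
    """Increment and signed (Levy-type) area of the piecewise linear path through `points`."""
    pts = np.asarray(points, dtype=float)
    inc = pts[-1] - pts[0]
    area = 0.0
    for p, q in zip(pts[:-1], pts[1:]):
        # chord contribution to 1/2 * int (x1 - x1_0) dx2 - (x2 - x2_0) dx1
        area += 0.5 * ((p[0] - pts[0][0]) * (q[1] - p[1]) - (p[1] - pts[0][1]) * (q[0] - p[0]))
    return inc, area

pts = [(0, 0), (1, 0), (1, 1), (0, 1)]       # three-segment path (toy choice)
inc, area = increment_and_area(pts)
phi = 0.25                                    # perturbation of the antisymmetric (area) part
print("log S_2(x)   =", inc, area)
print("log T_phi(x) =", inc, area + phi)      # increment unchanged, area shifted by phi
```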
12.2.2 Limits of Wong–Zakai type with modified area The next theorem describes a situation in which a piecewise linear approximation is twisted in such a way as to lead to a centre perturbation. In view of the forthcoming examples (Section 13.3.4) we state the following results only for geometric H¨ older rough paths. Definition 12.17 Let α ∈ (0, 1], x ∈ C α -H¨o l [0, T ] , G[1/α ] Rd and write x = π 1 (x) for its projection to a path with values in Rd . (i) Assume [1/α] ≤ N ∈ N and let (Dn ) = (tni : i) be a sequence of dissections of [0, T ] such that5 sup S[1/α ] xD n α -H¨o l = M < ∞ and d∞ S[1/α ] xD n , x →n →∞ 0. n ∈N
If (xn ) ⊂ C 1-H¨o l [0, T ] , Rd is such that6 −1 pnt := SN (xn )0,t ⊗ SN xD n 0,t takes values in the centre of GN Rd whenever t ∈ Dn then we say that (xn ) is an approximation on (Dn ) with perturbations (pn ) on level N to x. (ii) Let β ∈ (0, 1] such that [1/β] = N ≥ [1/α] .
(12.12)
We say that an approximation (xn ) on (Dn ) with perturbations (pn ) on level N to x is min (α, β)-H¨older comparable (with constants c1 , c2 , c3 ) if for all tni , tni+1 ∈ Dn β −1 |xn |1-H¨o l; [t n ,t n ] ≤ c1 xD n 1-H¨o l; [t n ,t n ] + c2 tni+1 − tni and i i+ 1 i i+ 1 n ps,t ≤ c3 |t − s|β for all s, t ∈ Dn . 5 We
D. 6 It
recall that xD is the piecewise linear approximation to x based on the dissection
/ Dn . is not assumed that p nt ∈ centre of G N Rd when t ∈
Although at first sight technical, these definitions are fairly natural: firstly, we restrict our attention to H¨ older rough paths x which are the limit of “their (lifted) piecewise linear approximations”. As we shall see in Part III, this covers the bulk of stochastic processes which admit a lift to a rough path. Assumption (ii) in the above definition then guaranteess older scale, comparable to the piecewise that (xn ) remains, at min (α, β)-H¨ linear approximations. In particular, the assumption on |xn |1-H¨o l; [t n ,t n ] = i
i+ 1
|x˙ n |∞; [t n ,t n ] will be easy to verify in all examples (cf. below). The ini i+ 1 n tuition if we assume x runs at constant speed over any interval n isnthat, I = ti , ti+1 , Dn = (tni ), it is equivalent to saying that length (xn |I ) (
β c1 length xD n |I + c2 |I| β = c1 xt ni ,t ni+ 1 + c2 tni+1 − tni ).
≤
Theorem 12.18 Let α, β ∈ (0, 1] and assume [1/β] = N ≥ [1/α]. Assume x ∈ C α -H¨o l [0, T ] , G[1/α ] Rd and let (xn ) be an approximation on some sequence (Dn ) of dissections of [0, T ] with perturbations (pn ) on level N to x. (i) If the approximation is min (α, β)-H¨ older comparable (with constants c1 , c2 , c3 ) then there exists a constant C = C (α, β, c1 , c2 , M, T, N ) such that
sup SN (x )m in(α ,β )-H¨o l n
n ∈N
D n ≤ C sup S[1/α ] x + c3 + 1 < ∞. α -H¨o l n ∈N
(ii) If pnt → pt for all t ∈ ∪n Dn and ∪n Dn is dense in [0,T ] then p is a β-H¨ older continuous path with values in the centre of GN Rd and for every t ∈ [0, T ], d SN (xn )0,t , SN (x)0,t ⊗ p0,t ≤ d SN xD n 0,t , SN (x)0,t +d pn0,t , p0,t → 0 as n → ∞.
(iii) If the assumptions of both (i) and (ii) are met then, for all γ < min (α, β), dγ -H¨o l (SN (xn ) , xϕ ) →n →∞ 0 where ϕ := log p ∈ V N Rd and xϕ = exp (log (SN (x)) + ϕ).
12.2 Application
323
Proof. (i) Take s < t in [0, T ]. If s, t ∈ tni , tni+1 we have by our assumption on |xn |1-H¨o l;[t i ,t i + 1 ] SN (xn )s,t
≤
|t − s| SN (xn )1-H¨o l; [t n ,t n ] i i+ 1 n = |t − s| |x |1-H¨o l; [t n ,t n ] i i + 1 β −1 n n n ≤ |t − s| c1 xt i ,t i + 1 / ti+1 − tni + c2 tni+1 − tni α −1 β −1 + c2 tni+1 − tni ≤ |t − s| c1 |x|α -H¨o l tni+1 − tni m in(α ,β )
≤ c4 |t − s|
,
with suitable constant c4 . Otherwise we can find tni ≤ tnj so that s ≤ tni ≤ tnj ≤ t and γ SN (xn )s,t ≤ 2c4 |t − s| + SN (xn )t ni ,t nj . Estimates for the Lyons lift x → SN (x), Proposition 9.3, then guarantee existence of a constant c5 such that SN (xn )t ni ,t nj ≤ SN xD n t n ,t n + pntni ,t nj i j α β ≤ c5 S[1/α ] xD n α -H¨o l tnj − tni + c3 tnj − tni m in(α ,β ) ≤ (c5 S[1/α ] xD n α -H¨o l + c3 ) |t − s| and, since supn S[1/α ] xD n α -H¨o l < ∞ by assumption, the proof of the uniform H¨ older bound is finished. older. By a standard Arzela–Ascoli(ii) By assumption, pn is uniformly β-H¨ type argument, it is clear that every pointwise limit (if only on the dense older regularity is preserved in this set ∪n Dn ) is a uniform limit and β-H¨ limit, i.e. p is β-H¨ o lder itself. For every t ∈ ∪n Dn , pnt takes values in the N d centre of G R and hence (density of ∪n Dn , continuity of p) it is easy to see that p takes values in the centre for all t ∈ [0, T ]. Now take t ∈ Dn . Since elements in the centre commute with all elements in GN Rd , we have d SN (xn )0,t , SN (x)0,t ⊗ p0,t −1 −1 = SN (xn )0,t ⊗ SN xD n 0,t ⊗ pn0,t ⊗ SN xD n 0,t −1 ⊗ SN (x)0,t ⊗ pn0,t ⊗ p0,t −1 −1 ⊗ p0,t = SN xD n 0,t ⊗ SN (x)0,t ⊗ pn0,t ≤ d SN xD n 0,t , SN (x)0,t + d pn0,t , p0,t .
On the other hand, given an arbitrary element t ∈ [0, T ] we can take tn to be the closest neighbour in Dn and so d SN (xn )0,t , SN (x)0,t ⊗ p0,t = d SN xD n 0,t , SN (x)0,t + 2d pn0,t , p0,t −1 −1 + d SN (x)0,t ⊗ SN (xn )0,t , S (x)0,t n ⊗ SN (xn )0,t n . From the assumptions and H¨ older (resp. uniform H¨ older) continuity of n n S (x) (resp. SN (x )), we see that d SN (x )0,t , SN (x)0,t ⊗ p0,t → 0, as required. (iii) Uniform min (α, β)-H¨ older bounds imply equivalence of pointwise and uniform convergence; convergence with H¨ older exponent γ < min (α, β) then follows by interpolation. Observe also that SN (x)0,t ⊗ p0,t = xϕ = exp (log (SN (x)) + log p) is a simple consequence of p0,t taking values in the centre.
12.3 Comments The present exposition of RDEs with drift is new. A detailed study of RDEs with drift was previously carried out in Lejay and Victoir [107]. Exercise 12.13 goes back to Doss–Sussmann [44, 166] and is taken from Friz and Oberhauser [58], as is the bulk of material in Section 12.2 which can be used to prove optimality of various rough-path estimates for RDEs and linear RDEs obtained in Chapter 10.
Part III
Stochastic processes lifted to rough paths
13 Brownian motion
We discuss how Brownian motion can be enhanced, essentially by adding Lévy's stochastic area, to a process ("enhanced Brownian motion", EBM) with the property that almost every sample path is a geometric rough path ("Brownian rough path"). Various approximation results are studied, followed by a discussion of large deviations and support descriptions in rough path topology.
13.1 Brownian motion and Lévy's area
13.1.1 Brownian motion
We start with the following fundamental
Definition 13.1 (Brownian motion) A real-valued stochastic process (β_t : t ≥ 0) is a (1-dimensional) Brownian motion if it has the properties
(i) β_0(ω) = 0 for all ω;
(ii) the map t ↦ β_t(ω) is a continuous function of t ∈ R_+ for all ω;
(iii) for every t, h ≥ 0, β_{t,t+h} ≡ β_{t+h} − β_t is independent of (β_u : 0 ≤ u ≤ t), and has Gaussian distribution with mean 0 and variance h.
An R^d-valued stochastic process (B_t : t ≥ 0) is a (d-dimensional) Brownian motion if it has independent components B^1, …, B^d, each of which is a 1-dimensional Brownian motion. A realization of Brownian motion is called a Brownian path.
It is an immediate corollary of properties (i)–(iii) that Brownian motion has stationary increments, that is
\[
(B_{s,s+t} : t \ge 0) \overset{D}{=} (B_t : t \ge 0),
\]
as for the Brownian scaling property,
\[
\forall \lambda > 0:\ (B_{\lambda^2 t} : t \ge 0) \overset{D}{=} (\lambda B_t : t \ge 0)
\quad\text{and}\quad
(B_t : t \ge 0) \overset{D}{=} \bigl(t\,B_{1/t} : t \ge 0\bigr). \tag{13.1}
\]
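A quick Monte Carlo sanity check of the scaling property (13.1) — our own sketch, not part of the text — compares the marginal distributions of B_{λ²t} and λB_t at a fixed time via their sample standard deviations.

```python
# Hedged sketch (ours): B_{lambda^2 t} and lambda * B_t have the same distribution;
# we compare sample standard deviations of the two marginals at a fixed time t.
import numpy as np

rng = np.random.default_rng(0)
lam, t, n_paths = 2.0, 0.7, 200_000

B_t      = rng.normal(0.0, np.sqrt(t),            size=n_paths)   # B_t ~ N(0, t)
B_lam2_t = rng.normal(0.0, np.sqrt(lam**2 * t),   size=n_paths)   # B_{lambda^2 t} ~ N(0, lambda^2 t)

print(B_lam2_t.std(), (lam * B_t).std())   # both close to lam * sqrt(t) ~ 1.673
```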
We trust the reader is familiar with the following basic facts concerning Brownian motion. (Some references are given in the comments section at the end of this chapter.)
• Existence. More precisely, there exists a unique Borel probability measure W on C([0,∞), R^d) so that the coordinate function B_t(ω) = ω_t defines a Brownian motion. The aforementioned measure is known as the (d-dimensional) Wiener measure.
• Brownian motion is a martingale. In fact, a theorem of P. Lévy states that if (M_t) denotes any R^d-valued continuous martingale started at zero, such that M_t ⊗ M_t − t × I (where I is the (d × d)-identity matrix) is also a martingale, then (M_t) must be a d-dimensional Brownian motion.
• Brownian motion is a zero-mean Gaussian process with covariance function¹ (s,t) ↦ E(B_s ⊗ B_t) = (s ∧ t) × I. As for every continuous Gaussian process, mean and covariance fully determine the law of the process.
• Brownian motion is a (time-homogenous) Markov process. Its transition density – also known as heat-kernel – is given by
\[
p_t(x,y) = \frac{1}{(2\pi t)^{d/2}}\, e^{-\frac{|x-y|^2}{2t}},
\]
where |·| denotes the Euclidean norm on Rd . • Brownian sample paths are of unbounded variation, i.e. for any T > 0 |B|1-var;[0,T ] = +∞ a.s. In fact, Brownian sample paths have unbounded p-variation for any p ≤ 2 and the reader can find a self-contained proof in Section 13.9.
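The unbounded-variation statement can be observed numerically. The following sketch (our own; the dyadic levels are an arbitrary choice) evaluates Σ|B_{t_{i+1}} − B_{t_i}|^p along refining dyadic dissections of [0,1] for one simulated path: the sums blow up for p = 1, stabilise near the quadratic variation for p = 2, and tend to 0 for p = 2.5. (This concerns sums along fixed dissections, not the p-variation supremum over all dissections.)

```python
# Numerical illustration (ours): p-th power increment sums of one simulated Brownian path
# along dyadic dissections of [0, 1], for p = 1, 2 and 2.5.
import numpy as np

rng = np.random.default_rng(1)
n = 2 ** 18
increments = rng.normal(0.0, np.sqrt(1.0 / n), size=n)
B = np.concatenate([[0.0], np.cumsum(increments)])        # one Brownian path on [0, 1]

for level in [6, 10, 14, 18]:
    step = n // 2 ** level                                # dyadic dissection of mesh 2**(-level)
    incs = np.abs(np.diff(B[::step]))
    print(level, [round(float(np.sum(incs ** p)), 3) for p in (1.0, 2.0, 2.5)])
```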
13.1.2 Lévy's area: definition and exponential integrability
Given two independent Brownian motions, say β and β̃, we define their Lévy area² as the stochastic Itô integral
\[
t \in [0,\infty) \mapsto \frac12 \int_0^t \bigl(\beta_s\, d\tilde\beta_s - \tilde\beta_s\, d\beta_s\bigr). \tag{13.2}
\]
¹ s ∧ t = min(s, t).
² Since ⟨β, β̃⟩ = ⟨β̃, β⟩ (= 0) it would not make a difference here to use Stratonovich integration.
We recall that (Itô) stochastic integrals are limits of their (left-point) Riemann–Stieltjes approximations, uniformly on compact time intervals. Indeed, given any sequence of dissections³ (D_n) of [0,T] with mesh |D_n| → 0, one has⁴
\[
E\Bigl[\,\sup_{t\in[0,T]}\Bigl|\int_0^t \beta_s\,d\tilde\beta_s \;-\; \sum_{s_i\in D_n} \beta_{s_i}\bigl(\tilde\beta_{s_{i+1}\wedge t} - \tilde\beta_{s_i\wedge t}\bigr)\Bigr|^2\Bigr] \to 0 \quad\text{as } n\to\infty. \tag{13.3}
\]
Since uniform limits of continuous functions are continuous, (13.3) implies in particular that (13.2) can be taken to be continuous in t, with probability one.
Definition 13.2 (Lévy's area) Given a d-dimensional Brownian motion B = (B^1, …, B^d), we define the d-dimensional Lévy area A = (A^{i,j} : i,j ∈ {1,…,d}) as the continuous process
\[
t \mapsto A^{i,j}_t = \frac12 \int_0^t \bigl(B^i_s\,dB^j_s - B^j_s\,dB^i_s\bigr).
\]
We also define the Lévy area increments as, for any s < t in [0,T],
\[
A^{i,j}_{s,t} = A^{i,j}_t - A^{i,j}_s - \frac12\bigl(B^i_s B^j_{s,t} - B^j_s B^i_{s,t}\bigr)
= \frac12 \int_s^t \bigl(B^i_{s,r}\,dB^j_r - B^j_{s,r}\,dB^i_r\bigr).
\]
We note that A_t = A_{0,t} and, more generally, A_{s,t} takes values in so(d) ≅ [R^d, R^d], the space of anti-symmetric d × d matrices, and it suffices to consider i ≠ j (or i < j if you wish). As a consequence of basic properties of Brownian motion, we have that
\[
\forall \lambda > 0:\ (A_{\lambda t} : t \ge 0) \overset{D}{=} (\lambda A_t : t \ge 0),
\qquad
\forall\, 0 \le s < t < \infty:\ A_{s,t} \overset{D}{=} A_{0,t-s}.
\]
otherwise stated, dissections are assumed to be deterministic. can be taken as the very definition of the stochastic integral 0· βdβ. Alter ˜ is an L 2 -martingale, one employs Doob’s L 2 natively, only accepting that 0· β s dβ s inequality to “get rid” of the sup inside the expectation, followed by Itˆ o ’s L 2 -isometry to establish (13.3). 4 This
Brownian motion
330
Lemma 13.3 Let B be a d-dimensional Brownian motion. Then, for all η < 1/2, we have " ! η 2 |B|∞;[0,T ] < ∞. E exp T D
Proof. It suffices to consider d = 1, and T = 1. From B = (−B) and the reflection principle for Brownian motion,5 we see that % & P [|B|∞ ≥ M ] ≤ 2P sup Bt ≥ M 0≤t≤1
=
4P (B1 ≥ M ) .
The result follows from the usual tail behaviour of B1 ∼ N (0, 1). Proposition 13.4 Let B be a d-dimensional Brownian motion, and A its L´evy area. There exists η > 0 such that for any 0 ≤ s < t ≤ T, & % |As,t | < ∞. E exp η |t − s| D
Proof. Since As,t / |t − s| = A0,1 it is enough to prove exponential integrability of L´evy’s area at time 1. To this end, it suffices to consider a “building 1 ˜ We observe that, conditional on β (.) , block” of L´evy’s area of form 0 βdβ.
1 ˜ we can view 0 βdβ as if the integrand β were deterministic and from a very basic form of Itˆ o’s isometry, 1 1 2 ˜ ∼ N 0, β s dβ β ds . s s 0
0
It follows that, conditional on F = σ (β s : 0 ≤ s ≤ 1), " ! 1 ˜ E eη | 0 β s d β s | Fβ 1 ! " = E eη |Z | |Fβ with Z ∼ N 0, β 2s ds β
β
2
η 2E eη Z |F = 2 exp 2 2 η 2 |β|∞;[0,1] ≤ 2 exp 2
≤
0 1
β 2s ds 0
and, after taking expectations, 2 ! 1 " η ˜ 2 E eη | 0 β s d β s | ≤ 2E exp |β|∞;[0,1] < ∞ 2 for η > 0 small enough, thanks to Lemma 13.3. The proof is now finished. 5 For
example, [143, p. 105].
˜ be Exercise 13.5 (L´ evy’s construction of L´ evy’s area) Let β and β two independent Brownian motions on [0, T ]. Consider a sequence of dyadic ˜ D n for the resultdissections Dn = {iT 2−n : i = 0, . . . , 2n } and write β D n , β ing piecewise linear approximations. Show that T Dn 1 Dn ˜Dn Dn ˜ β s dβ s − β s dβ s An (ω) := 2 0 is a (discrete-time) martingale with respect to the filtration Fn := σ
˜ : t ∈ Dn βt , β t
which converges in L2 (P). Identify the limit as L´evy’s area (at time T ), as defined in (13.2). Exercise 13.6 Consider a d-dimensional Brownian motion with its associated, so (d)-valued, L´evy area process A. Let (0 = t0 < t1 < · · · < tn = T ) be a dissection of [0, T ]. Show that n n At i −1 ,t i ≤ cq At i −1 ,t i q 2 i=1
L (P)
i=1
L (P)
where c is a constant, independent of n and q. (Remark that this estimate is an immediate consequence of integrability properties of Wiener–Itˆ o chaos as discussed in Section D.4 in Appendix D. The point of this exercise is to give an elementary proof.) Solution. Without loss of generality, we take T = 1. If Xi := At i −1 ,t i and Sn = X1 + · · · + Xn , a sum of independent random variables, it suffices to show existence of η, M > 0, independent of n and q, such that D E eη |S n |/|S n |L 2 < M < ∞. From Sn = −Sn and e|x| ≤ ex + e−x it is enough to estimate E eη S n /|S n |L 2 = Πni=1 E eη X i /|S n |L 2 . 2 Note that, for all λ small enough, E eλA 0 , 1 = E eλ Z for some random variable Z withan exponential 2 tail. (This can be seen from the iden1 1 2 λ ˜ tity E exp λ 0 βdβ = E exp 2 0 β dt obtained by conditioning on β, exactly as in the proof of Proposition 13.4.) Note also that by scaling properties of L´evy’s area, D
Xi = |ti − ti−1 | A0,1 = c |Xi |L 2 A0,1 with c = 1/ |A0,1 |L 2 .
2
2
It follows that, for η small enough, and with θi = |Xi |L 2 / |Sn |L 2 , |Xi |L 2 E (exp (η Xi / |Sn |L 2 )) = E exp ηc A0,1 |Sn |L 2 θ i = E exp η 2 c2 θi Z ≤ E exp η 2 c2 Z by Jensen’s inequality. As a result, E eη S n /|S n |L 2 ≤ E exp η 2 c2 Z < ∞ for η small enough.
13.1.3 L´evy’s area as time-changed Brownian motion The following result will only be used in Section 13.8 on the support theorem in its conditional form. Proposition 13.7 Let B be a d-dimensional Brownian motion and fix two ˜ = B j where i, j ∈ {1, . . . , d} and i = j. Set distinct components β = B i , β t 1 t 2 ˜2 1 ˜ −β ˜ dβ β s + β s ds. and a (t) := β s dβ A (t) := s s s 2 4 0 0 Then A a−1 (t) : t ≥ 0 is a (1-dimensional) Brownian motion, indepen 2 2 ˜ dent of the process β s + β s : s ≥ 0 , and hence independent of the radial process {|Bs | : s ≥ 0} where |·| denotes Euclidean norm on Rd . Proof. Set rt (ω) ≡ rt ≡
2
˜ . By Itˆo’s formula, β 2t + β t rt2 = 2
where
γ t (ω) = 0
t
t
rs dγ s + t
(13.4)
0
βs dβ + rs s
0
t
˜ β s ˜ dβ . rs s
(Note that dr and dγ differ by a drift differential.) Clearly, the system of martingales (A, γ) satisifes the bracket relations !γ"t = t, !γ, A"t = 0 and 1 t 2 !A"t = r ds. 4 0 s Let γ˜ t = A (φt ) where φt = a−1 (t). By L´evy’s characterization, γ and γ˜ are two mutually independent Brownian motions. Moreover, (13.4) shows that rt is the pathwise unique solution to an SDE driven by γ˜ and, in particular, γ t ) and (rt ) are σ [rs : s ≤ t] ⊂ σ [γ s : s ≤ t] . Consequently, the processes (˜ independent and we arrive at the representation t 1 2 r ds A (t) = γ˜ 4 0 s
13.2 Enhanced Brownian motion
333
where γ˜ is a Brownian motion independent of the process (rt ). This concludes the proof. t ˜ −β ˜ dβ . Exercise 13.8 Derive the characteristic function of 12 0 β s dβ s s s
13.2 Enhanced Brownian motion 13.2.1 Brownian motion lifted to a G2 Rd -valued path
d 2 R ∼ Recallthat exp denotes the exponential map from g = Rd ⊕ so (d) 2 d 2 → G R . Its inverse log (·) can be viewed as a global chart for G Rd , which is therefore diffeomorphic to a Euclidean space of dimension d + d (d − 1) /2. (As far as the geometry is concerned, it cannot get much simpler!) If x : [0, T ] → Rd is a smooth path started at 0, then its step-2 lift satisfies S2 (x)t = exp (xt + at ) ∈ G2 Rd with area ai,j t =
1 2
t
xis dxjs − xjs dxis
.
0
Recall also that −1
S2 (x)s,t = S2 (x)s ⊗ S2 (x)t = exp (xs,t + as,t ) , where xs,t = xt − xs and as,t ∈ so (d) is given by ai,j s,t
= =
1 2
t
xis,r dxjr − xjs,r dxir s
i,j ai,j t − as −
1 i j xs xs,t − xjs xis,t . 2
This motivates us to define the lift Brownian motion to a process with values in G2 Rd as follows. Definition 13.9 (enhanced Brownian motion, or EBM) Let B and A denote a d-dimensional Brownian motion and its L´evy area process. The continuous G2 Rd -valued process B, defined by Bt := exp [Bt + At ] , t ≥ 0, is called enhanced Brownian motion; if we want to stress the underlying process we call B the natural lift of B. Sample path realizations of B are called Brownian rough paths. (This terminology is motivated by the forthcoming Corollary 13.14).
We also write Bs,t = B−1 ⊗ Bt ∈ G2 Rd and observe that this is s consistent with Bs,t = exp [Bs,t + As,t ] , where Bs,t = Bt − Bs (as usual) and As,t ∈ so (d) is given by 1 i j i,j i,j j i Ai,j B = A − A − B − B B s,t t s s s,t s s,t 2 t 1 i j Bs,r dBrj − Bs,r dBri a.s. = 2 s Why? We have just recalled that all this holds for smooth paths, where one can write out all iterated integrals as Riemann–Stieltjes integrals. This is still true for the Brownian case but now convergence is only in the L2 sense, see (13.3), and L2 -limits are only defined up to null-sets, hence the a.s. above. Exercise 13.10 (i) Check that, almost surely, t Bt = 1, Bt , B ⊗ ◦dB ∈ G2 Rd , 0
where ◦dB denotes Stratonovich integration. (ii) Show that t ˆ t = 1, Bt , B B ⊗ dB ∈ T12 Rd , 0
where dB denotes Itˆ o integration, does not yield a geometric rough path. Hint: Consider i = j and compute the expectation. The following proposition should be compared with our definition of Brownian motion, Definition 13.1. It identifies enhanced Brownian motion as a special case of a left-invariant Brownian motion on a Lie group. Proposition 13.11 Enhanced Brownian B is a left-invariant motion Brownian motion on the Lie group G2 Rd , ⊗,−1 , 1 in the sense that (i) B0 (ω) = 1 for all ω; (ii) the map t → Bt (ω) is a continuous function of t ∈ R+ for all ω; (iii) for every t, h ≥ 0, Bt,t+h =B−1 t ⊗Bt+ h is independent of σ(Bu : u ≤ t); (iv) it has stationary increments, D
(Bs,s+ t : t ≥ 0) = (Bt : t ≥ 0) . Proof. (i),(ii) are trivial. For (iii) observe that, since Ar is measurable function of {Bu : u ≤ r}, σ (Br : r ≤ s) = σ (Br , Ar : r ≤ s) = σ (Br : r ≤ s) . On the other hand, (Bs,s+ t , As,s+t : t ≥ 0) is measurably determined by σ (Bs,r : r ≥ s) = σ (Bs,s+ t : t ≥ 0) ,
see (13.5) in particular. From defining properties of Brownian motion, σ (Br : r ≤ s) and σ (Bs,s+ t : t ≥ 0) are indepedent and this finishes the proof. (iv) Recall that for Brownian motion, for all s ≥ 0, D
(Bs,s+ t : t ≥ 0) = (Bt : t ≥ 0) . Then, for s fixed,
Ai,j s,s+ t
t≥0
= =
1 2 1 2 D
=
s+t i Bs,r dBrj
−
j Bs,r dBri
s
t≥0
s+t i j Bs,r dBs,r
−
j i Bs,r dBs,r
s
1 2
t
(13.5) t≥0
Bri dBrj − Brj dBri 0
t≥0
and the same holds for the pair D
(Bs,t , As,s+t )t≥0 = (Bt , At )t≥0 . Recall that the Lie group G2 Rd , ⊗,−1 , e has the additional structure of dilation. As we shall now see, it fits together perfectly with scaling properties of enhanced Brownian motion. Lemma 13.12 (EBM, scaling) Let B be an enhanced Brownian motion. For all λ > 0 we have D
(Bλ 2 t : t ≥ 0) = (δ λ Bt : t ≥ 0) , where δ is the dilatation operator on G2 Rd . Proof. From Brownian scaling, for any λ > 0 we have D
(Bλ 2 t )t≥0 = (λBt )t≥0 . That is, speeding up time by a factor of λ2 is, in law, equivalent to spatial scaling by a factor of λ. Since A is determined as the limit of a homogenous polynomial of degree 2 in terms of Brownian increments, see (13.3), the scaling factor λ appears twice and one has D (Bλ 2 t , Aλ 2 t )t≥0 = λBt , λ2 At t≥0 . Now apply exp: Rd ⊕ so (d) → G2 Rd .
13.2.2 Rough path regularity As shown in Theorem 13.69 in the appendix to this chapter, Brownian motion has infinite 2-variation6 and hence infinite q-variation for any q ≤ 2. Therefore, our only chance to construct a “Brownian” rough path is to look for a geometric p-rough path, p > 2, which lifts B. In other words, we look for a process B with values in G[p] Rd , with finite p-variation (or 1/pH¨older regularity) with respect to the Carnot–Caratheodory distance, such that π 1 (B) = B. We shall see that an enhanced Brownian motion, i.e. the G2 Rd -valued process t → Bt ≡ exp (Bt , At ), has in fact a.s. finite α-H¨ older regularity for any a < 1/2. In particular, there is no cost in assuming α ∈ (1/3, 1/2) so that [p] = [1/α] = 2, which confirms that a.e. realization of B = B (ω) is a geometric p-rough path (in fact a geometric 1/p-H¨older rough path), p ∈ (2, 3) , in the sense of Definition 9.15. In order to establish that B is a.s. a geometric α-H¨older rough path we need to show that, for some α ∈ (1/3, 1/2), the path t → Bt is H¨older regular with respect to d (g, h) = g −1 ⊗ h, the Carnot–Caratheodory norm. Using equivalence of homogenous norms, all this boils down to the question of whether 1/2 d (Bs , Bt ) = d eB s + A s , eB t +A t = eB s , t +A s , t ∼ |Bs ,t | ∨ |As,t | α
is bounded by C (ω) |t − s| , uniformly for s, t on a finite interval such as [0, 1]. Obviously, this is true for the Brownian increment Bs,t and we are only left with the question “Does there exist α > 1/3 such that
sup s,t∈[0,1]
|As,t | 2α
|t − s|
< ∞ a.s. ?”
(13.6) To fully appreciate the forthcoming Corollary 13.14, the reader is urged to think for a moment about how to prove this! To avoid misunderstandings, let us point out two things: (i) α-H¨older regularity, for any α < 1/2, of t → At (ω) ∈ so (d) is a straightforward application of a suitable version of Kolmogorov’s regularity criterion, applied to a process with values in the Euclidean space so (d). It also follows from Proposition 13.7 that we can represent L´evy area as (α-H¨older continuous) Brownian motion run at a (Lipschitz continuous) random clock. 6 This is not to be confused with the important fact that Brownian motion has finite quadratic variation in the sense of Theorem 13.70.
(ii) The cancellation on the right-hand side of As,t = (At − As ) −
1 [Bs , Bs,t ] 2
(13.7)
is essential. Thoughtless application of the triangle inequality only shows 1 |B|∞ |Bs,t | 2 α C (ω) |t − s| ,
|As,t | ≤ |At − As | +
which is not the positive answer to (13.6) which we seek. In fact, this only shows that sample paths t → Bt (ω) are a.s. H¨older of exponent less than 1/4 and there is a world of difference between α < 1/4 and α > 1/3. The first is a sample path property of limited interest, the latter implies that almost every realization B (ω) is a geometric rough path to which all of the theory of rough paths is applicable! Theorem 13.13 Write B for a G2 Rd -valued enhanced Brownian motion on [0, T ]. Then there exists η > 0, not dependent on T , such that 0 / 2 d (Bs , Bt ) sup E exp η < ∞. (13.8) |t − s| s,t∈[0,T ] D
Proof. From scaling properties of enhanced Brownian motion, Bs,t = δ (t−s) 1 / 2 B0,1 so that 2
2 D
2
d (Bs , Bt ) = Bs,t = (t − s) B1 . Hence, it suffices to find η small enough so that " ! 2 < ∞. E exp η B1 2
2
By equivalence of homogenous norms, B1 ∼ |B1 | +|A1 | where B1 (resp. A1 ) denotes d-dimensional Brownian motion (resp. L´evy area) at time 1. Thus, everything boils down to (trivial) Gaussian integrabilty of B1 ∼ N (0, 1) and exponential integrability of L´evy area, which was established in Proposition 13.4. Thanks to (13.8), we can appeal to general regularity results for stochastic processes (as discussed in Section A.4 in Appendix A). Corollary 13.14 Write B for a G2 Rd -valued enhanced Brownian motion on [0, T ]. (i) Let α ∈ [0, 1/2). Then there exists η > 0, not dependent on T , such that " ! η 2 < ∞. B E exp α -H¨o l;[0,T ] T 1−2α
(ii) 5 Assume that ϕ is a fixed increasing function such that ϕ (h) = h log (1/h) (L´evy modulus) in a positive neighbourhood of 0. Then there exists η = η (T ) > 0 such that " ! 2 E exp η Bϕ-H¨o l;[0,T ] < ∞ where Bϕ-H¨o l;[0,T ] = sups< t in[0,T ] d (Bs , Bt ) /ϕ (t − s). Proof. (i) By scaling, T 2α −1 Bα -H¨o l;[0,T ] has the same distribution as 2
2
Bα -H¨o l;[0,1] . We then apply Theorem A.19 with modulus function h → h1/α . (ii) A direct application of Theorem A.19 with modulus function h → ϕ (h). Let us remark that we may take α ∈ (1/3, 1/2) in the previous corollary, which therefore implies a fortiori that B is a.s. a geometric α-H¨ older rough path. Since α-H¨older regularity implies p-variation regularity with p = 1/α ∈ (2, 3), we trivially see that B is a.s. a geometric p-rough path. Similarly, ϕ-H¨older regularity implies ϕ−1 -variation regularity where ϕ−1 (h) ∼
h2 . log (1/h)
In fact, more is true and the general results of Section A.4 (Theorem A.24 to be precise) show that (13.8) implies Theorem 13.15 (exact variation for EBM) Write B for a G2 Rd valued enhanced Brownian motion on [0, T ] . Let7 ψ 2,1 (h) =
h2 , ln ∗ ln ∗ (1/h)
where ln∗ = max (1, ln). Then there exists η > 0, not dependent on T , such that " ! η 2 Bψ 2 , 1 -var;[0,T ] < ∞ E exp T where the reader is reminded that $ # Bψ -var;[0,T ] = inf M > 0, sup ψ d Bt i , Bt i + 1 /M ≤ 1 . D ∈D[0,T ] t ∈D i
Enhanced Brownian motion also satisfies a law of iterated logarithms. We first recall Khintchine’s law of iterated logarithms for a Brownian motion8 7 This is one instance of a (generalized) variation function, as introduced in Definition 5.45. 8 See McKean’s classical text [125, p. 12] or [94, Theorem 9.23], for instance; it can also be obtained as a consequence of Schilder’s theorem, to be discussed in Section 13.6.
which states that, for a 1-dimensional Brownian motion β, & % |β h | = c = 1, (13.9) P lim sup h→0 ϕ (h) √ where c 5 ∈ (0, ∞) is a deterministic constant (equal to 2, in fact) and ϕ (h) = h ln∗ ln∗ (1/h). (Observe that ϕ is Lipschitz equivalent to the inverse of ψ 2,1 , see Lemma 5.48). Proposition 13.16 (law of the iterated logarithm for EBM) Write B for G2 Rd -valued enhanced Brownian motion on [0, T ] . Let ϕ (h) = 5 h ln∗ ln∗ (1/h). Then there exists a deterministic constant c ∈ (0, ∞) such that 0 / B0;[0,h] = c = 1. P lim sup ϕ (h) h→0 Proof. From general principles (Theorem A.21 in Section A.4) we see that (13.8) implies ||B||0;[0,h] L := lim sup ϕ (h) h→0 defines an almost surely finite random variable, i.e. L (ω) < ∞ almost surely. On the other hand, by the classical law of iterated logarithms for Brownian motion, it is clear that lim sup h→0
||B||0;[0,h] ϕ (h)
≥ lim sup h→0
|Bh | = c˜ > 0 a.s. ϕ (h)
√
where c˜ = 2 is the constant from Khintchine’s law of iterated logarithms. It follows that 0 < c˜ ≤ L (ω) < ∞ a.s. By construction of enhanced Brownian motion, ||B||0;[0,h] is σ(Bt : t ∈ [0, h]) measurable where B = π 1 (B) denotes the underlying d-dimensional Brownian motion. It now follows from Blumenthal’s zero–one law for Brownian motion9 that L equals, almost surely, a deterministic constant.
13.3 Strong approximations We discuss a number of approximation results in which enhanced Brownian motion arises as an almost-sure limit or limit in probability, always in the appropriate rough path metrics. The interest in these results is that either convergence is preserved under continuous maps; applied to the Itˆ o–Lyons 9 See,
for example, [143, Chapter III] or [94, Theorem 7.17].
map in rough path topology all convergence results discussed then translate immediately to strong convergence results in which the limit of certain (random ODEs/RDEs) is identified as an RDE solution driven by B, i.e. as the solution to a Stratonovich SDE. Our list of approximations is not exhaustive and several strong convergence results (including convergence of non-nested piecewise linear approximations and Karhunen–Lo´eve approximations) are left to a later chapter on Gaussian processes (Chapter 15) which provides the natural framework for these convergence results.
13.3.1 Geodesic approximations Let us fix α < 1/2. From the last section, we know that enhanced Brownian motion B has sample paths with B (ω) ∈ C0a-H¨o l [0, T ] , G2 Rd almost surely. From general interpolation results, it follows that for every α < 1/2 we also have B (ω) ∈ C00,a-H¨o l [0, T ] , G2 Rd almost surely. From the very definition of the space C00,a-H¨o l it then follows that almost every B (ω) is the dα -H¨o l -limit of smooth paths lifted to G2 Rd . (When α ∈ (1/2, 1/3) this is precisely the difference between weak geometric α-H¨older rough paths and (genuine) geometric α-H¨ older rough paths.) The important remarks here are that (i) these approximations are based on entirely deterministic facts and applied to almost every ω and (ii) they rely on all the information contained in B (ω), that is on the underlying Brownian path B (ω) = π 1 (B) and the L´evy area A (ω) = Anti (π 2 (B)) . This is in strict contrast to all probabilistic approximations discussed in the following sections. These are only based on the Rd -valued Brownian motion B = π 1 (B) and frequently (but not always!) give rise to the standard L´evy area which underlies our definition of enhanced Brownian motion.
13.3.2 Nested piecewise linear approximations As earlier, B = exp (B + A) denotes an enhanced Brownian motion, the natural lift of a d-dimensional Brownian motion B. We now consider a sequence (Dn ) of nested dissections, that is Dn ⊂ Dn +1 for all n, such that |Dn |, the mesh of Dn , tends to zero as n → ∞. The reason for this assumption is that then Fn := σ (Bt : t ∈ Dn )
13.3 Strong approximations
341
forms a family of σ-algebras increasing in n. In other words, (Fn ) is a filtration and this will allow us to use elegant martingale arguments. (What we will not use here is the fact that t → Bt is a martingale.) Define B n = B D n (ω) as the piecewise linear approximation based on the dissection Dn . We consider the step-2 lift and write, as usual, Bn := S2 (B n ) = exp (B n + An ) . Proposition 13.17 For fixed t in [0, T ] the convergence Bnt → Bt holds almost surely and in L2 (P). Proof. The statement is d (Bnt , Bt ) → 0 (a.s. and in L2 ). This is equivalent to (a) |Btn − Bt | → 0
and
(b) |Ant − At | → 0.
Ad (a), Since {Dn } is nested, Fn := σ (Bt : t ∈ Dn ) forms a filtration. We claim that a.s. E [Bt |Fn ] = Btn and E [At |Fn ] = Ant . Using the Markov property of B, E [Bt |Fn ] = E Bt |Bt i , Bt i + 1 where ti , ti+1 are two neighbours in Dn with t ∈ [ti , ti+1 ]. It is a simple exercise of Gaussian conditioning10 to see that ti+1 − t t − ti E Bt |Bt i , Bt i + 1 = Bt i + Bt ti+1 − t i ti+1 − ti i + 1 and this is precisely equal to Btn . Mesh |Dn | → 0 implies that Bt is (∨n Fn )measurable and martingale convergence shows that11 Btn = E [Bt |Fn ] → Bt a.s and in L2 . It simplifies things to set Ad (b). We first fix n and show E [At |Fn ] = Ant .
t i ˜ j m ˜ ˜ be a dissection of β = B , β = B , i = j and consider 0 βdβ. Let D ˜m [0, t] , with t fixed, and mesh D → 0. By L2 -continuity of E [·|Fn ] and 1 0 The
reader might be familiar with E [B t |B T ] = (t/T ) B T . are more elementary arguments for B tn → B but this one extends to the area
1 1 There
level.
(13.3), % E
t
& ˜ n βdβ|F
0
˜ = lim E βti β t i ,t i + 1 Fn m →∞ m ˜ t i ∈D " ! ˜ E βti β = lim t i ,t i + 1 Fn
m →∞
=
˜m t i ∈D
lim
m →∞
=
ti t
˜n β nti β t i ,t i + 1
˜ and part (a)) (use β ⊥ β
˜m ∈D n
˜ , β n dβ
0
by definition of the Riemann–Stieltjes integral applied to the (bounded ˜ and sub˜ n . After exchanging the roles of β and β variation!) integrator β traction, we find E [At |Fn ] = Ant as claimed. The final reasoning is as above: At is (∨n Fn )-measurable, this follows from (13.3), and by martingale convergence Ant = E [At |Fn ] → At a.s and in L2 . Theorem 13.18 For every α ∈ [0, 1/2), there exists a positive random variable M with Gaussian tails, in particular M < ∞ a.s., such that sup n =1,...,∞
Bn α -H¨o l;[0,T ] ≤ M
where B∞ ≡ B. Proof. We keep the notation of the last proof, where we established Btn = E [Bt |Fn ] and Ant = E [At |Fn ] . Simple algebra (attention As,t = At − As !) yields n = E [Bs,t |Fn ] and Ans,t = E [As,t |Fn ] . Bs,t
(13.10)
We focus on one component in the matrix As,t , say Ai,j s,t with i = j. Clearly, 2 i,j 1/2 2 ∼ Bs,t , As,t ≤ |As,t | ≤ |Bs,t | ∨ |As,t | where of homogenous norms on ∼ is a reminder of the Lipschitz equivalence 2 2α G2 Rd . From Corollary 13.14, Bs,t ≤ M1 (t − s) for a non-negative r.v. M1 with Gauss tail. In particular, |M1 |L q < ∞ for all q < ∞. (More precisely, the Gauss tail is captured in |M1 |L q = O q 1/2 for q large.) We then have 2α 2α −M1 (t − s) ≤ Ai,j s,t ≤ M1 (t − s)
and conditioning with respect to Fn yields −M2 (t − s)
2α
2α
≤ E[Ai,j s,t |Fn ] ≤ M2 (t − s)
where M2 = sup{E[M1 |Fn ] : n ≥ 1} has its Lq -norm controlled by Doob’s maximal inequality q |M1 |L q = O q 1/2 as q → ∞. |M2 |L q ≤ q−1 (The square-root growth implies that M2 has a Gauss tail.) From (13.10) we have 2α 2α −M2 (t − s) ≤ Ans,t;i,j ≤ M2 (t − s) , where M2 is independent of n. If necessary, replace M2 by d2 M2 to obtain the estimate 2α sup Ans,t ≤ M2 (t − s) . n
The same reasoning, easier in fact, shows that n ≤ M2 (t − s)α . sup Bs,t n
Putting everything together n n n 1/2 α Bs,t ∼ Bs,t ∨ As,t ≤ M2 (t − s) , which is precisely the required estimate on Bn α -H¨o l , uniform over n ≥ 1. Setting M = M1 + M2 finishes the proof. With the uniform bounds of Theorem 13.18, a simple argument (interpolation plus H¨ older’s inequality) leads to Corollary 13.19 Let (Dn ) be a sequence of nested dissections of [0, T ], that is Dn ⊂ Dn +1 for all n, such that mesh |Dn | → 0 as n → ∞. Then dα -H¨o l;[0,T ] S2 B D n , B → 0 almost surely and in Lq (P) for all q ∈ [1, ∞).
13.3.3 General piecewise linear approximations We saw that martingale arguments lead to a quick proof of convergence of (lifted) piecewise linear approximations to enhanced Brownian motion, along a nested sequence of dissections. Dealing with an arbitrary sequence (Dn ) requires a direct analysis. We first establish pointwise Lq -estimates (only here we use the specifics of piecewise linear approximations) followed by a general Besov–H¨older-type embedding which implies the corresponding rough path estimates.
Proposition 13.20 Let D be a dissection of [0, T ] and 1/r ∈ [0, 1/2]. Then there exists C = C (T ) such that for k = 1, 2 and all 0 ≤ s < t ≤ T, q ≥ 1, π k Bs,t − S2 B D s,t
≤ C |D|
L q (P)
1/2−1/r
√
1/r
q |t − s|
k .
Proof. We write A for the L´evy area, i.e. B = exp (B + A) . Step 1: We consider first s, t ∈ D. In this case, the level 1 estimate is trivial D as Bs,t = Bs,t . For level 2, observe that if s = tm and t = tn for some m < n, we have n −1 At l ,t l + 1 . Bs,t − S2 B D s,t = i=m
From Exercise 13.6, π 2 S2 B D s,t − Bs,t Since
n −1
Lq
n −1 ≤ c1 q At i ,t i + 1 i=m
@ A n −1 A 2 ≤ c2 q B (ti+1 − ti ) . i=m
L2
2
(ti+1 − ti ) ≤ |t − s| min (|D| , |t − s|), we have 1/2 1/2 π 2 S2 B D s,t − Bs,t ≤ c2 q (|D| ∧ |t − s|) . |t − s| . i= m
Lq
Step 2: (small intervals) Consider the case sD ≤ s < t ≤ sD . Then D Bs,t − Bs,t q L
≤
t − s D q + |Bs,t | q B s ,s D L L sD − sD
≤
c3 q 1/2 |t − s|
1/2
which settles level 1. For level 2, we π 2 S2 B D s,t − Bs,t q ≤ L
≤
= c3 q 1/2 (|D| ∧ |t − s|)1/2 ,
estimate π 2 ◦ S2 B D s,t q + |π 2 (Bs,t )|L q L 2 |t − s| + |t − s| c4 q |sD − sD |
≤
2c4 q |t − s|
=
2c4 q (|D| ∧ |t − s|)
1/2
1/2
. |t − s|
.
Step 3: (arbitrary intervals) It remains to deal with s < t such that s ≤ sD ≤ tD ≤ t. The level 1 estimate follows immediately from the level 1 estimate of step 2; indeed, D Bs,t − Bs,t q ≤ B D D − Bs,s D + BtD ,t − Bt D ,t q s,s D L L q L
≤
2c3 q
1/2
(|D| ∧ |t − s|)1/2 .
For the level 2 estimate, we note the algebraic identity in T 2 Rd , S2 B D s,t − Bs,t = S2 B D s,s D − Bs,s D ⊗ S2 B D s D ,t + Bs,s D ⊗ S2 B D s D ,t − BD D ,t s D D ⊗S2 B D t ,t D + Bs,t D ⊗ S2 B D t ,t − Bt D ,t . D
(13.11)
(13.12) (13.13)
Projection to level 2 yields an expression of π 2 S2 B D s,t − Bs,t in terms of the first and second level of all involved terms. For instance, the Lq -norm of (13.12) projected to level 2 is readily estimated by D D π 2 Bs,s D q + π 2 S2 B D D B S − B + π D ,t 2 2 s D L s ,t D t D ,t L q Lq + Bs,s D L q + |Bt D ,t |L q .|BsDD ,t D − Bs D ,t D |L q + Bs,s D L q . |Bt D ,t |L q
=0
which, by the previous steps, is bounded by a constant times q times D s −s + |D| ∧ tD −sD 1/2 . tD −sD 1/2 + |t−tD | + sD −s 1/2 . |t−tD |1/2 1/2
≤ 3(|D| ∧ |t − s|)1/2 |t − s|
.
The estimates for the Lq -norm of (13.11), (13.13) projected to level 2 are very similar and also lead to bounds of the form O q |D| ∧ |t − s|)1/2 1/2
. We omit the details. |t − s| Step 4: The estimates of steps 1–3 can be summarized in π k S2 B D s,t − Bs,t
Lq
k k −1 1/2 ≤ c5 q 2 (|D| ∧ |t − s|) . |t − s| 2 ,
valid for k = 1, 2 and all s < t in [0, T ]. By geometric interpolation, using 2/r ∈ [0, 1], we also have π k S2 B D s,t − Bs,t
Lq
≤ c5
k
q2
|D|
1−2/r
2/r
∧ |t − s|
1/2
. |t − s|
k −1 2
k 1/2−1/r k /r . |t − s| ). ≤ c6 q 2 |D|
We then obtain the following quantitative estimates in both (homogenous, inhomogenous) H¨ older rough path metrics.
Brownian motion
346
Corollary 13.21 Let 0 ≤ α < 1/2. Then, for every η ∈ (0, 1/2 − α), there exists a constant C = C (α, η, T ) such that, for all q ∈ [1, ∞), dα -H¨o l S2 B D , B
η /2
≤ Cq 1/2 |D|
L q (P)
and also, for k ∈ {1, 2} , (k ) ρα -H¨o l;[0,T ] B, S2 B D
k
Lq
(P)
η
≤ Cq 2 |D| .
In particular, S2 B D → B in Lq (P) for all q < ∞ as |D| → 0, with respect to either α-H¨ older rough path metric. Proof. Define r by 1/r := 1/2 − η and note that α < 1/r < 1/2. Write c1 , c2 , . . . for constants which may depend on T (and tacitly on d). We have, for any q ∈ [1, ∞), 0 ≤ s < t ≤ T and dissection D of [0, T ] , π k Bs,t − S2 B D s,t
1/2−1/r
L q (P)
≤ c1 |D|
√
1/r
q |t − s|
k , k = 1, 2,
by the previous result (Proposition 13.20). Also, q 1/q
E (Bs,t )
≤ c2
√
1/2
q |t − s|
√ γ ≤ c3 ( q |t − s| )
from basic scaling and integrability of enhanced Brownian motion and both together easily imply that √ q 1/q γ ≤ c4 ( q |t − s| ) . E S2 B D s,t We can then appeal to Theorem A.13 in Appendix A to see that D B − S π k s,t 2 B k k s,t 1/2−1/r η sup ≤ c5 q 2 |D| = c5 q 2 |D| kα |t − s| s,t∈[0,T ] L q (P)
and also
dα -H¨o l S2 B D , B
η /2
L q (P)
≤ c6 q 1/2 |D|
.
Exercise 13.22 Let (Dn ) ⊂ D ([0, T ]) be a sequence of dissections of older [0, T ]. Show that, S2 B D n → B almost surely with respect to α-H¨ rough path topology, α ∈ [0, 1/2), provided mesh (Dn ) → 0 fast enough. Hint: Borel–Cantelli.
13.3 Strong approximations
347
13.3.4 Limits of Wong–Zakai type with modified L´evy area We formulate the following result for random rough paths which are the limits of their “piecewise linear approximations”. Although the example we have in mind here is enhanced Brownian motion (in which case α ∈ older convergence was estab(1/3, 1/2) , N ∈ {2, 3, . . . } and Lq (P)/α-H¨ lished in the previous section), it applies to the bulk of stochastic processes which admit a lift to rough path. Definition (0, 1] and assume X = X (ω) has sample paths 13.23 Let α ∈ in C0α -H¨o l [0, T ] , G[1/α ] Rd ; write X = π 1 (X) for its projection to a process with values in Rd . (i) Let N ≥ [1/α] and let (Dn ) = (tni : i) be a sequence of dissections of [0, T ] such that ∀q ∈ N : sup S[1/α ] X D n α -H¨o l;[0,T ] q < ∞ L (P) n ∈N d∞ S[1/α ] X D n , X → 0 in probability as n → ∞. If (X n (ω)) ⊂ C 1-H¨o l [0, T ] , Rd such that, for all ω,12 −1 Pnt (ω) := SN (X n (ω))0,t ⊗ SN X D n (ω) 0,t takes values in the centre of GN Rd whenever t ∈ Dn then we say that (X n ) is an approximation on (Dn ) with perturbations (Pn ) on level N to the random rough path X. (ii) Let β ∈ (0, 1] and [1/β] = N ≥ [1/α]. We say that an approximan tion (X n ) on (Dn ) with perturbations d(P ) on level N to the random α -H¨o l [1/α ] [0, T ] , G R is min (α, β)-H¨ older compararough path X ∈ C ble (with constants c1 , c2 , c3 ) if for all tni , tni+1 ∈ Dn , all ω and all q ∈ [1, ∞), β −1 |X n |1-H¨o l; [t n ,t n ] ≤ c1 X D n 1-H¨o l; [t n ,t n ] + c2 tni+1 − tni i i+ 1 i i+ 1 n β Ps,t q ≤ c3 |t − s| for all s, t ∈ [0, T ] . L (P)
Theorem 13.24 Let α, β ∈ (0, 1] and [1/β] = N≥ [1/α]. Assume X = X (ω) has sample paths in C0α -H¨o l [0, T ] , G[1/α ] Rd and write X = π 1 (X) for its projection to a process with values in Rd ; let (X n ) be an approximation on (Dn ) with perturbations (Pn ) on level N to X. (i) If the approximation is min (α, β)-H¨ older comparable (with constants c1 , c2 , c3 ) then for all γ < min (α, β), ∀q ∈ [1, ∞) : sup SN (xn )γ -H¨o l;[0,T ] q < ∞. n ∈N
1 2 It
L (P)
/ Dn . is not assumed that P nt (ω) ∈ centre of G N Rd when t ∈
Brownian motion
348
(ii) If Pnt → Pt in probability for all t ∈ ∪n ∈N Dn dense in [0, T ] then, for all such t, d SN (X n )0,t , SN (X)0,t ⊗ P0,t → 0 in probability. (iii) If the assumptions of both (i) and (ii) are met then, for all γ < min (α, β), dγ -H¨o l;[0,T ] (SN (X n ) , Xϕ ) → 0 in Lq for all q ∈ [1, ∞) ⊗N where ϕ := log P ∈ V N Rd ≡ gN Rd ∩ Rd and Xϕ = exp (log (SN (X)) + ϕ). Proof. (i) By a standard Garsia–Rodemich–Rumsey or Kolmogorov argu ˜ < β, the existence ment, the assumption on Pns,t L q (P) implies, for any β q of C3 ∈ L for all q ∈ [1, ∞) so that ˜ β ∀s < t in [0, T ] : Pns,t ≤ C3 (ω) |t − s| . ! " ˜ large enough so that 1/β ˜ = [1/β] = N and γ < min α, β ˜ . We can pick β ˜ instead of β and learn that there We can then apply Theorem 12.18 with β exists a deterministic constant c such that D n n + 1 + C3 . sup SN (X )m in (α , β˜ )-H¨o l ≤ c sup S[1/α ] X α -H¨o l n
n
Taking Lq -norms finishes the uniform Lq -bound. (ii) From Theorem 12.18 d SN (X n )0,t , SN (X)0,t ⊗ P0,t ≤ d SN X D n 0,t , SN (X)0,t + d Pn0,t , P0,t which, from the assumptions, obviously converges to 0 (in probability) for every fixed t ∈ ∪n Dn . (iii) General facts, about Lq -convergence of rough paths (cf. Section A.3.2 in Appendix A; inspection of the proofs shows that convergence in probability for all t in a dense set of [0, T ] is enough), implies the claimed convergence. Remark 13.25 The assumptions on X n and Pn guarantee that the (X n ) remain, at min (α, β)-H¨ older scale, comparable to the piecewise linear approximations. In particular, the assumption on |X n |1-H¨o l; [t n ,t n ] = i i+ 1 ˙ n X is easy to verify in all examples below. The intuition is that, ∞; [t ni ,t ni+ 1 ]
13.3 Strong approximations
349
if we assume that X n runs at constant speed over any interval I = tni , tni+1 , Dn = (tni ), it is equivalent to saying that β length (X n |I ) ≤ c1 length X D n |I + c2 |I| β ( = c1 Xt ni ,t ni+ 1 + c2 tni+1 − tni ).
Remark 13.26 In both examples below we have β = 1/N . It is Theorem 12.18, from which Theorem 13.24, was essentially obtained as a corollary, which suggests the need for the slightly looser condition [1/β] = N . Example 13.27 (Sussmann) Take any sequence of dissection of [0, T ] , say (Dn ) with mesh |Dn | → 0 and X (ω) such as in Theorem 13.24. The piecewise linear approximation X D n is nothing but the repeated concatentation of linear chords connecting the points (Xt : t ∈ Dn ). For some fixed v ∈ V N Rd , N ∈ {2, 3, . . . } we now construct Sussmann’s nonstandard approximation X n as (repeated) concatenation of linear chords n and “geodesic loops”. First, we require X n (t) n= X (t) for all t ∈ Dn = n i.e. t ∈ ti−1 , ti for some i, we proceed (ti : i). For intermediate times, as follows: For t ∈ [tni−1 , tni−1 + tni /2] we run linearly constant n and nat speed from X tni−1 so as to reach X (tni ) by time t + t i−1 i /2. (This n and X (t is the usual linear interpolation between X tni−1 ) i nbut run at n n double speed.) This leaves us with the interval [ ti−1 + ti /2, ti ] for other a “geodesic” purposes and at x (tni ) ∈ Rd, through n wen run, starting ξ : n n d t − tn ∈ GN Rd . with exp v/ [ ti−1 + ti /2, ti ] → R associated i i−1 Since N > 1, π 1 exp v/ tni − tni−1 = 0 and so this geodesic path returns to its starting point in Rd ; in particular, X n tni−1 + tni /2 = X n (tni ) = X (tni ) . It is easy to see (via Chen’s theorem) that this approximation satisfies the assumptions of Theorem 13.24 with −1 Pns,t := SN (X n )s,t ⊗ SN X D n s,t = ev(t−s) ∀s, t ∈ Dn 1/N (so that Pns,t L q = Pns,t |t − s| first for all s, t ∈ Dn and then, easy vt to see, for all s, t) and deterministic limit n P0,tn = e , β = 1/N . Indeed, n the length of x over any interval I = ti−1 , ti is obviously bounded by the length of the corresponding linear chord plus the length of the geodesic associated with exp(|I| v), which is precisely equal to 1/N
exp(|I| v) = |I|
exp (v) =: c2 |I|
1/N
.
An application of Theorem 13.24, applied to X = B, i.e. enhanced Browian motion, gives the following convergence result. For any γ < 1/N we have dγ -H¨o l (SN (B n ) , SN (B) ⊗ ev· ) → 0 in Lq for all q ∈ [1, ∞).
Brownian motion
350
Observe that γ for 1/γ ∈ [N, N + 1) is a genuine rough path convergence. In Section 12.2 we have identified RDEs driven by T(0,...,0,v·) SN (B) = SN (B) ⊗ ev· as RDEs driven by B with an additional drift. Example 13.28 (McShane) Given x ∈ C [0, T ] , R2 , an interpolation function φ = φ1 , φ2 ∈ C 1 [0, 1] , R2 with φ (0) = (0, 0) and φ (1) = (1, 1) and a fixed D = (ti ) of [0, T ] we define the McShane interpolation dissection x ˜D ∈ C [0, T ] , R2 componentwise by ;i x ˜D t
:=
xit D
∆ (t,i)
+φ
t − tD D t − tD
xit D ,t D , i = 1, 2.
The points tD , tD ∈ D denote the left, resp. right, neighbouring points of t in the dissection and i, if x1t D ,t D x2t D ,t D ≥ 0 ∆ (t, i) := 3 − i, if x1t D ,t D x2t D ,t D < 0. As a simple consequence of this definition, for u < v in [ti , ti+1 ] D u − ti v − ti 2 φ 1 ˜ u ,v = exp x A ˜D + , S2 x x x u ,v t i ,t i + 1 t i ,t i + 1 ti+1 − ti ti+1 − ti where Aφ (u, v) ≡ Aφu ,v is the area increment of φ over [u, v] ⊂ [0, 1]. Con sider now X (ω) = B (ω) = exp (B + A) ∈ C0α -H¨o l [0, 1] , G[1/α ] R2 with α ∈ (1/3, 1/2) and take any (Dn )n ∈N with |Dn | → 0. (We know from Section 13.3.3 that S2 B D n converges to B in α-H¨older rough path topology and in Lq for all q.) It is easy to see (via Chen’s theorem) that McShane’s approximation to 2-dimensional Brownian motion satisfies the assumptions of Theorem 13.24 with β = 1/2, N = 2. Indeed, writing ˜Dn B n := B for McShane’s approximations, it is clear that for any s < t s − tD t − tD 2 1 n φ with D = Dn , Ps,t = exp xt D ,t D xt D ,t D × A tD − tD tD − tD and for two points ti < tj in Dn the relevant increment is given by Pnti ,t j
= exp
Aφ0,1
j 1 2 Bt k ,t k + 1 Bt k ,t k + 1 . k =i+1
13.3 Strong approximations
351
j It is easy to see that k = i+1 Bt1k ,t k + 1 Bt2k ,t k + 1 converges, in L2 say, to its mean j 2 2 (tk +1 − tk ) = |tj − ti | , π π k = i+1 1/2 while Pnti ,t j ≤ c˜q |tj − ti | follows directly from Lq
n Pt i ,t j ∼
1/2
j 1 Bt k ,t k + 1 Bt2k ,t k + 1
k = i+1
and j 1 2 Bt k ,t k + 1 Bt k ,t k + 1 k = i+1
j 1 ≤ Bt k ,t k + 1 Lq
k =i+1
Lq
2 Bt k ,t k + 1
Lq
= cq |tj − ti | .
1/2 for all s, t since for u < v in [ti , ti+1 ] In fact, Pns,t L q ≤ c˜q |t − s| q /2 1/q 1 v − t u − t i i = E xt i ,t i + 1 x2t i ,t i + 1 Aφ , Lq ti+1 − ti ti+1 − ti 1/2 u − v 1/2 = cφ,q (ti+1 − ti ) ≤ cφ,q |u − v| . ti+1 − ti At last, for any ti ∈ Dn , we have |B n |1-H¨o l;[t i ,t i + 1 ] ≤ φ ∞ B D n 1-H¨o l;[t i ,t i + 1 ] . This shows that all assumptions of Theorem 13.24 are satisfied and we have, for all α ∈ [0, 1/2), n Pu ,v
dα -H¨o l (S2 (B n ) , exp (Bt + At + tΓ)) → 0 in Lq for all q ∈ [1, ∞) where At is the usual so (2)-valued L´evy’s area and 2 φ 0 A0,1 π ∈ so (2) . Γ= − π2 Aφ0,1 0
13.3.5 Convergence of 1D Brownian motion and its ε-delay A real-valued Brownian motion β and its ε-delay β ε ≡ β (· − ε) give rise to the R2 -valued process t → (β εt , β t ) := β t−ε , β t . We shall assume ε > 0. On a sufficiently small time interval (of length ≤ ε), it is clear that β ε and β have independent Brownian increments so that ε β s,t , β s,t : t ∈ [s, s + ε]
Brownian motion
352
has the distribution of a 2-dimensional standard Brownian motion (Bt : t ∈ [0, ε]). This suggests defining the stochastic area increments of (β ε , β) as Aεs,t
=
1 2
t
β εs,· dβ − β s,· dβ ε s t
1 β εs,· dβ − β εs,t β s,t . 2 s 2 2 process (Xεt : In particular, define the G R -valued continuous ε we can ε t ≥ 0) as β 0,· , β 0,· enhanced with the area process A0,· so that =
log Xεt
=
β ε0,t β 0,t
+
0 −Aε0,t
Aε0,t 0
.
It is left for the reader to check, as a simple consequence of Chen’s −1 relation, that the area-component of Xεs,t = (Xεs ) ⊗ Xεt is indeed given by Aεs,t . We then have Lemma 13.29 There exists η > 0 such that 0 / 2 d (Xεs , Xεt ) sup sup E exp η < ∞. |t − s| ε∈(0,1] s,t∈[0,T ] Proof. We estimate ε As,t
L q (P)
≤ ≤ ≤
t 1 ε β s,· dβ + β εs,t β s,t L q (P) 2 s L q (P) t 1 β εs,· dβ + β s−ε,t−ε L 2 q (P) β s,t L 2 q (P) 2 s L q (P) t β εs,· dβ + c1 |t − s| q q s
L (P)
1/2 since β s,t L 2 q (P) / |t − s| = β 0,1 L 2 q (P) = O q 1/2 , cf. Lemma A.17. Thus it will be enough to show that t ε β s,· dβ = O ((t − s) q) . s
L q (P)
To this end, we first observe that by stationarity of Brownian increments we may replace (s, t) by (0, t − s). In other words, it suffices to estimate Lq -moments of the continuous martingale
t
β ε0,· dβ.
Mt = 0
13.3 Strong approximations
Noting !M "t =
353
t β −ε,s−ε 2 ds, the exponential martingale inequality gives 0
P (Mt > tx)
≤ P Mt > tx, !M "t ≤ xt2 + P !M "t > xt2 2 1 (tx) 2 2 + P t |β| > xt ≤ exp − ∞;[0,t] 2 xt2 1 2 = exp − x + P |β|∞;[0,1] > x . 2
The same argument applies to −Mt and we see that |Mt | /t has an exponential tail. Equivalently, t 1 ε β 0,· dβ = O (q) , t 0 L q (P) which is what we wanted to show. An appeal to general regularity results for stochastic processes (see Section A.4 in Appendix A) then gives Proposition 13.30 Let α ∈ [0, 1/2). Then there exists η > 0 such that " ! 2 sup E exp η Xε α -H¨o l;[0,T ] < ∞ ε∈(0,1]
and
" ! 2 sup E exp η Xε ψ 2 , 1 -var;[0,T ] < ∞.
ε∈(0,1]
Theorem 13.31 Let β be a 1-dimensional Brownian motion with ε-delay older rough path, β ε ≡ β (· − ε), lifted to G2 R2 -valued geometric α-H¨ α ∈ (1/3, 1/2) given by Xεt = exp β ε0,t , β 0,t ; Aε0,t . Set also ˜ t := exp((β t , β t ) ; −t/2). X Then, for any q ∈ [1, ∞) we have ˜ dα -H¨o l;[0,T ] Xε , X
L q (P)
→ 0 as ε → 0.
Proof. Thanks to Proposition A.15 of Appendix A, in the presence of uniform α-H¨older bounds (which we established in Proposition 13.30), it suffices to show that log Xεt ≡ β 0,t−ε , β 0,t ; Aε0,t → (β t , β t , −t/2) as ε → 0
Brownian motion
354
in probability and pointwise, i.e. for fixed t ∈ [0, T ]. Clearly it is enough to focus on the area. Using t t β ε0,· dβ → β 0,· dβ 0
0
(in probability and pointwise as ε → 0), easily seen from Itˆo’s isometry, we have t 1 ε β ε0,· dβ − β ε0,t β 0,t A0,t = 2 0 t 2 1 β 0,t → β 0,· dβ − 2 0 1 2 2 1 1 β 0,t − t − β 0,t = − t = 2 2 2 and the proof is finished.
13.4 Weak approximations We now turn to weak approximations of enhanced Brownian motion B and prove a Donsker-type theorem in the Brownian rough-path setting. In later chapters on Gaussian resp. Markov processes we shall encounter other weak convergence results which may be applied to enhanced Brownian motion and thus deserve to be mentioned briefly here: from general principles on Gaussian rough paths we have, for instance, that enhanced fractional Brownian motion BH converges weakly to B as H → 1/2 (recall that Brownian motion is fractional Brownian with Hurst parameter H = 1/2). Similarly, a sequence of Markov processes (X a n ) on Rd with (uniformly elliptic) generator of divergence form ∇ · (an ∇), enhanced with suitable stochastic area to a G2 Rd -valued process Xa n , will be seen to converge weakly to B provided that 2an → I, the d × d identity matrix. In all these cases weak convergence holds with respect to a rough path metric (namely, α-H¨older topology with any α < 1/2). The interest in such results is that weak convergence is preserved under continuous maps; applied to the Itˆo–Lyons map in rough path topology all these weak convergence results translate immediately to weak convergence results in which the limit of certain (random ODEs/RDEs) is identified as an RDE solution driven by B, i.e. as the solution to a Stratonovich SDE.
13.4.1 Donsker’s theorem for enhanced Brownian motion Consider a random walk in Rd , given by the partial sums of a sequence of independent random variables (ξ i : i = 1, 2, 3, . . . ), identically distributed,
13.4 Weak approximations
355
D
ξ i = ξ with zero mean and unit covariance matrix, E (ξ ⊗ ξ) = I. Donsker’s theorem (e.g. [143]) states that the rescaled, piecewise linearly connected random walk 1 (n ) Wt = 1/2 ξ 1 + · · · + ξ [tn ] + (nt − [nt]) ξ [n t]+1 n converges weakly to standard Brownian motion, on C [0, 1] , Rd with sup topology. It was observed by Lamperti in [100] that this convergence takes 2p < ∞, place in α-H¨older topology, for α < (p − 1) /2p provided E |ξ| p > 1; and this is essentially sharp. In particular, for convergence in αH¨older topology for any α < 1/2 one needs finite moments of any order. We now extend this to a rough-path setting. More precisely, we show weak convergence in homogenous α-H¨older norm of the lifted rescaled random walk to G2 Rd -valued enhanced Brownian motion B. Observe that this implies a weak Wong–Zakai-type theorem: ODEs driven by W (n ) converge weakly (in α-H¨ older topology) to the corresponding Stratonovich SDE solution. Theorem 13.32 (Donsker’s theorem for EBM) Assume ξ has zero p mean, unit covariance and E (|ξ| ) < ∞ for all p ∈ [1, ∞) and α < 1/2. (n ) Then S2 (W· ) converges weakly to B, in C 0,α -H¨o l [0, 1] , G2 Rd . We shall, in fact, prove a more general theorem that deals with random walks on groups. More precisely, Chen’s theorem implies S2 W (n ) = δ n −1 / 2 eξ 1 ⊗ · · · ⊗ eξ [ n t ] ⊗ e(n t−[n t])ξ [ n t ] + 1 t
⊗ where δ denotes dilation on G2 Rd and ev = 1, v, v2 , the usual step-2 ξ exponential map. Observe that (ξ ) = e i is an independent, identically i d 2 distributed sequence of G R -valued random variables centred in the sense that E (π 1 (ξ i )) = Eξ i = 0 that the (π 1 is the projection from G2 Rd → Rd ). Let us also observe 2 d shortest path which connects the unit element 1 ∈ G R with eξ i is linear interpolation on Rd lifts to geodesic precisely etξ i , so that piecewise 2 d interpolation on G R . This suggests the following Donsker-type theorem: Theorem 13.33 Let (ξi) be a centred sequence of independent and identically distributed G2 Rd -valued random variables with finite moments of all orders, q ∀q ∈ [1, ∞) : E (ξ i ) < ∞,
Brownian motion
356
such that π 1 (ξ i ) has zero mean and unit variance, and consider the rescaled (n ) random walk defined by W0 = 1 and (n ) Wt = δ n −1 / 2 ξ 1 ⊗ · · · ⊗ ξ[tn ] for nt = [nt] 1 2 for t ∈ 0, n , n , . . . , piecewise geodesically connected in between (n ) (n ) (n ) (i.e. Wt |[ i , i + 1 ] is a geodesic connecting Wi/n and W(i+1)/n ). Then, for n n any α < 1/2, W(n ) converges weakly to B, in C 0,α -H¨o l [0, 1] , G2 Rd . Proof. Following a standard pattern of proof, weak convergence follows from convergence of the finite-dimensional distributions and tightness (here in α-H¨older topology). Step 1: (Convergence of the finite-dimensional distributions) This is an immediate consequence of a central limit theorem on free nilpotent groups (see comments to this chapter). Step 2: (Tightness) We need to find positive constants a, b, c such that for all u, v ∈ [0, 1], a " ! 1+b ≤ c |v − u| , sup E d Wv(n ) , Wu(n ) n
so that we can apply Kolmogorov’s tightness criterion (Corollary A.11 in Appendix A) to obtain tightness in γ-H¨older rough path topology, for any γ < b/a. Using basic properties of geodesic interpolation, we see that it is enough to consider u, v ∈ 0, n1 , n2 , . . . and then, of course, there is no loss of generality in taking [u, v] = [0, k/n] for some k ∈ {0, . . . , n}. It follows that what has to be established reads 1+b k 1 a E [ξ ⊗ · · · ⊗ ξ ] ≤ c , 1 1 k n na/2 uniformly over all n ∈ N and 0 ≤ k ≤ n, and with b/a arbitrarily close to 1/2. To this end, it is enough to show that for all p ∈ {1, 2, . . . }, " ! 4p = O k 2p , (∗) : E ξ 1 ⊗ · · · ⊗ ξk since we can then take a = 4p, b = 2p−1 and of course b/a = (2p − 1)/(4p) ↑ 1/2 as p ↑ ∞. Thus, the proof is finished once we show (∗) and this is the content of the last step of this proof. Step 3: Let P be a polynomial function on G2 (Rd ), i.e. a polynomial in a1;i , a2;ij where a = a1;i , a2;ij ; 1 ≤ i ≤ d, 1 ≤ i < j ≤ d ∈ g2 Rd is the log-chart of G2 (Rd ), g → a = log (g). We define the degree d◦ P by agreeing that monomials of the form 1;i α i 2;ij α i , j a a
13.5 Cameron–Martin theorem
357
have degree αi + 2 αi,j . An easy application of the Campbell–Baker– Hausdorff formula reveals that T P : g → E (P (g ⊗ ξ)) − P (g) is also a polynomial function, of degree ≤ do P − 2. For instance, m T a → a2;ij m −1 m −2 1;k 2 is seen to contain terms a2;ij a and a2;ij , etc. (all of which are of degree 2m − 2). Now, for any p ∈ {1, 2, . . . }, 4p 4p a1;i + a1;ij 2p ∼ ea i
=
i< j
4p 1;ij 2p a1;i a + =: P (ea )
i
i< j
where P is a polynomial of degree 4p. Recalling the definition of the operator T and using independence, we have E [P (ξ1 ⊗ · · · ⊗ ξk )] = E E P ((ξ 1 ⊗ · · · ⊗ ξk −1 ) ⊗ ξ k ) | ξ1 , . . . , ξ k −1 = E T P (ξ1 ⊗ · · · ⊗ ξk −1 ) + P (ξ1 ⊗ · · · ⊗ ξk −1 ) = ··· = = =
(T + 1)k P (1) k T l P (1), l l≥0
but the function T P : g → E(P (g ⊗ ξ)) − P (g) is a polynomial function of degree at most d◦ P − 2 = 4p − 2. Hence, d◦ T l P ≤ d◦ P − 2l = 2(2p − l) and the above sum contains only a finite number of terms, more precisely E [P (ξ1 ⊗ · · · ⊗ ξk )] =
2p k l=0
l
T l P (1).
Since each of these terms is O(k 2p ), as k → ∞, we are done. Exercise 13.34 Generalize Theorem 13.32 to a random walk with E (ξ) = 0 and arbitrary non-degenerate covariance matrix.
13.5 Cameron–Martin theorem For the reader’s convenience, we state a general fact of Gaussian analysis, Theorem D.2 in Appendix D, in a Brownian context. Recall that the
Brownian motion
358
Cameron–Martin space for d-dimensional Brownian motion is given by (cf. Section 1.4.1) H =
W01,2
[0, T ] , R
d
=
·
h˙ t dt : h˙ ∈ L2 [0, 1] , Rd
(13.14)
0
> ? ˙ g˙ and has Hilbert structure given by !h, g"H = h,
L2
.
Theorem 13.35 Let B be a d-dimensional Brownian motion on [0, T ]. Let h ∈ H be a Cameron–Martin path. Then the law of B is equivalent to thelaw of Th (B) ≡ B + h. (These laws are viewed as Borel measures on C0 [0, T ] , Rd , denoted by W and (Th )∗ W ≡ Wh respectively.) In fact, dWh = exp dW
0
T
1 ˙ hdB − 2
T
2 ˙ ht dt .
0
α -H¨o l Almost2surely, d enhanced Brownian motion B has sample paths in C [0, T ] , G R for any α ∈ [0, 1/2). By interpolation, we also have that a.s. B takes values in the Polish space C 0,α -H¨o l [0, T ] , G2 Rd and in fact in the closed subspace of paths starting at the unit element of G2 . We view d 0,α -H¨o l 2 [0, T ] , G R -valued random B as a C variable; the law of B is then a Borel probability measure on C 0,α -H¨o l [0, T ] , G2 Rd .
Theorem 13.36 Let B be a G2 Rd -valued enhanced Brownian motion on [0, T ]. Let h ∈ H be a Cameron–Martin path. Then the law of Th (B) is equivalent to the law of B. Proof. We can assume that the underlying probability space is a Wiener space C0 [0, T ] , Rd equipped with Wiener measure W. In particular, Brownian motion is the coordinate map B (t, ω) = ω (t). The law of B is W and we write Wh for the law of B + h. From the Cameron–Martin h theorem, we know that the measures are equivalent, W ∼ W2 . Now, B d 0,α -H¨o l [0, T ] , G Rd . It is a measurable map from C [0, T ] , R → C is easy to see that (using Stratonovich calculus or, more elementary, the L2 -convergent Riemann–Stieltjes sum for the area) that B (· + h) = Th B a.s.
(13.15)
and hence the law of Th B is B∗ Wh , the usual short notation for the image measure of Wh under B. On the other hand, the law of B is B∗ W. Equivalence of measures implies equivalence of image measures, and we find B∗ W ∼B∗ Wh . The proof is now easily finished. Let us elaborate a bit further on property (13.15).
13.6 Large deviations
359
Proposition 13.37 Let B = B (ω) be Rd -valued Brownian motion, realized as coordinate map on Wiener space, and B be the corresponding G2 Rd -valued enhancement, realized as thelimit of lifted piecewise linear approximations, say dα H¨o l;[0,T ] B, S2 B D n → 0 in probability. Then P ({ω : B (ω + h) ≡ Th B (ω) for all h ∈ H}) = 1. Proof. As in the proof of Theorem 13.36, we assume that Brownian motion is realized as the coordinate map on Wiener space, B (t, ω) = ω (t), under Wiener measure W. It is clear that (13.16) S2 B D n (ω + h) = S2 ω D n + hD n = Th D n S2 ω D n , where ω D , hD etc denote the piecewise linear approximations of the respective paths based on some dissection D. By passing to a subsequence, if necessary, we may assume that lim S2 B D n (ω) n →∞
(with respect to dα -H¨o l ) exists for W-almost surely ω, the limit being, by definition, the geometric α-H¨older rough path B (ω). Fixing such an ω, chosen from a set of full W-measure, we note that, for any h ∈ H, the sequence Dn S2 B (ω + h) : n = 1, 2, . . . is also convergent. Indeed, from (13.16) and basic continuity properties of the translation operator (h, x) → Th x we see that, always in α-H¨older rough path topology, S2 B D n (ω + h) → Th (B (ω)) as n → ∞. On the other hand, we have B (ω + h) = lim S2 B D n (ω + h) , thanks to the existence of the limit on the right-hand side and the very realization of B as the limit of lifted piecewise linear approximations. This, of course, allows us to identify B (ω + h) = Th (B (ω)) and we stress the fact that ω was chosen in a set of full measure independent of h. This concludes the proof.
13.6 Large deviations Let B denote d-dimensional standard Brownian motion. It is rather obvious that εB → 0 in distribution as ε → 0. The same can be said for enhanced
Brownian motion
360
Brownian motion B provided scalar multiplication by ε on Rd is replaced d 2 by dilation δ ε on G R , i.e. δ ε B → o in distribution as ε → 0. It turns out that, to leading order, the speed of this convergence can be computed very precisely. This is a typical example of a large deviations statement for sample paths. We assume in this section that the reader is familiar with the rudiments of large deviations as collected in the Appendix. Adopting standard terminology, the goal of this section is prove a large deviation principle for enhanced Brownian motion B in suitable rough path metrics. There is an obvious motivation for all this. The contraction principle will imply – by continuity of the Itˆo–Lyons maps and without any further work – a large deviation principle for rough differential equations driven by enhanced Brownian motion. Combined with the fact that RDEs driven by enhanced Brownian motion are exactly Stratonovich stochastic differential equations, this leads directly to large deviations for SDEs, better known as Freidlin–Wentzell estimates.
13.6.1 Schilder’s theorem for Brownian motion Let B denote d-dimensional standard Brownian motion on [0,T ]. If Pε ≡ (εB)∗ P denotes the law of εB, viewed as a Borel measure on C0 [0, T ] , Rd , the next theorem can be summarized in saying that (Pε )ε> 0 satisfies a large deviation principle on the space C0 [0, T ] , Rd with rate function I. (When no confusion arises, we shall simply say that (εB)ε> 0 satisfies a large deviation principle.) All subsequent large deviation statements will involve the good rate function (cf. Exercise 13.39) 1 2 !h, h"H if h ∈ H I (h) = +∞ otherwise where H denotes the Cameron–Martin space for B as defined in (13.14). We now show that (εB)ε> 0 satisfies a large deviation principle in uniform topology with good rate function I. This is nothing other than a special case of the general large deviation result for Gaussian measures on Banach spaces, see Section D.2 in Appendix D. However, in an attempt to keep the present chapter self-contained, we include the following classical proof based on Fernique estimates.13 Theorem 13.38 (Schilder) Let B be a d-dimensional Brownian motion on [0, T ]. For any measurable A ⊂ C0 [0, T ] , Rd we have − I (A◦ ) ≤ lim inf ε2 log P [εB ∈ A] ≤ lim sup ε2 log P [εB ∈ A] ≤ −I A¯ . ε→0
ε→0
(13.17) Here, A◦ and A¯ denote the interior and closure of A with respect to uniform topology. 1 3 I(A)
= inf(I(h): h ∈ A).
13.6 Large deviations
361
Proof. For simplicity of notation we assume T = 1. We write C0 ([0, 1]) instead of C0 [0, 1] , Rd and assume d = 1 since the extension to d > 1 only involves minor notational changes. (Upper bound) Write x for a generic path in C0 ([0, 1]) and let xm denote the piecewise linear approximation of x interpolated at points in Dm = {i/m : i = 0, . . . , m}. Step 1: We define Um := !xm , xm "H . In other words,
1
2
|x˙ m t | dt = m
Um = 0
m xi/m − x(i−1)/m 2 . i=1
Under Pε = (εB)∗ P, the random variable Um is distributed like ε2 χ2 with m degrees of freedom and so ∞ 1 e−u /2 um /2−1 du. Pε [Um ≥ l] = m /2 2 Γ (m) l/ε 2 Therefore, for arbitrary m, l we have lim sup ε2 log Pε (Um > l) ≤ −l/2. ε→0
For G open and non-empty, l := inf {!h, h"H : h ∈ G ∩ H} < ∞ and so Pε [xm ∈ G] = Pε [xm ∈ G ∩ H] ≤ Pε [!xm , xm "H ≥ l] . From the preceding tail estimate on Um = !xm , xm "H it plainly follows that 1 lim sup ε2 log Pε [xm ∈ G] ≤ −l/2 = − I (G) . 2 ε→0 Step 2: We fix α ∈ (0, 1/2). From our Fernique estimates (Corollary 13.14), Z := 2 |B|α -H¨o l;[0,1] has a Gauss tail and so there exists c1 > 0 such that % α & ! " ! " 1 ≥δ/ε Pε |xm −x|∞;[0,1] ≥δ = P |B m −B|∞;[0,1] ≥δ/ε ≤mP Z m m 2 α α exp −c1 (m δ/ε) . ≤ mP [Z ≥ m δ/ε] ≤ c1 This shows that piecewise linear approximations are exponentially good in the sense ! " lim sup ε2 log Pε |xm − x|∞;[0,1] ≥ δ ≤ −c1 m2α δ 2 → −∞ as m → ∞. ε→0
(13.18) Step 3: Write B (y, δ) ≡ {x ∈ C0 ([0, 1]) : |x − y|∞ < δ}. Given a closed set F , its open δ-neighbourhood F δ is defined as ∪ {B (y, δ) : y ∈ F }. Clearly, ! " Pε (F ) ≤ Pε xm (.) ∈ F δ + Pε |xm − x|∞;[0,1] ≥ δ
Brownian motion
362
and by combining the estimates obtained in the first two steps we see that lim sup ε2 log Pε (F ) ≤ max −I F δ , −c1 m2α δ . ε→0
Now let m → ∞ and then δ → 0. The convergence I F δ → I (F ) is standard, see Lemma C.1. (Lower bound) It is enough to consider an open ball of fixed radiusδ centred at some h ∈ H. Define Z = B − ε−1 h and Aε = B ∈ B 0, δε−1 . By the Cameron–Martin theorem, Theorem 13.35, P [εB ∈ B (h, δ)] = P Z ∈ B 0, δε−1 % 1 2 & 1 1 1˙ ˙ ht tdBt − 2 · E exp − ht dt ; Aε ε 0 2ε 0 1 1˙ −I (h)/ε 2 = e E exp − ht dBt ; Aε ε 0 & % 2 1 1˙ = e−I (h)/ε E exp − ht dBt Aε P (Aε ) ε 0
−I (h)/ε 2
≥ e
−I (h)/ε 2
P (Aε ) = e
(1 + o (1)) .
In the last line we used symmetry (B and −B having identical distribu
1 tions implies E 0 gdB|Aε = 0 for all deterministic integrands g such as ˙ and Jensen’s inequality −ε−1 h), E
%
1
exp 0
& gdBt Aε ≥ exp E
1 0
gdB Aε = 1.
Exercise 13.39 Show that h →
!h, h"H if h ∈ H +∞ otherwise 1 2
is a good rate function. (Hint: Compactness of level sets follows from equicontinuity and Arzela–Ascoli).
13.6.2
Schilder’s theorem for enhanced Brownian motion
Let Φm : x ∈ C [0, T ] , Rd → xD m denote the piecewise linear approximation map along the dissection Dm given by {iT /m : i = 0, . . . , m}. Clearly, Φm (εB) satisfies a large deviation principle, as can be seen from elementary m-dimensional Gaussian considerations, or Schilder’s theorem and the contraction principle applied to the continuous (linear) map Φm . We have seen
13.6 Large deviations
363
in Section 13.3.3 that, for any α ∈ [0, 1/2), there exist positive constants C = Cα ,T , η > 0 such that for all q ∈ [1, ∞), η √ 1 dα -H¨o l;[0,T ] (S2 (Φm (B)) , B) q ≤ C q . L (P) m
(13.19)
As an almost immediate consequence, we see that piecewise linear approximations are exponentially good in the following sense. Lemma 13.40 For any δ > 0 and α ∈ [0, 1/2) we have lim lim sup ε2 log P dα -H¨o l;[0,T ] (S2 ◦ Φm (εB) , δ ε B) > δ = −∞.
m →∞
ε→0
η Proof. We define αm = C m1 . Using inequality (13.19), we estimate δ P dα -H¨o l;[0,1] (S2 (Φm (εB)) , δ ε B) > δ = P dα -H¨o l;[0,T ] (S2 (Φm (B)) , B) > ε −q δ √ q q ≤ q αm ε ! ε √ " αm q , ≤ exp q log δ and after choosing q = 1/ε2 we obtain, for ε small enough, α m . ε2 log P dα -H¨o l;[0,1] (S2 (Φm (εB)) , δ ε B) > δ ≤ log δ Now it suffices to take the lim sup with ε → 0 and note that log (αm /δ) → −∞ as m → ∞. We also need the following (uniform) continuity property on level sets of the rate function. As will be seen in the proof below, this is a consequence of 1/2 (13.20) |h|1-var;[s,t] ≤ |t − s| |h|H and general continuity properties of the lifting map SN in variation metrics. We recall that the good rate function I is defined by 1 2 !h, h"H if h ∈ H I (h) = +∞ otherwise.
Lemma 13.41 For all Λ > 0 and α ∈ [0, 1/2) we have sup {h:I (h)≤Λ}
dα -H¨o l;[0,T ] (S2 (Φm (h)) , S2 (h)) → 0 as m → ∞.
Brownian motion
364
Proof. Without loss of generality, we take T = 1. First observe that S2 (Φm (h))1-var;[s,t]
≤
|Φm (h)|1-var;[s,t]
≤
|h|1-var;[s,t] √ 1/2 2Λ |t − s| .
≤
Hence, we see that interpolation allows us to restrict ourselves to the case α = 0. Furthermore, Proposition 8.15 allows us to actually replace dα -H¨o l by d∞ . Then, we easily see that m −1 d∞ (S2 (Φm (h)) , S2 (h)) ≤ max d S2 (Φm (h)) i , S2 (h) i m m i=0 m −1 + max S2 (Φm (h))0; [ i , i + 1 ] m m i=0 + S2 (h)0; [ i , i + 1 ] . m
m
Clearly, S2 (Φm (h))0; [ i , i + 1 ] m m
≤
S2 (Φm (h))1-var; [ i , i + 1 ] m m √ −1/2 ≤ 2Λm ,
and similarly, S2 (h)0;[t i ,t i + 1 ] ≤
√ 2Λm−1/2 .
Then, because Φm (h) i = h mi , using equivalence of homogenous norms we m have 1/2 i−1 m −1 m −1 max d S2 (Φm (h)) i , S2 (h) i ≤ c1 max π 2 ◦ S2 (h) j , j + 1 m m m m i=0 i=0 j =0 1/2 m −1 ≤ c1 π 2 ◦ S2 (h) j , j + 1 m
j =0
m −1
≤ c2
m
1/2 2 |h|1-var; [ j , j + 1 ] m m
j =0
m −1 1/2 ≤ c2 |h|1-var;[0,1] max |h|1-var; [ j i=0
≤ c3 Λ
1/2
m
−1/4
.
In particular, we see that sup {h:I (h)≤Λ}
d∞ (S2 (Φm (h)) , S2 (h)) ≤ c4 Λ1/2 m−1/4 ,
which concludes the proof.
m
1/2 ,
j+1 m
]
13.6 Large deviations
365
Theorem 13.42 For any α ∈ [0, 1/2), the family (δ ε B : ε > 0) satisfies a large deviation in homogenous α-H¨ older topology. More precisely, viewing 0,α -H¨o l [0, T ] , G2 := (δ B) P as a Borel measure on the Polish space (C P 0 ε d ε ∗ R , dα -H¨o l ), the family (Pε : ε > 0) satisfies a large deviation principle on this space with good rate function, defined for x ∈ C00,α -H¨o l [0, T ], G2 Rd , given by 1 J (x) = !π 1 (x) , π 1 (x)"H if π 1 (y) ∈ H. 2 Proof. We once again assume T = 1 without loss of generality. We know from Section 13.3.2 that B is the almost sure dα -H¨o l limit of the lifted piecewise linear approximation, based on the dyadics dissections Dn = (i/2n : i = 0, . . . , 2n ) for instance. We may assume that the underlying probability space is the usual d-dimensional Wiener space C0 [0, 1] , Rd , so that P is a Wiener measure and B (ω) = ω. At the price of modifying B on a set of probability zero, we can and will assume that B (ω) := lim S2 B D n (ω) with respect to dα -H¨o l n →∞
(arbitrarily defined on the null set where this limit does not exist) so that B is well-defined on H ⊂ C 1-var and coincides with the map h → S2 (h), based on Riemann–Stieltjes integration.14 We approximate the measurable map B (·) by ω ∈ C0 [0, 1] , Rd → S2 (Φm (ω)) ∈ C00,α -H¨o l [0, 1] , G2 Rd , which is a continuous map (for fixed m) as is easily seen from continuity of the two maps C0 [0, 1] , Rd → Φm (ω) ∈ C00,1-H¨o l [0, 1] , Rd , x ∈ C01-H¨o l [0, 1] , Rd → S2 (x) ∈ C01-H¨o l [0, 1] , G2 Rd .
ω
∈
The extended contraction principle, Section C.2 in Appendix C, implies the required large deviation principle for enhanced Brownian motion provided we check (i) exponential goodness of these approximations and (ii) a (uniform) continuity property on level sets of the rate function. But these properties were the exact content of Lemmas 13.40 and 13.41 above. It should be noted that the proof of Theorem 13.42 uses few specifics of (enhanced) Brownian motion and only relies on reasonably good (“Gaussian”) estimates of piecewise linear approximations and some regularity of generally, the map h → lim n →∞ S N h D n is well-defined on C ρ -va r , ρ < 2, and coincides with the step-N Young lift of h. 1 4 More
Brownian motion
366
the Cameron–Martin space: (13.19), (13.20). Indeed, as will be discussed in Section 15.7, an almost identical proof carries through in a general Gaussian (rough path) setting. We also note that it would be sufficent to prove Theorem 13.42 in uniform topology, i.e. for α = 0, by appealing to the so-called inverse contraction principle, Section C.2 in Appendix C. We have Proposition 13.43 Assume Theorem 13.42 holds for α = 0. Then it also holds for any α ∈ [0, 1/2). Proof. By the inverse contraction principle all we have to do is check that {δ ε B} is exponentially tight in α-H¨older topology. But this follows from the compact embedding of C α -H¨o l [0, T ] , G2 Rd → C 0,α -H¨o l [0, T ] , G2 Rd and Gauss tails of Bα -H¨o l where α < α < 1/2, i.e. ∃c > 0 : P [Bα -H¨o l > l] ≤ exp −cl2 . Indeed, defining the following precompact sets in α-H¨older topology, 5 KM = x : |x|α -H¨o l ≤ M/c , exponential tightness follows from ! " 5 ε2 log P δ ε Bα -H¨o l > M/c 0 / C M 2 ≤ −M. = ε log P Bα -H¨o l > cε2
c )] = ε2 log [Pε (KM
Exercise 13.44 (Schilder for EBM via Itˆ o calculus) The purpose of this exercise is to give a direct proof of Theorem 13.42 using martingale techniques. Thanks to Proposition 13.43, we only need to consider the uniform topology. (i) Define the so (d)-valued approximations to L´evy’s area process, Am t
1 := 2
t
B[m s]/m ⊗ dBs − 0
t
dBs ⊗ B[m s]/m . 0
Use the fact that t → At is a martingale to show that they give rise to exponentially good approximations to {δ ε B}: lim limε→0 ε2 log P d∞;[0,T ] exp εB + ε2 Am , δ ε B ≥ δ = −∞. m →∞
13.7 Support theorem
367
t
t m (ii) Define A (h)t = 12 0 h[m s]/m ⊗ dhs − 0 dhs ⊗ h[m s]/m for any h ∈ H. Show that for all Λ > 0, lim
sup
m
m →∞ {h∈H:I (h)≤Λ}
d∞;[0,T ] (exp(h + A (h) ), S2 (h)) = 0.
(iii) Deduce a large deviation principle for enhanced Brownian motion in B)∗ P viewed as a uniform topology. More precisely, show that Pε = (δ ε Borel measure on the Polish space C0 [0, T ] , G2 Rd , d∞ satisfies a large deviation principle with good rate function J (y) = I (π 1 (y)). Exercise 13.45 The purpose of this exercise is to give a direct proof of Theorem 13.42 using Markovian techniques. Again, thanks to Proposition 13.43, it suffices to consider the uniform topology. Let p (t, x, y) denote the transition density for enhanced Brownian motion seen as a Markov process on G2 Rd . Use Varadhan’s formula (cf. Section E.5 in Appendix E 2
lim 2ε log p (ε, x, y) = −d (x, y)
ε→0
and the fact that G2 Rd is a geodesic space to establish a large deviation principle for enhanced Brownian motion in uniform topology. Exercise 13.46 (Strassen’s law) Let B denote enhanced Brownian motion B on [0, 1]. Establish the following functional version of the law of iterated 5 logarithm for B in α-H¨ older (rough path) topology, α < 1/2: let ϕ (h) = h ln ln (1/h) for h small enough, and show that t ∈ [0, 1] → δ ϕ (1h ) Bh· (ω) is compact as a random variable with values in C 0,α -H¨o l [0, 1] , G2 relatively Rd with the compact set of limit points as h → 0 given by S2 (K) where √ K = {h ∈ H : |h|H ≤ 2}.
13.7 Support theorem 13.7.1
Support of Brownian motion
Almost surely, the d-dimensional Brownian motion B ∈ C α -H¨o l [0, T ] , Rd for α ∈ [0, 1/2), and hence we alsohave that B belongs almost surely to the Polish space C 0,α -H¨o l [0, T ] , Rd , and in fact in the closed subspace of paths started at 0, C00,α -H¨o l [0, T ] , Rd . B can then be viewed as a C00,α -H¨o l valued random variable and its law of B is a Borel probability measure on C00,α [0, T ] , Rd .
368
Brownian motion
Definition 13.47 Let µ be a Borel probability measure on some Polish space (E, d). The (topological) support of µ is the smallest closed set of full measure. We recall that H = W01,2 [0, T ] , Rd denotes the Cameron–Martin space for Brownian motion. Let us also recall (cf. Theorem 13.35) that, for any h ∈ H, the law of Th (B) = B+h is equivalent to the law of B. Let us record 1/2-H¨o l , and α < 1/2, it is some simple properties of Th . Thanks to H → C 0,α d [0, T ] , R into itself and bijective with inverse a continuous map of C0 T−h . In particular, the image of any open sets under Th is again open. Corollary 13.48 Let h be a Cameron–Martin path and x ∈ C 0,α ([0, T ] , Rd . Then if x belongs to the support of the law of B, so does Th (x) . Proof. Write N (x) for all open neighbourhoods of x. To show that Th (x) is in the support, it suffices to show that ∀V ∈ N (Th (x)) : P (B ∈ V ) > 0. Fix V ∈ N (Th (x)). By continuity, there exists U ∈ N (x) so that Th (U ) ⊂ V . From the above remark, Th (U ) ∈ N (Th (x)). Thus P (B ∈ V )
≥ P (B ∈ Th (U )) = P (T−h B ∈ U )
and from Cameron–Martin the last expression is positive if and only if P (B ∈ U ) is positive. But this is true since U ∈ N (x) and x is in the support. Theorem 13.49 Let α ∈ (0, 1/2). The topological support of the law of Brownian motion on [0, T ] in α-H¨ older topology is precisely C00,α [0, T ] ; Rd . Proof. Almost surely, B (ω) ∈ C00,α [0, T ] ; Rd which is closed in αH¨ older topology. Therefore, the support of the law of B is included in C00,α [0, T ] ; Rd . Vice-versa, the support contains (trivially!) one point, say x ∈ C00,α [0, T ] ; Rd . From the (defining) properties of the space Co0,α , there are smooth paths {xn } with xn (0) = 0 so that x − xn = T−x n (x) → 0 in α-H¨older topology. Any such xn is a Cameron–Martin path and so T−x n (x) ∈ support for all n. By definition, the support is closed (in α-H¨older topology) and therefore 0 ∈ support. But then any translate Th (0) = h belongs to the support, for
13.7 Support theorem
369
all Lipschitz (in paths h. Since Lipschitz paths are fact, Cameron–Martin) dense in C 0,α [0, T ] ; Rd , taking the closure yields C 0,α [0, T ] ; Rd ⊂ supp (law of B) .
13.7.2 Recalls on translations of rough paths We just used the translation map Th (x) = x + h for Rd -valued paths x and h. Assume both x and h are Lipschitz, started at 0, and consider the step-2 lift: x ≡ S2 (x), and Th (x) ≡ S2 (Th (x)). From definition of S2 , · · Th (x) = 1 + x1 + h + x2 + x ⊗ dh + h ⊗ dx + S2 (h) . 0
0
The following proposition is an easy consequence of the results of Section 9.4.6. Proposition 13.50 Let α ∈ (1/3, 1/2]. The mapx → Th (x) can be ex tended to a continuous map of C 0,α [0, 1] , G2 Rd into itself and Th also denotes this extension. It is bijective with inverse T−h . In particular, the image of any open set under Th is again open.
13.7.3
Support of enhanced Brownian motion
Following Section 13.5 we recall that B∗ P, the law of B, can be viewed as a Borel probability measure on C 0,α [0, T ] , G2 Rd , α ∈ [0, 1/2). Moreover, we saw that the law of Th (B) is equivalent to the law of B when h ∈ H is a Cameron–Martin path. As a consequence, we have Proposition 13.51 Let h ∈ H be a Cameron–Martin path and x ∈ supp (B∗ P). Then Th (x) ∈ supp (B∗ P) . Proof. With the properties of x → Th (x) we established in Proposition 13.50 and (law of B) ∼ (law of Th (B)) , the proof given earlier for Brownian motion (Corollary 13.48) adapts with no changes. Lemma 13.52 Let α ∈ (0, 1/2). There exist x ∈ supp (P∗ B) and (xn ) ⊂ H so that T−x n xα -H¨o l;[0,T ] → 0 as n → ∞. Remark 13.53 Note that T−B n Bα -H¨o l;[0,T ] → 0 does not follow as a deterministic consequence from dα -H¨o l;[0,T ] (B, S2 (Bn )) → 0.
Brownian motion
370
Proof. If B n denotes the piecewise linear approximation based on a nested sequence of dissections, we saw that S (B n ) → B a.s. (pointwise) with uniform α-H¨ older bounds. In fact, the essential observation was that % t & t ˜ n = ˜n E βdβ|F β n dβ
0
0
˜ : t ∈ Dn . The arguments given in Section 13.3.2 also where Fn = σ β t , β t give % t & t ˜ ˜ ˜ E βdβ σ (β t : t ∈ Dn ) ∨ σ(β t : t ∈ [0, T ] ) = β n dβ, 0 0 % t & t ˜ σ (β : t ∈ [0, T ]) ∨ σ(β ˜ : t ∈ Dn ) ˜n . E βdβ = βdβ t t 0
0
Both integrals on the right-hand side make sense as Riemann–Stieltjes integrals, and T−B n B → 0 a.s. (pointwise) with uniform α-H¨ older bounds. The usual interpolation finishes the proof. Indeed, we could have started with α ˜ ∈ (α, 1/2), got uniform α ˜ -bounds and used interpolation to obtain T−B n B → 0 in α-H¨older topology. This statement holds a.s. and we can take any x = B (ω) for ω in a set of full measure. Theorem 13.54 Let α ∈ (0, 1/2). The topological support of the law of G2 Rd -valued enhanced Brownian motion on [0, T ] with respect to dα -H¨o l is precisely C00,α [0, T ] ; G2 Rd . Proof. Thanks to Proposition 13.51 and Lemma 13.52, the argument is the same as that for d-dimensional Brownian motion, as given in the proof of Theorem 13.49.
13.8 Support theorem in conditional form 13.8.1 Brownian motion conditioned to stay near the origin We want to condition d-dimensional standard Brownian motion B to stay ε-close to the origin over the time interval [0, 1]. In other words, we want to condition with respect to the event @ A d A 2 Bti < ε . sup B (13.21) |B|∞;[0,1] < ε = t∈[0,1] i=1
13.8 Support theorem in conditional form
371
Despite the equivalence of norms on Rd , Brownian motion does care how it is confined and it is important that we use the Euclidean norm on Rd . (See Proposition 13.7, which we shall use below.) From Theorem 13.49, we know that |B|∞;[0,1] < ε has positive probability, but the next lemma gives a precise quantitative bound. Lemma 13.55 Let λ > 0 denote the lowest eigenvalue of − 12 ∆ with Dirichlet boundary conditions on ∂B (0, 1), the boundary of the Euclidean unit ball. Then there exists a constant C > 0 such that ! " t 1 exp −λ 2 ≤ P |B|∞;[0,t] < ε . (13.22) C ε Proof. By Brownian scaling it suffices to consider ε = 1. Let pt (x, y) denote the Dirichlet heat-kernel for B (0, 1). Then, " ! pt (0, y) dy. P |B|∞;[0,t] < 1 = B (0,1)
Recall that the lowest eigenvalue is simple and that the (up to multiplicative constants unique) eigenfunction ψ (·) corresponding to λ can be taken positive,15 continuous (in fact, smooth16 ) and L2 -normalized so that ψ 2 (z) dz = 1. B (0,1)
In particular,
ψ (y) = eλ
p1 (y, z) ψ (z) dz B (0,1)
1 ≤ ≤
e
ψ 2 (z) dz
p1 (y, z) dy
e
λ
1 2
λ
5
B (0,1)
B (0,1) −d/4
p2 (y, y) ≤ e (4π) λ
and the proof is finished with the estimate 0 < ψ (0) = eλt pt (0, y) ψ (y) dy B (0,1) −d/4 λ(t+1) ≤ (4π) e pt (0, y) dy. B (0,1)
We shall need to complement Lemma 13.55 with an upper estimate and write Px to indicate that Brownian motion B is started at B (0) = x. 1 5 See, 1 6 This
for example, [71, Theorem 8.38]. follows from standard elliptic regularity theory.
Brownian motion
372
Lemma 13.56 Let λ > 0 denote the lowest eigenvalue of − 12 ∆ with Dirichlet boundary conditions on the Euclidean unit ball B (0, 1). Then there exists a constant C > 0 such that ! " t (13.23) sup Px |B|∞;[0,t] < ε < C exp −λ 2 . ε x∈B (0,1) Proof. Again, by Brownian scaling it suffices to consider ε = 1. Let x ∈ B (0, 1), the Euclidean ball B (0, 1) ⊂ Rd . From symmetry considerations (cf. Exercise 13.57 below) we see that ! ! " " pt (0, x) dx, Px |B|∞;[0,t] < 1 ≤ P0 |B|∞;[0,t] < 1 = B (0,1)
where pt (·, ·) denotes the Dirichlet heat-kernel for B (0, 1). Using pt (x, y) = pt (y, x) and writing Pt for the associated semi-group on L2 (B (0, 1)), we have pt (0, x) dx = pt−1 (x, y) p1 (0, y) dy dx B (0,1)
B (0,1)
≤ ≤ ≤ =
B (0,1)
5 |B (0, 1)| pt−1 (·, y) p1 (0, y) dy B (0,1) 2 L (B (0,1)) 5 |B (0, 1)| |Pt−1 p1 (0, ·)|L 2 (B (0,1)) 5 |B (0, 1)|e−(t−1)λ |p1 (0, ·)|L 2 (B (0,1)) 5 5 |B (0, 1)|e−(t−1)λ p2 (0, 0),
as required. Exercise 13.57 Let B denote Brownian motion on Rd equipped with Euclidean distance. Show that for all x ∈ B (0, 1), ! ! " " Px |B|∞;[0,t] < 1 ≤ P0 |B|∞;[0,t] < 1 . (Hint: Use symmetry). We can now define the conditional probabilities Pε (•) := P • | |B|∞;[0,1] < ε . (Since the conditioning event has positive probability, this notion is elementary.) Lemma 13.58 (increments over small times) There exists C > 0 such that for all R > 0 and 0 < ε < 1, / 2 0 R B 1 s,t Pε ∃0 ≤ s < t ≤ 1, |t − s| < ε2 : . α > R ≤ C exp − C ε1−2α |t−s|
13.8 Support theorem in conditional form
373
Proof. Step 1: Suppose there exists a pair of times s, t ∈ [0, 1] such that Bs,t α > R. |t − s| E D Then there exists a k ∈ {1, . . . , 1/ε2 } so that [s, t] ⊂ (k−1) ε2 , (k+1) ε2 . In particular, the probability that such a pair of times exists is at most s < t, |t − s| < ε2 and
2 *1/ε +
Pε Bα -H¨o l;[(k −1)ε 2 ,(k +1)ε 2 ] > R .
(13.24)
k =1
We will see in step 2 below that each term in this sum is exponentially small with ε, namely bounded by / 2 0 1 R ε (13.25) P Bα -H¨o l;[(k −1)ε 2 ,(k +1)ε 2 ] > R ≤ C exp − C ε1−2α E D for some positive constant C. Since there are only 1/ε2 terms in this sum, it suffices to make C slightly bigger to control the entire sum and this finishes the proof of Lemma 13.58, subject to proving (13.25). Step 2: We now show that for any T1 < T2 in [0, 1] with T2 − T1 ≤ 2ε2 we have / 2 0 R 1 ε . P ||B||α ;[T 1 ,T 2 ] > R < C exp − C ε1−2α (Applied to T1 = (k − 1) ε2 , T2 = (k + 1) ε2 we will then obtain the estimate (13.25), as desired.) Writing out the very definition of Pε leads immediately to Pε ||B||α ;[T 1 ,T 2 ] > R P0 ||B||α -H¨o l;[T 1 ,T 2 ] > R; |B|0;[0,T 1 ] < ε; |B|0;[T 2 ,1] < ε ! " ≤ P0 |B|0;[0,1] < ε By using the Markov property, this equals ! " E0 PB (T 2 ) |B|0;[0,1−T 2 ] < ε ; ||B||α -H¨o l;[T 1 ,T 2 ] > R; |B|0;[0,T 1 ] < ε P0 B0;[0,1] < ε ! " −2 −2 ≤ c1 eλε E0 e−λ(1−T 2 )ε ; ||B||α -H¨o l;[T 1 ,T 2 ] > R; |B|0;[0,T 1 ] < ε , where c1 is the product of the respective multiplicative constants of Lemmas 13.55 and 13.56. Using independence of (enhanced) Brownian increments,
Brownian motion
374
see Proposition 13.11, the last equation line is −2 = c1 eλT 2 ε P0 ||B||α -H¨o l;[0,T 2 −T 1 ] > R P0 |B|0;[0,T 1 ] < ε −2 ≤ c2 eλ(T 2 −T 1 )ε P0 ||B||α -H¨o l;[0,T 2 −T 1 ] > R as another application of Lemma 13.55. Using T2 −T1 ≤ 2ε2 this expression is bounded by ≤ c2 e2λ P0 ||B||α -H¨o l;[0,2ε 2 ] > R 0 / 1 R2 2λ ≤ c2 e c3 exp − c3 (2ε2 )1−2α where we used scaling and Fernique estimates for enhanced Brownian motion in the last step. The proof is now finished. Lemma 13.59 (increments over large times) There exists C > 0 such that for all R > 0 and 0 < ε small enough, namely such that (13.27) is satisfied, 2 2 R B 1 s,t . Pε ∃0 ≤ s < t ≤ 1, |t−s| ≥ ε2 : α > R ≤C exp − C ε1−2α |t − s| Proof. Let us first recall Lipschitz equivalence of homogenous norms, Bs,t ∼ |Bt − Bs | ∨
|As,t |.
We can thus establish Lemma 13.59 by estimating 5 |As,t | |Bt − Bs | ε 2 P ∃s, t ∈ [0, 1] , |t − s| ≥ ε : α ∨ α >R |t − s| |t − s| 5 |A | s,t ≤ Pε ∃s, t ∈ [0, 1] , |t − s| ≥ ε2 : 2ε1−2α ∨ (13.26) α >R . |t − s| Two observations are of assistance. First, upon assuming ε small enough, namely such that (13.27) 2ε1−2α < R, we have
5 2ε
1−2α
∨
|As,t (ω)| > R ⇐⇒ α |t − s|
5
|As,t (ω)| > R. α |t − s|
Second, |B|∞;[0,1] ≤ ε implies that the area increments become “almost” additive. More precisely, cf. (13.7), As,t = (At − As ) −
1 [Bs , Bs,t ] 2
=⇒
|As,t | ≤ |At − As | + 2ε2
13.8 Support theorem in conditional form
375
and it follows that |As,t | |t − s|
2α
|At − As |
≤
2−4α 2α + 2ε |t − s| |At − As | R2 α + 2α 2 ε |t − s|
≤
where we used |t − s| ≥ ε2 and (13.27). Putting things together shows that (13.26) is = P
ε
|As,t |
∃s, t ∈ [0, 1] , |t − s| ≥ ε : 2
>R
2α
|t − s| |At − As | R2 ε ≤ P ∃s, t ∈ [0, 1] : 2α α > 2 ε |t − s|
2
and it is of course enough to consider
|Zt − Zs | R2 α > 2α 2 ε |t − s| 2 R 2α ε > 2
∃s, t ∈ [0, 1] :
Pε = Pε
|Z|α -H¨o l;[0,1]
(13.28)
where Z is one component of the L´evy’s area, Zt ≡
Ai,j t
1 = 2
t
B dB − i
j
t j
B dB
0
i
0
for fixed i = j in {1, . . . , d}. Proposition 13.7 tells us that there exists a 1-dimensional Brownian motion, say W , such that 1 Zt = W (a (t)) with a (t) := 4
t
(Bsi )2 + (Bsj )2 ds
0
and W is independent of the process (B·i )2 + (B·j )2 and so independent older norm of Z we use a basic fact of |B|∞;[0,1] . In order to control the H¨ about composition of H¨ older functions, α |f ◦ g|(α β )-H¨o l ≤ |f |α -H¨o l |g|β -H¨o l , with the remark that |f |α -H¨o l can be replaced by the α-H¨ older norm of f restricted to the range of g. Applying this to W ◦ a yields α |Z|α -H¨o l;[0,1] ≤ |W |α -H¨o l;[0,a(1)] |a|Lip;[0,1] .
Brownian motion
376
On the conditioning event |B|∞;[0,1] ≤ ε we have both a (1) and |a|Lip;[0,1] ≤ ε2 /4 ≤ ε2 and so we can continue to estimate (13.28): R2 2α R2 ε ≤ Pε |W |α -H¨o l;[0,ε 2 ] > Pε |Z|α -H¨o l;[0,1] > 2 2 2 R = P |W |α -H¨o l;[0,ε 2 ] > 2 R2 = P ε1−2α |W |α -H¨o l;[0,1] > . 2 From the second to the third line above, when replacing Pε by P, we crucially used that W is independent of (B·i )2 + (B·j )2 and so independent of the radial process (B· ) and in particular of |B|∞;[0,1] . The proof of Lemma 13.59 is then finished with Fernique estimates, i.e. Gaussian integrability for the α-H¨ older norm of W . Remark 13.60 Fix any s < t in [0, 1] with the property that t − s ≥ ε2 . Then Bs,t Bs,t ε ε 1−2α > R ≤ P > Rε P α 1/2 |t − s| |t − s| 1 for R > 2. ≤ C exp − R4 C The last estimate comes from Lemma 13.59, applied with Rε1−2α instead of R and noting that condition (13.27) is satisfied for R > 2. We are now able to state the main result of this section. Theorem 13.61 Let α ∈ [0, 1/2). Then, for any δ > 0, lim Pε Bα -H¨o l;[0,1] > δ = 0. ε→0
Proof. An obvious consequence of Lemmas 13.58 and 13.59. Exercise 13.62 (i) Let ϕ be a fixed increasing function such that ϕ (h) 5 = h log (1/h) in a positive neighbourhood of 0. Prove that there exists η > 0 such that sup Eε exp η Bϕ-H¨o l,[0,1] < ∞, ε∈(0,1)
where Bϕ-H¨o l;[0,T ] = sups,t∈[0,T ] d (Bs , Bt ) /ϕ (t − s). (ii) Show that there exists η > 0 such that 2 sup Eε exp η Bψ 2 , 1 -var;[0,1] < ∞. ε∈(0,1)
13.8 Support theorem in conditional form
377
Hint: It suffices to establish ∃η > 0 : sup
sup E
ε
ε∈(0,1] s,t∈[0,T ]
2
Bs,t exp η |t − s|
< ∞.
The case t − s ≤ ε2 follows from the argument in the proof of Lemma 13.58, line by line with s, t instead of T1 , T2 and ||B||α -H¨o l;[0,T 2 −T 1 ] replaced 1/2
by Bs,t / |t − s| P
; also noting that
Bs,t
ε
1/2
|t − s|
>R
≤ (const)e P λ
Bs,t 1/2
|t − s|
>R .
13.8.2 Intermezzo on rough path distances Recall that Rd -valued path started at 0 into d S2 maps a sufficiently regular
· 2 the G R -valued path 1 + h (·) + 0 h ⊗ dh. We then have Proposition 13.63 Let X = exp (X + A) ∈ C0α -H¨o l [0, 1] , G2 Rd and h ∈ C01-var [0, 1] , Rd . Then there exists a constant C such that d Xs,t , S2 (h)s,t − (T−h X)s,t ≤ C
α
|X − h|α -H¨o l |h|1-var;[s,t] |t − s| .
In particular, when h ∈ H ≡ W01,2 [0, 1] , Rd this implies |dα -H¨o l (X, S2 (h)) − T−h Xα -H¨o l | ≤ C
|h|H |X − h|α -H¨o l .
Proof. By symmetry of the Carnot–Caratheodory norm and the triangle inequality it follows that |a − b| = a−1 − b ≤ d a−1 , b = a ⊗ b . We apply this with (Ah denotes the area associated to h)
1 1 −1 1 h a = Xs,t ⊗(S2 (h))s,t = exp −Xs,t +hs,t − As,t +As,t − Xs,t −hs,t , hs,t 2 and b = (T−h X)s,t
1 t 1 h = exp Xs,t − hs,t + As,t − As,t − [hs,· , d (X − h)] 2 s 1 t − [Xs,· − hs,· , dh] . 2 s
Brownian motion
378
By the Campbell–Baker–Hausdorff formula, noting the cancellation of the indicated terms, 1 t 1 t 1 1 −hs,t , hs,t − [hs,· , d (X−h)] − [Xs,· −hs,· , dh] a⊗b= exp − Xs,t 2 2 s 2 s and it follows that d Xs,t , S2 (h)s,t − (T−h X)s,t is less than or equal to a constant times t 1/2 t 1/2 1 Xs,t − hs,t , hs,t 1/2 + . [h , d (X − h)] + [X − h , dh] s,· s,· s,· s
s
The first term is estimated via 1 α Xs,t − hs,t , hs,t ≤ 2 |X − h| α -H¨o l |h|1-var;[s,t] |t − s| =: ∆, for the third we note that t (X − h ) ⊗ dh s,· s,· r s
≤ |h|1-var;[s,t] sup |Xs,r − hs,r | r ∈[s,t]
α
≤ |h|1-var;[s,t] |X − h|α -H¨o l |t − s|
t so that s [Xs,· − hs,· , dh] is also bounded by ∆ and a similar bound is obtained for the middle term after integration by parts. The final statement comes from 1 |h|1-var;[s,t] ≤ |h|H |t − s| 2 .
13.8.3 Enhanced Brownian motion under conditioning We now condition d-dimensional standard Brownian motion B to stay εuniformly close to a given path h over the time interval [0, 1] and ask what happens to enhanced Brownian motion B when ε → 0. To this end, let us write out the conditioning event in more detail, @ A d A 2 Bti − hit < ε sup B |B − h|∞;[0,1] < ε = t∈[0,1] i=1
and introduce the notation Pε,h (•) = P
•|
|B − h|∞;[0,1] < ε .
We assume that h ∈ H ≡ W01,2 [0, 1] , Rd , the Cameron–Martin space.
13.8 Support theorem in conditional form
379
Lemma 13.64 Given h ∈ H and α < 1/2 then, for any δ > 0, lim Pε,h dα -H¨o l;[0,1] (B, S2 (h)) >δ =0 iff lim Pε,h T−h Bα -H¨o l;[0,1] >δ = 0. ε→0
ε→0
Proof. Immediate from Proposition 13.63, noting that |B − h|α -H¨o l is dominated by either quantity dα -H¨o l;[0,1] (B, S2 (h)) , T−h Bα -H¨o l;[0,1] . Lemma 13.65 Given h ∈ H and α < 1/2 then, for any δ > 0, " C ! "C ! ˙ Eε exp −2 I[h] Pε,h T−h Bα -H¨o l;[0,1] >δ ≤ Pε Bα -H¨o l;[0,1] >δ ! " d 1 1 where we write I h˙ ≡ 0 h˙ t dBt = i=1 0 h˙ it dBti . Proof. From Proposition 13.37, (T−h B) (ω) = B (ω − h) a.s. and we proceed by the Cameron–Martin therorem. A drift term −h corresponds to the Radon–Nikodym density 1 2 ˙ ˙ −1 Rh = exp −I[h] ht dt 2 0 which will allow us to write out Pε,h in terms of Eε,0 . We first note that by symmetry, " " ! ! 1 1 E 0 h˙ t dBt , |B|∞;[0,1] < ε E − 0 h˙ t dBt , |−B|∞;[0,1] < ε ! " ! " = , P |B|∞;[0,1] < ε P |−B|∞;[0,1] < ε ˙ = 0. From Jensen’s inequality we then have which shows that Eε I[h] ˙ ˙ = 1. Eε e−I [ h] ≥ exp Eε −I[h] After these short preparations, we can write Pε,h T−h Bα -H¨o l;[0,1] > δ " ! E Rh , Bα -H¨o l;[0,1] > δ ∩ |B|∞;[0,1] < ε ! " = E Rh , |B|∞;[0,1] < ε ! " ˙ Eε e−I [ h] , Bα -H¨o l;[0,1] > δ ! " = ˙ Eε e−I [ h] ! " ˙ ≤ Eε e−I [ h] , Bα -H¨o l;[0,1] > δ .
Cauchy–Schwarz finishes the proof.

We can now state the main result of this section.

Theorem 13.66  Given $h \in C_0^2([0,1], \mathbb{R}^d)$ and $\alpha < 1/2$ then, for any $\delta > 0$,
$$\lim_{\varepsilon \to 0} \mathbb{P}^{\varepsilon,h}\Bigl( d_{\alpha\text{-H\"ol};[0,1]}\bigl(\mathbf{B}, S_2(h)\bigr) > \delta \Bigr) = 0. \qquad (13.29)$$

Proof.  Combining the previous two lemmas gives the desired conclusion provided
$$\lim_{\varepsilon \to 0} \mathbb{E}^{\varepsilon}\bigl[ \exp\bigl(-2 I[\dot h]\bigr) \bigr] < \infty.$$
When $h \in C_0^2([0,1], \mathbb{R}^d)$ this is very easy; it suffices to write $g := -2\dot h \in C^1$ so that
$$|I[g]| = \Bigl| \int_0^1 g\, dB \Bigr| = \Bigl| g_1 B_1 - \int_0^1 B\, dg \Bigr| \le 2\, |B|_{\infty;[0,1]} \times |\dot g|_{\infty;[0,1]}.$$
In particular, under $\mathbb{P}^{\varepsilon}$ we have $|I[g]| \le 2\varepsilon\, |\dot g|_{\infty;[0,1]}$, so that $\mathbb{E}^{\varepsilon}[\exp(-2 I[\dot h])] \le \exp\bigl( 2\varepsilon\, |\dot g|_{\infty;[0,1]} \bigr)$, which stays bounded as $\varepsilon \to 0$.
The reader may suspect that the restriction to $h \in C_0^2([0,1], \mathbb{R}^d) \subset \mathcal{H}$ in the previous statement was unnecessary. The extension to $h \in \mathcal{H}$ turns out to be rather subtle and is discussed in the following exercise.

Exercise 13.67  Show that for all $g \in L^2([0,1], \mathbb{R}^d)$, $\mathbb{E}^{\varepsilon}[\exp(I[g])] \le \mathbb{E}[\exp(|I[g]|)]$.
Hint: a classical correlation inequality [35, Theorem 2.1] states that for any i.i.d. family of standard Gaussians $(X_i)$ and any convex, symmetric set $C_k \subset \mathbb{R}^k$,
$$\mathbb{P}\bigl[ |X_1| < \eta,\ (X_1, \dots, X_k) \in C_k \bigr] \ge \mathbb{P}\bigl[ |X_1| < \eta \bigr]\, \mathbb{P}\bigl[ (X_1, \dots, X_k) \in C_k \bigr]. \qquad (13.30)$$
Solution.  It suffices to show that for any $\eta > 0$ and $\varepsilon > 0$,
$$\mathbb{P}^{\varepsilon}\bigl( |I(g)| < \eta \bigr) \ge \mathbb{P}\bigl( |I(g)| < \eta \bigr), \qquad (13.31)$$
since then
$$\mathbb{E}^{\varepsilon}\bigl( \exp I(g) \bigr) \le \mathbb{E}^{\varepsilon}\bigl( \exp |I(g)| \bigr) = \int_0^{\infty} e^x\, \mathbb{P}^{\varepsilon}\bigl( |I(g)| > x \bigr)\, dx \le \int_0^{\infty} e^x\, \mathbb{P}\bigl( |I(g)| > x \bigr)\, dx = \mathbb{E}\bigl( \exp |I(g)| \bigr).$$
To show inequality (13.31), let $(g_i)$ denote an orthonormal basis for $L^2([0,1], \mathbb{R}^d)$ such that $g_1 = g$. Set $X_i := \int_0^1 g_i\, dB$ and denote by $X$ the infinite vector whose components are the $X_i$. Note that the $X_i$ are standard, i.i.d. normal random variables. Let $C$ be a convex, symmetric set in $\mathbb{R}^{\infty}$ and denote by
$C_k$ its projection on $\mathbb{R}^k$. Clearly, $C_k$ is convex and symmetric and (13.30) applies to it. By dominated convergence,
$$\mathbb{P}\bigl[ |X_1| < \eta,\ X \in C \bigr] \ge \mathbb{P}\bigl[ |X_1| < \eta \bigr]\, \mathbb{P}\bigl[ X \in C \bigr].$$
Therefore, choosing $C := \{ |B|_{\infty;[0,1]} < \varepsilon \}$ and noting that $C$ is both symmetric and convex, we obtain inequality (13.31).
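The positive correlation between a symmetric slab and a symmetric convex set is easy to observe numerically. The following Monte-Carlo sketch is our own illustration (not part of the text); the dimension, slab width and ball radius are arbitrary choices, and the Euclidean ball plays the role of the convex symmetric set $C_k$.

```python
import numpy as np

# Monte-Carlo illustration of the correlation inequality (13.30): for i.i.d.
# standard Gaussians and a convex symmetric set C_k, the events {|X_1| < eta}
# and {X in C_k} are positively correlated.  Here C_k is a Euclidean ball.
rng = np.random.default_rng(0)
k, eta, radius, trials = 5, 0.5, 2.0, 200_000
X = rng.normal(size=(trials, k))

in_slab = np.abs(X[:, 0]) < eta                  # {|X_1| < eta}
in_ball = np.linalg.norm(X, axis=1) < radius     # convex, symmetric C_k

lhs = np.mean(in_slab & in_ball)                 # P[|X_1| < eta, X in C_k]
rhs = np.mean(in_slab) * np.mean(in_ball)        # P[|X_1| < eta] * P[X in C_k]
print(f"P[slab and ball] = {lhs:.4f}  >=  P[slab]*P[ball] = {rhs:.4f}")
```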
13.9 Appendix: infinite 2-variation of Brownian motion

Let $B$ denote d-dimensional standard Brownian motion. We now show that $|B|_{2\text{-var};[0,T]} = +\infty$ a.s. The importance of this statement is that it rules out a stochastic integration based on Young integrals.

Lemma 13.68 (Vitali covering)  Assume a set $E \subset [0,1]$ admits a "Vitali cover"; that is, a (possibly uncountable) family $\mathcal{I} = \{I_\alpha\}$ of closed intervals (with non-empty interior) in $(0,1)$ so that for every $t \in E$ and $\eta > 0$ there exists an interval $I \in \mathcal{I}$ with length $|I| < \eta$ and $t \in I$. Then, for every $\varepsilon > 0$, there exist disjoint intervals $I_1, \dots, I_n \in \mathcal{I}$ such that
$$|E \setminus (I_1 \cup \dots \cup I_n)| < \varepsilon. \qquad (13.32)$$
Here $|\cdot|$ denotes Lebesgue measure; if $E$ is not measurable, the statement and proof remain valid with $|\cdot|$ understood as outer Lebesgue measure.

Proof.  We start by taking any interval $I_1 \in \mathcal{I}$ and assume that $I_1, \dots, I_k$ have been defined. If these intervals cover $E$, the construction is trivially finished. Otherwise, we set $r_k := \sup\{ |I| : I \in \mathcal{I},\ I \text{ disjoint from } I_1 \cup \dots \cup I_k \}$ and note $r_k \in (0,1]$. We can then pick $I_{k+1} \in \mathcal{I}$, disjoint from $I_1 \cup \dots \cup I_k$, with $|I_{k+1}| > r_k/2$. Assuming the construction does not finish trivially, we obtain a family $(I_k : k \in \mathbb{N})$ of closed, disjoint intervals in $(0,1)$. Clearly,
$$\sum_{k=1}^{\infty} |I_k| \le \bigl| \cup_{k=1}^{\infty} I_k \bigr| \le 1 \qquad (13.33)$$
and we can pick $n \in \mathbb{N}$ such that $\sum_{k>n} |I_k| < \varepsilon/5$; with this choice of $n$ we now verify (13.32). To this end, take any $t \in E \setminus (I_1 \cup \dots \cup I_n)$ and then $I \in \mathcal{I}$, disjoint from $I_1 \cup \dots \cup I_n$, with $t \in I$. There exists an integer $l \ge n+1$ such that $I$ is disjoint from $I_1 \cup \dots \cup I_{l-1}$ but $I \cap I_l \ne \emptyset$; otherwise $|I| \le r_{k-1} < 2|I_k|$ for all $k$, in contradiction to (13.33). We also have $|I| \le r_{l-1} < 2|I_l|$ and, thinking of $I_l$ as a "ball" of radius $|I_l|/2$, it is
then clear from $t \in I$, $I \cap I_l \ne \emptyset$ that $t$ is also contained in a "ball" with the same centre but radius $5|I_l|/2$. In other words, $t$ is contained in some interval $J_l$ with $|J_l| = 5|I_l|$. By our choice of $n$ we then see that $|E \setminus (I_1 \cup \dots \cup I_n)| \le \sum_{l=n+1}^{\infty} |J_l| < \varepsilon$.

Theorem 13.69  Let $B$ denote d-dimensional Brownian motion on $[0,T]$. Assume that for some function $\psi$, defined in a positive neighbourhood of $0$,
$$\frac{h^2 / (\log\log 1/h)}{\psi(h)} \to 0 \quad \text{as } h \downarrow 0.$$
Then $\sup_D \sum_{t_i \in D} \psi\bigl( |B_{t_i,t_{i+1}}| \bigr) = +\infty$, where the sup runs over all dissections of $[0,T]$. In particular, for any $q \le 2$ we have $|B|_{q\text{-var};[0,T]} = +\infty$ with probability one.

Proof.  Without loss of generality, we may take $T = 1$ and argue with a 1-dimensional Brownian motion $\beta$. From Khintchine's law of the iterated logarithm, see (13.9), there exists a deterministic constant $c \in (0,\infty)$ such that, with probability one,
$$\limsup_{t \downarrow 0} \frac{|\beta_t|}{\bar\varphi(t)} = c,$$
where $\bar\varphi(h) = \sqrt{h \log\log 1/h}$ is well-defined for $h$ small enough. (The fact that $c = 2^{1/2}$ is irrelevant for the argument.) For every fixed $t$, $(\beta_{t,t+h} : h \ge 0)$ is a Brownian motion and so it is clear that (for fixed $t$) with probability one,
$$\limsup_{h \downarrow 0} \frac{|\beta_{t,t+h}|}{\bar\varphi(h)} = c.$$
Noting that $\bar\psi(h) := h^2 / (\log\log 1/h)$ is the asymptotic inverse of $\bar\varphi$ at $0$, this implies $\mathbb{P}(t \in E_\delta) = 1$ where
$$E_\delta = \Bigl\{ t \in (0,1) : \bar\psi\bigl( |\beta_{t,t+h}| \bigr) > \frac{c}{2}\, h \ \text{for some } h \in (0,\delta) \Bigr\}.$$
A Fubini argument applied to the product of $\mathbb{P}$ and Lebesgue measure $|\cdot|$ on $(0,1)$ shows that $|E_\delta| = 1$ with probability one. But then $E := \cap_{\delta>0} E_\delta = \cap_n E_{1/n}$ also satisfies $|E| = 1$ almost surely and, since for each $t \in E$ there are arbitrarily small intervals $[t, t+h]$ such that $\bar\psi(|\beta_{t,t+h}|) > ch/2$, the family of all such intervals forms a Vitali cover of $E$. We can fix $\delta > 0$, and discarding all intervals of length $\ge \delta$ still leaves us with a Vitali cover of $E$. By Vitali's covering lemma, there are disjoint intervals $[t_i, t_i + h_i]$, $i = 1, \dots, n$, with $h_i < \delta$, of total length $\sum_{i=1}^n h_i$ arbitrarily close to $1$ and in particular $\ge 1/2$, say. We can complete the endpoints of these disjoint intervals to a
dissection $D_\delta = (s_j)$ of $[0,1]$ with mesh $\le \delta$ and
$$\sum_j \bar\psi\bigl( |\beta_{s_j,s_{j+1}}| \bigr) \;\ge\; \sum_i \bar\psi\bigl( |\beta_{t_i,t_i+h_i}| \bigr) \;\ge\; \sum_i c\, h_i/2 \;\ge\; c/4.$$
On the other hand, writing $\Delta(\delta,\omega) = \inf_{s_j \in D_\delta} \psi\bigl( |\beta_{s_j,s_{j+1}}| \bigr) / \bar\psi\bigl( |\beta_{s_j,s_{j+1}}| \bigr)$,
$$\sup_{D=(r_j)} \sum_j \psi\bigl( |\beta_{r_j,r_{j+1}}| \bigr) \;\ge\; \sum_{s_j \in D_\delta} \psi\bigl( |\beta_{s_j,s_{j+1}}| \bigr) \;\ge\; \Delta(\delta,\omega) \sum_j \bar\psi\bigl( |\beta_{s_j,s_{j+1}}| \bigr) \;\ge\; \Delta(\delta,\omega)\, c/4.$$
It is now an easy consequence of (uniform) continuity of $\beta$ on $[0,T]$ and the assumption $\bar\psi / \psi \to 0$ that $\Delta(\delta,\omega) \to +\infty$ as $\delta \to 0$. This finishes the proof.

The case $q = 2$ should be compared with the finite quadratic variation of Brownian motion in the commonly used sense of semi-martingale theory. The reader can find a proof in [143, Section II.2.12].

Theorem 13.70  Let $\beta$ denote Brownian motion on $[0,T]$. If $(D_n)$ is a sequence of nested (i.e. $D_n \subset D_{n+1}$) dissections of $[0,T]$ such that $|D_n| \to 0$ as $n \to \infty$, then
$$\lim_{n \to \infty} \sum_{t_i \in D_n} \bigl| \beta_{t_i,t_{i+1}} \bigr|^2 = T \quad \text{almost surely.}$$
If we drop the nestedness assumption, convergence holds in probability.
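For readers who like to see the contrast numerically, here is a small simulation sketch (ours, not the authors'; grid sizes are arbitrary). Along nested dyadic dissections the sum of squared increments settles near $T$, whereas a dissection chosen from the path itself (here: refinement at the local extrema of the sample) already produces a strictly larger value. The actual divergence of the true 2-variation is only of iterated-logarithm order and is not visible at any finite resolution.

```python
import numpy as np

# Sketch (illustrative only): quadratic variation along nested dyadic
# dissections versus a path-dependent dissection.
rng = np.random.default_rng(1)
T, N = 1.0, 2**16
B = np.concatenate([[0.0], np.cumsum(rng.normal(scale=np.sqrt(T / N), size=N))])

# (i) nested dyadic dissections D_n with |D_n| -> 0: sums converge to T
for n in (4, 8, 12, 16):
    incr = np.diff(B[::2**(16 - n)])
    print(f"dyadic level {n:2d}: sum of squared increments = {np.sum(incr**2):.4f}")

# (ii) a dissection adapted to the path: points at local extrema of the sample
# give a strictly larger sum, hinting that sup over dissections exceeds T
ext = [0] + [k for k in range(1, N) if (B[k] - B[k-1]) * (B[k+1] - B[k]) < 0] + [N]
print("extrema-based dissection:", np.sum(np.diff(B[ext])**2))
```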
13.10 Comments Section 13.1: The definition and basic properties of Brownian motion are classical. Among the many good textbooks, let us mention the classics: Ikeda and Watanabe [88], Karatzas and Shreve [94], Revoz and Yor [143], Rogers and Williams [144] and Stroock [159]. The definition of L´evy area based on stochastic integration appears in Ikeda and Watanabe [88, p. 128] or Malliavin [124, p. 216]. L´evy’s original (martingale) construction, discussed in Exercise 13.5, only uses discrete-time martingale techniques and corresponds to a Karhunen–L´ oeve-type convergence result which extends to other Gaussian processes (see Section 15.5.3). A completely different
Markovian construction of L´evy area will be given in the later chapter on Markovian processes, starting with Section 16.1. At last, Section 13.1.3 follows Ikeda and Watanabe [88]. Section 13.2: The name enhanced Brownian motion first appears in [115]. Once it is identified as a special case of a left-invariant Brownian motion on a (free, nilpotent) Lie group, properties such as those given in Proposition 13.11 are well known (e.g. Rogers and Williams [145] or the works of Baldi, Ben Arous). Rough path regularity of enhanced Brownian motion was established in unpublished thesis work of Sipil¨ ainen [156]. Following the monograph of Lyons and Qian [120], it follows from showing the dyadic piecewise linear approximations converge in p-variation (rough path) metric. Our exposition here is a simple abstraction of Friz and Victoir [62], based on general Besov–H¨older embedding-type results for paths with values in metric spaces. A Fernique-type estimate for rough path norms of enhanced Brownian motion (Corollary 13.14) was also established by Inahama [90]. Section 13.3: Geodesic approximations were introduced in the rough path context in Friz and Victoir [63]; the resulting convergence results are trivial but worth noting. Rough path convergence of dyadic piecewise linear approximations to Brownian motion was established in unpublished thesis work of Sipil¨ ainen [156], see also Lyons and Qian [120], and underwent several simplifications, notably Friz [69]. Non-standard approximations to Brownian motion were pioneered by McShane [126] and Sussmann [167]; the corresponding subsection is taken from Friz and Oberhauser [58]. Section 13.4: The discussion of Donsker’s theorem for enhanced Brownian motion is taken from Breuillard et al. [16]. For the central limit theorem on free nilpotent groups, see Cr´epel and Raugi [36]. Section 13.5: The Cameron–Martin theorem for Brownian motion is classical. See Stroock [159], for instance. The proper abstract setting is for Gaussian measures on Banach spaces and (cf. Appendix D and the references therein). Theorem 13.36 is a simple observation and appears in Friz and Victoir [62]; see also Inahama [91]. Section 13.6: Large deviations for Brownian motion in uniform topology were obtained by Schilder, See Kusuoka [98]. Extensions to H¨ older topology are discussed in Baldi et al. [5]; one can even do without any topology, Ledoux [102]. Large deviations for EBM were first established in p-variation rough path topology, Ledous et al. [101]; the 1/p-H¨ older case was obtained in Friz and Victoir [62]. Proposition 13.43 is taken from Friz and Victoir [65]. Exercise 13.44 follows the usual martingale arguments of the Freidlin– Wentzell estimates; Exercise 13.45 is a special case of the large deviation principle established in Section 16.7. Exercise 13.46 follows from the same arguments as in the Brownian motion case, Baldi et al. [5]. Section 13.7: The support description of Brownian motion itself is a trivial consequence of the Cameron–Martin theorem. The support description of EBM is subtle because of L´evy’s area. Based on correlation
inequalities, it was first obtained in Ledoux et al. [101] in p-variation topology. The arguments were simplified and strengthened to 1/p-H¨older (resp. L´evy modulus) topology in Friz [69] (resp. Friz and Victoir [62]); the present discussion is a further streamlining. Section 13.8: The discussion in this section follows closely Friz et al. [57]. The subtle gap between C 2 and H as discussed in Exercise 13.67 was noted in the Onsager–Machlup context, see Shepp and Zeitouni [154], but seems new in the support context. Appendix: The infinite 2-variation of Brownian motion, not to be confused with the finite “quadratic variation” of Brownian motion, is well known, e.g. Freedman [56, Chapter 1]; our slightly sharper result is taken from Taylor [168].
14 Continuous (semi-)martingales We have seen in the previous chapter that Brownian motion B can be enhanced to a stochastic process B = B (ω) for which almost every realization is a geometric 1/p-H¨older rough path (and hence a geometric p-rough path), p ∈ (2, 3). In this chapter, we show that any continuous, ddimensional semi-martingale, say S = M +V where M is a continuous local martingale and V a continuous path of bounded variation on any compact time interval, admits a similar enhancement with p-variation rough path regularity, p ∈ (2, 3). In fact, it suffices to construct a lift of M , denoted by M, since then the lift of S is given deterministically via the translation operator, S = TV M. Note that convergence of lifted piecewise linear approximations, in the sense that1 dp-var;[0,T ] S2 S D n , S →n →∞ 0 in probability, is readily reduced to showing the convergence dp-var;[0,T ] S2 M D n , M →n →∞ 0 in probability. Indeed, since V D n − V (1+ε)-var;[0,T ] →n →∞ 0 follows readily from Proposition 1.28 plus interpolation, it suffices to use basic continuiuty properties of the translation operator (h, x) → Th x as a map from C q -var [0, T ] , Rd × C p-var [0, T ] , G2 Rd → C p-var [0, T ] , G2 Rd , valid for 1/q + 1/p > 1. After these preliminary remarks we can and will focus our attention on continuous local martingales. We assume the reader is familiar with the basic aspects of this theory.
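Before specializing to local martingales, the reduction to $M$ via the translation operator can be checked by hand on sampled data. The sketch below is our own illustration (paths and step numbers are arbitrary choices): it computes the second level of the lift of $S = M + V$ once directly and once through the translation formula, namely as the second level of $M$, plus the second level of $V$, plus the two cross-integrals; for piecewise linear data the two computations agree exactly because the trapezoidal rule is bilinear in the increments.

```python
import numpy as np

# Sketch (not from the book): level-2 of the lift of S = M + V, computed
# directly and via the translation operator T_V applied to the lift of M.
rng = np.random.default_rng(2)
n, d = 1000, 2
t = np.linspace(0.0, 1.0, n + 1)
M = np.vstack([np.zeros(d),
               np.cumsum(rng.normal(scale=np.sqrt(1.0 / n), size=(n, d)), axis=0)])
V = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t) - 1.0], axis=1)  # bounded variation

def level2(x, y):
    """Trapezoidal value of int_0^1 (x_r - x_0) (tensor) dy_r for sampled paths."""
    xm = 0.5 * (x[:-1] + x[1:]) - x[0]
    return np.einsum('ki,kj->ij', xm, np.diff(y, axis=0))

S = M + V
direct = level2(S, S)
via_translation = level2(M, M) + level2(V, V) + level2(V, M) + level2(M, V)
print(np.allclose(direct, via_translation))   # True
```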
14.1 Enhanced continuous local martingales We write Mc0,lo c [0, ∞), Rd or Mc0,lo c Rd for the class of Rd -valued continuous local martingales M : [0, ∞) → Rd null at 0, defined on some filtered probability space (Ω, F, (Ft ) , P). We define the process !M " : Ω × [0, ∞) → Rd componentwise, with the ith component being defined as the 1 As usual, S D n denotes the piecewise linear approximation to a path S based on some dissection D n ∈ D [0, T ].
"usual" bracket (or quadratic-variation) process $\langle M^i \rangle \equiv \langle M^i, M^i \rangle$ of (the real-valued, continuous local martingale) $M^i$; that is, the unique real-valued continuous increasing process such that $(M^i)^2 - \langle M^i \rangle$ is a continuous local martingale null at zero. The area-process $A : \Omega \times [0,\infty) \to \mathfrak{so}(d)$ is defined by Itô or Stratonovich stochastic integration,
$$A^{i,j}_t = \frac12 \int_0^t M^i_r\, dM^j_r - \frac12 \int_0^t M^j_r\, dM^i_r = \frac12 \int_0^t M^i_r \circ dM^j_r - \frac12 \int_0^t M^j_r \circ dM^i_r, \quad i,j \in \{1, \dots, d\};$$
3 2 the equality being a consequence of the fact that the covariation M i , M j is symmetric in i, j. (As is well known, Itˆ o and Stratonovich integrals differ by 1/2 in the covariation between integrand and integrator.3 ) We note that the area-process is a vector-valued continuous local martingale. By disregarding a null set we can and will assume that M and A are continuous. Definition 14.1 If M is an Rd -valued continuous local martingale, define M := exp (M +A) to be its lift, and observe that M has sample paths in C ([0, ∞), G2 Rd . The resulting classof enhanced (continuous, local) martingales is denoted by Mc0,lo c G2 Rd . The lift is compatible with the stopping and time-changes. Lemma 14.2 Let M be an Rd -valued continuous local martingale, and M its lift. (i) Let τ be a stopping time. Then, Mτ : t → Mt∧τ is the lift of M τ : t → Mt∧τ . (ii) Let φ be a time-change; that is, a family φs , s ≥ 0, of stopping times and right-continuous. If M such that the maps s → φs are a.s. increasing is constant on each interval φt− , φt , then M ◦ φ is a continuous local martingale and its lift is M ◦ φ. Proof. Stopped processes are special cases of time-changed processes (take φt = t∧τ ) so it suffices to show the second statement. This follows from the compatibility of a time change φ and stochastic integration with respect to a continuous local martingale, constant on each interval φt− , φt .4 The lift is of course a special case of stochastic integration. The lift is also compatible with respect to scaling and concatenation of (local martingale) paths. 2 See,
for example, [143, Chapter 4, Theorem 1.8] or [145, p. 54]. for example, [160]. 4 See Proposition V.1.5. (ii) of [143], for example. 3 See,
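As a concrete illustration of the area-process and of Definition 14.1, the following sketch (ours, not the book's; Brownian motion is used only as the simplest continuous martingale, and the grid size is arbitrary) approximates $A^{1,2}$ on a sampled two-dimensional path by the trapezoidal rule; the enhanced path at time $t$ is then the pair $(M_t, A_t)$, viewed as an element of $G^2(\mathbb{R}^2)$.

```python
import numpy as np

# Sketch (illustrative only): Stratonovich/Levy area of a sampled
# 2-dimensional continuous martingale via the trapezoidal rule.
rng = np.random.default_rng(3)
n = 100_000
M = np.vstack([np.zeros(2),
               np.cumsum(rng.normal(scale=np.sqrt(1.0 / n), size=(n, 2)), axis=0)])

mid = 0.5 * (M[:-1] + M[1:])                 # midpoint values on each step
dM = np.diff(M, axis=0)
A12 = 0.5 * np.cumsum(mid[:, 0] * dM[:, 1] - mid[:, 1] * dM[:, 0])

# The lift of Definition 14.1 stores (M_t, A_t); for instance at t = 1:
print("M_1       =", M[-1])
print("A^{1,2}_1 =", A12[-1])
```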
Lemma 14.3 Let M be an Rd -valued continuous local martingale, and M its lift. If δ c : G2 Rd → G2 Rd is the dilation operator (see Definition 7.13), then δ c M is the lift of the martingale cM. Proof. Left to the reader.
14.2 The Burkholder–Davis–Gundy inequality Definition 14.4 F : R+ → R+ is moderate if (i) F is continuous and increasing, (ii) F (x) = 0 if and only if x = 0, and (iii) for some (and then for every) α > 1, sup x> 0
F (αx) < ∞. F (x)
A few properties of moderate functions are collected in the following lemma. Lemma 14.5 (i) x → F (x) is moderate if and only if x → F x1/2 is moderate. (ii) Given c, A, B > 0 : c−1 A ≤ B ≤ cA =⇒ ∃C = C (c, F ) : C −1 F (A) ≤ F (B) ≤ CF (A) . (iii) ∃C : ∀x, y > 0 : F (x + y) ≤ C [F (x) + F (y)] . Proof. (i),(ii) are left to the reader. Ad (iii). Without loss of generality, we assume x < y; then F (x + y) ≤ F (2y) ≤ c1 F (y) by moderate growth of F . We now recall the classical Burkholder–Davis–Gundy inequality for continuous local martingales.5 Theorem 14.6 (Burkholder–Davis–Gundy) Let F be a moderate function, M ∈ Mc0,lo c Rd some continuous local martingale. Then there exists a constant C = C (F, d) such that 1/2 1/2 −1 ≤ E F sup |Ms | . ≤ CE F |!M "∞ | C E F |!M "∞ | s≥0
Observe that if one knows the above statement only for R-valued martingales, then using the norm on Rd , |a| = max a1 , . . . , ad , the Rd Burkholder–Davis–Gundy inequality is a simple consequence of the 5A
proof can be found in [145, p. 93], for instance.
Burkholder–Davis–Gundy inequality for Mc0,lo c (R), applied componentwise. Lemma 14.5 shows that one can switch to Lipschitz equivalent norms. In Section 14.5 we shall need a Burkholder–Davis–Gundy-type upper bound for real-valued discrete-time martingales. To state this, let us first define the p-variation of a discrete-time martingale (Yn ) as 01/p / p Yn ≡ sup − Yn . |Y | p-var
k+1
k
(n k ) k
A proof of the following lemma can be found in [108, Proposition 2b] for d = 1. The extension to dimension d > 1 is straightforward. Lemma 14.7 Let F be moderate, and Y : N → Rd a discrete martingale. If 1 < q < p ≤ 2 or 1 = q = p, there exists a constant C = C (F, d), / 01/q q . ≤ CE F |Yn +1 − Yn | E F |Y |p-var n
We now derive the Burkholder–Davis–Gundy inequality for enhanced (continuous, local) martingales. Theorem 14.8 (BDG for martingales) Let F be a moder enhanced ate function, M ∈ Mc0,lo c G2 Rd be the lift of some local martingale M . Then there exists a constant C = C (F, d) so that 1/2 1/2 ≤ E F sup Ms,t . C −1 E F |!M "∞ | ≤ CE F |!M "∞ | s,t≥0
Proof. The lower bound comes from Ms,t ≥ |Ms,t |, monotonicity of F and the classical Burkholder–Davis–Gundy lower bound. For the upper bound we note that supu ,v ≥0 Mu ,v ≤ 2 supt≥0 Mt . By the equivalence of homogenous norms, 1/2 Mt ≤ c1 |Mt | + |At | and using “F (x + y) F (x) + F (y)”, combined with the classical Burkholder–Davis–Gundy upper bound, it suffices to show that 1/2 1/2 . ≤ c2 E F |!M "∞ | E F sup |At | t≥0
1/2 is moderate and A itself is a But this is easy using the fact that F (·) martingale with bracket 2 i,j 3 A = t t t 1 2 j 2 2 i 3 3 3 1 t i j 2 i j M i 2 d M j + M d M − M M d M ,M 4 2 0 0 0 ≤ c1 sup |Mt | . |!M "t | t≥0
to which we can apply the Burkholder–Davis–Gundy inequality: 1/2 1/4 E F sup |At | ≤ c2 E F |!A"∞ | t≥0
1/2 1/4 ≤ c3 E F sup |Mt | × |!M "∞ | t≥0
≤
1/2 c4 E F sup |Mt | + |!M "∞ |
≤
1/2 + E F |!M "∞ | c5 E F sup |Mt |
≤
1/2 . c6 E F |!M "∞ |
t≥0
t≥0
Here, we used “F (xy) ≤ F x2 + y 2 F x2 + F y 2 ” and, of course, the classical Burkholder–Davis–Gundy upper bound in the last step.
14.3 p-Variation rough path regularity of enhanced martingales We now show that every M ∈ Mc0,lo c G2 Rd is a geometric p-rough path for p ∈ (2, 3). In other words, for every T > 0 , Mp-var;[0,T ] < ∞ a.s.
(14.1)
The Burkholder–Davis–Gundy inequality on the group allows for an elegant proof of this. Proposition 14.9 (enhanced p-variation regularity) martingale, Let p > 2 and M ∈ Mc0,lo c G2 Rd . Then, for every T > 0, Mp-var;[0,T ] < ∞ a.s. Proof. There exists a sequence of stopping times τ n that converges to ∞ almost surely, such that M τ n and !M τ n " are bounded (for instance, τ n = inf{t : |Mt | > n or |!M "t | > n} will do). Since P Mp-var;[0,T ] = Mp-var;[0,T ∧τ n ] ≤ P (τ n < T ) → 0 as n → ∞ it suffices to consider the lift of a bounded continuous martingale with bounded quadratic variation. We can work with the l1 -norm on Rd , |a| = d i=1 |ai | . The time-change φ (t) := inf {s : |!M "s | > t} may have jumps, but continuity of |!M "| ensures that | !M "φ(t) | = t. From the definition of
14.3 p-Variation rough path regularity
φ and the Burkholder–Davis–Gundy inequality on the group, both !M " and that Xt = Mφ(t) defines M are constant on the intervals φt− , φt . Itfollows a continuous6 path from [0, |!M "T |] to G2 Rd and from the invariance of the p-variation with respect to time-changes, we have Xp-var; [0, |M |] = Mp-var;[0,T ] . T As argued at the beginning of the proof, we may assume that |!M "T | ≤ R for some deterministic R large enough. Therefore, (14.2) P Mp-var;[0,T ] > K = P Xp-var; [0, |M |] > K, |!M "T | ≤ R T ≤ P Xp-var;[0,R ] > K . We go on to show that X is in fact H¨ older continuous. For 0 ≤ s ≤ t ≤ R, we can use the Burkholder–Davis–Gundy inequality on the Group, Theorem 14.8, to obtain q 2q 2q = E Mφ(s),φ(t) ≤ cq E !M "φ(t) − !M "φ(s) . E Xs,t Observe that !M "φ(t) − !M "φ(s)
=
2
Mi
3 φ(t)
2 3 − M i φ(s)
i
=
!M "φ(t) − !M "φ(s) = t − s.
Thus, for all q < ∞ there exists a constant cq such that 2q q ≤ cq |t − s| . E Xs,t We can now apply Theorem A.10 to see that X1/p-H¨o l;[0,R ] ∈ Lq for all q ∈ [1, ∞) and
P Xp-var;[0,R ] > K ≤
E Xp-var;[0,R ] K
≤
E X1/p-H¨o l;[0,R ] .R1/p K
tends to zero as K → ∞. Together with (14.2) we see that Mp-var;[0,T ] < ∞ with probability 1, as claimed. 6 From Lemma 14.2, X is the lift of M ◦ φ, which is a continuous local martingale. This is another way to see continuity of X.
14.4 Burkholder–Davis–Gundy with p-variation rough path norm Following a classical approach to Burkholder–Davis–Gundy-type inequalities, we first prove a Chebyshev-type estimate. Lemma 14.10 There exists a constant A such that for all continuous local martingales M , for all λ > 0, E (|!M "∞ |) , P Mp-var;[0,∞) > λ ≤ A λ2
(14.3)
where M denotes the lift of M. Proof. It suffices to prove the statement when λ = 1 (the general case follows by considering M/λ with lift δ 1/λ M). The statement then reduces to " ! ∃A : ∀M : P Mp-var;[0,∞) > 1 ≤ A E (|!M "∞ |) . Assume this is false. Then for every A, and in particular for A (k) ≡ k 2 , there exists M ≡ M (k ) with lift M(k ) such that the condition is violated, i.e. we have: & % ? " ! > (k ) 2 (k ) >1 . k E M < P M ∞
p-var;[0,∞)
! " Set uk = P M(k ) p-var;[0,∞) > 1 , nk = [1/uk + 1] ∈ N and note that 1 ≤ nk uk ≤ 2. Observe that % & ? " ! > (k ) 2 (k ) > 1 = nk uk ≤ 2. nk k E M ≤ nk P M ∞
p-var;[0,∞)
We now “expand” the sequence M (k ) : k = 1, 2, . . . by replacing each M (k ) with nk independent copies of M (k ) . This yields another sequence of continuous local martingales, say N (k ) : k = 1, 2, . . . . Writing N(k ) for the lift of N (k ) we clearly see that & % P N(k ) >1 = nk uk = +∞, k
p-var;[0,∞)
k
while ? " ? " 2 ! > ! > E N (k ) nk E M (k ) < ∞. = ≤ k2 ∞ ∞ k
k
k
Thus, if the claimed statement is false, there exists a sequence of martingales N k with lift Nk each defined on some filtered probability space
Ωk , Ftk , Pk with the two properties & ? " % ! > Pk N(k ) > 1 = +∞ and Ek N (k ) < ∞. p-var;[0,∞)
k
∞
k
6∞ 6∞ Define the probability space Ω = k =1 Ωk , the probability P = k =1 Pk , and the filtration (Ft ) on Ω given by k −1 ∞ 4 4 i Ft = ⊗ Fgk(k −t) ⊗ F∞ F0k for k − 1 ≤ t < k, i=1
j =k +1
where g (u) = 1/u − 1 maps [0, 1] → [0, ∞]. Then, a continuous martingale on (Ω, (Ft ) , P) is defined by concatenation, Nt =
k −1
(k )
(i) N∞ + Ng (k −t) for k − 1 ≤ t < k,
i=1
and hence its lift N satisfies Nt =
k −1 4
(k )
⊗ Ng (k −t) .
N(i) ∞
i=1
We also observe that, again for k − 1 ≤ t < k, !N "t =
k −1 >
N (i)
i=1
? ∞
? > + N (k )
g (k −t)
.
3 2 In particular, !N "∞ = k N (k ) ∞ and, using the second property of the martingale sequence, E (|!N "∞ |) < ∞. Define the events Ak = Np-var;[k −1,k ] > 1 . Then, using the first property of the martingale sequence, P (Ak ) = Pk N(k ) > 1 = ∞. k
p-var;[0,∞)
k
Since the events {Ak : k ≥ 1} are independent, the Borel–Cantelli lemma implies that P (Ak infinitely often) = 1. Thus, almost surely, for all K > 0 there exist a finite number of increasing times t0 , . . . , tn ∈ [0, ∞) so that n Nt i=1
i −1 ,t i
>K
and Np-var;[0,∞) must be equal to +∞ with probability one. We now define a martingale X by time-change, namely via f (t) = t/ (1 − t) for 0 ≤ t < 1 and f (t) = ∞ for t ≥ 1, X : t → Nf (t) . Note that E (|!N "∞ |) < ∞ so that N can be extended to a (continuous) martingale indexed by [0, ∞] and X is indeed a continuous martingale with lift X. Since lifts interchange with time-changes, Xp-var;[0,1] = Np-var;[0,∞) = +∞ with probability one. But this contradicts the pvariation regularity of enhanced martingales. The passage from the above Chebyshev-type estimate to the full Burkholder–Davis–Gundy inequality is made possible by the following lemma. The proof can be found in [145, p. 94]. Lemma 14.11 (good λ inequality) Let X, Y be non-negative random variables, and suppose there exists β > 1 such that for all λ > 0, δ > 0, P (X > βλ, Y < δλ) ≤ ψ (δ) P (X > λ) where ψ (δ) 0 when δ 0. Then, for each moderate function F, there exists a constant C depending only on β, ψ, F such that E (F (X)) ≤ CE (F (Y )) . We now derive the Burkholder–Davis–Gundy inequality for enhanced (continuous, local) martingales in homogenous p-variation norm. Theorem 14.12 (BDG inequality in homogenous p-variation norm) Let F be a moderate function, M ∈ Mc0,lo c G2 Rd the lift of some continuous local martingale M, and p > 2. Then there exists a constant C = C (p, F, d) so that 1/2 1/2 ≤ E F Mp-var;[0,∞) ≤ CE F |!M "∞ | . C −1 E F |!M "∞ | Proof. Only the upper bound requires a proof. Fixing λ, δ > 0 and β > 1, we define the stopping times S1 = inf t > 0, Mp-var;[0,t] > βλ , S2 = inf t > 0, Mp-var;[0,t] > λ , 1/2 > δλ , S3 = inf t > 0, |!M "t | with the convention that the infimum of the empty set is ∞. Define the local martingale Nt = MS 3 ∧S 2 ,(t+S 2 )∧S 3 and its lift N; note that Nt ≡ 0 on {S2 = ∞}. It is easy to see that Mp-var;[0,S 3 ] ≤ Mp-var;[0,S 3 ∧S 2 ] + Np-var ,
where Np-var ≡ Np-var;[0,∞) . By definition of the relevant stopping times, 1/2 ≤ δλ = P (S1 < ∞, S3 = ∞) . P Mp-var > βλ, |!M "∞ | On the event {S1 < ∞, S3 = ∞} one has Mp-var;[0,S 3 ] > βλ and, since S2 ≤ S1 , one also has Mp-var;[0,S 3 ∧S 2 ] . Hence, on {S1 < ∞, S3 = ∞} , Np-var ≥ Mp-var;[0,S 3 ] − Mp-var;[0,S 3 ∧S 2 ] ≥ (β − 1) λ. Therefore, using (14.3), 1/2 P Mp-var > βλ, |!M "∞ | ≤ δλ
≤
P Np-var ≥ (β − 1) λ A
≤
2
(β − 1) λ2
E (|!N "∞ |) .
From the definition of N , for every t ∈ [0, ∞], !N "t = !M "S 3 ∧S 2 ,(t+ S 2 )∧S 3 . On {S2 = ∞} we have !N "∞ = 0 while on {S2 < ∞} we have, from definition of S3 , |!N "∞ | = !M "S 3 ∧S 2 ,S 3 = !M "S 3 − !M "S 3 ∧S 2 ≤ 2 !M "S 3 = 2δ 2 λ2 . It follows that
E (|!N "∞ |) ≤ 2δ 2 λ2 P (S2 < ∞) = 2δ 2 λ2 P Mp-var > λ
and we have the estimate 1/2 ≤ δλ ≤ P Mp-var > βλ, |!M "∞ |
P M > λ . p-var 2 (β − 1) 2Aδ 2
An application of the good λ-inequality finishes the proof.
14.5 Convergence of piecewise linear approximations Recall that xD denotes the piecewise linear approximation to some continuous Rd -valued path x, based on some dissection D of [0, T ]. Given M ∈ Mc0,lo c Rd the same notation applies (path-by-path) and we write M D = M D (ω). The next lemma involves no probabilty.
Lemma 14.13 Let p ≥ 1 and x : [0, T ] → G2 Rd be a weak geometric p-rough path. Set x = π 1 (x) and let D be a dissection of [0, T ]. Then there exists a constant C = C (p) such that D S2 x ≤ C xp-var;[0,T ] p-var;[0,T ] p 1/p D + C max d xs k ,s k + 1 , S2 x s ,s . (s k )⊂D
k
k+1
k
p p p ! D x + Proof. We first note that S2 xD s,t ≤ 3p−1 xD S D 2 D s,s s ,t D D p + xt D ,t . Now let (uk ) be a dissection of [0, T ], unrelated to D. Recall that uD resp. uD refers to the right- resp. left-neighbours of u in D. p D 31−p S2 x u k ,u k + 1 k
≤
p p " p ! D D x + + xD S x D 2 u k + 1 , D ,u k u k ,u u D ,u k + 1 , D k
k
≤ ≤
k
k
p 2 xD
p-var;[0,T ] p
2c1 |x|p-var;[0,T ]
p D + max S2 x s j ,s j + 1 (s j )⊂D
j
p D + max S2 x s j ,s j + 1 . (s j )⊂D
j p
Trivially, |x|p-var;[0,T ] ≤ xp-var;[0,T ] . On the other hand, using (a + b) ≤ 2p−1 (ap + bp ) when a, b > 0, the triangle inequality gives p D 21−p max S2 x s k ,s k + 1 (s k )⊂D
k
≤ max d xs k ,s k + 1 , S2 xD s (s k )⊂D
p k
,s k + 1
p
+ xp-var;[0,T ] .
k
Taking the supremum over all possible subdivisions (uk ) finishes the proof. Lemma 14.14 Let F be a moderate function, M ∈ Mc0,lo c G2 Rd the lift of some continuous local martingale M . Assume 2 < p < p ≤ 4. Then there exists a constant C = C (p, p , F ) so that for all dissections D = {tl } of [0, T ] , p 1/p D d Ms k ,s k + 1 , S2 M s ,s E F max (s k )⊂D
≤
k
k
1/p p Mt ,t . CE F l l+ 1 l
k+1
Proof. For fixed k, there are i < j so that sk = ti and sk +1 = tj . Then Ms k ,s k + 1 =
j −1 4
exp Mt l ,t l + 1 +At l ,t l + 1 , S2 M D s
k ,s k + 1
=
j −1 4
l= i
exp Mt l ,t l + 1 .
l=i
From equivalence of homogenous norms we have d Ms k ,s k + 1 , S2 M D s ,s = M−1 ⊗ S2 M D s ,s (14.4) s ,s k k + 1 k k+1 k k+1 j −1 At l ,t l + 1 = exp l=i j −1 1/2 At l ,t l + 1 . ≤ c1 l= i
The idea is to introduce the (vector-valued) discrete-time martingale Yj =
j −1
At l ,t l + 1 ∈ so (d)
l=0
so that max
(s k )⊂D
≤ c1
d Ms k ,s k + 1 , S2 M D s k
max
{i 1 ,...,i n }⊂{1,...,# D }
which can be rewritten as max d Ms k ,s k + 1 , S2 M D s (s k )⊂D
k
Noting that F ◦ yields
√
p/2 − Yi k
k
p k
k+1
1/p 1/p
≤ c1
,s k + 1
|Y |p/2-var .
· is moderate and that 1 < p /2 < p/2 ≤ 2, Lemma 14.7
" √ E F ◦ · |Y |p/2-var !
Yi
p k ,s k + 1
2/p √ p /2 c2 E F ◦ · |Yl+1 − Yl |
≤
l
2/p p /2 √ At ,t = c2 E F ◦ · l l+ 1
l
Mt ,t p ≤ c3 E F l l+ 1 l
1/p .
Theorem 14.15 Let F be a moderate function, M ∈ Mc0,lo c G2 Rd the lift of a continuous local martingale M. Then there exists a constant C = C (p, F, d) so that for all dissections D of [0, T ] , 1/2 ≤ CE F |!M "T | . E F S2 M D p-var;[0,T ] Proof. From Lemma 14.13, D S2 M ≤ c1 Mp-var;[0,T ] p-var;[0,T ] + c1 max d Ms k ,s k + 1 , S2 M D s (s k )⊂D
p k
1/p
,s k + 1
.
k
Using “F (x + y) F (x) + F (y)” and the above lemma, with p = 1 + p/2 for instance, we obtain " " ! ! ≤ c2 E F Mp-var E F S2 M D p-var 1/p p Mt ,t + c2 E F l
l+ 1
l
" " ! ! ≤ c3 E F Mp-var + c3 E F Mp -var . The proof is now finished with the Burkholder–Davis–Gundy inequality on the group in p- (resp. p )-variation norm. Theorem 14.16 that M is a continuous local martingale with lift Assume M ∈ Mc0,lo c G2 Rd . If |M |∞;[0,T ] ∈ Lq (Ω) for some q ≥ 1,
(14.5)
then dp-var;[0,T ] S2 M D , M converges to 0 in Lq . If M is a continuous local martingale, then convergence holds in probability. Remark 14.17 If q > 1, Doob’s maximal inequality implies that (14.5) holds for any Lq -martingale. Proof. Observe first that when t = tj ∈ D, as in the last lemma,
d Mt , S2 M
D t
j −1 1/2 ≤ c1 At l ,t l + 1 . l=0
The path M·D restricted to [ti , ti+1 ] is a straight line with no area, hence D t−s Mt ,t S2 M t i ,t = exp ti+1 − ti i i + 1
and d∞ M, S2 M D = max i
sup t∈[t i ,t i + 1]
d Mt i ⊗Mt i ,t , S2 M D t ⊗ S2 M D t ,t i
i
−1 −1 = sup S2 M D t i ,t ⊗ S2 M D t i ⊗ Mt i ⊗ Mt i ,t t∈[0,T ]
≤ max i
sup t∈[t i ,t i + 1 ]
+ max i
S2 M D t i ,t
sup t∈[t i ,t i + 1 ]
−1 S2 M D t i ⊗ Mt i + Mt i ,t
j −1 1/2 Mu ,v + c1 max At l ,t l + 1 ≤ 2 sup i,j 0< v −u ≤|D | l=i j −1 1/2 ≤2 sup Mu ,v + 2c1 max At l ,t l + 1 . j 0< v −u ≤|D | l=0
Now, using the classical Burkholder–Davis–Gundy inequality, we have j −1 q /2 q /4 j −1 2 At ,t E max At l ,t l + 1 ≤ c2 E l l+ 1 j l=0 l=0 q /4 j −1 4 Mt ,t ≤ c3 E l l+ 1 l=0
q /4 max Mt l ,t l + 1 ≤ c3 E l j −1 q /4 3 Mt ,t . l l+ 1 l=0
H¨ older’s inequality, Theorem 14.12 and Theorem 14.8 then lead us to j −1 q /2 At l ,t l + 1 E max j l=0
q ≤ c3 E max Mt l ,t l + 1
1/4
l
≤
l ,t l + 1
l=0
3
3/4 q 1/4 q c3 E max Mt l ,t l + 1 E M3-var;[0,T ]
≤
j −1 Mt E
c4 E
l
1/4 sup
0< v −u ≤|D |
Mu ,v
q
q
3/4
E (M∞ )
.
q /3 3/4
This proves that
≤
q E d∞ M, S2 M D 1/4 c5 E
sup 0< v −u ≤|D |
q
q
Mu ,v
sup 0< v −u ≤|D |
3/4
E (M∞ )
+ c5 E
(14.6)
Mu ,v
q
.
Since M is almost surely continuous, and hence uniformly continuous on [0, T ], Mu ,v → 0 a.s. with |D| → 0; sup 0< v −u ≤|D |
by dominated convergence (with M∞ ∈ Lq , seen by (14.5) and Burkholder–Davis–Gundy inequality on the group), this convergence also holds in Lq . Hence, using (14.6), we see that d∞;[0,T ] M,S2 M D → 0 in Lq . Recall from Proposition 8.15 that ≤ c6 d∞;[0,T ] M,S2 M D d0;[0,T ] M,S2 M D 1/2 + c6 M∞ d∞;[0,T ] M,S2 M D . It suffices to use Cauchy–Schwarz, q /2 E M∞ d∞;[0,T ] M,S2 M D q 1/2 q 1/2 ≤ E (M∞ ) E d∞;[0,T ] M,S2 M D to see that d0 M,S2 M D → 0 in Lq . We then use interpolation (Lemma 8.16) to see that for 2 < p < p, dp-var;[0,T ] M, S2 M D p D pp D 1− pp p Mp -var;[0,T ] + S2 M . ≤ c7 d0 M,S2 M p -var;[0,T ] Hence, q E dp-var;[0,T ] S2 M D , M q pp D q 1− pp qp M,S M d . ≤ cq7 E Mp p-var;[0,T ] + S2 M D p -var;[0,T 0 2 ] Using H¨ older’s inequality with conjugate exponents 1/(p /p) and 1/(1−p /p) gives p /p q q q ≤ c8 E Mp -var + S2 M D p -var E dp-var S2 M D , M ! q "1−p /p × E d0 M,S2 M D .
But now it suffices to remark, using our Burkholder–Davis–Gundy estimates (Theorems 14.12 and 14.15), that 0 / q D q max E Mp -var;[0,T ] , sup E S2 M p -var;[0,T ]
q /2
≤ c9 E |!M "T |
D ∈D[0,T ]
q ≤ c10 E |M |∞;[0,T ] ,
and the last term is finite by assumption. We proved that dp-var S2 M D , M) → 0 in Lq for any martingale M s.t. |M |∞;[0,T ] ∈ Lq . At last, if M is a local martingale one obtains convergence in probability by a simple localization argument.
14.6 Comments Local (and semi)martingales, including the classical Burkholder–Davis– Gundy inequality, are discussed in many textbooks, see e.g. Revuz and Yor [143], Rogers and Williams [145], and Stroock [160]. Proposition 14.7 was strengthened in Pisier and Xu [137], Theorem 2.1 (ii) to 1 ≤ p = q < 2. Rough path regularity of enhanced martingales and certain convergence results were first established in Coutin and Lejay [31]. The proof of the Burkholder–Davis–Hundy inequality for enhanced martingales in p-variation rough path norm follows closely L´epingle [108] and is taken from Friz and Victoir [66], as is the rough path convergence of piecewise linear approximation. An interesting recent application of rough paths to semi-martingale theory was given in Feng and Zhao [49]: the authors construct a stochastic area between the local time x → Lxt of a real-valued semi-martingale and a deterministic function g = g (x) of finite q-variation, q < 3; as an application, to obain a generalization of the Tanaka–Meyer formula. A large deviation principle for square-integrable martingales over Brownian filtration is discussed in forthcoming work by Z. Qian and C. Xu.
15 Gaussian processes We have seen in a previous chapter that d-dimensional Brownian motion B can be enhanced to a stochastic process B = B (ω) for which almost every realization is a geometric 1/p-H¨older rough path (and hence a geometric prough path), p ∈ (2, 3). Now, B is a continuous, centred Gaussian process, with independent components B 1 , . . . , B d , whose law is fully determined by its covariance function R (s, t)
= E (Bs ⊗ Bt ) = diag (s ∧ t, . . . , s ∧ t) .
Let us note that this covariance function, R = R (s, t), has finite 1-variation (in 2D sense, where the variation of R is based on its rectangular increments, cf. Section 5.5). In the present chapter, our aim is to replace Brownian motion by a ddimensional, continuous, centred Gaussian process X with independent components X 1 , . . . , X d ; again, its law is fully determined by its covariance function. In particular, we want to construct a reasonable lifted process X with geometric rough (sample) paths (in short: a Gaussian rough path) and study its probabilistic properties. We shall see that this is possible whenever the covariance function has finite ρ-variation (in 2D sense), for some ρ ∈ [1, 2), so that X is a geometric p-rough path for p > 2ρ. This also leaves considerable room to deal with Gaussian processes (with sample path behaviour) worse than Brownian motion. The main tools in this chapter are 2D Young theory (cf. Section 6.4) and then integrability of Gaussian chaos and L2 -expansions (the essentials of which are collected in Appendix D).
15.1 Motivation and outlook Let X = Xt1 , . . . , Xtd : t ∈ [0, T ] be a d-dimensional, continuous and centred Gaussian process with independent components. By a trivial reparametrization, t → XtT , we can and will take T = 1. The law of such a process is fully characterized by its covariance function, R (s, t) = diag E Xs1 Xt1 , . . . , E Xsd Xtd , s, t ∈ [0, 1] . To explain the main idea, assume at first that X has smooth sample paths so that X can be lifted canonically via iterated integration. With
focus on the first set of iterated integrals, assuming $X_0 = 0$ for simpler notation, we can write
$$\mathbb{E}\Bigl[ \Bigl( \int_0^t X^i_u\, dX^j_u \Bigr)^2 \Bigr] = \int_{[0,t]^2} R_i(u,v)\, \frac{\partial^2}{\partial u\, \partial v} R_j(u,v)\, du\, dv \;\equiv\; \int_{[0,t]^2} R_i(u,v)\, dR_j(u,v),$$
where $R_i$ is the covariance function of $X^i$. The integral which appears on the right-hand side above can be viewed as a 2-dimensional (2D) Young integral. From the Young–Lóeve–Towghi inequality (Theorem 6.18) we see that under the assumption $\rho < 2$ we have, for $0 \le t \le 1$,
$$\mathbb{E}\Bigl[ \Bigl( \int_0^t X^i_u\, dX^j_u \Bigr)^2 \Bigr] \le (\text{const}) \times |R_i|_{\rho\text{-var};[0,1]^2}\, |R_j|_{\rho\text{-var};[0,1]^2} \le (\text{const}) \times |R|^2_{\rho\text{-var};[0,1]^2}.$$
This gives us uniform $L^2$-bounds provided the covariance of the process has finite $\rho$-variation in 2D sense with $\rho \in [1,2)$. It is then relatively straightforward to define $\int X \otimes dX$ (. . . and then a "natural" lift of $X$ to a geometric rough path $\mathbf{X}$ . . . ) in $L^2$-sense, as long as $X$ has covariance with finite $\rho$-variation in 2D sense. Recall that the latter means
$$|R|^{\rho}_{\rho\text{-var};[0,1]^2} = \sup_{(t_i),(t'_j)} \sum_{i,j} \Bigl| R\begin{pmatrix} t_i, t_{i+1} \\ t'_j, t'_{j+1} \end{pmatrix} \Bigr|^{\rho} = \sup_{(t_i),(t'_j)} \sum_{i,j} \Bigl| \mathbb{E}\bigl[ \bigl( X_{t_{i+1}} - X_{t_i} \bigr)\bigl( X_{t'_{j+1}} - X_{t'_j} \bigr) \bigr] \Bigr|^{\rho} < \infty.$$
2 One should note that the assumption R ∈ C ρ-var [0, 1] really encodes some information about the decorrelation of the increments of X. In the extreme case of uncorrelated increments (example: Brownian motion or L2 -martingales) the double-sum reduces to the summation over i = j. (In particular, one sees that the covariance of Brownian motion has finite ρ = 1 variation in 2D sense.) The Gaussian nature of X starts to play a role when turning L2 -esimates into Lq -estimates for all q < ∞, an essentially free consequence of Wiener– Itˆo chaos integrability. This will be seen to imply (rough path) regularity 1 The sup runs over all dissections (t ) = D, (t ) = D of [0, 1]. If one takes the sup · · only over D, so that both ti and tj = tj are taken from D, then this will still suffice to control the ρ-variation of R; see Lemma 5.54.
for X and also “Fernique” estimates, by which we mean Gaussian tails of homogenous rough path norms of X. Another useful consequence is the fact that, assuming finite ρ-variation of R, the Cameron–Martin space H is continuously embedded in the space of finite ρ-variation paths. When ρ ∈ [1, 2), the standing assumption in a Gaussian rough path setting, we see that Cameron–Martin paths are fully accessible to Young theory. This in turn is crucial for various results, including rough path convergence of Karhunen–Lo`eve approximations, large deviation and support statements. (Further applications towards Malliavin calculus will be discussed in a later chapter.)
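The identity behind these $L^2$-bounds can be checked numerically. The sketch below is our own illustration (Brownian motion is chosen only because both sides are easy to evaluate; all grid sizes are arbitrary): it compares a Monte-Carlo estimate of $\mathbb{E}[(\int_0^1 X^1 dX^2)^2]$ with the 2D Riemann–Stieltjes sum approximating $\int_{[0,1]^2} R_1\, dR_2$; for Brownian motion both are close to $1/2$.

```python
import numpy as np

# Sketch (illustrative only): E[(int_0^1 X^1 dX^2)^2] versus the 2D integral
# int int R_1 dR_2, for X = 2-dimensional Brownian motion.
rng = np.random.default_rng(5)

# Monte-Carlo side: Ito sums of X^1 dX^2 over many sample paths
n, paths = 400, 10_000
dX = rng.normal(scale=np.sqrt(1.0 / n), size=(paths, n, 2))
X1 = np.concatenate([np.zeros((paths, 1)), np.cumsum(dX[:, :, 0], axis=1)], axis=1)
ito = np.sum(X1[:, :-1] * dX[:, :, 1], axis=1)
print("Monte-Carlo  E[(int X^1 dX^2)^2] ~", np.mean(ito**2))

# 2D Riemann-Stieltjes side: sum_{k,l} R_1(t_k,t_l) * (rectangular increment
# of R_2), with R_1(s,t) = R_2(s,t) = min(s,t)
m = 400
t = np.linspace(0.0, 1.0, m + 1)
R = np.minimum(t[:, None], t[None, :])
dR2 = R[1:, 1:] - R[1:, :-1] - R[:-1, 1:] + R[:-1, :-1]
print("2D sum       int int R_1 dR_2    ~", np.sum(R[:-1, :-1] * dR2))
```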
15.2 One-dimensional Gaussian processes Throughout this section, X will be a real-valued centred Gaussian process on [0, 1] with continuous sample paths and (continuous) covariance R = R (s, t) = E (Xs Xt ). We note that the law of X induces a Gaussian measure on the Banach space C ([0, 1] , R).
15.2.1 ρ-Variation of the covariance 2
For a covariance function, as a function of two variables (s, t) ∈ [0, 1] → R (s, t), we have a well-defined concept of ρ-variation (in the 2D sense) as discussed in Section 5.5. We start with some simple examples. Example 15.1 (Brownian motion) Standard Brownian motion on [0, 1] has covariance RBM (s, t) = min (s, t) . 2
Trivially, (s, t) ∈ [0, 1] → RBM (s, t) has finite ρ-variation with ρ = 1, controlled by δ x= y (dxdy) , ω ([s, t] × [u, v]) := |(s, t) ∩ (u, v)| = [s,t]×[u ,v ]
2
where δ is the Dirac mass. Since ω [s, t] = |t − s| , we see that ω is a H¨ older-dominated 2D control (in the sense of Definition 5.51). Example 15.2 (Gaussian martingales) Any continuous Gaussian martingale M has a deterministic bracket.2 Since D (M (t) : t ≥ 0) = BM t : t ≥ 0 2 See,
for example, [143, Chapter IV, (1.35)].
we see that R (s, t) = min {!M "s , !M "t } = !M "m in(s,t) . But the notion of ρ-variation is invariant under reparametrization and it follows that R has finite 1-variation since RBM has finite 1-variation.3 Exercise 15.3 (Gaussian bridge processes) Gaussian bridge processes are immediate generalizations of the Brownian bridge: given a continuous, centred Gaussian process X on [0, 1] with covariance R of finite ρ-variation, the corresponding bridge is defined as XBridge (t) := X (t) − tX (1) with covariance RBridge . Prove that RBridge has finite ρ-variation, and that if R has its ρ-variation controlled by a H¨ older-dominated 2D control then the same is true for RBridge . Exercise 15.4 (Ornstein–Uhlenbeck) Show that the usual (real-valued) Ornstein–Uhlenbeck (stationary or started at a fixed point) has covariance of finite 1-variation, controlled by a H¨ older-dominated 2D control. We now turn to fractional Brownian motion β H on [0, 1] with Hurst parameter H ∈ (0, 1). It is a zero-mean Gaussian process with covariance 1 2H H 2H 2H t = . β + s − |t − s| RH (s, t) = E β H s t 2 For Hurst parameter H > 1/2, fractional Brownian motion has H¨ older sample paths with exponent greater than 1/2 which is, in the context of rough paths, a trivial case. We shall therefore make the standing assumption H ≤ 1/2 noting that this covers Brownian motion with H = 1/2. Proposition 15.5 (fractional Brownian motion) Let β H be fractional Brownian motion with Hurst parameter H ∈ (0, 1/2]. Then, its covariance 3 One should note that L 2 -martingales (without assuming a Gaussian structure) have orthogonal increments, i.e.
E (X s , t X u , v ) = 0 if s < t < u < v and this alone will take care of the (usually difficult to handle) off-diagonal part in the variation of the covariance (s, t) → E (X s X t ).
is of finite 1/ (2H)-variation, controlled by 1/(2H ) ω H (·, ·) := RH 1/(2H )-var;[·,·]×[·,·] .
(15.1)
Moreover, there exists a constant C = C (H) such that, for all s < t in [0, 1], H 1 R ≤ CH |t − s| 2 H 1/(2H )-var;[s,t] 2 so that ω H is a H¨ older-dominated control. Proof. Let D = {ti } be a dissection of [s, t], and let us look at 2 1H H β . E β H t i ,t i + 1 t j , t j + 1 i,j
H is negative, hence, β For a fixed i and i = j, as H ≤ 12 , E β H t i ,t i + 1 t j , t j + 1 2 1H H H E β β t i ,t i + 1 t j , t j + 1 j
2 2 1H 2 1H H H H E β β + E β t i ,t i + 1 t i ,t i + 1 t j , t j + 1
≤
j = i
2 1H 2 2 1H H H H E β t i ,t i + 1 β t j , t j + 1 + E β t i ,t i + 1 j = i 2 1H 2 1H 2 2 1H −1 H + 2 2 1H −1 E β H βH 2 t i ,t i + 1 β t j , t j + 1 t i ,t i + 1 E j
≤
≤
2 2 1H H + E β t i ,t i + 1 2 2 1H 1 H H H 2H CH E β t i ,t i + 1 β s,t + CH E β t i ,t i + 1 .
≤ Hence,
2 1H H H β E β t i ,t i + 1 t j , t j + 1 i,j
≤
CH
i
+ CH
2 2 1H E β H t i ,t i + 1
1 H 2H . E β H t i ,t i + 1 β s,t i
The first term is equal to CH |t − s| , so we just need to prove that 1 H 2H ≤ CH |t − s| . E β H t i ,t i + 1 β s,t
(15.2)
i
To achieve this, it will be enough to prove that for [u, v] ⊂ [s, t] , 2H H . E β H u ,v β s,t ≤ CH |v − u| 2H
First recall that as 2H < 1, if 0 < x < y, then (x + y) − x2H ≤ y 2H . Hence, using this inequality and the triangle inequality, 2H 2H 2H 2H H + (u − s) − (v − s) − (t − u) E β H u ,v β s,t = cH (t − v) 2H 2H 2H 2H + cH (v − s) − (u − s) ≤ cH (t−u) − (t−v) ≤ 2cH (v − u)
2H
.
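A quick numerical check of Proposition 15.5 (ours, not from the text): on a uniform grid, the double sum $\sum_{i,j} |\mathbb{E}[X_{t_i,t_{i+1}} X_{t_j,t_{j+1}}]|^{\rho}$ with $\rho = 1/(2H)$ remains bounded as the mesh is refined, both for Brownian motion ($H = 1/2$, $\rho = 1$) and for rougher fractional Brownian motion. Mesh sizes below are arbitrary illustrative choices.

```python
import numpy as np

# Grid-based evaluation of the 2D rho-variation of the fBM covariance,
# with rho = 1/(2H).
def grid_rho_var(H, n):
    t = np.linspace(0.0, 1.0, n + 1)
    R = 0.5 * (t[:, None]**(2 * H) + t[None, :]**(2 * H)
               - np.abs(t[:, None] - t[None, :])**(2 * H))
    # rectangular increments  E[ X_{t_i,t_{i+1}} X_{t_j,t_{j+1}} ]
    inc = R[1:, 1:] - R[1:, :-1] - R[:-1, 1:] + R[:-1, :-1]
    rho = 1.0 / (2 * H)
    return np.sum(np.abs(inc)**rho)**(1.0 / rho)

for H in (0.5, 0.25):
    for n in (50, 200, 800):
        print(f"H = {H}: n = {n:4d}, grid {1/(2*H):.0f}-variation ~ {grid_rho_var(H, n):.4f}")
```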
Exercise 15.6 We say that a real-valued Gaussian process X on [0, 1] satisfies the Coutin–Qian conditions if, for some H ∈ (0, 1) , cH > 0 and all s, t 2 E |Xs,t |
≤
cH |t − s|
|E (Xs,s+ h Xt,t+h )| ≤ cH |t − s|
2H
,
2H −2
(15.3) h2 , for 0 < h < t − s. (15.4)
Let ω H be the 2D control for the covariance of fBM, as defined in (15.1). Show that, for all s ≤ t and u ≤ v in [0, 1], 2H
|E (Xs,t Xu ,v )| ≤ CH ω H ([s, t] × [u, v])
,
and conclude that the covariance of X has finite 1/ (2H)-variation, controlled by a H¨ older-dominated 2D control. Solution. Working as in Lemma 5.54, at the price of a factor 3 2 H −1 , we can restrict ourselves to the cases s = u ≤ t = v and s ≤ t ≤ u ≤ v. The first case is given by assumption (15.3), so let us focus on the second one. Assume first we can write t − s = nh, v − u = mh and that u − t > h. Then, 1
E (Xs,t Xu ,v ) =
−1 n −1 m k =0 l=0
E Xs+k h,s+(k +1)h Xt+ lh,t+(l+1)h .
Using the triangle inequality and our assumption, |E (Xs,t Xu ,v )|
=
−1 n −1 m
E Xs+k h,s+(k +1)h Xu + lh,u +(l+1)h
k =0 l=0
≤ CH
−1 n −1 m k =0 l=0
≤
CH
u + lh
2H −2
s+(k +1)h
h2 2H −2
|y − x|
dxdy
k =0 l=0 u +(l−1)h s+k h v −h t 2H −2
|y − x| dxdy s u −h H CH E β H u −h,v −h β s,t .
≤ CH ≤
−1 n −1 m
|(u + lh) − (s + kh)|
Letting h tend to 0, by continuity, we easily see that H |E (Xs,t Xu ,v )| ≤ CH E β H u ,v β s,t , which implies our statement for s ≤ t ≤ u ≤ v. That concludes the proof.
15.2.2 A Cameron–Martin/variation embedding As in the last section, X is a real-valued centred Gaussian process on [0, 1] with continuous sample paths and hence induces a Gaussian measure on the Banach space C ([0, 1] , R). From general principles (see Appendix D) the associated Cameron–Martin space 4 H ⊂ C ([0, 1] , R) consists of paths t → ht = E (ZXt ) where Z is an element of the L2 -closure of ˜ span t : t ∈ [0, 1]}, a Gaussian random variable. We recall that if h = {X ˜ · denotes another element in H, the inner product !h, h " = E ZX H
E (ZZ ) makes H a Hilbert space. The following embedding theorem will prove crucial in our later applications to support theorems and large deviations. Proposition 15.7 Assume the covariance R : (s, t) → E (Xs Xt ) is of finite ρ-variation (in 2D sense) for ρ ∈ [1, ∞). Then H is continuously embedded in the space of continuous paths of finite ρ-variation. More precisely, for all h ∈ H and all s < t in [0, 1] , |h|ρ-var;[s,t] ≤ 4 Equivalently:
!h, h"H
reproducing kernel Hilbert space.
Rρ-var;[s,t] 2 .
Proof. Let h = E (ZX. ) and assume, without loss of generality, that 1/2 !h, h"H = |Z|L 2 = 1. Let (tj ) be a dissection of [s, t] . Let ρ be the H¨ older conjugate of ρ. Using duality for lρ -spaces, we have5
ht
j
,t j + 1
1/ρ ρ
j
=
sup β ,|β |l ρ ≤1
≤
≤
sup β ,|β |l ρ ≤1
sup β ,|β |l ρ ≤1
β j ht j ,t j + 1 =
j
1
β j Xt j ,t j + 1
j
β j β k E Xt j ,t j + 1 Xt k ,t k + 1
(Cauchy–Schwarz)
j,k
@ A 1 ρ1 ρ A ρ A ρ ρ β j |β k | E Xt ,t Xt ,t B j j+1 k k+1 j,k
E Xt j ,t j + 1 Xt
E Z
j,k
1/(2ρ)
≤
sup β ,|β |l ρ ≤1
k
,t k + 1
ρ
≤
Rρ-var;[s,t] 2 .
j,k
The proof is then finished by taking the supremum over all (tj ) ∈ D [s, t]. Remark 15.8 Assume that the ρ-variation of R is controlled by a H¨ olderdominated control, i.e. ρ ∀s < t in [0, 1] : Rρ-var;[s,t] 2 ≤ K |t − s| .
Then Proposition 15.7 implies that |hs,t | ≤ |h|ρ-var;[s,t] ≤ |h|H K 1/2 |t − s|
1/(2ρ)
which tells us that H is continuously embedded in the space of 1/ (2ρ)H¨ older continuous paths (which can also be seen directly from hs,t = E (ZXs,t ) and Cauchy–Schwarz). The point is that 1/ (2ρ)-H¨older only implies 2ρ-variation regularity, in contrast to the sharper result of Proposition 15.7. Remark 15.9 Let HBM denote the Cameron–Martin space of real-valued Brownian motion (β t : t ∈ [0, 1]); defined as the set of all paths t → ht = E (Zβ t ) where Z is in the L2 -closure of span {β t : t ∈ [0, 1]}. As is well known (see Example D.3), HBM is identified with the Sobolev space W01,2 5 The
case ρ = 1 may be seen directly by taking β j = sgn h t j , t j + 1 .
([0, 1] , R). It is worth noting that Proposition 15.7 implies |h|1-var;[s,t] ≤ 1/2
M |t − s|
with M =
!h, h"HB M ; and this property alone, using
|h|W 1 , 2 =
sup (t i )∈D[0,1]
ht
1/2 2 / |ti+1 − ti | i ,t i + 1
i
implies the (important) estimate |h|W 1 , 2 ≤
!h, h"HB M .
H Remark 15.10 Consider HfBM ≡ HH , the Cameron–Martin space of fractional Brownian motion with Hurst parameter H. It can be useful to know that smooth paths started at the origin are contained in HH . In fact, one even has (e.g. [40, 63]) (15.5) C01 [0, T ] , Rd ⊂ HH .
Let us now focus on the interesting regime H ∈ (0, 1/2]. Proposition 15.7 immediately gives HH → C 1/(2H )-var which shows that fractional Cameron–Martin paths have finite q ∈ [1, 2)variation as long as H > H ∗ = 1/4. In fact, one can do a little better and show that for any δ ∈ (1/2, 1/2 + H) , HH → W0δ ,2 → C 1/δ -var . The first embedding is well known: from [40] and the references therein we + know that HH is continuously embedded in the potential space IH +1/2,2 , + which we need not define here, and from [51, 40] one has IH +1/2,2 ⊂ W δ ,2 ; a direct proof can be found in [64]. The second embedding is a Besov variation embedding, see Corollary A.3 in Appendix A.
15.2.3 Covariance of piecewise linear approximations Let X be a centred real-valued continuous Gaussian process on [0, 1] with covariance R = RX assumed to be of finite ρ-variation, dominated by some 2D control function ω. We now discuss what happens to (the ρ-variation of) the covariance of piecewise linear approximations to X. To this end, let ˜ = (˜ τ j ) be dissections of [0, 1] and write X D for the piecewise D = (τ i ) , D linear approximation to X, i.e. XtD = Xt for t ∈ D and X D is linear τ j , τ˜j +1 ) between two successive points of D. If (s, t) × (u, v) ⊂ (τ i , τ i+1 ) × (˜ we set, consistent with Definition 5.59, t v s, t ˜ ˜ D ,D D D ˙ ˙ : =E R Xr dr Xr dr u, v s u v−u t−s τ i , τ i+1 . × R = τ j , τ j +1 τ i+1 − τ i τ˜j +1 − τ˜j
In particular, RD := RD ,D is then precisely RX D i.e. the covariance of X D . Proposition 15.11 (covariance of piecewise, linear approximations) Let X be a continuous, centred, real-valued Gaussian process on [0, 1] with ˜ ∈ D [0, 1]. Then covarianceR assumed to be of finite ρ-variation. Let D, D ˜
XD , XD
is jointly Gaussian with covariance
R(X D ,X D˜ ) :
s t
E XsD XtD ˜ → E XsD XtD
of finite ρ-variation. Moreover, R(X D ,X D˜ )
˜ E XsD XtD ˜ ˜ E XsD XtD
1
ρ-var;[0,1] 2
≤ 4.91− ρ |R|ρ-var;[0,1] 2 .
˜ Proof. It is easy to check that X D , X D is jointly Gaussian. Observe that, using the notation of Definition 5.59, ˜ RD ,D RD , D R(X D ,X D˜ ) = . ˜ ˜ ˜ RD ,D RD , D It then follows from Proposition 5.60 that each component of this matrix ρ has finite ρ-variation in 2D sense, controlled by 9ρ−1 |f |ρ-var . We now go a bit further in our analysis of piecewise linear approximation and show that H¨ older-domination of the ρ-variation (on the “diagonal” 2 [s, t] ) remains valid when switching from R = RX to R(X D ,X ) (this will only be used in Section 15.5.1 for establishing H¨ older convergence of piecewise linear approximations). As usual, given s ∈ [0, 1] , we write sD for the greatest element of D such that sD ≤ s, and sD the smallest element of D such that s < sD . Lemma 15.12 Let X be a continuous, centred, real-valued Gaussian process on [0, 1] with covariance R. Then (i) for all u1 , v1 ,u2 , v2 ∈ D, D 1 R ≤ 91− ρ |R|ρ-var;[u 1 ,v 1 ]×[u 2 ,v 2 ] ; ρ-var;[u ,v ]×[u ,v ] 1
1
2
2
(ii) for all s, t ∈ [0, 1], with sD ≤ s, t ≤ sD , for all u, v ∈ D, D 1/2 1/2 1− ρ1 t − s R Xs ,s D 2 ≤ 9 |R|ρ-var;[u ,v ] 2 ; E D sD − sD ρ-var;[s,t]×[u ,v ] D (iii) for all s1 , t1 , s2 , t2 ∈ [0, 1], with s1,D ≤ s1 , t1 ≤ sD 1 , s2,D ≤ s2 , t2 ≤ s2 , D t1 − s1 t2 − s2 R ≤ sD − s sD − s E Xs 1 , D ,s 1 , D Xs 2 , D ,s 2 , D . ρ-var;[s 1 ,t 1 ]×[s 2 ,t 2 ] 1,D 2,D 1 2
Proof. (i) This follows from Proposition 15.11; indeed, there is no difference 2 in the argument between working with [0, 1] or rectangles whose interval endpoints are elements of D. (ii) The second estimate is a bit more subtle. Take s, t ∈ [0, 1], with sD ≤ s, t ≤sD , u, v ∈ D, (si ) and (tj ) subdivisions of
= E XsDi ,s i + 1 XtD , we know from Proposition [s, t] and [u, v] . Then, if hi,D t 15.7 that i,D h ρ-var;[u ,v ]
≤
1/2 D 1/2 D 2 R Xs,t E ρ-var;[u ,v ]
≤
9ρ−1
1/2 si+1 − si 1/2 Xs ,s D 2 |R| . 2 E D ρ-var;[u ,v ] sD − sD
Hence, for a fixed i, ρ ρ E XsDi s i + 1 XtDj ,t j + 1 ≤ hi,D ρ-var;[u ,v ] j
si+1 −si 9ρ−1 D s − sD
≤
1/2 |R|ρ-var;[u ,v ] 2
21/2 ρ E Xs D ,s D .
Summing over i and taking the supremum over all dissections ends the proof of the second estimate. We leave the easy proof of the third estimate to the reader. Proposition 15.13 (H¨ older estimate in piecewise linear case) Let X be a continuous, centred, real-valued Gaussian process on [0, 1] with covariance R assumed to be of finite ρ-variation. Then ρ
|RX |ρ-var;[s,t] 2 ≤ K |t − s|
for all s < t in [0, 1]
implies, for some constant C = C (ρ), R(X ,X D ) ρ
ρ-var;[s,t] 2
≤ CK |t − s|
for all s < t in [0, 1] .
Proof. We need to estimate the ρ-variation of all the entries of R(X ,X D ) ,
s t
→
E(Xs Xt ) E XsD Xt
E Xs XtD E XsD XtD
and focus on the lower-right entry, which is precisely RX D . By scaling we assume, without loss of generality, that K = 1. Then, by an argument similar to the proof of Proposition 5.60 (or Exercise 5.11 for the analo2 gous 1-dimensional case), we may estimate its ρ-variation over some [s, t] (with the property that sD ≤ tD ) in terms of the ρ-variation of RD over
smaller rectangles, namely ρ ρ 1 D ρ R ρ-var;[s,t] 2 ≤ RD ρ-var;[s,s D ] 2 + RD ρ-var;[s,s D ]×[s D ,t D ] ρ−1 9 ρ ρ + RD ρ-var;[s,s D ]×[t ,t] + RD ρ-var;[s D ,t ]×[s,s D ] D D ρ ρ + RD ρ-var;[s D ,t ]×[s D ,t ] + RD ρ-var;[s D ,t ]×[t ,t] D D D D D ρ D ρ + R ρ-var;[t ,t]×[s,s D ] + R ρ-var;[t ,t]×[s D ,t ] D D D D ρ + R ρ-var;[t D ,t] 2 . The proof is then easily finished with Lemma 15.12 and the fact that, for sD ≤ s, t ≤ sD , we have estimates of the form 1/2 t − s D t − s E Xs ,s D 2 s − sD 1/(2ρ) ≤ D D sD − sD s − sD t − s 1−1/2ρ 1/(2ρ) = D |t − s| s − sD ≤
1/(2ρ)
|t − s|
.
Similar arguments apply to the ρ-variation of (s, t)→ E Xs XtD , E XsD Xt with details left to the reader. The proof is then finished.
15.2.4 Covariance of mollifier approximations Let X be a continuous centred, real-valued Gaussian process on [0, 1] with covariance R = RX assumed to be of finite ρ-variation, dominated by some 2D control function ω. We now consider mollifier approximations. To this end, let us first extend Xt from [0, 1] to (−∞, ∞) by setting Xt ≡ X0 for t < 0 and Xt ≡ X1 for t > 1. As a simple consequence of this, for any rectangle Q ⊂ R2 , |RX |ρ-var;Q = |RX |ρ-var;Q ∩[0,1] 2 .
(15.6)
Then, given a “mollifier” probability measure µ on R, compactly supported, we define µ Xt = Xt−u dµ (u) ; we also recall the notation (cf. Proposition 5.64) s, t s − a, t − a µ, µ ˜ ω = ω dµ (a) d˜ µ (b) . u, v u − b, v − b Proposition 15.14 (covariance of mollifier approximations) Let X be a continuous, centred, real-valued Gaussian process on [0, 1] with covariance R assumed to be of finite ρ-variation, controlled by ρ
ω (Q) = |R|ρ-var;Q for any rectangle Q ⊂ R2 .
Let µ be a compactly supported probability measure on R. Then X µ is a Gaussian process with covariance of finite ρ-variation controlled by ω µ,µ . Moreover, ω µ n ,µ n → ω (pointwise) along any sequence µn −→ δ 0 , the Dirac ˜ denotes measure at zero.6 If µ another compactly supported probability measure on R, then X µ , X µ˜ is jointly Gaussian with covariance ˜ µ µ µ µ X ) E X X E (X t t s s s R(X µ ,X µ˜ ) : → t E X µ˜ X µ˜ E X µ˜ X µ s
t
s
t
of finite ρ-variation, controlled by a 2D control ω ˆ which satisfies 1/ρ 2 2 1/ρ R(X µ ,X µ˜ ) . ˆ [0, 1] ≤ 4.4 ω [0, 1] 2 ≤ ω ρ-var;[0,1]
δ Remark 15.15 We shall apply µ, µ ˜ given by µδ , µη where dµ := 1δ ϕ uδ
and ϕ ∈ C ∞ (R, R+ ), supported on [−1, 1] with total mass ϕ (t) dt = 1. Note that µδ converges to the Dirac measure at zero, as δ → 0. Proof. We leave it to the reader to check that X µ , and then X µ , X µ˜ , are Gaussian processes. Proposition 5.64 then implies all statements, noting that µ, µ˜ R Rµ, µ˜ R(X µ ,X µ˜ ) = Rµ˜ ,µ Rµ˜ , µ˜ has finite ρ-variation controlled by ω ˆ µ, µ˜ = 4ρ ω µ,µ + ω µ, µ˜ + ω µ˜ , µ˜ + ω µ˜ , µ˜ .
15.2.5 Covariance of Karhunen–Loève approximations
Let X be a continuous centred, real-valued Gaussian process on [0, 1] with covariance R = R_X assumed to be of finite ρ-variation, dominated by some 2D control function ω. We now consider Karhunen–Loève approximations, also known as L²-approximations. The situation here is more subtle than for piecewise linear or mollifier approximations. We will focus on the important case ρ ∈ [1, 2), although we only obtain uniform 2-variation bounds rather than uniform ρ-variation bounds (there is a world of difference, as will be seen in the next section: ρ < 2 allows for many uniform estimates which do not hold with ρ = 2). For a precise statement, we need some notation: H ⊂ C([0, 1], R) denotes the Cameron–Martin space, for which we fix an orthonormal basis (h^k : k ∈ N). From general principles, H embeds isometrically into a (Gaussian) subspace of L²(P), h ∈ H → ξ(h) ∈ L²(P), and there is an L²-expansion/approximation of the following type, where ξ is the Paley–Wiener map (cf. Section D.3 in Appendix D).
Definition 15.16 (Karhunen–Loève approximation) For a fixed orthonormal basis (h^k : k ∈ N) in H ⊂ C([0, 1], R) consider the L²-expansion of X,
\[
X=\sum_{k\in\mathbb N} Z_k\,h^k,\qquad\text{convergent a.s. and in } L^2(\mathbb P),
\]
where Z_k := ξ(h^k), k ∈ N, is a sequence of independent standard normal random variables. For a fixed set A ⊂ N, define F_A = σ(Z_k : k ∈ A) and X^A_t = E[X_t | F_A]. The sequence (X^{{1,...,n}} : n ∈ N) is then called a Karhunen–Loève approximation to X.
Remark 15.17 Observe that t → X^A_t is a Gaussian process in its own right, with covariance function
\[
R^{A}(s,t):=R_{X^{A}}(s,t):=\mathbb E\big(X^{A}_s X^{A}_t\big)=\sum_{i\in A} h^{i}_s\,h^{i}_t .
\]
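For orientation, the classical special case (standard Brownian motion on [0, 1]; this concrete basis is not used in what follows):
\[
h^{k}(t)=\frac{\sqrt2\,\sin\big((k-\tfrac12)\pi t\big)}{(k-\tfrac12)\pi},\qquad
\dot h^{k}(t)=\sqrt2\,\cos\big((k-\tfrac12)\pi t\big),\qquad k\in\mathbb N .
\]
The (ḣ^k) form an orthonormal basis of L²[0, 1], hence (h^k) is an orthonormal basis of the Cameron–Martin space, and X = Σ_k Z_k h^k is the classical Karhunen–Loève expansion of Brownian motion; X^{{1,...,n}} is simply its truncation after n terms.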
Lemma 15.18 Let X be a continuous, centred, real-valued Gaussian process on [0, 1] with covariance R assumed to be of finite ρ-variation, for some ρ ≥ 1. Then, for all subsets A of N,
\[
|R_{X^{A}}|_{\rho\text{-var};[s,t]^2}\le\big(1+\min\{|A|,|A^{c}|\}\big)\,|R_{X}|_{\rho\text{-var};[s,t]^2},
\qquad
|R_{X^{A}}|_{2\text{-var};[s,t]^2}\le|R|_{2\text{-var};[s,t]^2}.
\]
Proof. We first prove the first inequality, assuming that |A^c| < ∞, so that
\[
R_{X^{A}}=R-\sum_{k\in A^{c}} h^{k}\otimes h^{k}.
\]
It is then clear from Proposition 15.7, using |h^k|_H = 1, that
\[
|h^{k}\otimes h^{k}|_{\rho\text{-var};[s,t]^2}\le|h^{k}|^{2}_{\rho\text{-var};[s,t]}\le|R|_{\rho\text{-var};[s,t]^2},
\]
and it follows from the triangle inequality that
\[
|R_{X^{A}}|_{\rho\text{-var};[s,t]^2}\le|R|_{\rho\text{-var};[s,t]^2}+\sum_{k\in A^{c}}|h^{k}\otimes h^{k}|_{\rho\text{-var};[s,t]^2}\le\big(1+|A^{c}|\big)\,|R|_{\rho\text{-var};[s,t]^2}.
\]
The case |A| < ∞ is similar but easier and left to the reader. We now turn to the proof of the second inequality: let D = (t_i) be a dissection of [s, t] and set X^A_i = X^A_{t_i,t_{i+1}}, X_i = X_{t_i,t_{i+1}}. Let β = (β_{i,j}) be a positive symmetric matrix, and let us estimate Σ_{i,j} β_{i,j} E(X^A_i X^A_j). To this end, note
\[
\mathbb E\big(X^{A}_i X^{A}_j\big)=\sum_{k\in A}\mathbb E(Z_k X_i)\,\mathbb E(Z_k X_j)=\sum_{k\in A}\tfrac12\,\mathbb E\big[\big(Z_k^{2}-\mathbb E Z_k^{2}\big)X_i X_j\big],
\]
so that
\[
\sum_{i,j}\beta_{i,j}\,\mathbb E\big(X^{A}_i X^{A}_j\big)=\sum_{k\in A}\tfrac12\,\mathbb E\Big[\big(Z_k^{2}-\mathbb E Z_k^{2}\big)\sum_{i,j}\beta_{i,j}X_i X_j\Big].
\]
As β is symmetric, we can write β = P^T D P, with P P^T the identity matrix and D a diagonal matrix which contains the (non-negative) eigenvalues (d_i) of β. Simple linear algebra gives
\[
\sum_{i,j}\beta_{i,j}X_i X_j=(PX)^{T}D\,(PX)=\sum_i d_i\,(PX)_i^{2},
\]
and we can compute
\begin{align*}
\sum_{i,j}\beta_{i,j}\,\mathbb E\big(X^{A}_i X^{A}_j\big)
&=\sum_i d_i\sum_{k\in A}\tfrac12\,\mathbb E\big[\big(Z_k^{2}-\mathbb E Z_k^{2}\big)(PX)_i^{2}\big]
=\sum_i d_i\sum_{k\in A}\big(\mathbb E\big[Z_k\,(PX)_i\big]\big)^{2}\\
&\le\sum_i d_i\,\mathbb E\big[(PX)_i^{2}\big]\qquad\text{(Parseval inequality)}\\
&=\mathbb E\big[(PX)^{T}D\,(PX)\big]=\sum_{i,j}\beta_{i,j}\,\mathbb E(X_i X_j)\\
&\le|\beta|_{l^{2}}\,|R|_{2\text{-var};[s,t]^2}\qquad\text{(H\"older inequality)}.
\end{align*}
(Note that finite ρ ∈ [1, 2)-variation of R implies finite 2-variation of R.) We now apply this estimate with β_{i,j} = E(X^A_i X^A_j) and find
\[
\Big(\sum_{i,j}\big(\mathbb E\big[X^{A}_i X^{A}_j\big]\big)^{2}\Big)^{1/2}\le|R|_{2\text{-var};[s,t]^2}.
\]
The proof is finished by taking the supremum over all dissections of [s, t].
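For the reader's convenience, here is a short verification (via the Wick formula for products of jointly Gaussian variables; this check is not part of the original argument) of the identity used at the start of the second part of the proof:
\[
\mathbb E\big(X^{A}_i X^{A}_j\big)
=\mathbb E\Big(\sum_{k\in A}\mathbb E(Z_k X_i)\,Z_k\;\sum_{l\in A}\mathbb E(Z_l X_j)\,Z_l\Big)
=\sum_{k\in A}\mathbb E(Z_k X_i)\,\mathbb E(Z_k X_j),
\]
\[
\mathbb E\big(Z_k^{2}X_i X_j\big)=\mathbb E\big(Z_k^{2}\big)\,\mathbb E\big(X_i X_j\big)+2\,\mathbb E(Z_k X_i)\,\mathbb E(Z_k X_j),
\]
since X^A_i = E[X_i | F_A] = Σ_{k∈A} E(Z_k X_i) Z_k; the second display (Wick's formula) then gives E(Z_k X_i)E(Z_k X_j) = ½ E[(Z_k² − E Z_k²) X_i X_j].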
15.3 Multidimensional Gaussian processes Any Rd -valued centred Gaussian process X = X 1 , . . . , X d with continuous sample paths gives rise to an abstract Wiener space (E, H, P) with E = C [0, 1] , Rd and H ⊂ C [0, 1] , Rd . If Hi denotes the Cameron– Martin process X i and with the 1-dimensional Gaussian i space associated d i ∼ all X : i = 1, . . . , d are independent, then H = ⊕i=1 H . Recall that H embeds isometrically into a (Gaussian) subspace of L2 (P), h ∈ H → ξ (h) ∈ L2 (P) .
15.3.1 Wiener chaos From Section D.4 of Appendix D, there is an (orthogonal) decomposition of the form (n ) (P) . L2 (P) = ⊕∞ n =0 W The subspaces W n (P) are known as homogenous Wiener chaos of order n and C n (P) := ⊕nj=0 W (i) (P) denotes the Wiener chaos (or non-homogenous chaos) of order n. Our interest in Wiener chaos comes from the fact that C n (P) is precisely the closure (in probability, say) of polynomials of degree less than or equal to n in the variables ξ (hk ) where (hk ) ⊂ H is any fixed orthonormal basis. In particular, any polynomial in Xtikk for finitely many ik ∈ {1, . . . , d} and tk ∈ [0, 1] is an element of C n (P) for sufficiently large n. d Proposition 15.19 Assume the R -valued continuous centred Gaussian process X = X 1 , . . . , X d has sample paths of finite variation and let N d SN (X) d ≡ X denote its natural lift to a process with values in G R ⊂ N R . Then, for n = 1, . . . , N and any s, t ∈ [0, 1] the random variable T π n (Xs,t ) is an element in the nth (in general, not homogenous) Wiener chaos.7
Proof. π_n(X) is given by n iterated integrals which can be written out in terms of (a.s. convergent) Riemann–Stieltjes sums. Each such Riemann–Stieltjes sum is a polynomial of degree at most n in variables of the form X_{s,t}. It now suffices to remark that the (not necessarily homogenous) nth Wiener chaos contains all such polynomials and is closed under convergence in probability.
As a special case of the Wiener chaos integrability, see (D.5) in Section D.4, we have
Lemma 15.20 Let n ∈ N and Z ∈ C^n(P). Then, for q > 2,
\[
|Z|_{L^{2}}\le|Z|_{L^{q}}\le(n+1)\,(q-1)^{n/2}\,|Z|_{L^{2}}.
\]
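As a quick consistency check (not needed in the sequel): for n = 1 the variable Z is Gaussian, and for even integers q one computes directly
\[
\mathbb E\,|Z|^{q}=(q-1)!!\,\big(\mathbb E\,|Z|^{2}\big)^{q/2},
\qquad\text{so}\qquad
|Z|_{L^{q}}\le\sqrt{q-1}\;|Z|_{L^{2}},
\]
which is consistent with (and slightly better than) the constant 2(q−1)^{1/2} given by the lemma in this case.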
A simple but useful consequence is that, for random variables Z, W ∈ C^n(P) we have
\[
|WZ|_{L^{2}}\le C(n)\,|W|_{L^{2}}\,|Z|_{L^{2}}. \tag{15.7}
\]
(There is nothing special about L² here, but this is how we usually use it.) We now discuss some more involved corollaries.
7 Strictly speaking, the (R^d)^{⊗n}-valued chaos.
Corollary 15.21 Let g be a random element of G^N(R^d) such that for all 1 ≤ n ≤ N the projection π_n(g) is an element of the nth Wiener chaos. Let δ be a positive real. Then the following statements are equivalent:
(i) there exists a constant C1 > 0 such that for all n = 1, . . . , N, there exists q = q (n) ∈ [1, ∞) : |π n (g)|L q ≤ C1 δ n ; (ii) there exists a constant C2 > 0 such that for all n = 1, . . . , N and for n all q ∈ [1, ∞) : |π n (g)|L q ≤ C2 q 2 δ n ; q 1/q (iii) there exists a constant C3 > 0 and there exists q ∈ [1, ∞) : E (g ) ≤ C3 δ; q 1/q (iv) there exists a constant C4 > 0 such that for all q ∈ [1, ∞) : E (g ) 1 ≤ C4 q 2 δ. When switching from the ith to the jth statement, the constant Cj depends only on Ci and N . Proof. Clearly, (iv)=⇒(iii), (ii)=⇒(i), and Lemma 15.20 shows (i)=⇒(ii). It is therefore enough to prove (ii)=⇒(iv), (iii)=⇒(i), (ii)=⇒(iv): By equivalence of homogenous norms, we have N
\[
\|g\|\le c_1\max_{n=1,\dots,N}|\pi_n(g)|^{1/n},
\]
so that
\[
\big(\mathbb E\,\|g\|^{q}\big)^{1/q}
\le c_2\max_{n=1,\dots,N}\big(\mathbb E\,|\pi_n(g)|^{q/n}\big)^{1/q}
\le c_3\max_{n=1,\dots,N}\big(q^{n/2}\delta^{n}\big)^{1/n}
\le c_4\,q^{1/2}\,\delta .
\]
(iii)=⇒(i): By equivalence of homogenous norms, we have |π_n(g)|^{1/n} ≤ c_5 ‖g‖. Hence,
\[
\Big(\mathbb E\,|\pi_n(g)|^{q_0/n}\Big)^{n/q_0}\le c_5^{\,n}\,\big(\mathbb E\,\|g\|^{q_0}\big)^{n/q_0}\le c_6\,\delta^{n}.
\]
Proposition 15.22 Let X be a continuous G^N(R^d)-valued stochastic process and ω a control function on [0, 1]. Assume that for all s < t in [0, 1] and n = 1, . . . , N, the projection π_n(X_{s,t}) is an element in the nth Wiener chaos and that, for some constant C,
\[
|\pi_n(\mathbf X_{s,t})|_{L^{2}}\le C\,\omega(s,t)^{\frac n{2\rho}}. \tag{15.8}
\]
Then, (i) there exists a constant C = C (ρ, N ) such that for all s < t in [0, 1] and q ∈ [1, ∞), 1 √ (15.9) |d (Xs , Xt )|L q ≤ C qω (s, t) 2 ρ ;
(ii) if p > 2ρ then Xp-var;[0,1] has a Gaussian tail. More precisely, if ω (0, 1) ≤ K then there exists η = η (p, ρ, N, K) > 0 such that 2 E exp η Xp-var;[0,1] < ∞; (15.10) (iii) if ω (s, t) ≤ K |t − s| for all s < t in [0, 1] we may replace Xp-var;[0,1] in (15.10) by X1/p-H¨o l;[0,1] . Proof. (i) is a clear consequence of Corollary 15.21 and (iii) follows from a (probabilistic) Besov–H¨ older embedding, Theorem A.12, in Appendix A. At last, (ii) follows from (iii) by reparametrization. Indeed, assuming without loss of generality that ω (0, 1) > 0, super-additivity of controls implies & % ω (0, s) ω (0, t) − ∀s < t in [0, T ] : ω (s, t) ≤ ω (0, 1) ω (0, 1) ω (0, 1) ˜ t : 0 ≤ t ≤ 1 by requiring that and we may define X ˜ ω (0,t)/ω (0,1) = Xt . X (Note that ω (0, s) /ω (0, 1) = ω (0, t) /ω (0, 1) =⇒ ω (s, t) = 0 =⇒ ˜ is indeed well-defined.) Clearly, X ˜ X|[s,t] ≡ Xs a.s. from (15.8) and X satisfies the assumptions for (iii) with K = ω (0, 1) and we conclude with invariance of variation norms under reparametrization, ˜ ˜ ≤ X . Xp-var;[0,1] = X p-var;[0,1]
1/p-H¨o l;[0,1]
Remark 15.23 (L´ evy modulus and exact variation) In the setting of Proposition 15.22 and under a H¨ older assumption on ω, i.e. ∀s < t in [0, 1] : ω (s, t) ≤ K |t − s| it is immediate from (15.8), cf. Lemma A.17, that there exists η = η (ρ, N ) > 0 so that 2 d (Xs , Xt ) < ∞. sup E exp η 1/ρ s,t∈[0,1] |t − s| In the language of Section A.4, Appendix A, this shows that X satisfies the “Gaussian integrability condition (2ρ)”. From the general results of that appendix it then follows that X has a.s. L´evy modulus-type regularity and also finite ψ 2ρ,ρ -variation. In fact, the same reparametrization argument that was used in the proof of Proposition 15.22 shows that finite ψ 2ρ,ρ variation holds without the H¨ older assumption on ω. We note that, at least for ρ = 1, the interest in generalized variation regularity comes from Section 10.5.
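Behind the cited Lemma A.17 (and the Fernique-type estimates below) lies the following elementary computation, sketched here informally with generic placeholders Y, σ, C (not part of the text): if |Y|_{L^q} ≤ C√q σ for all q ∈ [1, ∞), then for η < (2eC²)^{-1},
\[
\mathbb E\,\exp\!\Big(\eta\,\frac{Y^{2}}{\sigma^{2}}\Big)
=\sum_{k\ge0}\frac{\eta^{k}}{k!}\,\frac{\mathbb E\,Y^{2k}}{\sigma^{2k}}
\le\sum_{k\ge0}\frac{\eta^{k}C^{2k}(2k)^{k}}{k!}
\le\sum_{k\ge0}\big(2e\,\eta\,C^{2}\big)^{k}<\infty,
\]
using E Y^{2k} ≤ C^{2k}(2k)^k σ^{2k} and (2k)^k ≤ (2e)^k k!.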
Proposition 15.24 Let X, Y be two continuous G^N(R^d)-valued stochastic processes and ω a control function on [0, 1]. Assume that for all s < t in [0, 1] and n = 1, . . . , N, π_n(X_{s,t}) and π_n(Y_{s,t}) are elements of the nth Wiener chaos and that, for some C > 0 and ε > 0,
\[
|\pi_n(\mathbf X_{s,t})|_{L^{2}}\le C\,\omega(s,t)^{\frac n{2\rho}}
\quad\text{and}\quad
|\pi_n(\mathbf Y_{s,t})|_{L^{2}}\le C\,\omega(s,t)^{\frac n{2\rho}}, \tag{15.11}
\]
\[
|\pi_n(\mathbf Y_{s,t}-\mathbf X_{s,t})|_{L^{2}}\le C\,\varepsilon\,\omega(s,t)^{\frac n{2\rho}}. \tag{15.12}
\]
Then,
(i) there exists a constant C' = C'(C, ρ, N) such that for all q ∈ [1, ∞),
\[
|\pi_n(\mathbf Y_{s,t}-\mathbf X_{s,t})|_{L^{q}}\le C'\,q^{\frac n2}\,\varepsilon\,\omega(s,t)^{\frac n{2\rho}};
\]
(ii) if p > 2ρ there exists a constant C' = C'(C, p, ρ, N) such that
\[
\big|d_{p\text{-var};[0,1]}(\mathbf X,\mathbf Y)\big|_{L^{q}}\le C'\max\big(\varepsilon^{1/N},\varepsilon\big)\sqrt q \tag{15.13}
\]
and, for all n = 1, . . . , N, we have
\[
\Big|\rho^{(n)}_{p\text{-var};[0,T]}(\mathbf X,\mathbf Y)\Big|_{L^{q}}\le C'\,q^{\frac n2}\,\varepsilon; \tag{15.14}
\]
(iii) if ω (s, t) ≤ K |t − s| for all s < t in [0, 1] then we may replace (n ) (n ) dp-var;[0,1] , ρp-var;[0,T ] in (15.13), (15.14) by d1/p-H¨o l;[0,1] , ρ1/p-H¨o l;[0,T ] respectively. Proof. (i) is a clear consequence of Corollary 15.21 and (iii) follows from a (probabilistic) Besov–H¨ older “distance” comparison, Theorem A.13. Case (ii) then follows from (iii) by the same reparametrization argument that we used in the proof of Proposition 15.22. Remark 15.25 Recall from Definition 8.6 that the inhomogenous p-variation distance was given by ρp-var;[0,1] (X, Y) =
\[
\max_{n=1,\dots,N}\rho^{(n)}_{p\text{-var};[0,1]}(\mathbf X,\mathbf Y),
\qquad\text{with}\qquad
\rho^{(n)}_{p\text{-var};[0,1]}(\mathbf X,\mathbf Y)
=\sup_{(t_i)\in\mathcal D[0,1]}\Big(\sum_i\big|\pi_n\big(\mathbf X_{t_i,t_{i+1}}-\mathbf Y_{t_i,t_{i+1}}\big)\big|^{p/n}\Big)^{n/p},
\]
so that in the context of Proposition 15.24 one has, for ε ∈ (0, 1] and q ∈ [1, ∞),
\[
\big|\rho_{p\text{-var};[0,1]}(\mathbf X,\mathbf Y)\big|_{L^{q}}\le C\,q^{N/2}\,\varepsilon,
\qquad
\big|d_{p\text{-var};[0,1]}(\mathbf X,\mathbf Y)\big|_{L^{q}}\le C\,\varepsilon^{1/N}\sqrt q .
\]
(Similarly with 1/p-H¨ older distances provided ω (s, t) ≤ (const) × |t − s| .) In other words, working with an “inhomogenous” p-variation distance yields linear estimates in ε (which is useful, since by Theorem 10.38 the Itˆo–Lyons map is locally Lipschitz continuous in ρp-var;[0,1] ) while working with “homogenous” p-variation distance dp-var has the advantage that the random variable dp-var;[0,1] (X, Y) has a Gaussian tail (which will be useful to establish “exponential goodness” of certain approximations in the context of large deviations, cf. Section 15.7).
15.3.2 Uniform estimates for lifted Gaussian processes As in the previous section, we consider an Rd -valued continuous centred Gaussian process X with independent components X 1 , . . . , X d . We shall assume that the sample paths (Xt (ω) : t ∈ [0, 1]) are of finite variation. This implies that iterated integrals of X are well-defined as Riemann– Stieltjes integrals and we shall see that their second moments are controlled uniformly (i.e. with constants not depending on the finite-variation sample path assumption!) using the estimates for 2D Young integrals (from Section 6.4) of the respective covariances, which explains the standing assumption ∃ρ ∈ [1, 2) : |RX |ρ-var;[0,1] 2 < ∞. (Recall that RX (s, t) = E (Xs ⊗ Xt ) = diag (RX 1 , . . . , RX d ) is the Rd ⊗ Rd -valued covariance function of X.) We shall also control the difference between (the iterated integrals of) a pair of Gaussian processes (X, Y ), in which case we will make the stronger assumption ∃ρ ∈ [1, 2) : R(X ,Y ) ρ-var;[0,1] 2 < ∞. The following exercise shows that the above assumption indeed implies that X, Y, X − Y , etc. have covariance of finite ρ-variation and we shall use this without further notice. Exercise 15.26 Let Z = (Z1 , . . . , Zn ) be a centred, n-dimensional Gaussian process, with covariance RZ of finite ρ-variation controlled by some 2D control ω. Let α be a linear map from Rn into Rd , then the covariance of αZ also has finite ρ-variation controlled by Cω where C = C (α). In a typical application below, (X, Y ) is a (2d)-dimensional, centred Gaussian process in which all coordinate pairs X 1 , Y 1 , . . . , X d , Y d are independent (think of Y as the coordinate-wise piecewise linear or mollifier approximation to X) which allows us to reduce parts of the analysis to d = 1. We will need Lemma 15.27 Let (X, Y ) be a 2-dimensional centred Gaussian process with covariance R of finite ρ-variation controlled by ω. Then, for fixed
s < t in [0, 1], the function 2
(u, v) ∈ [s, t] → f (u, v) := E (Xs,u Ys,u Xs,v Ys,v ) satisfies f (s, ·) = f (·, s) = 0 and has finite ρ-variation. More precisely, there exists a constant C = C (ρ) such that 2 ρ 2 . |f |ρ-var;[s,t] 2 ≤ Cω [s, t] Proof. We fix u < u , v < v , all in [s, t] . Using Xs,u Ys,u − Xs,u Ys,u = Xu ,u Ys,u + Xs,u Yu ,u , we bound |E ((Xs,u Ys,u − Xs,u Ys,u ) (Xs,v Ys,v − Xs,v Ys,v ))| by |E (Xu ,u Ys,u Xv ,v Ys,v )| + |E (Xs,u Yu ,u Xv ,v Ys,v )| + |E (Xu ,u Ys,u Xs,v Yv ,v )| + |E (Xs,u Yu ,u Xs,v Yv ,v )| . To estimate the second expression, for example, we use a well-known identity for the product of Gaussian random variables,8 E (Xs,u Yu ,u Xv ,v Ys,v )
= E (Xs,u Yu ,u ) E (Xv ,v Ys,v ) + E (Xs,u Xv ,v ) E (Yu ,u Ys,v ) + E (Xs,u Ys,v ) E (Xv ,v Yu ,u ) ,
to obtain 1 ρ |E (Xs,u Yu ,u Xv ,v Ys,v )| Cρ
≤
ω ([s, u] × [u, u ]) ω ([v, v ] × [s, v ]) + ω ([s, u] × [v, v ]) ω ([u, u ] × [s, v ]) + ω ([s, u] × [s, v ]) ω ([u, u ] × [v, v ])
≤
ω ([s, t] × [u, u ]) ω ([v, v ] × [s, t]) + ω ([s, t] × [v, v ]) ω ([u, u ] × [s, t]) + ω ([s, t] × [s, t]) ω ([u, u ] × [v, v ]) .
Working similarly with all terms, we obtain that this last expression con2 trols the ρ-variation of (u, v) ∈ [s, t] → E (Xs,u Ys,u Xs,v Ys,v ) , and the 2 bound on the ρ-variation on [s, t] . Proposition 15.28 Assume that (i) X = X 1 , . . . , X d is a centred continuous Gaussian process with independent components and with bounded variation sample paths; 8 This is a consequence of the so-called Wick formula for Gaussian random variables; see also [120, Lemma 4.5.1].
15.3 Multidimensional Gaussian processes
423
(ii) the covariance of X is of finite ρ-variation dominated by a 2D control ω, for some ρ ∈ [1, 2); (iii) X = S3 (X) . There exists C = C (ρ) such that for all s < t in [0, 1] , for n = 1, 2, 3,; n 2 2ρ . |π n (Xs,t )|L 2 (P) ≤ Cω [s, t] Proof. From Proposition 15.61 in the appendix to this chapter, it is enough to prove 1/ρ 2 i 2 ≤ ω [s, t] (a) E Xs,t for all i; 2 2/ρ 2 ≤ Cω [s, t] for i, j distinct; (b) E Xi,j s,t 2 3/ρ 2 ≤ Cω [s, t] for i, j distinct; (c) E Xi,i,j s,t 2 3/ρ 2 ≤ Cω [s, t] for i, j, k distinct. (d) E Xi,j,k s,t The level-one estimate (a) is obvious. For the level-two estimate (b), we fix i = j and s < t, s < t . Then, using independence of X i and X j , t t i,j i,j i = E Xs,u Xsi ,v dXuj dXvj E Xs,t Xs ,t t
s
s
i E Xs,u Xsi ,v dE Xuj Xvj s s t t s, u u dR RX i = X j s , v v s s t
=
≤
Cω ([s, t] × [s , t ])
2/ρ
by Young 2D estimate.
(b) follows trivially from setting s = s , t = t (the general result will be used in the level-three estimates, see step 2 below). We break up the levelthree estimates into several steps. Throughout, the indices i, j (and then k) are assumed to be distinct. Step 1: For fixed s < t, s < t , t < u we claim that 1/ρ j i 1/ρ ω ([s, t] × [t , u ]) . E Xi,j s,t Xs ,t Xt ,u ≤ Cω ([s, t] × [s , t ]) Indeed, with dE Xtj ,u Xuj ≡ E Xtj ,u X˙ uj du we have j i E Xi,j s,t Xs ,t Xt ,u
=
t
E
s t
= u =s
i Xs,u Xsi ,t Xtj ,u dXuj
i E Xs,u Xsi ,t dE Xtj ,u Xuj .
Gaussian processes
424
i Since the 1D ρ-variation of u → E Xs,u Xsi ,t is controlled by (u, v) → ω ([u, v] × [s , t ]) , and similarly for u → E Xtj ,u Xuj , the (classical 1D) Young estimate gives t i j i j E Xs,u Xs ,t dE Xt ,u Xu u=s
≤ Cω ([s, t] × [s , t ])
1/ρ
ω ([s, t] × [t , u ])
1/ρ
. 2
Step 2: For fixed s < t, we claim that the 2D map (u, v) ∈ [s, t] i,j E Xi,j s,u Xs,v has finite ρ-variation controlled by 2 [u1 , u2 ] × [v1 , v2 ] → Cω [s, t] ω ([u1 , u2 ] × [v1 , v2 ]) .
→
Indeed, using the level-two estimate and step 1, for u1 < u2 , v1 < v2 all in [s, t], i,j i,j Xs,v 2 − Xi,j E Xi,j s,u 2 − Xs,u 1 s,v 1 i i j Xuj 1 ,u 2 Xi,j = E Xi,j u 1 ,u 2 + Xs,u v 1 ,v 2 + Xs,v 1 Xv 1 ,v 2 1 i,j = E X i,j u 1 ,u 2 Xv 1 ,v 2 i,j i Xvj 1 ,v 2 + E Xu 1 ,u 2 Xs,v 1 i + E Xs,u Xuj 1 ,u2 Xi,j v1 ,v 2 1 i i + E Xs,u 1 Xs,v E Xuj 1 ,u 2 Xvj 1 ,v 2 1 2/ρ ≤ ω ([u1 , u2 ] × [v1 , v2 ]) 1/ρ 1/ρ + ω ([u1 , u2 ] × [s, v1 ]) ω ([u1 , u2 ] × [v1 , v2 ]) 1/ρ 1/ρ + ω ([s, u1 ] × [v1 , v2 ]) ω ([u1 , u2 ] × [v1 , v2 ]) 1/ρ 1/ρ + ω ([s, u1 ] × [s, v1 ]) ω ([u1 , u2 ] × [v1 , v2 ]) 1/ρ 2 ≤ 4 ω [s, t] ω ([u1 , u2 ] × [v1 , v2 ]) . (Here we used the fact that ω can be taken symmetric.) Step 3: We now establish the actual estimates and start with (d). For i, j, k distinct, we have 2 t i,j k i,j = Xs,u dXu E Xi,j E s,u Xs,v dRk (u, v) . [s,t] 2
s
By Young’s 2D estimate, combined with ρ-variation regularity of the integrand established in step 2, we obtain 2 3/ρ t i,j 2 k ≤ Cω [s, t] E Xs,u dXu , s
as desired. The estimate (c) follows from 2 t i 2 2 i 2 k i E Xs,u dXu Xs,v dRk (u, v) = E Xs,u s
[s,t] 2
15.3 Multidimensional Gaussian processes
425
and Young’s 2D estimate, combined with ρ-variation regularity of the integrand which follows as a special case of Lemma 15.27 (the full generality will be used in the next section). Corollary 15.29 Let X, ρ, ω be as in the last proposition. Then (i) there exists a constant C = C (ρ, N ) such that for all s < t in [0, 1] and q ∈ [1, ∞), 1 √ |d (Xs , Xt )|L q ≤ C qω (s, t) 2 ρ ;
(15.15)
(ii) if p > 2ρ then Xp-var;[0,1] has a Gaussian tail. More precisely, if ω (0, 1) ≤ K then there exists η = η (p, ρ, N, K) > 0 such that 2 E exp η Xp-var;[0,1] < ∞;
(15.16)
(iii) if ω (s, t) ≤ K |t − s| for all s < t in [0, 1] we may replace Xp-var;[0,1] in (15.16) by X1/p-H¨o l;[0,1] .
Proof. An immediate consequence of the estimates of Propositions 15.28 2 and 15.22, applied with (1D) control (s, t) → ω [s, t] . Our next task is to establish suitable moment estimates for the difference of the (first three) iterated integrals of two nice Gaussian processes. Proposition 15.30 Let (i) (X, Y ) = X 1 , Y 1 , . . . , X d , Y d be a centred continuous i i Gaussian process with bounded variation sample paths, such that X , Y is independent of X j , Y j when i = j; (ii) the covariance of (X, Y ) is of finite ρ-variation dominated by a 2D control ω, for some ρ ∈ [1, 2); (iii) X = S3 (X) and Y = S3 (X); (iv) ε > 0 such that for all s < t in [0, 1] , 1/ρ 2 . |RX −Y |ρ-var;[s,t] 2 ≤ ε2 ω [s, t] 2 Then, for ω [0, 1] ≤ K, there exists a constant C = C (ρ, K) such that for all s < t in [0, 1] and n = 1, 2, 3, we have n 2 2ρ |π n (Xs,t − Ys,t )|L 2 (P) ≤ Cεω [s, t] .
Gaussian processes
426
Proof. From Proposition 15.62 in the appendix to this chapter, it is enough to prove i 2 (a) E Xis,t − Ys,t 2 i,j − Y (b) E Xi,j s,t s,t 2 i,i,j − Y (c) E Xi,i,j s,t s,t 2 i,j,k − Y (d) E Xi,j,k s,t s,t
≤
1/ρ 2 εω [s, t] for all i; 2/ 2 Cεω [s, t] for i, j distinct;
≤
3/ρ 2 Cεω [s, t] for i, j distinct;
≤
3/ρ 2 Cεω [s, t] for i, j, k distinct.
≤
The level-one estimate (a) is obvious from 1/ρ 2 i i 2 ≤ |RX i −Y i |ρ-var;[s,t] 2 ≤ |RX −Y |ρ-var;[s,t] 2 ≤ εω [s, t] E Xs,t − Ys,t . For the level-two estimate (b) we fix i = j. By inserting/subtracting
t i Xs,u dYuj we have s t 2 2 t 2 i,j i,j i,j i,j i j i j Xs,u dYu + 2 Xs,u dYu − Ys,t Xs,t − Ys,t 2 ≤ 2 Xs,t − L s s L2 L2 # j j 2 t i Xu − Yu √ ≤ 2ε Xs,u d 2 ε s L 2 t Xi − Y i s,u √ s,u dYuj + s 2 ε
2
≤ 2c1 ω [s, t]
2/ρ
L
where the last estimate comes from application of Proposition 15.28 to a ˜ = X i , X j − Y j /√ε . (2-dimensional) Gaussian process of the form X ˜ has indeed independent components and covariance of finite (Note that X ρ-variation, controlled by ω.) The level-three estimate (d), on the variance i,j,k of Xi,j,k s,t −Ys,t , with i, j, k distinct and fixed, is proved in a similar fashion:
k after adding/subtracting [s,t] Xi,j we are left with an integral of the s,· dY
i,j k form Xs,· d (X − Y ) and a second one of the form i,j j i i,j k i j k i Xs,u −Ys,u dYu = dY j dY k . Xs,· d X −Y dY + Xs,· −Ys,· It then suffices to apply Proposition 15.28 √ to a (3-dimensional) Gaussian process of the form X i , X j , X k − Y k / ε .
15.3 Multidimensional Gaussian processes
427
It remains to prove the other level-three estimate (c) and we keep i = j fixed throughout. We have t 2 i 2 j 2 i,i,j i,i,j j Xs,u d Xu − Yu Xs,t − Ys,t 2 ≤ 2 L
s
L2
t 2 i 2 i 2 j Xs,u − Ys,u dYu . + 2 2 s
L
t
2 j i d Xu − Yuj can be written as a 2D Young inThe variance of s Xs,u tegral and by Lemma 15.27 and 2D Young estimates we obtain t i 2 j 2 3/ρ j X d X − Y ≤ c2 εω (s, t) . s,u u u 2 s
L
To deal with the other term, we first note that, again using Lemma 15.27, the ρ-variation of 1 ! i 2 i 2 i 2 i 2 " Xs,u − Ys,u Xs,v − Ys,v (u, v) → g (u, v) ≡ E ε 0 / i i i i Xs,v i i − Ys,u − Ys,v Xs,u i i √ √ = E Xs,u + Ys,u Xs,v + Ys,v ε ε 2 2 2 over [s, t] is controlled by a constant times ω [s, t] ; then, again using 2D Young estimates with 1/ρ + 1/ρ > 1, we see that t 2 i 2 i 2 j X dY − Y = ε g (u, v) dRY j (u, v) s,u s,u u s
[s,t] 2
L2
≤
3/ρ 2 c4 εω [s, t] .
The proof is then finished. Corollary 15.31 Let X = S3 (X) , Y = S3 (X) , ω, K, ρ as in the previous proposition and in particular 1/ρ 2 . (15.17) |RX −Y |ρ-var;[s,t] 2 ≤ ε2 ω [s, t] Then (i) there exists a constant C = C (ρ, K) such that for all s < t in [0, 1] , q ∈ [1, ∞) and n = 1, 2, 3, n n 2 2ρ ; |π n (Ys,t − Xs,t )|L q (P) ≤ Cq 2 εω [s, t] (ii) if p > 2ρ then there exists a constant C = C (p, ρ, K) such that √ dp-var;[0,1] (X, Y) q ≤ C max ε1/3 , ε q (15.18) L (P)
Gaussian processes
428
and for n = 1, 2, 3 we have (n ) ρp-var;[0,1] (X, Y)
≤ C q 2 ε; n
Lq
(P)
(15.19) (n )
(iii) if ω (s, t) ≤ K |t − s| for all s < t in [0, 1] then dp-var;[0,1] , ρp-var,[0,1] in (n )
(15.18), (15.19) may be replaced by d1/p-H¨o l;[0,1] , ρ1/p-H¨o l;[0,1] respectively. Proof. An immediate consequence of the estimates of Propositions 15.30 2 and 15.24, applied with (1D) control (s, t) → ω [s, t] and ρ ∈ (ρ, 2). For (ii), (iii) we may take ρ = ρ + min (p, 4) /2 (so that 2ρ < 2ρ < p) so that C has no explicit dependence on ρ . Remark 15.32 Assume the covariance of (X, Y ) is of finite ρ-variation dominated by a 2D control ω, for some ρ ∈ [1, 2). Then, by interpolation, for all ρ > ρ |RX −Y |ρ -var;[s,t] 2
≤ ≤
1−ρ/ρ
|RX −Y |∞
ρ/ρ
. |RX −Y |ρ-var;[s,t] 2 1/ρ 1−ρ/ρ 2 |RX −Y |∞ ω [s, t] .
But ω also controls the ρ -variation of the covariance of (X, Y ) ; indeed, R(X ,Y ) ρ
ρ -var;[s,t]×[u v ]
and hence, with c = K ρ
≤
R(X ,Y ) ρ
ρ-var;[s,t]×[u v ]
ρ −ρ ρ ≤ R(X ,Y ) ρ-var;[0,1] 2 R(X ,Y ) ρ-var;[s,t]×[u ,v ]
/ρ−1
ρ 2 , where R(X ,Y ) ρ-var;[0,1] 2 ≤ ω [0, 1] ≤ K,
R(X ,Y ) ρ ≤ cω ([s, t] × [u, v]) . ρ -var;[s,t]×[u v ] It follows that Corollary 15.31 may be applied with parameter ρ , control cω and 1−ρ/ρ . ε2 = |RX −Y |∞
15.3.3
Enhanced Gaussian process
The uniform estimates of the previous section, proved under the assumption of bounded variation sample paths, allow for a simple passage to the limit. Indeed, given a (d-dimensional) continuous Gaussian process X, whose sample paths are not of bounded variation, but whose covariance has finite ρ-variation, ρ ∈ [1, 2), we may consider suitably smooth approximations (X n ) for which 2 sup ω n ,m [0, 1] ≤ K n ,m
15.3 Multidimensional Gaussian processes
429
where ω n ,m is 2D control which controls the ρ-variation of R(X n ,X m ) . (In fact, we have already seen that the above supremum bound is satisfied for either piecewise linear or mollifier approximations.) Corollary 15.31 then implies that S3 (Xn ) is Cauchy-in-probability in C 0,p-var [0, 1] , G3 Rd , for any p > 2ρ, which leads us to the following result. Theorem 15.33 (enhanced Gaussian process) Assume X = X 1 , . . . , X d is a centred continuous Gaussian process with independent components. Let ρ ∈ [1, 2) and assume the covariance of X is of finite ρ-variation 2 dominated by a 2D control ω with ω [0, 1] ≤ K. Then, there exists a unique continuous G3 Rd -valued process X, such that: (i) X “lifts” the Gaussian process X in the sense π 1 (Xt ) = Xt − X0 ; (ii) there exists C = C (ρ) such that for all s < t in [0, 1] and q ∈ [1, ∞), 1 √ 2 2ρ ; |d (Xs , Xt )|L q ≤ C qω [s, t]
(15.20)
2 (iii) (Fernique-estimates) for all p > 2ρ and ω [0, 1] ≤ K, there exists η = η (p, ρ, K) > 0, such that 2 E exp η Xp-var;[0,1] <∞ 2 and if ω [s, t] ≤ K |t − s| for all s < t in [0, 1], then we may replace Xp-var;[0,1] by X1/p-H¨o l;[0,1] ; (iv) the lift X is natural in the sense that it is the limit of S3 (X n ) where X n is any sequence of piecewise linear or mollifier approximations to X such that d∞ (X n , X) converges to 0 almost surely. Definition 15.34 A G3 Rd -valued process X as constructed above is called an enhanced Gaussian process; if we want to stress the underlying Gaussian process we call X the natural lift of X. Sample path realizations of X are called Gaussian rough paths, as is motivated by (i) for ρ ∈ [1, 3/2), we see that X has almost surely finite p-variation, for any p ∈ (2ρ, 3) , and hence so does its projection to G2 Rd , which is therefore almost surely a geometric p-rough path; (ii) for ρ ∈ [3/2, 2), we see that X has almost surely finite p-variation, for any p ∈ (2ρ, 4) , and is therefore almost surely a geometric p-rough path. Remark 15.35 With the notation of the above theorem, if X has a.s. sample paths of finite [1, 2)-variation, X coincides with the canonical lift obtained by iterated Young integration of X. If ˜ = (1, π 1 (X) , π 2 (X)) ∈ C 0,p-var [0, 1] , G2 Rd a.s. X 0
Gaussian processes
430
˜ is a geometric p-rough path and X coincides with the for p < 3 then X ˜ Let us also observe that only point (iv) guarantees the Lyons lift of X. uniqueness of the lift. For p ∈ [3, 4), if e1 , . . . , ed is a basis of Rd , ˜ t := Xt ⊗ exp (t [e1 , [e1 , e2 ]]) X would also satisfy conditions (i) to (iii). Similarly, for p ∈ [2, 3), the Lyons lift of the projection to G2 Rd of t → Xt ⊗exp (t [e1 , e2 ]) would also satisfy conditions (i) to (iii). Proof. Fix a mollifier function ϕ (·) and set dµn (u) = nϕ (nu) du. Define smooth approximations to X, componentwise by convolution against dµn ; that is,9 t → Xtn =
Xt−u dµn (u) ,
so that Xtn is a smooth function in t. From Proposition 15.14 there exists c1 = c1 (ρ) so that ρ ρ sup R(X n ,X m ) ρ-var;[0,1] 2 ≤ c1 |R|ρ-var;[0,1] 2 =: c1 K n ,m
and from Corollary 15.31 (plus Remark 15.32) we see that there exists θ > 0 and c2 = c2 (p, q, K, θ) so that θ dp-var;[0,1] (S3 (Xn ) , S3 (Xm )) q ≤ c1 |RX n −X m |∞ . L (P) of Co0,p-var It follows that S3 (Xn ) is Cauchy-in-probability as a sequence d 10 0,p-var 3 [0, 1] , G R and so there exists X ∈ C valued random variables o such that dp-var S3 X D n , X → 0 in probability and from the uniform estimates from Corollary 15.29 also in Lq for all q ∈ [1, ∞). From Corollary 15.29 we have the estimate 1 √ 2 2ρ (15.21) |d (S3 (Xn )s , S3 (Xn )t )|L q ≤ C qω n [s, t] for any 2D control ω n which controls the ρ-variation of RX n and in particular for ω n = ω µ n ,µ n , the “µn -convolution of ω” from Proposition 15.14. Sending n → ∞ then shows that 1 √ 2 2ρ . |d (Xs , Xt )|L q ≤ C qω [s, t] Obviously, the increments Xs,t = X−1 s ⊗ Xt are limits (in probability, say) of S3 (X n )s,t and so, from Proposition 15.19 and closedness of the 9 We
could also use piecewise linear (instead of mollifier) approximations. Cauchy criterion for convergence in probability of r.v.s with values in a Polish space is an immediate generalization of the corresponding real-valued case. 10 A
15.3 Multidimensional Gaussian processes
431
Wiener–Itˆ o chaos under convergence in probability, π n (Xs,t ) is indeed an element of the nth (not necessarily homogenous) Wiener–Itˆ o chaos. The statements of (ii),(iii) then follow directly from Proposition 15.22, applied 2 with 1D control s, t → ω [s, t] . For (iv), as of yet, our construction ofX may depend on the particular 1 1 d d mollifier function ϕ. Assume now that X is Gaussian , Y , . . . , X , Y with independent X i , Y i : i = 1, . . . , d such that Y has bounded variation sample paths. Then R(X n ,Y ) 2 ≤ R(X n ,X ) 2 + R(X ,Y ) 2 ρ-var;[0,1]
ρ-var;[0,1]
ρ-var;[0,1]
which is finite, whenever RX ∈ C ρ-var , uniformly in n and uniformly over all Y given by (componentwise) piecewise linear or mollifier approximations to X. (This follows from Propositions 15.11 and 15.14 respectively.) We can therefore, as in part (i), pass to the limit in θ dp-var;[0,1] (S3 (Xn ) , S3 (Y )) q ≤ c4 |RX n −Y |∞ L (P) to learn that θ dp-var;[0,1] (X, S3 (Y )) q ≤ c4 |RX −Y |∞ . L (P) When applied to Y = X D n with |Dn | → 0 resp. Y = X µ n for any µn → δ 0 . the right-hand side above tends to zero and the proof of (iv) is finished. Theorem 15.33 asserts in particular that d-dimensional Brownian motion can be naturally lifted to an enhanced Gaussian process, easily identified as enhanced Brownian motion (in view of (iv) and the results of Section 13.3.3). Other examples are obtained by considering d independent (continuous, centred) Gaussian processes, each of which satisfies the condition that its covariance is of finite ρ-variation, for some ρ < 2. For example (cf. Proposition 15.5) one may take d independent copies of fractional Brownian motion: the resulting Rd -valued fractional Brownian motion B H can be lifted to an enhanced Gaussian process (“enhanced fractional Brownian motion”, BH ) provided H > 1/4. Further examples are constructed by consulting the list of Gaussian processes in Section 15.2. Exercise 15.36 In the context of Theorem 15.33, (i) show that there exists η = η (ρ, K) > 0 such that / 02 , X ) d (X s t < ∞; sup E exp η 1 0≤s< t≤1 ω (s, t) 2 ρ (ii) define a deterministic time-change from [0, 1] onto itself, given by ρ
ρ
τ (t) = |RX |ρ-var;[0,t] 2 / |RX |ρ-var;[0,1] 2
Gaussian processes
432
˜ t : 0 ≤ t ≤ 1 by requiring that X ˜ τ (t) = and define the Gaussian process X ˜ admits a natural lift X ˜ so that Xt . Show that X ˜ τ (t) = Xt X and such that
2 ˜ ˜ d Xs , Xt sup E exp η < ∞; 1 0≤s< t≤1 |t − s| 2 ρ
(iii) deduce from the results of Section A.4, Appendix A, that for a suitable constant c, ˜ 2 <∞ E exp ηc X ψ 2 ρ , ρ -var;[0,1]
2 < ∞. E exp ηc Xψ 2 ρ , ρ -var;[0,1]
and then11
Solution. (i) is a consequence of (15.20), cf. Lemma A.17. (ii) Assume ˜ we see that ω (0, 1) = 1 for simplicity. By definition of τ and X ρ
|RX˜ |ρ-var;[τ (s),τ (t)] 2
ρ
ρ
≤
|RX˜ |ρ-var;[0,τ (t)] 2 − |RX˜ |ρ-var;[0,τ (s)] 2
=
|RX |ρ-var;[0,t] 2 − |RX |ρ-var;[0,s] 2
=
|RX |ρ-var;[0,1] 2 (τ (t) − τ (s)) ,
ρ
ρ
ρ
ρ ˜ has finite ρ-variation controlled by ω which implies that X ˜ = |RX˜ |ρ-var;[·,·]×[·,·] . 2 Clearly then, ω ˜ [s, t] ≤ K |t − s| and the claimed estimate follows from (15.20), applied with ω ˜. (iii) This is a straightforward consequence of the results of Section A.4, Appendix A, and invariance of (generalized) variation norms under repara metrization. 1 1 continTheorem 15.37 Let (X, Y ) = X , Y , . . . , X d , Y d be a centred uous Gaussian process such that X i , Y i is independent of X j , Y j when i = j. Let ρ ∈ [1, 2) and assume the covariance Y ) is of finite ρ of (X, 2 variation, controlled by a 2D control ω with ω [0, 1] ≤ K, and write X and Y for the natural lift of X and Y . Assume also that
1/ρ 2 |RX −Y |ρ-var;[s,t] 2 ≤ ε2 ω [s, t] . 1 1 This sharpens the statement on finite p-variation, p > 2ρ, and is relevant (at least when ρ = 1) as it allows for unique RDE solutions driven by X along Lip 2 -vector fields.
15.4 The Young–Wiener integral
433
Then (i) there exists a constant C = C (ρ, K) such that for all s < t in [0, 1] , q ∈ [1, ∞) and n = 1, 2, 3, n n 2 2ρ ; |π n (Ys,t − Xs,t )|L q (P) ≤ Cq 2 εω [s, t]
(15.22)
(ii) if p > 2ρ then there exists a constant C = C (p, ρ, K) such that √ 1/3 dp-var;[0,1] (X, Y) q ≤ C max ε , ε q (15.23) L (P) and for n = 1, 2, 3 we have (n ) ρp-var;[0,1] (X, Y)
L q (P)
≤ C q 2 ε; n
(15.24) (n )
(iii) if ω (s, t) ≤ K |t − s| for all s < t in [0, 1] then dp-var;[0,1] , ρp-var;[0,1] (n )
in (15.18), (15.19) may be replaced by d1/p-H¨o l;[0,1] , ρ1/p-H¨o l;[0,1] respectively. Proof. The statements are precisely those of Corollary 15.31 but without assuming that X, Y are the step-three lift of processes with boundedvariation sample paths. The proof is then completed with the same passage to limit, along the lines of the previous proof. Remark 15.38 As already noted in Remark 15.32, estimates (15.22), (15.23), (15.24) of Theorem 15.37 apply in particular after replacing ρ by 1−ρ/ρ . ρ ∈ (ρ, 2), such that 2ρ < 2ρ < p, and after replacing ε2 by |RX −Y |∞ In particular, there exist positive constants θ, C depending only on ρ, ρ , K, p such that θ √ dp-var;[0,1] (X, Y) q ≤ C |RX −Y |∞ q. L (P)
15.4 The Young–Wiener integral Given a suitable d-dimensional Gaussian process X, assuming in particular finite ρ-variation of the covariance for some ρ ∈ [1, 2), we have constructed a Gaussian rough path X of finite p-variation, for any p > 2ρ. At the same time we have seen that for any h ∈ H, the associated Cameron–Martin
space has finite ρ-variation. Clearly, integrals of the form h ⊗ dh are well-defined Young integrals. However, cross integrals of the form hdX are only well-defined as Young integrals if 1/ρ + 1/p > 1, which would require ρ ∈ [1, 3/2). However, we can define the integral probabilistically,
Gaussian processes
434
say in L2 -sense, and it will suffice to look at the scalar-valued case. Let us remark that such cross integrals arise if we consider perturbations of the random variable X (·) in Cameron–Martin directions or when dealing with non-centred Gaussian processes. We have Proposition 15.39 (Young–Wiener integral) Assume X is a continuous, centred Gaussian with covariance R of finite ρ-variation. Let h ∈ C q -var ([0, 1] , R) , with q −1 + ρ−1 > 1. Then, for any piecewise linear or mollifier approximation (X n ) to X, the indefinite integral t hdX n 0
converges, for each t ∈ [0, 1], in L2 and its common limit is denoted by
t hdX. For all s < t in [0, 1] , we have the Young–Wiener isometry 0 2 t = hu dXu hu hv dR (u, v) , E 2 s
[s,t]
and if h (s) = 0 we have the Young–Wiener estimate 2 t 2 E ≤ Cρ,q |h|q -var;[s,t] |R|ρ-var;[s,t] 2 . hu dXu
(15.25)
s
t At last, the process t → 0 hdX admits a continuous version with sample path of finite p-variation, for any p > 2ρ. Proof. When X has (piecewise) smooth sample paths, the Young–Wiener isometry is obvious from 2 t t t =E hu dXu hu hv dXu dXv = hu hv dR (u, v) . E 2 s
s
s
[s,t]
Finite q-variation of h implies that h ⊗ h also has finite q-variation (now in 2D sense) and from the Young 2D estimates it follows that 2 t ≤ c1 |h ⊗ h|q -var;[s,t] |R|ρ-var;[s,t] 2 hu dXu E s
≤
2
c2 |h|q -var;[s,t] |R|ρ-var;[s,t] 2
where c1 , c2 depend on p, ρ. Replace X by X n − X m , then piecewise linear or mollifier approximation yields 2 t t 2 n m ≤ c2 |h|q -var;[0,1] |RX n −X m |ρ-var;[0,1] 2 . hdX − hdX sup E t∈[0,1]
0
0
15.4 The Young–Wiener integral
435
In fact, by choosing ρ > ρ small enough (so that 1/q + 1/ρ > 1) we can use interpolation to see that (constants may now also depend on h and ρ ) 2 t t n m sup E hdX − hdX
t∈[0,1]
0
0
≤ c3 |RX n −X m |ρ -var;[0,1] 2 ≤ ≤
ρ/ρ 1−ρ/ρ c4 |RX n −X m |∞;[0,1] 2 sup R(X n ,X m ) ρ-var;[0,1] 2 n ,m
c5 |RX n −X m
1−ρ/ρ |∞;[0,1] 2
,
where the last estimate is justified exactly as in step (i) of the proof of Theorem 15.33; that is, by means of Proposition 15.14. It follows that t n hdX : n ∈ N is Cauchy in L2 (P) and hence convergent. Then, similar 0 to step (iii) of the aforementioned proof, one sees that this limit does not depend on a particular approximation. At last, the p-variation regularity is the content of Exercise 15.41 below. Remark 15.40 When X is Brownian motion, dR = δ {s=t} and we recover the Itˆo isometry for Itˆ o–Wiener integrals. Exercise 15.41 In the context of Proposition 15.39, assuming in particular that X has covariance of finite ρ-variation controlled by some 2D control ω, show that hdX admits a version which has finite p-variation for any p > 2ρ.
t Solution. Since It − Is := s hdX is Gaussian, we have |It − Is |L r (P)
t ≤ hs,u dXu + |h|∞ |Xs,t |L r (P) s L r (P) / 0 t ≤ c1 hs,u dXu + |h|∞ |Xs,t |L 2 (P) s L 2 (P) 1/2 ≤ c2 |h|q -var;[s,t] + |h|∞ |R|ρ-var;[s,t] 2 ρ
where c1 , c2 may depend on r, ρ, q. Setting ω (s, t) := |R|ρ-var;[s,t] 2 yields 1 |It − Is |L r (P) = O |ω (0, t) − ω (0, s)| 2 ρ −1
older continuous so that by Kolmogorov’s criterion J = I ◦ ω (0, ·) is H¨ with any exponent less than 1/2ρ. It follows that J (and then I) have the claimed p-variation regularity, p > 2ρ.
Gaussian processes
436
15.5 Strong approximations 15.5.1 Piecewise linear approximations We now establish the rate of convergence for piecewise linear approximations with focus. Those results are here for clarity, as we only need to put pieces together to obtain them. Theorem 15.42 Assume that X = X 1 , . . . , X d is a centred continuous Gaussian process with independent components and covariance R of finite ρ-variation, ρ ∈[1, 2), controlled by some 2D control ω. Fix an arbitrary 1 1 p ∈ (2ρ, 4) , η ∈ 0, 2ρ − p and write X for the natural lift of X. Then 2 (i) if ω [0, 1] ≤ K, there exists some constant C1 = C (ρ, p, K, θ) , such that for all D ∈ D [0, 1] and q ∈ [1, ∞), η /3 √ 2 dp-var;[0,1] X, S3 X D q ≤ C q max ω [t , t ] , 1 i i+1 L (P)
(15.26)
t i ∈D
and also
(n ) ∀n ∈ {1, 2, 3} : ρp-var;[0,1] X, S3 X D
L q (P)
η n 2 ≤ C1 q 2 max ω [ti , ti+1 ] ; t i ∈D
(n )
(ii) if ω (s, t) ≤ K |t − s| for all s < t in [0, 1] then dp-var;[0,1] , ρp-var;[0,1] (n )
in the above estimates may be replaced by d1/p-H¨o l;[0,1] , ρ1/p-H¨o l;[0,1] respectively. Remark 15.43 If ρ ∈ [1, 3/2) we can take p ∈ (2ρ, 3) and then only need a step-2 lift. Since the power 1/3 in (15.26) is readily traced back to (15.13) we see that, in the case ρ ∈ [1, 3/2), we have √ η /2 d1/p-H¨o l;[0,1] X, S3 X D q ≤ C q |D| . L (P) In particular, the above estimates applied to enhanced Brownian motion are in precise agreement with those obtained earlier (Corollary 13.21) by direct computation in a Brownian context. Proof. Pick ρ ∈ (ρ, p/2) and note that, following Remark 15.32, ρ 2 ω D (A) := R(X ,X D ) ρ-var;A (any rectangle A ⊂ [0, 1] ) also controls the ρ -variation of X, X D , while interpolation gives 1−ρ/ρ
|RX −X D |ρ -var;[s,t] 2 ≤ c2 |RX −X D |∞
1/ρ 2 ω D [s, t]
15.5 Strong approximations
437
where we note that |RX −X D |∞ = sups,t∈[0,1] E Xs − XsD Xt − XtD is bounded by ! 2 2 " ≤ 2 max E Xt i ,t i + 1 sup E Xt − XtD t i ∈D
t∈[0,1]
≤
1/ρ 2 . 2 max ω D [ti , ti+1 ] t i ∈D
Proposition 15.13, applied with ρ instead of ρ and ε2 = c3 maxt i ∈D ω 1 − 1 2 ρ ρ [ti , ti+1 ] , yields
1− 1 2 6ρ 6ρ 1/2 dp-var X, S3 X D q [t ≤ c q max ω , t ] , 1 D i i+1 L (P) t i ∈D
and for k = 1, 2, 3, (k ) ρp-var X, S3 X D
L q (P)
1− 1 2 2ρ 2ρ ≤ c1 q k /2 max ω D [ti , ti+1 ] . t i ∈D
2 We conclude the p-variation estimates by observing that ω D [ti , ti+1 ] ≤ 2 (see Proposition 15.11 and Lemma 15.12). At last, the c2 ω [ti , ti+1 ] H¨ older estimate is obtained similarly. Exercise 15.44 Assume Dn = 2kn , 0 ≤ k ≤ 2n . Show that under the assumptions of Theorem 15.42, part (ii), d1/p-H¨o l;[0,1] X, S3 X D → 0 a.s. Solution. From Theorem 15.42, there exists θ > 0 such that √ d1/p-H¨o l;[0,1] X, S3 X D q ≤ C2−n θ q. L (P) A standard Borell–Cantelli argument finishes the proof.
15.5.2 Mollifier approximations Theorem 15.45 Assume that X is a centred Rd -valued continuous Gaussian process with independent components and covariance R = RX of finite ρ-variation, ρ ∈ [1, 2), so that there exists a natural lift X, with p-variation sample paths for any p ∈ (2ρ, 4). Fix a mollifier function ϕ (·) : R → R, set dµn (u) = nϕ (nu) dt and define (componentwise) approximations by t → Xtn = Xt−u dµn (u) . Then
dp-var;[0,1] (X, S3 (X n )) q L (P) → 0 as n → ∞. sup √ q q ∈[1,∞)
438
Gaussian processes
Proof. Similar to the arguments of step 1 in Theorem 15.33. The details are left to the reader.
15.5.3 Karhunen–Lo`eve approximations
Any Rd -valued centred Gaussian process X = X 1 , . . . , X d with continuous sample paths Wiener space (E, H, P) with gives rise to an abstract (cf. E = C [0, 1] , Rd and H ⊂ C [0, 1] , Rd . From general principles k Section D.3 of Appendix D), for any fixed orthonormal basis h : k ∈ N ⊂ H, there is a Karhunen–Lo`eve expansion (a.s. and L2 -convergent) X=
Zk hk ,
k ∈N
where Zk , the image of hk under the Payley–Wiener map, is a sequence of independent standard normal random variables. With our standing assumptions of independence of its component processes, each component gives rise to an abstract Wiener space on C ([0, 1] , R) with Cameron–Martin space Hi and H ∼ = ⊕di=1 Hi . The 1-dimensional considerations of Section 15.2.5 then apply without changes to the d-dimensional setting (with d independent components) and we have from Lemma 15.18, setting again X A = E [X· |FA ] where FA = σ (Zk , k ∈ A) and A ⊂ N, that for any ρ ≥ 1 and A ⊂ N, |RX A |ρ-var ≤ (1 + min {|A| , |Ac |}) |R|ρ-var;[s,t] 2
(15.27)
and |RX A |2-var;[s,t] 2 ≤ |R|2-var;[s,t] 2 .
(15.28)
We now assume that R has finite ρ-variation for some ρ ∈ [1, 2) dominated by some 2D control ω. For fixed A ⊂ N, finite or with finite compleA A ,1 A ,d admits a natural = X , . . . , X ment, it follows from (15.27) that X G3 Rd -valued lift, denoted by XA . Of course, XN = X, the natural lift of X. Lemma15.46 Assume that (i) X = X 1 , . . . , X d is a centred continuous Gaussian process with independent components; (ii) X has Karhunen–Lo`eve expansion k ∈N Zk hk where (hk = (hk ;1 , . . . , hk ;d )) is an orthonormal basis for H; (iii) the covariance of X is of finite ρ-variation, for some ρ ∈ [1, 2), controlled by some 2D control ω; (iv) A ⊂ N so that min {|A| , |Ac |} < ∞;
15.5 Strong approximations
Then, (a) for all s < t in [0, 1], for i = E Xs,t |FA i,j = E Xs,t |FA = E Xi,j,k s,t |FA = E Xi,i,j s,t |FA
439
all i, j, k distinct in {1, . . . , d} , we have12 A ,i Xs,t
(15.29)
,i,j XA s,t
(15.30)
,i,j,k XA s,t ,i,i,j XA + s,t
1 2
(15.31) t A c ;i 2 E Xs,u dXuA ;j ; (15.32)
s
(b) for all s < t in [0, 1] and n ∈ {1, 2, 3, . . . }, we have n /ρ 2 ≤ Cω [s, t]2 E π n XA sup s,t A ⊂N, m in{|A |,|A C |}< ∞
where C depends on ρ. Proof. (a) Equality (15.29) is essentially the definition of X A . Equality (15.30) is also easy: one just needs to note that E (·|FA ) is a projection in L2 and hence L2 -continuous; since both X and XA are L2 -limits of their respective lifted piecewise linear approximations (a general feature of enhanced Gaussian processes), the claim follows. The proof of equality (15.31) follows the same argument, while (15.32) is a consequence of A c ;i 2 i 2 A ,i 2 E Xs,u FA − Xs,u = E Xs,u . From the L2 -projection property distinct, ,i 2 ≤ E XA s,t ,i,j 2 ≤ E XA s,t ,i,j,k 2 ≤ E XA s,t 2 ≤ E E Xi,i,j |F A s,t
of E (·|FA ) we then see that, for i, j, k 1/ρ 2 2 E Xis,t ≤ c1 ω [s, t] 2 2/ρ 2 E Xi,j ≤ c ω [s, t] 1 s,t 2 3/ρ 2 E Xi,j,k ≤ c ω [s, t] 1 s,t 2 3/ρ 2 E Xi,i,j ≤ c ω [s, t] ; 1 s,t
for some constant c1 = c1 (ρ), thanks to (15.20). Thus, to prove (b), it only remains to prove that 2 3/ρ t c 2 2 A ;i A ;j ≤ c2 ω [s, t] E Xs,u dXu . E s
1 2 Ac
= N\A.
Gaussian processes
440
To this end, observe that, thanks to i = j, t t A c ;i 2 A c ;i 2 A ;j j E Xs,u dXu = E E Xs,u dXu FA , s
s
and hence 2 2 t t c 2 c 2 A ;i A ;i E ≤ E E Xs,u E Xs,u dXuA ;j dXuj . s
s
c
A ;i 2 We define f (u) := E(|Xs,u | ), noting that f (s) = 0, and for u < t in [s, t], u, v s, u u, v c c c + RX A ; i + RX A ; i fu ,v = RX A ; i u, v u, v s, u
so that 2
2
2
2
|fu ,v | ≤ |RX A c ; i |2-var;[u ,v ] 2 + |RX A c ; i |2-var;[u ,v ]×[s,t] + |RX A c ; i |2-var;[s,t]×[u ,v ] . As the right-hand side above is super-additive in [u, v], it follows from the uniform 2-variation estimates (15.28) that 2/ρ 2 2 2 2 |f |2-var;[s,t] ≤ 3 |RX A c ; i |2-var;[s,t] 2 ≤ 3 |RX i |2-var;[s,t] 2 ≤ 3ω [s, t] and we conclude with the Young–Wiener estimate of Proposition 15.39. Theorem that 15.47 Assume (i) X = X 1 , . . . , X d is a centred continuous Gaussian process with independent components; (ii) X has Karhunen–Lo`eve expansion k ∈N Zk hk where (hk = (hk ;1 , . . . , hk ;d )) is an orthonormal basis for H; (iii) the covariance of X is of finite ρ-variation, for some ρ ∈ [1, 2), con 2 trolled by some 2D control ω with ω [0, 1] ≤ K; (iv) p > 2ρ and An := {1, . . . , n} . Then, there exists a constant η = η (p, ρ, K) > 0 2 (15.33) sup E exp η XA n p-var;[0,1] < ∞ n ∈N
and, for all q ∈ [1, ∞), dp-var;[0, 1] XA n , X c An X p-var;[0, 1]
→
0 in Lq (P) as n → ∞,
(15.34)
→
0 in Lq (P) as n → ∞.
(15.35)
If ω is H¨ older dominated, i.e. ω (s, t) ≤ K |t − s| for all s < t in [0, 1], then (15.33), (15.34), (15.35) also hold in 1/p-H¨ older sense.
15.5 Strong approximations
441
Proof. Inequality (15.33) follows from Lemma 15.46 and Proposition 15.22. Let us observe that the proof of (15.34) can be reduced to pointwise convergence n (15.36) d XA t , Xt → 0 in probability 2 under the H¨ older assumption “ω [s, t] ≤ K |t − s|”. Indeed, assuming this H¨older domination on ω, this follows directly from Proposition A.15 ˜ := whereas the general case is reduced to the H¨older one by considering X 2 −1 X ◦ [(ω (0, ·) /ω([0, 1] )] and noting that both natural lift and Karhunen– Lo`eve expansions commute with a deterministic, continuous time-change, ˜ An , X ˜ An , X ˜ ≤ d1/p-H¨o l;[0, 1] X ˜ . dp-var;[0, 1] XA n , X = dp-var;[0, 1] X We thus turn to the proof of (15.36). From Proposition 15.62, it will be enough to prove that for i, j, k distinct, A n ,i − Xit → 0 in L2 (P) , Xt A n ,i,j 2 − Xi,j Xt t → 0 in L (P) , A n ,i,j,k − Xi,j,k → 0 in L2 (P) , Xt t A n ,i,i,j − Xi,i,j → 0 in L2 (P) . Xt t The first three convergence results are pure martingale convergence results. For the last one, in view of (15.32) we also need to prove that
t A c ;i 2 E Xs,un dXuA n ;j converges to 0 in L2 . To this end we note that, s again by a martingale argument, A cn ;i 2 i A n ;i 2 → 0 as n → ∞. sup E Xs,u = sup E Xs,u − Xs,u u ∈[s,t]
u ∈[s,t]
On the other hand, in the proof of Lemma 15.46, that the 2-variation we saw A cn ;i 2 2 of u ∈ [s, t] → E Xs,u is bounded by c1 ω [s, t] . By interpolation, this means that for ε > 0, its (2 + ε)-variation converges to 0 when n tends 1 1 to ∞. We pick ε such that 2+ε + 2ρ > 1. After recalling that 2 2 t t A cn ;i 2 A cn ;i 2 A n ;j j ≤E E Xs,u dXu E Xs,u dXu , E s
s
we therefore obtain, using the Young–Wiener integral bounds (Proposition 15.39), that 2 A c ;i 2 2 t A cn ;i 2 A n ;j n E Xs,u dXu ≤c2 E Xs,. |R|ρ-var;[s,t] , E s
(2+ε)-var;[s,t]
Gaussian processes
442
2 t A cn ;i 2 A n ;j → 0 as n tends to ∞. It only and hence E s E Xs,u dXu remains to prove (15.35) which is reduced, as above, to pointwise convergence (in probability or L2 ) of the i, (i, j) , (i, j, k) , (i, i, j)-coordinates. By the backward martingale convergence theorem and Kolmogorov’s 0–1 law, for i, j, k distinct, A c ,i Xt n = E Xt,i |FA c | → E Xt,i |∩k FA c | = E Xt,i = 0 (with convergence in L2 , as n → ∞) and similarly, using the fact that i, j, k are distinct, c c A n ,i,j A n ,i,j,k Xt , Xt → 0 in L2 (P) , c A ,i,i,j so that we are only left to show that Xt n → 0 in L2 (P) , which in 2 t A c ;j A n ;i 2 dXu n → 0 in view of (15.32) requires us to prove that s E Xs,u L2 as n → ∞. From 2 " t ! A ;i 2 A cn ;j i 2 n lim E E Xs,u − E Xs,u dXu = 0, n →∞
s
2 t A c ;j i 2 dXu n → 0, this can be reduced to L2 -convergence of s E Xs,u which follows, thanks to t t i 2 i 2 c dXuA n ;j = E dXuj FA c , E Xs,u E Xs,u n s
s
from backward martingale convergence. The proof is then finished. Exercise 15.48 In the context of Theorem 15.47, show that 2 < ∞. ∃η > 0 : sup E exp η XA n ψ -var;[0,1] n ∈N
ρ,ρ/2
15.6 Weak approximations 15.6.1 Tightness Proposition 15.49 Assume that (i) ω is a 2D control; (ii) (X n ) is a sequence of centred, d-dimensional, continuous Gaussian processes with independent components; (iii) for ρ ∈ [1, 2) and for some constant C and for all s < t in [0, 1] , ρ 2 sup |RX n |ρ-var;[s,t] 2 ≤ Cω [s, t] ; n
15.6 Weak approximations
443
(iv)Xndenotes the natural lift of X n with sample paths in Co0,p-var ([0, 1] , G3 Rd , for some p > 2ρ. n Then the family (P∗ Xn ),i.e. the laws dof X viewed as Borel measures on 0,p-var 3 [0, 1] , G R , are tight. If ω is H¨ older domithe Polish space Co d 0,1/p-H¨o l 3 nated then tightness holds in Co [0, 1] , G R . Proof. Let us fix p ∈ (2ρ, p) and consider first the case of H¨ older-dominated ω. Define KR = x : x1/p -H¨o l ≤ R 0,1/p-H¨o l [0, 1] , G3 Rd , and note that KR is a relatively compact set in C0 which is a simple consequence from Arzela–Ascoli and interpolation (see Proposition 8.17). From the Fernique estimates in Theorem 15.33, there exists a constant c such that sup P (Xn ∈ KR ) ≤ ce−R
2
/c
n
and the tightness result follows. The general case is a time-changed version of the H¨ older case, using relative compactness of 1/p 2 2 x : d (xs , xt ) ≤ R ω [0, t] − ω [0, s]
in Co0,p-var [0, 1] , G3 Rd . We leave the details to the reader.
15.6.2 Convergence We now turn to convergence. By Prohorov’s theorem,13 tightness already implies existence of weak limits and so it only remains to see that there is one and only one limit point; the classical way to see this is by checking convergence of the finite-dimensional distributions. We need a short lemma concerning the interchanging of limits. ¯ n∈N ¯ a colLemma 15.50 Let (E, d) a Polish space and Z m ,n : m ∈ N, lection of E-valued random variables. Assume Z m ,n converges weakly to Z m ,∞ as n → ∞ for every m ∈ N. Assume also Z m ,n → Z ∞,n in probability, uniformly in n; that is, ∀δ > 0 : sup P (d (Z m ,n , Z ∞,n ) > δ) → 0 as m → ∞. n ∈N
Then Z ∞,n converges weakly to Z ∞,∞ . 1 3 For
example, [13].
Gaussian processes
444
Proof. By the Portmanteau theorem,14 it suffices to show that for every f : E → R, bounded and uniformly continuous, Ef (Z ∞,n ) → Ef (Z ∞,∞ ) . To see this, fix ε > 0 and δ = δ (ε) > 0 such that d (x, y) < δ implies |f (x) − f (y)| < ε. By assumption we can take m = m (ε) large enough such that sup P (d (Z m ,n , Z ∞,n ) > δ) < ε. 0≤n ≤∞
Hence, sup |Ef (Z ∞,n ) − Ef (Z m ,n )|
0≤n ≤∞
≤ sup0≤n ≤∞ |E [|f (Z ∞,n ) − f (Z m ,n )| ; d (Z ∞,n , Z m ,n ) ≥ δ]| ∞,n m ,n ∞,n m ,n + sup0≤n ≤∞ |E [|f (Z ) < δ]| ) − f (Z )|; d (Z , Z ≤ 2 |f |∞ sup0≤n ≤∞ P d∞ Xn , S3 XnD ≥ δ + ε ≤ (2 |f |∞ + 1) ε. On the other hand, for n ≥ n0 (m, ε) = n0 (ε) large enough, we also have |Ef (Z m ,n ) − Ef (Z m ,∞ )| ≤ ε and the proof is then finished with the triangle inequality, |Ef (Z ∞,n ) − Ef (Z ∞,∞ )|
≤
≤
|Ef (Z ∞,n ) − Ef (Z m ,∞ )| + |Ef (Z m ,∞ ) − Ef (Z ∞,∞ )| + |Ef (Z m ,n ) − Ef (Z m ,∞ )| (2 |f |∞ + 1) 2ε + ε.
Theorem 15.51 Assume that (i) (X n )0≤n ≤∞ is a sequence of centred, d-dimensional, continuous Gaussian processes on [0, 1] with independent components; (ii) the covariances of X n are of finite ρ-variation, ρ ∈ [1, 2), uniformly controlled by some 2D control ω; (iii) Xndenotes the natural lift of X n with sample paths in Co0,p-var ([0, 1] , G3 Rd , for some p > 2ρ; 2 (iv) RX n converges pointwise on [0, 1] , to RX ∞ . n Then, for any p > 2ρ, X converges weakly to X∞ with respect to pvariation topology. If ω is H¨ older-dominated, then convergence holds with respect to 1/p-H¨ older topology. 1 4 For
example, [13].
15.7 Large deviations
445
Proof. Tightness was established in Proposition 15.49 so we only need weak convergence of the finite-dimensional distributions: (Xnt : t ∈ S) =⇒ (X∞ t : t ∈ S)
for any S ∈ D [0, 1] .
By assumption (iv) this holds on level one, meaning that (Xtn : t ∈ S) =⇒ (Xt∞ : t ∈ S) . Now, given a continuous path x ∈ C [0, 1] , Rd it is easy to see that (x : t ∈ S) → S3 xD ∈ C [0, 1] , G3 Rd is continuous and so it is clear that n ,D : t ∈ S =⇒ S3 X ∞,D t : t ∈ S . S3 X t On the other hand, it follows from Theorem 15.42 that, along any sequence (Dm ) ⊂ D [0, 1] with mesh tending to zero, S3 X n ,D m → Xn , pointwise and in probability (much more was shown!), and also uniformly in n, thanks to the explicit estimates 15.42.It then suffices to of Theorem apply Lemma 15.50 with Z m ,n = S3 X n ,D m t : t ∈ S with state-space ×(# S ) . E = G3 Rd Example 15.52 Set R (s, t) = min (s, t). The covariance of fractional Brownian motion is given by 1 2H 2H s + t2H − |t − s| . RH (s, t) = 2 Take a sequence Hn → 1/2. It is easy to see that RH n → R pointwise and from our discussion of fractional Brownian motion, for any ρ > 1, H R n ρ-var;[s, t] 2 < ∞. lim sup 1/ρ n →∞ |t − s|
15.7 Large deviations As in previous sections, X = (X 1 , . . . , X d ) denotes a centred continuous Gaussian process on [0, 1], with independent components, each with covariance of finite ρ-variation for some ρ ∈ [1, 2) and dominated by some 2D control ω. We write H for its associated Cameron–Martin space. Re call from Section 15.3.3 that X admits a natural lift to a G3 Rd -valued process X, obtained as the limit of lifted piecewise linear approximations along dissections D with mesh |D| → 0, dp-var;[0,1] X, S3 X D → 0 in Lq (P) for all q ∈ [1, ∞).
Gaussian processes
446
Since the law of X induces a Gaussian measure on C [0, 1] , Rd , it follows from general principles (see Section D.2 in Appendix D) that (εX : ε > 0) satisfies a large deviation principle with good rate function I in uniform topology, where I is given by 1 d 2 !x, x"H if x ∈ H ⊂ C [0, 1] , R I(x) = +∞ otherwise. We write Φm for the piecewise linear approximations along the dissection Dm = {i/m : i = 0, . . . , m}. It is clear that S3 ◦ Φm : C [0, 1] , Rd , |·|∞ → C [0, 1] , G3 Rd , d∞ is continuous. By the contraction principle, S3 (εΦm (X)) satisfies a large deviation principle with good rate function Jm (y) = inf {I (x) , x such that S3 (Φm (x)) = y} , the infimum of the empty set being +∞. Essentially, a large deviation principle for δ ε X is obtained by sending m to infinity. To this end we now prove that S3 (Φm (X)) is an exponentially good approximation to X. Lemma 15.53 Let δ > 0 fixed. Then, for p > 2ρ, we have lim lim ε2 log P (dp-var (S3 (Φm (εX)) , δ ε X) > δ) = −∞.
m →∞ ε→0
If ω is H¨ older-dominated, then lim lim ε2 log P d1/p-H¨o l (S3 (Φm (εX)) , δ ε X) > δ = −∞. m →∞ ε→0
Proof. First observe that dp-var (S3 (Φm (εX)) , δ ε X) = εdp-var (S3 (Φm (X)) , X) . θ
Clearly, for θ > 0, αm ≡ |RX −X D m |∞ → 0 as m → ∞ and from Theorem 15.42, √ (15.37) |dp-var (S3 (Φm (X)) , X)|L q ≡ C qαm → 0. We then estimate
δ P (dp-var (S3 (Φm (εX)) , δ ε X) > δ) = P dp-var (S3 (Φm (X)) , X) > ε −q δ √ q q q αm ≤ ε ! ε √ " αm q , ≤ exp q log δ and after choosing q = 1/ε2 we obtain, for ε small enough, α m ε2 log P (dp-var (S3 (Φm (εX)) , δ ε X) > δ) ≤ log . δ
15.7 Large deviations
447
Now take the limits limε→0 and limm →∞ to finish the proof, for the dp-var case. The proof is (almost) identical for the 1/p-H¨ older case. From our embedding of the Cameron–Martin space into the space of paths of finite ρ-variation, we obtain Lemma 15.54 For all Λ > 0, and p > 2ρ, we have lim
sup
m →∞ {h:I (h)≤Λ}
dp-var [(S3 ◦ Φm ) (h) , S3 (h)] = 0.
(15.38)
If ω is H¨ older-dominated, then lim
sup
m →∞ {h:I (h)≤Λ}
d1/p-H¨o l [(S3 ◦ Φm ) (h) , S3 (h)] = 0.
(15.39)
Proof. First, let us observe that for s < t in [0, T ] , we have, as ρ < 2, from Theorem 9.5 and Proposition 5.20, (S3 ◦ Φm ) (h)ρ-var;[s,t]
≤
c1 Φm (h)ρ-var;[s,t]
≤
c2 hρ-var;[s,t] .
Now using Proposition 15.7, we obtain for h with I (h) ≤ Λ, 1/2ρ 2 . (S3 ◦ Φm ) (h)ρ-var;[s,t] ≤ c3 Λ1/2 ω [s, t] In particular, we see that sup
sup
(S3 ◦ Φm )(h)2ρ-var;[0,1] ≤ sup
m ≥0 {h:I(h)≤Λ}
sup
(S3 ◦ Φm )(h)ρ-var;[0,1]
m {h:I(h)≤Λ}
< ∞ and, if ω is Holder-dominated, sup
sup
m {h:I (h)≤Λ}
(S3 ◦ Φm ) (h)1/2ρ-H¨o l;[0,1] < ∞.
In particular, we first see that by interpolation, to prove (15.38) and (15.39), it is enough to prove that lim
sup
m →∞ {h:I (h)≤Λ}
d0 [(S3 ◦ Φm ) (h) , S3 (h)] = 0.
We will actually prove the stronger statement lim
sup
m →∞ {h:I (h)≤Λ}
dρ -var;[0,1] [(S3 ◦ Φm ) (h) , S3 (h)] = 0,
for ρ ∈ (ρ, 2) . But, as we picked ρ < 2, we can use the uniform continuity on bounded sets of the map S3 (Corollary 9.11) to see that it only remains to prove sup dρ -var;[0,1] [Φm (h) , h] = 0. lim m →∞ {h:I (h)≤Λ}
Using interpolation once again, it is enough to prove that lim
sup
m →∞ {h:I (h)≤Λ}
d∞-var;[0,1] [Φm (h) , h] = 0.
This follows from m −1
d∞-var;[0,1] [Φm (h) , h] ≤ max |h|ρ-var; [ i i=0
≤
m
1/2 m −1
(2Λ)
max ω i=0
+1 , im
%
] i i+1 , m m
&2 1/2ρ .
That concludes the proof. We are now in a position to state the main theorem of this section: Theorem 15.55 Assume that (i) X = X 1 , . . . , X d is a centred continuous Gaussian process on [0, 1] with independent components; (ii) H denotes the Cameron–Martin space associated with X; (iii) the covariance of X is of finite ρ-variation dominated by some 2D control ω, for some ρ ∈ [1, 2); (iv) X denotes the natural lift of X to a G3 Rd -valued process. Then, for any p ∈ (2ρ, 4), the family (δ ε X)ε> 0 satisfies a large deviation principle in p-variation topology with good rate function, defined for x ∈ Co0,p-var [0, 1] , G3 Rd , given by J (x) =
1 !π 1 (x) , π 1 (x)"H if π 1 (x) ∈ H. 2
If ω is H¨ older-dominated then the large deviation principle holds in 1/pH¨ older topology. Proof. The proof is the same as in the Brownian motion case: after (re)stating the large deviation principle satisfied by S3 (εΦm (X)) , we only need to use the extended contraction principle and Lemmas 15.53 and 15.54 that (δ ε X)ε> 0 .
15.8 Support theorem We recall the standing assumptions. Under some probability measure P we have a d-dimensional Gaussian process X on [0, 1], always assumed to be centred, continuous, with independent components. We write H for the associated Cameron–Martin space. Under the assumption that X has covariance of finite ρ-variation for some ρ ∈ [1, 2), we have seen in Section 15.3.3 that X admits a natural lift to a G3 Rd -valued process X whose sample paths are, almost surely, geometric p-rough paths, p ∈ (2ρ, 4). We can and
15.8 Support theorem
449
will assume that P is a Gaussian measure on C [0, 1] , Rd so that X (ω) = ω t is realizedas a coordinate process. X can then be viewed as a measurable map from C [0, 1] , Rd into the Polish space Ω := C00,p-var [0, 1] , G3 Rd , 0,1/p-H¨o l resp. C0 [0, 1] , G3 Rd , almost surely defined as X (ω) = lim S3 ω D n , n →∞
in probability say, where ω D n denotes the piecewise linear approximation based on any sequence of dissections (Dn ) with mesh |Dn | tending to zero. The law of X is viewed as a Borel measure on Ω. We now introduce the assumption of complementary Young regularity. Condition 15.56 There exists q ≥ 1 with 1/p + 1/q > 1 so that H → C q -var [0, T ] , Rd . We say that H has complementary Young regularity to X.
Thanks to Proposition 15.7, Condition 15.56 is satisfied when X has covariance of finite ρ-variation for some ρ ∈ [1, 3/2); indeed, this follows from considering 1 1 + >1 ρ p where the critical value ρ∗ = 3/2 is obtained by replacing p by (its lower bound) 2ρ and “greater than 1” by “equal to 1”. Remark 15.57 An application of Proposition 15.5 shows that fractional Brownian motion (“ρ = 1/ (2H)”) satisfies Condition 15.56 for Hurst parameter H > 1/3. One can actually do better: it follows from Remark 15.10 that for any H > 1/4 complementary Young regularity holds. Lemma 15.58 Assume complementary Young regularity. Then, (i) for P-almost every ω we have ∀h ∈ H : X (ω + h) = Th X (ω) where T denotes the translation operator for geometric rough paths; (ii) for every h ∈ H the laws of X and Th X are equivalent. Proof. The arguments are essentially identical to those employed for Brownian motion (Theorem 13.36 and Proposition 13.37): Ad (i). By switching to a subsequence if needed, we may assume that X (ω) is defined as limn →∞ S3 ω D n whenever this limit exists (and arbitrarily on the remaining null-set N ). Now fix h ∈ H; using complementary Young regularity, we have S3 ω D n + hD n = Th D n S3 ω D n → Th X (ω) as n → ∞
450
Gaussian processes
and thus see that X (ω + h) = Th X (ω) for all h and ω ∈ / N. Ad (ii). By Cameron–Martin, the laws of X and X + h, as Borel measures on C [0, 1] , Rd , are equivalent. It follows that the image measures under the measurable map X (·), Borel measures on Ω, are equivalent. But this says precisely that the laws of X and X (· + h) are equivalent and the proof is finished since X (· + h) = Th X almost surely. Although elementary, let us spell out the following in its natural generality. Lemma 15.59 Let S, S be two Polish spaces and µ a Borel measure on S. Assume x ∈ supp [µ] and f is continuous at x. Then f (x) ∈ supp [f∗ µ]. If, in addition, S = S and f∗ µ ∼ µ then f (x) ∈ supp [µ]. Proof. Write Bδ (x) for an open ball, centred at x of radius δ > 0. For every ε > 0 there exists δ such that Bδ (x) ⊂ f −1 (Bε (f (x))) and hence 0 < µ (Bδ (x)) ≤ (f∗ µ) (Bε (f (x))) so that f (x) ∈ suppf∗ µ. If f∗ µ ∼ µ then and 0 < (f∗ µ) (Bε (f (x))) =⇒ 0 < µ (Bε (f (x))) and so f (x) ∈ supp [µ]. We are now ready to state the main result in this section. Theorem 15.60 Let X∗ P denote the law of X, a Borel measure on the Polish space C00,p-var [0, 1] , G3 Rd where p > 2ρ. Assume that complementary Young regularity holds. Then supp [X∗ P] = S3 (H), where support and closure are with respect to p-variation topology. If ω is 2 H¨ older-dominated, i.e. ω [s, t] ≤ K |t − s| for some constant K, we can use 1/p-H¨ older topology instead of p-variation topology. Proof. As a preliminary remark, note that S3 (H) is meaningful since any h ∈ H has finite ρ-variation (Proposition 15.7) and hence lifts canonically to a G3 Rd -valued path (of finite ρ-variation) by iterated Young integration (or more precisely, as an application of Theorem 9.5). Step 1: ⊂-inclusion. Since X {1,...,n } := E X· |F{1,...,n } ∈ H almost surely and converges to X in the respective rough path metrics, the first inclusion is clear. ˆ ∈ C [0, 1] , Rd Step 2: ⊃-inclusion. The idea is to find at least one fixed ω such that X (ˆ ω ) ∈ supp [X∗ P] and such that there exists a (deterministic!) ˆ , such that T−g n X (ˆ ω) = sequence (gn ) ⊂ H, which can and will depend on ω X (ˆ ω − gn ) → X (0) = S3 (0) in rough path metric. Having found such an element ω ˆ (with suitable sequence gn ) we can applyLemma 15.59 with µ as the law of X, a Borel measure on S = C00,p-var [0, 1] , G3 Rd resp. 0,1/p-H¨o l C0 [0, 1] , G3 Rd , S = S and continuous function f : S → S given by f : x → T−g n x; using the fact that the law of Th X is equivalent to
15.9 Appendix: some estimates in G3 Rd
451
the law of X, cf. Lemma 15.58, we conclude that T−g n X (ˆ ω ) ∈ supp [X∗ P]. This holds true for all n and by closedness of the support, the limit X (0) = S3 (0) must be in the support. The same argument shows that any further translate Th S3 (0) = S3 (h) must be in the support and thus supp [X∗ P] ⊃ S3 (H) . Passing the (p-variation resp. 1/p-H¨ older rough path) closure on both sides then finishes the proof. It remains to see how to find ω ˆ with the required properties. Since X (ω) ∈ supp[X∗ P] and T−g n X (ω) = X (ω − gn ) holds / N1 will true for almost every ω, there is a null-set N1 so that any ω ∈ have these properties. Furthermore, Theorem 15.47, tells us that there is ˆ∈ / (N2 ∪ N1 ) and another null-set N2 so that we can pick ω m X (ˆ ω ) = lim S3 ξ (hk ) |ωˆ hk (·) = lim X{1,...,m } (ˆ ω) m →∞
m →∞
i=1
ω ) → S3 (0) . X{n +1,n +2,... } (ˆ It now suffices to set gn (·) = see that
n i=1
ξ (hk ) |ωˆ hk (·) ∈ H → C q -var ; we then
= T−g n X (ˆ ω ) = lim T−g n X{1,...,m } (ˆ ω)
X (ˆ ω − gn )
= =
lim X
m →∞ {n +1,...,m }
m →∞ {n +1,n +2,... }
X
(ˆ ω)
(ˆ ω ) → X (0) = S3 (0) ,
as required, and this finishes the proof.
15.9 Appendix: some estimates in G3 Rd Proposition 15.61 Let g ∈ G3 Rd . Then, for some constant c, i,j 2 (i) |π 2 (g)| ≤ c maxi,j distinct g + |π 1 (g)| ; i,i,j i,j,k + |π 1 (g)|3 + |π 2 (g)|3/2 . (ii) |π 3 (g)| ≤ c maxi,j,k distinct g , g Proof. Pick a path x ∈ C01-var [0, 1] , Rd such that g = S3 (x)0,1 =: x0,1 . Then, statement (i) follows from the calculus identity 1 1 i 2 1 2 i,i i,j x1 = |π 1 (g)| . xi dxi = g = x0,1 ≡ 2 2 0 For (ii) we use the basic inequality ab ≤ 13 a3 + 23 b3/2 plus the identities g j,i,i g
i,j,i
= g j g i,i − g i g i,j + g i,i,j , = g j g i,i − g i,i,j − g j,i,i = g i g i,j − 2g i,i,j ,
(15.40) (15.41)
Gaussian processes
452
which we now establish by calculus. Indeed, (15.40) follows from xj,i,i = dxju 1 dxiu 2 dxiu 3 0,1 0< u 1 < u 2 < u 3 < 1 i 2 j i 1 xu ,1 dxu = 1 x0,1 − xi0,u 2 dxju = 2 0< u < 1 2 0< u < 1 =
j i,j i,i,j i xi,i 0,1 x0,1 − x0,1 x0,1 + x0,1
j,i,i whereas (15.41) follows from the fact that xi,j,i 0,1 + x0,1 equals
xiu xju dxiu 0< u < 1
2 1 1 = xi0,1 xj0,1 − 2 2
i 2 j xu dx
0,u
j i,i,j = xi,i 0,1 x0,1 − x0,1 .
0< u < 1
The proof is finished. Proposition 15.62 Let g, h ∈ G3 Rd with g , h ≤ M for some positive constant M. Assume that for all distinct indices i, j, k ∈ {1, . . . , d} i g − hi ≤ εM, i,j g − hi,j ≤ εM 2 , i,i,j g − hi,i,j ≤ εM 3 , i,j,k g − hi,j,k ≤ εM 3 . Then δ 1/M (g − h) ≤ cε for some constant c. Proof. We may replace g, h by δ 1/M g, δ 1/M h and hence there is no loss of generality assuming M = 1. The proof is now similar to the previous one.
15.10 Comments Our exposition here follows in essence Friz and Victoir [61]. The lift of certain Gaussian processes, including fractional Brownian motion with Hurst parameter H > 1/4, is due to Coutin and Qian [32] and based on piecewise linear approximations. The key role of (enough) decorrelation of increments for the existence of stochastic area was also pointed out by Lyons and Qian [120]. Karhunen–Lo`eve approximations for fractional Brownian motion are studied by Millet and Sanz-Sol´e [130] and also Feyel and de la Pradelle [52], implicitly in Friz and Victoir [63]. We remark that equations (15.29), (15.30) explain why martingale arguments (see also Coutin and Victoir [33], Friz and Victoir [62] and Friz [69]) are enough to discuss the step-two case (H > 1/3), whereas equation (15.32) shows that the step-three case requires
15.10 Comments
453
additional care. A large deviation principle for the lift of fractional Brownian motion was obtained by Millet and Sanz-Sol´e [129], for the Coutin–Qian class in Friz and Victoir [65]. Support statements for lifted fractional Brownian motion, for H > 1/3, appeared in Feyel and de la Pradelle [52] and Friz and Victoir [63]. Our Theorem 15.60 may also be obtained by applying the abstract support theorem of Aida–Kusuoka–Stroock [2, Corollary 1.13]. We conjecture that complementary Young regularity (Condition 15.56) is not needed for Theorem 15.60 to hold true.
16 Markov processes We have seen in a previous chapter that Brownian motion B can be enhanced to a stochastic process B = B (ω) for which almost every realization is a geometric 1/p-H¨older rough path, p ∈ (2, 3). As is well known,1 d-dimensional Brownian motion B is a diffusion, i.e. a Markov process with continuous sample paths, with generator 1 2 1 ∆= ∂ . 2 2 i=1 i d
In the present chapter, our aim is to replace Brownian motion by a diffusion X = X a with uniformly elliptic generator in divergence form, d 1 ij ∂i a ∂j · , 2 i,j =1 a followed by the construction of a suitable ij lifted process X with geomethad enough regularity, one could ric rough (sample) paths. If a = a effectively realize X a as a semi-martingale, and then construct Xa as an enhanced semi-martingale. However, assuming no regularity (beyond measurability), this route fails. The generator for X a itself is only defined in the “weak” sense and the focus must be on the bilinear (“Dirichlet”) form (f, g) , (f, g) → Rd i,j ai,j ∂i f ∂j g dx. The main tool in this chapter will be a suitable Dirichlet form that allows for a direct construction and analysis of Xa . From Section 16.2 on, we shall rely heavily on the well-developed theory of Dirichlet forms. The essentials (for our purposes) are collected in Appendix E.
16.1 Motivation As is common in the theory of partial differential equations, we shall assume that a = a (x) is a symmetric matrix such that for some Λ > 0, ∀ξ ∈ Rd : 1 See,
1 2 2 |ξ| ≤ ξ · a (.) ξ ≤ Λ |ξ| ; Λ
for example, [143] Chapter VII, Proposition (1.11).
16.1 Motivation
455
no regularity of a = a (x) is assumed (besides measurability in x ∈ Rd ). Let us say again2 that the study of such a diffusion process X a , which in many ways behaves like a Brownian motion, relies heavily on analytic Dirichlet form techniques. In general, X a cannot be constructed as a solution to a stochastic differential equation and need not be a semimartingale. The main idea is roughly the following. Assume at first that a (.) is smooth, with bounded derivatives of all orders. In this case, X a can be constructed as a solution to a stochastic differential equation: it suffices to write the generator of X a in non-divergent form as d 1 ij a ∂i ∂j + ∂i aij ∂j ; 2 i,j =1
then, knowing that a admits a Lipschitz square root,3 say σ, so that o calculus to see that the diffuσσ T = a, it is a standard exercise in Itˆ sion constructed as a solution to the (Itˆo) stochastic differential equation d j ij ∂i a dX = σ (X) dB + b (X) dt, with b = b = i=1
has indeed generator 12 ∂i aij ∂j · . Moreover, X=X a , the so-constructed process,4 is plainly a semimartingale X and hence, following Section 14.1, there is a well-defined stochastic area process t t 1 a;i a;j a;j a;i , with 1 ≤ i < j ≤ d. t → Aa;i,j = X dX − X dX s s 0,t 0,s 0,s 2 0 0 It is not hard o calculus, cf. Exercise 16.15) a to asee (also using standard Itˆ that t → X0,t , A0,t is a diffusion process on Rd ⊕ so (d), started at 0, with generator given by d 1 ij ui a uj · La := 2 i,j =1 where, for i = 1, . . . , d, 1 ui |x = ∂i + 2
1≤j < i≤d
2 See,
x1;j ∂j,i −
x1;j ∂i,j .
(16.1)
1≤i< j ≤d
for example, [70, 158]. for example, [162, Theorem 5.2.2]. 4 Strictly speaking, this construction depends on the choice of square root and one may prefer to write X σ . However, we shall construct the lifted process in such a way that its generator (and hence its law) only depends on σσ T = a; thereby justifying our notation. 3 See,
Markov processes
456
Here, ∂i denotes the ith coordinate vector field on Rd and ∂i,j with i < j the respective coordinate vector field on so (d), identified with its upper diagonal elements. Following Sections 13.2 and 14.1, the enhancement to X a is of the form t exp a a a a a a X0,s ⊗ ◦dXs X0,t ≡ X0,t , A0,t 1, X0,t , log
0
where we can switch between the “path area view” and the “iterated Stratonovich integral view” using exp : Rd ⊕ so (d) ≡ g2 Rd → G2 Rd and its inverse log, respectively. This suggests constructing Xa directly as a g2 Rd-valued Markov process with generator La . In fact, for f, g ∈ Cc∞ g2 Rd integration by parts shows that ai,j ui f uj g dm. !La f, g" = g2 (Rd ) i,j
(Observe that a (.) may indeed be a function on g2 Rd rather than only Rd . In fact, this construction is carried out naturally on gN Rd , which allows for a direct “Markovian” modelling of the “higher-order” areas of Xa .) The right-hand side above, which involves no derivatives of aij , is another ij instance of a Dirichlet form, and allows us to deal with measurable a . Remark 16.1 We shall find it more convenient in the present chapter to adopt the path area view and define the enhanced Markov process Xa as the g2 Rd -valued process (X a , Aa ). Upon setting (x, A) ∗ (x , A )
≡ log (exp (x, A) ⊗ (exp (x , A ))) 1 = x + x , A + A + (x ⊗ x − x ⊗ x) , 2
−1
= (−x, −A) , (x, A) 2 d we see that exp : g R , ∗ → G2 Rd , ⊗ is a Lie group isomorphism. We then can and will work in g2 Rd identified with G2 Rd , using identical notation. For instance, the Carnot–Caratheodory norm and distance are given by (x, A)
=
1/2
exp (x, A) ∼ |x| + |A| −1 (x, A) ∗ (x , A ) ;
,
d ((x, A) , (x , A )) = elements in C α -H¨o l [0, 1] , g2 Rd , α ∈ (1/3, 1/2), are α-H¨ older geometric 2 d R is replaced rough paths and so forth. The same remarks apply when g we can always use the exponential map to identify by gN Rd , noting that N d N d g R with G R . For instance, the Lyons lift becomes sN : C α -H¨o l [0, 1] , g2 Rd → C α -H¨o l [0, 1] , gN Rd
16.2 Uniformly subelliptic Dirichlet forms
457
where, writing expk for the exponential map from gk Rd to Gk Rd , sN := exp−1 N ◦SN ◦ exp2 .
16.2 Uniformly subelliptic Dirichlet forms The Lie algebra g = gN Rd is naturally graded in the sense that it has a vector space decomposition gN Rd = V1 ⊕ · · · ⊕ VN ⊗n where V1 ∼ is given by5 = Rd , and V2 ∼ = so (d) , and Vn ⊂ Rd Vn ∼ = Rd , . . . , Rd , Rd . . . = span [v1 , . . . , [vn −1 , vn ]] : v1 , . . . , vn ∈ Rd . The Campbell–Baker–Hausdorff formula makes (g, ∗) into a Lie group, iso morphic to GN Rd , ⊗ ). There are left-invariant vector fields u1 , . . . , ud on g determined by ui |0 = ∂i |0 , i = 1, . . . , d, where ∂i |0 are the coordinate vector fields associated with the canonical basis of V1 = Rd and T ∇hyp = (u1 , . . . , ud ) is the hypoelliptic gradient on g. Example 16.2 When N = 1, g1 Rd ∼ = Rd and the ui are precisely the standard coordinate vector fields ∂i . When N = 2, we can identify g = g2 Rd with Rd ⊕ so (d) and in this case ui takes the form given in (16.1). Definition 16.3 [Ξ (Λ)] For Λ ≥ 1 we call Ξ (Λ) = ΞN ,d (Λ) the set of all measurable maps a (.) from g = gN Rd into the space of symmetric matrices such that ∀ξ ∈ Rd :
1 2 2 |ξ| ≤ ξ · a (.) ξ ≤ Λ |ξ| . Λ
Theorem 16.4 Fix Λ ≥ 1. For f, g ∈ Cc∞ (g,R) and a ∈ Ξ (Λ) we define the carr´e-du-champ operator Γa (f, g) := ∇hyp f · a∇hyp g =
d i,j =1
5 Recall
that [a, b] = a ⊗ b = b ⊗ a.
ai,j ui f uj g
Markov processes
458
and dm (x) = dx denotes the Lebesgue measure on g, a a Γ (f, g) dm . E (f, g) := g
When a = I, the identity matrix, we simply write Γ, E rather than ΓI , E I . Then E a extends to a regular Dirichlet form, as defined in Section E.2, Appendix E, which possesses Cc∞ (g,R) as a core. The domain of E a , denoted by W 1,2 (g,dm) := D (E a ), does not depend on the particular choice of a ∈ Ξ (Λ) and is given as the closure of Cc∞ -functions with respect to 2
|f |W 1 , 2 (g,dm ) := E (f, f ) + !f, f "L 2 (g,dm ) . At last, E a is strongly local (in the sense of Definition E.3). Proof. We first discuss the case a = I. By invariance of the Lebesgue measure m under (left and right) multiplication on (g, ∗), established in Proposition 16.40 in the appendix to this chapter, one sees that the vector fields u1 , . . . , ud are formally skew-symmetric so that, for any f, g ∈ Cc∞ (g,R), E (f, g) =
d
!ui f, ui g"L 2 = −
i=1
d
!ui ui f, g"L 2 = !Lf, g"
i=1
2 where L is given in H¨ormander form i=1 u2i . Consider now a sequence of Cc∞ -functions fn → 0 in L2 so that E (fn − fm , fn − fm ) → 0 as n, m → ∞. To see that E is closeable with core Cc∞ (g,R), and hence extends to a regular Dirichlet form, we need to check that E (fn , fn ) → 0 as n → ∞. To this end, fix ε > 0 and pick k large enough so that E (fn − fk , fn − fk ) ≤ 1 for all n > k. Using bilinearity and E (fn , fk ) ≤ E (fn , fn ) easily follows that sup n ∈{k ,k +1,... }
1/2
1/2
E (fk , fk )
it
E (fn , fn ) ≤ C < ∞
where C depends on E (fk , fk ) only. Moreover, for all n > m > k, E (fn , fn ) = !Lfm , fn "L 2 + E (fn − fm , fn ) ≤
1/2
|fn |L 2 |Lfm |L 2 + CE (fn − fm , fn − fm )
so that we can first choose m large enough such that for all n > m, 1/2
CE (fn − fm , fn − fm )
< ε/2,
followed by taking n large enough so that |fn |L 2 |Lfm |L 2 < ε/2. But this shows that for n large enough E (fn , fn ) < ε, as required. For the discussion
16.2 Uniformly subelliptic Dirichlet forms
459
of a = I, let us note that E (fn , fn ) → 0 can be equivalently expressed by saying 2 ∀i ∈ {1, . . . , d} : |ui fn |L 2 → 0 as m → ∞ and by passing to a subsequence we may assume that ui fn → 0 a.e. for all i = 1, . . . , d. Hence d E a (fm , fm ) = lim ai,j ui (fn − fm ) uj (fn − fm ) dm n →∞
i,j =1
≤ lim inf E a (fn − fm , fn − fm ) m →∞
by Fatou’s lemma, which shows that E a is also closeable. Strong locality of the resulting Dirichlet form, also denoted by E a , is a simple consequence of the fact that the ui are (pure) first-order differential operators. We now establish three important properties related to this setup. Proposition 16.5 Let a ∈ ΞN ,d (Λ). Then the following hold. (i) The intrinsic distance associated with E a , da (x, y) = sup {f (x) − f (y) : f ∈ D (E a ) ∩ Cc (g) and Γa (f, f ) ≤ 1} , defines a genuine metric on g. When a = I, it coincides with the Carnot– Caratheodory metric d on g. Otherwise we have, for all x, y ∈ g, 1 d (x, y) ≤ da (x, y) ≤ Λ1/2 d (x, y) . Λ1/2 In particular, the topology induced by da coincides with the canonical topology on g. (ii) Set B a (x, r) = {y ∈ g : da (x, y) < r}. Then, ∀r ≥ 0 and x ∈ g : m (B a (x, 2r)) ≤ 2Q m (B a (x, r)) with doubling constant Q given by Q = (dimH g) (1 + 2 ln Λ/ ln 2) where dimH (g) is the “homogenous” dimension6 of g defined by dimH (g) =
N
n dim Vn .
n =1
(iii) The weak Poincar´e inequality; that is, for all r ≥ 0, x ∈ g and f ∈ D (E), f − f¯ 2 dm ≤ Cr2 Γ (f, f ) dm B a (x,r )
B a (x,2r )
6 As a matter of fact, it is also the Haussdorff dimension of g when equipped with Carnot–Caratheodory metric.
Markov processes
460
where C = C (Λ, . . . ) and f¯ is the average of f over B a (x, r), i.e. −1 f¯ = m (B a (x, r)) f dm. B a (x,r )
Proof. Case 1: a = I, the identity matrix. In this case (i)–(iii) are wellknown facts from analysis on free nilpotent groups. For the sake of completeness, statement (i) is shown in Section 16.9.3; (ii) follows by leftinvariance and scaling; noting that on g = gN Rd , the dilation map δ 2 : x(1) , . . . , x(N ) → 2x(1) , . . . , 2N x(N ) has a Jacobian with value 2dim g . A few more details are given in Section 16.9.1. At last, (iii) is a Poincar´e inequality and the reader can find a self-contained proof in Section 16.9.2. Case 2: a ∈ Ξ (Λ) and no regularity assumptions beyond measurability. The key observation is that E I and E a are (obviously) quasi-isometric in the sense that 1 E (f, f ) ≤ E a (f, f ) ≤ ΛE (f, f ) Λ
or
1 Γ (f, f ) ≤ Γa (f, f ) ≤ ΛΓ (f, f ) Λ
and we conclude with invariance of properties (i)–(iii) under quasi-isometry (cf. Theorem E.8 in Appendix E). As it turns out (cf. Section E.4 in Appendix E for precise statements), the just-established properties (i)–(iii) allow us to use a highly developed, essentially analytic machinery. In particular, E a determines a (non-positive) self-adjoint operator La on L2 (g, dm) and weak (local) solutions to ∂t u = La u satisfy a parabolic Harnack inequality as well as H¨ older regularity in spacetime. More precisely, we have Proposition 16.6 (parabolic Harnack inequality) Let a ∈ ΞN ,d (Λ). There exists a constant7 CH = CH (Λ) such that sup (s,y )∈Q −
u (s, y) ≤ CH
inf
(s,y )∈Q +
u (s, y) ,
whenever u is a non-negative weak solution of the parabolic partial differential equation ∂t u = La u on some cylinder Q = t − 4r2 , t × B a (x, 2r) − 2 2 + reals at, r > 0. Here, Q = t − 3r , t − 2r × B (x, r) and Q = for some 2 t − r , t × B (x, r) are lower and upper subcylinders of Q separated by a lapse of time. 7 As
usual, dependence on N, d is not explicitly written.
16.2 Uniformly subelliptic Dirichlet forms
461
Proposition 16.7 (de Giorgi–Moser–Nash regularity) Let a ∈ Ξ (Λ). Then there exist constants η ∈ (0, 1) and CR , only depending on Λ, such that η 1/2 |s − s | + da (y, y ) |u (s, y) − u (s , y )| ≤ CR sup |u| . sup r u ∈Q 2 (s,y ),(s ,y )∈Q 1 whenever u is a non-negative weak solution of theparabolicpartial differen2 Q2 ≡ tial equation ∂s u = La u on some t − 4r , t × B (x, 2r) for cylinder 2 2 some reals t, r > 0. Here Q1 ≡ t − r , t − 2r × B (x, r) is a subcylinder of Q2 . We also note that the L2 -semi-group (Pta : t ≥ 0) associated with La resp. E a admits a kernel representation8 of the form (Pta f ) = f (y) pa (t, ·, y) dm (y) (16.2) where the so-called heat-kernel p is a non-negative weak solution of the parabolic partial differential equation ∂s u = La u with (distributional) initial data u (0, ·) = δ x . Thanks to self-adjointness of La , the heat-kernel p = pa (t, x, y) is symmetric in x and y. As discussed in the generality of Section E.5, Appendix E, the heat-kernel allows for the construction of a continuous, symmetric diffusion process X = Xa,x associated with La resp. E a so that the finite-dimensional distributions of X are given by a;x pa (t1 , x, y1 ) . . . pa (tn P [(Xt 1 , . . . , Xt n ) ∈ B] = B
− tn −1 , yn −1 , yn ) dy1 . . . dyn . We remark that Pa,x = X∗ Pa;x , the law of X, can be viewed as a Borel measure on Cx ([0, ∞), g). Although it is not always necessary to be specific about the underlying probability space, this allows us to realize X as a coordinate process on the path-space, i.e. Xt (ω) = ω t for ω ∈ Cx ([0, ∞), g), equipped with Pa,x . Proposition 16.8 (weak scaling) For any a ∈ Ξ (Λ) , r = 0 set ar (x) := a δ 1/r x ∈ Ξ (Λ), where we recall that δ denotes dilation on g, r a,δ (x) D Xat ,x : t ≥ 0 = δ r Xt/r 12 / r :t≥0 . Proof. It is easy to see, cf. Remark E.16, that (Xaλt : t ≥ 0) is the symmetric diffusion associated with λ2 E a . On the other hand, our state space has a structure that allows spatial scaling via dilation. Then the generator of 8 See
Exercise 16.10 below for a direct proof.
Markov processes
462
r (δ r Xat : t ≥ 0) is given by r2 La (δ 1 / r ·) ≡ r2 La or, equivalently, the Dirichlet r 2 ) shows form r2 E a . Combining these two transformations (take λ = 1/r a ar that δ r Xt/r 2 : t ≥ 0 has associated Dirichlet form given by E . It is also r
clear that starting δ r Xat/r 2 at x is tantamount to starting Xat/r 2 at δ 1/r (x). Exercise 16.9 Let B be an enhanced Brownian motion. (i) Show that XI /2;x ≡ x ∗ log SN ((B)) d is a symmetric diffusion, started at x ∈ gN Rd , with generator i=1 ui ◦ui . (ii) Use scaling for enhanced Brownian motion to deduce the on-diagonal heat-kernel estimate p (t, x, x) ≤ t− dim H
g/2
p (1, δ t −1 / 2 x, δ t −1 / 2 x) ≤ ct− dim H
(This is equivalent to |Pt |L 1 →L ∞ ≤ ct− dim H semi-group.)
g/2
g/2
.
where Pt is the associated
Exercise 16.10 (i) Assume E is an abstract (symmetric) Dirichlet form and write (Pt ) for the associated Markovian semi-group. Let ν ∈ (0, ∞). Show that the following two statements are equivalent: - there exists C1 such that for all t > 0, |Pt |L 1 →L ∞ ≤ C1 t−ν /2 ;
(16.3)
- Nash’s inequality holds, i.e. there exists C2 such that for all f ∈ D (E)∩ L1 , 2+ 4/ν
|f |L 2
2/ν
≤ C2 E (f, f ) |f |L 1 .
(16.4)
(When switching between the two estimates, the constant Cj depends only on Ci and ν.) N ,d (Λ) and the Dirichlet form E a on L2 (g, dm) (ii) Consider now d a ∈ Ξ N where g = g R as usual. Use Exercise 16.9 and invariance of (16.4) under quasi-isometry to establish |Pta |L 1 →L ∞ ≤ C3 t− dim H
g/2
.
(iii) Deduce the existence of a heat-kernel pa , with the on-diagonal estimate ∀t > 0, x ∈ g : pa (t, x, x) ≤ C3 t− dim H so that (16.2) holds for any f ∈ L2 .
g/2
,
16.3 Heat-kernel estimates
463
16.3 Heat-kernel estimates As in the previous section, write g = gN Rd . We now turn to “Gaussian” estimates of the heat-kernel pa : (0, ∞) × g × g → [0, ∞). Sharp estimates involve the intrinsic metric da on g, introduced in Proposition 16.5; although, for most (rough path) purposes one can use the Carnot– Caratheodory metric d. Once more, following Section E.4, Appendix E, all results of this Section are an automatic consequence of properties (i)–(iii) in Proposition 16.5. Nonetheless, it is instructive to note that an application of Harnack’s inequality immediately leads to pa (t, x, x) ≤ c1
1 1 ≤ c2 ≤ c3 t− dim H m B a x, t1/2 m B x, t1/2
g/2
,
where c1 , c2 , c3 depend only on Λ, in agreement with the conclusion of Exercise 16.10. We now state the full heat-kernel estimates. Theorem 16.11 Let a ∈ Ξ (Λ). Then, for all t > 0 and x, y ∈ g we have: (i) (upper heat-kernel bound) for any ε > 0 fixed there exists Cu = Cu (ε, Λ) such that 2 a C d (x, y) u ; exp − pa (t, x, y) ≤ √ (4 + ε) t tdim H g (ii) (lower heat-kernel bound) there exists Cl = Cl (Λ) such that 2 1 Cl da (x, y) 1 a √ . p (t, x, y) ≥ exp − Cl tdim H g t Proof. An immediate corollary of the abstract heat-kernel estimates in Section E.4, Appendix E. Corollary 16.12 For any a ∈ Ξ (Λ), write X = Xa,x for the (continuous) g-valued diffusion process associated with E a , started at x ∈ g. Then, (i) for all η < 1/ (4Λ) we have 2 d (Xt , Xs ) a,x < ∞; (16.5) exp η sup sup E Mη := sup t−s a∈Ξ(Λ) x∈g2 (Rd ) 0≤s< t≤1 moreover, there exists C (Λ) such that Mη ≤ 1 + Cη ≤ exp (Cη) for all 1 ; η ∈ 0, 16Λ (ii) for any α ∈ (0, 1/2), there exists c = c (α, Λ) such that9 2 sup sup Ea,x exp cη Xα -H¨o l;[0,1] < ∞; a∈Ξ(Λ) x∈g2 (Rd )
9 By convention, X α -H ¨o l;[0 , 1 ] is defined with respect to the Carnot–Caratheodory d on g.
Markov processes
464
(iii) there exists c = c (Λ) such that sup
sup
a∈Ξ(Λ) x∈g2 (Rd )
2 Ea,x exp cη Xψ 2 , 1 -var;[0,1] < ∞.
Proof. A straightforward computation shows that the upper heat-kernel estimate implies (16.5); this is not specific to the present setting and hence is carried out in a general context in Section E.6.1, Appendix E. The estimate on Mη for small λ < 1/ (16Λ) follows readily from the inequality exp (x) ≤ 1 + x exp (x) , for x > 0, and we obtain 2 2 d (Xt , Xs ) d (Xt , Xs ) a,x exp η . Mη ≤ 1 + η sup sup E t−s t−s x∈Rd s< t∈[0,1] Now it suffices to apply Cauchy–Schwarz, noting that 2η ≤ 1/ (8Λ) < 1/ (4Λ). The Fernique estimates for Xα -H¨o l;[0,1] are then a consequence of general principles, namely Theorem A.19 in the Appendix A. Recall that pa (t, x, y) dy is precisely the law of Xa;x t , i.e. the marginal law of a g-valued diffusion process. We now state a “localized” lower bound by considering the process Xa;x killed at its first exit from a fixed ball in g. Theorem 16.13 (localized lower heat-kernel bound) Let a ∈ Ξ (Λ) , x, x0 ∈ g, r > 0 and write X = Xa,x for the diffusion process associated with E a , started at x. Also set ∈ / B a (x0 , r)} . ξ B (x 0 ,r ) = inf {t ≥ 0 : Xa;x t Then the measure Pa;x Xt ∈ · , ξ B (x 0 ,r ) > t admits a density paB (x 0 ,r ) (t, x, y) dy with respect to the Lebesgue measure on g. Moreover, if x, y are two elements of B a (x0 , r) joined by a curve γ which is at a da -distance R > 0 of g/B a (x0 , r), there exists a constant Cll = Cll (Λ) such that 2 Cll t 1 da (x, y) a exp − 2 exp −Cll pB (x 0 ,r ) (t, x, y) ≥ 2 t R Cll δ d /2 where δ = min t, R2 .
16.4 Markovian rough paths The considerations of the previous section, with g = gN Rd and uniformly elliptic matrix a ∈ Ξ (Λ) = ΞN ,d (Λ), apply to every fixed N ∈ {1, 2, . . . }. Corollary 16.12 tells us that the g-valued process Xa has a.s. sample paths of finite α-H¨ older regularity (with respect to the Carnot–Caratheodory metric on g), for any α ∈ [0, 1/2).
16.4 Markovian rough paths
465
For N = 1 and a ∈ Ξ1,d (Λ) we prefer to write X a (instead of Xa . . . ) and note that X a is an Rd -valued Markov process. Similar to Brownian motion, its sample paths are not geometric rough paths. If a is smooth, X a is a semia martingale but for general a ∈ Ξ1,d (Λ), not be a semi-martingale. X need 1 1 For N ≥ 2 we can pick any α ∈ N +1 , N and thus obtain a Markov process Xa whose sample paths are a.s. α-H¨ older geometric rough paths. Of course, by means of the Lyons lift we have a deterministic one-to-one correspondence (cf. Theorem 9.12), applicable to almost every realization of Xa , ˜ a ≡ (1, π 1 (Xa ) , π 2 (Xa )) ↔ Xa , X and we can recover {Xas : 0 ≤ s ≤ t} from its level-two projection ˜ as : 0 ≤ s ≤ t . On the other hand, the projected process X ˜ a need not X a be Markovian: for instance, when N = 3, the future evolution d of X (and a a 3 ˜ thus of X ) will depend on the current state of X ∈ g R and thus, in ˜a. general, on π 3 (Xa ) which is not part of the state space for X Definition 16.14 (i) Let N ≥ 2 and a ∈ ΞN ,d (Λ). Almost every sample path process Xa , constructed from the Dirichlet form E a on ofa Markov L2 gN Rd , a ∈ ΞN ,d (Λ), is an α-H¨older geometric rough path,10 for some α < 1/2, and is called a Markovian rough path. (ii) Let a ∈ Ξ1,d (Λ) , a ◦ π 1 ∈ Ξ2,d (Λ) and fix x ∈ g2 Rd . The g2 Rd a◦π 1 ,x valued Markov , constructed from the Dirichlet form E a◦π 1 process X on L2 g2 Rd , is called a natural lift of X a;π 1 (x) or an enhanced Markov process. The “naturality” of our definition of an enhanced Markov process comes from various points of view. (i) If a ∈ Ξ1,d (Λ) is smooth then X a is a semi-martingale; following Section 14.1, we can then enhance X a with its stochastic area, a by iterated Stratonovich integration, and so obtain say A ,dgiven 2 a g R -valued lift of X a which is seen to have the same law as Xa◦π 1 , defined via the Dirichlet form E a◦π 1 . (See Exercise 16.15 below.) (ii) If a general a ∈ Ξ1,d (Λ) is the limit of smooth an ⊂ Ξ1,d (Λ), then Xa n ◦π 1 converges weakly and the limiting law coincides with that implied by E a◦π 1 . (See Exercise 16.15 below.) a (ii bis) This implies that, if for general a ∈ Ξ1,d (Λ) we construct X via a d E (for instance, on the canonical path space C [0, 1] , R with appropriate measure P a ) and also Xa◦π 1 via E a◦π 1 (for instance, 1 0 . . . on
any interval [0, T ] . . .
Markov processes
466
on the canonical path space C [0, 1] , g2 Rd with appropriate measure Pa◦π 1 ), then D
π 1 (Xa◦π 1 ) = X a
or, equivalently, (π 1 )∗ Pa◦π 1 = P a .
(16.6)
(See Exercise 16.15 below.) (iii) For any a ∈ Ξ2,d (Λ) we can construct X = Xa via E a (for in2 d with apstance, on the canonical path space C [0, 1] , g R propriate measure Pa ). We will see that, on that same probabilty space, for α < 1/2, D dα -H¨o l;[0,1] S2 π 1 (X) m , X → 0 in Pa -probability. Here xD m denotes the piecewise linear approximations, and (Dm ) ⊂ D [0, 1] is a sequence of dissection with mesh |Dm | → 0. (See Theorem 16.25 in Section 16.5.2.) X = X a via E a (for instance, on (iv) If a ∈ Ξ1,d (Λ) we can construct the canonical path space C [0, 1] , Rd with appropriate measure ˜ defined on the P a ). We can then ask if there exists a process X same probability space as X such that ˜ → 0 in P a -probability for α < 1/2. dα -H¨o l S2 X D m , X D The answer is affirmative. Indeed, from (iii) π 1 (X) m , m ∈ N is Cauchy with respect to dα -H¨o l in Pa -probability and hence, ˜ say, using (16.6), also Cauchy in P a -probability, with limit X constructed on the same probability space as X. On the other D D D ˜ = X. hand, since π 1 (X) m = X D m for all m we must have X ˜ does not depend on the partic(This also shows that the limit X ˜ ular sequence (Dm ) underlying the construction of X.) Exercise 16.15 Fix x ∈ g2 Rd and take, for simplicity of notation only, x = 0. Let a ∈ Ξ1,d (Λ) be smooth. o stochastic differential (i) Construct X a = X a,0 as a solution to an Itˆ equation and verify that X a is a semi-martingale. (ii) From Section 14.1, we can enhance X a with its stochastic area, say Aa , given by iterated Stratonovich integration, and so obtain a g2 Rd valued lift of X a . Verify that this is consistent with of the construction Xa◦π 1 = Xa◦π 1 ,0 via the Dirichlet form E a◦π 1 on L2 g2 Rd in the sense that D (X a , Aa ) = Xa◦π 1 .
16.5 Strong approximations
467
a◦π 1 a◦π 1 Deduce viewed as a Borel measure that, if2 P d denotes the law aof X P denotes the law of X a viewed as a on C [0, 1] , g R , and similarly Borel measure on C0 [0, 1] , Rd , then
(π 1 )∗ (Pa◦π 1 ) = P a . (iii) Let (˜ an ) ⊂ Ξ2,d (Λ) be smooth so that a ˜n → a ˜ ∈ Ξ2,d (Λ) a.s. (which is always possible by using mollifier approximations for a given a ˜ ∈ Ξ2,d (Λ)). As we shall see later in Section 16.6, this entails weak convergence (with respect to uniform topology, say) Xa˜ n =⇒ Xa˜ . Apply this convergence result to a ˜n = an ◦ π 1 , where an ∈ Ξ1,d is a smooth mollifier approximation to a ∈ Ξ1,d . Conclude that (16.6) remains valid by the (uniformly for all a ∈ Ξ1,d (Λ), where Pa◦π 1 is fully determined a P subelliptic) Dirichlet form E a◦π 1 on L2 g2 Rd and d fully determined a 2 by the (uniformly elliptic) Dirichlet form E on L R .
16.5 Strong approximations 16.5.1 Geodesic approximations
Recall that g = gN Rd equipped with Carnot–Caratheodory distance d is a geodesic space. Given dissection D of [0, 1] and a deterministic a path x ∈ C α -H¨o l [0, 1] , gN Rd we can approximate x by its piecewise geodesic approximation, denoted by xD , obtained by connecting the points (xt i : ti ∈ D) with geodesics run at unit speed. This was already discussed in Section 5.2, where we saw that sup xD α -H¨o l;[0,1] ≤ 31−α xα -H¨o l;[0,1] (16.7) D ∈D[0,1]
and also xD → x uniformly on [0, 1] as |D| → 0. Of course, our state space g, which may be identified with GN Rd , has additional structure. The approximation xD has finite length and so has its projection geodesic D D π 1 x to an Rd-valued path. We can then recover x by computing areaD integral(s) of π 1 x , formally xD = log SN π 1 xD ≡ sN π 1 xD . By interpolation (cf. Proposition 8.17) it is then clear that dα -H¨o l log SN π 1 xD , x → 0 as |D| → 0. Observe that π 1 xD is constructed based on knowledge of the entire gvalued path x. In our present application, it would be enough to know only s2 (x), the projection of x to its first two levels. We have
468
Markov processes
Proposition 16.16 Let N ≥ 2 and x ∈ C α -H¨o l [0, 1] , gN Rd for any α < 1/2. Then for any k ∈ {2, . . . , N }, D , x → 0 as |D| → 0. dα -H¨o l sN π 1 sk (x) Remark 16.17 This proposition is purely deterministic and does not hold, in general, for k = 1. However, when x = x (ω) is a suitable sample path (see discussion below) then, for k = 1, convergence may hold almost surely. Proof. It is enough to consider k = 2. Take α ∈ (1/3, 1/2) so that the projection s2 (x) ≡ (π 1 (x) , π 2 (x)) is a geometric α-H¨older rough path which allows us to reconstruct the original path as the Lyons lift sN ◦ s2 (x) = x. Obviously, the geodesic approximations to s2 (x), given by D D [s2 (x)] = s2 ◦ π 1 s2 (x) , converge uniformly with uniform α -H¨older bounds, where α ∈ (α, 1/2) and then, by interpolation, in α-H¨older distance. By continuity of the Lyons lift, this implies D → sN ◦ s2 (x) = x as |D| → 0. sN ◦ π 1 s2 (x) To see that this cannot be true for k = 1, it suffices to take a pure area rough path, say t → (0, 0; t) ∈ g2 R2 which is (1/2)-H¨ older. Obviously, no (lifted) geodesics approximation to (0, 0) can possibly recover the original path in g2 R2 . We emphasize that this approximation applies to Markovian rough paths Xa;x in a purely deterministic fashion, path-by-path, and requires (at least) a priori knowledge of path and area, s2 (Xa;x ) ≡ (π 1 (Xa;x ) , π 2 (Xa;x )) . In contrast, we shall establish in the next section the probabilistic statement that (lifted) piecewise linear approximations to the Rd -valued path π 1 (Xa;x ) also converge to Xa;x , i.e. D dα -H¨o l sN π 1 (x) , x → 0 as |D| → 0 in probability (and, in fact, in Lq for all q < ∞).
16.5 Strong approximations
469
16.5.2 Piecewise linear approximations In contrast to the just-discussed geodesic approximation, convergence of piecewise linear approximations, based on the Rd -valued path π 1 (Xa,x ) alone and without a priori knowledge of the area π 2 (Xa,x ), is a genuine probabilistic statement and relies on subtle cancellations. We maintain our standing notation, g = gN Rd , and ΞN ,d (Λ) denotes uniformly elliptic matrices defined on g. Recall that for any a ∈ ΞN ,d (Λ), we have constructed a g-valued diffusion process X = Xa , associated with the Dirichlet form E a . This process can be projected to an Rd -valued process, X = π 1 (X) , which will not be Markov in general. Theorem 16.18 Let α < 1/2, N ≥ 2 and a ∈ ΞN ,d (Λ). Then, for every x ∈ g we have dα -H¨o l;[0,1] sN X D , X → 0 in Lq (Pa,x ) as |D| → 0. The proof stretches over the remainder of this section, and we shall just argue here how to reduce the proof to the seemingly simpler statements that ˜ → 0 in probability, (16.8) dα -H¨o l;[0,1] s2 X D , X ˜ := s2 (X) = (π 1 (X) , π 2 (X)) is non-Markov in general, and where X < ∞. (16.9) sup s2 X D α -H¨o l;[0,1] q a ,x L (P
D
)
(We will obtain (16.8), (16.9) in the forthcoming Theorems 16.25 and 16.19 below). Indeed, taking α ∈ (1/3, 1/2) we can use continuity and basic estimates of the Lyons lift, to see that (16.8), (16.9) imply dα -H¨o l;[0,1] sN X D , X → 0 in probability, (16.10) sup sN X D α -H¨o l;[0,1] < ∞. (16.11) q a ,x D
L (P
)
The convergence statement in Theorem 16.18 then follows a fortiori from general principles (based on interpolation), see Proposition A.15 in Appendix A. We now discuss the ideas that will lead us to the proof of (16.8) and (16.9). The ideas Fix a dissection D = {ti : i} of [0, 1] and a ∈ Ξ (Λ). Let us project X = Xa to the Rd -valued process X = X a and consider piecewise linear approximations to X based on D, denoted by X D . Of course, X D has a canonically
Markov processes
470
defined area given by the usual iterated and thus gives rise to a integrals g-valued path which we denote by s2 X D . For 0 ≤ α < 1/2 as usual, the convergence ˜ → 0 in probability dα -H¨o l s2 X D , X (16.12) as |D| → 0 is a subtle problem and the difficulty is already present in the pointwise convergence statement ˜ 0,t as |D| → 0. s2 X D 0,t → X Our idea is simple. Noting that straight-line segments do not produce area, it is an elementary application of the Campbell–Baker–Hausdorff formula to see that for t ∈ D = {ti }, −1 ˜ 0,t = s2 X D 0,t ∗X At i ,t i + 1 ,
(16.13)
i
˜ and ∪i [ti , ti+1 ] = [0, t]. On the other hand, it is where A is the area of X relatively straightforward to show that the Lp norm of s2 X D α -H¨o l;[0,1] is finite uniformly over all D. In essence, this reduces (16.12) to the point wise convergence statement, which we can rephrase as i At i ,t i + 1 → 0. It is natural to show this in L2 , since this allows us to write11 2 2 E At i ,t i + 1 = E At i ,t i + 1 + 2 E At i ,t i + 1 · At j ,t j + 1 . i
i
i< j
For simplicity only, assume ti+1 − ti ≡ δ for all i. As a sanity check, if X were a Brownian motion and A the usual L´evy area, all off-diagonal terms are zero and 2 2 1 E At i ,t i + 1 ∼ δ ∼ δ 2 → 0 with |D| = δ → 0, δ i i which is what we want. Back to the general case of X = Xa , the plan must be to cope with the off-diagonal sum. Since there are ∼ δ 2 /2 terms, what we need is E At i ,t i + 1 · At j ,t j + 1 = o δ 2 . To this end, let us momentarily assume that sup Ea,x (A0,δ ) = o (δ)
(16.14)
x∈g
1 1 Recall 2
that so (d) ⊂ Rd ⊗ Rd has Euclidean structure, i.e. A · A˜ =
d k ,l= 1
A k , l A˜k , l
and |A| = A · A. It may be instructive to consider d = 2, in which case A can be viewed as scalar.
16.5 Strong approximations
471
holds. Then, using the Markov property,12 ≤ E At ,t × EX t j A0,δ = E At ,t × o (δ) E At ,t · At ,t i i+ 1 j j+1 i i+ 1 i i+ 1 and since E At i ,t i + 1 ∼ δ, by a soft scaling argument, we are done. Unfortunately, (16.14) seems to be too strong to be true, but we are able to establish a weak version of (16.14) which is good enough to successfully implement what we just outlined. The key to all this (cf. the proof of the forthcoming Proposition 16.20) is a semi-group argument which leads to the desired cancellations. Uniform H¨ older bound Let X D denote the piecewise linear approximation to X = X (ω). We will need Lq -bounds, uniformly over all dissections D, of the homogenous αH¨ older norm of the path X D and its area. That is, we want sup s2 X D α -H¨o l;[0,1] q a ; x < ∞. D
L (P
)
This will follow a fortiori from the following uniform Fernique estimates. Theorem 16.19 There exists η = η (Λ) > 0 such that 2 D X s s,t 2 sup sup Ea,x exp η sup < ∞. t−s a∈Ξ(Λ),x∈g D 0≤s< t≤1
(16.15)
As a consequence, for any α ∈ [0, 1/2) there exists C = C (α, Λ) > 0 so that 2 < ∞. sup Ea,x exp C s2 X D α -H¨o l;[0,1] sup a∈Ξ(Λ),x∈g D
Proof. Estimate (16.15) shows that the process s2 X D satisfies the Gaussian integrability condition put forward in Section A.4, Appendix A, uniformly over a, x, D as indicated. The consequence then follows from general principles, found in the same appendix. (We could also obtain uniform ψ 2,1 variation estimates.) In other words, we only have to establish (16.15). To 1 ), this end, we recall from Corollary 16.12 that for η ∈ [0, 4Λ 2 Xs,t a,x < ∞. exp η sup sup E Mη ≡ t−s a∈Ξ(Λ),x∈g 0≤s< t≤1 ˆt ∈ is important that we condition with respect to X t j ∈ gN Rd and not X j d ˆ R , since X is Markov whereas, in general, X is not.
1 2 It
g2
Markov processes
472
Then, by the triangle inequality,13 s2 X D s,s D s2 X D s D ,t D s2 X D t D ,t s2 X D s,t √ √ 5 √ + ≤ + t − tD t−s sD − s tD − sD D D Xt ,t s2 X D s D ,t D Xs,s D 5 + ≤ √ +√ D t − tD sD − s t − sD D Xs,s D s2 X D s D ,t D Xt ,t 5 + ≤ √ +√ D t − tD sD − s tD − sD 1/2 x 2 2 2 D 3 X s 2 3 Xt D ,t s D ,t D 3 Xs,s D + ≤ + . sD − s tD − sD t − tD Hence, 2 D X s 2 s,t Ea,x exp η t−s
≤
3 X s D Ea,x exp η s Ds ,−s
2 ≤ M6η Ea,x exp 6η
2
+
D 3 s2 (X )
s2 (X D ) sD ,tD t D −s D
sD ,tD
t D −s D 2
2
3 X t D , t t−t D
2
+
and the proof is reduced to show that for some η > 0 small enough, 2 D s2 X s,t sup sup Ea,x exp 6η sup < ∞. t−s a∈Ξ(Λ),x∈g D s< t∈D By the triangle inequality for the Carnot–Caratheodory distance, for ti , tj ∈ D, D ˜ ˜ . s2 X D t ,t ≤ X t i ,t j + d Xt i ,t j , s2 X t ,t i
j
i
To proceed we note that, similar to equation (16.13),
s2 X
D
−1
t i ,t j
˜ t i ,t j = ∗X
j −1 k =i
1 3 Note
ˆ that X s , t ≤ X s , t for all s < t.
At k ,t k + 1 .
j
16.5 Strong approximations
473
By left-invariance of the Carnot–Caratheodory distance d and equivalence of continuous homogenous norms (so that, in particular, (x, A) ∼ |x| + 1/2 where |·| denotes Euclidean norm on Rd resp. Rd ⊗ Rd ), there exists |A| C such that j −1 D ˜ t i ,t j , s2 X = 0, A d X t ,t k k + 1 t i ,t j k =i @ 1/2 j −1 Aj −1 A At ,t At k ,t k + 1 ≤ CB ≤ C k k+1 k =i k =i @ Aj −1 A Xt ,t 2 . ≤ CB k k+1 k =i
By Cauchy–Schwarz,
2 D X s 2 t i ,t j a,x E exp 6η tj − ti j −1 2 2 X t i , t j k = i X t k , t k + 1 ≤E exp 12η t j −t i exp 12Cη t j −t i 2 F X tk ,tk + 1 j −1 ≤ M24η Ea,x k = i exp 24Cη t j −t i a,x
and the Ea,x (. . . ) term in the last line is estimated using the Markov property as follows: Ea,x
j −1 G
exp 24Cη
k=i
≤ ≤
Fj −1 k=i
Fj −1
Xt
k ,t k + 1
2
tj − ti
supx∈g E
a,x
2 t k + 1 −t k X 0 , t k + 1 −t k exp 24Cη t j −t i t k + 1 −t k
M24C η t k + 1 −t k t j −t i Fj −1 1 −t k ≤ k = i exp C × 24Cη t kt+j −t i = exp (24C Cη) < ∞, k=i
for η small enough
where we used the “estimate on Mη ” given in Corollary 16.12, valid for η small enough. The proof is then finished. The subtle cancellation Let us define rδ (t, x) :=
1 a,x E (At,t+δ ) ∈ so (d) δ
and
rδ (x) := rδ (0, x) .
Markov processes
474
For instance, (16.14) is now expressed as limδ →0 rδ (x) → 0 uniformly in x. Our goal here is to establish a weak version of this. We also recall that At,t+δ = π 2 (Xt,t+δ ) = π 2 X−1 t ∗ Xt+ δ . Proposition 16.20 (i) We have uniform boundedness of rδ ;t (x) , sup
sup
sup
x∈g2 (R d ) δ ∈[0,1] t∈[0,1−δ ]
(ii) For all h ∈ L1 (g, dm) ,
dx h (x) rδ (x) ≡ 0.
lim
δ →0
rδ (t, x) < ∞.
g
Proof. (i) follows from Theorem 16.19. For (ii) we first note that it suffices to consider h smooth and compactly supported. Now the problem is local and we can assume that smooth locally bounded functions such as the coordinate projections π 1;j and π 2;k ,l are in D (E a ). (More formally, we could smoothly truncate outside the support of h and work on a big torus.) Clearly, it is enough to show the componentwise statement lim dx h (x) π 2;k ,l (rδ (x)) ≡ 0 δ →0
g
for k < l fixed in {1, . . . , d}. To keep the notation short we set f ≡ π 2;k ,l (·) and abuse it by writing A instead of Ak ,l . We can then write Ea,· (At ) ≡ Ea,· (f (Xt )) =: Pta f (.) and note that P0a f (x) = A when x = x1 , A ∈ g. Writing !·, ·" for the usual inner product on L2 (g, dx), we have H I 1 a,· a,. a,· 1 [π 1 (·) , Xt ] h, E f (Xt ) − A − E !h, E A0,t " = 2 H I 1 a,· a a 1 [π 1 (·) , Xt ] = !h, Pt f − P0 f " − h, E 2 H I t 1 a,· a a 1 [π 1 (·) , Xt ] E (h, Ps f ) − h, E = 2 0 H I 1 a,· a 1 [π 1 (·) , Xt ] + o (t) . = E (h, f ) × t − h, E 2 Here, again, we abused the notation by writing [·, ·] instead of picking out k ,l the (k, l) component and using the cumbersome notation [·, ·] . Note that in general E a (h, f )×t = o (t) and our only hope is cancellation of 2E a (h, f ) with the bracket term 2 3 3 2 h, Ea,· [π 1 (·) , X1t ] ≡ h, Ea,· [π 1 (·) , X1t ]k ,l .
16.5 Strong approximations
475
To see this cancellation, we compute the bracket term 3 2 a,· 1 k ,l 1;l 1;k [π 1 (·) , Xt ] = dx h (x) Ea,x x1;k X1;l h, E t − x Xt dx h (x) x1;k [Pta π 1;l ] (x) = −x1;l [Pta π 1;k ] (x) , and by adding and subtracting x1;k x1;l inside the integral this rewrites as dx h (x) x1;k {[Pta π 1;l ] (x) − π 1;l (x)} − dx h (x) x1;l {[Pta π 1;k ] (x) − π 1;k (x)} . It now follows, as earlier, that 2 3 h, Ea,· [π 1 (·) , X1t ]k ,l = [E a (hπ 1;k , π 1;l ) − E a (hπ 1;l , π 1;k )] × t + o (t) and we see that the required cancellation takes place if, for all h smooth and compactly supported, [E a (hπ 1;k , π 1;l ) − E a (hπ 1;l , π 1;k )]
2E a (h, π 2;k ,l ) .
=
(to b e checked)
We will check this with a direct computation. First note that a a a E (hπ 1;k , π 1;l ) − E (hπ 1;l , π 1;k ) = π 1,k dΓ (h, π 1,l ) − π 1,l dΓa (h, π 1,k ) which is immediately seen via symmetry of dΓa (·, ·), inherited from the ij symmetry of a , and the Leibnitz formula E a (gg , h) = gdΓa (g , h) + g dΓa (g, h) . It is immediately checked from the definition of the vector fields ui , see equation (16.1), that − (1/2) π 1;l if i = k (1/2) π 1;k if i = l ui f ≡ ui π 2;k ,l = 0 otherwise so that (noting π 1,k = 2ui f and also using uj π 1;l = δ j l , i.e. 1 if j = l and 0 otherwise) π 1,k aij ui huj π 1,l = 2 (ul f ) ail (ui h) π 1,k dΓa (h, π 1,l ) = i,j
i
Markov processes
476
and similarly (−π 1,l ) aij ui huj π 1,k =2 (uk f ) aik (ui h) . − π 1,l dΓa (h, π 1,k ) = i,j
i
Therefore, using uj f = 0 for j = {k, l} in the second equality, (uj f ) aij (ui h) E a (hπ 1;k , π 1;l ) − E a (hπ 1;l , π 1;k ) = 2 j =k ,l
=
2
i
(uj f ) aij (ui h)
i,j
and this equals precisely 2E a (h, f ) as required. Corollary 16.21 For all t ∈ [0, 1) and all h ∈ L1 (g, dx) , At,t+δ a,x lim dxh (x) E ≡ 0. δ →0 g δ Proof. We first write At,t+δ dxh (x) Ea,x = δ =
h (x) pa (t, x, y) rδ (y) dxdy h (x) pa (t, x, y) dx rδ (y) dy.
Then, noting that y → h (x) pt (x, y) dx is in L1 (g, dx), the proof is finished by applying the previous proposition. Theorem 16.22 For all bounded sets K ⊂ g and all σ ∈ (0, 1], a,y At,t+δ = 0. lim sup sup E δ →0 t∈[σ ,1] y ∈K δ ¯ (0, R) ⊂ g of Proof. It suffices to prove this for a compact ball K = B arbitrary radius R > 0. We fix σ ∈ (0, 1] and think of rδ = rδ (t, y) as a family of maps, indexed by δ > 0, defined on the cylinder [σ, 1] × K; that is, (t, y) ∈ [σ, 1] × K → rδ (t, y) ∈ so (d) . By Proposition 16.20(i) we know that supδ > 0 |rδ |∞ < ∞. We now show equicontinuity of {rδ : δ > 0}. By the Markov property, At,t+δ rδ (t, y) = Ea,y δ I H Ea,· (A0,δ ) = pa (t, y, ·) , δ a = !p (t, y, ·) , rδ (0, ·)" ,
16.5 Strong approximations
477
so that, for all (s, x) , (t, y) ∈ [σ, 1] × K, |rδ (s, x) − rδ (t, y)|
= |!pa (s, x, ·) − pa (t, y, ·) , rδ (.)"| ≤
sup |rδ |∞
δ ∈(0,1]
|pa (s, x, ·) − pa (t, y, ·)|L 1 .
From de Giorgi–Moser–Nash regularity (Proposition 16.7), (t, y) ∈ [σ, 1] × K → pa (t, y, z) is continuous for all z; the dominated convergence theorem then easily gives continuity of (t, y) → pa (t, y, ·) ∈ L1 . In fact, this map is uniformly continuous when restricted to the compact [σ, 1] × K, and it follows that {rδ : δ > 0} is equicontinuous as claimed. By Arzela–Ascoli, there exists a subsequence (δ n ) such that rδ n converges uniformly on [σ, 1] × K to some (continuous) function r. On the other hand Proposition 16.20 (ii), applied to h = pa (t, y, ·), shows that rδ (t, y) → 0 as δ → 0 for all fixed y, t > 0. This shows that r ≡ 0 is the only limit point and hence a,y At,t+δ = 0. lim sup sup E δ →0 t∈[σ ,1] y ∈K δ
Convergence of the sum of the small areas For fixed a ∈ Ξ (Λ) and x ∈ g let us define the real-valued quantity
Kσ ,δ :=
sup
0≤u 1 < u 2 < v 1 < v 2 ≤1: v 1 −u 2 ≥σ , |u 2 −u 1 |,|v 2 −v 1 |≤δ
|Ea,x (Au 1 ,u 2 · Av 1 ,v 2 )| (u2 − u1 ) (v2 − v1 )
where δ, σ ∈ (0, 1). As above, · denotes the scalar product in so (d).
Proposition 16.23 For fixed σ ∈ (0, 1), k, l ∈ {1, . . . , d} we have limδ →0 Kσ ,δ = 0.
Markov processes
478
Proof. By the Markov property,14 |Ea,x (Au 1 ,u 2 · Av 1 ,v 2 )| = (u2 − u1 ) (v2 − v1 ) a,x E Au 1 ,u 2 · Ea,X u 2 (Av 1 −u 2 ,v 2 −u 2 ) (u2 − u1 ) (v2 − v1 ) a,x E Au 1 ,u 2 · Ea,X u 2 (Av 1 −u 2 ,v 2 −u 2 ; Xu 2 ≤ R) ≤ (u2 − u1 ) (v2 − v1 ) a,x a,X E Au 1 ,u 2 · E u 2 (Av 1 −u 2 ,v 2 −u 2 ; Xu 2 > R) + (u2 − u1 ) (v2 − v1 ) a,y a,x E Au ,u + δ E (|Au 1 ,u 2 | ; Xu 2 ≤ R) ≤ sup sup (u2 − u1 ) δ δ ≤δ y ≤R +Ea,x
u ∈[σ ,1]
Ea,x Au ,u + δ |Au 1 ,u 2 | ; Xu 2 > R sup sup u2 − u1 δ δ ≤δ y ≤R u ∈[σ ,1]
a,y E Au ,u + δ E (|Au 1 ,u 2 |) sup sup ≤ (u2 − u1 ) δ ≤δ y ≤R δ a,x
u ∈[σ ,1]
@ A a,x A 5 Au 1 ,u 2 2 A E u ,u + δ B sup + Pa,x (Xu 2 > R) Ea,x u2 − u1 δ ,u ,x δ a,y E 5 Au ,u + δ ≤ C sup sup + C Pa,x (Xu 2 > R) δ δ ≤δ |y |≤R u ∈[σ ,1]
for some constant C = C (x , σ, Λ) using Corollary 16.12 and Proposition 16.20(i). We then fix ε > 0 and choose R = R () large enough so that C sup u 2 ∈[0,1]
Pa,x Xxu 2 > R ≤ ε/2.
On the other hand, Theorem 16.22 shows that a,y E Au ,u + δ ε C sup sup ≤ 2 δ δ ≤δ |y |≤R u ∈[σ ,1]
for all δ small enough and the proof is finished.
1 4 Again,
g2
Rd
.
ˆ· ∈ it is important to condition with respect to X · ∈ gN Rd and not X
16.5 Strong approximations
479
Corollary 16.24 There exists C = C (Λ) such that for all subdivisions D of [0, 1] , s, t ∈ D, for any σ ∈ (0, 1) , 4 ! " 2 Ea,x d s2 X D s,t , Xs,t ≤ C (t − s) Kσ ,|D | + (t − s) σ . Proof. Recalling the discussion around (16.13), equivalence of homogenous norms leads to 4 D a,x At i ,t i + 1 |2 ). E d s2 X s,t , Xs,t ≤ c1 Ea,x (|
Let us abbreviate i:t i ∈D ∩[s,t) to (| i At i ,t i + 1 |2 ) is estimated by 2 times Ea,x At i ,t i + 1 · At j ,t j + 1 i≤j
≤
i
i:t i ∈D ∩[s,t)
in what follows. Clearly, Ea,x
Ea,x At i ,t i + 1 · At j ,,tt j + 1 +
i≤j t j −t i + 1 ≥σ
≤ Kσ ,|D |
Ea,x At i ,t i + 1 · At j ,t j + 1
i≤j t j −t i + 1 < σ
(ti+1 − ti ) (tj +1 − tj )
i≤j t j −t i + 1 ≥σ
+
C
2 2 Ea,x At i ,t i + 1 Ea,x At j ,t j + 1
i≤j t j −t i + 1 < σ
2
≤ Kσ ,|D | (t − s) + c2
(ti+1 − ti ) (tj +1 − tj )
i,j t j −t i + 1 < σ
and the very last sum is estimated as follows: (ti+1 − ti ) (tj +1 − tj ) | ≤ σ (ti+1 − ti ) = σ (t − s) . | i
j t j −t i + 1 < σ
i
The proof is finished. Putting things together Theorem 16.25 Let D be a dissection of [0, 1] with mesh |D| . Then, for all 1 ≤ q < ∞ and 0 ≤ α < 1/2, dα -H¨o l;[0,1] s2 X D , X → 0 in Lq (Pa,x ) as |D| → 0. Proof. We first show pointwise convergence. We fix ε > 0 and apply Corollary 16.24 with σ = ε/2C. Then, 4 D ε a,x sup E d s2 X s,t , Xs,t ≤ CKσ ,|D | + . 2 s,t∈D s< t
Markov processes
480
By Proposition 16.23 it then follows that, for |D| small enough, 4 sup d s2 X D s,t , Xs,t 4
L (Pa , x )
s,t∈D s< t
≤ ε.
By Theorem 16.19 we have, for all q ∈ [1, ∞), sup s2 X D α -H¨o l;[0,1]
L q (Pa , x )
D
+ Xα -H¨o l;[0,1]
L q (Pa , x )
and both results combined yield lim sup d s2 X D s,t , Xs,t |D |→0 0≤s< t≤1
L 4 (Pa , x )
< ∞ (16.16)
= 0.
By H¨older’s inequality the last statement remains valid even when we replace L4 by Lq for any q ∈ [1, ∞). We can then conclude by using Proposition A.15.
16.6 Weak approximations We maintain our standing notation, in particular g = gN Rd , and recall that for any a ∈ ΞN ,d (Λ), we have constructed a g-valued diffusion process Xa , associated with the Dirichlet form E a .
16.6.1 Tightness Proposition 16.26 Let (an ) ⊂ ΞN ,d (Λ). Then, for any starting point x ∈ g and any α ∈ [0, 1/2), the family of processes (Xa n ,x : n ∈ N) is tight in the Polish space Co0,α -H¨o l ([0, 1] , g). Proof. Let us fix α ∈ (α, 1/2). From Proposition 8.17, KR = {x : xα -H¨o l ≤ R} is relatively compact in Co0,α -H¨o l ([0, 1] , g). The proof is then finished with the Fernique estimate from Corollary 16.12, sup P (Xa n ∈ KR ) ≤ ce−R
2
/c
.
n
16.6.2 Convergence In order to discuss weak convergence, let us first specialize some properties of non-negative quadratic forms to the present setting. From (E.1), using also quasi-isometry (“E a ∼ E”), we have that for all f, g ∈ W 1,2 (g, dm),
16.6 Weak approximations
481
the common domain of all E a with a ∈ ΞN ,d (Λ), and s > 0, 2 |∇Psa f | dm ≤ ΛE a (Psa f, Psa f ) 1 2 |f |L 2 ∧ E a (f, f ) ≤ Λ 2s Λ 2 2 |f |L 2 ∧ Λ2 |∇f | dm . ≤ 2s
(16.17)
Lemma 16.27 Let (an : 1 ≤ n ≤ ∞) ⊂ ΞN ,d (Λ) and assume an → a∞ almost everywhere (with respect to Lebesgue measure on g). Set E n = E a n with associated semi-group P n . Assume g, f ∈ W 1,2 (g, dm), the common domain of all E n . Then (i) for every fixed s ∈ [0, 1] , assuming L2 -convergence Psn f to some limit, say Qs f , and boundedness of {(Psn f ) : n} in W 1,2 , we have E n (Psn f, g) → E ∞ (Qs f, g) as n → ∞; (ii) we have sup sup |E n (Psn f, g)| < ∞. n
s∈[0,1]
Proof. (i) Recall that D (E ∞ ) is a Hilbert space with inner product given by !·, ·"E ∞ = !·, ·"L 2 + E ∞ (·, ·) and (by quasi-isometry) Λ−1 |f |W 1 , 2 ≤ !f, f "E ∞ ≤ Λ |f |W 1 , 2 . By assumption, {(Psn f ) : n} ⊂ W 1,2 is bounded and hence 2
2
Psn f E ∞ = |f |L 2 + E ∞ (Psn f, Psn f ) 2
2
is also uniformly bounded in n. Together with Psn f → Qs f as n → ∞ in L2 , an application of Lemma E.1 shows that this convergence holds weakly in (D (E ∞ ) , !·, ·"E ∞ ). In particular, since g ∈ W 1,2 (g, dm) = D (E ∞ ), E ∞ (Psn f, g) → E ∞ (Qs f, g) as n → ∞. Thus, it only remains to see that δ (n) := E n (Psn f, g) − E ∞ (Psn f, g) =
!∇Psn f, (an − a) ∇g" dm
converges to zero. From Cauchy–Schwarz we obtain 1 1 2 2 2 n |∇Ps f | dm |an − a∞ | |∇g| dm. |δ (n)| ≤
Markov processes
482
It now suffices to note, from (16.17), and for fixed s > 0, 2 sup |∇Psn f | dm < ∞; n
δ (n) → 0 is then a consequence of an → a∞ almost everywhere and bounded convergence. (ii) Using quasi-isometry of E n ∼ E, Cauchy–Schwarz and (16.17), 1 1 2 2 2 |∇Psn f | dm |∇g| dm |E n (Psn f, g)| ≤ c2 1
1 2
|∇f | dm
≤ c3
2
2
|∇g| dm
and this bound is uniform in s ∈ [0, 1] and n. Theorem 16.28 Let x ∈ g, (an : 1 ≤ n ≤ ∞) ⊂ ΞN ,d (Λ) and assume older an → a∞ almost everywhere. Then Xa n ;x converges weakly in α-H¨ topology to Xa ∞ ;x . Proof. From Proposition 16.26 the family (Xa n ,x ) is tight. It then suffices to establish convergence of the finite-dimensional distributions. To this end, let us set a := a∞ and consider pa ∞ = pa ∞ (t, x, y), the heat-kernel (or transition density) of Xa ∞ ,x . It will suffice to check that pa n → pa ∞ , uniformly on compacts in (0, ∞)×g×g. Since each heat-kernel pa is a (weak) solution older regularto the respective parabolic PDE ∂t p = Lay p, it follows from H¨ ity of weak solutions (Proposition 16.7), uniformly over all a ∈ ΞN ,d (Λ), that (t, y) → pa (t, x, y) is equicontinuous over sets of the form (s, t) × K ⊂⊂ (0, ∞) × g. More precisely, we cover (s, t) × K by finitely many “parabolic” cylinders Qk1 so that Qk1 ⊂ Qk2 ⊂ (0, ∞) × g and note that maxk |pa (·, x, ·)|∞;Q k is bounded 2 by some constant (depending on Λ and the distance of ∪k Qk2 to {0} × g, which can be made close to s > 0 by taking k large enough), uniformly over x ∈ K. By symmetry, the same holds for (t, x) → pa (t, x, y) and from the triangle inequality, |pa (t, x, y) − pa (t , x , y )|
≤
|pa (t, x, y) − pa (t , x, y )| + |pa (t , x, y ) − pa (t, x , y )| ;
we see that pa : a ∈ ΞN ,d (Λ) is equicontinuous on any compact set of form (s, t) × K × K. In conjunction with the heat-kernel bounds, it is clear from Arzela–Ascoli that there exists some q ∈ C ((0, ∞) × g × g) so that, after switching to a subsequence if necessary, pa n → q uniformly on compacts.
16.7 Large deviations
483
Validity of the Chapman–Kolmogorov equation is preseved in this limit, and so Qt f := f (x) q (t, ·, y) dy, f ∈ Cc∞ extends (uniquely) to strongly continuous semi-groups (Qt : t ≥ 0) on L2 (g). Quite obviously then, at least for fixed t, Pta n f → Qt f in L2 . Moreover, (Pta n f : n ∈ N) is bounded in W 1,2 , as is clear from (16.17) and so (Lemma E.1) Pta n f → Qt f weakly in W 1,2 . Thanks to Lemma 16.27 we can now pass to the limit in t an E a n (Psa n f, g) ds !Pt f, g"L 2 = !f, g"L 2 + 0
and learn that !Qt f, g"L 2 = !f, g"L 2 +
t
E a ∞ (Qs f, g) ds. 0
But this identifies (Qt ) as the semi-group associated with E a ∞ . In particular, q must coincide with pa ∞ , which implies convergence of the finitedimensional distributions. The proof is then finished.
16.7 Large deviations We maintain our standing notation, g = gN Rd , and ΞN ,d (Λ) denotes uniformly elliptic matrices defined on g. Theorem 16.29 Let a ∈ ΞN ,d (Λ) and Xa;x be the symmetric g-valued diffusion associated with the Dirichlet form15 1 a E , 2 started at some fixed point x ∈ g. Then the family (Xa;x (ε·) : ε > 0) satisfies a large deviation principle with good rate function da ht i , ht i + 1 2 1 a sup I (h) = 2 D ⊂D[0,T ] |ti+1 − ti | i:t i ∈D
1 5 The factor 1/2 deviates from our previous convention but leads to a more familiarlooking rate function.
484
Markov processes
in α-H¨ older topology, α ∈ [0, 1/2). More precisely, viewing Pa;x = (P)∗ Xa;x ε ε , a;x α -H¨o l ([0, 1] , g), the family the law of X (ε·), as a Borel measure on Cx : ε > 0) satisfies a large deviation principle in Cxα -H¨o l ([0, 1] , g) with (Pa;x ε good rate function I a . Proof. A large deviation principle in uniform topology, with rate function 1 2 I a (ω) = |ω|W 1 , 2 ([0,1],gN (Rd )) , 2 follows from an abstract Schilder theorem, Theorem E.20, applied to the specific context of the g-valued diffusion Xa;x associated with the Dirichlet form 12 E a . Remark 16.30 The rate function is precisely 1 2 |h| 1 , 2 N d 2 W ([0,1],g (R )) N d a relative to the metric space g R , d . When a = I, da is the Carnot– Caratheodory metric on gN Rd and in this case, from Exercise 7.60 and basic facts of W 1,2 [0, 1] , Rd , 1 2 ˙ 2 2 |h|W 1 , 2 ([0,1],gN (Rd )) = |h|W 1 , 2 ([0,1],Rd ) = hu du 0
where we wrote h = π 1 (h).
16.8 Support theorem We maintain our standing notation, g = gN Rd , and ΞN ,d (Λ) denotes uniformly elliptic matrices defined on g. Recall that for any a ∈ ΞN ,d (Λ), we have constructed a g-valued diffusion process Xa;x , started at some x ∈ g associated with the Dirichlet form E a .
16.8.1 Uniform topology Theorem 16.31 There exists a constant C = C (Λ, N ) so that for any h ∈ W01,2 [0, 1] , Rd and any ε ∈ (0, 1), we have 2 C 1 + |h| 1 , 2 W ;[0,1] Pa,0 d∞;[0,1] (X, SN (h)) ≤ ε ≥ exp − . ε2 As a consequence, we have full support of Xt = Xxt in uniform topology. In other words, supp (Pa,x ) = Cx ([0, 1] , g) . Proof. This follows from general principles, Theorem E.20, applied to the specific g-valued process Xa;x associated with the Dirichlet form E a .
16.8 Support theorem
485
16.8.2 H¨older topology
Recall that, modulo starting points, W 1,2 [0, 1] , Rd is in one-to-one correspondence to W 1,2 ([0, 1] , g) where we write g = gN Rd as usual. Indeed, any Rd -valued W 1,2 -path h lifts to sN (h) ≡ log ◦SN (h) ∈ W 1,2 ([0, 1] , g) , cf. Exercise 7.60; conversely, it suffices to project h ∈ W 1,2 ([0, 1] , g) to its first level π 1 (h). Observe also that, for any α ∈ [0, 1/2], Wx1,2 ([0, 1] , g) ⊂ Cxα -H¨o l ([0, 1] , g) . Lemma 16.32 Assume α ∈ [0, 1/4). Fix h ∈ Wx1,2 ([0, 1] , g), ε > 0 and define Bεh = {x : |x|α -H¨o l ≤ 2 |h|α -H¨o l , d∞ (x, h) ≤ ε}. Then Pa,x X ∈ Bεh > 0. Proof. Step 1: Taking 0 ≤ α < β ≤ 1 we claim that α /β
xα -H¨o l xβ -H¨o l d∞ (x, h)
1−α /β
+ hα -H¨o l .
To see this, note that d (xs , xt ) can be estimated in two ways: α
d (xs , xt )
≤ 2d∞ (x, h) + hα -H¨o l |t − s| ,
d (xs , xt )
≤
β
xβ -H¨o l |t − s| .
Given r, we can use the first estimate when |t − s| ≤ r and the second when |t − s| > r, so that d (xs , xt ) β −α 2d∞ (x, h) sup ≤ min xβ -H¨o l r , + hα -H¨o l α rα 0≤s≤t≤1 |t − s| xβ -H¨o l rβ 2d∞ (x, h) + hα -H¨o l . , ≤ min rα rα Choosing r optimally, namely such that rβ = 2d∞ (x, h) / xβ -H¨o l , then gives α /β 1−α /β xβ -H¨o l + hα -H¨o l . xα -H¨o l ≤ 21−α /β d∞ (x, h) Step 2: Now let α < β < 1/2 so that we have Fernique estimates for Xβ -H¨o l = X (ω)β -H¨o l;[0,1] . Then α/β 1−α/β Pa,x X ∈ Bεh ≥ Pa,x (2d∞ (X, h)) |x|β -H¨o l ≤ |h|α -H¨o l , d∞ (X, h) ≤ ε hα -H¨o l α /β a,x xβ -H¨o l ≤ , d∞ (X, h) ≤ ε =P 1−α /β (2ε) β /α hα -H¨o l a,x a,x ≥ P ( d∞ (X, h) ≤ ε) − P xβ -H¨o l > β /α −1 (2ε) = ∆1 − ∆2 .
Markov processes
486
a,x Obviously, both ∆1 and ∆2 tend to zero as ε → 0. Positivity of P h X ∈ Bε will follow from checking that ∆2 /∆1 → 0 as ε → 0. Keeping h fixed, Theorem E.21 gives 1 log ∆1 ≥ −c1 ε2
while the Fernique estimates imply log ∆2 ≤ −c2
1 ε2β /α −2
for some irrelevant positive constants c1 and c2 . We see that ∆2 /∆1 → 0 if 2 < 2β/α − 2 or equivalently 2α < β. Since β < 1/2 was needed to apply the Fernique estimates, we see that the argument works for any α ∈ [0, 1/4). Theorem 16.33 Let a ∈ ΞN ,d (Λ) and X = Xa;x be the g-valued symmetric diffusion process, started at some x ∈ g and associated with the Dirichlet form E a . Fix h ∈ Wx1,2 ([0, 1] , g). Then, for every α ∈ [0, 1/4) and every δ > 0, Pa;x dα -H¨o l;[0,1] (X, h) < δ > 0. older topology In particular, for every α ∈ [0, 1/4) the support of Xa;x in α-H¨ is precisely Cx0,α -H¨o l [0, 1] , gN Rd . Proof. Without loss of generality, x = Xa (0) = h (0) = 0 ∈ g. Pick α ∈ (α , 1/4). By interpolation and the d0 /d∞ -estimate, 1−1/N
d∞ (x, y) ≤ d0 (x, y) (x∞ + y∞ )
1/N
d∞ (x, y)
so that dα -H¨o l (Xa , h) (|Xa |α -H¨o l α /α
+ |h|α -H¨o l )
(|Xa |∞ + |h|∞ )
(1−1/N ) (1−α /α )
d∞ (Xa , h)
1 −α / α N
.
In particular, for Xa ∈ Bεh = {x : |x|α -H¨o l ≤ 2 |h|α -H¨o l , d∞ (x, h) ≤ ε} there exists c1 (which may in particular depend on h, α , α) so that dα -H¨o l (Xa , h) ≤ c1 ε
1 −α / α N
.
1 −α / α
Fix δ > 0 and take ε small enough such that c1 ε N < δ. Clearly then Px dα -H¨o l;[0,1] (Xa , h) < δ ≥ Pa,x X ∈ Bεh > 0 where the final, strict inequality is due to Lemma 16.32.
16.8 Support theorem
487
Remark 16.34 By taking α ∈ (1/5, 1/4) and N = 4 this yields a support characterization of Xa;x in α-H¨older rough path topology. Since Xa;x has sample paths which enjoy H¨ older regularity for any exponent less than 1/2, one suspects that the above support description holds true for any a < 1/2. Although we are able to show this when h ≡ 0, see the following section, the extension to h = 0 remains an open problem.
16.8.3 H¨older rough path topology: a conditional result a;x We first study the probability that X stays in the bounded open domain N d D ⊂ g = g R for long times.
Proposition 16.35 Let D be an open domain in g with finite volume, no regularity assumptions are made about ∂D. Let a ∈ Ξ (Λ) and Xa be the process associated with E a started at x ∈ g and assume x ∈ D. Then there exist positive constants K1 = K1 (x, D, Λ) and K2 = K2 (D, Λ) so that for all t ≥ 0 ∈ D ∀s : 0 ≤ s ≤ t] ≤ K2 e−λt K1 e−λt ≤ P [Xa,x s where λ ≡ λa1 > 0 is the simple and first Dirichlet eigenvalue of −La on the domain D. Moreover, ∀a ∈ Ξ (Λ) : 0 < λm in ≤ λa1 ≤ λm ax < ∞ where λm in , λm ax depend only on Λ and D. Remark 16.36 The proof will show that K1 ∼ ψ a1 (x). Noting that a ψ a1 (x) e−λ 1 t solves the same PDE as ua (t, x), the above can be regarded as a “partial” parabolic boundary Harnack statement. Proof. If paD denotes the Dirichlet heat-kernel for D, we can write a x a paD (t, x, y) dy. u (t, x) := P [Xs ∈ D ∀s : 0 ≤ s ≤ t] = D
As is well known,16 paD is the kernel for a semi-group PDa : [0, ∞)×L2 (D) → L2 (D) which corresponds to the Dirichlet form (E a , FD ) whose domain FD consists of all f ∈ F ≡ D (E a ) with quasicontinuous modifications equal to 0 q.e. on Dc . The infinitesimal generator of PDa , denoted by LaD , is a self-adjoint, densely defined operator with spectrum σ (−LaD ) ⊂ [0, ∞). We now use an ultracontractivity argument to show that σ (−LaD ) is discrete. To this end, we note that the upper bound on pa plainly implies 2 |paD (t, ·, ·)|∞ = O(t−d /2 ). Since |D| < ∞ it follows that PDa (t)L 1 →L ∞ < ∞ which is, by definition, ultracontractivity of the semi-group PDa . It is now 1 6 See,
for example, [70].
Markov processes
488
a standard consequence17 that σ (−LaD ) = {λa1 , λa2 , . . . } ⊂ [0, ∞), listed in non-decreasing order. Moreover, it is clear that λa1 = 0; indeed, the heat kernel estimates are more than sufficient to guarantee that PDa (t)L 2 →L 2 → 0 as t → ∞ which contradicts the existence of non-zero f ∈ L2 (D) so that PDa (t) f = f for all t ≥ 0. Let us note that λa1
=
inf σ (H) = inf E a (f, f ) : f ∈ FD with |f |L 2 (D ) = 1 (by Rayleigh–Ritz) Γa (f, f ) dm : f ∈ FD with |f |L 2 (D ) = 1 = inf D
and since Γa (f, f ) /ΓI (f, f ) ∈ Λ−1 , Λ for f = 0 it follows that λa1 ∈ [λm in , λm ax ] for all a ∈ Ξ (Λ) where we set λm in = Λ−1 λI1 , λm ax = ΛλI1 .
(16.18)
The lower heat-kernel estimates for the killed process imply18 irreducibility of the semi-group PDa , hence simplicity of the first eigenvalue λ, and there is an a.s. strictly positive eigenfunction to λ ≡ λa1 , say ψ ≡ ψ a1 , and by de Giorgi–Moser–Nash regularity we may assume that ψ is H¨older continuous and strictly positive away from the boundary (this follows also from Harnack’s inequality). We also can (and will) assume that ψL 2 (D ) = 1. Lower bound: Noting that v (t, x) = e−λt ψ (x) is a weak solution of ∂t v = LaD v with v (0, ·) = ψ, we have v (t, x) = paD (t, x, y) ψ (y) dy, D
at first for a.e. x but by using a H¨ older-regular version of paD the above holds for all x ∈ D. It follows that 0
<
ψ (x) paD (t, x, y) ψ (y) dy = eλt D λ(t+1) a ≤ e pD (t, x, y) paD (1, y, z) ψ (z) dzdy D D 1 1 ≤
λ(t+1)
paD
e
(t, x, y)
D
D
[paD
≤ C (Λ, D) eλ(t+1) ua (t, x) = C (Λ, D) eλ m a x × eλt ua (t, x) 1 7 See, 1 8 See,
for example, [38, Theorem 1.4.3]. for example, [38, Theorem 1.4.3].
2
(1, y, z)] dz
2
ψ (z) dz
dy
16.8 Support theorem
489
and this gives the lower bound with K1 = ψ (x) / C (Λ, D) eλ m a x . Clearly ψ = ψ a1 depends on a and so does K1 . Thus, what we need to show is that ψ (x) can be bounded from below by a quantity which depends on a only through its ellipticity constant Λ. To this end, from paD (t, y, y) =
∞
e−λ i t |ψ ai (y)| a
2
i=1
evaluated at t = 1, say, we see that 2
|ψ (y)| ≤ eλ paD (1, y, y) ≤ eλ m a x paD (1, y, y) ≤ eλ m a x pa (1, y, y) and by using our upper heat-kernel estimates for pa we see that there is a constant M = M (Λ, D) such that |ψ|∞ ≤ M . Given x and M we can find a compact set K ⊂ D so that m (D\K) ≤ 1/(4M 2 ) and x ∈ K (recall that m is a Haar measure on g). By Harnack’s inequality, sup ψ ≤ Cψ (x) K
for C = C (K, Λ) = C ( x, D, Λ) .We then have 5 5 5 1 = |ψ|L 2 ≤ M m (D\K) + Cψ (x) m (K) ≤ 1/2 + Cψ (x) m (D) which gives the required lower bound on ψ (x) ≡ ψ a1 (x), which only depends on x, D and Λ but not on a. Upper bound: Recall that −λ ≡ −λa1 denotes the first eigenvalue of LaD with associated semi-group PDa . It follows that |PDa (t) f |L 2 ≤ e−λt |f |L 2 , which may be rewritten as paD (t, ·, z) f (z) dz D
≤ e−λt |f |L 2 . L2
Let t > 1. Using Chapman–Kolmogorov and symmetry of the kernel, a pD (t, x, z) dz = paD (1, x, y) paD (t − 1, z, y) dydz u (t, x) = D
=
5
paD
m (D) D
=
D
D
(t −
≤
5 m (D) |PDa (t − 1) paD (1, x, ·)|L 2 (D ) 5 m (D)e−λ(t−1) |paD (1, x, ·)|L 2 (D ) 5 m (D)eλ m a x e−λt paD (2, x, x)
≤
K2 e−λt ,
= ≤
(1, x, y) dy
D
5 a a m (D) pD (t − 1, ·, y) pD (1, x, y) dy D
1/2
2 1, z, y) paD
L 2 (D )
dz
Markov processes
490
where we used upper heat-kernel estimates in the last step to obtain K2 = K2 (D, Λ) . Corollary 16.37 Fix a ∈ Ξ (Λ). There exists K = K (Λ) and for all ε > 0 there exist λ = λ(ε) such that ! " −2 K −1 e−λtε ≤ Pa,0 ||X||0;[0,t] < ε (16.19) ! " −2 (16.20) ∀x : Pa,x ||X||0;[0,t] < ε ≤ Ke−λtε . Proof. A straightforward consequence of scaling and Proposition 16.35 applied to D = B (0, 1) = {y : y < 1} where · is the standard Carnot–Caratheodory norm on g. Then λ is the first eigenvalue corresponding to a scaled by factor ε.
Proposition 16.38 Let α ∈ [0, 1/2). There exists a constant C16.38 such that for all ε ∈ (0, 1] and R > 0, Xs,t sup P α > R X0;[0,1] < ε |t−s|< ε 2 |t − s| 1 R2 . ≤ C16.38 exp − C16.38 ε2(1−2α )
a,0
Proof. There will be no confusion in writing Pε for P ·| X0;[0,1] < ε . Suppose there exists a pair of times s, t ∈ [0, 1] such that s < t, |t − s| < ε2 and
Xs,t α > R. |t − s|
E D Then there exists a k ∈ {1, . . . , 1/ε2 } so that [s, t] ⊂ (k−1) ε2 , (k+1) ε2 . In particular, the probability that such a pair of times exists is at most 2 *1/ε +
Pε Xα ;[(k −1)ε 2 ,(k +1)ε 2 ] > R .
k =1
Set (k − 1) ε2 , (k + 1) ε2 =: [T1 , T2 ]. The rest of the proof is concerned with the existence of C such that R2 ε −1 P ||X||α ;[T 1 ,T 2 ] > R ≤ C exp −C ε 2 ( 1 −2 α )
16.8 Support theorem
491
E D since the factor 1/ε2 can be absorbed in the exponential factor by making C bigger. We estimate P0 ||X||α ;[T 1 ,T 2 ] > R ||X||0;[0,1] < ε P0 ||X||α ;[T 1 ,T 2 ] > R; ||X||0;[0,T 1 ] < ε; ||X||0;[T 2 ,1] < ε ! " . ≤ P0 ||X||0;[0,1] < ε By using the Markov property and the above lemma, writing λ(ε) = λa;ε , this equals ! " E0 PX T 2 ||X||0;[0,1−T 2 ] < ε ; ||X||α ;[T 1 ,T 2 ] > R; ||X||0;[0,T 1 ] < ε ! " P0 ||X||0;[0,1] < ε ! " ( ε ) −2 (ε ) −2 ≤ Ceλ ε E0 e−λ (1−T 2 )ε ; ||X||α ;[T 1 ,T 2 ] > R; ||X||0;[0,T 1 ] < ε ! " (ε ) −2 = Ceλ T 2 ε P0 ||X||α ;[T 1 ,T 2 ] > R; ||X||0;[0,T 1 ] < ε where constants were allowed to change in insignificant ways. If X had independent increments in the group (such as is the case for enhanced Brownian motion B) P0 [. . . ] would split up immediately. This is not the case here, but the Markov property serves as a substitute; using the Dirichlet heat-kernel paB (0,ε) we can write ! " P0 ||X||α ;[T 1 ,T 2 ] > R; ||X||0;[0,T 1 ] < ε ! " = dx paB (0,ε) (T1 , 0, x) Px ||X||α ;[0,T 2 −T 1 ] > R . B (0,ε)
Then, scaling and the usual Fernique-type estimates for the H¨ older norm of X give 2 ! " R 1 , sup Px ||X||α ;[0,T 2 −T 1 ] > R ≤ C exp − C ε1−2α x where we used T2 − T1 = 2ε2 , and we obtain ! " P0 ||X||α ;[T 1 ,T 2 ] > R; ||X||0;[0,T 1 ] < ε 2 ! " R 1 P0 ||X||0;[0,T 1 ] < ε ≤ C exp − 1−2α C ε 2 (ε ) −2 R 1 e−λ T 1 ε . ≤ C exp − 1−2α C ε
Markov processes
492
Putting things together we have P0 ||X||α ;[T 1 ,T 2 ] > R ||X||0;[0,1] < ε 2 R 1 λ ( ε ) (T 2 −T 1 )ε −2 exp − ≤ Ce C ε1−2α 2 R 1 ≤ Ce2λ m a x exp − C ε1−2α and the proof is finished. Theorem 16.39 Let α ∈ [0, 1/2). For all R > 0 the ball x : xα -H¨o l;[0,1] < R} has positive Pa,0 -measure and lim Pa,0 Xα -H¨o l;[0,1] < R X0;[0,1] < ε → 1. →0
(16.21)
In particular, for any δ > 0 Pa,0 Xα -H¨o l;[0,1] < δ > 0. Proof. We first observe that the uniform conditioning allows us to localize the H¨ older norm. More precisely, take s < t in [0, 1] with t − s ≥ 2 and note that from X0;[0,1] < ε we get Xs,t 1−2α . α ≤ |t − s| It follows that for fixed R and small enough, Pa,0 Xα -H¨o l;[0,1] ≥ R X0;[0,1] < ε Xs,t a,0 sup =P α ≥ R X0;[0,1] < ε |t−s|< ε 2 |t − s| and the preceding proposition shows convergence to zero with and (16.21) follows. Finally, Pa,0 Xα -H¨o l;[0,1] < R ≥ Pa,0 Xα -H¨o l;[0,1] < R X0;[0,1] < ε × Pa,0 X0;[0,1] < ε ≥ Pa,0 X0;[0,1] < ε /2 (for small enough) and this is positive by Proposition 16.37.
16.9 Appendix
493
16.9 Appendix: analysis on free nilpotent groups 16.9.1
Haar measure
In Section 7.5 we free nilpotent groups. More precisely, what introduced representation, namely within we called GN Rd , ⊗ was a particular the tensor algebra T N Rd , +, ⊗ , of an abstract (connected andsimply connected Lie group) G associated with the Lie algebra g = gN Rd ⊂ T N Rd with bracket given by [u, v] = u ⊗ v − v ⊗ u. The abstract exponential map from a Lie algebra to its associated (connected and simply connected) Lie group was then given explicitly by the exponential map on T N Rd based on power series with respect to ⊗. Another representation of G is given by (g, ∗), where ∗ :g×g→g is given by the Campbell–Baker–Hausdorff formula, as derived in Section 7.3. Thanks to nilpotency, ∗ is a polynomial map and (g, ∗) is indeed a realization of G and the abstract exponential map is merely the identity. In any case, G is uniquely determined by g up to isomorphism and whatever concepts such as Carnot–Caratheodory norm/distance we have developed on GN Rd , ⊗ are immediately transfered to (g, ∗), or indeed other representations of G, cf. Remark 16.1. the terminology of Folland Let us recall a few facts about gN Rd . Using and Stein [54], we can say that g = gN Rd is graded in the sense that g =
V1 ⊕ · · · ⊕ VN ,
[Vi , Vj ] ⊂
Vi+j ⊗j with “jth level” given by Vj = g ∩ Rd . It is also stratified in the sense that V1 generates g as an algebra and so [V1 , Vj ] = V1+j . Moreover, a natural family of dilations on g is given by {δ r : r > 0}, where δ r (g1 , . . . , gN ) = rg1 , r2 g2 , . . . , rN gN , with gj ∈ Vj . As already discussed (cf. Exercise 7.55), each dilation induces a group homomorphism of the form exp ◦δ r ◦ exp−1 . Proposition 16.40 The Lebesgue measure λ on g = gN Rd is the (unique up to a constant factor) left- and right-invariant Haar measure m on (g, ∗). Moreover, if G is the abstract Lie group associated with g, the left- and right-invariant Haar measure on G is given by m = (exp)∗ λ.
494
Markov processes
Remark 16.41 Where no confusion is possible we shall write |A| instead of m (A) for a measurable set A ⊂ G. Proof. Set n = dim gN Rd and also nj = dim (Vj ⊕ · · · ⊕ VN ) . We can choose a basis {en −n N +1 , . . . , en } for VN , extend it to a basis en −n N −1 +1 , . . . , en for VN −1 ⊕ VN , and so forth, obtaining eventually a basis {e1 , . . . , en } for g. The dual basis {ξ 1 , . . . , ξ n } provides global coordinates for gN Rd and, by the Campbell–Baker–Hausdorff formula, η k (x ∗ y) = η k (x) + η k (y) + Pk (x, y) where Pk (x, y) is a polynomial which depends only on the coordinates η i (x) , η i (y) with i < k. Therefore, the differentials of the maps x → x ∗ y (with y fixed) and y → x ∗ y (with x fixed) are given with respect to the coordinates (η k ) by lower-triangular matrices with ones on the diagonal, and their determinants are therefore identically one. It follows that the volume form dη 1 . . . dη n , which corresponds to Lebesgue measure on g, is left- and right-invariant. Remark 16.42 As the proof immediately reveals, this result holds true for an arbitrary (connected and simply connected) nilpotent Lie group G with Lie algebra g. Definition 16.43 The homogenous dimension of g = gN Rd is defined as N j (dim Vj ) . dimH g : = j =1
If G is the abstract Lie group associated with g, we equivalently write dimH G.
Lemma 16.44 Let G be the abstract Lie group associated with g = gN Rd . Then, for all r > 0 and measurable sets E ⊂ G, m (δ r E) = rdim H
G
m (E) .
In particular, for B (x, r) = {y ∈ G : d (x, y) < r} we have m (B (x, r)) = crdim H
g
with c = m (B (0, 1)) .
(16.22)
Proof. By construction of the Haar measure and dilation on G, it suffices to compute everything in the Lie algebra with exponential coordinates (xj,i )
j =1,...,N i=1,...,dim Vj
16.9 Appendix
495
and with respect to the Lebesgue measure. The image under δ λ is precisely j λ xj,i j =1,...,N i=1,...,dim Vj
and the determinant of the Jacobian of (xj,i ) → λj xj,i is obviously λQ .
16.9.2 Jerison’s Poincar´e inequality
The Lie algebra of g = gN Rd has a decomposition of the form g = V1 ⊕ · · ·⊕VN and we shall represent the associated Lie group on the same space, e.g. G = (g, ∗). There are left-invariant vector fields u1 , . . . , ud on the group determined by ui |0 = ∂i |0 where ∂i |0 are the coordinate vector fields associated with the canonical basis of V1 = Rd . Example 16.45 When N = 2, we can identify g with Rd ⊕ so (d) and for i = 1, . . . , d we have 1 x1;j ∂j,i − x1;j ∂i,j ui |x = ∂i + 2 1≤i< j ≤d
1≤j < i≤d
where ∂i denotes the coordinate vector field on Rd and ∂i,j with i < j the coordinate vector field on so (d), identified with its upper-diagonal elements. T
hyp Definition 16.46 = (u1 , . . . , ud ) the hypoelliptic gra call ∇ NWe d dient on G = g R , ∗ . When no confusion arises, we also write ∇.
The following lemma is sometimes summarized by saying that the hypoelliptic gradient forms an “upper gradient” on G, equipped with Carnot– Caratheodory metric. Since g and G have been identified, we state and prove it in the following form: Lemma 16.47 (upper gradient lemma) Let x, z ∈ g. For all compactly supported, smooth u : g → R, and admissible path Υ, in the sense that Υ ∈ C 1-H¨o l ([0, z] , g) , Υ1-H¨o l ≤ 1), which also has the property that Υ (0) = x, Υ (z) = xz, we have |u (x) − u (xz)| ≤ 0
z
hyp ∇ u (Υs ) ds.
Markov processes
496
Remark 16.48 The result extends to u ∈ W 1,2 (g, dm) ∩ C (g), using the notation of Theorem 16.4. Proof. Let du denote the 1-form j ∂j u (.) dxj with j = 1, . . . , dim g. By the fundamental theorem of calculus, |u (x) − u (y)| ≤
z
> ? ˙ s ds. du (Υs ) , Υ
0
But any Υ ∈ C 1-H¨o l is, viewed through the global “log”-chart, the step-N older path γ and so lift of an Rd -valued 1-H¨ ˙t = Υ
d
ui (Υt ) γ˙ it .
i=1
older path [0, z] → Rd , at unit speed.) Then (Namely, γ = π 1 (Υ) is a 1-H¨ >
˙s du (Υs ) , Υ
? =
d
!du (Υs ) , ui (Υs )" γ˙ is
i=1
=
(∇u|Υ s ) · γ˙ s ,
where · is the inner product on Rd . For a.e. s we have |γ˙ s | ≤ 1 and Cauchy–Schwarz on Rd shows that |(∇u|Υ s ) · γ˙ s | ≤ |(∇u|Υ s )| .
Proposition 16.49 (weak Poincar´e inequality) There exists a constant C such that for all smooth u : g → R and all y ∈ g and r > 0, 2 hyp 2 (u (x) − u ¯r ) dx ≤ Cr2 ∇ u (x) dx B (y ,r )
where u ¯r =
B (y ,r )
B (y ,2r )
u (x) dx.
Proof. We may assume that y = 0 ∈ g so that B = B (y, r) is centred at the unit element in the group. We shall also write Υz : [0, z] → g for a geodesic which connects the unit element with z, parametrized to run at unit speed. It follows that s → xΥzs
16.9 Appendix
497
is a geodesic from x to xz, run at unit speed. From the “upper-gradient lemma” z ∇u (xΥzs ) ds |u (x) − u (xz)| ≤ 0
and by Cauchy–Schwarz, 2
|u (x) − u (xz)| ≤ z
z
2
|∇u| (xΥzs ) ds.
0
This and the left-invariance of the Lebesgue measure on g yields 2 1 2 (u (x) − u ¯) dx = (u (x) − u (y)) dy dx |B| B B B 1 2 ≤ ((u (x) − u (y))) dydx |B| B B 1 2 = 1B (x) 1B (xz) ((u (x) − u (xz))) dzdx |B| g g z 1 2 ≤ 1B (x) 1B (xz) z |∇u| (xΥzs ) dsdzdx. |B| g g 0 By right-invariance of the Lebesgue measure we obtain 2 z 2 1B (x) 1B (xz) ∇u (xΥs ) dx = 1B Υ zs (ξ) 1B z −1 Υ zs (ξ) |∇u| (ξ) dξ g g 2 |∇u| (ξ) dξ . (16.23) ≤ 12B (z) 2B
Here we denote by Bh the right translation of B by h. The above inequality requires some explanation. If the expression under the sign of the middle interval has a non-zero value, then ξ = xΥzs = yz −1 Υzs for some x, y ∈ B. −1 Hence z = x−1 y ∈ 2B. Thus ξ = xΥxs y lies on a geodesic that joins x with y and so d (x, ξ) + d (ξ, y) = d (x, y), which together with the triangle inequality implies ξ ∈ 2B. This leads to the estimate (16.23), as claimed. Then z 1 2 2 (u (x) − u ¯) dx ≤ 12B (z) z |∇u| (ξ) dξdsdz |B| g 0 B 2B 1 2 2 = z |∇u| (ξ) dξdz |B| 2B 2B and we conclude with |B| = |B (0, 1)| rQ , where Q is the homogenous dimension of g, cf (16.22), and 2r 2 z dz = |B (0, 1)| ρ2 d ρN = (const) × rQ +2 . 2B
0
Markov processes
498
16.9.3 Carnot–Caratheodory metric as intrinstic metric Following the notation of Theorem 16.4 we set Γ (f, g) := ∇hyp f · ∇hyp g =
d
ui f ui g, E (f, g) =
Γ (f, g) dm
i=1
for f, g ∈ Cc∞ (g,R) where g ≡ gN Rd is equipped with Lebesgue measure dm. The domain of Γ resp. E naturally extends to W 1,2 (g, dm), the closure of Cc∞ with respect to 2
f F = E (f, f ) + !f, f "L 2
with L2 = L2 (g, dm) .
Let us define the intrinsic metric on g by (x, y) : = sup {f (y) − f (x) : f ∈ Cc∞ (g,R) and |Γ (f, f )|∞ ≤ 1} = sup f (y) − f (x) : f ∈ W 1,2 (g, dm) ∩ C (g,R) and |Γ (f, f )| ≤ 1 a.s.} and the Carnot–Caratheodory distance on g by 1 |dh| : h Lipschitz, γ (0) d (x, y) = inf 0 $ d i = x, γ (1) = y : dγ = ui (γ) dh .
(16.24)
i=1
It is easy to see (cf. Remark 7.43) that this is precisely the Carnot– Caratheodory distance on G = GN Rd , cf. Definition 7.41, as seen through the log-chart.19 Theorem 16.50 The Carnot–Caratheodory distance on g coincides with the intrinsic metric. Proof. (x, y) ≤ d (x, y) : Fix f ∈ Cc∞ gN Rd ,R with |Γ (f, f )|∞ ≤ 1 d and h Lipschitz such that the solution to the ODE dγ = i=1 ui (γ) dhi , γ (0) = x satisfies γ (1) = y. Clearly, f (y) − f (x) =
1
Df (γ t ) dγ t = 0
≤
|Γ (f, f )|∞
d i=1
1
0
|dh| ≤ 0
1
(ui f |)γ t dhit
1
|dh| . 0
1 9 Strictly speaking, we should take 16.24 as the definition for d˜ (x, y), so that d˜ (log g, log h) = d (g, h) for all g, h ∈ G N Rd = log −1 gN Rd .
16.10 Comments
499
Passing the sup (resp. inf) over all such f (resp. h) we see that (x, y) ≤ d (x, y). (x, y) ≥ d (x, y) : Assume momentarily that d (x, ·) is an admissible function in the definition of . It would then follow that (x, y) ≥ d (x, y) − d (x, x) = d (x, y), which is what we seek. To make this rigorous we proceed in two steps. First we extend {ui : i = 1, . . . , d}
to a full basis of gN Rd ∼ = Rm with m = dim gN Rd , ε vj : i = 1, . . . , m := {ui : i = 1, . . . , d} ∪ ε1/2 ∂j dim gN Rd to the last where {∂j } denotes the coordinate vector fieldscorresponding (m − d) coordinates of Rm . Replacing {ui } by vεj we can define an inm trinsic distance ε (x, y) associated with Γε (f, g) := i=j vj f vj g and similarly a control distance dε (x, y). Leaving straightforward mollification arguments to the reader (or [22, p. 285]), the class of admissible functions can be extended to include dε (x, ·) and it follows that dε (x, y) = ε (x, y) . On the other hand, it is not hard to see that d (x, yε ) ≤ dε (x, y) = ε (x, y) ≤ (x, y) for some yε which converges to y as ε → 0. Using continuity of the Carnot–Caratheodory distance it suffices to send ε → 0 and the proof is finished.
16.10 Comments The Dirichlet form approach to Markovian rough paths was first adopted by Friz and Victoir [68] and already contains the bulk of the results in this chapter. (One could in fact bypass the use of abstract Dirichlet form theory and give a direct analytical treatment along the lines of Saloff-Coste and Stroock [151]). For some background on (symmetric) Dirichlet forms, the reader can consult Appendix E and the references cited therein. The theory of non-symmetric Dirichlet forms (e.g. in particular the heat-kernel estimates from Sturm [164]) would allow us to construct enhanced nonsymmetric Markov process in a similar way. Exercise 16.10, concerning Nash’s inequality and (on-diagonal) estimates for heat-kernels, is taken from Carlen et al. [22]. The construction of a stochastic area associated with a Markov process of the present type goes back to Lyons and Stoica [121]: they use
500
Markov processes
forward–backward martingale decomposition to see that, in particular, an Rd -valued (Markov) process X a (with uniformly elliptic generator in divergence form) can be rendered accessible to Stratonovich integration; see also Rozkosz [146, 147] for related considerations. As a consequence of the Wong–Zakai theorem, (lifted) piecewise linear approximation will converge (to the Stratonovich lift) and so yield an enhanced Markov process Xa . Lejay [105, 106] then establishes p-variation rough regularity of this enhancement Xa . The link with our approach is made in comment (iv) after Definition 16.14 in Section 16.4. A version of Theorem 16.28 appears in Lejay [106], the author then also discusses applications to homogenization. Sample path large deviations for Markovian rough paths were established by Friz and Victoir [68]; we observed that the same arguments apply in the abstract setting of local Dirichlet spaces and therefore give the proof in the appropriate Section E.6 of Appendix E. (Although we are unaware of any reference to such a result in the context of Dirichlet space theory, J. Ram´ırez has shown us an unpublished preprint which covers our results.) Similarly, a support theorem was established by Friz and Victoir [68]; we have here outsourced the key estimates in the abstract setting of Section E.6 of Appendix E and obtained a slight sharpening of the result. The restriction to H¨ older exponent < 1/4 in the (support) Theorem 16.33 is almost certainly a technical one though. Indeed, Theorem 16.39 shows that every ball around the trivial zero-path is charged in (rough-path) H¨ older metric, for any exponent < 1/2, which is what one suspects. The problem, in contrast to the similar support discussion of Section 13.8, is the lack of the Cameron–Martin theorem in the present context. It is conjectured that the use of (time-dependent, non-symmetric) Dirichlet forms will allow us to generalize our discussion of Section 16.8.3 so as to obtain a support theorem for exponent < 1/2. In Appendix 16.9 we collect some classical analytic results for free nilpotent groups. See Folland and Stein [54] for a general discussion. The Poincar´e inequality (Proposition 16.49) appears explicitly in Jerison [93]; our simplified proof is a variation of a proof attributed to Varopoulos by Hajtasz and Koskela [83]. Theorem 16.50 on the consistency between intrinsic and Carnot–Caratheodory distance is taken from Carlen et al. [22].
Part IV
Applications to stochastic analysis
17 Stochastic differential equations and stochastic flows We saw in Part III that large classes of multidimensional stochastic processes, including semi-martingales, Gaussian and Markov processes, are naturally enhanced to random rough paths, i.e. processes whose sample paths are almost surely geometric rough paths. Clearly then, there is a pathwise notion of stochastic differential equation driven by such processes, simply by considering the rough differential equation driven by (geometric rough path) realizations of the enhanced process. As will be discussed in this chapter, there is a close link to (Stratonovich) stochastic differential equations, based on stochastic integration theory. But first, we start with a working summary of “rough path” continuity results (with precise references to the relevant statements in Part II).
17.1 Working summary on rough paths 17.1.1 Iterated integration
Let p ≥ 1 and x ∈ C p-var [0, T ] , G[p] Rd be a weak geometric p-rough path. From Corollary 9.11, the Lyons-lifting map SN SN
: Cop-var [0, T ] , G[p] Rd → Cop-var [0, T ] , GN Rd , : Co1/p-H¨o l [0, T ] , G[p] Rd → Co1/p-H¨o l [0, T ] , GN Rd
is continuous in p-variation and 1/p-H¨older (rough path) topology respectively. We think of SN (x) as attaching higher iterated integrals to x. Indeed, when x consists of all iterated integrals (in the Riemann–Stieltjes sense) up to order [p] of some x ∈ C 1-var [0, T ] , Rd , that is x = S[p] (x), then SN (x) = SN (x).
17.1.2 Integration
Let γ > p ≥ 1 and ϕ = (ϕ1 , . . . , ϕd ) ⊂ Lipγ −1 Rd , Re . From Theorem 10.47, there is a unique rough integral of ϕ against x and1 1 By
convention, rough integrals are G [p ] (Re )-valued; π 1 is the projection to Re .
504
Stochastic differential equations and stochastic flows
Cop-var [0, T ] , G[p] Rd
→
x
→
Cop-var ([0, T ] , Re ) · ϕ (x) dx π1 0
is continuous in p-variation topology. The same statement holds in 1/pH¨ older topology. This integral generalizes the Riemann–Stieltjes integral in the sense that for x = S[p] (x) with x ∈ C 1-var [0, T ] , Rd , we have · · π1 ϕ (x) dx = ϕ (x) dx 0
0
where the right-hand side is a well-defined Riemann–Stieltjes integral.
17.1.3 Differential equations Let γ > p ≥ 1 and V = (V1 , . . . , Vd ) ⊂ Lipγ (Re ), a collection of vector fields on Re . From Theorem 10.26 and its corollaries there is a unique solution y = π (V ) (0, y0 ; x) to the RDE dy = V (y) dx, started at y0 ∈ Re and Cop-var [0, T ] , G[p] Rd
→
Cop-var ([0, T ] , Re )
x
→
π (V ) (0, y0 ; x)
is continuous in p-variation topology. The same statement holds in 1/pH¨older topology. RDEs generalize ODEs in the sense that for x = S[p] (x) with x ∈ C 1-var [0, T ] , Rd , we are dealing with a solution to the classical ODE dy = V (y) dx, understood as a Riemann–Stieltjes integral equation. One has also the refined result that uniqueness/continuity holds when V ⊂ Lipp (Re ) and x has finite p-variation. In fact, motivated by sample path properties of (enhanced) Brownian motion, namely B (ω) ∈ / C 2-var but B (ω) ⊂ C ψ 2 , 1 -var ∩ C α -H¨o l a.s. for any α ∈ [0, 1/2), the regularity assumption on x can be further relaxed to finite ψ p,1 -variation, C p-var ⊂ C ψ p , 1 -var ⊂ C (p+ε)-var , cf. Definition 5.45. The continuity statements become somewhat more involved, but this will not be restrictive in applications. For V ⊂ Lipp (Re )
17.1 Working summary on rough paths
505
and 1 ≤ p < p < [p] + 1 we have continuity x ∈ C ψ p , 1 -var [0, T ] , G[p] Rd , dp -var → π (V ) (0, y0 ; x) ∈ C p -var ([0, T ] , Re ) , |·|p -var and, given 1 ≤ p < p < p < [p] + 1, x ∈ C ψ p , 1 -var ∩ C 1/p -H¨o l [0, T ] , G[p] Rd , d1/p -H¨o l → π (V ) (0, y0 ; x) ∈ C 1/p -H¨o l ([0, T ] , Re ) , |·|1/p -H¨o l . Elements in the (non-complete, non-separable) metric space C ψ p , 1 -var ∩ C 1/p -H¨o l [0, T ] , G[p] Rd , d1/p -H¨o l are simply weak geometric 1/p -H¨older rough paths with additonal ψ p,1 -var regularity. To avoid measurability issues arising from non-separability, we can restrict attention to the (non-complete, separable) metric space C ψ p , 1 -var ∩ C 0,1/p -H¨o l [0, T ] , G[p] Rd , d1/p -H¨o l , elements of which are geometric 1/p -H¨older rough paths with additional ψ p,1 -var regularity.
17.1.4 Differential equations with drift Let us now consider two collections of vector fields, V W
= (V1 , . . . , Vd ) ⊂ Lipγ (Re ) , = (V1 , . . . , Vd ) ⊂ Lipβ (Re ) ,
driven by x ∈ C p-var [0, T ] , G[p] Rd and h ∈ C 1-var [0, T ] , Rd , respectively. When γ > p ≥ 1 and β > 1 it follows from Theorem 12.10 that there is a unique solution y = π (V ,W ) (0, y0 ; (x,h)) to the RDE with drift dy = V (y) dx + W (y) dh, y (0) = y0 ∈ Re , started at y0 ∈ Re and x ∈ C p-var [0, T ] , G[p] Rd , dp-var → π (V ,W ) (0, y0 ; (x, h)) ∈ C p-var ([0, T ] , Re ) is continuous in p-variation topology. Again, the same statement holds in 1/p-H¨ older topology.
Stochastic differential equations and stochastic flows
506
One has also the refined result that uniqueness/continuity holds when γ = p, β = 1 and the regularity assumption on x is relaxed to finite ψ p,1 variation: from Theorem 12.11 (and the remarks afterwards) with 1 ≤ p < p < p < [p] + 1 we have the following continuity statements: x
C ψ p , 1 -var [0, T ] , G[p] Rd , dp -var → π (V ,W ) (0, y0 ; (x, h)) ∈ C p -var ([0, T ] , Re ) , |·|p -var
∈
and x
∈ →
[0, T ] , G[p] Rd , d1/p -H¨o l π (V ,W ) (0, y0 ; (x, h)) ∈ C 1/p -H¨o l ([0, T ] , Re ) , |·|1/p -H¨o l .
C ψ p , 1 -var ∩ C 1/p
-H¨o l
17.1.5 Some further remarks For simplicity, we have only stated continuity of the RDE solutions as a function of x. In fact, looking at the relevant statements in Part II shows continuity of (y0 , x;V ) → π (V ) (0, y0 ; x) and similar for RDEs with drift (see e.g. Corollary 10.40 and Theorem 12.11). It is also worth remarking that the assumption V ⊂ Lipγ (Re ) , γ > p, allows for continuity results on the level of flows of diffeomorphisms, cf. Section 11.2 and Section 17.5 below.
17.2 Rough paths vs Stratonovich theory In the present section we make the link to the theory of stochastic integration (resp. differential equations) with respect to (continuous) semimartingales.2
17.2.1 Stratonovich integration as rough integration We show in this subsection that rough integration along enhanced semimartingales coincides with classical Stratonovich integration. The case of integration against (enhanced) Brownian motion is, of course, a special case. 2 As is customary in this context, it is understood that there is an underlying probability space, say (Ω, F, P), where F is P-complete and there will be a right-continuous filtration (Ft )t ≥0 with F0 containing all P-null sets (the “usual conditions”).
17.2 Rough paths vs Stratonovich theory
507
Let N be a real-valued continuous semi-martingale, M = M 1 , . . . , M d an Rd -valued continuous semi-martingale and f ∈ C 1 (Rn , R). We fix a time-horizon [0, T ] and a sequence of dissections (Dn ) with mesh |Dn | → 0. As usual, N D n denotes the path obtained by piecewise linear approximation to the (sample) path N = N (·; ω) and the same notation applies to f (M ) = f (M (·; ω)). It is a routine exercise in stochastic integration3 to show that4 lim
n →∞
t
Dn
[f (M )] 0
1 2 i=1 d
t
dN D n =
f (M ) dN + 0
t
2 3 ∂i f (M ) d M i , N ,
0
(17.1) in probability and uniformly in t ∈ [0, T ]. One can then use either side as a definition of the Stratonovich integral of f (M ) against N , denoted by
t
f (M ) ◦ dN. 0
This allows us in particular to define
t
ϕ (M ) ◦ dM ≡ 0
d i=1
t
ϕi (M ) ◦ dM i
0
where ϕ = (ϕ1 , . . . , ϕd ) is a collection of C 1 Rd , Re -functions. On the other hand, given γ > 2 we can pick p ∈ (2, min (3, γ)) and know from Proposition 14.9 that M can be enhanced to a geometric p-rough path M, i.e. M (ω) ∈ C 0,p-var [0, T ] , G2 Rd . In particular, there is a well-defined rough integral in the sense of Section 10.6:
t
ϕ (M ) dM 0
provided ϕ ∈ Lipγ −1 with γ > p > 2.
Proposition 17.1 Let γ > 2 and ϕi ∈ Lipγ −1 Rd , Re for i = 1, . . . , d, and M an Rd -valued semi-martingale. Then the rough integral of the enhanced semi-martingale M against ϕ exists and, with probability one, t t ϕ (M ) ◦ dM = π 1 ϕ (M ) dM . ∀t ∈ [0, T ] : 0
0
t Proof. The fact that 0 ϕ (M ) dM is well-defined follows from the fact that M is almost surely a geometric p-rough path, for any p ∈ (2, min (3, γ)) , see 3 See, for example, Stroock [160, p. 229]; and recall that [f (M )]D denotes the piecewise linear
approximation to the path f (M ·), based on the dissection D. 4 f (M ) dN denotes the Itˆ o stochastic integral.
Stochastic differential equations and stochastic flows
508
Theorem 14.12. Take a sequence of dissections (Dn ) with mesh |Dn | → 0. From Theorem 14.16, at the beginning of that chapter, we and the remarks know that dp-var S2 M D n , M →n →∞ 0 in probability and so, by basic continuity properties of rough integration (cf. Section 10.6), t t dp-var π 1 ϕ (M ) dM , ϕ M D n dM D n →n →∞ 0 0
0
in probability. On the other hand, from (17.1) we have t t D d∞ ϕ (M ) ◦ dM, [ϕ (M )] n dM D n →n →∞ 0 0
0
in probability. So, to conclude the proof, we only need to prove that t Dn Dn Dn − [ϕ (M )]s ϕ Ms dMs →n →∞ 0 sup t∈[0,T ]
0
in probability. Given a dissection D = (ti ) and s ∈ [0, t] we have s − sD Ms D ,s D ϕ MsD = ϕ Ms D + D s − sD s − sD γ −1 Ms D ,s D + O |M |0;[s D ,s D ] , = ϕ (Ms D ) + ϕ (Ms D ) D s − sD s − sD D [ϕ (M )]s = ϕ (Ms D ) + D [ϕ (Ms D ) − ϕ (Ms D )] s − sD s − sD γ −1 ϕ (Ms D ) Ms D ,s D + O |M |0;[s D ,s D ] . = ϕ (Ms D ) + D s − sD It follows that for Dn = (ti ) and s ∈ [ti , ti+1 ] , D γ −1 ϕ MsD n − [ϕ (M )]s n = O |M |0;[t i ,t i + 1 ] , and hence that t Dn Dn γ Dn − [ϕ (M )]s ϕ Ms dMs ≤ c1 sup |M |0;[t i ,t i + 1 ] t∈[0,T ]
0
i
which converges to zero, even almost surely, with |Dn | → 0, as is easily seen from Theorem 8.22, or directly from γ p γ −p sup |M | |M |0;[t i ,t i + 1 ] ≤ |M |p-var;[0,T ] i
|t−s|≤|D n |
and a.s. uniform continuity of t ∈ [0, T ] → Mt . This concludes the proof.
17.2 Rough paths vs Stratonovich theory
509
Exercise 17.2 Let M be a continuous, Rd -valued local martingale with lift M. (i) Show that the collection of iterated Stratonovich integrals up to level N , i.e. Strato (M )t := SN
1, Mt , . . . ,
{0< s 1 < ···s N < t}
◦dMs 1 ⊗ · · · ⊗ ◦dMs N
viewed as a continuous path in GN Rd coincides with the Lyons lift SN(M). (ii) Let F be a moderate function (cf. Definition 14.4). Show that there exists C = C (N, F, d) such that 1/N ◦dMs 1 ⊗ · · · ⊗ ◦dMs N E F sup 0≤t< ∞ {0< s 1 < ···s N < t} 1/2 ≤ CE F |!M "∞ | . Solution. (i) is an easy consequence of Proposition 17.1 and we have in particular ◦dMs 1 ⊗ · · · ⊗ ◦dMs N = π N (SN (M)) . {0< s 1 < ···s N < t}
Clearly, 1/N ◦dMs 1 ⊗ · · · ⊗ ◦dMs N ≤ sup 0≤t< ∞ {0< s 1 < ···s N < t}
sup SN (M)0,t
0≤t< ∞
≤ SN (M)p-var;[0,∞) . From Theorem 9.5, we have for some constant c1 = c1 (N, p) , SN (M)p-var;[0,∞) ≤ c1 Mp-var;[0,∞) . Hence, using Theorem 14.12 we see that 1/N E F sup ◦dMs 1 ⊗ · · · ⊗ ◦dMs N 0≤t< ∞ {0< s 1 < ···s N < t} ≤ E F c1 Mp-var;[0,∞) 1/2 . ≤ c2 E F |!M "∞ |
17.2.2 Stratonovich SDEs as RDEs We extend the result of the previous section to differential equations. The main point is that solutions to RDEs driven by (enhanced) semimartingales solve the corresponding Stratonovich stochastic differential
Stochastic differential equations and stochastic flows
510
equation. Again, the case of (enhanced) Brownian motion is a special case of this. We start by recalling that a solution to the Stratonovich stochastic differential equation d Vi (Y ) ◦ dM i , (17.2) dY = i=1
driven by a general (continuous) semi-martingale M = M 1 , . . . , M d is, by definition, a solution to the integral equation Y0,t =
d i=1
t
Vi (Y ) dM i +
0
d 3 2 1 t Vi Vj (Y ) d M i , M j , 2 i,j =1 0
(17.3)
assuming V ∈ C 1 so that Vi Vj ≡ Vik ∂k Vj is well-defined. Obviously then, Y is a semi-martingale itself and from basic stochastic calculus Y is indeed a solution to the Stratonovich integral equation Y0,t =
d i=1
t
Vi (Y ) ◦ dM i
0
where the Stratonovich integral on the right-hand side was defined in the previous section. The extension to SDEs with drift-vector fields W = d process H = (H1 , . . . , Hd ) (W1 , . . . , Wd ), driven by an R -valued adapted 1-var d with sample paths in C [0, T ] , R , is easy since H itself is a semimartingale (with vanishing quadratic variation): a solution to the Stratonovich SDE with drift dY =
d
Vi (Y ) ◦ dM + i
i=1
d
Wj (Y ) dH j
(17.4)
i=1
is then a solution to the equation Y0,t
=
d
t
Vi (Y ) dM i +
0
i=1
d
+
j =1
d 3 2 1 t Vi Vj (Y ) d M i , M j 2 i,j =1 0
t
Wj (Y ) dH j .
(17.5)
0
Theorem 17.3 Let p, γ be such that 2 < p < γ. Assume (i) V = (Vi )1≤i≤d is a collection of vector fields in Lipγ (Re ); (i bis) W = (Wi )1≤i≤d is a collection of vector fields in Lip1 (Re ); d -valued semi-martingale, enhanced to M = M (ω) ∈ C 0,p-var (ii) M is an Rd 2 [0, T ] , G R almost surely;
17.2 Rough paths vs Stratonovich theory
511
d (ii bis) H is an R -valued continuous, adapted process, so that H (ω) ∈ C 1-var [0, T ] , Rd almost surely; (iii) y0 ∈ Re . Then the (for a.e. ω well-defined) RDE solution
Y (ω) = π (V ,W ) (0, y0 ; (M (ω) , H (ω))) solves the Stratonovitch SDE dY =
d i=1
Vi (Y ) ◦ dM i +
d
Wj (Y ) dH j , Y (0) = y0 .
(17.6)
i=1
Remark 17.4 In the case of driving Brownian motion, V ∈ Lip2 and W ∈ Lip1 suffices to have an ω-wise uniquely defined RDE solution (which then solves the Stratonovich SDE driven by Brownian motion). Indeed, in the drift-free case (W ≡ 0) this follows from ψ 2,1 -variation of Brownian motion and Theorem 10.41. In the drift case, we rely on Theorem 12.11. Remark 17.5 Our proof of Theorem 17.3 does not rely on any existence results for Stratonovich SDEs (and in fact yields such a result). En passant, we obtain the classical Wong–Zakai theorem (e.g. [88], [160] or [97]), which asserts π (V ) 0, y0 ; M D n → Y in probability, uniformly on [0, T ] , as an immediate corollary of our Theorem 17.3. Conversely, if one accepts the Wong–Zakai theorem then continuity of the Itˆo–Lyons map combined with dp-var S2 M D n , M →n →∞ 0 in probability (Theorem 14.16) immediately tells us that π (V ,W ) (0, y0 ; (M (ω) , H (ω))) is a Stratonovich solution. Proof. We may assume that M is a continuous local martingale (since its bounded variation part can always be added to the “drift”-signal H ). By a localization argument we may assume that5 3 2 !M " ≡ M i , M j : i, j ∈ {1, . . . , d} and the p-variation of the enhanced martingale M remains bounded. We fix a sequence of dissections Dn = (tnk ) with |Dn | → 0, and write, as 5 The notation here is in slight contrast to Section 14.1, where we preferred to set 3 2 M ≡ M i , M i : i = 1, . . . , d . Let us remark, however, that the two quantities are comparable, as seen from the Kunita–Watanabe (or, in essence, the Cauchy–Schwarz) inequality; see, for example, [143, Chapter IV, Corollary (1.16)].
Stochastic differential equations and stochastic flows
512
usual, M D n , H D n for the respective piecewise linear approximations of M, H based on Dn . Define Y˜ n = π (V ,W ) 0, y0 ; (M D n , H D n ) , and also the Euler approximation with “backbone Y˜ n ” to (17.5), that is6 Yt nn
=
k+1
1 Ytnnk + V Y˜tnnk Mt nk ,t nk + 1 + V 2 Y˜tnnk !M "t n ,t n k k+1 2 n + W Y˜t n Ht n ,t n k
k
k+1
with Y·n defined within the intervals tnk , tnk+1 by linear interpolation. We see that for a fixed k, k −1 n n n n ˜ ˜ Yt nj ,t nj+ 1 − Yt nj ,t nj+ 1 Yt nk − Yt nk = j =0 k −1 k −1 n n δj + Xj ≤ j =0 j =0 where δ nj = π (V ,W ) tnj , Y˜tnnj ; (M D n , H D n ) n n − E(V ) Y˜tnnj , S2 M D n t n ,t n j j+1 t j ,t j + 1 −E(W ) Y˜tnnj , Ht nj ,t nj+ 1 and Xjn =
1 2 ˜n V Yt nj !M "t n ,t n − Mt⊗2 . n ,t n j j+1 j j+1 2
We now apply Corollary 12.8 (or Davie’s estimate, Lemma 10.7, in the case of no drift). Using only Lipγ −1 regularity for V and Lipγ −2 regularity for W we have that, for some θ > 1, k −1
δ nj
≤
c1
j =0
k D p S2 M n j =0
≤
c2
k j =0
6 In
d
i,j = 1
p-var; [t nj ,t nj + 1 ]
pθ
|M |p-var;
[t nj ,t nj+ 1 ]
+ |H D n |1-var; [t n ,t n ] j j+1
θ
+ |H|θ1-var; [t n ,t n ] → 0 a.s. j j+1
what follows V (·) M s , t stands for 3 2 V i V j (·) M i , M j s , t .
d i= 1
V i (·) M si , t and V 2 (·) M s , t for
17.2 Rough paths vs Stratonovich theory
513
as |Dn | → 0 where we used piecewise linearity of M D n on tnj , tnj+1 so that D S2 M n
p-var; [t nj ,t nj + 1 ]
=
D M n
≤
31−1/p |M |p-var; [t n ,t n ]. j j+1
p-var; [t nj ,t nj + 1 ]
⊗2 d×d On theother -valued) martingale and hand, t → !M "t − Mt is an (R k −1 2 ˜n since V Y n is Ft n -measurable it follows that X n : k = 1, 2, . . . tj
j =0 2
j
j
is a martingale. Hence, using in particular Doob’s L -inequality and orthogonality of martingale increments, we have 2 12 k −1 # D n Xjn E max k =1 j =0
≤
2 12 # D −1 n 2E Xjn j =0
=
2E
#D n −1
12 n 2 Xj
j =0
≤
2 |V |Lip 1 E
#D n −1
≤
c1 |V |Lip 1 E
1 2 2 n ,t n !M "t nj ,t nj+ 1 − Mt⊗2 j j+1
j =0 #D n −1
2 !M "t nj ,t nj+ 1
j =0
→ 0 as |Dn | → 0. The last estimate comes from the (classical) Burkholder–Davies–Gundy inequality (Theorem 14.6), and the final convergence is justified by 2 | !M " n j t j ,t nj + 1 | → 0 and bounded convergence. Switching to a subsequence, if necessary, we see that n Yt nk − Y˜tnnk → 0 a.s. and it is a small step to see that this implies n ˜n Y − Y
∞;[0,T ]
→ 0 a.s.
Now, if V ∈ Lipγ and W ∈ Lip1 , then Y˜ n converges in probability (and uniformly on [0, T ]) to the (pathwise unique) RDE solution Y˜ = π (V ,W ) (0, y0 ; (M, H))
Stochastic differential equations and stochastic flows
514
and so we see that Y n − Y˜
∞;[0,T ]
→ 0 in probability.7 On the other hand,
from the definition of (Y n ) we have that for all t ∈ (0, T ], 1 n V Y˜tnnk Mt nk ,t nk + 1 + V 2 Y˜tnnk !M "t n ,t n Y0,t D n = k k+1 2 k :t nk < t + W Y˜tnnk Ht nk ,t nk + 1 ,
(17.7)
where as usual tD n denotes the right-hand neighbour to t in Dn . We observe n n ˜ that Jt := V Yt is uniformly bounded by |V |∞ (and hence by any Lipγ -norm . . . ), Ft -measurable for t ∈ Dn (and Ft D n -measurable for a general t ∈ [0, T ]). We note that Jt := limn →∞ Jtn = V Y˜t exists in probability and uniformly is adapted, thanks to right in t ∈ [0, T ], and continuity of (Ft ). Write Jt D n : t ∈ [0, T ] for the piecewise constant, leftpoint approximation; that is, equal to Jt nk whenever t ∈ (tnk , tnk+1 ]. Similarly for JtnD n : t ∈ [0, T ] . Then V Y˜tnnk Mt nk ,t nk + 1 k :t nk < t
tD n
JsnD n dMs
= 0
=
Js D n dMs +
0
→
tD n
t
0
tD n
JsnD n − Js D n
dMs
V Y˜s dMs in probability as n → ∞;
0
where we used convergence of left-point Riemann–Stieltjes approximations to the Itˆ o-integral, as well as 0
tD n
JsnD n − Js D n
dMs → 0 in probability as n → ∞,
as is easily seen from the dominated convergence theorem for stochastic integrals.8 Similarly, but easier, the other two terms of the right-hand side of (17.7) are seen to converge to the Riemann-Stieltjes integrals 1 t 2 ˜ V Ys d !M "s + W Y˜sn dHs . 2 0 7 Whatever subsequence we have so far extracted we can also extract a further subsequence along which we have a.s. convergence, and this in fact implies that the original sequence converges in probability. 8 See, for example, [143, Chapter IV, (2.12)].
17.3 Stochastic differential equations driven by non-semi-martingales
515
n ˜ At last, using Y0,t D n → Y0,t as n → ∞, we see that
Y˜0,t = 0
t
t 1 t 2 ˜ ˜ V Ys dMs + V Ys d !M "s + W Y˜s dHs 2 0 0
and so the proof is finished.
17.3 Stochastic differential equations driven by non-semi-martingales As was seen in Part III, there are many multidimensional stochastic processes which allow for a natural enhancement to (random) geometric prough paths. These include Brownian motion and semi-martingales, but also non-semi-martingales such as certain Gaussian and Markov processes. RDE theory leads immediately to a pathwise notion of stochastic differential equations driven by such processes. It is reassuring that such solutions have a firm probabilistic justification. More precisely, if X = X (ω) denotes either an enhanced Brownian motion, semi-martingale, Gaussian or Markov process, then the abstract (random) RDE solution π (V ) (0, y0 ; X (ω)) can be identified as a (strong or weak) limit of various natural approximations. In particular, in all cases (cf. Sections 13.3.3, 14.5, 15.5.1, 16.5.2) we have seen the validity of a Wong–Zakai-type result in the sense that, for any (deterministic) sequence of dissections (Dn ) ⊂ D [0, T ] with mesh |Dn | → 0 Xn ≡ S[p] X D n → X, with X ≡ π 1 (X) (in rough path topology and, at least, in probability) so that the abstract (random) RDE solution π (V ) (0, y0 ; X) is identified as a “Wong–Zakai” limit of solutions to the approximating ODEs dy n = V (y n ) dX D n , y n (0) = y0 , and not dependent on the particular sequence (Dn ). We can thus characterize RDE solutions driven by Gaussian and Markov processes in a completely elementary way. Theorem 17.6 (differential equations driven by Gaussian signals) Assume that (i) X = X 1 , . . . , X d is a centred continuous Gaussian process on [0, T ] with independent components;
516
Stochastic differential equations and stochastic flows
(ii) H denotes the Cameron–Martin space associated with X; (iii) the covariance of X is of finite ρ-variation in 2D sense for some ρ ∈ [1, 2); (iv) V = (V1 , . . . , Vd ) is a collection of Lipγ -vector fields on Re , with γ > 2ρ; (v) let (Dn ) ⊂ D [0, T ] with mesh |Dn | → 0. Then the (random) sequence of ODE solutions π (V ) 0, y0 ; X D n ⊂ C ([0, T ] , Re ) is Cauchy-in-probability with respect to uniform topology. The unique limit point, a C ([0, T ] , Re )-valued random variable, does not depend on the particular sequence (Dn ) and is identified as the (random) RDE solution π (V ) (0, y0 ; X) , where X is the natural enhancement of X to a geometric p-rough path, p ∈ (2ρ, min (γ, 4)). Theorem 17.7 (differential equations driven by Markovian signals) Assume that (i) X = X 1 , . . . , X d is a (symmetric) Markov process with uniformly elliptic generator in divergence form9 d 1 ij ∂i a (·) ∂j · 2 i,j =1
where a ∈ Ξ1,d (Λ), that is, measurable, symmetric and Λ−1 I ≤ a ≤ ΛI, for some Λ > 0, in the sense of positive definite matrices; (ii) V = (V1 , . . . , Vd ) is a collection of Lip2 -vector fields on Re ; (iii) let (Dn ) ⊂ D [0, T ] with mesh |Dn | → 0. Then the (random) sequence of ODE solutions π (V ) 0, y0 ; X D n ⊂ C ([0, T ] , Re ) is Cauchy-in-probability with respect to uniform topology. The unique limit point, a C ([0, T ] , Re )-valued random variable, does not depend on the particular sequence (Dn ) and is identified as the (random) RDE solution π (V ) (0, y0 ; X) , where X is the enhancement of X to a geometric α-H¨ older rough path, α ∈ (1/3, 1/2). The proof of these theorems involves little more than combining the convergence results of Sections 15.5.1, 16.5.2 with continuity properties of 9 Understood
in the weak sense, i.e. via Dirichlet forms.
17.4 Limit theorems
517
the Itˆo–Lyons map. Let us perhaps remark that enhanced Markov processes have finite ψ 2,1 -variation (exactly as enhanced Brownian motion), so that we can use the “refined” continuity result with minimal Lip2 -regularity. (In fact, this would also work for enhanced Gaussian processes provided ρ = 1.)
17.4 Limit theorems 17.4.1 Strong limit theorems Since almost sure convergence and convergence in probability are preserved under continuous maps, continuity results for RDE solutions (such as those recalled in Section 17.1) imply immediately corresponding probabilistic limit theorems. For the reader’s convenience we spell out the following two cases; the (immediate) formulation for RDEs with drift is left to the reader. Theorem 17.8 Assume that older (i) (Xn ) is a sequence of random geometric p-rough paths (resp. 1/p-H¨ rough paths) such that Xn → X∞ a.s. [or: in probability, or: in Lq (P) ∀q < ∞] in p-variation (resp. 1/p-H¨ older) rough path topology; (ii) V = (V1 , . . . , Vd ) ∈ Lipγ (Re ) , γ > p and y0 ∈ Re . Then π (V ) (0, y0 ; Xn ) → π (V ) (0, y0 ; X∞ ) a.s. [or: in probability, or: in Lq (P) ∀q < ∞] in p-variation (resp. 1/p-H¨ older) rough path topology. Proof. The case of almost sure convergence and convergence in probability is obvious from the above remarks. Stability of the Itˆ o–Lyons map under Lq (P)-convergence, for all q < ∞, follows from the (purely) deterministic estimate (10.15) of Theorem 10.14. Theorem 17.9 Assume that 1 ≤ < p < p < [p] + 1, p (i) (Xn ) ⊂ C ψ p , 1 -var [0, T ] , G[p] Rd (resp. C ψ p , 1 -var ∩C 1/p -H¨o l ) a.s. such that Xn → X∞
a.s. [or: in probability / Lq (P) for all q ∈ [1, ∞)]
older) rough path topology; in p -variation (resp. 1/p -H¨ (ii) V = (V1 , . . . , Vd ) ∈ Lipp (Re ) and y0 ∈ Re .
Stochastic differential equations and stochastic flows
518
Then π (V ) (0, y0 ; Xn ) → π (V ) (0, y0 ; X∞ ) a.s. [or: in probability / Lq (P) for all q ∈ [1, ∞)] older) rough path topology. in p -variation (resp. 1/p -H¨ Exercise 17.10 In the context of either Theorem 17.8 or 17.9, assume Xn → X∞ in Lq (P) for some fixed q < ∞. Use estimate (10.15) to discuss Lq˜ convergence of π (V ) (0, y0 ; Xn ) → π (V ) (0, y0 ; X∞ ) for suitable q˜ = q˜ (q) < ∞. Theorem 17.8 applies in particular if X is a semi-martingale or an enhanced Brownian motion. In the latter case, as detailed in Theorem 17.9, the assumptions on V and x can be slightly weakened (V ∈ Lipp (Re ) , x ∈ C ψ p , 1 -var . . . ); in all these cases the limiting RDE solutions are indeed classical Stratonovich solutions. But of course, these theorems can equally well be applied to (rough) differential equations driven by Gaussian or Markovian signals. There is no reason to list all possible approximation results: the reader may simply consult the catalogue of strong convergence results established in Sections 13.3, 14.5, noting that the mollifier and Karhunen–Lo´ eve approximations of Section 15.5 are also applicable to enhanced Brownian motion. Let us also draw attention to the existence of “non-standard” approximations X n , which may be based upon knowing only finitely many points (Xt : t ∈ Dn ) ⊂ Rd , with the property that X n → X, say uniformly and in probability, but such that π (V ) (0, y0 ; X n ) π (V ) (0, y0 ; X) . Indeed, in Theorem 13.24 we established a set of conditions under which SN (X n ) converges in probability/rough path sense10 to a limit with possibly “modified area”. The following corollaries are then an immediate consequence from Theorem 17.8. Corollary 17.11 (McShane [126]) Let BnM Sh denote the McShane interpolation to 2-dimensional Brownian motion, inExample 13.28, as defined based on a fixed interpolation function φ = φ1 , φ2 ∈ C 1 [0, 1] , R2 with φ (0) = (0, 0) and φ (1) = (1, 1). Then, given V = (V1 , V2 ) of Lip2 -regularity, the solutions to dy n = V (y n ) dBnM Sh ,
y^n(0) = y_0,

converge (in α-Hölder, α < 1/2, in probability) to the solution of the Stratonovich SDE

dy = V(y) ◦ dB + c [V_1, V_2](y) dt,   y(0) = y_0,

with

c = (2/π) ( 1 − 2 ∫_0^1 φ̇^1(s) φ^2(s) ds ).

¹⁰ Taking α ∈ (0, 1], N ≥ [1/α] and β = 1/N in Theorem 13.24 will lead to γ-Hölder convergence for any γ < min(α, 1/N). One can then pick γ large enough such that [1/γ] = N.
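As a quick illustration of the formula for c (our addition, not part of the original statement): for the linear interpolation φ(s) = (s, s) one has ∫_0^1 φ̇^1(s) φ^2(s) ds = ∫_0^1 s ds = 1/2, hence c = 0 and the classical Wong–Zakai theorem (with no correction term) is recovered; by contrast, any interpolation function with ∫_0^1 φ̇^1(s) φ^2(s) ds ≠ 1/2, i.e. enclosing non-zero area, produces a genuine drift along [V_1, V_2].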
Proof. It was verified in Example 13.28 that the assumptions of Theorem 13.24 are met with α ∈ (1/3, 1/2), β = 1/2 and N = 2. More precisely, P_t was identified as exp(tΓ) with

Γ = (  0    (2/π) A^φ_{0,1}  ;  −(2/π) A^φ_{0,1}    0  )

and so the "correction" drift-vector field is of the form

[V_1, V_2] d( (2/π) A^φ_{0,1} t ) + [V_2, V_1] d( −(2/π) A^φ_{0,1} t )
= (4/π) A^φ_{0,1} [V_1, V_2] dt
= (2/π) ( 1 − 2 ∫_0^1 φ̇^1(s) φ^2(s) ds ) [V_1, V_2] dt.
Corollary 17.12 (Sussmann [167]) Let BnSm denote Sussmann’s approximation to d-dimensional Brownian motion, constructed in detail in Exam ⊗N , N ∈ {2, 3, . . . }, by (reple 13.27 for some fixed v ∈ gN Rd ∩ Rd peated) concatenation of linear chords and “geodesic loops” associated with v. Then, given V = (V1 , . . . , Vd ) of LipN -regularity, the solutions to dy n = V (y n ) dBnSm ,
y n (0) = y0
converge (in α-H¨ older, α < 1/N /in probability) to the solution of the Stratonovich SDE i 1 ,...,i N dt. Vi 1 , . . . , Vi N −1 , Vi N . . . z v dy = V (y) ◦ dB + i 1 ,...,i N
In particular, by suitable choice of N and v every possible Lie bracket of {V1 , . . . , Vd } can be made to appear as a drift-vector field to the limiting SDE. Proof. It was verified in Example 13.27 that the assumptions of Theorem 13.24 are met with α ∈ (1/3, 1/2) , β = 1/N .
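To make the drift term concrete (our illustration, in the notation of Corollary 17.12): take N = 2 and v = λ (e_1 ⊗ e_2 − e_2 ⊗ e_1) ∈ g^2(R^d) ∩ (R^d)^{⊗2}. Then the only non-zero coordinates of v are v^{1,2} = λ = −v^{2,1}, the bracket sum collapses to λ [V_1, V_2] − λ [V_2, V_1] = 2λ [V_1, V_2], and the limiting SDE picks up a drift proportional to [V_1, V_2]; rescaling λ produces any desired multiple of this bracket.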
17.4.2 Weak limit theorems Similar to the previous section, a “weak” probabilistic formulation of Lyons’ limit theorem is an immediate consequence of the fact that weak convergence is preserved under continuous maps. Again, the immediate extension to RDEs with drift is left to the reader. Theorem 17.13 Assume that older (i) (Xn ) is a sequence of random geometric p-rough paths (resp. 1/p-H¨ rough paths), possibly defined on different probability spaces, such that older) topology; Xn → X∞ weakly in p-variation (resp. 1/p-H¨ (ii) V ∈ Lipγ (Re ) , γ > p, and y0 ∈ Re . Then π (V ) (0, y0 ; Xn ) → π (V ) (0, y0 ; X∞ ) weakly in p-variation (resp. 1/p-H¨ older) rough path topology. Theorem 17.14 Assume that 1 ≤ p< p < p , ψ p , 1 -var [0, T ] , G[p] Rd (resp. C ψ p , 1 -var ∩C 1/p -H¨o l ) a.s. such (i) (Xn ) ⊂ C that older) topology; Xn → X∞ weakly in p -variation (resp. 1/p -H¨ (ii) V = (V1 , . . . , Vd ) ∈ Lipp (Re ) , and y0 ∈ Re . Then π (V ) (0, y0 ; Xn ) → π (V ) (0, y0 ; X∞ ) weakly in p -variation (resp. 1/p -H¨ older) rough path topology. Again, there is no reason to list all of the possible weak approximation results: the reader may simply consult the catalogue of weak convergence results established for enhanced Brownian motion, Gaussian and Markov processes and apply Theorem 17.13. Nonetheless, let us list a few. Example 17.15 (Donsker–Wong–Zakai) Let (ξ i : i = 1, 2, 3, . . . ) be a D sequence of independent random variables, identically distributed, ξ i = ξ q with zero mean, unit covariance, and moments of all orders, i.e. E |ξ| < ∞ (n ) for a rescaled, piecewise linearly connected, for all q < ∞. Write Wt random walk 1 (n ) Wt = 1/2 ξ 1 + · · · + ξ [tn ] + (nt − [nt]) ξ [n t]+1 . n Also, let V ∈ Lip2 (Re ). Then π (V ) 0, y0 ; W (n ) converges weakly, in αH¨older topology for any α ∈ [0, 1/2), to the Stratonovich solution of dY = V (Y ) ◦ dB, Y0 = y0 ∈ Re .
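Example 17.15 is easy to visualize numerically. The following sketch is ours, not from the book; the function name, the choice of Rademacher increments and the grid size are illustrative assumptions. It samples the rescaled, piecewise linearly connected random walk W^(n); solving the ordinary differential equation dy = V(y) dW^(n) then produces π_(V)(0, y_0; W^(n)), whose weak limit the example describes.

```python
import numpy as np

def rescaled_walk(n, d=2, n_grid=1000, seed=None):
    """Sample W^(n) of Example 17.15 on a uniform grid of [0, 1]."""
    rng = np.random.default_rng(seed)
    xi = rng.choice([-1.0, 1.0], size=(n, d))             # i.i.d. Rademacher increments xi_1, ..., xi_n
    S = np.vstack([np.zeros(d), np.cumsum(xi, axis=0)])   # partial sums S[k] = xi_1 + ... + xi_k
    t = np.linspace(0.0, 1.0, n_grid)
    k = np.minimum((n * t).astype(int), n - 1)             # [nt], capped so xi[k] exists at t = 1
    frac = n * t - k                                       # nt - [nt]
    W = (S[k] + frac[:, None] * xi[k]) / np.sqrt(n)        # piecewise linear interpolation
    return t, W

t, W = rescaled_walk(n=10_000, seed=0)
print(W[-1])  # W^(n)_1: approximately a standard normal vector for large n
```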
Example 17.16 (differential equations driven by Gaussian signals) Consider enhanced fractional Brownian motion BH and the resulting rough differential equations of the form dY H = V Y H dBH , Y = y0 . As an application of a general Gaussian approximation result, established in Section 15.6, we have BH → B1/2 as H → 1/2 weakly in α-H¨older topology for any α ∈ [0, 1/2) and hence weak convergence in α-H¨older topology for any α ∈ [0, 1/2), to the Stratonovich solution of dY = V (Y ) ◦ dB, Y0 = y0 ∈ Re , provided V ∈ Lipγ (Re ) , γ > 2.
Example 17.17 (differential equations driven by Markovian signals) Let an ∈ Ξ1,d (Λ), smooth, so that an → a ∈ Ξ1,d (Λ) , d n almost surely with respect to the ij measure on R . Let X denote d Lebesgue 1 the diffusion with generator 2 i,j =1 ∂i an (·) ∂j · . Itˆo theory allows us to construct X n as a semi-martingale (e.g. on d-dimensional Wiener space) and granted V ∈ Lip2 (Re ) , solutions to the Stratonovich SDEs
dY n = V (Y n ) dX n , Y0 = y0 ∈ Re are given by π (V ) (0, y0 ; X n ) and converge weakly, in α-H¨older topology for any α ∈ [0, 1/2), to the (random) RDE solution π (V ) (0, y0 ; X) where enhancement to the symmetric diffusion X with generator d X is the 1 ij i,j =1 ∂i a (·) ∂j · , understood via Dirichlet forms. 2
17.5 Stochastic flows of diffeomorphisms Recall from Section 11.2, Corollary 11.14, that for V ∈ Lipγ + k −1 with γ > p ≥ 1 and x a geometric p-rough path π (V ) (0, ·; x)t is a C k -diffeomorphism and11 x → π (V ) (0, ·; x) ∈ Dk (Re ) is continuous in the sense of flows of C k -diffeomorphisms. Once more, this can be applied immediately in a purely pathwise fashion to almost every sample path X = X (ω) of an enhanced Brownian motion (or semimartingale, Gaussian or Markov process) and every strong or weak 1 1 The
Polish space Dk (Re ) was defined in Section 11.5.
approximation result for X leads to the corresponding limit theorem for the stochastic flows. (This kind of reasoning is exactly as in Section 17.4.) To illustrate all this, consider an enhanced continuous semi-martingale M with sample path in C 0,p-var [0, T ] , G2 Rd almost surely with 2 < p < γ. We then learn, still assuming V ∈ Lipγ +k −1 , that π (V ) (0, ·; M) ∈ Dk (Re ) where π (V ) (0, ·; M (ω)) is not only a Stratonovich solution to the SDE dY = V (Y ) ◦ dM , with M = π 1 (M), but now the solution flow to this equation (see [140, Section V.9] for a “classical” discussion of this). If we now assume that Mn → M (a.s., or: in probability) in p-variation rough path topology then we have12 (also a.s., or: in probability) sup ∂α π (V ) (0, y0 ; Mn ) − ∂α π (V ) (0, y0 ; M) p-var;[0,T ] → 0 as n → ∞.
y 0 ∈Re
The classical case is when Mn = S2 (B n ), lifted dyadic piecewise linear approximations to Brownian motion. In this case we recover a classical (Wong–Zakai-type) limit theorem for stochastic flows, see [88] or [124]. The case Mn = S2 (M n ), lifted piecewise linear approximations to a generic semi-martingale, has been discussed in a classical context in [97]. We can also apply this to weak approximations. More precisely, if Mn → M weakly in C 0,p-var [0, T ] , G2 Rd in p-variation rough path topology, then still assuming V ∈ Lipγ + k −1 , π (V ) (0, ·; Mn ) → π (V ) (0, ·; M) weakly in Dk (Re ) . For instance, the weak (Donsker–Wong–Zakai-type) convergence of π (V ) 0, ·; W (n ) , where W (n ) is a rescaled d-dimensional random walk (cf. Example 17.15), also holds in the sense of flows of C k -diffeomorphisms as long as V ∈ Lipγ + k −1 with γ > p ≥ 1. 1 2 The uniformity in y ∈ Re is a consequence of the invariance of the Lip γ -norm under 0 translation, ∀y 0 ∈ Re , γ > 0 : |V |L ip γ = |V (y 0 + ·)|L ip γ .
17.6 Anticipating stochastic differential equations 17.6.1 Anticipating vector fields and initial condition Because RDE solutions are constructed pathwise, it is clear that we can allow the vector fields V to be random as long as the appropriate Lipschitzregularity holds with probability one. In particular, there is no problem if this randomness anticipates the randomness of the driving signal. The same remark applies for the initial condition. With the focus on enhanced Brownian motion, we have the following result: Proposition 17.18 Assume that (i) B denotes a G2 Rd -valued enhanced Brownian motion, lifting B = π 1 (B); (ii) V = (V1 , . . . , Vd ) is a collection of random vector fields on Re , almost surely in Lip2 ; (iii) y0 is an Re -valued random variable; (iv) the stochastic process Y is defined as the RDE solution of dY = V (Y ) dB, Y0 = y0 ∈ Re ; (v) (Dn ) ⊂ D [0, T ] with mesh |Dn | → 0. Then, for any α ∈ [0, 1/2), dα -H¨o l;[0,T ] π (V ) 0, y0 ; B D n , π (V ) (0, y0 ; B) → 0 as n → ∞, in probability and in Lq for all q < ∞. If (Dn ) is nested, e.g. Dn = (kT 2−n : k ∈ {0, . . . , 2n }), then convergence also holds almost surely. Inclusion of drift-vector fields is straightforward, as is the similar statement for full RDEs. It is also clear that any other strong convergence result for enhanced Brownian motion will yield a similar limit theorem for such “anticipating” stochastic differential equations. Under further regularity assumptions, V ∈ Lipγ + k −1 with γ > 2, the convergence holds at the level of C k -flows. At last, the usual remark applies that B can be replaced by a variety of other rough paths, including enhanced semi-martingales, Gaussian and Markovian processes. Remark 17.19 Following Nualart–Pardoux (cf. [135] and the references cited therein) one can say that Y is a solution to the (anticipating) Stratonovich equation dY = V (Y ) ◦ dB, Y0 = y0 ∈ Re if, by definition, for every sequence (Dn ) with mesh tending to zero, t y0 + V (Y ) dB D n → Yt as t → ∞, 0
in probability and uniformly in t ∈ [0, T ]. It was verified in [30] that π (V ) (0, y0 ; B) is also a solution in the Nualart–Pardoux sense.
17.6.2 Stochastic delay differential equations Let ε ∈ (0, 1). A real-valued Brownian motion β, started at time −1 say, gives rise to the R2 -valued process by setting t ∈ [0, T ] → (β εt , β t ) := β t−ε , β t . On a sufficiently small time interval (of length ≤ ε), it is clear that β ε and β have independent Brownian increments so that ε β s,t , β s,t : t ∈ [s, s + ε] has the distribution of a 2-dimensional standard Brownian motion (Bt : t ∈ [0, ε]) and so there is a unique solution to the Stratonovich SDE dYtε = V1 (Ytε ) ◦ dβ εt + V2 (Ytε ) ◦ dβ t , Y ε (0) = y0 ∈ Re
(17.8)
where V1 , V2 ∈ Lip2 (Re ). For t > ε, we are effectively dealing with an anticipating SDE but one can (classically) get around this by solving (17.8) as a stochastic flow: first on [0, ε] then on [ε, 2ε] and so on. By composition, we can then define a solution to (17.8) over [0, T ]. It is easy to see that this construction is in precise agreement with solving the RDE V1 (Ytε ) dBε , Y ε (0) = y0 ∈ Re , dYtε = V2 where Bε is the lift of (β ε , β) constructed in Section 13.3.5. Indeed, from Theorem 17.3 (resp. Proposition 17.18) both constructions are consistent on [0, ε], then [ε, 2ε], etc. and hence on [0, T ]. Theorem 17.20 Let α ∈ (1/3, 1/2) and V1 , V2 ∈ Lip2 . Define for ε > 0, Y ε to be the solution of the anticipating Stratonovich SDE dYtε = V1 (Ytε ) ◦ dβ t + V2 (Ytε ) ◦ dβ t−ε , Y ε (0) = y0 , and Z to be the solution of the (standard) Stratonovich SDE dZ = (V1 (Z) + V2 (Z)) ◦ dβ t − [V1 , V2 ] (Z) dt. Then, for any α ∈ [0, 1/2), |Y ε − Z|α -H¨o l;[0,T ] → 0 as ε → 0 in probability and in Lq for all q < ∞.
Proof. Set V = (V1 , V2 ). It is clear from the remarks preceding this theorem that Y ε = π (V ) (0, y0 ; Bε ) solves the given Stratonovich SDE for Y ε . Similarly, setting W := [V1 , V2 ] ∈ Lip1 and β t = exp ((β t , β t ) ; 0) ∈ G2 R2 we see (from Theorem 17.3 and the first remark thereafter) that Z := π (V ,W ) (0, y0 ; (β, t)) solves the Stratonovich SDE for Z. But then Theorem 12.14 tells us that almost surely, ˜ , Z = π (V ) 0, y0 ; X ˜ t = exp((β t , β t ) ; −t/2). For γ > 2, we conclude using Theorem where X 13.31 and continuity of the Itˆo–Lyons map. For γ = 2 we need Proposition 13.30 to ensure that every Bε has finite ψ 2,1 -variation.
17.7 A class of stochastic partial differential equations We now return to the setting of Section 11.3 where we studied non-linear evolution equations with “rough noise”, such as a typical realization of ddimensional Brownian motion and L´evy’s area, B (ω) = exp (B, A). The equation then reads (17.9) du = F t, x, Du, D2 u dt + Du (t, x) · V (x) dB, n u (0, ·) = u0 ∈ BUC (R ) , where F = F (t, x, p, X) ∈ C ([0, T ] , Rn , Rn , S n ) is assumed to be degenerate elliptic and u = u (t, x) ∈ BUC ([0, T ] × Rn ) is a real-valued function of time and space. Under the assumption that V = (V1 , . . . , Vd ) ⊂ Lipγ , γ > 4, and that F satisfies Φ(3) -invariant comparison (as discussed in detail and with examples in Section 11.3), we then have a robust notion of a solution to the above stochastic partial differential equation. Indeed, combining Theorem 11.16 with convergence of (lifted) piecewise linear approximations of B to B suggests calling the so-obtained solutions to (17.9) Stratonovich solutions, writing also13 du = F t, x, Du, D2 u dt + Du (t, x) · V (x) ◦ dB. (17.10) Let us leave aside the first benefit of this approach, which is to deal with SPDEs with non-Brownian and even non-semi-martingale noise.14 The 1 3 Further justification for the “Stratonovich” notation (17.10) is possible, cf. the references at the end of this section. 1 4 It suffices to replace B by some other (e.g. Gaussian or Markovian) rough path.
continuous dependence on the driving signal B in rough path topology implies various stability results (i.e. weak and strong limit theorems) for such SPDEs: it suffices that an approximation to B converges in rough path topology; examples beyond “piecewise linear” are mollifier and Karhunen– L´ oeve approximations, as well as (weak) Donsker-type random walk approximations. A slightly more interesting example is left to the reader in the following Exercise 17.21 Let V = (V1 , . . . , Vd ) be a collection of C ∞ -bounded vector fields on Rn and B a d-dimensional standard Brownian motion. Show N that, for every α = (α1 , . . . , αN ) ∈ {1, . . . , d} , N ≥ k2, there exist (piecek wise) smooth approximations z to B, with each z only dependent on B (t) : t ∈ Dk where Dk is a sequence of dissections of [0, T ] with mesh tending to zero, such that almost surely z k → B uniformly on [0, T ] . But uk , the solutions to duk = F t, x, Duk , D2 uk dt + Duk (t, x) · V (x) dz k , uk (0, ·) = u0 ∈ BUC (Rn ) , converge almost surely locally uniformly to the solution of the “wrong” differential equation du = F t, x, Du, D2 u + Du (t, x) · Vα (x) dt + Du (t, x) · V (x) ◦ dB where Vα is the bracket-vector field given by Vα = Vα 1 , Va 2 , . . . Vα N −1 , Vα N ]]]. (Hint: Combine Sussmann’s twisted approximations to Brownian motion (Exercise 13.27) with continuity of SPDE with respect to B in rough path topology.)
17.8 Comments Most of the material of Section 17.2 belongs to the folklore of rough path theory. We note that the Itˆ o stochastic differential equation dY = V (Y ) dB, V = (V1 , . . . , Vd ) ⊂ Lip2 can be written in Stratonovich form and then solved pathwise as a (unique) RDE solution 1 dY = V (Y ) dB − V 2 (Y ) d !M " , 2 3 2 i d 1 2 j with “Lip -drift” given by V (Y ) d !M " = i,j =1 Vi Vj (Y ) d M , M . Note that the existence of RDE solutions is ensured for V ⊂ Lipγ , γ > 1.
For a discussion of pathwise uniqueness under this assumption, we refer to Davie [37]. The classical Wong–Zakai theorem can be found, for instance, in Stroock [160], Kurtz et al. [97] or the classical monograph of Ikeda and Watanabe [88]. In the latter, the reader can also find a criterion for convergence of the Itˆo map with “modified” limit which covers McShane’s example [126], but not Sussmann’s example [167]. The material of Section 17.3, on SDEs driven by non-semi-martingales, consists of essentially trivial corollaries of the relevant results of Parts II and III; but we have tried to make the statements accessible to readers with no background in rough path theory. Section 17.4.2 collects a number of weak limit theorems, including a “Donsker–Wong–Zakai” theorem which, perhaps, is known but for which we have failed to find a reference. The discussion of stochastic flows, Section 17.5, is rather immediate from the deterministic results of Chapter 11, nonetheless in striking contrast to the work required to obtain similar results previous to rough path theory (see Ikeda and Watanabe [88], Kunita [96] and Malliavin [124]). In the context of anticipating SDEs, Section 17.6, rough path theory was first exploited by Coutin et al. [30]. Theorem 17.20 concerning a simple delay equation appears (without proof) in Lyons [117] and is attributed to Ben Hoff [85]. See Neuenkirch et al. [133] for a recent study of “rough” delay equations. Section 17.7 is a straightforward application of the deterministic results of Section 11.3 to SPDEs of the form d H (x, Du) ◦ dB i , du = F t, x, Du, D 2 u dt + i=1
with F fully non-linear but H = (H1 , . . . , Hd ) linear (with respect to the derivatives of u); see Caruana et al. [24] (also for Exercise 17.21) and also Buckdahn and Ma [17]. In the case when both F and H are linear, (Wong– Zakai-type) approximations have been studied in great detail by Gy¨ ongy [78–82] and in Caruana and Friz [23] with rough path methods. The above class of (fully non-linear) SPDEs, possibly generalized to H = H (x, u, Du), is considered to be an important one (see Lions and Souganidis [109–112]) and the reader can find a variety of examples (drawing from fields as diverse as filtering and stochastic control theory, pathwise stochastic control, interest rate theory, front propagation and phase transition in random media, etc.) in the articles [110, 112]. See also the works of Buckdahn and Ma [17, 18, 19, 20]. Other classes of SPDEs (including a stochastic heat equation) can be studied using rough path methods; see Gubinelli and coworkers [76, 77], Teichmann [169] and also the relevant comments in Section 11.4.
18 Stochastic Taylor expansions Our very approach to rough differential equations was based on good estimates of higher-order approximations, such as obtained in Davie’s lemma. In particular, these can be written in the form of error estimates of higherorder Euler approximations (cf. Corollary 10.15). We shall now put these estimates in a stochastic context.
18.1 Azencott-type estimates

We now consider RDEs driven by a (random) geometric rough path. To this end, fix p ≥ 1 and let us first consider a continuous G^{[p]}-valued process X = X_t(ω) which satisfies

sup_{0≤s<t≤T} E exp( η ( d(X_s, X_t) / |t − s|^{1/p} )² ) = M < ∞.        (18.1)

Recall that this assumption applies to enhanced Brownian motion and Markov processes with p = 2. It also applies to our class of enhanced Gaussian processes (although, in general, a deterministic time-change may be needed, cf. Exercise 15.36). From the results of Section A.4, Appendix A, this implies that X has a.s. finite ψ_{p,p/2}-variation and also finite 1/p′-Hölder regularity for any p′ > p. In particular, by choosing p′ small enough (such that [p] = [p′]) it is clear that X is a geometric 1/p′-Hölder rough path. As a consequence, for any integer N ≥ [p], there is a well-defined Lyons lift of X to a G^N-valued path, denoted by S_N(X).

Theorem 18.1 (Azencott-type estimates) Let γ > p ≥ 1 and let X be a continuous G^{[p]}-valued process which satisfies (18.1). Let V = (V_i)_{1≤i≤d} be a collection of Lip^{γ−1} vector fields in R^e. Then, for any fixed interval [s, t] ⊂ [0, T] and time-s initial condition y_s ∈ R^e,

P( sup_{r∈[s,t]} |π_(V)(s, y_s; X)_{s,r} − E_(V)(y_s, S_γ(X)_{s,r})| > R |t − s|^{γ/p} )
≤ C exp( − (1/2) ( R / (C |V|_{Lip^{γ−1}(R^e)}) )^{2/γ} )

where C = C(M, η, γ, p). Under the weaker assumption V ∈ Lip^{γ−1}_{loc} one has¹

lim_{t→0} P( sup_{r∈[0,t]} |π_(V)(0, y_0; X)_{0,r} − E_(V)(y_0, S_γ(X)_{0,r})| > R t^{γ/p} )
≤ C exp( − (1/2) ( R / (C |V|_{Lip^{γ−1}(B(y_0,1))}) )^{2/γ} ).        (18.2)

In particular, we see that

sup_{r∈[0,t]} |π_(V)(0, y_0; X)_{0,r} − E_(V)(y_0, S_γ(X)_{0,r})| / t^{γ/p}

is bounded in probability as t → 0 and for all θ ∈ [0, γ/p) we have

sup_{r∈[0,t]} |π_(V)(0, y_0; X)_{0,r} − E_(V)(y_0, S_γ(X)_{0,r})| / t^θ → 0        (18.3)
in probability as t → 0.

Proof. Let us fix p′ ∈ (p, γ), e.g. (for the sake of tracking the constants) p′ = (p + γ)/2. Then a.e. X(ω) is a geometric p′-rough path and there exist η̃, M̃ > 0, depending on η, M only, such that

E exp( η̃ ( ‖X‖_{p′-var;[s,t]} / |t − s|^{1/p′} )² ) = M̃ < ∞;        (18.4)

see equation (A.20) in Appendix A. Thanks to Lip^{γ−1}-regularity of the vector fields, γ > p′, we have existence of an RDE solution, i.e. π_(V)(s, y_s; X) ≠ ∅. As usual, we abuse the notation and write π_(V)(s, y_s; X) for any such (not necessarily unique) RDE solution. From our Euler RDE estimates, Corollary 10.15, there exists c_1 = c_1(p′, γ) such that

P( sup_{r∈[s,t]} |π_(V)(s, y_s; X)_{s,r} − E_(V)(y_s, S_γ(X)_{s,r})| > R |t − s|^{γ/p} )
≤ P( c_1 |V|^γ_{Lip^{γ−1}} ‖X‖^γ_{p′-var;[s,t]} > R |t − s|^{γ/p} )
= P( ‖X‖_{p′-var;[s,t]} / |t − s|^{1/p} > (1/|V|_{Lip^{γ−1}}) (R/c_1)^{1/γ} )
≤ M̃ exp( − η̃ (1/2) ( R / (|V|_{Lip^{γ−1}} c_1) )^{2/γ} ).

¹ If explosion happens, we agree that |π_(V)(0, y_0; X)_{0,s} − E_(V)(y_0, S_N(X)_{0,s})| = +∞.
At last, consider the case of Lip^{γ−1}_{loc} vector fields V. For fixed y_0, we can then find Ṽ ∈ Lip^{γ−1} so that Ṽ ≡ V on a unit ball around y_0. Setting Y_t ≡ π_(V)(0, y_0; X)_t and Ỹ_t ≡ π_(Ṽ)(0, y_0; X)_t we see that

P( sup_{r∈[0,t]} |Y_{0,r} − E_(V)(y_0, S_γ(X)_{0,r})| > R t^{γ/p} )
≤ P( sup_{r∈[0,t]} |Y_{0,r} − E_(V)(y_0, S_γ(X)_{0,r})| > R t^{γ/p} ; sup_{r∈[0,t]} |Y_r − y_0| < 1 ) + P( sup_{r∈[0,t]} |Y_r − y_0| ≥ 1 )
≤ c_2 exp( − (R/c_2)^{2/γ} ) + P( sup_{r∈[0,t]} |Y_r − y_0| ≥ 1 )

where c_2 depends on M, η, γ, p and |V|_{Lip^{γ−1};B(y_0,1)}. Noting that P( |Y − y_0|_{∞;[0,t]} ≥ 1 ) → 0 as t → 0, the claimed estimate now follows. At last, observe that (thanks to θ < γ/p) for every fixed ε, R > 0 we have R t^{γ/p} ≤ ε t^θ for t small enough. It follows that, if z_t = sup_{r∈[0,t]} |π_(V)(0, y_0; X)_{0,r} − E_(V)(y_0, S_γ(X)_{0,r})|,

lim_{t→0} P( z_t > ε t^θ ) ≤ lim_{t→0} P( z_t > R t^{γ/p} ) ≤ c_2 exp( − (R/c_2)^{2/γ} )
and since we take R arbitrarily large, we see that the lim sup is zero (and therefore a genuine limit).

Example 18.2 Consider an enhanced Gaussian process X which satisfies

sup_{0≤s<t≤T} E exp( η ( d(X_s, X_t) / |t − s|^H )² ) < ∞

for some H ∈ (1/4, 1/2]. (After setting ρ = 1/(2H), it was pointed out in Exercise 15.36 that this holds for all enhanced Gaussian processes run at the correct time-scale.) Let V = (V_1, . . . , V_d) be a collection of smooth (possibly unbounded) vector fields. Then, for all N ∈ {2, 3, . . . }, we may apply Theorem 18.1 with γ = N + 1 and p = 1/H to see that, for every fixed ε > 0,

P( |π_(V)(0, y_0; X)_{0,t} − E_(V)(y_0, S_N(X)_{0,t})| > ε t^{HN} ) → 0 as t → 0+.

This applies in particular to enhanced fractional Brownian motion with Hurst parameter H.
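Concretely (our reading of this example, not in the text): for enhanced Brownian motion, i.e. H = 1/2 and N = 2, the statement says that the step-2 Euler approximation E_(V)(y_0, S_2(B)_{0,t}) — which adds to y_0 the terms V_i(y_0) B^{1;i}_{0,t} and V_i V_j I(y_0) B^{2;i,j}_{0,t}, summation over repeated indices — differs from the RDE/Stratonovich solution over one step by more than ε t only with probability tending to zero, a Milstein-type one-step error estimate.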
Let us now give a variation of Theorem 18.1 applicable to enhanced martingales.

Theorem 18.3 Let V, γ, p be as above and let y = y(ω) = π(0, y_0; M) be the (pathwise unique) random RDE solution to dy = V(y) dM where M = M(ω) is an enhanced L^q-martingale, q ∈ [1, ∞). Then for any fixed t ∈ (0, 1] and

φ(t) := ( E |⟨M⟩_t|^{q/2} )^{2/q} = ‖ |⟨M⟩_t| ‖_{L^{q/2}}

we have, for C = C(q, γ, p),

P( sup_{0≤s≤t} |π(0, y_0; M)_{0,s} − E_(V)(y_0, S_γ(M)_{0,s})| > R φ(t)^{γ/2} ) ≤ C (1/R)^{q/γ}.
Proof. Similar to the proof of Theorem 18.1 and left to the reader.
18.2 Weak remainder estimates

Recall that the Euler approximation E_(V)(. . .) came from setting f = I, the identity function, in the Taylor expansion

f( π_(V)(0, y_0; x)_t ) = f(y_0) + Σ_{k=1}^N Σ_{i_1,...,i_k ∈ {1,...,d}} V_{i_1} · · · V_{i_k} f(y_0) x^{k;i_1,...,i_k}_{0,t} + R_N(t, f; x);

valid, at least, for sufficiently smooth f and x ∈ C^{1-var}([0, T], R^d) with canonically defined kth iterated integrals x^k. This obviously makes sense for RDEs and we can ask for an estimate of the remainder term

R_N(t, f; X) := f( π_(V)(0, y_0; X)_t ) − ( f(y_0) + Σ_{k=1}^N Σ_{i_1,...,i_k ∈ {1,...,d}} V_{i_1} · · · V_{i_k} f(y_0) X^{k;i_1,...,i_k}_{0,t} ),

where we have abused the notation by writing X instead of S_N(X).

Theorem 18.4 Let γ > p ≥ 1 and let X be a continuous G^{[p]}-valued process which satisfies (18.1). Let V = (V_i)_{1≤i≤d} be a collection of Lip^{γ−1} vector fields in R^e. Then, for any function f ∈ Lip^γ(R^e, R) we have

∀q ∈ [1, ∞) :  |R_γ(t, f; X)|_{L^q} = O(t^{γ/p})  as t → 0.

In the case of Lip^{γ−1}_{loc} vector fields V and f ∈ Lip^γ_{loc}, we still have that, for any θ ∈ [0, γ/p),

R_γ(t, f; X) / t^θ → 0 in probability as t → 0.
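For instance (our specialization, not in the text): enhanced Brownian motion satisfies (18.1) with p = 2 + ε for every small ε > 0, so for N ≥ 2, V ⊂ Lip^N and f ∈ Lip^{N+1}, taking γ = N + 1 gives |R_{N+1}(t, f; B)|_{L^q} = O(t^{(N+1)/(2+ε)}), essentially the t^{(N+1)/2} rate familiar from classical stochastic Taylor expansions.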
Proof. It is clear that Y ≡ π_(V)(0, y_0; X) and f(Y) can be written jointly as an RDE solution, say dz = Ṽ(z) dX with Lip^{γ−1} vector fields Ṽ obtained from writing the ODE system

dz^{(1)} = V(z^{(1)}) dx,   dz^{(2)} = Df(z^{(1)}) dz^{(1)} = (Df · V)(z^{(1)}) dx

in the form dz = Ṽ(z) dx, with z = (z^{(1)}, z^{(2)}) ∈ R^{e+1}. It follows that R_γ(t, f; X) is precisely the (e + 1)th component of π_(Ṽ)(0, y_0; X)_{0,t} − E_(Ṽ)(y_0, S_γ(X)_{0,t}) and the estimate of Theorem 18.1 is more than enough to ensure that the random variable R̂_t := R_γ(t, f; X)/t^{γ/p} has moments of all orders, uniformly in t ∈ (0, 1]. But then, for all t ∈ (0, 1],

|R_γ(t, f; X)|_{L^q(P)} ≤ ( sup_{t∈(0,1]} |R̂_t|_{L^q(P)} ) × t^{γ/p}

and the proof is finished. In the case of Lip^{γ−1}_{loc} vector fields V and f ∈ Lip^γ_{loc} the same construction yields Ṽ ∈ Lip^{γ−1}_{loc} and we conclude again with Theorem 18.1.
18.3 Comments In a Brownian – and semi-martingale – context, the estimates of Section 18.1 go back to Azencott [4], Ben Arous [10] and Platen [138]. Estimate (18.2) plays an important role in subsequent developments, such as Castell [27]. For related works in a fractional Brownian rough path context, we mention Baudoin and Coutin [8]. Our presentations of Sections 18.1 and 18.2 improve on earlier results by the authors obtained in [67]. Let us also mention Aida [3] and then Inahama and Kawabi [89, 92], where the authors are led to somewhat different (stochastic) Taylor expansions for rough differential equations (in essence, asymptotic development in ε of solutions to dy ε = V (y ε , ε) dx).
19 Support theorem and large deviations

We now discuss some classical results of diffusion theory: the Stroock–Varadhan support theorem and Freidlin–Wentzell large deviation estimates. Everything relies on the fact that the Stratonovich SDE

dY = Σ_{i=1}^d V_i(Y) ◦ dB^i + V_0(Y) dt,   Y_0 = y_0 ∈ R^e,
can be solved as an RDE solution which depends continuously on enhanced Brownian motion in rough path topology, subject to the suitable Lipregularity assumptions on the vector fields. (A summary of the relevant continuity results was given in Section 17.1.)
19.1 Support theorem for SDEs driven by Brownian motion Theorem 19.1 (Stroock–Varadhan support theorem) Assume that V = (V1 , . . . , Vd ) is a collection of Lip2 -vector fields on Re , and V0 is a Lip1 -vector field on Re . Let B be a d-dimensional Brownian motion and consider the unique (up to indistinguishability) Stratonovich SDE solution Y on [0, T ] to dY =
Σ_{i=1}^d V_i(Y) ◦ dB^i + V_0(Y) dt,   Y_0 = y_0 ∈ R^e.        (19.1)

Let us write y^h = π_(V,V_0)(0, y_0; (h, t)) for the ODE solution to

dy = Σ_{i=1}^d V_i(y) dh^i + V_0(y) dt

started at y_0 ∈ R^e, where h is a Cameron–Martin path, i.e. h ∈ W^{1,2}_0([0, T], R^d). Then, for any α ∈ [0, 1/2) and any δ > 0,

lim_{ε→0} P( |Y − y^h|_{α-Höl;[0,T]} < δ  |  |B − h|_{∞;[0,T]} < ε ) = 1

(where the Euclidean norm is used for conditioning on |B − h|_{∞;[0,T]} < ε).
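For example (our remark): taking h ≡ 0, so that y^0 solves the ordinary differential equation dy = V_0(y) dt, the theorem says that, conditionally on the Brownian path staying uniformly ε-close to zero, the Stratonovich solution Y stays α-Hölder close to this deterministic trajectory with conditional probability tending to one.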
Proof. Without loss of generality, α ∈ (1/3, 1/2). Let us write, for a fixed Cameron–Martin path h, Pε (·) ≡ P ·| |B − h|∞;[0,T ] < ε . From Theorem 17.3, there is a unique solution π (V ,V 0 ) (0, y0 ; (B, t)) to the RDE with drift dY = V (Y ) dB + V0 (Y ) dt, Y0 = y0 which then solves the Stratonovich equation (19.1). Set h ≡ S2 (h) and observe that h is of finite 1-variation and 1/2-H¨ older, hence h ∈ C ψ 2 , 1 -var [0, T ] , G2 Rd ∩ C 0,α -H¨o l [0, T ] , G2 Rd . Take α ∈ (1/3, α). We now use continuity of the Itˆ o–Lyons map from C ψ 2 , 1 -var [0, T ] , G2 Rd ∩C 0,α -H¨o l [0, T ] , G2 Rd →C 0,α -H¨o l ([0, T ] , Re ) , in (rough path) α-H¨older to (classical) α -H¨older topology, respectively, at the point h. Given δ > 0 fixed, there exists η = η (h, δ) small enough such that for < δ. ∀B ∈ C ψ 2 , 1 -var ∩ C 0,α -H¨o l : dα -H¨o l (B, h) < η =⇒ Y − y h α -H¨o l
In particular, using the fact that B ∈ C ψ 2 , 1 -var ∩ C 0,α -H¨o l almost surely, Pε Y − y h α -H¨o l < δ ≥ Pε (dα -H¨o l (B, h) < η) →
1 as ε → 0
thanks to Theorem 13.66. Remark 19.2 The regularity assumptions of Theorem 19.9 are optimal in the sense that V ⊂ Lip2 and V0 ∈ Lip1 are needed for a unique Stratonovich solution. As an immediate corollary, we obtain the characterization of the support of the law of the solution of a Stratonovich SDE. Corollary 19.3 Assume that V = (V1 , . . . , Vd ) is a collection of Lip2 vector fields on Re , V0 a Lip1 -vector field on Re , and B a d-dimensional Brownian motion. Consider the unique (up to indistinguishability) Stratonovich SDE solution on [0, T ] to dY =
Σ_{i=1}^d V_i(Y) ◦ dB^i + V_0(Y) dt
started at some y0 ∈ Re . Then, for any α ∈ [0, 1/2), the topological support of Y , viewed as a C 0,α ([0, T ] ; Re )-valued random variable, is precisely the α-H¨ older closure of S = π (V ,V 0 ) (0, y0 ; (h, t)) , h ∈ W01,2 . Proof. The first inclusion, supp (law of π (0, y0 ; B)) ⊂ S, is straight-forward from the Wong–Zakai theorem (equivalently: use Theorem 17.3 with Remarks 17.4 and 17.5). For the other inclusion (usually considered the difficult one), it suffices to show that for every Cameron–Martin path h and every δ > 0, the event Ah,δ = Y − y h α -H¨o l < δ = {|π (0, y0 ; B) − π (0, y0 ; h)|α -H¨o l < δ} has positive probability. But this is an obvious consequence of Theorem 19.1. Remark 19.4 If one is only interested in the conclusion of Corollary 19.3, one bypasses the “conditional” consideration of Theorem 13.66 on which our proof of Theorem 19.1 relied. Indeed, in Section 13.7 we obtained with much less work (Theorem 13.54) the qualitative statement supp (law of B) = S2 (h) : h ∈ W01,2
(19.2)
(support and closure with respect to α-H¨older rough path topology) so that for any η > 0 and h ∈ W01,2 , P (dα -H¨o l (B, S2 (h)) < η) > 0. Given δ > 0 fixed, there exists η = η (h, δ) small enough such that dα -H¨o l (B, h) < η =⇒ Y − y h α -H¨o l < δ. Hence P Y − y h α -H¨o l < δ ≥ P (dα -H¨o l (B, h) < η) > 0, which yields the (difficult) inclusion in the Stroock–Varadhan support the orem. We can also deal with the support at the level of stochastic flows (as discussed in Section 17.5). Theorem 19.5 (support for stochastic flows) Assume that V = (V1 , . . . , Vd ) is a collection of Lipγ + k −1 -vector fields on Re , γ > p, so that π (V ) (0, ·; B), the solution flow to the Stratonovich SDE Y = V (Y ) ◦ dB,
induces a C^k-flow of diffeomorphisms and we can view π_(V)(0, ·; B) as a D^k(R^e)-valued random variable.¹ Then, for any h ∈ W^{1,2}_0([0, T], R^d) and any δ > 0,

lim_{ε→0} P( d_{D^k(R^e)}( π_(V)(0, ·; B), π_(V)(0, ·; h) ) < δ  |  |B − h|_{∞;[0,T]} < ε ) = 1

and supp(law of π_(V)(0, ·; B)) = S̄ ⊂ D^k(R^e), where S = { π_(V)(0, ·; h) : h ∈ W^{1,2}_0([0, T], R^d) } and the closure is taken in D^k(R^e).
Proof. The argument is then similar to the proof of Theorem 19.1. Let α ∈ (1/3, 1/2). Let us write, for a fixed Cameron–Martin path h, Pε (·) ≡ P ·| |B − h|∞;[0,T ] < ε . Thanks to Corollary 11.14, Lipγ + k −1 -vector fields imply continuity of → Dk (Re ) C 0,α -H¨o l [0, T ] , G2 Rd x → π (V ) (0, ·, x) and we simply use it at the point h ≡ S2 (h). Given δ > 0 fixed, there exists η = η (h, δ) small enough such that for ∀B ∈ C 0,α -H¨o l : dα -H¨o l (B, h) < η =⇒ Y − y h α -H¨o l < δ. It follows, thanks to Theorem 13.66, that Pε Y − y h α -H¨o l < δ ≥ Pε (dα -H¨o l (B, h) < η) → 1 as ε → 0. The proof is finished.
19.2 Support theorem for SDEs driven by other stochastic processes The reader will have noticed the proofs of the previous section are essentially trivial corollaries of a suitable support description of enhanced Brownian motion in rough path topology, followed by appealing to continuity of the Itˆ o–Lyons map. We have seen in Part II (more precisely, Theorems 15.60 and 16.33) that similar support descriptions hold for enhanced Gaussian and Markovian processes. As a consequence, the very same arguments lead us to support theorems for stochastic differential equations driven by Gaussian and Markovian signals. We have 1 The
Polish space Dk (Re ) was defined in Section 11.5.
Proposition 19.6 Assume that (i) X = X 1 , . . . , X d is a centred continuous Gaussian process on [0, 1] with independent components; (ii) H denotes the Cameron–Martin space associated with X; (iii) the covariance of X is of finite ρ-variation dominated by some 2D control ω, for some ρ ∈ [1, 2); (iv) X denotes the natural lift of X to a G[2ρ] Rd -valued process (with geometric rough sample paths); (v) V = (V1 , . . . , Vd ) is a collection of Lipγ -vector fields on Re , with γ > 2ρ. Then, for any p > 2ρ, the topological support of the solution to dY = V (Y ) dX, Y (0) = y0 ∈ Re , viewed as a C 0,p-var ([0, T ] ; Re )-valued random variable, is precisely the pvariation closure of S = π (V ) (0, y0 ; h) , h ∈ H . If ω is H¨ older-dominated, the topological support of Y , viewed as a C 0,1/p-H¨o l ([0, T ] ; Re )-valued random variable, is precisely the 1/p-H¨ older closure of S = π (V ) (0, y0 ; h) , h ∈ H . Proof. Left to the reader. Proposition 19.7 Assume that (i) X = X 1 , . . . , X d is a Markov process with uniformly elliptic generator d in divergence form, 12 i,j =1 ∂i aij (·) ∂j · understood in a weak sense (i.e. via Dirichlet forms) where a ∈ Ξ1,d (Λ), that is, measurable, symmetric and Λ−1 I ≤ a ≤ ΛI in the sense of positive definite matrices; (ii) X denotes the natural lift of X to a G2 Rd -valued process (with geometric rough sample paths); (iii) V = (V1 , . . . , Vd ) is a collection of Lip4 -vector fields on Re . Then, for any α ∈ [0, 1/4), the topological support of the solution to dY = V (Y ) dX, Y (0) = y0 ∈ Re , viewed as a C 0,α ([0, T ] ; Re )-valued random variable, is precisely the αH¨ older closure of S = π (V ) (0, y0 ; h) , h ∈ W01,2 [0, T ] , Rd . ˜ = S4 (X), the step-4 Lyons lift of X. Clearly, Proof. Set X ˜ Y = π (V ) (0, y0 ; X) = π (V ) 0, y0 ; X ˜ ∈ C ψ 4 , 1 -var ∩ C α -H¨o l . It then suffices to use continuity of the solution and X ˜ → Y in α-H¨older topology for any α ∈ (1/5, 1/4), following the map X precise argument given in Remark 19.4.
It should be noted that the restriction to H¨ older exponent < 1/4 (and Lip4 - rather than Lip2 -regularity) in Proposition 19.7 is a consequence of Theorem 16.33 where a support characterization of the enhanced Markov process X in α-H¨older rough path topology was only established for α < 1/4. As noted in the comments of Section 16.10, the result is conjectured to hold for α < 1/2 in which case we would have, for all h ∈ W01,2 [0, T ] , Rd and δ > 0, P π (V ) (0, y0 ; X) − π (V ) (0, y0 ; h) α -H¨o l;[0,T ] < δ > 0. In fact, we can show this for h = 0. Proposition 19.8 Under the assumptions of Proposition 19.7 but with the weakened regularity assumption V = (V1 , . . . , Vd ) ⊂ Lip2 (Re ) we have, for any α ∈ [0, 1/2), π (V ) (0, y0 ; X) < δ X∞;[0,T ] < ε → 1. lim P α -H¨o l;[0,T ] ε→0
As a consequence, for all δ > 0, P π (V ) (0, y0 ; X) α -H¨o l;[0,T ] < δ > 0. Proof. This follows readily from Theorem 16.39.
19.3 Large deviations for SDEs driven by Brownian motion

In Theorem 13.42, we saw that Schilder's theorem holds for enhanced Brownian motion. That is, if B denotes a G²(R^d)-valued enhanced Brownian motion on [0, T], then for any α ∈ [0, 1/2), the family (δ_ε B : ε > 0) satisfies a large deviation principle in C^{0,α-Höl}([0, T], G²(R^d)) with rate function given by I(π_1(·)), where π_1(·) denotes the projection of a G²(R^d)-valued path to an R^d-valued path and

I(h) = (1/2) ∫_0^T |ḣ_t|² dt for h ∈ W^{1,2}_0([0, T], R^d), and +∞ otherwise.        (19.3)

Since π_(V)(0, y_0; B), a Stratonovich solution to dY = V(Y) dB, depends continuously on B in this α-Hölder rough path topology, we can apply the contraction principle to deduce (without any further work) a large deviation principle for solutions of stochastic differential equations, better known as Freidlin–Wentzell estimates. More precisely, also including a drift term, we have
Theorem 19.9 (Freidlin–Wentzell large deviations) Assume that V = (V1 , . . . , Vd ) is a collection of Lip2 -vector fields on Re , and V0 is a Lip1 -vector field on Re . Let B be a d-dimensional Brownian motion and consider the unique (up to indistinguishability) Stratonovich SDE solution on [0, T ] to d ε Vi (Y ) ◦ εdB i + V0 (Y ) dt dY = i=1
started at y0 . Let α ∈ [0, 1/2). Then Y ε satisfies a large deviation principle (in α-H¨ older topology) with good rate function given by J (y) = inf I (h) : π (V ,V 0 ) (0, y0 ; (h, t)) = y where I is given in (19.3). Proof. Let α ∈ (1/3, 1/2) without loss of generality. The Stratonovich solution is given by the random RDE solution π (V ,V 0 ) (0, y0 ; (δ ε B,t)) and depends continuously (see Theorem 12.10, or Theorem 10.26 in the absence of a drift term) on2 δ ε B ∈ C ψ 2 , 1 -var [0, T ] , G2 Rd ∩ C 0,α -H¨o l [0, T ] , G2 Rd ≡ C ψ 2 , 1 -var ∩ C 0,α -H¨o l with respect to α-H¨older rough path topology. Since ε B : ε > 0) satis (δ fies a large deviation principle in C 0,α -H¨o l [0, T ] , G2 Rd with good rate function I and = 1, P δ ε B ∈C ψ 2 , 1 -var [0, T ] , G2 Rd it follows from Proposition C.5 that (δ ε B : ε > 0) satisfies a large deviation principle in the (non-complete, separable) metric space ψ -var ∩ C 0,α -H¨o l , dα -H¨o l C 2,1 with identical rate function. We conclude with the contraction principle, Theorem C.6. Remark 19.10 The regularity assumptions of Theorem 19.9 are optimal in the sense that Lip2 -regularity is needed for a unique Stratonovich solution. If one deals with Itˆ o stochastic differential equations, dY ε =
d
Vi (Y ) εdB i + V0 (Y ) dt,
i=1
us remark that under slightly stronger “Lip γ , γ > 2” regularity assumptions one can ignore ψ 2 , 1 -variation. 2 Let
it is well known that Lip1 -regularity suffices for existence/uniqueness. In this case, the large deviation estimates of Theorem 19.9 are known (e.g. [41, Lemma 4.1.6]) to be valid with identical rate function. Exercise 19.11 Assume V0 = 0, d = e such that V1 , . . . , Vd span the tangent space at every point, a Riemannian metric !., ."(V ) is defined by declaring V1 , . . . , Vd orthonormal. Express J (y) as the energy of the path y, i.e. 1 1 J (y) = !y˙ t , y˙ t "( V ) dt. 2 0 Let us discuss various extensions of this. As noted in Section 17.6, the rough path approach has no reliance whatsoever on adaptedness, and hence anticipating SDEs does not require a separate analysis. We can state the following large deviation principle for a class of such anticipating stochastic differential equations. For simplicity of notation only, we take V0 = 0 here. Theorem 19.12 Let B be a G2 Rd -valued enhanced Brownian motion. Let also (Y0ε (ω) : ε ≥ 0) be a family of random elements of Rd , V ε (ω) ≡ (V1ε (ω) , . . . , Vdε (ω) : ε ≥ 0) be a random collection of Lip2 -vector fields, both deterministic for ε = 0, such that for all δ > 0 ε 2 0 = −∞, lim ε log P max Vi − Vi Lip 2 > δ ε→0 1≤i≤d ε lim ε log P y0 − y00 > δ = −∞. ε→0
Let Y ε (ω) = π (V ε (ω )) (0, Y0ε (ω) ; δ ε B (ω)) denote the unique ω-wise defined RDE solution to dy ε = εV ε (y ε ) dBt
(19.4)
started from Y0ε (ω). Then (Y ε : ε > 0) satisfies a large deviation principle in α-H¨ older topology, for any α ∈ [0, 1/2), with good rate function J(y) = inf I(h) : π (V 0 ) 0, y00 ; h = y . Proof. Without loss of generality, assume α ∈ (1/3, 1/2) and take α ∈ (α, 1/2). We know that {δ ε B : ε ≥ 0} satisfies a large deviation principle in α -H¨older rough path topology. The assumptions onthe vector fields and the initial conditions give that y0ε , (Viε )1≤i≤d , δ ε B satisfies a large deviation principle in Rd × Lip2 × C 0,α -H¨o l [0, T ] , G2 Rd .
From continuity of (y0 , V, x) → π (V ) (0, y0 ; x), see Theorem 10.41 (or Corollary 10.27 when V ⊂ Lipγ with γ > 2), we conclude with the contraction principle. We can also deal with large deviations at the level of stochastic flows. To this end, recall from Corollary 11.14 that, granted sufficient regularity, we have continuity of x → π V (0, ·; x) ∈ Dk (Re ) in the sense of flows of C k -diffeomorphisms.3 Using the large deviation principle for (δ ε B), it is an immediate application of the contraction principle to obtain Theorem 19.13 (large deviations for stochastic flows) Assume V ∈ Lipγ + k −1 with γ > 2 and k ∈ {1, 2, . . . }. Then, the Dk (Re )-valued random variable given by π V (0, ·; δ ε B), i.e. the stochastic flow of the Stratonovich d i equation dY ε = i=1 Vi (Y ) ◦ εdB , satisfies a large deviation principle with good rate function J (ϕ) = inf I (h) : π (V ) (0, ·; h) = ϕ ∈ Dk (Re ) .
19.4 Large deviations for SDEs driven by other stochastic processes Using the large deviation results for enhanced Gaussian and Markov processes established in Section 15.7 resp. Section 16.7, we can generalize the previous section to RDEs driven by Gaussian and Markovian signals. The proofs are the same: Proposition 19.14 Assume that (i) X = X 1 , . . . , X d is a centred continuous Gaussian process on [0, 1] with independent components; (ii) H denotes the Cameron–Martin space associated with X; (iii) the covariance of X is of finite ρ-variation dominated by some 2D control ω, for some ρ ∈ [1, 2); (iv) X denotes the natural lift of X to a G[2ρ] Rd -valued process (with geometric rough sample paths); (v) V = (V1 , . . . , Vd ) is a collection of Lipγ -vector fields on Re , with γ > 2ρ; (vi) Y ε = π (V ) (0, y0 ; δ ε X) is the RDE solution to dY ε = εV (Y ε ) dX, Y (0) = y0 ∈ Re . Then, for any p > 2ρ, (Y ε : ε > 0) satisfies a large deviation principle in p-variation topology, with good rate given by 1 2 |h|H : π (V ) (0, y0 ; h) = y J (y) = inf 2 2
where we agree that |h|H = +∞ when h ∈ / H. 3 The
Polish space Dk (Re ) was defined in Section 11.5.
If ω is H¨ older-dominated, then the above large deviation principle also holds in 1/p-H¨ older topology. Proposition 19.15 Assume that (i) X = X 1 , . . . , X d is a Markov process with uniformly elliptic generator d in divergence form, 12 i,j =1 ∂i aij (·) ∂j · understood in a weak sense (i.e. via Dirichlet forms) where a ∈ Ξ1,d (Λ), that is, measurable, symmetric and Λ−1 I ≤ a ≤ ΛI in the sense of positive definite matrices; (ii) X denotes the natural lift of X to a G2 Rd -valued process (with geometric rough sample paths); (iii) V = (V1 , . . . , Vd ) is a collection of Lip2 -vector fields on Re ; (iv) Y ε = π (V ) (0, y0 ; Xε ) is the RDE solution to dY ε = V (Y ε ) dXε , Y (0) = y0 ∈ Re where Xε (·) ≡ X (ε·). older topology, Then, (Y ε : ε > 0) satisfies a large deviation principle in α-H¨ for any α ∈ [0, 1/2), with good rate function J(y) = inf I a (h) : π (V ) (0, y0 ; h) = y where
I^a(h) = (1/2) sup_{D ⊂ D[0,T]} Σ_{i: t_i ∈ D} ( d^a(h_{t_i}, h_{t_{i+1}}) )² / |t_{i+1} − t_i|

and d^a is the intrinsic distance on R^d associated with a.
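For instance (our remark): if a ≡ I (so Λ = 1), the intrinsic distance d^a is the Euclidean distance and, for absolutely continuous h, the supremum over dissections equals the energy ∫_0^T |ḣ_t|² dt, so that I^a(h) = (1/2) ∫_0^T |ḣ_t|² dt and one recovers the Brownian rate function (19.3).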
It is clear that Propositions 19.14 and 19.15 can also be formulated for stochastic flows or with random vector fields and initial conditions, along the lines of Theorems 19.12 and 19.13. Inclusion of drift-vector fields is a similarly straightforward matter.
19.5 Support theorem and large deviations for a class of SPDEs Let us return to the study of SPDEs of the form du = F t, x, Du, D2 u dt + Du (t, x) · V (x) ◦ dB, u (0, ·) = u0 ∈ BUC (Rn ) ,
(19.5)
understood as RPDE (cf. Sections 11.3 and 17.7), du = F t, x, Du, D2 u dt + Du (t, x) · V (x) dB (ω) with (enhanced) Brownian noise signal B (ω) = exp (B, A). To this end we assume that F = F (t, x, p, X) ∈ C ([0, T ] , Rn , Rn , S n ) is a degenerate
elliptic and Φ(3) -invariant comparison (as discussed in detail and with examples in Section 11.3). We also assume that V = (V1 , . . . , Vd ) ⊂ Lipγ , γ > 4. Under these assumptions we saw that (u0 , B) → u ∈ BUC ([0, T ] × Rn ) is continuous.4 The following two theorems concerning large deviations and support descriptions for such SPDEs are then proved, mutatis mutandis, with the arguments that we have already used for SDEs. We have Theorem 19.16 (large deviations for stochastic partial differential equations) Let Uε = Uε (t, x; B (ω)) denote the (ω-wise unique BUC) solution to dUε = F t, x, DUε , D2 Uε dt + DUε (t, x) · V (x) ◦ εdB, (19.6) u (0, ·) = u0 ∈ BUC (Rn ) . Then the family (Uε : ε > 0) of BUC ([0, T ] × Rn )-valued random variables satisfies a large deviation principle with good rate function J (v) = inf
{ I(h) : h ∈ H, v = u^h }

where H = W^{1,2}_0([0, T], R^d) and u^h is the unique BUC([0, T] × R^n)-solution⁵ to

du^h = F(t, x, Du^h, D²u^h) dt + Du^h(t, x) · V(x) dh,
u^h(0, ·) = u_0 ∈ BUC(R^n).
Theorem 19.17 (support theorem for stochastic partial differential equations) Let U = u(t, x; B(ω)) be the (ω-wise unique BUC) solution to

du = F(t, x, Du, D²u) dt + Du(t, x) · V(x) ◦ dB,        (19.7)
u(0, ·) = u_0 ∈ BUC(R^n).

Then, for any h ∈ H and any δ > 0,

lim_{ε→0} P( |U − u^h|_{∞;[0,T]} < δ  |  |B − h|_{∞;[0,T]} < ε ) = 1.
(The Euclidean norm is used for conditioning |B − h|∞;[0,T ] < ε.) In particular, the topological support of the law of U , viewed as a Borel measure on BUC ([0, T ] × Rn ), is precisely {uh : h ∈ H}, where the closure is with respect to locally uniform convergence. 4 Unless otherwise stated, BUC spaces are equipped with the topology of locally uniform convergence. 5 In viscosity sense, cf. Section 11.3.
19.6 Comments The rough path approach to the Stroock–Varadhan support description [161] (see Aida et al. [2], Ben Arous et al. [12], Millet and Sanz-Sol´e [128]) and the Freidlin–Wentzell large deviation estimates (e.g. Dembo and Zeitouni [41], Deuschel and Stroock [42] and the references therein) was first carried out by Ledoux et al. [101], by establishing the relevant support and large deviation properties for EBM (in p-variation rough path topology, p > 2). Our Theorem 19.1 is based on the (conditional) support result for EBM in H¨ older rough path topology, as obtained in Section 13.7 (which itself follows Friz et al. [57], see comments at the end of Chapter 13). Theorem 19.5 on the support of stochastic flows is known (e.g. Kunita [96]), although our (rough path) proof appears to be new. As seen in this chapter, the “rough path” pattern of proof for these support theorems extends without changes to other (Gaussian, Markovian) driving signals, for which support descriptions in rough path topology are available. Such results were obtained in Sections 15.8 and 16.8, and we refer to the comments sections in these chapters for pointers to the literature. Support theorems for (simple) differential equations driven by Gaussian processes have been used in a financial context to construct markets without arbitrage under transaction costs (Guasoni [73]). It will also play a role when discussing RDEs driven by Gaussian signals under H¨ ormander’s condition, to be discussed in Section 20.4. Theorem 19.9 is a classical result in the theory of large deviations (e.g. Baldi et al. [5], Dembo and Zeitouni [41], Deuschel and Stroock [42]); so are large deviation results for anticipating SDEs (Millet et al. [127]) (with a rough path proof, cf. Theorem 19.12, taken from Coutin et al. [30]) and stochastic flows (Ben Arous and Castell [11]) (the rough path proof of Theorem 19.13 is new). Support theorems for classes of (linear) stochastic differential equations appear in Gy¨ ongy [79] and also in Kunita [96]. We are unaware of large deviation results for SPDEs of the type in Theorem 19.16.
20 Malliavin calculus for RDEs We consider stochastic differential equations driven by a d-dimensional Gaussian process in the rough path sense, cf. Section 17.3. Examples to have in mind include Brownian motion, the Ornstein–Uhlenbeck process, fractional Brownian motion with Hurst parameter H > 1/4 and various (Brownian or other Gaussian) bridge processes. Let us note that if the driving signal is also a semi-martingale (e.g. in the case of Brownian motion or the Ornstein–Uhlenbeck process), it follows from Theorem 17.3 that we actually work with classical stochastic differential equations in the Stratonovich sense. The (driving) Gaussian process induces a Gaussian measure on C ([0, 1] , Rd and can be viewed as an abstract Wiener space, which serves as the underlying probability space on which the enhanced Gaussian process was constructed, cf. Section 15.3.3. Solving a rough differential equation thus yields an (abstract) Wiener functional and is, a priori, accessible to methods of Malliavin calculus. In particular, we shall see in this chapter that, subject to certain non-degeneracy conditions, solutions to stochastic differential equations driven by Gaussian processes in the rough path sense admit a density with respect to the Lebesgue measure.
20.1 H-regularity of RDE solutions We assume X = X 1 , . . . , X d is a centred continuous Gaussian process on [0, T ] with independent components. The associated Cameron–Martin space is denoted by H ⊂ C [0, T ] , Rd . Recall from Proposition 15.7 that H → C ρ-var [0, T ] , Rd if we assume that covariance of X is of finite ρ-variation in 2D sense. Let us also recall that ρ < 2 is a sufficient (and essentially sharp) condition for X to admit a natural enhancement X to a geometric p-rough path, p ∈ (2ρ, 4). It is important to understand perturbations of X in Cameron–Martin directions. More specifically, having realized X as the coordinate process on the path space, Xt (ω) = ω t , we want to understand X (ω + h) . It is clear from the Cameron–Martin theorem that, for every fixed h ∈ H, ω → X (ω + h) is a well-defined Wiener functional. On the other hand, the formal computation (X + h) ⊗ d (X + h) = X ⊗ dX + h ⊗ dX + X ⊗ dh + h ⊗ dh
suggests that X (ω + h) can be expressed in terms of X (ω) and crossintegrals of X and h. Under the standing assumption that ρ < 2, the last integral h ⊗ dh is obviously a well-defined Young integral. On the other hand, the integral h ⊗ dX
(20.1)
may not be a (pathwise defined) Young integral since 1/ρ + 1/p ≯ 1, in general when ρ < 2 and p > 2ρ. The following condition, already encountered in Section 15.8, is designed to ensure that (20.1) does make sense as a Young integral. Condition 20.1 Let X = X 1 , . . . , X d be a centred continuous Gaussian process on [0, T ] with independent components which admits a natural lift in the sense of Section 15.3.3 to a (random) geometric p-rough path X. We assume that H has complementary Young regularity to X, by which we mean that H → C q -var [0, T ] , Rd for some q ≥ 1 with 1/p + 1/q > 1.
For instance, Condition 20.1 is satisfied if the covariance of X has finite ρ-variation for some ρ < 3/2. This covers in particular Brownian motion (where we can take p = 2 + ε and q = 1) and fractional Brownian motion with H > 1/3. In fact, thanks to a certain Besov regularity of HH , the Cameron–Martin space associated with fractional Brownian motion, we can also cover the regime H ∈ (1/4, 1/3], despite the fact (cf. Proposition 15.5) that H ∈ (1/3, 1/4) corresponds to ρ ∈ (3/2, 2). Part (i) of Exercise 20.2 below gives a hint of what happens in the case ρ ∈ [3/2, 2). Exercise 20.2 Throughout this exercise, assume X is a centred continuous Gaussian process on [0, T ] with independent components, with covariance of finite ρ-variation in 2D sense. (i) Assuming ρ ∈ [1, 2), show that (20.1) makes sense as a Young–Wiener integral in the sense of Proposition 15.39. (ii) Assuming ρ ∈ [1, 3/2), show that Condition 20.1 is satisfied (in particular, (20.1) makes sense as a classical Young integral). (iii) Consider d-dimensional fractional Brownian motion with Hurst parameter H. Using the fact (cf. Remark 15.10) that −1
HH → C q -var for any q > (H + 1/2)
show that, for any H > 1/4, Condition 20.1 is satisfied. Exercise 20.3 Let h ∈ C ρ-var ([0, T ] , R) and X be a real-valued, centred continuous Gaussian process with covariance R = R (s, t) of finite ρvariation in 2D sense. Assume ρ ∈ [1, 2) and show that the Young–Wiener
integral
T
hdX 0
is a Gaussian random variable with zero-mean and variance given by the 2D Young integral T T h (s) h (t) dR (s, t) . 0
0
A convenient consequence of Condition 20.1 is the possibility of considering X (ω + h) simultaneously for all h ∈ H; in contrast to the general case, where ω → X (ω + h) is only definent up to h-dependent null-sets. For the reader’s convenience, we now recall Lemma 15.58. Lemma 20.4 Assume X is a Gaussian process which satisfies Condition 20.1. Then, writing X for the natural lift of X, the event {ω : X (ω + h) ≡ Th X (ω) for all h ∈ H}
(20.2)
has probability one. Proposition 20.5 Assume X = X 1 , . . . , X d is a Gaussian process which satisfies Condition 20.1 and write X for its natural lift, a (random) geometric p-rough path X. Assume V = (V1 , . . . , Vd ) ⊂ Lipγ (Re ) with γ > p. Then, the (unique) RDE solution to dY = V (Y ) dX, Y0 = y0 ∈ Re is almost surely continuously H-differentiable; i.e. for almost every ω, the map h ∈ H →π (V ) (0, ·; X (ω + h)) ∈ C p-var ([0, T ], Re ) is continuously differentiable in Fr´echet sense. In particular, the Re -valued Wiener functional Yt = Yt (ω) = π (V ) (0, y0 ; X (ω))t admits an H-valued derivative DYt = DYt (ω) with the property that, with probability one, ∀h ∈ H : Dh Yt := !DYt , h"H =
t d
X Jt←s (Vj (Ys )) dhjs
(20.3)
0 j =1
X X where Jt←s = Jt←s (ω) denotes the Jacobian of π (V ) (s, ·; X (ω))t . (The integral above makes sense as a Young integral since the integrand has finite p-variation regularity.)
Malliavin calculus for RDEs
548
Proof. By assumption, H → C q -var [0, T ] , Rd , with 1/q + 1/p > 1, the embedding is (trivially) Fr´echet smooth. On the other hand, for any ω in a set of full measure on which (20.2) holds we have π (V ) (0, y0 ; Th X (ω)) = π (V ) (0, y0 ; X (ω + h)) . Using Fr´echet regularity of the Itˆ o map, as detailed in Section 11.1.2, we see that h ∈ C q -var → π (V ) (0, y0 ; Th X (ω)) ∈ C p-var ([0, T ], Re ) , and hence also h ∈ H →π (V ) (0, y0 ; X (ω + h)) ∈ C p-var ([0, T ], Re ) , must be continuously H-differentiable. At last, time-t evaluation on C p-var ([0, T ], Re ) is trivially Fr´echet smooth, so that h ∈ H →π (V ) (0, y0 ; X (ω + h))t ∈ Re is also continuously H-differentiable. The representation (20.3) then follows from the fact that, with probability one, ∀h ∈ H : π (V ) (s, ·; X (ω + h))t = π (V ) (s, ·; Th X (ω))t and Duhamel’s principle holds, as discussed in Exercise 11.9. The proof is now finished. Exercise 20.6 What will be needed in the sequel is the conclusion from Proposition 20.5 that Yt is a.s. continuously H-differentiable with explicit representation given in (20.3). One can arrive at this conclusion without relying on the Fr´echet smoothness results of Section 11.1.2 but only relying on some knowledge about directional derivatives. To this end, recall from Section 11.1.1 that, given a geometric p-rough path x , q ≥ 1 : 1/p+1/q > 1 and V ⊂ Lipγ (Re ) , γ > p, h ∈ C q -var [0, T ] , Rd → π (V ) (0, y0 ; Th x)t ∈ Re d admits directional derivatives Dh Yt := dε π (V ) (0, y0 ; Tεh x)t ε=0 with the representation formula Dh Yt =
t d
Jt←s (Vi (Ys )) dhis ,
0 i=1
where Jt←s is the Jacobian of π (V ) (s, ·; x)t . (i) Use this representation formula to deduce the existence of an H-valued derivative DY so that Dh Yt = !DY, h"H . (ii) Use continuity of the Itˆ o–Lyons map to show that Yt is a.s. continuously H-differentiable.
20.2 Non-degenerate Gaussian driving signals
549
20.2 Non-degenerate Gaussian driving signals

We remain in the framework of the previous section. In particular, X = (X^1, ..., X^d) is again a centred continuous Gaussian process on [0,T] with independent components which admits a lift X to a (random) geometric p-rough path. The overall aim of this chapter is to find sufficient criteria on the process X and vector fields V = (V_1, ..., V_d) so that, for every t ∈ (0,T], the R^e-valued random variable π_(V)(0, y_0; X(ω))_t admits a density with respect to Lebesgue measure on R^e. To this end, Condition 20.1 on the regularity of the Cameron–Martin space, namely H ↪ C^{q-var}([0,T], R^d) with 1/p + 1/q > 1, will be in force throughout. As a simple consequence, thanks to Young's inequality, we have

C^{p-var}([0,T], R^d) ↪ H* ≅ H ↪ C^{q-var}([0,T], R^d),

where every f ∈ C^{p-var}([0,T], R^d) is viewed as an element of H* via

h ∈ H ↦ ∫_0^T f dh ≡ ∑_{k=1}^d ∫_0^T f_k dh^k.
Condition 20.7 We assume non-degeneracy of the Gaussian process X on [0,T] in the sense that for any f ∈ C^{p-var}([0,T], R^d),

∫_0^T f dh = 0  ∀h ∈ H   =⇒   f ≡ 0.

(Note that non-degeneracy on [0,T] implies non-degeneracy on [0,t] for any t ∈ (0,T].) It is instructive to see how this condition rules out the Brownian bridge returning to the origin at time T or earlier; a Brownian bridge which returns to zero after time T is allowed. The Ornstein–Uhlenbeck process is another example for which Condition 20.7 is satisfied; and so is fractional Brownian motion for any value of its Hurst parameter H, simply because C_0^∞([0,T], R^d) ⊂ H^H, as was pointed out in (15.5), Remark 15.10, and thanks to Exercise 20.8 below.
Exercise 20.8 Show that Condition 20.7 is satisfied if every smooth path (h_t : t ∈ [0,T]), possibly pinned at zero, is contained in H.

Solution. We see that f is orthogonal to any ḣ ∈ C^∞([0,T], R^d), hence must be zero in L^2([0,T], R^d) and thus identically equal to zero by continuity.
Lemma 20.9 The requirement that ∫_0^T f dh = 0 for all h ∈ H in Condition 20.7 can be relaxed to the quantifier "for all h in some orthonormal basis of H".

Proof. It suffices to check that

[ ∫_0^T f dh = 0  ∀h ∈ H ]   ⇔   [ ∫_0^T f dh_k = 0  ∀k ∈ N ],

where (h_k) is an orthonormal basis for H. Only the "⇐" direction is non-trivial. Assuming ∫_0^T f dh_k = 0 for all k implies

∫_0^T f dh^{[n]} = 0 for all n,

where h^{[n]} ≡ ∑_{k=1}^n ⟨h_k, h⟩_H h_k is the Fourier expansion of h. It converges in H (and hence also in C^{q-var}) to h and we conclude by continuity of the Young integral.
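The convergence used in the last step can be illustrated numerically. The following sketch (not from the book; the path h, the integrand f and the orthonormal basis are illustrative choices) takes H to be the Cameron–Martin space of one-dimensional Brownian motion on [0,1], so that ⟨g,h⟩_H = ∫ ġ ḣ dt and an orthonormal basis (h_k) is obtained by integrating an orthonormal basis (e_k) of L^2[0,1]; since everything is smooth, the Young integrals reduce to classical ones.

import numpy as np

t = np.linspace(0.0, 1.0, 20001)
f = np.cos(3.0 * t)          # smooth integrand, certainly of finite p-variation
h_dot = 2.0 * t              # h(t) = t^2 lies in H since h' is in L^2[0,1]

def e(k, t):
    # orthonormal basis of L^2[0,1]: e_0 = 1, e_k = sqrt(2) cos(k pi t)
    return np.ones_like(t) if k == 0 else np.sqrt(2.0) * np.cos(k * np.pi * t)

exact = np.trapz(f * h_dot, t)                   # int_0^1 f dh

for n in (1, 2, 5, 10, 50):
    # Fourier expansion h^[n]: (h^[n])' = sum_{k<n} <h_k, h>_H e_k
    hn_dot = np.zeros_like(t)
    for k in range(n):
        c_k = np.trapz(e(k, t) * h_dot, t)       # <h_k, h>_H = <e_k, h'>_{L^2}
        hn_dot += c_k * e(k, t)
    approx = np.trapz(f * hn_dot, t)             # int_0^1 f dh^[n]
    print(n, approx, abs(approx - exact))        # errors decrease as n grows

The printed errors decrease with n, in line with the convergence of h^[n] to h in H and the continuity of the (Young) integral in the driving path.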
20.3 Densities for RDEs under ellipticity conditions

We have the following density result for RDEs driven by a Gaussian rough path X.

Theorem 20.10 Let X = (X^1, ..., X^d) be a centred continuous Gaussian process on [0,T] with independent components which admits a natural lift in the sense of Section 15.3.3 to a (random) geometric p-rough path X. Assume that
(i) H has complementary Young regularity to X, i.e. H ↪ C^{q-var}([0,T], R^d) with 1/q + 1/p > 1;
(ii) X is non-degenerate in the sense of Condition 20.7;
(iii) y_0 ∈ R^e is a fixed "starting" point;
(iv) there are vector fields V = (V_1, ..., V_d) ⊂ Lip^γ(R^e), γ > p, which satisfy the following ellipticity condition at the starting point,

(E) :  span{V_1, ..., V_d}|_{y_0} = T_{y_0}R^e ≅ R^e.
Then, for every t ∈ (0,T], the R^e-valued RDE solution π_(V)(0, y_0; X(ω))_t admits a density with respect to Lebesgue measure on R^e.

Proof. Fix t ∈ (0,T]. From Proposition 20.5 we know that the R^e-valued Wiener functional Y_t(ω) := π_(V)(0, y_0; X(ω))_t is a.s. continuously H-differentiable. By a well-known criterion due to Bouleau–Hirsch, cf. Section D.5 of Appendix D, all we have to do is show that the so-called Malliavin covariance matrix

σ_t(ω) := [ ⟨DY_t^i, DY_t^j⟩_H ]_{i,j=1,...,e} ∈ R^{e×e}

is invertible with probability one. To see this, assume there exists a (random) vector v ∈ R^e which annihilates the quadratic form σ_t. Then^1

0 = v^T σ_t v = | ∑_{i=1}^e v_i DY_t^i |_H^2

and so

v^T DY_t ≡ ∑_{i=1}^e v_i DY_t^i = 0 ∈ H.

Using the representation formula (20.3) this says precisely that

∀h ∈ H :  v^T D_h Y_t = ∫_0^t ∑_{j=1}^d v^T J^X_{t←s}(V_j(Y_s)) dh^j_s = 0,    (20.4)

where the last integral makes sense as a Young integral since the (continuous) integrand has finite p-variation regularity. Noting that the non-degeneracy condition on [0,T] implies the same non-degeneracy condition on [0,t], we see that the integrand in (20.4) must be zero on [0,t]; evaluation at time 0 then shows that for all j = 1, ..., d,

v^T J^X_{t←0}(V_j(y_0)) = 0.

It follows that the vector v^T J^X_{t←0} is orthogonal to V_j(y_0), j = 1, ..., d, and hence, by the ellipticity condition (E), zero. Since J^X_{t←0} is invertible (this follows immediately from the chain rule and

π_(V)( 0, π_(V)(0, ·; x)_t ; ←x )_t = Id|_{R^e}, where ←x(·) = x(t − ·) ),

we see that v = 0. The proof is finished.

^1 Upper T denotes the transpose of a vector or matrix.
Exercise 20.11 Let σ_t(ω) := [ ⟨DY_t^i, DY_t^j⟩_H ]_{i,j=1,...,e} ∈ R^{e×e} denote the Malliavin covariance matrix of the RDE solution Y_t ≡ π_(V)(0, y_0; X(ω))_t, where X(ω) is the lift of some Gaussian process (X^1, ..., X^d) with covariance of finite ρ-variation for ρ ∈ [1, 3/2). Show that

σ_t = ∑_{k=1}^d ∫_0^t ∫_0^t J^X_{t←s}(V_k(Y_s)) ⊗ J^X_{t←s'}(V_k(Y_{s'})) dR_{X^k}(s, s'),

where the right-hand side is a well-defined 2D Young integral. Let p ∈ (2ρ, 3) and show that σ_t(ω) = σ̂(X(ω)), where σ̂ is a continuous map from C^{p-var}([0,T], G^2(R^d)) to R^{e×e}.

Solution. Let (h_n^{(k)} : n ∈ N) be an orthonormal basis of H^{(k)}, the Cameron–Martin space associated with X^k. It follows that (h_n^{(k)} : n = 1, 2, ...; k = 1, ..., d) is an orthonormal basis of H = ⊕_{k=1}^d H^{(k)}, where we identify h_n^{(1)} ∈ H^{(1)} with (h_n^{(1)}, 0, ..., 0) ∈ H and similarly for k = 2, ..., d. From Parseval's identity,

σ_t = [ ⟨DY_t^i, DY_t^j⟩_H ]_{i,j=1,...,e}
    = ∑_{n,k} ⟨DY_t, h_n^{(k)}⟩_H ⊗ ⟨DY_t, h_n^{(k)}⟩_H
    = ∑_k ∑_n ∫_0^t J^X_{t←s}(V_k(Y_s)) dh^{(k)}_{n,s} ⊗ ∫_0^t J^X_{t←s}(V_k(Y_s)) dh^{(k)}_{n,s}
    = ∑_k ∫_0^t ∫_0^t J^X_{t←s}(V_k(Y_s)) ⊗ J^X_{t←s'}(V_k(Y_{s'})) dR^{(k)}(s, s').

For the last step we used the fact that

∑_n ( ∫_0^T f dh_n ) ⊗ ( ∫_0^T g dh_n ) = ∫_0^T ∫_0^T f(s) ⊗ g(t) dR(s, t)

whenever f = f(t, ω) and g are such that the integrals are a.s. well-defined Young integrals. We then conclude with R(s,t) = E(X_s X_t) and the L^2-expansion of the Gaussian process X,

X(t) = ∑_n ξ(h_n) h_n(t),

where the ξ(h_n) form an IID family of standard Gaussians.
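The identity used in the last step can be checked numerically in the simplest case. The following sketch (not from the book; the functions f, g and the basis are illustrative choices) takes X to be one-dimensional Brownian motion on [0,1], for which R(s,t) = min(s,t), the 2D Young integral against dR collapses to ∫_0^1 f g dt, and ∫_0^T f dh_n = ⟨f, e_n⟩_{L^2} for a Cameron–Martin orthonormal basis with h_n' = e_n.

import numpy as np

t = np.linspace(0.0, 1.0, 20001)
f = np.sin(2.0 * t) + 1.0
g = t**2 - t

def e(k, t):
    # orthonormal basis of L^2[0,1]
    return np.ones_like(t) if k == 0 else np.sqrt(2.0) * np.cos(k * np.pi * t)

# left-hand side: sum_n (int f dh_n)(int g dh_n), truncated at 200 basis elements
lhs = sum(np.trapz(f * e(k, t), t) * np.trapz(g * e(k, t), t) for k in range(200))
# right-hand side: the 2D integral against dR, here int_0^1 f(t) g(t) dt
rhs = np.trapz(f * g, t)
print(lhs, rhs)   # the partial sums converge to the right-hand side

This is nothing but Parseval's identity in L^2[0,1]; the general statement replaces the diagonal measure by the 2D increments of R.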
20.4 Densities for RDEs under Hörmander's condition

In the case of driving Brownian motion, it is well known that solutions to SDEs of the form

dY = ∑_{i=1}^d V_i(Y) dB^i,

started at y_0 ∈ R^e, admit a (smooth) density provided the vector fields, now assumed to be C^∞-bounded, satisfy Hörmander's condition at the starting point. By this we mean that {V_1, ..., V_d}, together with all iterated Lie brackets, evaluated at y_0, span T_{y_0}R^e. (There is a well-known extension to SDEs with drift vector field V_0, in which case (H) is replaced by the condition that {V_1, ..., V_d} and all iterated Lie brackets of {V_0, V_1, ..., V_d} have full span at y_0.) The aim of this section is to establish a similar density result for RDEs driven by a Gaussian rough path. Our focus will be the drift-free case, i.e. when V_0 ≡ 0. Also, the conditions on the underlying Gaussian driving signals are somewhat more involved than those required in Theorem 20.10, but still remain checkable for many familiar Gaussian processes. We have

Theorem 20.12 Let (X_t^1, ..., X_t^d) = (X_t : t ∈ [0,T]) be a continuous, centred Gaussian process with independent components X^1, ..., X^d. Assume X satisfies the conditions listed in Section 20.4.1 below. (In particular, X is assumed to admit a natural lift X to a random geometric rough path.) Let V = (V_1, ..., V_d) be a collection of C^∞-bounded vector fields on R^e which satisfies Hörmander's condition

(H) :  Lie[V_1, ..., V_d]|_{y_0} = T_{y_0}R^e ≅ R^e

at some point y_0 ∈ R^e. Then the random RDE solution Y_t(ω) = π_(V)(0, y_0; X(ω))_t admits a density with respect to Lebesgue measure on R^e for all times t ∈ (0,T].

Note that when (X_t : t ∈ [0,T]) happens to be a semi-martingale (such as a Brownian motion, an Ornstein–Uhlenbeck process, a Brownian bridge returning to the origin after time T, etc.), Theorem 20.12 really yields information about classical solutions to the Stratonovich SDE

dY = ∑_{i=1}^d V_i(Y) ∘ dB^i,  Y_0 = y_0 ∈ R^e.
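Hörmander's condition (H) is a purely algebraic condition on the vector fields and is easy to check symbolically in concrete examples. The following small sketch (not from the book; the two vector fields on R^3 are an illustrative choice of this example only) computes the first Lie bracket with sympy and verifies that brackets up to length 2 already span the tangent space at y_0 = 0, even though span{V_1, V_2}|_{y_0} is only two-dimensional.

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
X = sp.Matrix([x1, x2, x3])

V1 = sp.Matrix([1, 0, -x2 / 2])
V2 = sp.Matrix([0, 1,  x1 / 2])

def bracket(V, W):
    # Lie bracket [V, W] = DW.V - DV.W of vector fields on R^3
    return W.jacobian(X) * V - V.jacobian(X) * W

y0 = {x1: 0, x2: 0, x3: 0}
fields = [V1, V2, bracket(V1, V2)]                      # brackets up to length 2
span = sp.Matrix.hstack(*[F.subs(y0) for F in fields])
print(span.rank())   # 3 = dim R^3, so (H)_2 holds at y0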
20.4.1 Conditions on the Gaussian process

We assume that X = (X^1, ..., X^d) is a centred continuous Gaussian process on [0,T] with independent components which admits a natural lift in the sense of Section 15.3.3. Recall that this requires

∃ρ ∈ [1, 2) :  |R|_{ρ-var;[0,T]^2} < ∞,

where R denotes the covariance function of X. Equivalently, setting H = 1/(2ρ), this condition may be stated as

∃H ∈ (1/4, 1/2] :  |R|_{(1/2H)-var;[0,T]^2} < ∞.

Given such a process X one can, cf. Exercise 15.36, find a deterministic time-change τ : [0,T] → [0,T] such that R̃, the covariance function of X̃ = X ∘ τ, satisfies

∀s < t in [0,T] :  |R̃|_{(1/2H)-var;[s,t]^2} ≤ (const) × |t − s|^{2H}.    (20.5)

Since the conclusion of Theorem 20.12 is invariant under such a time-change, we can in fact assume, without loss of generality, that the covariance of X itself has Hölder-dominated 1/(2H)-variation in the sense of (20.5). It now follows from Theorem 15.33 that the natural lift X has 1/p-Hölder sample paths for any p > 1/H and also that

sup_{0≤s<t≤T} E exp( η [ d(X_s, X_t) / |t − s|^H ]^2 ) < ∞.

Although the parameter H is reminiscent of fractional Brownian motion with Hurst parameter H, we insist that, up to this point, our assumption covers every enhanced Gaussian process (up to an irrelevant deterministic time-change).

Condition 20.13 H has complementary Young regularity to X, i.e. H ↪ C^{q-var}([0,T], R^d) with 1/q + 1/p > 1.

Condition 20.14 X is non-degenerate on [0,T]; recall that this means that for any f ∈ C^{p-var}([0,T], R^d),

∫_0^T f dh ≡ ∑_{k=1}^d ∫_0^T f_k dh^k = 0  ∀h ∈ H   =⇒   f ≡ 0.

Condition 20.15 X obeys a Blumenthal zero–one law in the sense that the germ σ-algebra ∩_{t>0} σ(X_s : s ∈ [0,t]) contains only events of probability zero or one.
Condition 20.16 Let X denote the natural lift of X and assume that for all N ≥ [p], the step-N Lyons lift of X has H-rescaled full support in the small time limit, by which we mean that for all g ∈ G^N(R^d) and for all ε > 0,

lim inf_{t→0} P[ d( δ_{t^{−H}} S_N(X)_{0,t}, g ) < ε ] > 0.

Some remarks are in order.
• Conditions 20.13 and 20.14 were already in force in our "elliptic" discussion and are just repeated for the sake of completeness.
• Condition 20.15 holds whenever X can be written as an adapted functional of Brownian motion. This includes fractional Brownian motion and, more generally, all examples in which X has a so-called Volterra presentation^2 of the form

X_t = ∫_0^t K(t,s) dB_s   (Itô integral).

It also includes (non-Volterra) examples in which (X_t : t ∈ [0,T]) is given as a strong solution of an SDE driven by Brownian motion, such as a Brownian bridge returning to the origin after time T, say. An example where the 0–1 law fails is given by the random ray X : t ↦ tB_T(ω), in which case the germ-event {ω : dX_t(ω)/dt|_{t=0+} ≥ 0} has probability 1/2. (In fact, sample path differentiability at 0+ implies non-triviality of the germ σ-algebra; see [46] and the references cited therein.) We observe that the random-ray example (a) is already ruled out by Condition 20.14 and (b) should be ruled out anyway since it does not trigger the bracket phenomenon needed for a Hörmander statement.
• Condition 20.16 says, in essence, that the driving signal must have, at least approximately and for small times, a fractional scaling behaviour similar to fractional Brownian motion with Hurst parameter H. As will be seen in the following proposition (see also Exercise 20.18), this condition can be elegantly verified via the support theorem obtained in Section 15.8.

Proposition 20.17 Let B^H denote d-dimensional fractional Brownian motion with fixed Hurst parameter H ∈ (1/4, 1/2] and consider its lift to a (random) geometric p-rough path, denoted by X = B^H, with p ∈ (2, 4). Then it satisfies Condition 20.16.

^2 See [40].
Proof. Let us observe that B^H (and then B^H, its lift) has finite 1/p-Hölder sample paths for any p > 1/H. To keep the notation simple, we shall write B, B rather than B^H, B^H and also set B̃ = S_N(B). From the support theorem for Gaussian rough paths, Theorem 15.60, we know that the support of the law of B in p-variation topology is precisely the closure of S_{[p]}(H), namely C_0^{0,p-var}([0,T], G^{[p]}(R^d)), using, for instance, the fact that C_0^∞([0,T], R^d) ⊂ H. By continuity of the Lyons lift S_N : B ↦ B̃, followed by evaluation of the path at time 1, it is clear that B̃_1 has full support; that is,

∀g ∈ G^N(R^d), ε > 0 :  P[ d(B̃_1, g) < ε ] > 0.

On the other hand, fractional scaling (n^H B_{t/n} : t ≥ 0) =^D (B_t : t ≥ 0) implies δ_{n^H} B̃_{1/n} =^D B̃_1 and so, thanks to full support of B̃_1,

lim inf_{n→∞} P[ d( δ_{n^H} B̃_{1/n}, g ) < ε ] = P[ d( B̃_1, g ) < ε ] > 0.

Exercise 20.18 Show that Condition 20.16 holds for any centred, continuous Gaussian process X which is assumed to be asymptotically comparable to B^H in the small time limit in the following sense:
(i) there exists a probability space^3 such that X and fractional Brownian motion can be realized jointly as a (2d)-dimensional Gaussian process (X, B^H) = (X^1, B^{H;1}, ..., X^d, B^{H;d}), with (X^i, B^{H;i}) independent for i = 1, ..., d;
(ii) the (2d)-dimensional Gaussian process (X, B^H) has covariance of finite (1/2H)-variation in the 2D sense;
(iii) we have

t^{−2H} |R_{X−B^H}|_{∞;[0,t]^2} → 0 as t → 0,    (20.6)

where R_{X−B^H} is the covariance of the R^d-valued Gaussian process X − B^H.

^3 Effectively, a Gaussian measure on C([0,T], R^{2d}).

Proof. To keep the notation simple, we shall write again B, B rather than B^H, B^H. Let us also set 1/n = t. By independence of the pairs (X^i, B^i), the covariance matrix R_{X−B^H} is diagonal and we focus on one entry. With mild abuse of notation (writing X, B instead of X^i, B^i, ...) we have

n^{2H} |R_{X−B}|_{∞;[0,1/n]^2} = sup_{s,t∈[0,1]} E[ n^H( X_{s/n} − B_{s/n} ) · n^H( X_{t/n} − B_{t/n} ) ],
which can be rewritten in terms of the rescaled process X^(n) = n^H X_{·/n}, and similarly for B, as

sup_{s,t∈[0,1]} E[ (X^(n)_s − B^(n)_s)(X^(n)_t − B^(n)_t) ] = |R_{X^(n)−B^(n)}|_{∞;[0,1]^2}.

By assumption, in particular (20.6), and the continuity estimates for Gaussian rough paths obtained in Theorem 15.37, we see that

d_{p-var}( X^(n), B^(n) ) → 0 as n → ∞, in probability.

By continuity of S_N, still writing X̃^(n) = S_N(X^(n)) for fixed N, and similarly for B^(n), we have

d( X̃^(n)_1, B̃^(n)_1 ) ≤ d_{p-var;[0,1]}( X̃^(n), B̃^(n) ) → 0 in probability.

But then

P[ d( δ_{n^H} X̃_{1/n}, g ) < ε ] = P[ d( X̃^(n)_1, g ) < ε ]
  ≥ P[ d( X̃^(n)_1, B̃^(n)_1 ) + d( B̃^(n)_1, g ) < ε ]
  ≥ P[ d( B̃^(n)_1, g ) < ε/2 ] − P[ d( X̃^(n)_1, B̃^(n)_1 ) > ε/2 ]

and so

lim inf_{n→∞} P[ d( δ_{n^H} X̃_{1/n}, g ) < ε ] ≥ lim inf_{n→∞} P[ d( B̃^(n)_1, g ) < ε/2 ],

and this is positive thanks to Proposition 20.17.

Exercise 20.19 Show that Theorem 20.12 applies to the following multidimensional Gaussian driving processes:
(i) Brownian motion B;
(ii) fractional Brownian motion B^H with Hurst parameter H > 1/4;
(iii) the Ornstein–Uhlenbeck process, realized (for instance) by Wiener–Itô integration,

X^i_t = ∫_0^t e^{−(t−r)} dB^i_r  with i = 1, ..., d;

(iv) a Brownian bridge returning to zero after time T, e.g. X^{T+ε} with ε > 0, where

X^{T+ε}_t := B_t − (t/(T+ε)) B_{T+ε}  for t ∈ [0,T].
Solution. (i), (ii) are immediate from the comments made above and Proposition 20.17.

(iii) We leave it to the reader to check that H ↪ C^{1-var}([0,T], R^d) and that the process is non-degenerate on [0,T]. To see validity of the zero–one law it suffices to note that X has Volterra structure. At last, Condition 20.16 is satisfied since X is asymptotically comparable to Brownian motion in the small time limit in the sense of Exercise 20.18. Indeed, take r, s ∈ [0,t] and compute, with focus on one non-diagonal entry,

R_{X−B}(r,s) ≡ E[(X_r − B_r)(X_s − B_s)] = ∫_0^{r∧s} ( e^{−(r−u)} − 1 )( e^{−(s−u)} − 1 ) du = O(t^3).

(iv) Again, H ↪ C^{1-var}([0,T], R^d) and non-degeneracy on [0,T] are easy to see. Validity of the zero–one law follows by writing X = X^{T+ε} as a strong solution to an SDE driven by Brownian motion with (well-behaved) drift (on [0,T]). At last, take r, s ∈ [0,t] ⊂ [0,1] and compute

R_{X−B}(r,s) ≡ E[(X_r − B_r)(X_s − B_s)] = E[ (r B_{T+ε}/(T+ε)) (s B_{T+ε}/(T+ε)) ] = rs/(T+ε) = O(t^2),

so that, as one expects, the Brownian bridge is asymptotically comparable to Brownian motion in the small time limit.
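The small-time order claimed in (iii) can be confirmed with a short symbolic computation. The sketch below (not from the book) uses the Itô-isometry expression for the covariance of X − B and expands it for r = a·t, s = b·t with a, b ∈ [0,1]; the leading term is of order t^3, so t^{−2H}|R_{X−B}|_{∞;[0,t]^2} with H = 1/2 indeed tends to 0.

import sympy as sp

u, r, s, t, a, b = sp.symbols('u r s t a b', positive=True)

# E[(X_r - B_r)(X_s - B_s)] = int_0^s (e^{-(r-u)} - 1)(e^{-(s-u)} - 1) du  for 0 <= s <= r
cov = sp.integrate((sp.exp(-(r - u)) - 1) * (sp.exp(-(s - u)) - 1), (u, 0, s))
expansion = sp.series(cov.subs({r: a * t, s: b * t}), t, 0, 4)
print(sp.simplify(expansion))   # the expansion starts at order t**3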
20.4.2 Taylor expansions for rough differential equations

Given a smooth vector field W and a smooth driving signal x(·) for the ODE dy = V(y) dx, it follows from basic calculus that

J^x_{0←t}( W(y^x_t) ) = W(y_0) + ∫_0^t J^x_{0←s}( [V_i, W](y^x_s) ) dx^i_s,

where Einstein's summation convention is used throughout. Iterated use of this leads to the Taylor expansion

J^x_{0←t}( W(y^x_t) ) = W|_{y_0} + [V_i, W]|_{y_0} x^{1;i}_{0,t} + [V_i, [V_j, W]]|_{y_0} x^{2;i,j}_{0,t} + ...
        + [V_{i_1}, ... [V_{i_k}, W]]|_{y_0} x^{k;i_1,...,i_k}_{0,t} + ···    (20.7)

where we write

x^{k;i_1,...,i_k}_{0,t} = ∫_0^t ∫_0^{u_k} ... ∫_0^{u_2} dx^{i_1}_{u_1} ... dx^{i_k}_{u_k}.
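The identity at the start of this subsection is easy to check numerically for a smooth driver. The following sketch (not from the book; the vector fields V_1, V_2, W, the driving path x and the time horizon are illustrative choices) solves the ODE together with the equation dJ_{0←t} = −J_{0←t} DV_i(y_t) dx^i, J_{0←0} = I, and compares J_{0←t}(W(y_t)) with W(y_0) + ∫_0^t J_{0←s}([V_i,W](y_s)) dx^i_s, computed by quadrature.

import numpy as np
from scipy.integrate import solve_ivp, cumulative_trapezoid

V = [lambda y: np.array([np.sin(y[1]), 1.0]),
     lambda y: np.array([1.0, y[0]])]
W = lambda y: np.array([y[1], np.cos(y[0])])
xdot = lambda t: np.array([np.cos(t), t])           # x(t) = (sin t, t^2/2)

def jac(F, y, eps=1e-6):
    # central-difference Jacobian of a vector field F at y
    return np.column_stack([(F(y + eps * e) - F(y - eps * e)) / (2 * eps)
                            for e in np.eye(len(y))])

def bracket(F, G, y):                               # [F, G] = DG.F - DF.G
    return jac(G, y) @ F(y) - jac(F, y) @ G(y)

def ode(t, z):
    y, M = z[:2], z[2:].reshape(2, 2)               # M = J_{0<-t}
    dx = xdot(t)
    dy = sum(dx[i] * V[i](y) for i in range(2))
    dM = -M @ sum(dx[i] * jac(V[i], y) for i in range(2))
    return np.concatenate([dy, dM.ravel()])

y0 = np.array([0.3, -0.2])
ts = np.linspace(0.0, 1.0, 2001)
sol = solve_ivp(ode, (0.0, 1.0), np.concatenate([y0, np.eye(2).ravel()]),
                t_eval=ts, rtol=1e-10, atol=1e-12)

ys = sol.y[:2].T
Ms = sol.y[2:].T.reshape(-1, 2, 2)
lhs = np.array([Ms[k] @ W(ys[k]) for k in range(len(ts))])
integrand = np.array([Ms[k] @ sum(xdot(ts[k])[i] * bracket(V[i], W, ys[k])
                                  for i in range(2)) for k in range(len(ts))])
rhs = W(y0) + cumulative_trapezoid(integrand, ts, axis=0, initial=0.0)
print(np.max(np.abs(lhs - rhs)))                    # small (discretisation error only)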
Note that such an expansion makes immediate sense when x is replaced by a weak geometric p-rough path x. In this case, x^k_{0,t} = π_k(S_k(x_{0,t})) where S_k denotes the Lyons lift. Remainder estimates can be obtained via Euler estimates, at least when expressing J^x_{0←t}(W(y^x_t)) as a solution to a differential equation (ODE resp. RDE) of the form dz = V̂(z) dx. (Cf. Proposition 10.3 and Corollary 10.15 and the resulting stochastic corollaries of Section 18.1.) This is accomplished by setting

z := (z^1, z^2, z^3) := ( y^x, J^x_{0←t}, J^x_{0←t}(W(y^x_t)) ) ∈ R^e ⊕ R^{e×e} ⊕ R^e.

Noting that J^x_{0←t}(W(y^x_t)) is given by z^2 · W(z^1) in terms of matrix multiplication, we have

dz^1 = V_i(z^1) dx^i,
dz^2 = −z^2 · DV_i(z^1) dx^i,
dz^3 = dz^2 · W(z^1) + z^2 · d( W(z^1) )
     = z^2 · ( −DV_i(z^1) · W(z^1) + DW(z^1) · V_i(z^1) ) dx^i
     = z^2 · [V_i, W]|_{z^1} dx^i,

started from z_0 = (y_0, I, W(y_0)), where I denotes the identity matrix in R^{e×e}, and we see that V̂ is given by

V̂_i(z^1, z^2, z^3) = ( V_i(z^1), −z^2 · DV_i(z^1), z^2 · [V_i, W](z^1) ),  i = 1, ..., d.    (20.8)

We now consider the corresponding rough differential equation, dz = V̂(z) dx, where x is a weak geometric p-rough path. From the very construction of V̂ it is clear that an expansion of the form

z_t = z_0 + V̂(z_0) x^1_{0,t} + V̂ V̂(z_0) x^2_{0,t} + ...,

after projection to the third component of z_t = (z^1_t, z^2_t, z^3_t), yields precisely the expansion (20.7). To be more precise, let us recall that, given smooth vector fields V = (V_1, ..., V_d) on R^e, an element g ∈ ⊕_{k=0}^m (R^d)^{⊗k} and y ∈ R^e, we write

E_(V)(y, g) := ∑_{k=1}^m ∑_{i_1,...,i_k ∈ {1,...,d}} g^{k;i_1,...,i_k} V_{i_1} ··· V_{i_k} I(y).

(Here I denotes the identity function on R^e and vector fields are identified with first-order differential operators.) In a similar spirit, given another sufficiently smooth vector field W, we first set

[V_{i_1}, V_{i_2}, ..., V_{i_k}, W] := [V_{i_1}, [V_{i_2}, ... [V_{i_k}, W] ...]]
and then

g_k · [V, ..., V, W]|_{y_0}  (length k)  := ∑_{i_1,...,i_k ∈ {1,...,d}} g_{k;i_1,...,i_k} [V_{i_1}, V_{i_2}, ..., V_{i_k}, W] I(y_0)    (20.9)

with the convention that g_0 · V_k = V_k. We can then state the following lemma.

Lemma 20.20 Write p_3 : R^e ⊕ R^{e×e} ⊕ R^e → R^e for the projection given by (z^1, z^2, z^3) ↦ z^3. Let f be a smooth function on R^e lifted to f̂ = f ∘ p_3, a smooth function on R^e ⊕ R^{e×e} ⊕ R^e. With the vector fields V̂ as defined in (20.8) and z_0 = (y_0, I, W(y_0)), we have

V̂_{i_1} ··· V̂_{i_N}|_{z_0} f̂ = [V_{i_1}, ..., V_{i_N}, W]|_{y_0} f.

As a consequence, for any g ∈ G^m(R^d),

W|_{y_0} + p_3( E_(V̂)(z_0, g) ) = ∑_{k=0}^m π_k(g) · [V, ..., V, W]|_{y_0}.

Proof. Taylor expansion of the evolution equation of z^3(t) shows that V̂_{i_1} ··· V̂_{i_N}|_{z_0} f = [V_{i_1}, ..., V_{i_N}, W]|_{y_0} f, as required.

Corollary 20.21 Fix a ∈ T_{y_0}R^e ≅ R^e with |a| = 1. Let H ∈ (1/4, 1/2] and X be a Gaussian rough path with the covariance of the underlying Gaussian process satisfying (20.5). Then, writing y for the solution to the random RDE dy = V(y) dX started at y_0, and J for the Jacobian of its flow, we have that for all ε > 0

lim_{n→∞} P[ | a^T J_{0←t}(W(y_t)) − ∑_{k=0}^m a^T X^k_{0,t} · [V, ..., V, W]|_{y_0} |_{t=1/n} > (ε/2) n^{−mH} ] = 0.

Proof. As discussed at the beginning of Section 20.4.1, assumption (20.5) implies

sup_{0≤s<t≤T} E exp( η [ d(X_s, X_t)/|t − s|^H ]^2 ) < ∞,

which was the standing assumption for remainder estimates of Azencott type established in Theorem 18.1. Thanks to |a| = 1 and the previous lemma, applied with g = S_m(X)_{0,t}, we can write

P[ | a^T J_{0←t}(W(y_t)) − ∑_{k=0}^m a^T X^k_{0,t} · [V, ..., V, W]|_{y_0} |_{t=1/n} > (ε/2) n^{−mH} ]
  ≤ P[ | π_(V̂)(0, z_0; X)_{0,1/n} − E_(V̂)( z_0, S_m(X)_{0,1/n} ) | > (ε/2) n^{−mH} ].
The vector fields V̂ as defined in (20.8) are smooth but, in general, unbounded. Using the remainder estimates given in Theorem 18.1 in the small time limit (valid for Lip_loc-vector fields) we then obtain (as pointed out explicitly in Example 18.2) the required convergence.
20.4.3 Hörmander's condition revisited

Let V = (V_1, ..., V_d) denote a collection of smooth vector fields defined in a neighbourhood of y_0 ∈ R^e. Given a multi-index I = (i_1, ..., i_k) ∈ {1, ..., d}^k, with length |I| = k, the vector field V_I is defined by iterated Lie brackets

V_I := [V_{i_1}, V_{i_2}, ..., V_{i_k}] ≡ [V_{i_1}, [V_{i_2}, ..., [V_{i_{k−1}}, V_{i_k}] ...]].    (20.10)

If W is another smooth vector field defined in a neighbourhood of y_0 ∈ R^e we write^4, for a ∈ (R^d)^{⊗(k−1)},

a · [V, ..., V, W]  (length k)  := ∑_{i_1,...,i_{k−1} ∈ {1,...,d}} a^{i_1,...,i_{k−1}} [V_{i_1}, V_{i_2}, ..., V_{i_{k−1}}, W].

Recall that the step-r free nilpotent group with d generators, G^r(R^d), was realized as a submanifold of the tensor algebra

T^(r)(R^d) ≡ ⊕_{k=0}^r (R^d)^{⊗k}.

Definition 20.22 Given r ∈ N we say that condition (H)_r holds at y_0 ∈ R^e if

span{ V_I|_{y_0} : |I| ≤ r } = T_{y_0}R^e ≅ R^e;    (20.11)

we say that Hörmander's condition (H) is satisfied at y_0 if (H)_r holds for some r ∈ N.

An element g ∈ ⊕_{k=0}^∞ (R^d)^{⊗k} is called group-like iff for any N ∈ N,

( π_0(g), ..., π_N(g) ) ∈ G^N(R^d) ⊂ ⊕_{k=0}^N (R^d)^{⊗k}.

The following result tells us that Hörmander's condition is equivalent to a (seemingly stronger, "Hörmander-type") condition that involves only Lie brackets of V contracted against group-like elements. It will be important in carrying out the crucial induction step in the proof of Theorem 20.12.

^4 We introduced this notation in the previous section, cf. (20.9).
Definition 20.23 Given r ∈ N we say that condition (HT)_r holds at y_0 ∈ R^e if the linear span of

{ π_{k−1}(g) · [V, ..., V, V_i]|_{y_0}  (length k)  :  k = 1, ..., r;  i = 1, ..., d;  g ∈ G^{r−1}(R^d) }    (20.12)

is full; that is, equal to T_{y_0}R^e ≅ R^e.

Proposition 20.24 Let r ∈ N and V = (V_1, ..., V_d) be a collection of smooth vector fields defined in a neighbourhood of y_0 ∈ R^e. Then the (H)_r-span, by which we mean the linear span of (20.11), equals the (HT)_r-span, that is, the linear span of (20.12). In particular, Hörmander's condition (H) is satisfied at y_0 if and only if the span of (20.12) is full for some r large enough.

Proof. We first make the trivial observation that (HT)_r implies (H)_r for any r ∈ N. For the converse, fixing a multi-index I = (i_1, ..., i_{k−1}, i_k) of length k ≤ r and writing e_1, ..., e_d for the canonical basis of R^d, we define

g = g(t_1, ..., t_{k−1}) = exp(t_1 e_{i_1}) ⊗ ··· ⊗ exp(t_{k−1} e_{i_{k−1}}) ∈ G^{r−1}(R^d) ⊂ T^{r−1}(R^d).

It follows that any

π_{k−1}(g) · [V, ..., V, V_{i_k}]|_{y_0}  (length k)

lies in the (HT)_r-span. Now, the (HT)_r-span is a closed linear subspace of T_{y_0}R^e ≅ R^e and so it is clear that any element of the form

π_{k−1}(∂_α g) · [V, ..., V, V_{i_k}]|_{y_0}  (length k),

where ∂_α stands for any higher-order partial derivative with respect to t_1, ..., t_{k−1}, i.e.

∂_α = (∂/∂t_1)^{α_1} ··· (∂/∂t_{k−1})^{α_{k−1}}  with α ∈ (N ∪ {0})^{k−1},
is also in the (HT)_r-span for any t_1, ..., t_{k−1} and, in particular, when evaluated at t_1 = ··· = t_{k−1} = 0. For the particular choice α = (1, ..., 1) we have

∂^{k−1} g / (∂t_1 ··· ∂t_{k−1}) |_{t_1=0,...,t_{k−1}=0} = e_{i_1} ⊗ ··· ⊗ e_{i_{k−1}} =: h,

where h is an element of T^{r−1}(R^d) whose only non-zero entry arises on the (k−1)th tensor level, i.e. π_{k−1}(h) = e_{i_1} ⊗ ··· ⊗ e_{i_{k−1}}. Thus,

π_{k−1}(h) · [V, ..., V, V_{i_k}]|_{y_0} = [V_{i_1}, ..., V_{i_{k−1}}, V_{i_k}]|_{y_0}  (length k)

is in our (HT)_r-span. But this says precisely that, for any multi-index I of length k ≤ r, the bracket vector field evaluated at y_0, i.e. V_I|_{y_0}, is an element of our (HT)_r-span.
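The mixed-derivative step can be made very concrete. The following sketch (not from the book; d = 2 and truncation at tensor level 2 are chosen only for brevity) represents g(t_1, t_2) = exp(t_1 e_1) ⊗ exp(t_2 e_2) in the truncated tensor algebra and checks that ∂^2 g/∂t_1 ∂t_2 at t_1 = t_2 = 0 vanishes on levels 0 and 1 and equals e_1 ⊗ e_2 on level 2.

import sympy as sp

t1, t2 = sp.symbols('t1 t2')
e1 = sp.Matrix([1, 0]); e2 = sp.Matrix([0, 1])

def exp_series(t, e):
    # exp(t e) truncated at tensor level 2: (1, t e, t^2/2 e (x) e)
    return (sp.Integer(1), t * e, (t**2 / 2) * e * e.T)

def mult(a, b):
    # truncated tensor product in T^(2)(R^2)
    return (a[0] * b[0],
            a[0] * b[1] + b[0] * a[1],
            a[0] * b[2] + a[1] * b[1].T + b[0] * a[2])

g = mult(exp_series(t1, e1), exp_series(t2, e2))
deriv = [sp.diff(level, t1, t2).subs({t1: 0, t2: 0}) for level in g]
print(deriv[0], deriv[1].T, deriv[2])   # 0, zero vector, and the matrix unit e_1 (x) e_2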
20.4.4 Proof of Theorem 20.12

We are now in a position to give the proof of Theorem 20.12.

Proof. We fix t ∈ (0,T]. As usual, it suffices to show a.s. invertibility of

σ_t = [ ⟨DY_t^i, DY_t^j⟩_H ]_{i,j=1,...,e} ∈ R^{e×e}.

In terms of an orthonormal basis (h_n) of the Cameron–Martin space we can write

σ_t = ∑_n ⟨DY_t, h_n⟩_H ⊗ ⟨DY_t, h_n⟩_H    (20.13)
    = ∑_n ∫_0^t J^X_{t←s}(V_k(Y_s)) dh^k_{n,s} ⊗ ∫_0^t J^X_{t←s}(V_l(Y_s)) dh^l_{n,s}.

(Summation over up–down indices is from here on tacitly assumed.) Invertibility of σ_t is equivalent to invertibility of the reduced covariance matrix

C_t := ∑_n ∫_0^t J^X_{0←s}(V_k(Y_s)) dh^k_{n,s} ⊗ ∫_0^t J^X_{0←s}(V_l(Y_s)) dh^l_{n,s},

which has the advantage of being adapted, i.e. σ(X_s : s ∈ [0,t])-measurable. We now assume that

P( det C_t = 0 ) > 0
and will see that this leads to a contradiction with Hörmander's condition.

Step 1: Let K_s be the random subspace of T_{y_0}R^e ≅ R^e spanned by

{ J^X_{0←r}(V_k(Y_r)) : r ∈ [0,s], k = 1, ..., d }.

The subspace K_{0+} = ∩_{s>0} K_s is measurable with respect to the germ σ-algebra and, by our "0–1 law" assumption, deterministic with probability one. A random time is defined by

Θ = inf{ s ∈ (0,t] : dim K_s > dim K_{0+} } ∧ t,    (20.14)

and we note that Θ > 0 a.s. For any vector v ∈ R^e we have

v^T C_t v = ∑_n ( ∫_0^t v^T J^X_{0←s}(V_k(Y_s)) dh^k_{n,s} )^2.

Assuming v^T C_t v = 0 implies

∀n :  ∫_0^t v^T J^X_{0←s}(V_k(Y_s)) dh^k_{n,s} = 0

and hence, by our non-degeneracy condition on the Gaussian process and Lemma 20.9,

v^T J^X_{0←s}(V_k(Y_s)) = 0

for any s ∈ [0,t] and any k = 1, ..., d, which implies that v is orthogonal to K_t. Therefore K_{0+} ≠ R^e: otherwise K_s = R^e for every s > 0, so that v must be zero, which implies that C_t is invertible a.s., in contradiction with our hypothesis.

Step 2: We saw that K_{0+} is a deterministic linear subspace of R^e with strict inclusion K_{0+} ⊊ R^e. In particular, there exists a deterministic vector z ∈ R^e \ {0} which is orthogonal to K_{0+}. We will show that z is orthogonal to all vector fields and (suitable) brackets evaluated at y_0, thereby contradicting the fact that our vector fields satisfy Hörmander's condition. By definition (20.14) of Θ, K_{0+} ≡ K_t for 0 ≤ t < Θ and so, for every k = 1, ..., d,

z^T J^X_{0←t}(V_k(Y_t)) = 0  for t ≤ Θ.    (20.15)

Observe that, by evaluation at t = 0, this implies z ⊥ span{V_1, ..., V_d}|_{y_0}.

Step 3: In view of Proposition 20.24, it suffices to show that z is orthogonal to all iterated Lie brackets of V = (V_1, ..., V_d) contracted against group-like elements. To this end, we keep k ∈ {1, ..., d} fixed and make the induction hypothesis

I(m−1) :  ∀g group-like, j ≤ m − 1 :  z^T π_j(g) · [V, ..., V; V_k]|_{y_0} = 0.
We can now take the shortest path γ^n : [0, 1/n] → R^d such that S_m(γ^n) equals π_{1,...,m}(g), the projection of g to the free step-m nilpotent group with d generators, denoted G^m(R^d). Then

|γ^n|_{1-var;[0,1/n]} = ∥ π_{1,...,m}(g) ∥_{G^m(R^d)} < ∞

and the scaled path h^n(t) = n^{−H} γ^n(t), H ∈ (0,1), has length (over the interval [0, 1/n]) proportional to n^{−H}, which tends to 0 as n → ∞. Our plan is to show that

∀ε > 0 :  lim inf_{n→∞} P[ | z^T J^{h^n}_{0←1/n}( V_k(y^{h^n}_{1/n}) ) | < ε/n^{mH} ] > 0,    (20.16)

which, since the event involved is deterministic, really says that

n^{mH} | z^T J^{h^n}_{0←1/n}( V_k(y^{h^n}_{1/n}) ) | < ε

holds true for all n ≥ n_0(ε) large enough. Then, sending n → ∞, a Taylor expansion and I(m−1) show that the left-hand side converges to

| z^T n^{mH} π_m(S_m(h^n)) · [V, ..., V; V_k]|_{y_0} |,  where n^{mH} π_m(S_m(h^n)) = π_m(g),

and since ε > 0 is arbitrary we have shown I(m), which completes the induction step.

Step 4: The only thing left to show is (20.16); that is, positivity of the lim inf of

P[ | z^T J^{h^n}_{0←1/n}( V_k(y^{h^n}_{1/n}) ) | < ε/n^{mH} ]
  ≥ P[ | z^T J^X_{0←·}(V_k(y_·)) − z^T J^{h^n}_{0←·}( V_k(y^{h^n}_·) ) |_{·=1/n} < ε/n^{mH} ] − P( Θ ≤ 1/n ),

and since Θ > 0 a.s. it is enough to show that

lim inf_{n→∞} P[ | z^T J^X_{0←·}(V_k(y_·)) − z^T J^{h^n}_{0←·}( V_k(y^{h^n}_·) ) |_{·=1/n} < ε/n^{mH} ] > 0.

Using I(m−1) plus the stochastic Taylor expansion (more precisely, Corollary 20.21) this is equivalent to showing positivity of the lim inf of

P[ | z^T X^m_{0,·} · [V, ..., V; V_k] − z^T J^{h^n}_{0←·}( V_k(y^{h^n}_·) ) |_{·=1/n} < (ε/2) n^{−mH} ].
(Let us remark that the assumption Hp < 1 + 1/m needed to apply Corollary 20.21 is satisfied thanks to Condition 20.16, part (ii), and the remark that our induction stops when m has reached r, the number of brackets needed in Hörmander's condition.) Rewriting things, we need to show positivity of the lim inf of

P[ | n^{mH} z^T [V, ..., V; V_k] · X^m_{0,1/n} − z^T n^{mH} J^{h^n}_{0←1/n}( V_k(y^{h^n}_{1/n}) ) | < ε/2 ],

where the second term converges to z^T [V, ..., V; V_k] · π_m(g); or, equivalently, that

lim inf_{n→∞} P[ | z^T [V, ..., V; V_k]|_{y_0} · ( n^{mH} X^m_{0,1/n} − π_m(g) ) | < ε/2 ] > 0.

But this is implied by Condition 20.16 and so the proof is finished.
20.5 Comments

The bulk of the material of Sections 20.1, 20.2 and 20.3 is taken from Cass et al. [26]. Let us note that H-differentiability of a Wiener functional implies D^{1,2}_{loc}-regularity, where D^{1,2} is defined as the subspace of L^2(P) obtained as the closure of "nice" Wiener functionals with respect to

∥F∥_{D^{1,2}} = |F|_{L^2(P)} + |DF|_{L^2(P,H)}.

In particular, the H-derivative is then precisely the Malliavin derivative; some details and references on this are given in Section D.5, Appendix D. Exercise 20.11 on the representation of the Malliavin covariance matrix has well-known special cases: in the case of Brownian motion, dR(s,s') is a Dirac measure on the diagonal {s = s'} and the double integral reduces to a (well-known) single-integral expression; in the case of fractional Brownian motion with H > 1/2,

dR^H(s,t) ∼ |t − s|^{2H−2} ds dt,

which is integrable at zero iff H > 1/2. (The resulting double-integral representation of the Malliavin covariance is also well known and appears, for instance, in Baudoin and Hairer [9], Saussereau and Nualart [152], Hu and Nualart [87].) Our discussion of RDEs under Hörmander's condition follows closely Cass and Friz [25]. In the case of driving Brownian motion all this is, of course, classical and closely related to Hörmander's work on hypoellipticity. Previous works in this direction were focused on driving fractional Brownian motion B^H with Hurst parameter H > 1/2, so that dY = V(Y) dB^H makes sense as a Young differential equation. A density result under the ellipticity condition on V appeared in Saussereau and Nualart [152]. The deterministic estimate (cf. Exercise 11.10),

∥ J^{y_0,X}_{·←0} ∥_{p-var;[0,T]} ≤ C exp( C ∥X∥^p_{p-var;[0,T]} ),    (20.17)
can be applied in a step-1 setting^5 to see that J^{y_0,X}_{t←0} ∈ L^q(P) for all q < ∞, thanks to Gaussian integrability of ∥B^H(ω)∥_{p-var;[0,T]}. This allows us to obtain L^q-estimates on the inverse of the covariance matrix, from which one obtains existence of a smooth density. This was carried out, again under the ellipticity condition on V, by Hu and Nualart [87]. Existence of a smooth density under Hörmander's condition was then obtained by Baudoin and Hairer [9], relying on some specific properties of fractional Brownian motion. At present, the question of how to obtain L^q-estimates in the regime ρ ∈ [1, 2) is open. It is worthwhile noting (Friz and Oberhauser [59]) that the deterministic estimate (20.17) is optimal, so that L^q-estimates in the regime ρ ∈ [1, 2) will require further probabilistic input, presumably in the form of Gaussian chaos integrability.

^5 That is, with p ∈ (1/H, 2) and ∥X∥_{p-var;[0,T]} = ∥B^H(ω)∥_{p-var;[0,T]}.
Part V
Appendices
Appendix A: Sample path regularity and related topics A.1 Continuous processes as random variables A.1.1 Generalities A stochastic process with values in some measure space (E, E) is a collection of random variables, i.e. measurable maps Xt : (Ω, A, P) → (E, E), indexed by set T. Equivalently, X is a measurable map X : (Ω, A, P) → Tt in Tsome where E T is the space of E-valued functions on T and E T is E ,E the smallest σ-algebra such that all projections π t : E T → E, defined of X is the by f → ft , are measurable maps. The law or distribution image measure PX := X∗ P defined on E T , E T . Since the underlying probability model (Ω, A, P) is usually irrelevant it can be replaced by the canonical model E T , E T , PX and Yt : E T , E T , PX → (E, E) given by f → π t (f ) = ft is the canonical version of the stochastic process X. More precisely, X and Y are versions of each other in the following sense. One says that two processes X and X defined respectively on the probability spaces (Ω, A, P) and (Ω , A , P ), having the same state space (E, E), are versions of each other – or that they are “versions of the same process” – if for any finite sequence t1 , . . . , tn and sets Ai ∈ E, P [Xt 1 ∈ A1 , . . . , Xt n ∈ An ] = P Xt 1 ∈ A1 , . . . , Xt n ∈ An . Two processes X and X defined on the same probability space are said to be modifications of each other if P [Xt = Xt ] = 1 for all t. At last, they are said to be indistinguishable if P [Xt = Xt for all t ] = 1. Let us now assume that T = [0, T ] and E is Polish (with E = BE , the Borel σ-algebra). It is natural to ask if C = f ∈ E T : f : T → E is continuous has PX -measure one. Unfortunately, the above set need not be E T -measurable but we can still ask if C has full outer measure. If this is the case then, still writing Yt (f ) = π t (f ) = ft , the measure PX on E T induces a probability measure Q on C, defined on the σ-algebra C ≡ σ (Yt : t ∈ T) = T X ˜ ˜ is any set in E T such that Γ where Γ E ∩ C by setting Q (Γ) := P ˜ ∩ C. Obviously the process Y defined on (C, C, Q), better denoted Γ=Γ
by
Y˜t :
(C, C, Q) → (E, E) f → π t (f ) = ft
is another version of X and, moreover, a genuine (C, C)-valued random variable. This version again is defined on a space of functions and is made up of coordinate mappings and will also be referred to as canonical ; we will also write PX or X∗ P instead of Q; this causes no confusion so long as we know the space we work in. We say that a stochastic process X : (Ω, A, P) → E T , E T is continuous if the set of continuous functions from T → E has (outer) measure one. In this case, there is a version of X which is a genuine (C, C)-valued random variable. Let X denote this continuous version. Then X : (Ω, A, P) → (C ([0, T ] , E) , C) , where C is the σ-algebra generated by the coordinate maps in C ([0, T ] , E). On the other hand, C ([0, T ] , E) is a Polish space under the topology induced by uniform distance1 and there is a natural Borel σ-algebra B generated by the open sets. The law of X, i.e. X∗ P, defines in fact a Borel measure on B. This follows from B = C, which is easy to see. (All coordinate maps π t are continuous, which shows that C ⊂ B. Conversely, C ([0, T ] ,E) has a countable basis for its topology of the form ∩t∈[0,T ]∩Q f : d ft , f˜t < ε ∈ C, f˜ ∈ C ([0, T ] , E). If E has a compatible2 group structure, we can define increments fs,t = fs−1 ft and then H¨ older metrics of the form d (f0 , g0 ) + dα -H¨o l;[0,T ] (f, g) where dα -H¨o l;[0,T ] (f, g) =
sup 0≤s< t≤T
d (fs,t , gs,t ) . α |t − s|
ol The resulting path space C α -H¨ d([0, T ] , E) is not separable in general but, d N at least when E = R or G R , there are Polish subspaces (cf. Sections 5.3 and 8.6), denoted by C 0,α -H¨o l ([0, T ] , E), defined as the set of all f ∈ C ([0, T ] , E) such that
|f |α -H¨o l;[0,T ] ≡
sup 0≤s< t≤T
d (fs , ft ) d (fs , ft ) sup α < ∞ and limε→0 α = 0. |t − s| 0< t−s< ε |t − s|
By restricting s, t, ε to rationals one easily sees that C 0,α -H¨o l ([0, T ] , E) and 0,α -H¨o l by the ([0, T ] , E), the σ-algebra C 0,α coordinate maps in C , generated 0,α : A ∈ C . On the other hand, there is a natural coincides with A ∩ C 1 See
Stroock [159], for instance. is, all group operations are continuous.
2 That
Borel σ-algebra denoted by B 0,α on C 0,α -H¨o l ([0, T ] , E). Using, in particular, separability of C 0,α -H¨o l ([0, T ] , E) and continuity of the group operations, it is easy to see that the Borel σ-algebra B 0,α equals C 0,α . In summary, a continuous process X : (Ω, A, P) → (C ([0, T ] , E) , C) which assigns full measure to C 0,α -H¨o l can be regarded as a C 0,α -H¨o l ([0, T ] , E)-valued random variable whose law is a well-defined Borel measure on B 0,α . Identical remarks apply to p-variation spaces.
A.2 The Garsia–Rodemich–Rumsey estimate

A.2.1 Garsia–Rodemich–Rumsey on metric spaces

We discuss the Garsia–Rodemich–Rumsey result and several consequences, including a frequently used Besov–Hölder embedding and a simple proof of Kolmogorov's tightness criterion. Unless otherwise stated, (E, d) denotes a complete metric space.

Theorem A.1 (Garsia–Rodemich–Rumsey) Consider f ∈ C([0,T], E) where (E, d) is a metric space. Let Ψ and p be continuous, strictly increasing functions on [0, ∞) with p(0) = Ψ(0) = 0 and Ψ(x) → ∞ as x → ∞. Then

∫_0^T ∫_0^T Ψ( d(f_s, f_t) / p(|t − s|) ) ds dt ≤ F    (A.1)

implies, for 0 ≤ s < t ≤ T,

d(f_s, f_t) ≤ 8 ∫_0^{t−s} Ψ^{−1}( 4F/u^2 ) dp(u).    (A.2)
In particular, if osc(f, δ) ≡ sup{ d(f_s, f_t) : s, t ∈ [0,T], |t − s| ≤ δ } denotes the modulus of continuity of f, we have

osc(f, δ) ≤ 8 ∫_0^δ Ψ^{−1}( 4F/u^2 ) dp(u).

Proof. Given f ∈ C([0,T], E) and p(·) we can set f̃(·) = f(T·) and p̃(·) = p(T·), and a simple change of variable shows that the "T = 1" estimates obtained for f̃, p̃ imply the required estimates for f, p. Thus, we can and will take T = 1 in the remainder of the proof. Define I(t) = ∫_0^1 Ψ( d(f_s, f_t)/p(|t − s|) ) ds. Since ∫_0^1 I(t) dt ≤ F, there exists t_0 ∈ (0,1) such that I(t_0) ≤ F. We shall prove that

d(f_{t_0}, f_0) ≤ 4 ∫_0^1 Ψ^{−1}( 4F/u^2 ) dp(u).    (A.3)
By a similar argument,

d(f_1, f_{t_0}) ≤ 4 ∫_0^1 Ψ^{−1}( 4F/u^2 ) dp(u),
and combining the two we have (A.2); first for s = 0, t = 1 and then for arbitrary 0 ≤ s < t ≤ 1 by reparametrization. To prove (A.3) we shall pick recursively two sequences {u_n} and {t_n} satisfying

t_0 > u_1 > t_1 > u_2 > ··· > t_{n−1} > u_n > t_n > u_{n+1} > ...,

so that t_n, u_n ↓ 0 as n → ∞, in the following manner. By induction, if t_{n−1} has already been chosen, pick

u_n ∈ (0, t_{n−1}) :  p(u_n) = (1/2) p(t_{n−1}).

Trivially then, ∫_0^{u_n} I(t) dt ≤ F and also ∫_0^{u_n} J(s) ds ≤ ∫_0^1 J(s) ds = I(t_{n−1}), where we set J(s) := Ψ( d(f_s, f_{t_{n−1}}) / p(|t_{n−1} − s|) ).
Now, t_n ∈ (0, u_n) is chosen so that I(t_n) ≤ 2F/u_n and also so that J(t_n) ≤ 2I(t_{n−1})/u_n. (To see that this is possible, assume the contrary, so that (0, u_n) = T_1 ∪ T_2 where

T_1 = { t ∈ (0, u_n) : I(t) > 2F/u_n },
T_2 = { t ∈ (0, u_n) : J(t) > 2I(t_{n−1})/u_n }.

Then |T_1| 2F/u_n ≤ ∫_{T_1} I(t) dt ≤ F and, since the inequality is strict if |T_1| > 0, we have |T_1| < u_n/2. The same argument gives |T_2| < u_n/2 and we have the desired contradiction |T_1 ∪ T_2| < u_n.) Having completed the construction of {u_n} and {t_n}, we note that, by the defining properties of {t_n},

Ψ( d(f_{t_n}, f_{t_{n−1}}) / p(t_{n−1} − t_n) ) ≤ 2 I(t_{n−1})/u_n ≤ 4F/(u_{n−1} u_n) ≤ 4F/u_n^2,

and this implies, using p(t_{n−1} − t_n) ≤ p(t_{n−1}) ≤ 2p(u_n) ≤ 4( p(u_n) − p(u_{n+1}) ),

d(f_{t_n}, f_{t_{n−1}}) ≤ Ψ^{−1}( 4F/u_n^2 ) p(|t_{n−1} − t_n|)
   ≤ 4 Ψ^{−1}( 4F/u_n^2 ) ( p(u_n) − p(u_{n+1}) )
   ≤ 4 ∫_{u_{n+1}}^{u_n} Ψ^{−1}( 4F/u^2 ) dp(u).
Using continuity of f and summing over n = 1, 2, ..., we get

d(f_0, f_{t_0}) ≤ 4 ∫_0^{u_1} Ψ^{−1}( 4F/u^2 ) dp(u)

and we are done.

Corollary A.2 (Besov–Hölder embedding) Let q > 1, α ∈ (1/q, 1) and x ∈ C([0,T], E), and set

F_{s,t} := |x|^q_{W^{α,q};[s,t]} := ∫∫_{[s,t]^2} ( d(x_v, x_u)^q / |v − u|^{1+qα} ) du dv.

Then there exists C = C(α, q) such that for all 0 ≤ s < t ≤ T,

d(x_s, x_t)^q ≤ C |t − s|^{qα−1} F_{s,t},    (A.4)
or
|x|_{(α−1/q)-Höl;[s,t]} ≤ C |x|_{W^{α,q};[s,t]},    (A.5)
|x|(1/α )-var;[s,t] ≤ C |t − s|
|x|W α , q ;[s,t] .
Proof. From (A.4), d (xs, xt )
1/α
≤ C |t − s|
1− q1α
1 qα Fs,t .
Obviously, ω 1 (s, t) = t − s and ω 2 (s, t) = Fs,t are controls. But then ω 1−ε ω ε2 1 1/α
with ε = 1/ (qα) ∈ (0, 1) is also a control and so we can replace d (xs, xt ) 1/α by |x|(1/α )-var;[s,t] in the above estimate.
Exercise A.4 Assume |x|(α −1/q )-H¨o l ≡ K < ∞. Give a direct proof of (A.4) with C = C (α, q), but not dependent on K. Solution. For brevity, let us write |xs,t | instead of d (xs, xt ). By the triangle inequality, |xs,t | ≤ |xs,v | + |xv ,u | + |xu ,t |
Appendix A
576
and we average this over v ∈ [s, s + ε] and u ∈ [t − ε, t], where ε will be 1 chosen later (in fact, equal to ε = (1/4) α −1 / q |t − s|). This yields |xs,t | ≤
1 ε
s+ ε
|xs,v | dv + s
1 ε
t
|xu ,t | du + t−ε
1 ε2
t
t
|xv ,u | dudv. s
s
Using the (1/q − α)-H¨older modulus of x, the first two integrals on the right-hand side are estimated by 2Kεα −1/q . For the last term,we write the double integral as t t s
≤ ≤
s
|xv ,u |
q1
q
1+q α
|u − v|
|u − v| t t
|x|W α , q ;[s,t]
|u − v|
s
(1/q + α )q
(1/q + α )q
1 q
dudv
q1
dudv
s
|x|W α , q ;[s,t] |t − s|( q
1
+α )+ q2
where we used H¨older’s inequality with q = q/ (q − 1). Putting things together, we have |xs,t | ≤ 2Kεα −1/q +
1 2+α −1/q |t − s| |x|W α , q ;[s,t] . ε2 α− 1
Choosing ε = δ |t − s| makes the right-hand side a multiple of |t − s| q , from which we learn that K ≤ 2δ α −1/q K + δ −2 |x|W α , q ;[s,t] . Choosing δ
such that 2δ α −1/q = 1/2 turns this into an estimate on K, namely K ≤ 2δ −2 |x|W α , q ;[s,t] and the proof is finished.
Corollary A.5 (Besov–L´ evy modulus embedding) Let x ∈ C ([0, T ] , E), p > 1 and assume 2 , x ) d (x u v dudv = Fs,t < ∞. exp η ∃η > 0 : 1/p [s,t] 2 |v − u| Then there exists a constant C depending on p only such that for all 0 ≤ s < t ≤ T, 1 d (xs, xt ) ≤ C 1/2 ζ (t − s) × log (Fs,t ∨ 4) η 5
h where ζ (h) = 0 u1/p−1 log (1 + 1/u2 )du . As a consequence, /
η exp C2
sup
s≤u < v ≤t
d (xu , xv ) ζ (v − u)
2 0 ≤ Fs,t ∨ 4.
A.2 The Garsia–Rodemich–Rumsey estimate
577
5 Remark A.6 ζ (h) ∼ h1/p log 1/h as h → 0+ in the sense that their ratio converges to a constant c ∈ (0, ∞). Remark A.7 From monotonicity of ζ and (s, t) → Fs,t we see that for any [u, v] ⊂ [s, t], 1 ζ (u − v) × η 1/2
|x|0;[u ,v ] ≤ C so that
η exp 2 C
sup
s≤u < v ≤t
|x|0;[u ,v ] ζ (v − u)
log (Fs,t ∨ 4) 2 ≤ Fs,t ∨ 4.
2
Proof. Using Ψ (u) = eη u − 1 and p (u) = u1/p in (A.2) leads to an estimate of the form C t−s 1 1/p−1 log (1 + 4Fs,t /u2 )du. u d (xs, xt ) ≤ 8/p η 0 Obviously, we may replace Fs,t by F˜s,t = Fs,t ∨ 4 in the above estimate. Then, by a change of variable, 8 ˜ 1/(2p) t − s . ζ d (xs, xt ) ≤ 1/2 Fs,t pη F˜ s,t
It is easy to check that for suitable constants c1 , c2 > 0 c1 ζ (u) ζ (v) for u, v ∈ (0, 1) , 5 ζ (u) ≤ c2 u1/p log 1/u for u ∈ (0, 1/2) . −1/2p ˜ We see that ζ (t − s) / Fs,t ≤ (const)×ζ (t − s)× F˜s,t so C d (xs, xt ) ≤ c3 η 1/2 ζ (t − s) log F˜s,t , ζ (uv) ≤
log F˜s,t and
as claimed.
A.2.2 Garsia–Rodemich–Rumsey on GN Rd -valued paths We now specialize nilpo the general metric setting to the free step-N from tent group GN Rd , ⊗ , a metric space under (g, h) → d (g, h) = g −1 ⊗ h. The resulting path spaces were discussed in Section 8.1. Proposition A.8 Let q > 2r ≥ 2 and x ∈ C [0, T ], GN Rd such that 0
T
0
T
d (xs , xt )
q
q /r
|t − s|
dsdt ≤ M q .
Appendix A
578
−1 Set p = 1r − 2q > 1. Then, there exists C = C (q, r), which can be chosen non-increasing in q ∈ (2r, ∞), x1/p-H¨o l;[0,T ] ≤ CM. Proof. An immediate consequence of Corollary A.2, also shows that which 32 1 2 a possible choice of C is given by C (q, r) = r / r − q . We observe that the constant in the previous proposition does not depend on T . One can reconfirm this with the following scaling argument: rescale by defining x ˜ (·) := x (T ·) so that 0
1
0
1
˜ x1/p-H¨o l;[0,1] q1 q d (˜ xs , x ˜t ) dsdt q /r |t − s|
= =
T 1/p x1/p-H¨o l;[0,T ] T
T 1/r −2/q
0
T
0
d (xs , xt )
q /r
|t − s|
q1
q
dsdt
,
with identical scaling in T since 1/p = 1/r − 2/q. Similarly, we have k /q q /k 1 1 |π k (xs,t − ys,t )| dsdt q /r 0 0 |t − s| 1/q q T T |π (x − y )| k s,t s,t = T k /r −2k /q dsdt q k /r 0 0 |t − s| 1/q q T T |π k (xs,t − ys,t )| k /p =T dsdt , q k /r 0 0 |t − s| |π k (xs,t − ys,t )| |π k (xs,t − ys,t )| = T k /p sup , sup k /p k /p 0≤s< t≤1 0≤s< t≤T |t − s| |t − s| which reduces the proof of the following result to T = 1. Proposition A.9 Let q > 2r ≥ 2 and x, y ∈ C [0, T ], GN Rd such that, for non-negative constants M and ε, 1/q 1/q q q T T T T xs,t ys,t dsdt ≤ M and dsdt ≤ M, q /r q /r 0 0 |t − s| 0 0 |t − s| (A.6) k /q q /k T T |π k (xs,t − ys,t )| dsdt ≤ εM, for k = 1, . . . , N. (A.7) q /r 0 0 |t − s| −1 > 1, there exists C = C (q, r) , non-increasing Then, setting p = 1r − 2q in q, such that x1/p-H¨o l;[0,T ] ≤ CM and y1/p-H¨o l;[0,T ] ≤ CM
(A.8)
A.2 The Garsia–Rodemich–Rumsey estimate
579
and, for all k = 1, . . . , N , sup
|π k (xs,t − ys,t )| k /p
|t − s|
0≤s< t≤T
≤ εCM k
(A.9)
where C can be chosen non-increasing in q ∈ (2r, ∞). Proof. The above scaling argument, as already pointed out, allows us to assume that T = 1. Moreover, at the cost of replacing x and y by δ 1/M x and δ 1/M y, we can and will assume that M = 1. In this proof, all the constants ci , when dependent on q, but will be non-decreasing in q. That inequality (A.8) holds true follows from Proposition A.8. We now prove (A.9) by induction over the level k ∈ {1, . . . , N }. The case k = 1 is, again, a consequence of Proposition A.8. Let us now assume that it is true for all levels 1, . . . , k − 1 and establish the estimate for level k. Fix s < t ∈ [0, 1] , and define zrs = π k (xs,s+ r − ys,s+ r ) . Fix u < v in [0, t − s]. Using xs,s+v − xs,s+ u = xs,s+ u ⊗ (xs+u ,s+v − 1) and π j (xs+ u ,s+ v − 1) = 0 (or π j (xs+u ,s+ v )) for j = 0 (or j > 0), we have zvs − zus
= =
π k (xs,s+ v − xs,s+ u ) − π k (ys,s+ v − ys,s+u ) k
π k −j (xs,s+ u ) ⊗ π j (xs+u ,s+v ) −
j =1
k
π k −j (ys,s+ u )
j =1
⊗ π j (ys+ u ,s+v ) =
k
π k −j (xs,s+ u ) ⊗ π j (xs+u ,s+v − ys+u ,s+v )
j =1
+
k
π k −j (xs,s+u − ys,s+u ) ⊗ π j (ys+u ,s+v ) .
j =1
Furthermore, using π 0 (xs,s+ u − ys,s+u ) = 0, we obtain zvs − zus =
k −1
π k −j (xs,s+ u ) ⊗ π j (xs+u ,s+ v − ys+u ,s+v )
j =1 k −1
+
π k −j (xs,s+u − ys,s+ u ) ⊗ π j (ys+u ,s+v )
j =1
+ π k (xs+ u ,s+ v − ys+u ,s+v ) . Hence,
0
t−s
0
t−s
|zvs − zus |
q /r
|v − u|
1/q
q
dvdu
≤ ∆1 + ∆2 + ∆3
Appendix A
580
where ∆1 = c1
k −1
c2
k −1
0
t−s
q /r
|v − u|
t−s
q
|π k −j (xs,s+ u − ys,s+u )|
t−s
q /r
1/q
,
1/q dudv
,
1/q dudv
q /r
|v − u|
0
dudv
qj
|v − u|
q
|π k (xs+ u ,s+v − ys+u ,s+v )|
t−s
c3 0
ys+u ,s+ v
0
q
|π j (xs+u ,s+v − ys+u ,s+v )|
0
0
j =1
∆3 =
q (k −j )
xs,s+ u
j =1
∆2 =
t−s t−s
.
Now, x1/p-H¨o l;[0,1] ,y1/p-H¨o l;[0,1] ≤ C1 (q, r) by Proposition A.8 for a constant C1 , non-increasing in q. Hence, ∆1 ≤ c4
k −1
|t − s|
( k −j ) p
t−s
0
j =1
1/q
q
|π j (xs+u ,s+v − ys+u ,s+v )|
t−s
dudv
q /r
|v − u|
0
.
From the induction hypothesis, we have q 1 q 1− 1 (j −1) |π j (xs+ u ,s+ v − ys+ u ,s+ v )| ( j ) ≤ c5 εq (1− j ) |t − s| p .
(A.10)
Hence, we obtain ∆1
≤
c6
k −1
|t − s|
( k −1 ) p
ε(1− j ) 1
j =1
t−s
0
≤
c6 ε
t−s
k −1
|t − s|
dudv
q j /r
|v − u|
0 ( k −1 ) p
1/q
q /j
|π j (xs+u ,s+ v − ys+u ,s+ v )|
by assumption (A.7).
j =1
For u < v < t − s we also have ys+u ,s+v q ys+ u ,s+ v so that ∆2
≤
c7
k −1
|t − s|
qj
q (j −1)/p
q
≤ C1 (q, r) |t − s|
( j −1 ) p
j =1
t−s
t−s
|π k −j (xs,s+ u − ys,s+u )| 0
0
1/q
q
q
ys+u ,s+ v q /r
|v − u|
dudv
.
A.2 The Garsia–Rodemich–Rumsey estimate
581
From the induction hypothesis, we have |π k −j (xs,s+ u − ys,s+ u )| ≤ c8 εu
k −j p
≤ c8 ε |t − s|
k −j p
;
in particular, we see that ∆2
≤
c9 ε |t − s|
( k −1 ) p
k −1
c10 ε |t − s|
( k −1 ) p
0
j =1
≤
t−s
t−s
q /r
|v − u|
0
1/q
q
ys+u ,s+v
dudv
by assumption on y.
Finally, using assumption (A.7), we have, defining Υs,t = supu ,v ∈[s,t] |π k (x u , v −y u , v )| , |v −u |k / p ∆3
=
t−s
c3 0
≤
c3 ε
q /r
0
sup |π k (xu ,v
1/k
1/q
q /k
|π k (xs+u ,s+v − ys+u ,s+v )|
t−s
u ,v ∈[s,t]
dudv
|v − u| k −1 1− 1 − y )|( k ) |t − s| p
1−1/k
Υs,t
u ,v
(with c9 = max (c3 , 1), not dependent on q). Hence, we see that
1 |t − s|
k −1 p
t−s
0
t−s
q /r
|v − u|
0
1/q
q
|zvs − zus |
dvdu
1−1/k . ≤ c11 ε + ε1/k Υs,t
Another application of Proposition A.8 gives |zts − zss | k /p
|t − s|
1−1/k , ≤ c12 ε + ε1/k Υs,t
i.e. we prove that for all s, t ∈ [0, T ] , |π k (xs,t − ys,t )| k /p
|t − s|
1−1/k , ≤ c12 ε + ε1/k Υs,t
which readily implies that 1−1/k , Υs,t ≤ c12 ε + ε1/k Υs,t or
/ 1− k1 0 Υs,t Υs,t ≤ c12 1 + . ε ε
This last inequality implies that induction.
Υs , t ε
≤ c13 , which concludes the
582
Appendix A
A.3 Kolmogorov-type corollaries A.3.1 H¨older regularity and tightness Let (Xt : t ∈ [0, T ]) be a stochastic process with values in some Polish space (E, d) and assume there exist positive constants a, b, c such that for all s, t ∈ [0, T ] , a
1+b
E (d (Xs , Xt ) ) ≤ c |t − s|
.
Kolmogorov’s criterion asserts that X then has a continuous version, see [144] for instance, which we shall also denote by X without further notice. In fact, as is well known (and will be seen below), X can be chosen with γ-H¨older continuous sample paths for any γ < b/a. The above condition is equivalent to the existence of M > 0 and q > r ≥ 1 such that 1/r
|d (Xs , Xt )|L q (P) ≤ M |t − s|
(A.11)
and we find it convenient in applications to formulate the following criteria in this form. Theorem A.10 (Kolmogorov) Let M > 0, q > r ≥ 1 and assume (A.11) holds for all s, t ∈ [0, T ]. Then, for any γ ∈ [0, 1/r − 1/q) there exists C = C (r, q, γ, T ) non-increasing in q, such that q1 q ≤ CM. E |X|γ -H¨o l;[0,T ] Proof. Fix γ ∈ [0, 1/r − 1/q). Since q > 1 and α := 1/q + γ < 1/r ≤ 1, the Besov–H¨ older embedding, established in Corollary A.2 (equivalently: Proposition A.8) shows that there exists a constant c1 such that q q d (Xs , Xt ) q d (Xs , Xt ) d (Xs , Xt ) ≤ c dsdt = c 1 1 1+q α 2+q γ dsdt. |t − s|γ [0,T ] 2 |t − s| [0,T ] 2 |t − s| After taking sups,t∈[0,T ] and expectations, we have q q E |X|γ -H¨o l;[0,T ] ≤ (c1 M )
[0,T ] 2
|t − s|
−2+q (1/r −γ )
dsdt < ∞,
using 1/r − γ > 1/q to see that the last double integral is finite. Assume now that E has enough structure so that bounded sets in C γ -H¨o l ([0, T ] , E) are precompact in C γ -H¨o l ([0, T ] , E) for γ < γ . This requires interpolation and Arzela–Ascoli and holds true, for instance, when E = R or more generally GN Rd , the step-N free nilpotent group over Rd , equipped with Carnot–Caratheodory distance. It will be enough for us to focus on this case. We have
A.3 Kolmogorov-type corollaries
583
Corollary A.11 (Kolmogorov–Lamperti tightness criterion) Let (Xnt : t ∈ [0, T ]) be a sequence of continuous GN Rd -valued processes. Let M > 0, q > r ≥ 1 and assume 1/r
sup |(Xns , Xnt )|L q (P) ≤ M |t − s| n
holds for all s, t ∈ [0, T ]. Then (Xn ) is tight in C γ -H¨o l [0, 1] , GN Rd for any γ ∈ [0, 1/r − 1/q). Proof. Take γ < γ < 1/r − 1/q. By the previous theorem q q sup E Xn γ -H¨o l;[0,T ] ≤ (CM ) < ∞.
n
Writing BR = x ∈C γ -H¨o l [0, T ] , GN Rd : xγ -H¨o l;[0,T ] < R it is clear from Chebyshev’s inequality that sup P [Xn ∈ BR ] → 0 as R → ∞. n
is precompact with The proof is then finished with the remark that BR respect to γ-H¨older topology in C γ -H¨o l [0, 1] , GN Rd . Although not necessary for the next result, we remain in the strictly setting of GN Rd -valued processes for the remainder of this section. The following is a variation of Theorem A.10 with the feature that the constant C does not depend on q. This will be important since ina typical application M itself will be taken as a function of q (e.g. M = O q 1/2 as q → ∞ in a Gaussian setting). and it is important that C2 does not depend on q. Theorem A.12 Let (Xt : t ∈ [0, T ]) be a continuous GN Rd -valued process. For any γ ∈ [0, 1/r) there exist q0 (r, γ) and C = C (r, γ, T ) such that if 1/r ∀s, t ∈ [0, T ] : |d (Xs , Xt )|L q (P) ≤ M |t − s| holds for some q ≥ q0 , then we also have q1 q E Xγ -H¨o l;[0,T ] ≤ CM or, equivalently, for all k = 1, . . . , N, |π k (Xs,t )| sup 0≤s< t≤T |t − s|k γ
k
≤ (CM ) . Lq / k
Proof. Pick q0 = q0 (γ, r) large enough so that for all q ≥ q0 : γ < 1/r−2/q. It follows that q
Xγ -H¨o l;[0,T ]
≤ ≤
q
cq1 X(1/r −2/q )-H¨o l;[0,T ] T T q d (Xs , Xt ) cq2 dsdt q /r 0 0 |t − s|
Appendix A
584
with c1 , c2 dependent on γ, r, T but not on q (we used the fact that the constant in Proposition A.8 can be chosen non-increasing in q). After taking sups,t∈[0,T ] and expectations, we have q q E Xγ -H¨o l;[0,T ] ≤ (c2 M ) T 2 < ∞. If the previous result was about γ-H¨older sample path regularity of a GN Rd -valued process, the following is about the γ-H¨older distance of two such processes. Theorem A.13 Let X, Y be continuous GN Rd -valued processes. Let γ ∈ [0, 1/r). Assume that for some constant M and for q ≥ q0 (r, γ) , we have for all s, t ∈ [0, T ] , |d (Xs , Xt )|L q (P)
≤
1/r
,
(A.12)
1/r
,
(A.13)
M |t − s|
|d (Ys , Yt )|L q (P)
≤
M |t − s|
|π k (Xs,t − Ys,t )|L q / k (P)
≤
εM |t − s| k
k /r
for k ∈ {1, . . . , N } . (A.14)
Then, there exists C = C (r, γ, T, N ) such that (i) for all k = 1, . . . , N , |π k (Xs,t )| |π k (Ys,t )| k , sup ≤ (CM ) sup 0≤s< t≤T |t − s|k γ q / k 0≤s< t≤T |t − s|k γ q / k L L |π k (Xs,t − Ys,t )| k ≤ ε (CM ) ; sup kγ 0≤s< t≤T |t − s| q/k L
and (ii)
|dγ -H¨o l (X, Y)|L q / N ≤ C max ε, ε1/N M.
Remark A.14 In a typical (Gaussian) application, assumptions (A.12), (A.13), (A.14) hold for all q with M = Cq 1/2 . In this case, the conclusions take the form |π k (Xs,t − Ys,t )| ˜ k /2 ∀k ∈ {1, . . . , N } : sup ≤ εCq kγ 0≤s< t≤T |t − s| q L
for C˜ = C˜ (r, γ, T, N ) and similarly, d1/p-H¨o l (X, Y)
Lq
≤ C˜ max ε, ε1/N q 1/2 .
(A.15)
Proof. Pick q0 = q0 (γ, r) large enough so that for all q ≥ q0 : γ < 1/r − 2/q =: 1/p. It follows from Proposition A.9 that there exists c1 which
A.3 Kolmogorov-type corollaries
585
can be chosen independent of q, |π k (Xs,t − Ys,t )| 1 sup kγ ε 0≤s< t≤T |t − s| ≤
|π k (Xs,t − Ys,t )|
sup
k /p
|t − s| k /q T T |π (X )|q /k k s,t ≤ c1 max dsdt q /r k =1,...,N 0 0 |t − s| k /q T T |π (Y )|q /k k s,t + c1 max dsdt q /r k =1,...,N 0 0 |t − s| k /q T T |π (X − Y )|q /k 1 k s,t s,t max + c1 dsdt . q /r ε k =1,...,N 0 0 |t − s| Hence, if ∆ =
0≤s< t≤T
1 kq
q /k
∆ ≤ c2
ε
E sup0≤s< t≤T
T
max
T
q , we have
|π k (X s , t −Y s , t )| k |t−s|k γ
q /k E |π k (Xs,t )|
dsdt q /r |t − s| T T E |π k (Ys,t )|q /k q /k + c2 max dsdt q /r k =1,...,N 0 0 |t − s| q /k T T |π k (Xs,t − Ys,t )| q /k 1 + c2 max E dsdt q /r εq /k k =1,...,N 0 0 |t − s| c q /k 2 q /k q εq /k M q = (c4 M ) , ≤ c3 c2 M q + ε |π (X s , t −Y s , t )| which is equivalent to sup0≤s< t≤T k |t−s| q / k ≤ c5 εM k , which is kγ L what we wanted to prove. 1/γ (ii) Take g = δ 1/λ Xs,t with λ = CM |t − s| and similarly h = δ 1/λ Ys,t . Note that by part (i), |π k (Ys,t )| sup |π k (Xs,t )| , sup ≤ 1 s,t s,t λk λk Lq / k Lq / k sup |π k (Xs,t − Ys,t )| ≤ ε. s,t k λ Lq / k k =1,...,N
0
0
From Proposition 7.49, 1/N 1−1/N d (g, h) ≤ c6 |g − h| + |g − h| max (1, |g|)
Appendix A
586
and so sup s,t
|d (Xs,t , Ys,t )| λ ≤
|π k (Xs,t − Ys,t )| λk |π k (Xs,t − Ys,t )| 1/N +c6 max sup k =1,...,N s,t λk |π (X )|1/k 1−1/N k s,t max 1, max sup . k =1,...,N s,t λ c6
max sup
k =1,...,N s,t
1/N (1−1/N ) (1−1/N ) p ≤ |A|1/N B , we can From H¨older’s L p |B|L p L inequality, A then bound sups,t |d (Xs,t , Ys,t )| /λ L q / N by a constant (namely c6 ) times |π k (Xs,t − Ys,t )| max sup k s,t λk Lq /N |π k (Xs,t − Ys,t )| 1/N + max sup k s,t λk |π (X )|1/k 1−1/N k s,t max 1, max sup . k λ s,t q /N L |π (X − Y )| k s,t s,t ≤ max sup k =1 s,t λk Lq / k |π k (Xs,t − Ys,t )| 1/N + max sup . k =1 s,t λk Lq / k 1−1/N |π (X )|1/k k s,t max 1, max sup q / k k s,t λ L |π (X )|1/k k s,t 1/N = ε+ε using sup s,t λ
≤ 1.
Lq / k
1/γ
Since λ = CM |t − s|
, the proof is finished.
A.3.2 Lq -convergence for rough paths Theorem A.13 can be obviously used to establish Lq -convergence (with quantitative estimates!) of a sequence of continuous GN Rd -valued processes. It can be useful to have the following “soft” criterion for Lq -convergence which only relies on basic interpolation estimates.
A.4 Sample path regularity under Gaussian assumptions
587
Proposition A.15 Let Xn , X∞ be continuous GN Rd -valued processes defined on [0, T ]. Let q ∈ [1, ∞) and assume that we have pointwise convergence in Lq (P), i.e. for all t ∈ [0, T ], q d (Xnt , X∞ t ) → 0 in L (P) as n → ∞;
and uniform H¨ older bounds, i.e. q sup E Xn α -H¨o l;[0,T ] < ∞. 1≤n ≤∞
(A.16)
(A.17)
Then for α < α, dα -H¨o l;[0,T ] (Xn , X∞ ) → 0 in Lq (P) . Remark A.16 To check condition (A.17) one typically uses Theorem A.12. Let us also note that (A.16) may be replaced by the assumption of pointwise convergence in probability; in this case, it is clear from (A.17) that q d (Xnt , X∞ t ) is uniformly integrable for any q ∈ [1, q) and hence the conclusion becomes dα -H¨o l (Xn , X∞ ) → 0 in Lq (P) . Obviously, there is no need to distinguish between q and q if (A.17) holds for all q < ∞. (This is typical in our applications.) Proof. It is enough to show d∞ convergence in Lq . Indeed, once we have d∞ convergence in Lq , we have d0 convergence in Lq , and then by interpolation, we have dα -H¨o l convergence in Lq . For any integer m, 21−q E [d∞ (Xn , X∞ ) ] / % q & n ∞ E sup d X i T , X i T +E sup q
≤ ≤
m
i=1,...,m
m i=1
m
q + E d Xni , X∞ i m
m
T m
α q
T |t−s|< m
0
n q ∞ q Xs,t + Xs,t
" ! q × 2 sup E Xn α -H¨o l;[0,T ] . 1≤n ≤∞
By first choosing m large enough, followed by choosing n large enough, we see that d∞ (Xn , X∞ ) → 0 in Lq as required.
A.4 Sample path regularity under Gaussian assumptions We start with a simple characterization of Gaussian integrability.
Appendix A
588
Lemma A.17 For a real-valued non-negative variable Z, the following three conditions are equivalent: (i) (Gauss tail) there exists η 1 > 0 such that P [Z > x] ≤
1 −η 1 x 2 e ; η1
(ii) (Gaussian integrability) there exists η 2 > 0 such that 2 E eη 2 Z < ∞; (iii) (square-root growth of moments) there exists η 3 > 0 such that for all q ∈ [1, ∞), 1√ q < ∞. |Z|L q (P) ≤ η3 When switching from the ith to the jth statement, the constant η j only depends on η i . Proof. (i) implies (ii) by Chebyshev’s inequality and the converse holds by using the formula ∞
E [X] =
P [X ≥ x] dx.
0
Using the same formula, (i) implies (iii) since ∞ = P Z 2 ≥ x d (xp ) E Z 2p 0 1 ∞ −η x p ≤ e d (xp ) = p+1 Γ (p) . η 0 η Stirling’s approximation for the Gamma function is given by p p 5 Γ (p) = 2π/p (1 + O (1/p)) e p p ≤ for p large enough. e √ It is then clear that |Z|L 2 p ≤ c p and by making c = c (η) large enough this holds for all p. To see that (iii) implies (ii) we simply expand the exponential 2 E eη Z
∞ 1 n 2n η (|Z|L 2 n ) n! n =1 ∞ 2 n 2c η nn ≤ n! n =1
=
and we see with Stirling, Γ (n + 1) = n!, that this sum is finite for η = η (c) small enough.
A.4 Sample path regularity under Gaussian assumptions
589
We shall now see that (often sharp) generalized H¨ older or variation regularity of a stochastic process can be shown from the following simple condition Gaussian integrability condition. It is not only satisfied by Brownian motion, a generic class of Gaussian processes and Markov processes with uniform elliptic generator in divergence form, but also by all respective enhancements to rough paths (provided one works with homogenous “norms” and distances on the rough path spaces). Condition A.18 (Gaussian integrability (p)) Given a (continuous3 ) process X on [0, T ] with values in a Polish space (E, d) , there exist p ≥ 1, η > 0 such that / 02 , X ) d (X s t < ∞. (A.18) sup E exp η 1/p 0≤s< t≤T |t − s| Let us note that, from Lemma A.17, condition (A.18) is equivalent to d (X , X ) √ s t = O ( q) as q → ∞. sup 1/p q 0≤s< t≤T |t − s| L (P) It turns out that this rather generic condition implies a number of sample path properties, many of which are well known in the setting of Gaussian processes (see [48] and the references cited therein). Given a “modulus” function ζ : [0, ∞) → [0, ∞), 0 at 0 and strictly increasing, we set |X|ζ -H¨o l;[0,T ] :=
sup 0≤s< t≤T
|X|0;[s,t] d (Xs , Xt ) = sup . ζ (t − s) 0≤s< t≤T ζ (t − s)
(A.19)
(The second equality, with |X|0;[s,t] = supu ,v ∈[s,t] d (Xs , Xt ), follows from monotonicity of ζ.) Theorem A.19 Assume (Xt : t ∈ [0, T ]) is a continuous process with values in a Polish space (E, d) which satisifies the Gaussian integrability condition (A.18) with parameters p, η. Assume 5 h1/p log 1/h < ∞. lim sup ζ (h) h→0 Then there exists c = c (p, T ) > 0 such that 2 E exp cη |X|ζ -H¨o l;[0,T ] < ∞. 3 Condition (A.18) is more than enough, with Kolmogorov’s criterion, to guarantee the existence of a continuous version of X . We shall simply assume that X is continuous.
Appendix A
590
Proof. Without loss of generality, we may assume η = 1. (Otherwise, replace the distance d by η −1/2 d.) Condition (A.18) implies that 2 , X ) d (X u v dudv exp F (ω) := 1/p [0,T ] 2 |v − u| has finite expectation. By Corollary A.5, there exists c > 0 such that 2 , X ) d (X s t ≤ E (F (ω) ∨ 4) < ∞, E exp c sup 0≤s< t≤T ζˆ (t − s) 5 5
h where ζˆ (h) := 0 u1/p−1 log (1 + 1/u2 )du ∼ h1/p log 1/h. By assumption, there exist positive constants c1 , c2 so that ζˆ (h) ≤ c1 ζ (h) for h ∈ [0, c2 ). 2 Moreover, by making c smaller if necessary, E exp c |X|0;[0,T ] < ∞ and the general case follows from the split |X|ζ -H¨o l;[0,T ] ≤
sup s,t:|t−s|≤c 2
|X|0;[s,t] 1 |X|0;[0,T ] . + ˆ ζ (c2 ) ζ (t − s) /c1
Exercise A.20 Let p > p. Show that, under the assumptions of Theorem A.19, / 02 2 |X|p -var;[s,t] < ∞. (A.20) sup E exp cη 1/p 0≤s< t≤T |t − s| Theorem A.19 implies that (Xt ) has ζ-modulus regularity where ζ (h) = 5 h1/p ln 1/h. We know from examples (e.g. Brownian motion with p = 2) that this is the exact modulus, and in this sense Theorem A.19 is optimal. This should be compared with the following law of iterated logarithm in which, essentially, sup0≤s< t≤T in (A.19) is restricted to fixed s = 0. Theorem A.21 Assume (Xt : t ∈ [0, T ]) is a process with values in a Polish space (E, d) which satisifies the Gaussian integrability condition (A.18) with parameters p, η. Then, there exists a (deterministic) constant C = C (p, η) < ∞ s.t. lim sup
|X|0;[0,h] 5 ≤ C a.s. ln ln 1/h
h↓0 h1/p
A.4 Sample path regularity under Gaussian assumptions
591
5 Remark A.22 In the notation of Definition 5.45, h1/p ln ln 1/h = ψ 1/p,−1/2 (h) as h ↓ 0. We insist that, in general, (Xt ) will not have ψ 1/p,−1/2 -modulus regularity. However, we will see below that (Xt ) enjoys ψ p,p/2 -variation regularity where ψ p,p/2 is Lipschitz equivalent (in the sense of Lemma 5.48) to the inverse of ψ 1/p,−1/2 . ˜ Proof. We start with a tail estimate on |X|0;[0,T ] = X , introducing 0;[0,1]
˜ (·) := X (T ·) : [0, 1] → E and noting that X ˜ the reparametrization X 2/p satisfies the Gaussian integrability condition with parameters p, η/T . It is then clear from Theorem A.19 that |X|0;[0,T ] /T 1/p enjoys Gaussian integrability (uniformly in T ) and hence, by Chebyshev, has a Gaussian tail; that is, for c1 large enough, not dependent on T , " ! 1 x 2 . P |X|0;[0,T ] ≥ x ≤ c1 exp − c1 T 1/p The main idea is Fix ε > 0, 5 now to scale by a geometric sequence. 5 q ∈ (0, 1), set c2 = (1 + ε) c1 and also set ϕ (h) = h1/p ln ln 1/h. Define the event An = |X|0;[0,q n ] ≥ c2 ϕ (q n ) . It follows that, for n large enough, ! " P (An ) = P |X|0;[0,q n ] ≥ c2 ϕ (q n ) 2 1 c2 ϕ (q n ) ≤ c1 exp − c1 q n /p 1 2 n = c1 exp − c2 ln ln q c1 −c 22 /c 1
= c1 (−n ln q)
.
This is summable in n and hence, by the Borel–Cantelli lemma, we get that only finitely many of these events occur; i.e. |X (ω)|0;[0,q n ] < c2 ϕ (q n ) for all n ≥ n0 (ε, ω) large enough. For all h small enough, pick n such that q n +1 ≤ h < q n . We then have lim sup h↓0
|X|0;[0,h] ϕp,2 (h)
ϕp,2 (q n ) |X|0;[0,q n ] ϕp,2 q n +1 ≤ lim sup n +1 ) ϕ n ϕp,2 (h) n →∞ ϕp,2 (q p,2 (q ) 5 −1/p (1 + ε) c1 ≤ q
and the proof is finished. (Sending q ↑ 1, followed by ε ↓ 0, actually shows √ that one can take C = c1 .)
Appendix A
592
We now turn to variational regularity of (Xt ). Theorem A.19 readily implies ψ-variation regularity provided ψ is taken as the inverse of the 5 1/p ln 1/h; equivalently (cf. Lemma 5.48), ψ (h) = modulus ζ (h) = h 5 p (h/ ln 1/h) . We shall establish a sharper result below (with ln replaced by the iterated logarithm ln ln). There are examples (e.g. Brownian motion, Theorem 13.69) in which this is the exact variation, and in this sense Theorem A.24 below is optimal. Lemma A.23 Let x : [0, 1] → Rd be a continuous path, and φ : R+ × R+ → R+ a function increasing in the first dimension, and decreasing in the second dimension. Then, ∞ 2 −4 φ d xt i , xt i + 1 , ti+1 − ti ≤ 2 φ |x|0; [ n
sup (t i )∈D([0,1])
n =2 k =0
i
k 2n
, k2+n 4 ]
, 2−n .
Proof. For (s, t) ⊂ [0, 1] , we first find the integer ns,t ∈ {0, 1, 2, . . . } such that s,t s,t 2−n −1 < |t − s| ≤ 2−n . We then cover (s, t) with the interval I s,t = [σ s,t , σ s,t ] , where s,t s,t s,t σ s,t = max k2−n −1 , k ∈ 0, . . . , 2n +1 , k2−n −1 ≤ s , s,t s,t s,t σ s,t = min k2−n −1 , k ∈ 0, . . . , 2n +1 , k2−n −1 ≥ t . s,t
Observe first that |I s,t | 2n +1 is equal to 2 or 3. Indeed, by definition of ns,t , we have s,t n s , t +1 s,t I 2 ≥ |t − s| 2n +1 > 1, s,t
s,t
and as both σ s,t 2n +1 and σ s,t 2n +1 are integers, we must have s,t n s , t +1 s,t n s , t +1 ≥ 2. Also, if |I ≥ 4, it means that the interval |I ! |2 " |2 s,t 1 1 s,t σ s,t + 2 n s , t + 1 , σ − 2 n s , t + 1 is of length greater than or equal to 2−n , " ! and hence, as σ s + 2 n s 1, t + 1 , σ s,t − 2 n s 1, t + 1 ⊂ [s, t] , so is [s, t] . This contradicts the definition of ns,t . Observe that if (s, t) and (u, v) are two disjoint intervals, then I s,t and u ,v are not identical. Assume this were not the case. Without loss of genI erality, we can assume u ≥ t. Necessarily, ns,t = nu ,v . As u ≥ t, we obtain σ u ,v ≥ σ s,t − 2 n1+ 1 ≥ σ s,t + 2 n1+ 1 , which contradicts the assumption that I s,t = I u ,v . For a fixed dissection (ti ) ∈ D ([0, 1]) , we have from the monotonicity assumption on φ, ti ,ti+ 1 −1 φ d xt i , xt i + 1 , ti+1 − ti ≤ φ |x|0,I t i , t i + 1 , 2−n i
i
A.4 Sample path regularity under Gaussian assumptions
593
(2) (3) , αn ,k = # {I t i ,t i + 1 = and then define αn ,k = # I t i ,t i + 1 = 2 nk+ 1 , 2kn+2 +1 k (2) (3) (2) k +3 . We have just seen that αn ,k , αn ,k ∈ {0, 1} and in fact αn ,k + 2n + 1 , 2n + 1 (3)
αn ,k ≤ 1. We therefore obtain φ d xt i , xt i + 1 , ti+1 − ti i ∞ 2 −2 n+1
≤
n =0
+
k =0
+1 ∞ 2 n −3
n =1
≤2
k =0
+1 ∞ 2 n −4
n =1 (2)
(2) αn ,k φ |x|0; [ k , k + 2 ] , 2−(n +1) 2n + 1 2n + 1
k =0
(3) αn ,k φ |x|0; [ k , k + 3 ] , 2−(n +1) 2n + 1 2n + 1
φ |x|0; [ k , k + 4 ] , 2−(n +1) , 2n + 1 2n + 1
(3)
using αn ,k + αn ,k ≤ 1. It then suffices to take the supremum over all dissections. The next result deals with generalized variation and we recall (cf. Definition 5.45) that for p ≥ 1, p t ψ p,p/2 (t) = 5 ∗ ∗ ln ln 1/t
with ln∗ (h) = max (1, ln h) .
Theorem A.24 Assume (Xt : t ∈ [0, T ]) is a process with values in a Polish space (E, d) which satisifies the Gaussian integrability condition (A.18) with parameters p, η. Then, there exists c = cp > 0 such that η 2 E exp c 2/p |X|ψ p , p / 2 -var;[0,T ] < ∞. T ˜ (·) := X (T ·) : [0, 1] → E satisfies the Proof. The reparametrization X Gaussian integrability condition with parameters p, η/T 2/p . At the same time, ˜ = |X|ψ p , p / 2 -var;[0,T ] X ψ p , p / 2 -var;[0,1]
and so we can assume without loss of generality that T = 1. Furthermore, at the price of replacing the distance d by η −1/2 d, we may assume η = 1. After these preliminary remarks, let us define φα by φα (r) = ψ p,p/2 (r) 1ψ p , p / 2 (r )> α .
Appendix A
594
For a fixed M > 0 and a dissection (ti ) of [0, 1] , and a fixed continuous path x, we have d xt i , xt i + 1 ψ p,p/2 M i d xt i , xt i + 1 = 1 ψ p,p/2 d (x t , x t t −t i i+ 1 ) M ≤ i + 12 i ψp ,p / 2 M i d xt i , xt i + 1 + φ t i + 1 −t i M 2 i d xt i , xt i + 1 1 ≤ + . φ t i + 1 −t i 2 M 2 i
Taking the supremum over all dissections, we see that $ # d xt i , xt i + 1 1 ≤ . |x|ψ p , p / 2 -var ≤ inf M > 0, sup φ t i + 1 −t i M 2 2 (t i )∈D([0,1]) Using the previous lemma, we obtain $ # n ∞ 2 −4 |x|0;[k 2 −n ,(k +4)2 −n ] 1 |x|ψ p , p / 2 -var ≤ inf M > 0, ≤ , φ2 −n −1 M 4 n =2 k =0
and in particular, that
P |X|ψ p , p / 2 -var ≥ M ≤ P
∞ 2 −4
n
φ2 −n −1
|X|0;[k 2 −n ,(k +4)2 −n ]
M
n =2 k =0
1 > 4
.
c From Theorem A.19 and (A.19) there exists c1 > 0 such that P (ΩM ) ≤ 2 c1 exp −M /c1 , where |X|0;[s,t] ΩM = sup < M . 1/2 0≤s< t≤1 |t − s|1/p ln∗ 1 t−s
Now, on the set ΩM , |X|0;[k 2 −n ,(k +4)2 −n ] φ2 −n −1 M 1/p ∗ n −2 1/2 −(n −2) ≤ ψ p,p/2 ln 2 2 1
ψp ,p / 2
|X |
0 ; [ k 2 −n , ( k + 4 ) 2 −n M
]
> 2 −n −1
.
A.4 Sample path regularity under Gaussian assumptions
For n ≥ 2 we have 1/p 1/2 ln∗ 2n −2 2−(n −2) ≤ ψ p,p/2
2−(n −2)
ψ p,0
595
1/p
ln∗ 2n −2
1/2
p/2 = 2−(n −2) ln∗ 2n −2 p/2 p/2 ≤ 4.2−n max 1, (ln 2) (n − 2) ≤
4.2−n (n − 1)
p/2
,
and so P |X|ψ p , p / 2 -var ≥ M − P ΩC M ≤ P |X|ψ p , p / 2 -var ≥ M ∩ ΩM ∞ −(n −2) p/2 (n − 1) ≤P n =2 2 × Using P (
2 n −4 k =0
αi 1A i > β) ≤
1
|X |
ψp ,p / 2
1 β
0 ; [ k 2 −n , ( k + 4 ) 2 −n M
E (αi 1A i ) =
αi i β
]
> 2 −n −1
1Ω M >
1 4
.
P (Ai ), we obtain
P |X|ψ p , p / 2 -var ≥ M − P ΩC M ≤
∞
24−n (n − 1)
n =2
×
n 2 −4
p/2
P ψ p,p/2
|X|0;[k 2 −n ,(k +4)2 −n ]
k =0
≤
∞ n =0
4−n
2
(n − 1)
p/2
n −4 2
k =0
M P ψ p,p/2
> 2−n −1 ∩ ΩM
|X|0;[k 2 −n ,(k +4)2 −n ] M
−n −1
>2
So it only remains to bound |X|0;[k 2 −n ,(k +4)2 −n ] −n −1 >2 P ψ p,p/2 M |X|0;[k 2 −n ,(k +4)2 −n ] −n −1 M −1 =P 2 . > ψ 1/p 1/p p,p/2 (4.2−n ) (4.2−n ) Now, we have seen in Theorem A.19 that, for a positive constant c2 , 2 |X|0;[s,t] < ∞. sup E exp c2 1/p 0≤s< t≤1 |t − s|
.
Appendix A
596
Then, from Chebyshev’s inequality and for large enough constant c3 , |X|0;[k 2 −n ,(k +3)2 −n ] > 2−n −1 P ψ p,p/2 M M 2 n /p −1 −n −2 2 ≤ c3 exp − , 2 ψ p,p/2 2 c3 and so
P |X|ψ p , p / 2 -var ≥ M
M 2 n /p −1 −n −2 2 ≤ c3 (n − 1) exp − 2 ψ p,p/2 2 c3 n =2 + c1 exp −M 2 /c1 . ∞
p/2
We have seen in Lemma 5.48 that c5 ψ p,−1/2 ≤ ψ −1 p,p/2 , which implies that for n ≥ 2, n /p −1 −n −2 2 ≥ c5 ln∗ ln∗ 2n −2 ≥ c6 (1 + ln n) . 2 ψ p,p/2 2 Hence, for a positive constant c7 , c 7 M 2 1 M 2 n /p −1 −n −2 2 2 exp − . ≤ exp −c7 M . 2 ψ p,p/2 2 c3 n 2 p/2 For M large enough, we have c3 n ≥2 (n − 1) n−c 7 M = c8 < ∞ and so P |X|ψ p , p / 2 -var ≥ M ≤ c8 exp −c7 M 2 + c1 exp −M 2 /c1 . The proof is now finished.
A.5 Comments The Garsia–Rodemich–Rumsey result is well known, e.g. Stroock [159], and so are the resulting Besov–H¨ older and Besov–L´evy modulus embeddings. The Besov-variation embedding is a more recent insight, Friz and Victoir [64]. Exercise A.4 is due to Krylov [95]. Everything up to and including Kolmogorov’s criterion is standard, see Revuz and Yor [143] or Stroock [159], for instance. The ψ-variation sample path behaviour of a generic process under the Gaussian integrability assumption of Condition of A.18 is essentially taken from Friz and Oberhauser [59]; it implies ψ 2,1 -variation regularity of Brownian motion, a classical result of Taylor [168].
Appendix B: Banach calculus Throughout this appendix, E, F, ... are Banach spaces with respective norms |·|E and |·|F and we simply write |·| when no confusion is possible.
B.1 Preliminaries We say that a map f from (a, b) ⊂ R into a Banach space E is continuously differentiable, in symbols f ∈ C 1 ((a, b) , E), if and only if f˙ (t) := df dt (t) := limε→0 (f (t + ε) − f (x)) /ε exists (as a strong limit in E) for all t ∈ (a, b) with f˙ ∈ C ((a, b) , E). Similarly, the Riemann integral of a continuous function can be defined as a strong limit of Riemann sum approximations and the fundamental theorem of calculus is valid; see [45] for instance. The following proposition is useful in showing that the directional derivatives of an ODE (resp. RDE) solution, as a function of starting point and driving signal, exist as strong limits in C 1-var ([0, 1] , R) (resp. C p-var ([0, 1] , R)), simply using the continuous embedding C p-var ([0, 1], R) → C([0, 1], R). Proposition B.1 Assume E → F , i.e. E is continuously embedded in F . Assume f ∈ C 1 ((a, b) , F ) such that its derivative f˙ (defined as a strong limit in F ) actually takes values in E and extends to a continuous function from [a, b] into E. Assume furthermore that f (a) ∈ E. Then f ∈ C 1 ((a, b) , E) with derivative given by f˙. Proof. By assumption, f˙ extends to a continuous function from [a, b] into E → F . By the fundamental theorem of calculus, for all t ∈ (a, b) , t f (t) − f (a) = f˙ (s) ds, a
where the definite integral is the strong limit in F of approximating Riemann sums. On the other hand, f˙ ∈ C ([a, b] , E) and so the definite integral also exists as a strong limit in E of approximating Riemann sums. Since convergence in E implies convergence in F , these integrals coincide and in particular t
f (t) = f (a) +
f˙ (s) ds ∈ E
a
for all t ∈ (a, b). By the fundamental theorem of calculus, we now see that f is continuously differentiable in E with derivative given by f˙. In other
Appendix B
598
words, f˙ (which was defined as a strong limit of difference quotients with convergence in F ) is actually convergent in E.
B.2 Directional and Fr´echet derivatives The space of linear, continuous maps from E to F is denoted by L (E, F ) and is itself a Banach space under the operator norm |f | :=
sup y ∈E :|y |≤1
|f (y)| .
Definition B.2 Let U be an open set in E, and f : U → F is Fr´echet differentiable at x ∈ U iff there exists Df (x) ∈ L (E, F ) s.t. for all h ∈ E |f (x + h) − f (x) − Df (x) h| = o (|h|) . It is said to be Fr´echet differentiable on U if Df (x) exists for all x ∈ U . If x → Df (x) is continuous1 we say that f is C 1 in the Fr´echet sense and write f ∈ C 1 (U, F ). Definition B.3 We say that f : U ⊂o E → F has directional derivative at x ∈ U in direction h ∈ E if the following limit exists: f (x + th) − f (x) = Dh f (x) . t Directional derivatives are automatically homogenous in h in the sense that lim
t→0
Dλh f (x) = λ lim
t→0
f (x + tλh) − f (x) = λDh f (x) , λ ∈ R. tλ
However, existence of Dh f (x) for all h ∈ E, need not imply linearity in h, as is seen in the example f (0, 0) = 0; f (x, y) → x2 y/ x2 + y 2 . Obviously, Fr´echet differentiability implies the existence of directional derivatives in all directions and Dh f (x) = Df (x) h. In applications one is often interested in the converse. The following two propositions are useful criteria for this purpose. Proposition B.4 Let U be an open set in E, f : U → F a function that has directional derivatives in all directions, and A : U → L (E, F ) a continuous map such that Dh f (x) = A (x) h for all x ∈ U, h ∈ E. Then f ∈ C 1 (U, F ) and Df (x) h = A (x) h. 1 With
respect to the operator norm on L (E, F ).
B.2 Directional and Fr´ echet derivatives
599
Proof. By the fundamental theorem of calculus, 1 1 1 df (x + th) dt= Dh f (x+th) dt= A (x+th) dt. f (x+h) − f (x) = dt 0 0 0 It follows that with ε (h) ≡
1 0
f (x + h) − f (x) − A (x) h = ε (h) h (A (x + th) − A (x)) dt and
1
A (x + th) − A (x) dt
ε (h) ≤ 0
≤
max A (x + th) − A (x) → 0 as h → 0
t∈[0,1]
by continuity of A. Proposition B.5 Let U be an open set in E, and f : U → F be a continuous map that admits directional derivatives at all points and in all directions; more precisely, for all x ∈ U and h ∈ E, Dh f (x) = lim
ε→0
∂ f (x + εh) − f (x) = {f (x + εh)}ε=0 ε ∂ε
exists (as a strong limit in F ). Assume further that (x, h) ∈ U × E → Dh f (x) ∈ F is uniformly continuous on bounded sets. Then f is C 1 in the Fr´echet sense. We prepare the proof with the following: Lemma B.6 Let U ⊂o E and ϕ : U × E → F be uniformly continuous on bounded sets such that for all x ∈ U, the map h → ϕ (x, h) =: ϕ (x) h is linear. Then the map ϕ ˜
:
U → L (E, F )
x
→
(h → ϕ (x) h)
is well-defined and uniformly continuous on bounded sets. Proof. Fixing x, by assumption h → ϕ (x) h is linear; by the assumption on uniform continuity on bounded sets h → ϕ (x) h is also continuous and hence a well-defined element of L (E, F ). By the assumption of uniform continuity on bounded sets, for every R > 0 and ε > 0 there exists δ such that for x, x ∈ U and h, h ∈ E with |x| , |x | ≤ R and |h| , |h | ≤ R, |x − x | + |h − h | < δ =⇒ |ϕ (x, h) − ϕ (x , h )| < ε.
Appendix B
600
Restricting attention to R > 1, given x, x ∈ U with |x| , |x | ≤ R and |x − x | < δ implies |˜ ϕ (x) − ϕ ˜ (x )|op ≡
sup
|ϕ (x) h − ϕ (x ) h| < ε.
h∈E :|h|=1
This says precisely that ϕ ˜ is uniformly continuous on bounded sets. Proof (Proposition B.5). We first show that ϕ (x, h) := Dh f (x) is linear in h. As remarked after the definition of the directional derivative, homogeneity in h is clear. Given g, h ∈ E we have f (x+ε (g+h)) −f (x) f (x+ε (g+h)) −f (x+εg) f (x + εg) − f (x) + = . ε ε ε
→ D g + h f (x) as ε→0
→D g f (x) as ε→0
The first term on the right-hand side hence converges as ε → 0. We claim it equals Dh f (x). To this end, using the fundamental theorem of calculus and homogeneity, 1 d f (x + εg + tεh) dt f (x + ε (g + h)) − f (x + εg) = dt 0 1 = ε Dh f (x + εg + th) dt. 0
It follows that f (x + ε (g + h)) − f (x + εg) − D f (x) h ε 1 |Dh f (x + ε (g + th)) − Dh f (x)| dt → 0 as ε → 0, ≤ 0
where we used in the last step uniform continuity of D· f (·) on bounded sets. This completes the proof that Dh f (x) is linear in h. By Lemma B.6 linearity of Dh f (x) in h together with the (uniform) continuity (on bounded sets) assumption of (x, h) → Dh f (x) then implies that x → {h → Dh f (x)} ∈ L (E, F ) is continuous and by Proposition B.4 we can conclude that f ∈ C 1 (U, E). The following result is sometimes referred to as ”closedness of the differentiation operator”. Proposition B.7 Assume fn ∈ C 1 (U, F ), where U is an open set in E and fn → f uniformly on bounded sets in U (which implies a priori f ∈ C (U, F )). Let g ∈ C (U, L (E, F )) and assume that Dfn → g also uniformly on bounded sets. Then f ∈ C 1 (U, F ) and Df = g.
B.3 Higher-order differentiability
601
Proof. Fix x ∈ U and h ∈ E. Then f (x + εh) − f (x) =
lim fn (x + εh) − fn (x) ε = lim Dfn (x + th) hdt n →∞ 0 ε = g (x + th) hdt n →∞
0
thanks to Dfn → g uniformly on bounded sets. By continuity of g, ∂ f (x + εh) − f (x) Dh f (x) = ∂ε ε ε=0 exists and equals Dh f (x) = g (x) h and so we can conclude with Proposition B.4.
B.3 Higher-order differentiability Definition B.8 We fix U , an open set of E. We say that f : U → F has a directional derivative at x ∈ U in direction (h1 , . . . , hk ) ∈ E k if D(h 1 ,...,h k ) f (x) := Dh 1 · · · Dh k f (x)
(B.1)
exists.
The calculus example f (0, 0) = 0 and f (x, y) = xy x2 − y 2 /x2 + y 2 otherwise (in which 1 = ∂x ∂y f (0, 0) = ∂y ∂x f (0, 0) = −1) shows that the order of h1 , . . . , hk can matter. Nonetheless, under reasonable conditions (namely continuity of the kth directional derivatives), the order does not matter and D(h 1 ,...,h l ) f (x) behaves multilinearly in (h1 , . . . , hk ). Higher-order Fr´echet differentiability is defined inductively as follows.
Definition B.9 Let k ∈ {1, 2, . . . } , and U an open set of E. A function f : U → F is (k + 1)-times Fr´echet differentiable on U if it is Fr´echet differentiable on U and Df : U → L (E, F ) is k-times Fr´echet differentiable on U . The kth-order differential is a map Dk f : U → L (E, . . . , L (E, L (E, F ))) ∼ = L E ⊗k , F where L E ⊗k , F is the space of multilinear bounded maps from E ×· · ·×E (k times) into F . If Dk f is continuous then we say that f is C k in the Fr´echet sense and write f ∈ C k (U, F ) . A map which is C k Fr´echet for all k ≥ 1 is said to be Fr´echet smooth.
Appendix B
602
Given A ∈ L E ⊗k , F we shall indicate multilinearity by writing A !h1 , . . . , hk " instead of A (h1 , . . . , hk ). The criteria we have seen to establish that a function is C 1 in the Fr´echet sense all extend to the case of C k . Proposition B.10 Suppose k ∈ {1, 2, . . . } and U is an open set of E. Assume that f : U → F is a function such that Dh 1 · · · Dh l f (x) exists for all x ∈ U and h1 , . . . , hl ∈ E, and l = 1, 2, .. . , k. Further assume there exist continuous functions Al : U → L E ⊗l , F such that Dh 1 · · · Dh l f (x) = Al (x) !h1 , . . . , hl " for all x ∈ U, h1 , . . . , hl ∈ E, and l = 1, 2, . . . , k. Then f : U → F is C k in the Fr´echet sense. Proposition B.11 Suppose k ∈ {1, 2, . . . } and U is an open set of E. Assume that f : U → F is a function such that Dh 1 · · · Dh l f (x) exists for all x ∈ U and h1 , . . . , hl ∈ E, and l = 1, 2, . . . , k. Assume further that (x; h1 , . . . , kk ) ∈ U × E k → Dh 1 · · · Dh l f (x) ∈ F is uniformly continuous on bounded sets. Then f is C k in the Fr´echet sense. k Proof. Take l ∈ {1, . . . , k} and define g (ε1 , . . . , εl ) = f x + j =1 εj hj and note that k ∂k g εj h j = . D(h 1 ,...,h k ) f x + ∂ε1 . . . ∂εk j =1 Since the order of the partial derivatives does not matter here, we have Dh 1 · · · Dh l f (x) = Dh π ( 1 ) · · · Dh π ( l ) f (x) for any permutation π of {1, . . . , l}. In view of Proposition B.5, it is clear that f ∈ C 1 in the Fr´echet sense and so Dh π ( 1 ) · · · Dh π ( l −1 ) Dh π ( l ) f (x) = Dh π ( 1 ) · · · Dh π ( l −1 ) Df (x) hπ (l) . This shows multilinearity in h1 , . . . , hl . By the assumption of uniform continuity on bounded sets, Al (x) !h1 , . . . , hl " := Dh 1 · · · Dh l f (x) defines a continuous map from U → L E ⊗l , F , and we conclude with Proposition B.10.
B.4 Comments Fr´echet regularity is a classical topic in non-linear functional analysis. Propositions B.5 and B.10 appear in Driver [45], for instance. We are unaware of any reference for Propositions B.5, B.11.
Appendix C: Large deviations C.1 Definition and basic properties Let X be a topological space. A rate function is a lower semicontinuous mapping I : X → [0, ∞], i.e. a mapping so that all level sets {x ∈ X : I (x) ≤ Λ} are closed. A good rate function is a rate function for which all level sets are compact subsets of X . The set DI := {x ∈ E : I (x) < ∞} is called the domain of I. Given A ⊂ E we also set I (A) = inf I (x) . x∈A
Lemma C.1 Let I be a good rate function. Then for each closed set F in E, I (F ) = lim I F δ δ ↓0
where the open δ-neighbourhood of a set A ⊂ E is defined as Aδ = ∪ {B (x, δ) : x ∈ A} , B (x, δ) = {y ∈ E : d (x, y) < δ} . Proof. [41, Lemma 4.1.6]. Unless otherwise stated, we assume that probability measures on X are defined on the Borel sets, i.e. the smallest σ-algebra generated by the open sets in X .
Definition C.2 A family {µε : ε > 0} of probability measures on X satisfies the large deviation principle (LDP) with good rate function I if, for every Borel set A, −I (A◦ ) ≤ lim inf ε log µε (A) ≤ lim sup ε log µε (A) ≤ −I A¯ . ε→0
ε→0
Remark C.3 Sometimes it is practical to parametrize the family of prob ability measures so as to consider ε2 log µε (A). Before turing to (various) contraction principles, we state two basic properties of LDPs and refer to [41, Lemma 4.1.6] for proofs.
Appendix C
604
Proposition C.4 A family {µε : ε > 0} of probability measures on a regular topological space can have at most one rate function associated with its LDP. Proposition C.5 Let E be a measurable subset of X such that µε (E) = 1 for all ε > 0. Suppose that E is equipped with the topology induced by X . If {µε } satisfies the LDP in E with (good) rate function I and DI ⊂ E, then the same LDP holds in E.
C.2 Contraction principles Theorem C.6 (contraction principle) Let X and Y be Hausdorff topological spaces. Suppose f : X → Y is a continuous map. If {µε } satisfies an LDP on X with good rate function I, then the image measures {f∗ µε }, where f∗ µε ≡ µε ◦ f −1 , satisfy an LDP on Y with good rate function J (y) = inf {I (x) : x ∈ X and f (x) = y} . Proof. [41, Lemma 4.1.6]. Definition C.7 A family {µε : ε > 0} of probability measures on a topological space X is exponentially tight if for every M < ∞, there exists a compact1 set KM such that c lim sup ε log µε (KM ) < −M. ε→0
Theorem C.8 (inverse contraction principle) Let X and Y be Hausdorff topological spaces. Suppose g : Y → X is a continuous injection and that {ν ε } is an exponentially tight family of probability measures on Y. If {g∗ ν ε } satisfies an LDP in X with rate function I : X → [0, ∞], then {ν ε } satisfies an LDP in Y with good rate function I ≡ I ◦ g. Proof. [41, Theorem 4.2.2] combined with Proposition C.5 and the remark that DI ⊂ g (Y) . Theorem C.9 (extended contraction principle) Let {µε } be a family of probability measures that satisfies an LDP with good rate function I on a Hausdorff topological space X . For m = 1, 2, . . . , let f m : X → Y be continuous maps, with (Y, d) a separable metric space. Assume there exists a measurable map f : X → Y such that for every Λ < ∞, lim
sup
m →∞ {x:I (x)≤Λ}
1 Since
KM
c
d (f m (x) , f (x)) = 0.
c it is enough to require that K ⊂ KM M is precompact.
C.2 Contraction principles
605
Assume that {f∗m µε } are exponentially good approximations of {f∗ µε } and in the sense that2 lim lim sup ε2 log µε ({x : d (f m (x) , f (x)) > δ}) = −∞.
m →∞
ε→0
Then {f∗ µε } satisfies an LDP in Y with good rate function I ≡ inf {I (x) : y = f (x)} . Proof. [41, Theorem 4.2.23].
2 The
separability on Y guarantees measurability of {x : d (f m (x) , f (x)) > δ} .
Appendix D: Gaussian analysis D.1 Preliminaries We start with a description of the general set-up of Gaussian analysis on a Banach space, following closely Ledoux’s Saint Flour notes [102]. Other references with a similar point of view are [103] and [42, Chapter 4]. A mean-zero Gaussian measure µ on a real separable Banach space E equipped with its Borel σ-algebra B and norm |·| is a Borel probability measure on (E, B) such that the law of each continuous linear functional on E is a zero-mean Gaussian random variable. We first claim that 2 2 !ξ, x" dµ (x) < ∞. sup σ = ξ ∈E ∗ ,|ξ |≤1
Indeed, writing i : E ∗ → L2 (µ) = L2 (E, B, µ; R) for the injection map, σ is the operator norm of i which is bounded by the closed graph theorem. Since E is separable, the Banach norm |·| may be described as a supremum over a countable set (ξ n )n ≥1 of elements of the unit ball of the dual space E ∗ ; that is, for every x ∈ E, |x| = sup !ξ n , x" n ≥1
and in particular, the norm is a measurable map on (E, B). There is an abstract Wiener space factorization of the form j
E ∗ −→ L2 (µ) −→ E. i
Here i denotes the embedding of E ∗ into L2 (µ) and the linear, continuous map j is constructed so that i∗ = j, provided L2 (µ) is identified with its dual. By linearity, the construction of j is readily reduced to defining j (ϕ), where ϕ ∈ L2 (µ) is non-negative, with total mass one, so that ϕ (x) dµ (x) yields a probability measure. The integrand x → x being trivially continuous, one has existence of the Bochner integral 1 j (ϕ) := xϕ (x) dµ (x) 1 Following [150], one may prefer to construct the Bochner integral over any compact K . Taking a compact exhaustion (K n ) of E it is easy to see that j (ϕIK n ) is Cauchy and we write j (ϕ) for the limit.
D.1 Preliminaries
607
as the unique element j (ϕ) ∈ E, so that for all λ ∈ E ∗ , !λ, j (ϕ)"E ∗ ,E = !λ, x"E ∗ ,E ϕ (x) dµ (x) . One defines E2∗ to be the closure of E ∗ , or more precisely: the closure of i (E ∗ ), in L2 (µ). The reproducing kernel Hilbert space H of µ is then defined as H := j (E2∗ ) ⊂ j L2 (µ) ⊂ E. The map j restricted to E2∗ is linear and bijective onto H and induces a Hilbert structure > ? ˜ g˜ ∀h, g ∈ H !h, g"H := h, L 2 (µ)
where
˜ ≡ j|E ∗ −1 (h) ∈ L2 (µ) h ∈ H → h 2
is also known as a Paley–Wiener map.2 To summarize, we have the picture E ∗ −→ i (E ∗ ) and j|E 2∗ i
j
⊂ i (E ∗ ) =: E2∗ ⊂ L2 (µ) −→ E E2∗ ←→ H ⊂ E
:
and the triplet (E, H , µ) is known as an abstract Wiener space. Under µ, the ˜ (x) is a Gaussian random variable with variance |h|2 = !h, h" . map x → h H H Note that σ is also given by supx∈K |x|, where K is the closed unit ball of H. In particular, for every x ∈ H, |x| ≤ σ |x|H . Moreover, K is a compact subset of H. (To this end, use weak compactness of E ∗ to show that j is compact and conclude that j ∗ is also compact.) Definition D.1 The triplet (E, H , µ) is called an abstract Wiener space. Theorem D.2 (Cameron–Martin) For any h ∈ H, the probability measure µ (h + ·) is absolutely continuous with respect to µ, with density given by the formula
2
|h| µ (h + A) = exp − H 2 Proof. [102] or [42, Chapter 4]. 2 We
˜ also use the notation ξ (h) := h.
A
˜ dµ. exp −h
Appendix D
608
Example D.3 Take E = C0 ([0, 1] , R) and µ to be the Wiener measure, i.e. the distribution of a standard Brownian motion started at the origin. If m is a finitely supported measure on [0, 1], say m = ci δ t i with {ci } ⊂ R and {ti } ⊂ [0, 1], then clearly h = j ∗ j (m) is the element of E given by h (t) = ci min (ti , t) ; it satisfies
0
1
h˙ 2t dt =
2
2
!m, x" dµ (x) = |h|H .
By a standard extension, we can then identify H with the Sobolev space W01,2 ([0, 1] , R). Observe that for h ∈ H, we have 1 ˜ = j ∗ |E ∗ −1 (h) = h˙ t dWt . h 2 0
(While we equipped the Wiener space C0 ([0, 1] , R) with uniform topology, other choices are possible.)
D.2 Isoperimetry and concentration of measure Gaussian measures enjoy a remarkable isoperimetric property. Following [102], we state it in the form due to C. Borell. Theorem D.4 (Borell’s inequality) Let (E, H, µ) be an abstract Wiener space and A ⊂ E a measurable Borel set with µ (A) > 0. Take a ∈ (−∞, ∞] such that a 2 1 √ e−x /2 dx =: Φ (a) . µ (A) = 2π −∞ Then, if K denotes the unit ball in H and µ∗ stands for the inner measure3 associated with µ, then for every r ≥ 0, µ∗ (A + rK) = µ∗ {x + rh : x ∈ A, h ∈ K} ≥ Φ (a + r) .
(D.1)
The following corollary is applicable in a Gaussian rough path context. Corollary D.5 (generalized Fernique estimate) Let (E, H, µ) be an abstract Wiener space and A ⊂ E a measurable Borel set with µ (A) > 0. Assume f : E → R∪ {−∞, ∞} is a measurable map and N ⊂ E a µ-nullset such that for all x ∈ / N, |f (x)| < ∞
(D.2)
3 Measurability of the so-called Minkowski sum A + rK is a delicate topic. Use of the inner measure bypasses this issue and is not restrictive in applications.
D.2 Isoperimetry and concentration of measure
609
and for some positive constant c, ∀h ∈ H: |f (x)| ≤ c {|(f (x − h))| + σ |h|H } .
Then
2 exp η |f (x)| dµ (x) < ∞ if η <
(D.3)
1 . 2c2 σ 2
Proof. We have for all x ∈ / N and all h ∈ rK, where K denotes the unit ball of H and r > 0, {x : |f (x)| ≤ M } ⊃ {x : c (|f (x − h)| + σ |h|H ) ≤ M } ⊃ {x : c (|f (x − h)| + σr) ≤ M } {x + h : |f (x)| ≤ M/c − σr} .
= Since h ∈ rK was arbitrary,
{x : |f (x)| ≤ M } ⊃ ∪h∈r K {x + h : |f (x)| ≤ M/c − σr} {x : |f (x)| ≤ M/c − σr} + rK
= and we see that
µ [|f (x)| ≤ M ] = µ∗ [|f (x)| ≤ M ] ≥
µ∗ ({x : |f (x)| ≤ M/c − σr} + rK) .
We can take M = (1 + ε) cσr and obtain µ [|f (x)| ≤ (1 + ε) cσr] ≥ µ∗ ({x : |f (x)| ≤ εσr} + rK) . Keeping ε fixed, take r ≥ r0 where r0 is chosen large enough such that µ [{x : |f (x)| ≤ εσr0 }] > 0. Letting Φ denote the distribution function of a standard Gaussian, it follows from Borell’s inequality that µ [|f (x)| ≤ (1 + ε) cσr] ≥ Φ (a + r) for some a > −∞. Equivalently,
¯ µ [|f (x)| ≥ x] ≤ Φ a +
x (1 + ε) cσ 2 ¯ ≡ 1 − Φ and using Φ ¯ (z) exp −z /2 we see that this implies with Φ 2 exp η |f (x)| dµ (x) < ∞
provided
2 1 1 . 2 (1 + ε) cσ Sending ε → 0 finishes the proof. η<
Appendix D
610
Corollary D.6 (large deviations) The family µε (·) = µ ε−1 (·) satisfies an LDP with good rate function 1 2 |h| ; h ∈ A ∩ H . I (A) = inf 2 H Proof. (Sketch) Borell’s inequality quickly leads to the upper bound, Cameron–Martin to the lower bound. See [102] for details.
D.3 L2 -expansions Recall the picture ι∗
E ∗ −→ ι∗ (E ∗ ) and ι|E 2∗
⊂ ι∗ (E ∗ ) =: E2∗ ⊂ L2 (µ) −→ E : E2∗ ←→ H ⊂ E ι
˜ (x) is a Gaussian random variable (with Given h ∈ H, the map x → h 2 variance |h|H ) under the measure µ. We can think of x → X (x) = x as an E-valued random variable with law µ. Then, for any ONB (hk ) ⊂ H we have the L2 -expansion X (x) = lim
m →∞
m
˜ k (x) hk a.s. h
k =1
where the sum converges in E for µ-a.e. x and in all Lp (µ)-spaces, p < ∞. For any A ⊂N we define X A (x) =
˜ k (x) hk h
k ∈A
˜k : k ∈ A . = Eµ X|σ h
a.s.
Note that for |A| < ∞, this is a finite sum with values in H; if |A| = ∞ this sum converges in E for µ-almost every x and in every Lp (µ). All this ˜ k : k ∈ A} follows from X A being the conditional expectation of X given {h and suitable (vector-valued) martingale convergence results, cf. [102] and the references cited therein.
D.4 Wiener–Itˆo chaos Let (E, H , µ) be an abstract Wiener space. Let (hk ) be the sequence of Her√ 2 k!hk is an mite polymomials defined via eλx−λ /2 = λk hk (x) so that
D.4 Wiener–Itˆ o chaos
611
orthonormal basis of L2 (γ 1 ) where γ 1 is the canonical Gaussian measure on R. For any multi-index α = (α0 , α1 , . . . ) ∈ NN with |α| = α0 +α1 +· · · < ∞ we set √ Hα = α!Πi hα i ◦ ξ i (where α! = α0 !α1 ! . . . ). Then the family (Hα ) constitutes an orthonormal basis of L2 (µ). Definition D.7 The (real-valued) homogenous Wiener chaos W (n ) (µ) of degree n is defined as4 W (n ) (µ, R) = φ ∈ L2 (µ) : !φ, Hα " = 0 for all α : |α| = n = span {Hα : |α| = n}
with closure in L2 (µ) .
Any element ψ ∈ W (n ) (µ) can be written as.5 ψ= !ψ, Hα " Hα . α :|α |=n
The (real-valued, non-homogenous) Wiener chaos of degree n is defined as C (n ) (µ) = ⊕ni=0 W (n ) (µ) . Obviously, any ψ ∈ C (n ) (µ) can be written as ψ= !ψ, Hα " Hα = L2 - lim α :|α |≤n
m →∞
!ψ, Hα " Hα .
α :|α |≤n α k =0,k > m
Since α :|α |≤n ,α k =0,k > m !ψ, Hα " Hα is a polynomial of degree ≤ n in the variables ξ 1 , . . . , ξ m , we see that C (n ) (µ) is precisely the L2 -closure of all polynomials in ξ i of degree less than or equal to n. Theorem D.8 (Wiener–Itˆ o chaos integrability) (i) For ψ ∈ W (n ) (µ) and 1 < p < q < ∞ we have n q−1 2 |ψ|L p (P) . (D.4) |ψ|L p (P) ≤ |ψ|L q (P) ≤ p−1 (ii) For ψ ∈ C (n ) (µ) and 1 < p < q < ∞ we have6 n /2 −n |ψ|L p (P) . |ψ|L p (P) ≤ |ψ|L q (P) ≤ (n + 1) (q − 1) max 1, (p − 1) (D.5) 4 If F denotes a real separable Banach space, we can define the F -valued homogenous chaos W (n ) (µ, F ) as φ ∈ L 2 (µ; F ) : φ, H α = 0 for all α : |α| = n . 5 This sum is convergent µ-a.s. and in L 2 (µ). 6 By a somewhat more involved argument [39, Theorem 3.2.5], the constant on the n 2 but this will be of no right-hand side of (D.5) can be taken in the form C n pq −1 −1 advantage to us.
Appendix D
612
(iii) Let ψ ∈ C (n ) (µ) and 0 < p < q < ∞. Then there exists C = C (n, p, q) such that |ψ|L p (P) ≤ |ψ|L q (P) ≤ C |ψ|L p (P) . Proof. The estimate in (i) for 1 < p < q < ∞ is a well-known consequence of the hypercontractivity of the Ornstein–Uhlenbeck process on abstract Wiener spaces. Ad (ii). Take k ∈ {0, . . . , n} and call Jk : C (n ) (µ) → W (k ) (µ) the L2 projection on the kth homogenous chaos. Then # k /2 (p − 1) |ψ|L p if p ≥ 2 , |Jk ψ|L p ≤ −k /2 |ψ|L p if p < 2 (p − 1) which is easily seen from (D.4) when p > 2 and from a duality argument for 1 < p < 2. In particular, Jk : Lp → Lp is a bounded linear n operator for n any 1 < p < ∞. From ψ = k =0 Jk ψ, we have |ψ|L q ≤ k =0 |Jk ψ|L q and hence k n q−1 2 |ψ|L q ≤ |Jk ψ|L p p−1 k =0 # n k /2 (q − 1) if p ≥ 2 ≤ |ψ|L p k /2 −k if p < 2. (q − 1) (p − 1) k =0 n /2 −n , ≤ |ψ|L p (n + 1) (q − 1) max 1, (p − 1) as required. (iii) For the extension to 0 < p < q < ∞ it suffices to consider the case 0 < p ≤ 1, q = 2 and ψ ∈ C (n ) (µ). Using Cauchy–Schwarz, 2 p/2 2−p/2 E |ψ| = E |ψ| |ψ| 1/2 p 1/2 4−p ≤ (E (|ψ| )) E |ψ| (2−p/2) 1−p/4 p 1/2 n /2 2 (n + 1) (3 − p) ≤ (E (|ψ| )) E |ψ| , we obtain |ψ|L 2 (P) ≤ c |ψ|L p (P) for some constant c = c (p, n). A practical corollary is that, for ψ, ψ ∈ C (n ) (µ), we have ψψ L 2 (P) ≤ for c = c (n). (A direct proof of this can be found in [124, c |ψ| 2 ψ 2 L (P)
L (P)
Proposition 1.7.2] where it is used to establish equivalence of all moments.) If the previous theorem implies the qualitative statement |ψ|L p (P) ∼ |ψ|L q (P) for all p, q > 0 and ψ ∈ C (n ) (µ) ,
(D.6)
the following result can be viewed as an extension to p = 0, where L0 convergence is understood as convergence in µ-probability.
D.5 Malliavin calculus
613
Theorem D.9 (i) For any p ∈ [0, ∞) the (non-homogenous) Wiener chaos C (n ) (µ) is the Lp -closure of polynomials in ξ i of degree less than or equal to n. (ii) For any p ∈ (0, ∞) and any sequence of random variables in C (n ) (µ), convergence in µ-probability is equivalent to Lp -convergence. Proof. It suffices to check that a Cauchy sequence in probability, say Xn , also converges in Lp for p > 0. We argue by contradiction and assume it does not converge in Lp , for some p > 0. Then there exists ε > 0, such that for arbitrarily large m, n one has |Xn − Xm |L p > ε. Let δ ∈ (0, 1). It follows that & % |Xn − Xm | > δ ≤ P [|Xn − Xm | > δε] P |Xn − Xm |L p which, by assumption, tends to zero as m, n → ∞. On the other hand, using equivalence of |·|L q (P) and |·|L p (P) on C (n ) (µ), for any q > p, the next lemma applied with ξ = |Xn − Xm | implies that inf P [|Xn − Xm | > δ |Xn − Xm |L p ] > 0
n ,m
and so yields the desired contradiction. Lemma D.10 (Paley–Zygmund inequality) Let 0 < p < q < ∞ and ξ ≥ 0 be a random variable in Lq (P). Then, for any δ ∈ (0, 1),
"
!
P ξ > δ |ξ|L p (P) ≥
(1 − δ ) p
|ξ|L p (P)
q p q −p
|ξ|L q (P)
. p/q
Proof. Set t = δ |ξ|L p (P) . From Eξ p ≤ tp + E (ξ p ; ξ ≥ t) ≤ tp + E (ξ q ) 1−p/q
P [ξ > t]
it follows that
P [ξ > t]
q −p p
p
≥
|ξ|L p (P) − tp p
|ξ|L q (P)
=
|ξ|L p (P) |ξ|L q (P)
p (1 − δ p ) .
D.5 Malliavin calculus Following [99], [136, Section 4.1.3] or [171, Section 3.3] we have the following notion of H-regularity for a Wiener functional F . Definition D.11 Given an abstract Wiener space (E, H, µ), a random variable (i.e. measurable map) F : E → R is a.s. continuously H-differen1 a.s., if for µ-almost every ω, the map tiable, in symbols F ∈ CH h ∈ H → F (ω + h)
(D.7)
614
Appendix D
is continuously Fr´echet differentiable. A vector-valued r.v. F = F 1 , . . . , F n : E → Rn is a.s. continuously H-differentiable iff each F i is a.s. continuously H-differentiable. k Similarly, if (D.7) is a.s. k-times Fr´echet differentiable, we write F ∈ CH a.s. and say that F is k-times a.s. continuously H-differentiable. When k = ∞ we say that F is a.s. H-smooth. The notion of H-differentiability was introduced in [99] and plays a fundamental role in the study of transformation of measure on Wiener space. 1 -regularity is stronger than Integrability properties of F and DF aside, CH Malliavin differentiability in the usual sense. Indeed, by [136, Theorem 1,2 1 implies Dlo 4.1.3] (see also [99], [171, Section 3.3]) CH c -regularity where 1,2 the definition of Dlo c is based on the commonly used Shigekawa Sobolev space D1,p . (Our notation here follows [136, Sections 1.2, 1.3.4]). This remark will be important to us since it justifies the use of Bouleau–Hirsch’s criterion (e.g. [136, Section 2.1.2]) for establishing absolute continuity of F . Proposition D.12 (Bouleau–Hirsch) Let (E, H, µ) be an abstract Wiener space and F = F 1 , . . . , F n : E → Rn a measurable map. Assume 1 is weakly non-degenerate, by which we mean that the Malliavin F ∈ CH covariance matrix 3 2 σ (ω) := DF i , DF j H i,j =1,...,n ∈ Rn ×n is µ-almost surely non-degenerate. Then F , viewed as an Rn -valued random variable under µ, admits a density with respect to Lebesgue measure on Rn . 1,2 1 Proof. From [136, Section 4.1.3], F ∈ CH implies that f ∈ Dloc and the usual Bouleau–Hirsch criterion [136, Section 4.1.3] applies.
D.6 Comments Section D.1 follows closely Ledoux’s Saint Flour notes [102]. Other references with a similar point of view are Ledoux and Talagrand [103] and Deuschel and Stroock [42, Chapter 4]. Sections D.2 and D.3 follow Ledoux [102]. The generalized Fernique estimate is taken from Friz and Oberhauser [60]. The basic definitions of Section D.4 also follow Ledoux [102, Section 5]. Integrability of the Wiener–Itˆ o chaos via hypercontractivity of the Ornstein–Uhlenbeck semi-group is classical, see Ledoux [102, Section 8] as well as most references on Malliavin calculus, such as Nualast [135]. Theorem D.9 appears in Schreiber [153], our proof is taken from de la Pe˜ na and Gin´e [39]. Section D.5:. There are many good books on Malliavin calculus, including Malliavin [124], Nualart [135] and Shigekawa [155]. The concept of H-differentiability is due to Kusuoka [99]; see Nualart [135] and in partic¨ unel and Zakai [171, Section 3.3]. ular S¨ uleyman Ust¨
Appendix E: Analysis on local dirichlet spaces E.1 Quadratic forms Consider a Hilbert space (H, !·, ·") with a non-positive self-adjoint operator 1 L, defined on a dense linear subspace D (L). Spectral calculus √ allows us to √ define the self-adjoint operator −L, with domain D −L . A quadratic √ form, defined on D := D (Q) := D −L , is then given by ? >√ √ −Lf, −Lf Q (f, f ) = and this form is non-negative in the sense that Q (f, f ) ≥ 0 for all f . (By polarization, this induces a symmetric bilinear form Q, defined on D × D, 3 2√ √ −Lf, −Lg .) It is well known2 that this yields a so that Q (f, g) = closed form in the sense that whenever (fn ) ⊂ D such that fn → f in H with n → ∞ and Q (fn − fm , fn − fm ) → 0 with n, m → ∞ then f ∈ D and Q (fn − f, fn − f ) → 0 with n → ∞. Conversely, every such form arises in this way from a (non-positive) self-adjoint operator L. In many applications one has forms which are not closed but closable in the sense that whenever (fn ) ⊂ D is such that fn → 0 in H with n → ∞ and Q (fn − fm , fn − fm ) → 0 with n, m → ∞ then Q (fn , fn ) → 0 with n → ∞. In this case, Q admits a (minimal) extension to a closed form Q; we shall not distinguish between Q and Q. Let us further recall that the domain D of a (symmetric, closed, nonnegative) form is a Hilbert space under the inner product !f, g"Q = !f, g" + Q (f, g) . By spectral calculus, one defines Pt = etL : H → H, which yields a (strongly continuous, contraction) semi-group (Pt : t ≥ 0) on H with infinitesimal generator given by L. For t > 0, Pt maps H into D and t !Pt f, g" = !f, g" − Q (Ps f, g) ds for f, g ∈ D; 0 1 See, 2 See,
for example, Yosida [176, Chapter XI]. for example, [38].
Appendix E
616
as may be seen by integrating the equality t√ √ λe−λs λds, λ ∈ [0, ∞) e−tλ = 1 − 0
against d !Eλ f, g", where {Eλ : λ ∈ [0, ∞)} is the spectral resolution of the (non-negative, self-adjoint) operator −L. The following lemma is based on similar ideas. √ Lemma E.1 (i) For all f ∈ D = D −L , 1 !f, f " ∧ Q (f, f ) . Q (Pt f, Pt f ) ≤ 2t
(E.1)
(ii) Assume fn → f in H and assume that (fn ) is bounded in D, !·, ·"Q ; sup !fn , fn "Q < ∞. n
Then f ∈ D and fn → f weakly in D, !·, ·"Q . Proof. (i) It suffices to integrate the elementary inequality λe−2tλ ≤
1 ∧ λ, λ ∈ [0, ∞) 2t
against d !Eλ f, f ", where {Eλ : λ ∈ [0, ∞)} is the spectral resolution of −L. (ii) Step 1. Let us first assume that f ∈ D. For every h ∈ D (L) ⊂ √ D = D −L we have −Q (fn , h) = !fn , Lh" → !f, Lh" and so, for all h ∈ D (L) , !fn , h"Q → !f, h"Q . What we want is that this convergence holds for all h ∈ D. If we can show density of D (L) in D, !·, ·"Q , the extension to all h ∈ D is straightforward. But this density statement is also easy to see: for instance, given h ∈ D one has Pt h ∈ D (L) and Pt h → h in D since 2 Q !Pt h − h, Pt h − h" = λ e−λt − 1 d !Eλ h, h" → 0 [0,∞)
by bounded convergence, using λd !Eλ h, h" = Q (h, h) < ∞ since h ∈ D. Step 2. Let us now consider an arbitrary f ∈ H. We mollify using the semigroup. For t > 0 we set ft := Pt f and similarly ftn := Pt f n . It is easy to
E.2 Symmetric Markovian semi-groups and Dirichlet forms
617
see that ftn → ft in H as n → ∞ and also that (ftn ) is uniformly bounded in the sense that M := sup !ftn , ftn "Q < ∞. n ,t∈(0,1]
Apply step 1 to see that ftn → ft weakly in D, !·, ·"Q . In particular, !ft , ft "Q ≤ lim inf !ftn , ftn "Q ≤ M. n →∞
Hence supt∈(0,1] !ft , ft "D ≤ M < ∞ and this entails that f ∈ D. Indeed, by monotone convergence λd !Eλ f, f " = lim λe−2λt d !Eλ f, f " t↓0
[0,∞)
≤
[0,∞)
lim !ft , ft "Q < ∞, t↓0
which shows that f ∈ D. We can now appeal to step 1 to conclude the proof.
E.2 Symmetric Markovian semi-groups and Dirichlet forms Let us now consider a quadratic (equivalently: symmetric bilinear) form E (·, ·) on the Hilbert space L2 (E, m) where E is assumed to be a locally compact Polish space, m is a Radon measure on E of full support. The d classical example to have Lebesgue
in mind2 is E = R , equipped with dx, defined on D (E) = W 1,2 Rd , the measure, and E (f, f ) = |∇f (x)| usual Sobolev space of L2 -functions on Rd with weak derivatives in L2 . Following Fukushima et al. [70] we have the following abstract Definition E.2 A non-negative definite symmetric bilinear form E, densely defined on L2 (E, m), is called a Dirichlet form if it is (i) closed in the sense of quadratic forms; i.e. whenever (fn ) ⊂ D (E) is such that fn → f in L2 with n → ∞ and E (fn − fm , fn − fm ) → 0 with n, m → ∞ then f ∈ D (E) and E (fn − f, fn − f ) → 0 with n → ∞; (ii) Markovian in the sense that f ∈ D (E) , g = (0 ∨ f ) ∧ 1 =⇒ g ∈ D (E) , E (g, g) ≤ E (f, f ) . The pair (E, D (E)) is called a Dirichlet space relative to L2 (E, m).
Appendix E
618
Everything said in the previous section applies to Dirichlet forms: in particular, there exists a non-positive self-adjoint operator L on L2 (E, m) so that √ −L , D (E) = D ? >√ √ −Lf, −Lg E (f, g) = L2
and there is a (strongly continuous, contraction) semi-group (Pt : t ≥ 0) on L2 (E, m) with infinitesimal generator given by L, so that !Pt f, g"L 2 = !f, g"L 2 −
t
E (Ps f, g) ds
for f, g ∈ D (E)
0
and Lemma E.1 is also valid. The Markovian property of E is equivalent to Markovianity of the associated L2 -semi-group in the sense that ∀t > 0 : f ∈ L2 (E, m) , 0 ≤ f ≤ 1 m-a.e. =⇒ 0 ≤ Pt f ≤ 1 m-a.e. The Dirichlet forms interesting to us enjoy further properties of the following kind. Definition E.3 A Dirichlet form E is called regular if there exists a core; that is, a subset C ⊂ D (E) ∩ Cc (E) which is dense in D (E) with respect to E1 and dense in Cc (E) with respect to uniform norm. It is called strongly local if E (f, g) is zero whenever f ∈ D (E) is constant on a neighbourhood of the support of g ∈ D (E). Strong locality of E has the interpretation of no killing and no jumps (e.g. within its Beurling–Deny decomposition [70, Section 3.2]). Any such Dirichlet form can be written as dΓ (f, g) E (f, g) = E
where Γ is the so-called energy measure, a positive semi-definite, symmetric bilinear form D (E) with values in the signed Radon measures on E. It can be defined by 1 ϕdΓ (f, f ) = E (f, ϕf ) − E f 2 , ϕ 2 for every f ∈ D (E) ∩ L∞ and ϕ ∈ D (E) ∩ Cc . In all our applications, Γ (f, g) will be absolutely continuous with respect to m and we shall simply write dΓ (f, g) = Γ (f, g) dm
E.2 Symmetric Markovian semi-groups and Dirichlet forms
619
and call the map f , g → Γ (f, g), D (E) × D (E) → L1 (E, m), the carr´e du champ operator. The intrinsic metric of E is then defined as d (x, y) = sup {f (x) − f (y) : f ∈ D (E) and continuous, dΓ (f, f ) /dm ≤ 1 m-a.e.} . In general, d can be degenerate, i.e. d (x, y) = ∞ or ρ (x, y) = 0 for some x = y. Definition E.4 A strongly local Dirichlet form E with domain D (E) is called strongly regular if it is regular and its intrisic metric is a genuine metric on E whose topology coincides with the original one. Let us recall (cf. Definition 5.17) that a geodesic (or geodesic path) joining two points x, y ∈ E is a continuous path γ : [0, 1] → E such that γ (0) = x, γ (1) = y and d (γ s , γ t ) = |t − s| d (x, y)
∀0 ≤ s < t ≤ 1
and that E is called a geodesic space if all points x, y can be joined by a geodesic. Observe that z := γ 1/2 is a midpoint of x, y in the sense that 1 d (x, y) . 2 If all x, y ∈ E have a midpoint we say that E has the midpoint property. In fact, any complete metric space with the midpoint property is geodesic: given x, y, iterated use of the midpoint property yields γ 1/2 then γ 1/4 , γ 3/4 and so on, which yields (a candidate for) a geodesic, defined on all dyadic rationals. The extension of γ = γ t to all t ∈ [0, 1] is possible by the completeness assumption; continuity of γ is then easy to check. d (x, z) = d (z, y) =
Proposition E.5 Assume E is a strongly local, strongly regular Dirichlet form on E so that (E, d) is a complete metric space. Then (E, d) is a geodesic space. Proof. We follow [165]. Fix arbitrary elements x, y ∈ E and set R = d (x, y). It then suffices to show the midpoint property ∃z ∈ E : d (x, z) = d (z, y) = R/2. We argue by contradiction. Assuming that there is no midpoint z, we have ¯R /2 (y) = ∅. By compactness, these sets have a positive dis¯R /2 (x) ∩ B B tance, say 3ε > 0, and it is also clear that ¯R /2+ε (y) > ε. ¯R /2+ε (x) , B d B We now define the continuous function ¯R /2+ε (x) d (x, ·) − (R/2 + ε) on B ¯R /2+ε (y) (R/2 + ε) − d (y, ·) on B f0 = 0 else
Appendix E
620
and note that dΓ (f0 , f0 ) = 1B¯ R / 2 + ε (x) dΓ (d (x, ·) , d (x, ·)) + 1B¯ R / 2 + ε (y ) dΓ (d (y, ·) , d (y, ·)) ≤ dm, using the fact that Γ (d (x, ·) , d (x, ·)) ≤ 1 for all x. Moreover, f0 (x) − f0 (y) = R + 2ε > R. But this is a contradiction to R = sup {f (x) − f (y) : f ∈ D (E) and f continuous, Γ (f, f ) ≤ 1} .
E.3 Doubling, Poincar´e and quasi-isometry We now make the standing assumption that E is a strongly local, strongly regular Dirichlet form on E. As we shall see, the following properties (I)– (III) have remarkably powerful implications. Definition E.6 Let (E, D (E)) be a Dirichlet space relative to L2 (E, m). Assume that E is a strongly local and strongly regular Dirichlet form, with intrinsic metric d. We then say that E has (or satisfies) the (I) completeness property if the metric space (E, d) is a complete metric (and hence by Proposition E.5 a geodesic) space; (II) doubling property if there exists a doubling constant N = N (E) such that ∀r ≥ 0, x ∈ E : m (B (x, 2r)) ≤ 2N m (B (x, r)) ; (III) weak Poincar´ e inequality if there exists CP = CP (E) such that for all r ≥ 0 and f ∈ D (E), f − f¯r 2 dm ≤ CP r2 Γ (f, f ) dm B (x,r )
B (x,2r )
where −1 f¯r = m (B (x, r))
f dm. B (x,r )
Let us make two useful remarks. First, the doubling property (II) readily implies N (E.2) ∀0 < r < r : m (B (x, r )) ≤ (2r /r) m (B, r) ; and second, the right-hand side in the weak Poincar´e inequality can be written as 2 f − f¯r 2 dm = inf |f − α| dm. (E.3) B (x,r )
α ∈R
B (x,r )
E.3 Doubling, Poincar´e and quasi-isometry
621
Definition E.7 Two (strongly local and strongly regular) Dirichlet forms ˜ E and E are quasi-isometric if D (E) = D E˜ and there exists Λ ≥ 1 such that for all f in the common domain, 1 E (f, f ) ≤ E˜ (f, f ) ≤ ΛE (f, f ) . Λ
If we write d, d˜ for the respective intrinsic metrics associated with E and ˜ E, it is clear that the metrics are Lipschitz equivalent in the sense that 1 Λ1/2
d (x, y) ≤ d˜(x, y) ≤ Λ1/2 d (x, y)
for all x, y ∈ E. Theorem E.8 Let E satisfy properties (I)–(III). Then, assuming E and E˜ are quasi-isometric, E˜ also satisfies properties (I)–(III), with new (doubling and Poincar´e) constants depending on Λ. Proof. Invariance of the completeness property is clear from Lipschitz ˜ equivalence of d and d. Second, we assume doubling for balls with respect to d. Then, using (E.2), ˜ (x, 2r) m B ≤ m B x, 2Λ1/2 r = m B x, 2Λr/Λ1/2 N ≤ (4Λ) m B x, r/Λ1/2 N ˜ (x, r) ≤ (4Λ) m B ˜ (x, r) = y ∈ E : d˜(x.y) < r . At last, let us write where B −1 ˜ (x, r) f˜r = m B
f dm
B˜ (x,r )
˜ (x, r). Using (E.3) and the weak Poincar´e infor the average of f over B equality for E, we see that 2 2 ˜ |f − α| dm f − fr dm ≤ inf α ∈R B (x,r Λ 1 / 2 ) B˜ (x,r ) f − f¯r Λ 1 / 2 2 dm = B (x,r Λ 1 / 2 ) 2 ≤ CP r Λ dΓ (f, f ) B (x,2r Λ 1 / 2 ) ˜ (f, f ) ≤ CP r2 Λ2 dΓ B˜ (x,2r Λ)
Appendix E
622
˜ denotes the energy measure of E. ˜ By a covering argument, dewhere Γ rived from (I), (II), this implies the weak Poincar´e property for E˜ on L2 (E, m). A special case of quasi-isometry arises from scaling. Proposition E.9 (scaling) Let E be a Dirichlet form on L2 (E, m) with doubling and Poincar´e constants N, CP respectively. Then, for every ε > 0, the scaled Dirichlet form E ε ≡ εE satisfies (I)–(III) with the same doubling and Poincar´e constants N, CP and with intrinsic metric given by ∀x, y ∈ E : dε (x, y) =
1 d (x, y) . ε1/2
Proof. The relation dε ≡ d/ε1/2 is an obvious consequence of the definition. We only need to check the behaviour of doubling and Poincar´e constants and this quasi-isometry. Writing B ε for balls with respect to dε , we obviously have B ε (x, r) = B x, rε1/2 for any x ∈ E, r > 0. Clearly then, = 2Q m (B ε (x, r)) m (B ε (x, 2r)) ≤ 2Q m B x, rε1/2 and so Q is also the doubling constant for E ε . Finally, note that frε
ε
−1
f dm = f¯r ε 1 / 2
:= m (B (x, r))
B ε (x,r )
and so
2
B ε (x,r )
|f − frε | dm
= B (x,r ε 1 / 2 )
f − f¯r ε 1 / 2 2 dm
≤
CP r2 ε
dΓ (f, f )
B (x,2r ε 1 / 2 )
= CP r2
dΓε (f, f ) B ε (x,2r )
where Γε = εΓ is the energy measure of E ε . We see that CP is the Poincar´e constant for E ε and the proof is finished.
E.4 Parabolic equations and heat-kernels
623
E.4 Parabolic equations and heat-kernels Recall the standing assumption that E is a strongly local, strongly regular Dirichlet form on L2 (E, m). There is an associated non-positive self-adjoint operator L on L2 (E, m) and a strongly continuous semi-group (Pt : t ≥ 0). We can now consider weak solutions to the parabolic partial differential equation ∂t u = Lu; that is, a function u : t → u (t, ·) ∈ D (E) so that ∀g ∈ D (E) : !u (t, ·) , g"L 2 = !u (0, ·) , g"L 2 −
t
E (u (s, ·) , g) ds. 0
What one has in mind is that actually u = u (t, x) for (t, x) ∈ [0, ∞) × E, regarded as a one-parameter family (u (t, ·) : t ≥ 0) depending only on space. (Obviously, the semi-group operator Pt u0 yields a solution to this PDE with intial date u (0, ·) = u0 .) The notion of solution can be localized. Indeed, by restricting ourselves to times in some interval I ⊂ [0, ∞) and a test function g compactly supported on some open set G ⊂ E, we can speak of (local) weak solutions to ∂t u = Lu on Q where Q = I ×G is a (parabolic) cylinder. All of the following four theorems are classical, proofs can be found in [163] and [164, p. 304]. Theorem E.10 (de Giorgio–Moser–Nash regularity) Assume E is a strongly local, strongly regular Dirichlet form on L2 (E, m) which satisfies (I)–(III). Then there exist constants η R ∈ (0, 1) and CR (depending only on N, CP , i.e. the doubling and Poincar´e constants of E) so that3 η 1/2 |s − s | + d (y, y ) sup |u (s, y) − u (s , y )| ≤ CR sup |u| . r u ∈Q 2 (s,y ),(s ,y )∈Q 1 whenever u is a non-negative weak solution of the parabolic partial differ2 cylinder Q ≡ t − 4r , t × B (x, 2r) for ential equation ∂s u = Lu on some 2 some reals t, r > 0. Here, Q1 ≡ t − r2 , t − 2r2 × B (x, r) is a subcylinder of Q2 . Theorem E.11 (parabolic Harnack inequality) Assume E is a strongly local, strongly regular Dirichlet form on L2 (E, m) which satisfies 3 Strictly speaking, the statement is that any non-negative weak solution to ∂ u = Lu s on Q 2 has an m almost-identical version that enjoys this regularity.
Theorem E.11 (parabolic Harnack inequality) Assume E is a strongly local, strongly regular Dirichlet form on L²(E, m) which satisfies (I)–(III). Then there exists a constant C_H, which depends only on N, C_P (the doubling and Poincaré constants of E), such that
\[
\sup_{(s,y)\in Q^-}u(s,y)\;\le\;C_H\inf_{(s,y)\in Q^+}u(s,y)
\]
whenever u is a non-negative weak solution of the parabolic partial differential equation ∂_t u = Lu on some cylinder Q = (t − 4r², t) × B(x, 2r) for some reals t, r > 0. Here, Q⁻ = (t − 3r², t − 2r²) × B(x, r) and Q⁺ = (t − r², t) × B(x, r) are lower and upper subcylinders of Q, separated by some lapse of time.

Theorem E.12 (heat-kernel) Assume E is a strongly local, strongly regular Dirichlet form on L²(E, m) which satisfies (I)–(III). Let L and (P_t) denote the associated self-adjoint operator resp. Markovian semi-group. Then
(i) there exists a continuous function, called the heat-kernel, p : (0, ∞) × E × E → [0, ∞), symmetric in the last two arguments, i.e. p(t, x, y) = p(t, y, x), so that
\[
\forall t>0:\quad P_t f=\int_E p(t,\cdot,y)\,f(y)\,dm(y),\qquad f\in L^2;
\]
(ii) for every fixed x ∈ E, the map (t, y) ↦ p(t, x, y) is a (global, weak) solution to ∂_t u = Lu on (0, ∞) × E;
(iii) it satisfies the Chapman–Kolmogorov equations: for all s < t and x, z ∈ E,
\[
p(t,x,z)=\int_E p(s,x,y)\,p(t-s,y,z)\,dy .
\]
The proof of the heat-kernel existence (cf. [164, p. 304]) follows immediately from an estimate on the operator norm ‖P_t‖_{L¹→L^∞}, which in turn follows from suitable Sobolev inequalities. Let us note that (t, y) ↦ p(t, x, y) is a weak solution to ∂_t u = L_y u and so, by Harnack's inequality, we have
\[
p(t,x,x)\;\le\;c_H\inf\bigl\{p(2t,x,y):\ y\in B(x,t^{1/2})\bigr\}.
\]
Integrating this estimate over the ball B(x, t^{1/2}), we obtain
\[
m\bigl(B(x,t^{1/2})\bigr)\,p(t,x,x)\;\le\;c_H\int_{B(x,t^{1/2})}p(2t,x,y)\,dy\;\le\;c_H,
\]
which leads to an on-diagonal estimate for the heat-kernel. Let us now state the full estimates.
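Before doing so, it may help to keep the Euclidean model case in mind (this aside is not part of the original text): for E(f,f) = ∫_{ℝ^d}|∇f|² dx one has L = Δ and the heat-kernel is the Gauss–Weierstrass kernel
\[
p(t,x,y)=(4\pi t)^{-d/2}\exp\Bigl(-\frac{|x-y|^2}{4t}\Bigr),
\]
for which Chapman–Kolmogorov is a Gaussian convolution identity, and the on-diagonal value p(t,x,x) = (4πt)^{-d/2} is indeed comparable to 1/m(B(x, t^{1/2})), since Lebesgue measure gives m(B(x, t^{1/2})) = c_d t^{d/2}.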
Theorem E.13 (Aronson heat-kernel estimates) Assume E is a strongly local, strongly regular Dirichlet form on L²(E, m) which satisfies (I)–(III) and let p denote its heat-kernel. Then
(i) for every ε > 0, there exists a constant C_U, which depends only on ε, N, C_P (the doubling and Poincaré constants of E), such that the following upper bound holds,
\[
p(t,x,y)\;\le\;\frac{C_U}{\sqrt{m\bigl(B(x,t^{1/2})\bigr)\,m\bigl(B(y,t^{1/2})\bigr)}}\,
\exp\Bigl(-\frac{d(x,y)^2}{(4+\varepsilon)\,t}\Bigr);
\]
(ii) there exists C_L = C_L(E) such that the following lower bound holds,
\[
p(t,x,y)\;\ge\;\frac{1}{C_L}\,\frac{1}{m\bigl(B(x,t^{1/2})\bigr)}\,
\exp\Bigl(-C_L\,\frac{d(x,y)^2}{t}\Bigr),
\]
both valid for all (t, x, y) ∈ (0, ∞) × E × E.

One should observe that the exponent in the upper heat-kernel bound does not involve any constant and implies
\[
\limsup_{t\to0}\;t\log p(t,x,y)\;\le\;-\frac{d(x,y)^2}{4},
\]
as is seen by sending ε → 0 after taking logarithms, multiplying by t and taking the lim sup. (It is known, however, that one cannot take ε = 0 in the actual upper heat-kernel bound.)
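In more detail (this short verification is an added aside, not part of the original text), the upper bound gives
\[
t\log p(t,x,y)\;\le\;t\log C_U\;-\;\frac t2\,\log\Bigl(m\bigl(B(x,t^{1/2})\bigr)\,m\bigl(B(y,t^{1/2})\bigr)\Bigr)\;-\;\frac{d(x,y)^2}{4+\varepsilon}.
\]
The first term vanishes as t → 0. For the second, doubling yields m(B(x,1)) ≤ (2t^{-1/2})^N m(B(x, t^{1/2})) for t ≤ 1, so that −log m(B(x, t^{1/2})) ≤ N log(2t^{-1/2}) − log m(B(x,1)) = O(log(1/t)), and t·O(log(1/t)) → 0. Hence lim sup_{t→0} t log p(t,x,y) ≤ −d(x,y)²/(4+ε) for every ε > 0, and the claim follows on letting ε → 0.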
A famous result due to Varadhan [172], in the setting of diffusions on Euclidean space with elliptic generator, states that the lim sup above can be replaced by a genuine limit and that equality holds. An extension to free nilpotent groups was given by Varopoulos [173]; the extension to the present abstract setting was obtained by Ramírez [141].

Theorem E.14 (Varadhan–Ramírez formula) Assume E is a strongly local, strongly regular Dirichlet form on L²(E, m) which satisfies (I)–(III) and let p denote its heat-kernel. Then, for all x, y ∈ E,
\[
4t\log p(t,x,y)\;\to\;-\,d(x,y)^2\quad\text{as }t\to0 .
\]
E.5 Symmetric diffusions

As is well known, a Dirichlet form (always assumed to be symmetric) induces a symmetric Markov process, although the construction, e.g. [70, Chapter 7], involves some subtleties. In the present context it is easier to proceed directly, i.e. by using the heat-kernel associated with E as the transition density of a time-homogenous diffusion.
Proposition E.15 (associated Markov process) Assume E is a strongly local, strongly regular Dirichlet form on L²(E, m) which satisfies (I)–(III) and let p denote its heat-kernel. For every x ∈ E, there exists a Markov process X = X^x, defined on some probability space (Ω, F, P), P = P^x, with the property that, for any 0 ≤ t₁ < ⋯ < tₙ ≤ 1 and any measurable subset B of the n-fold product of E,
\[
\mathbb P\bigl[(X_{t_1},\dots,X_{t_n})\in B\bigr]
=\int_B p(t_1,x,y_1)\,p(t_2-t_1,y_1,y_2)\cdots p(t_n-t_{n-1},y_{n-1},y_n)\,dy_1\dots dy_n .
\]
In fact, we may take P as a Borel measure on Ω = C_x([0, ∞), E) so that X can be realized as the canonical coordinate process, X_t(ω) ≡ ω_t.

Proof. This is classical and we shall be brief. Thanks to the Chapman–Kolmogorov equations,
\[
\mu_{t_1,\dots,t_n}(B):=\int_B p(t_1,x,y_1)\cdots p(t_n-t_{n-1},y_{n-1},y_n)\,dy_1\dots dy_n
\]
defines a consistent set of finite-dimensional distributions.
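(To spell out the consistency, as an added aside not in the original text, and using implicitly that the semi-group is conservative, ∫_E p(t, y, z) dm(z) = 1: integrating out the last variable, respectively an interior variable, one finds
\[
\int_E p(t_n-t_{n-1},y_{n-1},y_n)\,dy_n=1,
\qquad
\int_E p(t_i-t_{i-1},y_{i-1},y_i)\,p(t_{i+1}-t_i,y_i,y_{i+1})\,dy_i
=p(t_{i+1}-t_{i-1},y_{i-1},y_{i+1}),
\]
so that μ_{t_1,…,t_n} projects correctly onto any sub-collection of the times t₁,…,tₙ.)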
By Kolmogorov's extension theorem, there exists a unique probability measure on E^{[0,∞)} with these finite-dimensional distributions, and the coordinate process ω ↦ ω_t on E^{[0,∞)} is a realization of X with X₀ = x. It is easy to see that Kolmogorov's criterion is satisfied (this follows a fortiori from the upper heat-kernel bounds, although softer arguments are possible) and so we can switch to a version of X with a.s. continuous sample paths. The law of this process is indeed a Borel measure on C_x([0, ∞), E), and the coordinate process on that space has the same law.

Remark E.16 Let E be as in the previous proposition, with doubling and Poincaré constants given by N, C_P respectively. If X is the symmetric diffusion associated with E, started at some fixed point x ∈ E, then the scaled process X^ε(·) = X(ε·) is the symmetric diffusion associated with the scaled Dirichlet form εE. In this context, recall from Proposition E.9 that the associated intrinsic metric was precisely d^ε ≡ d/ε^{1/2} and that doubling/Poincaré hold for εE with identical constants N, C_P.
Proposition E.17 (localized lower heat-kernel bounds) [163] Assume E is a strongly local, strongly regular Dirichlet form on L²(E, m) which satisfies (I)–(III). Write X^x for the associated symmetric diffusion. For x₀ ∈ E and r > 0, define
\[
\xi^x_{B(x_0,r)}=\inf\bigl\{t\ge0:\ X^x_t\notin B(x_0,r)\bigr\}.
\]
Then the measure P[X^x_t ∈ · ; ξ^x_{B(x₀,r)} > t] admits a density with respect to m; we denote it by p_{B(x₀,r)}(t, x, y). Moreover, if x, y are two elements of B(x₀, r) joined by a curve γ which is at d-distance R > 0 from E ∖ B(x₀, r), then there exists a constant C, which depends only on N, C_P (the doubling and Poincaré constants of E), such that
\[
p_{B(x_0,r)}(t,x,y)\;\ge\;\frac1C\,\frac{1}{m\bigl(B(x,\delta^{1/2})\bigr)}\,
\exp\Bigl(-C\,\frac{d(x,y)^2}{t}\Bigr)\exp\Bigl(-C\,\frac{t}{R^2}\Bigr),
\]
where δ = min(t, R²).
E.6 Stochastic analysis

Let us assume, throughout, that E is a strongly local, strongly regular Dirichlet form on L²(E, m) which satisfies (I)–(III) and write X for the associated symmetric diffusion process. It should be no surprise that the strong Gaussian tail estimates for the heat-kernel imply sample path regularity reminiscent of Brownian sample paths. Moreover, we will establish abstract Schilder and support theorems.
E.6.1 Fernique estimates

Lemma E.18 For every η < 1/4 there exists M, only dependent on η and N, C_P (the doubling and Poincaré constants of E), so that
\[
\sup_{x\in E}\ \sup_{0\le s<t\le1}\ \mathbb E^x\Bigl[\exp\Bigl(\eta\,\frac{d(X_s,X_t)^2}{t-s}\Bigr)\Bigr]\;\le\;M<\infty .
\]
In other words, X satisfies the Gaussian integrability condition A.18, uniformly over all possible starting points.

Proof. Since (X_t) is a (time-homogenous) Markov process, we clearly have
\[
\mathbb E^x\Bigl[\exp\Bigl(\eta\,\frac{d(X_s,X_t)^2}{t-s}\Bigr)\Bigr]
\;\le\;\sup_{x\in E}\ \mathbb E^x\Bigl[\exp\Bigl(\eta\,\frac{d(x,X_{t-s})^2}{t-s}\Bigr)\Bigr].
\]
We now fix s < t in [0, 1] and consider the scaled process X̃(·) ≡ X((t−s)·). Following Remark E.16, the corresponding scaled (intrinsic) metric is d̃ ≡ d/|t−s|^{1/2} and so
\[
\mathbb E^x\Bigl[\exp\Bigl(\eta\,\frac{d(X_s,X_t)^2}{t-s}\Bigr)\Bigr]
\;\le\;\sup_{x\in E}\ \mathbb E^x\Bigl[\exp\Bigl(\eta\,\tilde d\bigl(\tilde X_0,\tilde X_1\bigr)^2\Bigr)\Bigr].
\]
Now, the heat-kernel estimates for X̃ hold with constants only depending on N, C_P (i.e. independent of the scaling) and so we obtain
\[
\mathbb E^x\Bigl[\exp\Bigl(\eta\,\tilde d\bigl(\tilde X_0,\tilde X_1\bigr)^2\Bigr)\Bigr]
\;\le\;c_1\int_E\exp\Bigl(-\Bigl(\frac{1}{4(1+\varepsilon)}-\eta\Bigr)\,d(x,y)^2\Bigr)\,dy,
\]
where η < 1/4, ε > 0 is small enough so that η < 1/(4(1+ε)), and c₁ = c₁(ε, η). The last integral is of the form
\[
(\ast)=\int_E f\bigl(d(x,y)\bigr)\,dy\;\le\;c_3\,N\int_0^\infty f(r)\,r^{N-1}\,dr\;<\;\infty,
\]
where f(r) = e^{−c₂r²}, c₂ = 1/(4(1+ε)) − η and N denotes the doubling constant. To see this, let us first remark that the doubling property (II) implies
\[
\forall r\ge1,\ x\in E:\qquad m\bigl(B(x,r)\bigr)\le(2r)^N\,m\bigl(B(x,1)\bigr);
\tag{E.4}
\]
as is seen by taking n as the smallest integer such that r ≤ 2ⁿ, so that m(B(x, r)) ≤ 2^{nN} m(B(x, r/2ⁿ)) ≤ (2r)^N m(B(x, 1)). We then have
\[
\int_E f\bigl(d(x,y)\bigr)\,dy
=\lim_{R\to\infty}\int_0^R f\;d\bigl(m(B(x,\cdot))\bigr)
\qquad\text{(as a Riemann--Stieltjes integral)}
\]
\[
=-\int_0^\infty f'(r)\,m\bigl(B(x,r)\bigr)\,dr
\qquad\text{(integration by parts)}
\]
\[
\le\;c_3\int_0^\infty|f'(r)|\,r^N\,dr
\qquad\text{(from (E.4) and }-f'\equiv|f'|)
\]
\[
\le\;c_3\,N\int_0^\infty r^{N-1}f(r)\,dr
\qquad\text{(integration by parts again).}
\]
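As an added remark (not in the original text), the last integral can be evaluated explicitly, which makes the finiteness, and the dependence of M on η, ε and N, transparent: substituting u = c₂r²,
\[
N\int_0^\infty r^{N-1}e^{-c_2r^2}\,dr\;=\;\frac N2\,c_2^{-N/2}\,\Gamma\!\Bigl(\frac N2\Bigr)\;<\;\infty,
\qquad c_2=\frac{1}{4(1+\varepsilon)}-\eta>0 .
\]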
Proposition E.19 For every α ∈ [0, 1/2), there exists η > 0, only depending on N, C_P (the doubling and Poincaré constants of E), so that
\[
\sup_{x\in E}\ \mathbb E^x\Bigl[\exp\Bigl(\eta\,|X|^2_{\alpha\text{-H\"ol};[0,1]}\Bigr)\Bigr]<\infty .
\]
Proof. Immediate from Section A.4 of Appendix A.
E.6.2 Schilder's theorem

We can now prove a sample path large deviation statement for the family (X(ε·) : ε > 0). To this end, let us recall our notation
\[
|x|^2_{W^{1,2}}\;\equiv\;\sup_{D\in\mathcal D[0,1]}\ \sum_{i:\,t_i\in D}\frac{d\bigl(x_{t_i},x_{t_{i+1}}\bigr)^2}{|t_{i+1}-t_i|},
\qquad x\in C([0,1],E),
\]
and, writing x^D for the piecewise geodesic approximation based on some D = (t_i),
\[
\bigl|x^D\bigr|^2_{W^{1,2}}\;=\;\sum_{i:\,t_i\in D}\frac{d\bigl(x_{t_i},x_{t_{i+1}}\bigr)^2}{|t_{i+1}-t_i|},
\]
as was seen in Exercise 5.24.
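As a quick sanity check (added here, not part of the original text): for a piecewise linear path x^D in ℝ^d with d(a,b) = |a − b|, the quantity above is the usual Sobolev energy,
\[
\bigl|x^D\bigr|^2_{W^{1,2}}
=\sum_{i:\,t_i\in D}\frac{|x_{t_{i+1}}-x_{t_i}|^2}{|t_{i+1}-t_i|}
=\int_0^1\bigl|\dot x^D_t\bigr|^2\,dt,
\]
since ẋ^D is constant, equal to (x_{t_{i+1}} − x_{t_i})/(t_{i+1} − t_i), on each interval [t_i, t_{i+1}].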
We then have

Theorem E.20 (Schilder's theorem) Assume E is a strongly local, strongly regular Dirichlet form on L²(E, m) which satisfies (I)–(III). Write X for the symmetric diffusion associated with E, started at some fixed point o ∈ E, and set X^ε(t) = X(εt). Then the family (X^ε : ε > 0) satisfies a large deviation principle. More precisely, if P_ε = (X^ε)_* P denotes the law of X^ε, viewed as a Borel measure on the Polish space (C_o([0, 1], E), d_∞), then (P_ε : ε > 0) satisfies a large deviation principle on this space with good rate function given by⁴
\[
I(x)\;=\;\frac14\,|x|^2_{W^{1,2};[0,1]}\;\in\;[0,\infty],
\tag{E.5}
\]
defined for any x ∈ C_o([0, 1], E).
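To see, heuristically, that 1/4 is the natural constant, one may check the Euclidean model case (an added aside, not part of the original text): for E(f,f) = ∫_{ℝ^d}|∇f|² dx the generator is Δ rather than ½Δ, so the associated diffusion is a Brownian motion run at twice the usual speed, X_t = B_{2t}, and X(εt) = B(2εt) equals √(2ε)B(t) in law. Classical Schilder's theorem for √δ B (here with δ = 2ε) then gives, informally, for absolutely continuous h started at 0,
\[
\varepsilon\log\mathbb P\bigl[X(\varepsilon\cdot)\approx h\bigr]
=\varepsilon\log\mathbb P\bigl[\sqrt{2\varepsilon}\,B\approx h\bigr]
\;\xrightarrow[\varepsilon\to0]{}\;
-\frac12\cdot\frac12\int_0^1|\dot h_t|^2\,dt
=-\frac14\,|h|^2_{W^{1,2};[0,1]},
\]
in agreement with (E.5).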
Proof. (Upper bound⁵) Write x for a generic path in C_o([0, 1], E) and let x^m denote the piecewise geodesic approximation of x, interpolated at the points of D_m = {i/m : i = 0, …, m}. For brevity, write H for W_o^{1,2}([0, 1], E) and
\[
|x|^2_H:=|x|^2_{W^{1,2}}=\sup_{D\subset[0,1]}\ \sum_i\frac{d\bigl(x_{t_i},x_{t_{i+1}}\bigr)^2}{|t_{i+1}-t_i|}.
\]
Step 1: For G open and non-empty, l := inf{|h|²_H : h ∈ G ∩ H} < ∞ and so
\[
\mathbb P_\varepsilon\bigl[x^m\in G\bigr]
=\mathbb P_\varepsilon\bigl[x^m\in G\cap H\bigr]
\le\mathbb P_\varepsilon\bigl[\,|x^m|^2_H\ge l\,\bigr].
\]

⁴ Recall that |x|²_{W^{1,2};[0,1]} ≡ sup_{D⊂[0,1]} Σ_i d(x_{t_i}, x_{t_{i+1}})²/|t_{i+1} − t_i|.
⁵ The argument follows closely the proof of Schilder's theorem for Brownian motion, Theorem 13.38.
By Chebyshev's inequality, it follows that P_ε[x^m ∈ G] is bounded by⁶
\[
e^{-l\eta/\varepsilon}\,\mathbb E_\varepsilon\Bigl[\exp\Bigl(\frac\eta\varepsilon\,|x^m|^2_H\Bigr)\Bigr]
=e^{-l\eta/\varepsilon}\,\mathbb E_\varepsilon\Bigl[\exp\Bigl(\frac\eta\varepsilon\sum_{i=1}^m m\,d\bigl(x_{\frac{i-1}m},x_{\frac im}\bigr)^2\Bigr)\Bigr]
=e^{-l\eta/\varepsilon}\,\mathbb E\Bigl[\prod_{i=1}^m\exp\Bigl(\eta\,\frac{d\bigl(X_{\varepsilon\frac{i-1}m},X_{\varepsilon\frac im}\bigr)^2}{\varepsilon/m}\Bigr)\Bigr]
\le e^{-l\eta/\varepsilon}\,M_\eta^m,
\]
where we used the Markov property in the last estimate; M_η is the constant of Lemma E.18, finite for any η < 1/4. It follows that
\[
\limsup_{\varepsilon\to0}\ \varepsilon\log\mathbb P_\varepsilon\bigl[x^m\in G\bigr]\le-l\eta,
\]
and upon sending η ↑ 1/4 this shows that
\[
\limsup_{\varepsilon\to0}\ \varepsilon\log\mathbb P_\varepsilon\bigl[x^m\in G\bigr]\le-\frac l4=-I(G).
\]
Step 2: We show that the piecewise geodesic approximation x^m is an exponentially good approximation of x, in the sense that, for every δ > 0,
\[
\limsup_{\varepsilon\to0}\ \varepsilon\log\mathbb P_\varepsilon\bigl[d_{\infty;[0,1]}(x^m,x)\ge\delta\bigr]\;\to\;-\infty
\quad\text{as }m\to\infty .
\]
Indeed, fix α ∈ [0, 1/2) and observe that |X|_{α-Höl;[0,1]} has a Gaussian tail. Using
\[
\sup_D\ \bigl|X^D\bigr|_{\alpha\text{-H\"ol};[0,1]}\le3\,|X|_{\alpha\text{-H\"ol};[0,1]}
\]
(thanks to Proposition 5.20) and d(X_t, X^D_t) ≤ d(X_t, X_{t_D}) + d(X_{t_D}, X^D_t), it readily follows that
\[
\mathbb P_\varepsilon\bigl[d_{\infty;[0,1]}(x^m,x)\ge\delta\bigr]
\le\mathbb P\bigl[d_{\infty;[0,1]}\bigl(X^m(\varepsilon\cdot),X(\varepsilon\cdot)\bigr)\ge\delta\bigr]
=\mathbb P\Bigl[\sup_{t\in[0,1]}d^\varepsilon\bigl(X^m_{\varepsilon t},X_{\varepsilon t}\bigr)\ge\frac{\delta}{\varepsilon^{1/2}}\Bigr]
\le\mathbb P\Bigl[\sup_{0\le s<t\le1}\frac{d^\varepsilon\bigl(X^{(\varepsilon)}_s,X^{(\varepsilon)}_t\bigr)}{|t-s|^\alpha}\ \ge\ \frac{\delta\,m^\alpha}{4\,\varepsilon^{1/2}}\Bigr].
\]
The proof is then easily finished by noting that, for α ∈ (0, 1/2), the corresponding α-Hölder "norm" of X^{(ε)}, with respect to d^ε, has a Gaussian tail which depends only on the doubling and Poincaré constants, both of which are independent of ε.

⁶ E_ε denotes expectation with respect to P_ε.
Step 3: Exactly as in the Brownian motion case, Theorem 13.38.

Proof. (Lower bound) It is enough to consider an open ball of fixed radius, say 2δ, centred at some h ∈ H. Write again D_m = {i/m : i = 0, …, m} and set
\[
\bar B^m(h,\delta)=\bigl\{x\in C_o([0,1],E):\ \forall t\in D_m:\ d\bigl(x(t),h(t)\bigr)\le\delta\bigr\}.
\]
Writing B(h, 2δ) ⊂ C([0, 1], E) for the open ball of radius 2δ in the uniform distance, centred at h, we can estimate
\[
\mathbb P_\varepsilon\bigl[B(h,2\delta)\bigr]\;\ge\;\mathbb P_\varepsilon\bigl[\bar B^m(h,\delta)\bigr]-\mathbb P_\varepsilon\bigl[\bar B^m(h,\delta)\setminus B(h,2\delta)\bigr].
\]
The second term can be handled with the upper bound already proven. Indeed, let us assume that m is large enough so that
\[
\max_{i=1,\dots,m}\ |h|_{0;\bigl[\frac{i-1}m,\frac im\bigr]}<\delta/2 .
\]
It then follows that, for any x(·) ∈ B̄^m(h, δ) ∖ B(h, 2δ), there exist an index i and a time t ∈ [(i−1)/m, i/m] so that d(x_t, x_{(i−1)/m}), d(x_t, x_{i/m}) ≥ δ/2. Since one of the two gaps t − (i−1)/m and i/m − t is at most 1/(2m), we see that |x|²_{W^{1,2}} ≥ (δ/2)²·2m = mδ²/2. Hence, using the upper bound with the closed set B̄^m(h, δ) ∖ B(h, 2δ), we see that
\[
\limsup_{\varepsilon\to0}\ \varepsilon\log\mathbb P_\varepsilon\bigl[\bar B^m(h,\delta)\setminus B(h,2\delta)\bigr]\;\le\;-\frac14\,\frac{m\delta^2}{2}\;\to\;-\infty\quad\text{as }m\to\infty,
\]
so that the other term, P_ε[B̄^m(h, δ)], gives the main contribution. Writing
\[
\mathbb P_\varepsilon\bigl[\bar B^m(h,\delta)\bigr]
=\int_{A_1}\!\!\cdots\int_{A_m}\ \prod_{i=1}^m p\Bigl(\frac\varepsilon m,x_{i-1},x_i\Bigr)\,dx_1\dots dx_m
\qquad(\text{with }x_0:=o),
\]
we can normalize the measure on each ball A_i = B̄(h_{i/m}, δ) by dividing through |A_i|, so that by Jensen's inequality log P_ε[B̄^m(h, δ)] is bounded from below by
\[
\sum_{i=1}^m\log|A_i|
\;+\;\frac{1}{|A_1|\cdots|A_m|}\int_{A_1}\!\!\cdots\int_{A_m}\ \sum_{i=1}^m\log p\Bigl(\frac\varepsilon m,x_{i-1},x_i\Bigr)\,dx_1\dots dx_m .
\]
Then
\[
\lim_{\varepsilon\to0}\ \varepsilon\log\mathbb P_\varepsilon\bigl[\bar B^m(h,\delta)\bigr]
\;\ge\;\lim_{\varepsilon\to0}\ \frac{1}{|A_1|\cdots|A_m|}\int_{A_1}\!\!\cdots\int_{A_m}\ \sum_{i=1}^m\varepsilon\log p\Bigl(\frac\varepsilon m,x_{i-1},x_i\Bigr)\,dx_1\dots dx_m
\]
\[
\;\ge\;-\frac14\,\frac{m}{|A_1|\cdots|A_m|}\int_{A_1}\!\!\cdots\int_{A_m}\ \sum_{i=1}^m d(x_{i-1},x_i)^2\,dx_1\dots dx_m,
\]
where we used the Varadhan–Ramírez formula (Theorem E.14), which gives ε log p(ε/m, x_{i−1}, x_i) → −(m/4) d(x_{i−1}, x_i)² as ε → 0. By continuity of d, we can now send δ → 0 and see that
\[
\lim_{\varepsilon\to0}\ \varepsilon\log\mathbb P_\varepsilon\bigl[\bar B^m(h,\delta)\bigr]
\;\ge\;-\frac14\sum_{i=1}^m\frac{d\bigl(h_{\frac{i-1}m},h_{\frac im}\bigr)^2}{1/m}
\;\ge\;-\frac14\,|h|^2_{W^{1,2};[0,1]}\;=\;-I(h).
\]
The proof is then finished.
E.6.3 Support theorem
Theorem E.21 (support) Assume E is a strongly local, strongly regular Dirichlet form on L²(E, m) which satisfies (I)–(III). Write X for the symmetric diffusion associated with E, started at some fixed point x ∈ E. Then there exists a constant C, only dependent on N, C_P (the doubling and Poincaré constants of E), so that for any path h ∈ W_x^{1,2}([0, 1], E) and any ε ∈ (0, 1), we have
\[
\mathbb P^x\Bigl[\sup_{t\in[0,1]}d(X_t,h_t)\le\varepsilon\Bigr]
\;\ge\;\exp\Bigl(-C\,\frac{1+|h|^2_{W^{1,2};[0,1]}}{\varepsilon^2}\Bigr).
\]
In particular, if the law of X^x, viewed as a Borel measure on C_x([0, 1], E), is denoted by X_*P^x, then
\[
\operatorname{supp}\bigl(X_*\mathbb P^x\bigr)=C_x([0,1],E).
\]
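For orientation (an added model-case aside, not part of the original text), the Euclidean/Brownian analogue of this quantitative estimate is classical: if B is a standard d-dimensional Brownian motion and h lies in its Cameron–Martin space H, then the Cameron–Martin theorem combined with a symmetry argument for the ball, and the standard small-ball estimate, give
\[
\mathbb P\Bigl[\sup_{t\in[0,1]}|B_t-h_t|\le\varepsilon\Bigr]
\;\ge\;e^{-\frac12\|h\|_H^2}\ \mathbb P\Bigl[\sup_{t\in[0,1]}|B_t|\le\varepsilon\Bigr]
\;\ge\;e^{-\frac12\|h\|_H^2}\,c\,e^{-c'/\varepsilon^2}
\;\ge\;\exp\Bigl(-C\,\frac{1+\|h\|_H^2}{\varepsilon^2}\Bigr)
\]
for ε ∈ (0, 1) and suitable constants c, c', C.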
Proof. As a preliminary remark, let us note that for any M ≥ 1 we have
\[
\inf_{\substack{r>0\\ a,b\in E:\ d(a,b)\le Mr}}\ \frac{m\bigl(B(a,r)\bigr)}{m\bigl(B(b,r)\bigr)}\;\ge\;\frac{1}{(4M)^N}\;>\;0,
\]
as is seen from (E.2) and
\[
m\bigl(B(b,r)\bigr)\le m\bigl(B(a,r+d(a,b))\bigr)\le m\bigl(B(a,(M+1)r)\bigr)\le\bigl(2(M+1)\bigr)^N m\bigl(B(a,r)\bigr).
\]
We now turn to the actual proof, given in three steps.

Step 1: From the very definition of W^{1,2}-regularity, (s, t) ↦ |h|²_{W^{1,2};[s,t]} is super-additive (in fact, additive) and
\[
d(h_s,h_t)\;\le\;|t-s|^{1/2}\,|h|_{W^{1,2};[0,1]}.
\tag{E.6}
\]
By the Markov property, and defining y₀ = x and t_i = i/n, where n will be chosen later (as a function of ε),
\[
\mathbb P^x\Bigl[\forall i=1,\dots,n:\ d(X_{t_i},h_{t_i})<\frac{1}{n^{1/2}}\ \text{ and }\ \sup_{t_{i-1}\le t\le t_i}d\bigl(X_t,h_{t_{i-1}}\bigr)<\varepsilon\Bigr]
\]
\[
=\int_{B(h_{t_1},n^{-1/2})}\!\!\cdots\int_{B(h_{t_n},n^{-1/2})}
p_{B(h_{t_0},\varepsilon)}\Bigl(\frac1n,y_0,y_1\Bigr)\cdots
p_{B(h_{t_{n-1}},\varepsilon)}\Bigl(\frac1n,y_{n-1},y_n\Bigr)\,dy_1\cdots dy_n
\;=:\;q_{\varepsilon,n}.
\]
We join the points y_i and y_{i+1} by the curve γ_i, which is the concatenation of three geodesic curves joining first y_i with h_{t_i}, then h_{t_i} with h_{t_{i+1}} and finally h_{t_{i+1}} with y_{i+1}. Using d(y_i, h_{t_i}) ≤ n^{−1/2} for all i, we see that
\[
d(y_i,y_{i+1})\;\le\;\operatorname{length}(\gamma_i)\;\le\;2n^{-1/2}+d\bigl(h_{t_i},h_{t_{i+1}}\bigr),
\tag{E.7}
\]
and also that γ_i remains in the ball
\[
B\bigl(h_{t_i},\,n^{-1/2}+d(h_{t_i},h_{t_{i+1}})\bigr)\;\subset\;B\bigl(h_{t_i},\,n^{-1/2}+n^{-1/2}|h|_{W^{1,2};[0,1]}\bigr),
\]
where the last inclusion is due to (E.6). Choose n as the smallest integer such that ε ≥ n^{−1/2}(2 + |h|_{W^{1,2};[0,1]}). The curve γ_i then stays inside B(h_{t_i}, ε); more precisely,
\[
R_i\;\equiv\;d\bigl(\gamma_i,\,B(h_{t_i},\varepsilon)^c\bigr)\;\ge\;\varepsilon-n^{-1/2}\bigl(1+|h|_{W^{1,2};[0,1]}\bigr)\;\ge\;n^{-1/2}.
\]
In particular, δ := min(1/n, R_i²) = 1/n, which also implies nR_i² ≥ 1. If c₁ denotes the constant whose existence is guaranteed by the localized lower heat-kernel bound (Proposition E.17), then
\[
p_{B(h_{t_i},\varepsilon)}\Bigl(\frac1n,y_i,y_{i+1}\Bigr)
\;\ge\;\frac{1}{c_1}\,\frac{1}{m\bigl(B(y_i,\delta^{1/2})\bigr)}\,
e^{-c_1 n\,d(y_i,y_{i+1})^2}\,e^{-c_1\frac{1}{nR_i^2}}
\;\ge\;e^{-9c_1}\,\frac{1}{c_1\,m\bigl(B(y_i,n^{-1/2})\bigr)}\,
\exp\bigl(-2c_1\,|h|^2_{W^{1,2};[t_i,t_{i+1}]}\bigr),
\]
where we set e^{−c₂} := e^{−9c₁}/c₁, and where we used nR_i² ≥ 1 together with (E.7), which implies n d(y_i, y_{i+1})² ≤ 8 + 2|h|²_{W^{1,2};[t_i,t_{i+1}]}, in the last line. With this lower bound at hand, and the preliminary remark, noting that
\[
d\bigl(h_{t_{i+1}},y_i\bigr)\;\le\;n^{-1/2}+d\bigl(h_{t_i},h_{t_{i+1}}\bigr)\;\le\;n^{-1/2}\bigl(1+|h|_{W^{1,2};[t_i,t_{i+1}]}\bigr)\;=:\;M\,n^{-1/2},
\]
it follows immediately from the definition of q_{ε,n} that
\[
q_{\varepsilon,n}\;\ge\;\prod_{i=0}^{n-1}\ e^{-c_2}\,
\frac{m\bigl(B(h_{t_{i+1}},n^{-1/2})\bigr)}{m\bigl(B(y_i,n^{-1/2})\bigr)}\,
e^{-2c_1|h|^2_{W^{1,2};[t_i,t_{i+1}]}}
\;\ge\;\prod_{i=0}^{n-1}\ \frac{e^{-c_2}}{4^N}\,\bigl(1+|h|_{W^{1,2};[t_i,t_{i+1}]}\bigr)^{-N}\,
e^{-2c_1|h|^2_{W^{1,2};[t_i,t_{i+1}]}} .
\]
Of course, by making c₁ larger we can absorb the polynomial factor (1 + |h|_{W^{1,2};[t_i,t_{i+1}]})^{−N} into the exponential factor. Thus, for c₃ large enough, also chosen such that e^{−c₂}/4^N ≥ e^{−c₃}, we have
\[
q_{\varepsilon,n}\;\ge\;e^{-nc_3}\prod_{i=0}^{n-1}\exp\bigl(-2c_3\,|h|^2_{W^{1,2};[t_i,t_{i+1}]}\bigr)
\;\ge\;e^{-nc_3}\exp\bigl(-2c_3\,|h|^2_{W^{1,2};[0,1]}\bigr).
\]
We chose n such that (n − 1)^{−1/2} > ε/(2 + |h|_{W^{1,2};[0,1]}) ≥ n^{−1/2}. Hence
\[
q_{\varepsilon,n}\;\ge\;e^{-c_3}\exp\Bigl(-c_3\,\frac{\bigl(2+|h|_{W^{1,2};[0,1]}\bigr)^2}{\varepsilon^2}\Bigr)\exp\bigl(-2c_3\,|h|^2_{W^{1,2};[0,1]}\bigr)
\;\ge\;\exp\Bigl(-c_4\,\frac{\bigl(2+|h|_{W^{1,2};[0,1]}\bigr)^2}{\varepsilon^2}\Bigr)\exp\bigl(-2c_4\,|h|^2_{W^{1,2};[0,1]}\bigr),
\]
for a suitable constant c₄ (using ε ≤ 1).

Step 2: We first note that, from t_i = i/n and our choice of n, d(h_{t_{i−1}}, h_t) ≤ n^{−1/2}|h|_{W^{1,2};[0,1]} ≤ ε for t ∈ [t_{i−1}, t_i]. The probability of sup_{0≤t≤1} d(X_t, h_t) < 2ε equals
\[
\mathbb P^x\Bigl[\max_{i=1,\dots,n}\ \sup_{t_{i-1}\le t\le t_i}d(X_t,h_t)<2\varepsilon\Bigr]
\;\ge\;\mathbb P^x\Bigl[\max_{i=1,\dots,n}\ \sup_{t_{i-1}\le t\le t_i}\Bigl(d\bigl(X_t,h_{t_{i-1}}\bigr)+d\bigl(h_{t_{i-1}},h_t\bigr)\Bigr)<2\varepsilon\Bigr]
\]
\[
\;\ge\;\mathbb P^x\Bigl[\forall i=1,\dots,n:\ \sup_{t_{i-1}\le t\le t_i}d\bigl(X_t,h_{t_{i-1}}\bigr)<\varepsilon\Bigr]
\;\ge\;q_{\varepsilon,n},
\]
and the last probability is exactly the quantity estimated from below in Step 1. This finishes the proof of the estimate in the statement of the theorem (renaming 2ε as ε only changes the constant C).

Step 3: The quantitative estimate established in Step 2 plainly implies that W^{1,2} ⊂ supp(X_*P^x) and hence that the uniform closure of W^{1,2} is contained in supp(X_*P^x). To see the converse, we use again the geodesic nature of the state space and recall from Lemma 5.19 that X^D → X uniformly on [0, 1] as |D| → 0. Since X^D(ω) ∈ W^{1,2} for every ω, this easily implies that supp(X_*P^x) is contained in the uniform closure of W^{1,2}.
E.7 Comments

The basic theory of quadratic forms appears in Davies [38], for instance. We then follow Fukushima et al. [70], Varopoulos et al. [174] and especially Sturm [163]. Proposition E.5 is taken from Sturm [165]. We are unaware of any precise references to the material of Section E.6.2.
Frequently used notation

Finite-dimensional objects
R^d : d-dimensional Euclidean space with basis {e_1, …, e_d}
(R^d)^{⊗k} : k-tensors over R^d, see page 129
T^N(R^d) : step-N truncated tensor algebra, see page 129
π_k : projection from T^N(R^d) onto (R^d)^{⊗k}
δ_λ : dilation map, see page 133
G^N(R^d) : step-N free nilpotent group over R^d, see page 142
g^N(R^d) : step-N free nilpotent Lie algebra over R^d, see page 139
t^N(R^d), 1 + t^N(R^d) : see page 134
|·| : Euclidean norm, on R^d or (R^d)^{⊗k} for some k ∈ {1, …, N}
‖·‖ : Carnot–Caratheodory norm, on G^N(R^d), see page 144
U_1, …, U_d : invariant vector fields on G^N(R^d), see page 149
u_1, …, u_d : invariant vector fields on G^N(R^d), see page 457

Paths and path-spaces
x = (x_t : t ∈ [0,T]) : a generic path with values in some metric space
D : a dissection (t_j) of [0,T]
|D| : the mesh of D, i.e. max_j |t_{j+1} − t_j|
D[0,T] : the set of all dissections of [0,T]
x^D : a piecewise linear or geodesic approximation, see page 32
C([0,T], E) : continuous paths with values in a metric space, see page 19
C^{α-Höl}([0,T], E) : Hölder continuous paths, with exponent α, see page 77
C^{p-var}([0,T], E) : continuous paths of finite p-variation, see page 77
W^{1,p}([0,T], E) : paths with W^{1,p}-Sobolev regularity, see page 42
W^{δ,p}([0,T], E) : fractional Sobolev (or Besov) paths, see page 87
ω(s,t) : a control function, see page 21
ω([s,t],[u,v]) : a 2D control function, see page 105
f^{D,D'}, f^{µ,µ̃} : approximations to f = f(s,t), see pages 108, 110

Rough paths and rough path-spaces
x = (x_t) : a path with values in the group G^N(R^d); x_{s,t} = x_s^{−1} ⊗ x_t its (group) increment, see page 195
C^{1/p-Höl}([0,T], G^{[p]}(R^d)), C^{p-var}([0,T], G^{[p]}(R^d)), C^{0,1/p-Höl}([0,T], G^{[p]}(R^d)), C^{0,p-var}([0,T], G^{[p]}(R^d)) : see page 195
‖·‖_{p-var}, ‖·‖_{1/p-Höl} : see page 165
d_{p-var}, d_{1/p-Höl} : homogenous distances, see page 166
ρ_{p-var}, ρ_{1/p-Höl;[0,T]} : inhomogenous distances, see page 170
C^{(p,q)-var}([0,T], R^d ⊕ R^{d'}), ‖·‖_{p,q-ω_1,ω_2}, ρ_{p,q-ω_1,ω_2} : see pages 197, 170
Operations on rough path-spaces
S_N(x) : Lyons lift, see page 186
S_N(x, h) : Young pairing, see page 204
T_h(x) : translation operator, see page 209

Differential equations
V = (V_1, …, V_d) : a collection of vector fields
x, x : (smooth, rough) driving signal
π_(V)(0, y_0; x), π_(V)(0, y_0; x) : ODE, RDE solution, see pages 55, 224
π_(V)(0, y_0; x) : full RDE solution, see page 241
∫ φ(x) dx : rough integral, see page 253

Stochastic processes
β : real-valued standard Brownian motion, see page 327
B : R^d-valued standard Brownian motion, see page 327
dB, ◦dB : Itô, Stratonovich differential
B : G²(R^d)-valued enhanced Brownian motion, see page 333
M^c_{0,loc}(R^d) : class of R^d-valued continuous local martingales, see page 386
M : R^d-valued continuous (semi-)martingale, see page 386
dM, ◦dM : Itô, Stratonovich differential
⟨M⟩ : quadratic variation process, componentwise defined, see page 386
M : G²(R^d)-valued enhanced (semi-)martingale, see page 387
X : R^d-valued Gaussian process
H : Cameron–Martin (reproducing kernel Hilbert) space of X
R : covariance of a Gaussian process, typically of finite ρ-variation, ρ ≥ 1
β^H : real-valued fractional Brownian motion, see page 405
B^H : R^d-valued fractional Brownian motion, see page 431
B^H : G^{[1/H]}(R^d)-valued enhanced fractional Brownian motion, see page 431
X : G^{[2ρ]}(R^d)-valued enhanced Gaussian process, see page 429
X^a, X^{a,x} : R^d-valued Markov process, generator ∂_i(a^{ij}∂_j), see page 454
X^a, X^{a,x} : G²(R^d)-valued Markov process, see page 461
p^a(t,x,y) : heat kernel, see page 461
E^a : a Dirichlet form, see page 457
L^a : generator (in divergence form), see page 460
References [1] A. A. Agrachev. Introduction to optimal control theory. In Mathematical Control Theory, Part 1, 2 (Trieste, 2001), ICTP Lecture Notes, VIII, pages 453–513 (electronic). Abdus Salam International Center for Theoretical Physics, Trieste, 2002. [2] S. Aida, S. Kusuoka and D. Stroock. On the support of Wiener functionals. In Asymptotic Problems in Probability Theory: Wiener Functionals and Asymptotics (Sanda/Kyoto, 1990), Volume 284 of Pitman Research Notes in Mathematics Series, pages 3–34. Longman Science and Technology, Harlow, 1993. [3] S. Aida. Semi-classical limit of the bottom of spectrum of a Schr¨ odinger operator on a path space over a compact Riemannian manifold. J. Funct. Anal., 251(1):59–121, 2007. [4] R. Azencott. Formule de Taylor stochastique et d´eveloppement asymptotique d’int´egrales de Feynman. In Seminar on Probability, XVI, Supplement, pages 237–285. Springer, Berlin, 1982. [5] P. Baldi, G. Ben Arous and G. Kerkyacharian. Large deviations and the Strassen theorem in H¨ older norm. Stochastic Process. Appl., 42(1):171–180, 1992. [6] R. F. Bass and T. Kumagai. Laws of the iterated logarithm for some symmetric diffusion processes. Osaka J. Math., 37(3):625–650, 2000. [7] F. Baudoin. An Introduction to the Geometry of Stochastic Flows. Imperial College Press, London, 2004. [8] F. Baudoin and L. Coutin. Operators associated with a stochastic differential equation driven by fractional Brownian motions. Stochastic Process. Appl., 117(5):550–574, 2007. [9] F. Baudoin and M. Hairer. A version of H¨ ormander’s theorem for the fractional Brownian motion. Probab. Theory Related Fields, 139(3– 4):373–395, 2007. [10] G. Ben Arous. Flots et s´eries de Taylor stochastiques. Probab. Theory Related Fields, 81(1):29–77, 1989. [11] G. Ben Arous and F. Castell. Flow decomposition and large deviations. J. Funct. Anal., 140(1):23–67, 1996.
[12] G. Ben Arous, M. Gr˘ adinaru and M. Ledoux. H¨ older norms and the support theorem for diffusions. Ann. Inst. H. Poincar´e Probab. Statist., 30(3):415–436, 1994. [13] P. Billingsley. Convergence of Probability Measures. John Wiley & Sons Inc., New York, 1968. [14] R. L. Bishop and R. J. Crittenden. Geometry of Manifolds. AMS Chelsea Publishing, Providence, RI, 2001. Reprint of the 1964 original. [15] J.-M. Bismut. Large Deviations and the Malliavin Calculus, Volume 45 of Progress in Mathematics. Birkh¨ auser Boston Inc., Boston, MA, 1984. [16] E. Breuillard, P. Friz and M. Huesmann. From random walks to rough paths, Proc. Amer. Math. Soc., 137:3487–3496, 2009. [17] R. Buckdahn and J. Ma. Pathwise stochastic control problems and stochastic HJB equations. SIAM J. Control Optim., 45(6):2224–2256 (electronic), 2007. [18] R. Buckdahn and J. Ma. Stochastic viscosity solutions for fully nonlinear SPDEs (I). Stochastic Process. Appl., 93:181–204, 2001. [19] R. Buckdahn and J. Ma. Stochastic viscosity solutions for fully nonlinear SPDEs (II). Stochastic Process. Appl., 93:205–228, 2001. [20] R. Buckdahn and J. Ma. Pathwise stochastic Taylor expansion and stochastic viscosity solution for fully nonlinear SPDEs. Ann. Prob., 30(3):1131–1171, 2002. [21] D. Burago, Y. Burago and S. Ivanov. A Course in Metric Geometry, Volume 33 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2001. [22] E. A. Carlen, S. Kusuoka and D. W. Stroock. Upper bounds for symmetric Markov transition functions. Ann. Inst. H. Poincar´e Probab. Statist., 23(2, suppl.):245–287, 1987. [23] M. Caruana and P. Friz. Partial differential equations driven by rough paths. J. Different. Equations, 247(1):140–173, 2009. [24] M. Caruana, P. Friz and H. Oberhauser. A (rough) pathwise approach to a class of non-linear stochastic partial differential equations. arXiv:0902.3352v2, 2009.
[25] T. Cass and P. Friz. Densities for rough differential equations under Hoermander’s condition. Ann. Math., accepted, 2008. [Available for download at http://pjm.math.berkeley.edu/scripts/ coming.php?.jpath=annals] [26] T. Cass, P. Friz and N. Victoir. Non-degeneracy of Wiener functionals arising from rough differential equations. Trans. Amer. Math. Soc., 361:3359–3371, 2009. [27] F. Castell. Asymptotic expansion of stochastic flows. Probab. Theory Related Fields, 96(2):225–239, 1993. [28] K. T. Chen. Integration of paths, geometric invariants and a generalized Baker–Hausdorff formula. Ann. Math. (2), 65:163–178, 1957. [29] K. T. Chen. Integration of paths—a faithful representation of paths by non-commutative formal power series. Trans. Amer. Math. Soc., 89:395–407, 1958. [30] L. Coutin, P. Friz and N. Victoir. Good rough path sequences and applications to anticipating stochastic calculus. Ann. Probab., 35(3):1172–1193, 2007. [31] L. Coutin and A. Lejay. Semi-martingales and rough paths theory. Electron. J. Probab., 10(23):761–785 (electronic), 2005. [32] L. Coutin and Z. Qian. Stochastic analysis, rough path analysis and fractional Brownian motions. Probab. Theory Related Fields, 122(1):108–140, 2002. [33] L. Coutin and N. Victoir. Enhanced Gaussian processes and applications. Preprint, 2005. [34] M. G. Crandall, H. Ishii and P.-L. Lions. User’s guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. (N.S.), 27(1):1–67, 1992. [35] S. Das Gupta, M. L. Eaton, I. Olkin, M. Perlman, L. J. Savage and M. Sobel. Inequalities on the probability content of convex regions for elliptically contoured distributions. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (University of California, Berkeley, CA, 1970/1971), Vol. II: Probability Theory, pages 241–265. University of California Press, Berkeley, CA, 1972. [36] P. Cr´epel and A. Raugi. Th´eor`eme central limite sur les groupes nilpotents. Ann. Inst. H. Poincar´e Sect. B (N.S.), 14(2):145–164, 1978.
[37] A. M. Davie. Differential equations driven by rough paths: an approach via discrete approximation. Appl. Math. Res. Express. AMRX, (2):Art. ID abm009, 40, 2007. [38] E. B. Davies. Heat Kernels and Spectral Theory, Volume 92 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 1989. [39] V. H. de la Pe˜ na and E. Gin´e. Decoupling. Springer-Verlag, New York, 1999. [40] L. Decreusefond. Stochastic integration with respect to Volterra processes. Ann. Inst. H. Poincar´e Probab. Statist., 41(2):123–149, 2005. [41] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications, Volume 38 of Applications of Mathematics (New York). Springer-Verlag, New York, second edition, 1998. [42] J.-D. Deuschel and D. W. Stroock. Large Deviations, Volume 137 of Pure and Applied Mathematics. Academic Press Inc., Boston, MA, 1989. [43] J. Dieudonn´e. Foundations of Modern Analysis. Academic Press, New York, 1969. Enlarged and corrected printing, Pure Appl. Math., 10-I. [44] H. Doss. Liens entre ´equations diff´erentielles stochastiques et ordinaires. C. R. Acad. Sci. Paris S´er. A–B, 283(13):Ai, A939–A942, 1976. [45] B. K. Driver. Analysis tools with applications. Draft, 2003. [46] R. M. Dudley. Sample functions of the Gaussian process. Ann. Prob., 1(1):66–103, 1973. [47] R. M. Dudley and R. Norvaiˇsa. Differentiability of Six Operators on Nonsmooth Functions and p-Variation, Volume 1703 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1999. With the collaboration of Jinghua Qian. [48] R. M. Dudley and R. Norvaiˇsa. An introduction to p-variation and Young integrals – with emphasis on sample functions of stochastic processes. Lecture given at the Centre for Mathematical Physics and Stochastics, Department of Mathematical Sciences, University of Aarhus, 1998. [49] C. Feng and H. Zhao. Rough path integral of local time. C. R. Math. Acad. Sci. Paris, 346(7–8):431–434, 2008.
[50] D. Feyel and A. de la Pradelle. Curvilinear integrals along enriched paths. Electron J. Probab., 11:860–892, 2006. [51] D. Feyel and A. de la Pradelle. On fractional Brownian processes. Potential Anal., 10(3):273–288, 1999. [52] D. Feyel and A. de la Pradelle. Curvilinear integrals along enriched paths. Electron. J. Probab., 11(34):860–892 (electronic), 2006. [53] W. H. Fleming and H. Mete Soner. Controlled Markov Processes and Viscosity Solutions, Volume 25 of Stochastic Modelling and Applied Probability. Springer, New York, second edition, 2006. [54] G. B. Folland and E. M. Stein. Hardy Spaces on Homogeneous Groups, Volume 28 of Mathematical Notes. Princeton University Press, Princeton, NJ, 1982. [55] G. B. Folland. Real Analysis. John Wiley & Sons Inc., New York, second edition, 1999. [56] D. Freedman. Brownian Motion and Diffusion. Springer-Verlag, New York, second edition, 1983. [57] P. Friz, T. Lyons and D. Stroock. L´evy’s area under conditioning. Ann. Inst. H. Poincar´e Probab. Statist., 42(1):89–101, 2006. [58] P. Friz and H. Oberhauser. Rough path limits of the Wong–Zakai type with a modified drift term. J. Funct. Anal., 256(10):3236–3256, 2009. [59] P. Friz and H. Oberhauser. Isoperimetry and rough path regularity, 2007. [60] P. Friz and H. Oberhauser. A generalized Fernique theorem and applications. Preprint, 2009. [61] P. Friz and N. Victoir. Differential equations driven by Gaussian signals. Ann. Inst. H. Poincar´e Probab. Statist., 46 (DOI 10.1214/09AIHP202). [62] P. Friz and N. Victoir. Approximations of the Brownian rough path with applications to stochastic analysis. Ann. Inst. H. Poincar´e Probab. Statist., 41(4):703–724, 2005. [63] P. Friz and N. Victoir. A note on the notion of geometric rough paths. Probab. Theory Related Fields, 136(3):395–416, 2006. [64] P. Friz and N. Victoir. A variation embedding theorem and applications. J. Funct. Anal., 239(2):631–637, 2006.
[65] P. Friz and N. Victoir. Large deviation principle for enhanced Gaussian processes. Ann. Inst. H. Poincar´e Probab. Statist., 43(6):775– 785, 2007. [66] P. Friz and N. Victoir. The Burkholder–Davis–Gundy inequality for enhanced martingales. In S´eminaire de Probabilit´es XLI, Volume 1934 of Lecture Notes in Mathematics Springer, Berlin, 2008. [67] P. Friz and N. Victoir. Euler estimates for rough differential equations. J. Different. Equations, 244(2):388–412, 2008. [68] P. Friz and N. Victoir. On uniformly subelliptic operators and stochastic area. Probab. Theory Related Fields, 142(3–4):475–523, 2008. [69] P. K. Friz. Continuity of the Itˆ o-map for H¨older rough paths with applications to the support theorem in H¨ older norm. In Probability and Partial Differential Equations in Modern Applied Mathematics, Volume 140 of IMA Volumes in Mathematical Applications, pages 117–135. Springer, New York, 2005. ¯ [70] M. Fukushima, Y. Oshima and M. Takeda. Dirichlet Forms and Symmetric Markov Processes, Volume 19 of de Gruyter Studies in Mathematics. Walter de Gruyter & Co., Berlin, 1994. [71] D. Gilbarg and N. S. Trudinger. Elliptic Partial Differential Equations of Second Order. Classics in Mathematics. Springer-Verlag, Berlin, 2001. Reprint of the 1998 edition. [72] M. Gromov. Metric Structures for Riemannian and Non-Riemannian Spaces. Modern Birkh¨ auser Classics. Birkh¨auser Boston Inc., Boston, MA, English Edition, 2007. Based on the 1981 French original, with appendices by M. Katz, P. Pansu and S. Semmes, translated from the French by S. M. Bates. [73] P. Guasoni. No arbitrage under transaction costs, with fractional Brownian motion and beyond. Math. Finance, 16(3):569–582, 2006. [74] M. Gubinelli. Controlling rough paths. J. Funct. Anal., 216(1):86– 140, 2004. [75] M. Gubinelli. Ramifications of rough paths, arXiv:math/060300v1, to appear in J. Different. Equations. [76] M. Gubinelli and S. Tindel. Rough evolution equations, arXiv:0803. 0552v1, to appear in Annals of Probability, 2008. [77] M. Gubinelli, A. Lejay and S. Tindel. Young integrals and SPDEs. Potential Anal., 25(4):307–326, 2006.
[78] I. Gy¨ ongy. The stability of stochastic partial differential equations and applications. I. Stochastics Stochastics Rep., 27(2):129–150, 1989. [79] I. Gy¨ ongy. The stability of stochastic partial differential equations and applications. Theorems on supports. In Stochastic Partial Differential Equations and Applications, II (Trento, 1988), Volume 1390 of Lecture Notes in Mathematics, pages 91–118. Springer, Berlin, 1989. [80] I. Gy¨ ongy. The stability of stochastic partial differential equations. II. Stochastics Stochastics Rep., 27(3):189–233, 1989. [81] I. Gy¨ ongy. The approximation of stochastic partial differential equations and applications in nonlinear filtering. Comput. Math. Appl., 19(1):47–63, 1990. [82] I. Gy¨ ongy. On stochastic partial differential equations. Results on approximations. In Topics in Stochastic Systems: Modelling, estimation and adaptive control, Volume 161 of Lecture Notes in Control and Information Science, pages 116–136. Springer, Berlin, 1991. [83] P. Haj lasz and P. Koskela. Sobolev met Poincar´e. Mem. Amer. Math. Soc., 145(688):x+101, 2000. [84] P. Hartman. Ordinary Differential Equations, Volume 38 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002. Corrected reprint of the second (1982) edition [Birkh¨ auser, Boston, MA; MR0658490 (83e:34002)], with a foreword by P. Bates. [85] B. Hoff. The Brownian frame process as a rough path, arXiv:math/ 0602008v1, 2006. [86] Y. Hu and D. Nualart. Rough path analysis via fractional calculus, Trans. Amer. Math. Soc. 361(5):2689–2718, 2009. [87] Y. Hu and D. Nualart. Differential equations driven by H¨ older continuous functions of order greater than 1/2, Stochast. Anal. Appl., 2:399–413, 2007. [88] N. Ikeda and S. Watanabe. Stochastic Differential Equations and Diffusion Processes. North-Holland Publishing Co., Amsterdam, second edition, 1989. [89] Y. Inahama. A stochastic Taylor-like expansion in the rough path theory. Preprint, 2008. [90] Y. Inahama. Laplace’s method for the laws of heat processes on loop spaces. J. Funct. Anal., 232(1):148–194, 2006.
[91] Y. Inahama and H. Kawabi. Large deviations for heat kernel measures on loop spaces via rough paths. J. London Math. Soc. (2), 73(3):797– 816, 2006. [92] Y. Inahama and H. Kawabi. Asymptotic expansions for the Laplace approximations for Itˆ o functionals of Brownian rough paths. J. Funct. Anal., 243(1):270–322, 2007. [93] D. Jerison. The Poincar´e inequality for vector fields satisfying H¨ ormander’s condition. Duke Math. J., 53(2):503–523, 1986. [94] I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus, Volume 113 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 1991. [95] N. V. Krylov. Introduction to the Theory of Diffusion Processes, Volume 142 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI, 1995. Translated from the Russian manuscript by V. Khidekel and G. Pasechnik. [96] H. Kunita. Stochastic Flows and Stochastic Differential Equations, Volume 24 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1997. Reprint of the 1990 original. ´ Pardoux and P. Protter. Stratonovich stochastic dif[97] T. G. Kurtz, E. ferential equations driven by general semimartingales. Ann. Inst. H. Poincar´e Probab. Statist., 31(2):351–377, 1995. [98] S. Kusuoka. On the regularity of solutions to SDE. In Asymptotic Problems in Probability Theory: Wiener functionals and asymptotics (Sanda/Kyoto, 1990), Volume 284 of Pitman Research Notes in Mathematics Series, pages 90–103. Longman Science and Technology, Harlow, 1993. [99] S. Kusuoka. The nonlinear transformation of Gaussian measure on Banach space and absolute continuity. I. J. Fac. Sci. Univ. Tokyo Sect. IA Math., 29(3):567–597, 1982. [100] J. Lamperti. On convergence of stochastic processes. Trans. Amer. Math. Soc., 104:430–435, 1962. [101] M. Ledoux, Z. Qian and T. Zhang. Large deviations and support theorem for diffusion processes via rough paths. Stochastic Process. Appl., 102(2):265–283, 2002. [102] M. Ledoux. Isoperimetry and Gaussian analysis. In Lectures on Probability Theory and Statistics (Saint-Flour, 1994), Volume 1648 of Lecture Notes in Mathematics, pages 165–294. Springer, Berlin, 1996.
[103] M. Ledoux and M. Talagrand. Probability in Banach Spaces, Volume 23 of Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)]. Springer-Verlag, Berlin, 1991. [104] A. Lejay. An introduction to rough paths. In S´eminaire de Probabilit´es XXXVII, Volume 1832 of Lecture Notes in Mathematics, pages 1–59. Springer, Berlin, 2003. [105] A. Lejay. Stochastic differential equations driven by processes generated by divergence form operators. I. A Wong–Zakai theorem. ESAIM Probab. Stat., 10:356–379 (electronic), 2006. [106] A. Lejay. Stochastic differential equations driven by processes generated by divergence form operators. II: Convergence results. ESAIM Probab. Stat., 12:387–411 (electronic), 2008. [107] A. Lejay and N. Victoir. On (p, q)-rough paths. J. Different. Equations, 225(1):103–133, 2006. [108] D. L´epingle. La variation d’ordre p des semi-martingales. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 36(4):295–316, 1976. [109] X.-D. Li and T. J. Lyons. Smoothness of Itˆ o maps and diffusion pro´ cesses on path spaces. I. Ann. Sci. Ecole Norm. Sup. (4), 39(4):649– 677, 2006. [110] P.-L. Lions and P. E. Souganidis. Viscosity solutions of fully nonlinear stochastic partial differential equations. S¯ urikaisekikenky¯ usho K¯ oky¯ uroku, (1287):58–65, 2002. Viscosity solutions of differential equations and related topics (in Japanese) (Kyoto, 2001). [111] P.-L. Lions and P. E. Souganidis. Fully nonlinear stochastic partial differential equations. C. R. Acad. Sci. Paris S´er. I Math., 326(9):1085–1092, 1998. [112] P.-L. Lions and P. E. Souganidis. Fully nonlinear stochastic partial differential equations: non-smooth equations and applications. C. R. Acad. Sci. Paris S´er. I Math., 327(8):735–741, 1998. [113] P.-L. Lions and P. E. Souganidis. Uniqueness of weak solutions of fully nonlinear stochastic partial differential equations. C. R. Acad. Sci. Paris S´er. I Math., 331(10):783–790, 2000. [114] E. R. Love. A generalization of absolute continuity. J. London Math. Soc., 26:1–13, 1951.
[115] T. Lyons. The interpretation and solution of ordinary differential equations driven by rough signals. In Stochastic Analysis (Ithaca, NY, 1993), pages 115–128. American Mathematical Society, Providence, RI, 1995. [116] T. Lyons. Differential equations driven by rough signals. Rev. Mat. Iberoamericana, 14(2):215–310, 1998. [117] T. Lyons. Systems controlled by rough paths. In European Congress of Mathematics, pages 269–281. European Mathematical Society, Z¨ urich, 2005. [118] T. Lyons and Z. Qian. Calculus of variation for multiplicative functionals. In New Trends in Stochastic Analysis (Charingworth, 1994), pages 348–374. World Scientific Publishing, River Edge, NJ, 1997. [119] T. Lyons and Z. Qian. Flow of diffeomorphisms induced by a geometric multiplicative functional. Probab. Theory Related Fields, 112(1):91–119, 1998. [120] T. Lyons and Z. Qian. System Control and Rough Paths. Oxford University Press, 2002. [121] T. Lyons and L. Stoica. The limits of stochastic integrals of differential forms. Ann. Probab., 27(1):1–49, 1999. [122] T. Lyons and N. Victoir. An extension theorem to rough paths. Ann. Inst. H. Poincar´e Anal. Non Lin´eaire, 24(5):835–847, 2007. [123] T. J. Lyons, M. Caruana and T. L´evy. Differential Equations Driven by Rough Paths, Volume 1908 of Lecture Notes in Mathematics. Springer, Berlin, 2007. Lectures from the 34th Summer School on Probability Theory held in Saint-Flour, July 6–24, 2004, with an introduction concerning the Summer School by Jean Picard. [124] P. Malliavin. Stochastic Analysis, Volume 313 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1997. [125] H. P. McKean. Stochastic Integrals. AMS Chelsea Publishing, Providence, RI, 2005. Reprint of the 1969 edition, with errata. [126] E. J. McShane. Stochastic differential equations and models of random processes. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (University of California, Berkeley, CA, 1970/1971), Vol. III: Probability Theory, pages 263– 294. University of California Press, Berkeley, CA, 1972.
[127] A. Millet, D. Nualart and M. Sanz. Large deviations for a class of anticipating stochastic differential equations. Ann. Probab., 20(4):1902– 1931, 1992. [128] A. Millet and M. Sanz-Sol´e. A simple proof of the support theorem for diffusion processes. In S´eminaire de Probabilit´es, XXVIII, Volume 1583 of Lecture Notes in Mathematics, pages 36–48. Springer, Berlin, 1994. [129] A. Millet and M. Sanz-Sol´e. Large deviations for rough paths of the fractional Brownian motion. Ann. Inst. H. Poincar´e Probab. Statist., 42(2):245–271, 2006. [130] A. Millet and M. Sanz-Sol´e. Approximation of rough paths of fractional Brownian motion. In Seminar on Stochastic Analysis, Random Fields and Application, Volume 59 of Progress in Probability, pages 275–303. Birkhuser, Basel, 2008. [131] R. Montgomery. A Tour of SubRiemannian Geometries, their Geodesics and Applications, Volume 91 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2002. [132] J. Musielak and Z. Semadeni. Some classes of Banach spaces depending on a parameter. Studia Math., 20:271–284, 1961. [133] A. Neuenkirch, I. Nourdin and S. Tindel. Delay equations driven by rough paths, 2007. [134] D. Neuenschwander. Probabilities on the Heisenberg Group, Volume 1630 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1996. [135] D. Nualart. The Malliavin Calculus and Related Topics. SpringerVerlag, New York, 1995. [136] D. Nualart. The Malliavin Calculus and Related Topics. SpringerVerlag, Berlin, second edition, 2006. [137] G. Pisier and Q. H. Xu. The strong p-variation of martingales and orthogonal series. Probab. Theory Related Fields, 77(4):497–514, 1988. [138] E. Platen. A Taylor formula for semimartingales solving a stochastic equation. In Stochastic Differential Systems (Visegr´ ad, 1980), pages 157–164. Springer, Berlin, 1981. [139] M. H. Protter and C. B. Morrey, Jr. A First Course in Real Analysis. Undergraduate Texts in Mathematics. Springer-Verlag, New York, second edition, 1991.
[140] P. E. Protter. Stochastic Integration and Differential Equations, Volume 21 of Stochastic Modelling and Applied Probability. SpringerVerlag, Berlin, 2005. Second edition, version 2.1, corrected third printing. [141] J. A. Ram´ırez. Short-time asymptotics in Dirichlet spaces. Comm. Pure Appl. Math., 54(3):259–293, 2001. [142] C. Reutenauer. Free Lie Algebras. Clarendon Press, New York, 1993. [143] D. Revuz and M. Yor. Continuous Martingales and Brownian Motion, Volume 293 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, third edition, 1999. [144] L. C. G. Rogers and D. Williams. Diffusions, Markov Processes, and Martingales. Vol. 1. Cambridge University Press, Cambridge, 2000. Reprint of the second (1994) edition. [145] L. C. G. Rogers and D. Williams. Diffusions, Markov Processes, and Martingales. Vol. 2. Cambridge University Press, Cambridge, 2000. Reprint of the second (1994) edition. [146] A. Rozkosz. Stochastic representation of diffusions corresponding to divergence form operators. Stochastic Process. Appl., 63(1):11–33, 1996. [147] A. Rozkosz. Weak convergence of diffusions corresponding to divergence form operators. Stochastics Stochastics Rep., 57(1–2):129–157, 1996. [148] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, New York, third edition, 1976. [149] W. Rudin. Real and Complex Analysis. McGraw-Hill, New York, third edition, 1987. [150] W. Rudin. Functional Analysis. McGraw-Hill, New York, second edition, 1991. [151] L. Saloff-Coste and D. W. Stroock. Op´erateurs uniform´ement souselliptiques sur les groupes de Lie. J. Funct. Anal., 98(1):97–121, 1991. [152] B. Saussereau and D. Nualart. Malliavin calculus for stochastic differential equations driven by a fractional Brownian motion. Stochastic Process Appl., 119(2):391–409, 2009. [153] M. Schreiber. Fermeture en probabilit´e de certains sous-espaces d’un espace L2 . Application aux chaos de Wiener. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 14:36–48, 1969/70.
[154] L. A. Shepp and O. Zeitouni. A note on conditional exponential moments and Onsager–Machlup functionals. Ann. Probab., 20(2):652– 654, 1992. [155] I. Shigekawa. Stochastic Analysis, Volume 224 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI, 2004. Translated from the 1998 Japanese original by the author. [156] E.-M. Sipil¨ ainen. A pathwise view of solutions of stochastic differential equations. PhD thesis, University of Edinburgh, 1993. [157] R. S. Strichartz. The Campbell–Baker–Hausdorff–Dynkin formula and solutions of differential equations. J. Funct. Anal., 72(2):320– 345, 1987. [158] D. W. Stroock. Diffusion semigroups corresponding to uniformly elliptic divergence form operators. In S´eminaire de Probabilit´es, XXII, Volume 1321 of Lecture Notes in Mathematics, pages 316–347. Springer, Berlin, 1988. [159] D. W. Stroock. Probability Theory, An Analytic View. Cambridge University Press, Cambridge, 1993. [160] D. W. Stroock. Markov Processes from K. Itˆ o’s Perspective, Volume 155 of Annals of Mathematics Studies. Princeton University Press, Princeton, NJ, 2003. [161] D. W. Stroock and S. R. S. Varadhan. On the support of diffusion processes with applications to the strong maximum principle. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (University of California, Berkeley, CA, 1970/1971), Vol. III: Probability Theory, pages 333–359. University of California Press, Berkeley, CA, 1972. [162] D. W. Stroock and S. R. Srinivasa Varadhan. Multidimensional Diffusion Processes, Volume 233 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1979. [163] K. T. Sturm. Analysis on local Dirichlet spaces. III. The parabolic Harnack inequality. J. Math. Pure Appl. (9), 75(3):273–297, 1996. [164] K.-T. Sturm. Analysis on local Dirichlet spaces. II. Upper Gaussian estimates for the fundamental solutions of parabolic equations. Osaka J. Math., 32(2):275–312, 1995.
[165] K.-T. Sturm. On the geometry defined by Dirichlet forms. In Seminar on Stochastic Analysis, Random Fields and Applications (Ascona, 1993), Volume 36 of Progress Probability, pages 231–242. Birkh¨ auser, Basel, 1995. [166] H. J. Sussmann. On the gap between deterministic and stochastic ordinary differential equations. Ann. Probability, 6(1):19–41, 1978. [167] H. J. Sussmann. Limits of the Wong–Zakai type with a modified drift term. In Stochastic Analysis, pages 475–493. Academic Press, Boston, MA, 1991. [168] S. J. Taylor. Exact asymptotic estimates of Brownian path variation. Duke Math. J., 39:219–241, 1972. [169] J. Teichmann. Another approach to some rough and stochastic partial differential equations. arXiv:0908.2814v1, 2009. [170] N. Towghi. Multidimensional extension of L. C. Young’s inequality. JIPAM J. Inequal. Pure Appl. Math., 3(2):Article 22, 13 pp. (electronic), 2002. ¨ unel and M. Zakai. Transformation of measure [171] A. S¨ uleyman Ust¨ on Wiener space. Springer Monographs in Mathematics. SpringerVerlag, Berlin, 2000. [172] S. R. S. Varadhan. On the behavior of the fundamental solution of the heat equation with variable coefficients. Comm. Pure Appl. Math., 20:431–455, 1967. [173] N. Th. Varopoulos. Small time Gaussian estimates of heat diffusion kernels. II. The theory of large deviations. J. Funct. Anal., 93(1):1– 33, 1990. [174] N. Th. Varopoulos, L. Saloff-Coste and T. Coulhon. Analysis and Geometry on Groups, Volume 100 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 1992. [175] F. W. Warner. Foundations of Differentiable Manifolds and Lie Groups, Volume 94 of Graduate Texts in Mathematics. SpringerVerlag, New York, 1983. Corrected reprint of the 1971 edition. [176] K. Yosida. Functional Analysis, second edition. Die Grundlehren der mathematischen Wissenschaften, Band 123. Springer-Verlag, New York, 1968. [177] L. C. Young. An inequality of H¨ older type connected with Stieltjes integration. Acta Math., (67):251–282, 1936.
Index approximation of 2D function mollifier, 110 piecewise linear type, 108 Arzela–Ascoli theorem, 20 Azencott-type estimate, 528 Besov regularity, 39, 87 Besov–H¨older embedding, 575 Besov–L´evy modulus embedding, 576 Besov-variation embedding, 575 Bouleau–Hirsch criterion, 551, 614 bounded variation, 21 Brownian motion, 327 and its delay, 351 Cameron–Martin theorem for, 358 enhanced, 333 finite quadratic variation, 383 fractional, 405 infinite 2-variation, 382 natural lift of, 333 on a Lie group, 334 Schilder’s theorem for, 360 support theorem for, 368 Brownian rough path, 333 large deviations for, 359 Burkholder–Davis–Gundy inequality for enhanced martingale, 389 for martingale, 388 in homogenous p-variation norm, 394 Cameron–Martin embedding theorem, 408 Campbell–Baker–Hausdorff formula, 137 Cantor function, 27
Carnot–Caratheodory metric, 148 centre, 316 of Lie algebra, 317 Chen’s theorem, 133 Chow’s theorem, 140 compactness, 94, 177 concatenation of paths, 61 control function, 22, 80 2D, 105 Coutin–Qian condition, 407 Davie’s lemma, 216 dilation, 133 Dirac measure, 110 directional derivative, 69, 72, 284, 598 Dirichlet form, 617 dissection, 21 distance p-variation, 77 H¨older, 77, 166, 170 supremum or infinity, 19 Doss–Sussman method, 316 driving signal, 53 time reversed, 62 Duhamel’s principle, 74 ellipticity condition, 550 enhanced Brownian motion, 333 Cameron–Martin theorem for, 358 definition, 333 Donsker’s theorem for, 354 exact variation, 338 geodesic approximation, 340 law of iterated logarithm, 339 L´evy modulus, 338 McShane approximation, 350 non-standard approximation, 347
Index
piecewise linear approximation, 340, 343 rough path regularity, 336 scaling, 335 Schilder’s theorem for, 362 Strassen’s law for, 367 support theorem for, 370 support theorem in conditional form, 380 Sussmann approximation, 349 weak approximation, 354 enhanced Gaussian process definition, 429 existence, 429 Karhunen–L´ oeve approximation, 438 modulus and exact variation, 419 mollifier approximation, 437 piecewise linear approximation, 436 weak approximation, 443 enhanced Markov process, 465 geodesic approximation, 467 piecewise linear approximation, 469 Schilder theorem for, 483 support theorem, H¨ older topology, 486 support theorem, uniform topology, 484 weak convergence, 482 enhanced martingale, 387 piecewise linear approximation, 398 rough path regularity, 390 equicontinuous set, 20 Euler approximation for ODEs, 55 for RDEs, 238 Euler estimate for ODEs, 213 for RDEs, 223 Euler scheme, 212
653
for RDEs, convergence, 238 higher-order, 212 explosion time, 55 exponential map, 136 finite ϕ-variation, 99 finite p-variation, 77, 105 finite H¨ older regularity, 77 flows of diffeomorphisms, 290 Fr´echet derivative, 73, 287, 598 fractional Brownian motion, 405 Cameron–Martin space for, 410 free Lie algebra, 139 free nilpotent group, 143 Garsia–Rodemich–Rumsey theorem, 573 Gaussian process Karhunen–L´ oeve approximation, 415 mollifier approximation, 413 natural lift of, 429 non-degeneracy, 549 of Volterra type, 555 piecewise linear approximation, 411 Gaussian rough path, 429 large deviations for, 445 geodesic, 144 approximation, 88, 174, 340 space, 88 geodesic scheme for RDEs, 239 geometric rough path, 195 Gronwall’s lemma, 54 H-regularity, 545 definition, for abstract Wiener functionals, 613 of RDE solutions driven by Gaussian rough paths, 547 H¨ ormander’s condition, 553, 561
654
Index
heat kernel, 624 Heisenberg group, 147, 196 homogenous distance, 166 homogenous norm, 146 homogenous norms equivalence of, 149 Hopf–Rinow theorem, 88 Hurst parameter, 1 increments of a map, 30 inhomogenous distance, 170 integral Riemann–Stieltjes, 45 rough, 253 Stratonovich, 507 Young, 116 Young–Wiener, 434 interpolation, 78, 177 Kolmogorov criterion, 582 Kolmogorov–Lamperti tightness criterion, 583 L´evy’s area, 329 as time-changed Brownian motion, 332 large deviations, 603 contraction principle, 604 for anticipating SDE, 540 for Brownian rough path, 359 for Gaussian rough paths, 445 for Markovian rough path, 483 for SPDE, 543 for stochastic flow, 541 for symmetric diffusion, 629 Lemma A, 215 Lemma B, 216 Lie algebra, 136 free, 139 Lie group, 135 lift of (p, q)-type, 204 of geometric rough path (Lyons), 185 of smooth path, 129
limit theorem for stochastic flows, 522 strong, 517 weak, 520 Lipschitz map, 213 Lyons-lift, 185 of (p, q)-type, 204 Malliavin covariance matrix, 551 Malliavin derivative, 566 Markov process natural lift of, 465 Markovian rough path, 465 large deviations for, 483 martingale continuous local, 387 techniques, 341, 441, 610 moderate function, 388 modulus of continuity, 80, 83 natural lift of a Markov process, 465 of Brownian motion, 333 of Gaussian process, 429 neo-classical inequality, 211 non-degeneracy condition on Gaussian driving signal, 549 non-explosion condition on vector fields, 69 ordinary differential equation, 55 continuity of solution map, 62, 65 Euler approximation for, 55 Euler estimate for, 213 existence, 55 uniqueness, 59 ordinary differential equations different starting points, same driving signal (Lemma B), 216 same starting point, different driving signals (Lemma A), 215
Index
path α-H¨ older, 77 approximation to piecewise geodesic, 88 piecewise linear, 32 concatenation, 61 continuous, 19 1-H¨older, 28 absolutely, 26, 34 absolutely of order p, 86 Lipschitz, 28 continuously differentiable, 30 lift of, 129 of Besov regularity, 39, 87 of bounded variation, 21 of finite ϕ-variation, 99 of finite p-variation, 77 of finite H¨older regularity, 77 of Sobolev regularity, 39, 42, 87 rectifiable, 44 time reversal of, 61 Poincar´e inequality on free nilpotent groups, 495 quadratic form, 615 Riemann–Stieltjes integral, 45 rough differential equation definition, 224 directional derivative, 284 Euler estimate for, 223 Euler scheme for, 238 existence, 222 Fr´echet derivative, 287 full, 241 geodesic scheme for, 239 linear, 265 perturbed, 317 uniqueness, 233 with drift, 303 rough integral, 253
655
rough partial differential equation, 295 rough path geometric, 195 Schilder’s theorem, 629 semi-martingale, 386 signature, 129 Sobolev regularity, 39, 42, 87 stochastic differential equation anticipating, 523, 540 driven by Gaussian signal, 515, 537 driven by Markovian signal, 516, 537 in Stratonovich sense, 510 with delay, 524 stochastic flow large deviations, 541 support theorem, 535 stochastic partial differential equation, 525 large deviations, 542 support theorem, 542 stochastic Taylor expansion strong remainder estimate, 528 weak remainder estimate, 531 Stratonovich integral, 507 sub-Riemannian manifold, 149 superadditive map, 22 support theorem for enhanced Markov process, H¨older topology, 486 for Brownian motion, 368 for enhanced Brownian motion, 370 for enhanced Brownian motion, conditional, 380 for enhanced Markov process, uniform topology, 484 for SDE driven by Gaussian signal, 537
656
support theorem (cont.) for SDE driven by Markovian signal, 537 for SPDE, 543 for stochastic flow, 535 Stroock–Varadhan, 533 symmetric diffusion, 625 time reversal of paths, 61 translation operator, 209 upper gradient lemma, 495 vector fields Ck bounded, 68 1-Lipschitz continuous, 53
Index
Lipschitz regular in sense of Stein, 213 non-explosion condition on, 69 Wiener’s characterization, 96 Wong–Zakai theorem, 511 Young integral, 116 Young pairing, 204 Young regularity, complementary, 449, 546 Young–L´ oeve estimate, 114 Young–L´ oeve–Towghi estimate, 122 Young–Wiener integral, 434