Institute of Mathematical Statistics LECTURE NOTES-MONOGRAPH SERIES
State of the Art in Probability and Statistics Festschrift for Willem R. van Zwet Mathisca de Gunst, Chris Klaassen & Aad van der Vaart, Editors
Volume 36
Institute of Mathematical Statistics LECTURE NOTES-MONOGRAPH SERIES Volume 36
State of the Art in Probability and Statistics Festschrift for Willem R. van Zwet Mathisca de Gunst, Chris Klaassen & Aad van der Vaart, Editors
Institute of Mathematical Statistics Beachwood, Ohio
Institute of Mathematical Statistics Lecture Notes-Monograph Series Editorial Board Andrew A. Barbour, Joseph Newton, Joel Greenhouse and David Ruppert (Editors)
The production of the IMS Lecture Notes-Monograph Series is managed by the IMS Business Office: Julia A. Norton, IMS Treasurer, and Elyse Gustafson, IMS Executive Director.
Library of Congress Control Number: 2001-131058 International Standard Book Number 0-940600-50-1 Copyright © 2001 Institute of Mathematical Statistics All rights reserved Printed in the United States of America
PREFACE
Prom Maxch 23-26, 1999 a symposium was held in the Lorentz Center of the University of Leiden in honour of the 65th birthday of Willem van Zwet. On each of the first three days of the symposium, six leading researchers in probability and statistics gave one-hour talks on current developments in these fields. The symposium was closed with two talks addressed to a general mathematical audience. Following the symposium Willem van Zwet gave a farewell lecture in the "Academie Gebouw" of the university. (See [62] on page xviii.) This volume contains the proceedings of the symposium plus 13 other papers and serves as a Festschrift for Willem van Zwet. In addition to the 33 refereed articles on a wide range of topics in probability and statistics, we have included a short version of Willem's vitae and the family tree of Willem's students. (See the contribution by van Eeden.) Amsterdam, July 2000 Mathisca de Gunst Chris Klaassen Aad van der Vaart
Willem R. van Zwet, Oberwolfach 1992
CONTENTS
Preface Contents Contributors Curriculum Vitae Willem R van Zwet Publications Willem R van Zwet
iii v viii xii xiv
Prom A to Z: Asymptotic expansions by van Zwet 2 W. Albers Likelihoods and pseudolikelihoods for Markov spatial processes 21 A.J. Baddeley Laser cooling and stochastics 50 O.E. Barndorff-Nielsen and F.E. Benth Extremal fits in REACT confidence sets 72 Rudolf Beran The bootstrap in hypothesis testing 91 Peter J. Bickel and Jian-Jian Ren An alternative point of view on Lepski's method 113 Lucien Birge Localization and decay of correlations for a pinned lattice free field in dimension two 134 Erwin Bolthausen and David Brydges Quadratic statistics in testing problems of large dimension 150 DM. Chibisov Some remarks on likelihood factorization 165 D.R. Cox Trimmed sums from the domain of geometric partial attraction of semistable laws 173 Sandor Csόrgό and Zoltan Megyesi Statistical problems involving permutations with restricted positions . 195 Persi Diaconis, Ronald Graham and Susan P. Holmes Markov chain conditions for admissibility in estimation problems with quadratic loss 223 Morris L. Eaton The scientific family tree of Willem R. van Zwet 244 Constance van Eeden
vi
Contents
Asymptotics in quantum statistics 255 Richard D. Gill Adaptive choice of bootstrap sample sizes 286 Friedrich Gόtze and Alfredas Rackauskas Conformal invariance, droplets, and entanglement 310 Geoffrey Grimmett Nonparametric analysis of earthquake point-process data 324 Edwin Choi and Peter Hall A note on estimators of gradual changes 345 M. Huskova Estimation of analytic functions 359 ί. Ibragimov The deterministic evolution of general branching populations 384 Peter Jagers Chi-square oracle inequalities 399 Iain M. Johnstone A statistical approach to the Cauchy problem for the Laplace equation 419 G. Golubev and R.Z. Khasminskii The two-sample problem in Rm and measure-valued martingales 434 J.H.J. Einmahl and E.V. Khmaladze On central limit theory for random additive functions under weak dependence restrictions 464 M.R. Leadbetter, H. Rootzen, and H. Choi An exponential inequality for a weighted approximation to the uniform empirical process with applications 477 David M. Mason A nonparametric asymptotic version of the Cramer-Rao bound 499 J. Pfanzagl The reassembling of shattered brownian sheet 518 Ronald Pyke Inverting noisy integral equations using wavelet expansions: a class of irregular convolutions 533 Peter Hall, Frits Ruymgaart, Onno van Gaans, and Arnoud van Rooij Note on a stochastic recursion 547 David Siegmund Ancillary history 555 Stephen M. Stigler On order statistics close to the maximum 568 JefL. Teugels
Contents Adaptive tuning, 4-d var and representers in rkhs G. Wahba Some converse limit theorems for exchangeable bootstraps Jon A. Wellner Perfect stochastic EM Erik van Zwet
vii 582 593 607
CONTRIBUTORS
W. Albers, Faculty of Mathematical Sciences, P.O. Box 217, 7500 AE Enschede, Netherlands, (
[email protected]). A.J. Baddeley, Department of Mathematics and Statistics, University of Western Australia, Nedlands WA 6907, Australia, (adήan@maths. uwa.edu.au). O.E. Barndorff-Nielsen, MaPhySto, Department of Mathematical Sciences, University of Aarhus, Ny Munkegade, DK-8000 Aarhus C, Denmark, (
[email protected]). F.E. Benth, MaPhySto, Department of Mathematical Sciences, University of Aarhus, Ny Munkegade, DK-8000 Aarhus C, Denmark, (fredb@imf. au.dk). Rudolf Beran, Department of Statistics, University of California Berkeley, Berkeley, CA 94720-3860, USA, (
[email protected]). Peter J. Bickel, Department of Statistics, University of California Berkeley, Berkeley, CA 94720, USA, (
[email protected]). Lucien Birge, Laboratoire de Probabilites et Modeles Stochastiques, Boϊte 188, Universite Paris VI, 4 Place Jussieu, F-75252 Paris Cedex 05, France, (
[email protected]). Erwin Bolthausen, Department of Mathematics, Universitat Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switserland, (
[email protected]. ch). David Brydges, University of Virginia. D.M. Chibisov, Steklov Mathematical Institute, GSP-1117966 Moscow, Russia, (chibisoυ@mi.ras.ru). Edwin Choi, Centre for Mathematics and its Applications, Australian National University, Canberra, ACT 0200, Australia. H. Choi, Department of Statistics, University of North Carolina, Chapel Hill, NC 27599-3260, USA. D.R. Cox, Department of Statistics and Nuffield College, Oxford DX1 INF, UK, (David. Cox@nuf. ox. ac. uk). Sandor Csδrgδ, Department of Statistics, University of Michigan, 4062 Frieze Building, Ann Arbor, Michigan 48109-1285, USA, and Bolyai Institute, University of Szeged, Aradi vertanύk tere 1, H-6720 Szeged, Hungary, (
[email protected],
[email protected]).
Contributors
ix
Persi Diaconis, Mathematics and Statistics, Sequoia Hall, Stanford University, CA 94305-4065, USA. Morris L. Eaton, School of Statistics, 224 Church Street SE, Minneapolis, MN 55455, University of Minnesota, (
[email protected]). Constance van Eeden, Department of Statistics, The University of British Columbia, 333-6356 Agricultural Road, Vancouver, B.C., Canada, V6T 1Z2, (cυ
[email protected]). J.H.J. Einmahl, Department of Mathematics and Computer Science, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, Netherlands, (einmahl@win. tue. nl). Onno van Gaans, Department of Mathematics, Katholieke Universiteit Nijmegen, Toernooiveld, 6525 ED Nijmegen, Netherlands. Richard D. Gill, Mathematical Institute, University Utrecht, P.O. Box 80010, 3508 TA Utrecht, Netherlands, (
[email protected]). G. Golubev, Universite d'Aix-Marseille 1, Centre de Mathematiques et dΊnformatique, 39, rue F. Joliot-Curie, 13453 Marseille, Prance, (golubev@gyptis. univ-mrs.fr). Priedrich Gotze, Department of Mathematics, Bielefeld University, Bielefeld 33501, Germany, (
[email protected]). Ronald Graham, Computer Science, University of California at San Diego, and ATT, Florham Park,NJ, (
[email protected]). Geoffrey Grimmett, Statistical Laboratory, DPMMS, University of Cambridge, 16 Mill Lane, Cambridge CB2 1SB, UK, (g.r.grimmett®statslab.cam.ac.uk). Peter Hall, Centre for Mathematics and its Applications, Australian National University, Canberra, ACT 0200, Australia, (
[email protected]. au). Susan Holmes, Statistics, Stanford University, CA 94305-4065, USA, and Unite deBiometrie, INRA-Montpellier, Prance, (
[email protected], edu). M. Huskova, Department of Statistics, Charles University, Sokolovska 83, CZ 18600 Praha, Czech Republic, (
[email protected]). I. Ibragimov, St.Petersburg Branch of Steklov Mathematical Institute Russian Ac.ScL, Fontanka 27, St.Petersburg, 191011, Russia, (ibr32@ pdmi.ras.ru). Peter Jagers, School of Mathematical and Computing Sciences, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden, (jagers @ math.chalmers.se). Iain M. Johnstone, Department of Statistics, Sequoia Hall, Stanford University, Stanford CA 94305, USA, (
[email protected]).
x
Contributors
R.Z. Khasminskii, Department of Mathematics, Wayne State University, Detroit, MI-48202, USA, (
[email protected]). E.V. Khmaladze, A. Razmadze Math. Institute, M. Alexidze 1, 380093 Tbilisi, Republic of Georgia, and University of New South Wales, (
[email protected]). M.R. Leadbetter, Department of Statistics, University of North Carolina, Chapel Hill, NC 27599-3260, USA, (
[email protected]). David M. Mason, Department of Mathematical Sciences, 501 Ewing Hall, University of Delaware, Newark, Delaware, USA, (daυidm@math. udel.edu). Zoltan Megyesi, Bolyai Institute, University of Szeged, Aradi vertanύk tere 1, H-6720 Szeged, Hungary, (
[email protected]). J. Pfanzagl, Mathematisches Institut der Universitat zu Kόln, Weyertal 8690, D - 50931 Kόln, Germany. Ronald Pyke, Department of Mathematics, University of Washington, Seattle, Washington, (
[email protected]). Alfredas Rackauskas, Vilnius. Jian-Jian Ren, Department of Mathematics, Tulane University, New Orleans, LA 70118, USA, (
[email protected]). Arnoud van Rooij, Department of Mathematics, Katholieke Universiteit Nijmegen, Toernooiveld, 6525 ED Nijmegen, Netherlands. H. Rootzen, Department of Mathematics, Chalmers University of Technology, S-412 96 Gothenburg, Sweden (
[email protected]). Frits Ruymgaart, Department of Mathematics, Texas Tech University, Lubbock TX 79409, USA, (
[email protected]). David Siegmund, Department of Statistics, Sequoia Hall, Stanford University, Stanford, CA 94305, USA, (
[email protected]). Stephen M. Stigler, Department of Statistics, University of Chicago, 5734 University Av., Chicago, IL 60637, USA, (
[email protected]. edu). Jef L. Teugels, Department of Mathematics, Katholieke Universiteit Leuven, Celestijnenlaan 200B, B-3001 Leuven, Belgium, (
[email protected]). G. Wahba, Department of Statistics, University of Wisconsin-Madison, 1210 W. Dayton Street, Madison WI 53706, USA, (
[email protected]). Jon A. Wellner, University of Washington, Department of Statistics, Box 354322, Seattle, WA 98195-4322, USA, (
[email protected]). Erik van Zwet, University of California, Department of Statistics, 367 Evans Hall Berkeley, CA 94720-3860, USA, (υ
[email protected], edu).
CURRICULUM VITAE WILLEM RUTGER VAN ZWET
Born: Citizenship: Education:
March 31, 1934 The Netherlands Ph.D. mathematics, University of Amsterdam, 1964
Professional experience: Mathematisch Centrum Staff member Amsterdam Consultant Chair Science Committee Chair Visiting Committee Member of the Board University of Leiden Associate professor of Statistics Professor of Statistics Chair Dept. Applied Math. Dean Fac. Math.& Natural Sciences Thomas Stieltjes Research Director Institute EURANDOM, Eindhoven Scientific Director Member of the Corporation National Institute of Member of the Board Statistical Sciences Chairman of the Board Rijksherbarium, Leiden Member and Chair Science Com. Leiden University Fund Member of the Board Visiting Associate Professor University of Oregon Miller Professor University of California, Visiting Professor Berkeley University of North Carolina Hotelling Lecturer William Newman Professor Principal Lectures NSF-CBMS Regional Conference Vice-chair Netherlands' Government Advisory Com. for Math. Organizer Annual stochastics meeting at Lunteren Member NATO Science Committee Advisory Panel Member Visiting Committee Applied Mathematics Sweden Member Science Panel Foundation for Strategic Research, Sweden Vice-chair Visiting Committee Mathematics in Flanders Professional affiliation: Institute of Mathematical Statistics
1961-1964 1965-1984 1983-1990 1987 1993-1996 1965-1968 1968-1999 1974-1978 1982-1983 1992-1999 1997-2000 1993-1997 19971997-1999 1984-1999 19931965 1997 1967,72,76,80 81,82,84,89,93 1988 1990-1998 1982 1989-1992 1972-1999 1991-1993 1995 1995, 1999 1995-1997
Member & chair European Reg. Com. 1969-1975 Associate Editor Annals of Statistics 1972-1980 Member council 1978-1981 Editor-in-chief Annals of Statistics 1986-1988
Curriculum Vitae Willem Rutger van Zwet
International Statistical Institute
Bernoulli Society American Statistical Association Netherlands Statistical Society
President Wald Memorial Lecturer Associate editor Int. Statist. Review Chair Org. Com. Centenary Session Vice-president Chair Programme Committee President Member & chair European Reg. Com. Member council President Member Board of Directors
xiii 1991-1992 1992 1981-1985 1981-1985 1985-87,87-89 1989-1993 1997-1999 1975-1980 1976-79,84-87 1987-1989 1993-1995
Associate editor Statistica Neerlandica 1962-1973 Member of the Board 1970-1972
Honors: Van Dantzig Award, Netherlands Statistical Society, 1970 Elected member International Statistical Institute, 1971 Fellow Institute of Mathematical Statistics, 1972 Honorary Fellow Royal Statistical Society, 1978 Elected member Royal Netherlands Academy of Sciences, 1979 Bernoulli Medal, Tasjkent, 1986 Peace Medal, Charles University, Prague, 1988 Fellow American Statistical Association, 1988 Medaille de la Ville de Paris, 1989 Elected member Academia Europaea, 1990 Adolphe Quetelet Medal, International Statistical Institute, 1993 Certificate of Appreciation, American Statistical Association, 1995 Knight in the Order of the Netherlands' Lion, 1996 AKZO-Nobel Award, Netherlands, 1996 Doctor h.c, Charles University, Prague, 1997 Honorary member International Statistical Institute, 1999 Honorary member Vereniging voor Statistiek en Operationele Research, 2000
PUBLICATIONS WILLEM R. VAN ZWET
[I] Convex Transformations of Random Variables, Math. Centre Tracts 7 (1964), Mathematisch Centrum, Amsterdam. [2] Convex transformations: A new approach to skewness and kurtosis, Statist. Neerlandica 18 (1964), 433-441. [3] De waarde van hydroxychloroquine (Plaquenil) voor de behandeling van chronische discoide lupus erythematodes (with J.H. Kraak, W.G. van Ketel and J.R. Prakken), Nederl Tijdschr. Geneesk. 109 (1965), 461-469. [4] Bias in estimation from type I censored samples, Statist Neerlandica 20 (1966), 143-148. [5] On mixtures of distributions (with W. Molenaar), Ann. Math. Statist. 37 (1966), 281-283. [6] Het hachelijke oordeel, Statist. Neerlandica 12 (1967), 117-130. [7] On the combination of independent test statistics (with J. Oosterhoff), Ann. Math. Statist. 38 (1967), 659-680. [8] Host discrimination in pseudocoila bochei (with K. Bakker, S.N. Bagchee and E. Meelis), Ent Exp. & Appl. 10 (1967), 295-311. [9] An inequality for expected values of sample quantiles, Ann. Math. Statist. 38 (1967), 1817-1821. [10] On convexity preserving families of probability distributions, Statist. Neerlandica 22 (1968), 23-32. [II] Stemmingen zonder winnaar (with R.J. In 't Veld), Statist. Neerlandica 23 (1969), 269-276. [12] Asymptotic properties of isotonic estimators for the generalized failure rate function (with R.E. Barlow), Bull. Int. Statist. Inst. 43 (1969), 252-253. [13] Asymptotic properties of isotonic estimators for the generalized failure rate function. Part I: Strong consistency (with R.E. Barlow), Nonparametric Techniques in Statistical Inference, M.L. Puri editor, Cambridge University Press (1970), 159-173. [14] Some remarks on the two-armed bandit (with J. Fabius), Ann. Math. Statist. 41 (1970), 1906-1916. [15] Grondbegrippen van de Waarschijnlijkheidsrekening (with J. Fabius), Math. Centre Syllabus 10 (1970), Mathematisch Centrum, Amsterdam.
Publications Willem R. van Zwet
xv
[16] Comparison of several nonparametric estimators of the failure rate function (with R.E. Barlow), Operations Research and Reliability, D. Grouchko editor, Gordon & Breach, New York (1971), 375-399. [17] The likelihood ratio test for the multinomial distribution (with J. Oosterhoff), Proceedings 6th Berkeley Symp. on Math. Statist and Probability 1 (1972), 31-49. [18] Asymptotic normality of nonparametric tests for independence (with F.H. Ruymgaart and G.R. Shorack), Ann. Math. Statist. 43 (1972), 1122-1135. [19] Asymptotic expansions for the power of distributionfree tests in the onesample problem (with W. Albers and P.J. Bickel), Ann. Statist. 3 (1976), 108-156. [20] Asymptotic expansions for the distribution functions of linear combinations of order statistics, Statistical Decision Theory and Related Topics I I , S.S. Gupta and D.S. Moore editors, Academic Press, New York (1977), 421-438. [21] A proof of Kakutani's conjecture on random subdivision of longest intervals, Ann. Probability 6 (1978), 133-137. [22] Asymptotic expansions for the power of distributionfree tests in the twosample problem (with P.J. Bickel), Ann. Statist. 6 (1978), 937-1004. [23] Mean, median, mode, II, Statist. Neerlandica 33 (1979), 1-5. [24] A note on contiguity and Hellinger distance (with J. Oosterhoff), Contributions to Statistics (Jaroslaυ Hάjek Memorial Volume), J. Jureckova editor, Academia, Prague (1979), 157-166. [25] The Edgeworth expansion for linear combinations of uniform order statistics, Proc. 2nd Prague Symp. on Asymptotic Statistics, P. Mandl and M. Huskova editors, North Holland, Amsterdam (1979), 93-101. [26] On a theorem of Hoeffding (with P.J. Bickel), Asymptotic Theory of Statistical Tests and Estimation, I.M. Chakravarti editor, Academic Press, New York (1980), 307-324. [27] A strong law for linear combinations of order statistics, Ann. Probability 8 (1980), 986-990. [28] On efficiency of first and second order (with P.J. Bickel and D.M. Chibisov), International Statistical Review 49 (1981), 169-175. [29] An asymptotic expansion for the distribution of the logarithm of the likelihood ratio (with D.M. Chibisov), Proc. Third Intern. Vilnius Conference on Probability Theory and Math. Statist. I I , Akademia Nauk. USSR (1981), 55-56.
xvi
Publications Willem R. van Zwet
[30] The Berry-Esseen bound for U-statistics (with R. Helmers), Statistical Decision Theory and Related Topics I I I , Vol. 1, S.S. Gupta and J.O. Berger editors, Academic Press, New York (1982), 497-512. [31] On the Edgeworth expansion for the simple linear rank statistic, Nonparametric Statistical Inference, Coll. Math. Soc. Janos Bolyai, Budapest (1982), 889-909. [32] An inequality for random replacement sampling plans, Festschrift for Erich Lehmann; P.J. Bickel, K.A. Doksum and J.L. Hodges, Jr. editors, Wadsworth, Belmont (1982), 441-448. [33] Ranks and order statistics, Recent Advances in Statistics, Papers in Honor of Herman Chernoff; M.H. Rizvi, J. Rustagi and D. Siegmund editors, Academic Press, New York (1983), 407-422. [34] A Berry-Esseen bound for symmetric statistics, Z. Wahrsch. Verw. Gebiete 66 (1984), 425-440. [35] On the Edgeworth expansion for the logarithm of the likelihood ratio, I (with D.M. Chibisov), Teor. Veroyatnost i Primenen. 29 (1984), 417-439. [36] On the Edgeworth expansion for the logarithm of the likelihood ratio, II (with D.M. Chibisov), Asymptotic Statistics II, Proc. Third Prague Symp. Asympt. Statist, P. Mandl and M. Huskova editors, Elsevier Science Publishers, Amsterdam (1984), 451-461. [37] Van de Hulst on robust statistics: A historical note, Statist. Neerlandica 39 (1985), 81-95. [38] A simple analysis of third-order efficiency of estimates (with P.J. Bickel and F. Gδtze), Proc. Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, I I , L.M. LeCam and R.A. Olshen editors, Wadsworth, Monterey (1985), 749-768. [39] On estimating a parameter and its score function (with C.A.J. Klaassen), Proc. Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, II, L.M. LeCam and R.A. Olshen editors, Wadsworth, Monterey (1985), 827-840. [40] The Edgeworth expansion for U-statistics of degree two (with P.J. Bickel and F. Gδtze), Ann. Statist. 14 (1986), 1463-1484. [41] A note on the strong approximation to the renewal process (with D.M. Mason), Pub. Inst. Univ. Paris XXXII, fasc. 1-2 (1987), 81-91. [42] A refinement of the KMT inequality for the uniform empirical process (with D.M. Mason), Ann. Probability 15 (1987), 871-884. [43] On estimating a parameter and its score function, II (with C.A.J. Klaassen and A.W. van der Vaart), Statistical Decision Theory and Related Topics IV, 2, S.S. Gupta and J.O. Berger editors, Springer, New York (1988), 281-288.
Publications Willem R. van Zwet
xvii
[44] Modelling the growth of a batch culture of plant cells: a corpuscular approach (with M.C.M. de Gunst, P.A.A. Harkes, J. Val and K.R Libbenga), Enzyme Microb. Technol. 12 (1990), 61-71. [45] Andrei Nikolaevich Kolmogorov, Jααrboek 1989 Koninklijke Nederlαnds Akαdemie van Wetenschappen, North Holland, Amsterdam (1990), 166-171. [46] A non-Markovian model for cell population growth: speed of convergence and central limit theorem (with M.C.M. de Gunst), Stoch. Processes and Appl. 41 (1992), 297-324. [47] Steekproeven uit steekproeven: De Baron van Munchhausen aan het werk? Verslag Afd. Nat, Koninklijke Nederlandse Akademie van Wetenschappen, 102 (5) (1993), 49-54. [48] Comment on double-blind refereeing, Statist. Sci. 8 (1993), 327-330. [49] A non-Markovian model for cell population growth: tail behavior and duration of the growth process (with M.C.M. de Gunst), Ann. Appl. Probability 3 (1993), 1112-1144. [50] The asymptotic distribution of point charges on a conducting sphere, Statistical Decision Theory and Related Topics, V, S.S. Gupta and J.O. Berger editors, Springer, New York (1993), 427-430. [51] Wassily Hoeffding's work in the sixties (with J. Oosterhoίf), The Collected Works of Wassily Hoeffding, N.I. Fisher and P.K. Sen editors, Springer Verlag, New York (1994), 3-15. [52] Detecting differences in delay vector distributions (with C. Diks, F. Takens and J. de Goede), Phys. Rev. E. 53 (1996), 2169-2176. [53] Resampling: consistency of substitution estimators (with H. Putter), Ann. Statist. 24 (1996), 2297-2318. [54] Resampling fewer than n observations: Gains, losses, and remedies for losses (with P.J. Bickel and F. Gόtze), Statist. Sinica 7 (1997), 1-31. [55] On a set of the first category (with H. Putter), Festschrift for Lucien he Cam, D. Pollard, E. Torgersen and G.L. Yang, editors, Springer Verlag, New York (1997), 315-323. [56] An Edgeworth expansion for symmetric statistics (with V. Bentkus and F. Gόtze), Ann. Statist. 25 (1997), 851-896. [57] Comment on "Improving the relevance of economic statistics" by Yves Franchet, Economic Statistics: Accuracy, Timeliness and Relevance, Z. Kenessey, editor, US Department of Commerce, Bureau of Economic Analysis, Washington D.C. (1997), 23-26. [58] On the shape theorem for the supercritical contact process (with M. Fiocco), Prague Stochastics '98, M. Huskova, P. Lachout and J.A. Vίsek, editors, Union of Czech Mathematicians and Physicists, Prague (1998), 569-573.
xviii
Curriculum Vitae Willem Rutger van Zwet
[59] Empirical Edgeworth expansions for symmetric statistics (with H. Putter), Ann. Statist 26 (1998), 1540-1569. [60] Discussion of "Towards systems of social statistics" by I.P. Fellegi and M. Wolfson, Bull. Intern. Statist. Inst LVII, 3 (1998), 96-97. [61] Introducing EURANDOM (with W.J.M. Senden), Statist. Neerlandica 53 (1999), 1-4. [62] No complaints so far, Leiden University Press, Leiden (1999), pp.15; also appeared in Nieuw Arch. Wist 17 (1999), 268-279. [63] A remark on consistent estimation (with E.W. van Zwet), Math. Methods of Statistics 8 (1999), 277-284. [64] Asymptotic efficiency of inverse estimators (with A.CM. van Rooij and F.H. Ruymgaart), Teor. Veroyatnost i Primenen 44 (1999), 826844. [65] Opening address of the 52nd Session of the International Statistical Institute at Helsinki, Bull. Intern. Statist. Inst. LVIII (3), 57-60. [66] Decaying correlations for the supercritical contact process conditioned on survival (with M. Fiocco), Technical report EURANDOM (1999), submitted to Bernoulli. [67] Statistical estimation of the parameter of the supercritical contact process (with M. Fiocco), Technical report EURANDOM (1999), submitted to Bernoulli
FROM A TO Z: ASYMPTOTIC EXPANSIONS BY VAN Z W E T
W. ALBERS
University of Twente Refinements of first order asymptotic results axe reviewed, with a number of Ph.D. projects supervised by van Zwet serving as stepping stones. Berry-Esseen bounds and Edgeworth expansions are discussed for R-, L- and [/-statistics. After these special classes, the question about a general second order theory for asymptotically normal statistics is addressed. As a final topic, empirical Edgeworth expansions are considered. AMS subject classifications: 62E20, 62G20, 60F05.
Keywords and phrases: Second order asymptotics, Berry-Esseen, (empirical) Edgeworth expansion, R-, L-, and [/-statistics, symmetric statistics, bootstrap.
1
Introduction
In this paper an attempt is made to sketch van Zwet's contributions to the area of asymptotic expansions. Such a task is not particularly simple, as it concerns an expanding area in more than one sense, which also covers an impressively long period: from the early Seventies till now. (Hence the attempt to capture this comprehensive aspect in a literal manner in the title!) As a consequence, the resulting picture could easily become so loaded with details that the reader will have difficulty to focus, and the remaining impression will be blurred. To avoid this from happening, we shall impose severe restrictions. In the first place, technical details will be dealt with rather loosely, and references will be given only sparingly. Both are amply available in the papers which we do refer to. Moreover, striving for completeness as far as references are concerned, would simply exhaust the available space and thus replace the intended sketch. A more essential restriction, however, is the fact that we shall not try to cover the whole area, but instead will select a single path through it. Our selection criterion, which seems suitable for an occasion like this, will be van Zwet's joint work on asymptotic expansions with quite a few of his students, during and following their Ph.D. projects under his guidance. Other contributions he made will typically only be included if these provided essential tools in these Ph.D. projects, or answered questions arising from such work.
From AtoZ
3
As will be clear from the above, almost no attention at all will be devoted to areas of and approaches to asymptotic expansions other than those used by van Zwet, and hence the efforts of many important contributors to the field as a whole will remain unmentioned. Moreover, those working on similar topics as van Zwet, or even together with him, may still go unnoticed. Finally, those who do get mentioned, may feel that they are represented only bleakly. So let us hasten to apologize to whom it may concern, once more asking understanding for the consequences of just hacking a rather single-minded path, linking van Zwet's contributions from the point where it more or less began, till today. The organization of the paper is as follows. In section 2 we briefly consider the classical case and the corresponding standard techniques. The next two sections are devoted to rank tests. In section 3 the one- and two-sample cases figure, which are linked to the Ph.D. thesis of Albers (1974). Section 4 is devoted to the simple linear rank test, concerned with Does' thesis from 1982. In section 5 we move from it-statistics to L-statistics. Such linear combinations of order statistics were studied in the Ph.D. thesis of Helmers (1978). Note that we do not adhere strictly to chronological order: from time to time we backtrack a little, to pick up developments which have been unfolding simultaneously. This is also the case for [/-statistics, which we consider in section 6. No Ph.D. project was directly involved here, but, as was joked among his students, it was really unavoidable that Willem would do something about [/-statistics: his university, the "Rijksuniversiteit Leiden", is commonly denoted by its abbreviation as "the RUL". Hence [/-statistics form the missing link in his roots between R- and L-statistics! (Incidentally, since 1998 it is simply "Universiteit Leiden", so this time the ranks seem to have gone missing.) Several questions arose from the research till this point. In section 7 we briefly consider the one about "why first order efficiency implies second order efficiency", while section 8 is devoted to the question how things can be generalized, leading to the results for symmetric statistics. This material is used in section 9 for empirical Edgeworth expansions and the bootstrap, which are the topics of the Ph.D. thesis of Putter (1994). 2
The classical case
For several decades now there has been a profound interest in refinements of first order asymptotic results, such as asymptotic normality of test statistics and estimators. A definite impetus in this respect was provided by the special invited paper on Edgeworth expansions in nonparametric statistics by Peter Bickel (1974). He lists the following four reasons for interest in higher order terms: 1) better numerical approximations than with simple normal approxima-
4
W. Albers tions, 2) qualitative insight into the regions of unreliability of first order results, 3) discrimination between first order equivalent procedures, for example in terms of Hodges-Lehmann deficiencies, 4) challenging probabilistic problems.
The starting point for both first and second order results has been the classical case of sums of independent identically distributed (i.i.d.) random variables (r.v.'s). Let X L , . . . , X / V be i.i.d. r.v.'s with positive and finite variance σ 2 and let FN denote the distribution function (d.f.) of SN = N~ιl2 ΣjLι(Xj-EXj)/<τ> Then by virtue of the central limit theorem supa, \FN(X) — Φ(X)\ = 0(1), where Φ is the standard normal d.f. An improvement of this first order result is provided by the Berry-Esseen (BE) bound, which allows replacement of the mere " = o(l)" by "< CN-^2E\Xι\3/σ^\ for some positive constant C, assuming of course that £7|Xi| 3 < 00. Further progress beyond this rate of convergence result requires replacement of Φ by an Edgeworth expansion (EE). A typical result runs like (1)
sup\FN(x)-FN(x)\
= o(N-1),
N
where FN(X) equals
(2) Φ(x) - φ(x) [ ^ r ( * 2 - 1) + ^ ( * 3 - 3s) + ^ ( z 5 - 10x3 + 15a:)], in which K3 and K4 are the 3rd and 4th cumulant of -XΊ, respectively, and φ = Φ'. The choice in (2) is a two-step EE; omitting the terms of order JV"1 produces the one-step EE, which gives o{N~ιl2) rather than o(N~ι) in (1). These first two improvements beyond the BE bound are of primary interest, for example in second order comparisons. Hodges and Lehmann (1970) focussed attention on this area with admirable clarity in a paper with the concise title "Deficiency". Usually two competing statistical procedures A and B are compared as follows: if B requires k = kn observations to match the performance of A based on n observations, the ARE e = Zzran_>Oorc/fcn of B with respect to A is studied. As Hodges and Lehmann point out, a more natural quantity than this ratio would be the simple difference dn — kn — n. Especially, whenever e = 1, i.e. the procedures are first order equivalent, study of this deficiency dn is rewarding. For example, obtaining d = limn-^oo dn (if it exists) allows a perfectly simple comparison: one procedure eventually just requires d more observations than the other one. However, to obtain this type of knowledge, the distributions involved have to be known up to o(ΛΓ~1), rather than merely up to o(l), i.e. a result like the one given by (1) and (2) is required. In their paper, Hodges and
From AtoZ
5
Lehmann demonstrated the use of the deficiency concept through some very convincing and elegant examples. Even more important perhaps, was the fact that at the end of the paper they posed a number of open problems. This stimulated many studies in the Seventies. One of their questions concerned the deficiency of rank tests with respect to their parametric competitors, which inspired the research covered in sections 3 and 4. To ensure that (1) actually holds, obviously a moment condition like £7|ΛΊ|r < oo for some r > 4, is needed. But we also have to assume something like Cramer's condition (C): (3)
limsup|p*(ί)| < 1, |ί|-»oo
where p* is the characteristic function (ch.f.) of X\. We shall now take a look at the arguments used in the proof of (1). This will explain why (3) is used, but more importantly, it will be helpful in the sections that follow. According to Esseen's smoothing lemma, the difference between FN and FN can be measured by comparing their Fourier transforms p# and p#, respectively. In fact, for all T > 0, we have that sup x \FN(X) — FN(X)\ is of order ~ pN(t) / t J-Ί As Piv(ί), the ch.f. of SJV, satisfies |pjv(ί)| = \p*(N 1l2t/σ)\N, an expansion for pN holds for \t\ < δNιl2, for some δ > 0. Now FN in (2) has been chosen such that PN precisely equals the expansion for p^ truncated after the fourth term, which suffices to make the integral in (4) sufficiently small for T = δNιl2. But to get o(N~ι) in (1), we need that Γ " 1 = o(N~ι). On the remaining set / it no longer helps to look at PN — PN and we simply need to show that (4)
(The accompanying result / 7 \pN(t)/\t\dt = o(N ι) is trivial.) If X\ has a lattice distribution, p* is periodic and |PΛΓ(*)| = \p*(N~1/2t/σ)\N will keep returning to 1 and (5) may not hold true. To see that things in fact do go wrong, just consider the binomial case, where (1) clearly is false. Hence the strong non-lattice condition (3), to stay out of this kind of trouble. 3
One- and two-sample rank tests
The basic question is how to extend results for the i.i.d. case to more complicated situations. As concerns first order results, a lot of effort was devoted in
6
W. Albers
the Fifties and Sixties to obtaining asymptotic normality for classes of rank statistics. As we saw above, in the early Seventies, similar questions arose for second order problems. For the easiest case, the one-sample linear rank statistic, this led to the Ph.D. thesis of Albers (1974) and to Albers, Bickel and van Zwet (1976). The idea is that here it is not necessary to expand the statistic: a direct approach will work, using an appropriate conditioning argument. Let Xi,...,X/v be i.d.d. r.v.'s with common d.f. G. Consider the order statistics 0 < Z\ < Z2 < ... < ZN of |ΛΊ|,..., |XJV| and the antiranks DI,...,DN defined through \XDJ\ = Zj Let Vj — 1 if XDJ > 0, and 0 otherwise, j = l,...,iV, then the hypothesis that the distribution determined by G is symmetric about zero is tested on the basis of N
(6)
TN =
where the scores aj are typically generated by some continuous function J on (0,1), e.g. through aj = J(j/(N + 1)) (approximate scores). For J equal to 1, t or Φ - 1 ([l + ί]/2), we obtain the sign, the Wilcoxon signed rank or the one-sample van der Waerden test, respectively. The problem is that the summands in (6) are independent under the hypothesis only. The key step is to note that, conditional on Z = (Zi,..., ZN), the Vj are independent under the alternative as well. Hence the classical theory applies after all and an EE like (2) can be given for the conditional d.f. of T/v A serious obstacle, however, is that the Vj are obviously lattice r.v.'s and (3) will not hold. Fortunately, we are generally saved by the fact that in this respect the i.i.d. case is least favourable. If |PΛΓ(£)| = \p*(N~ι/2t/σ)\N, the only way to keep \p^\ away from 1, is to do so for |p*| through (3). But in the case of varying components, for (5) it amply suffices if for each t there is a positive fraction among the ch.f.'s of the summands which are not close to 1 in modulus. This in its turn is easily achieved by letting the aj vary, i.e. by letting J be non-constant. (On the other hand, a constant J produces the binomially distributed sign statistic, for which the situation is indeed hopeless). Hereafter it remains to obtain an unconditional expansion for the d.f. of T/v by taking the expectation with respect to Z of the conditional EE. Although attention is restricted to the hypothesis and contiguous location alternatives, there are still a lot of technicalities involved and the resulting paper needs almost 50 Annals pages. The resulting expansions, however, are completely explicit and enable quick and illuminating comparisons to first order equivalent tests, such as parametric counterparts. As an example we mention that the aforementioned Hodges-Lehmann deficiency d^ (the additional number of observations required to match the power) of the normal
From
AtoZ
scores test with respect to the f-test satisfies (7)
dN ~ i
Hence the bad news is that its limit is infinite; the good news is that for all practical purposes a single additional observation suffices. Several extensions of the basic result for the one-sample case were realized; we merely mention adaptive rank tests (Albers (1979)) and two-stage rank tests (Albers (1991)). Next we turn to the two-sample problem. We modify the situation described at the beginning of this section as follows: ΛΊ,..., X^ are still independent, but now Xi,..., Xm have common d.f. F and X m + i , . . . , XN have common d.f. G. The Zj in this case are the order statistics if Xi,... ,-XΛΓ? the anti-ranks are defined through Xj)ά — Zj and Vj = 1 if m + 1 < Dj < N and Vj =0 otherwise, j = 1,..., N. Then TN from (6) stands for the general linear rank statistic for testing the hypothesis that F = G. An asymptotic expansion to order N~x for the d.f. of this TN under the hypothesis and contiguous alternatives, was obtained by Bickel and van Zwet (1978). This paper is the natural counterpart of the one-sample paper by Albers, Bickel and van Zwet (1976), but there is also a major difference. In the one-sample problem we are always dealing with symmetric distributions and therefore the terms of order N~ιl2 in the expansions vanish. Hence, when comparing first order equivalent tests, deficiencies of order (almost) 1 (cf. (7)) will typically arise. For the two-sample case there is no reason to expect symmetry, and terms of order N~1/2 do occur. Consequently, one would expect to find deficiencies of order TV1/2, but this is not what happens. In fact, the results for the one- and two-sample case are typically qualitatively the same. This quite surprising result is due to the fact that invariably for first order efficient tests all terms of order JV"1/2 agree, and hence drop out in the deficiency computations. The phenomenon of first order efficiency implying second order efficiency, noted earlier by Pfanzagl (see e.g. Pfanzagl (1979)), was sufficiently intriguing to be studied in its own right and we shall come back to it in section 7. Although the techniques employed are similar to those of Albers, Bickel and van Zwet (1976), the occurrence of the Λ/'~1/2-terms makes the twosample case essentially more complicated to handle. An additional complication is that the distance to the independent case is larger here. For, after conditioning on Z in the one-sample problem, TN is distributed as Σj=\ CLJWJ, where the Wj are independent Bernoulli r.v.'s. In the two-sample case, however this step produces a TN which is distributed as Σf=ι ajWj, given that ΣjLi Wj = N — m. Hence an additional trick, essentially due to Erdos and Renyi (1959), is required to obtain again an explicit representation for the conditional ch.f. of TN- The foregoing hopefully demonstrates that it would be a major understatement to call the two-sample case a straightforward
8
W. Albers
generalization of the one-sample problem. In fact, it took almost 70 pages in the Annals to do so! 4
The simple linear rank test
Let X I , . . . , X Λ Γ be independent r.v.'s with d.f.'s i
(8)
TN
which can be used to test the hypothesis of randomness Fι = ... = FN- The two-sample case from the previous section is contained in (8) as a special case for the choice Cj = 0, j = 1,..., ra, Cj = 1, j = m + 1,..., N. For this general statistic, a direct approach no longer seems feasible and we resort to the more or less traditional road of attack, which consists of decomposing or expanding the statistic itself. The basic scheme suggests to write (9) TN = SN + RN, where SN is a sum of independent r.v.'s, thus allowing application of the classical approach from section 2, while the remainder RN = TN — SN is supposed to be negligible in comparison to SN. For example, under the hypothesis we can compare TN from (8) to SN = Σ?=i ^"^(^(^j))' where J is the score function which generates the aj and F is the common d.f. of the Xj under HQ. This approach has been used extensively to obtain asymptotic normality results for T/v, under varying sets of conditions. (Note that there is a trade-off: allowing quite general Fj means strong conditions on the Cj and αj, whereas e.g. under contiguous alternatives the conditions on regression constants and scores can be much milder.) Typically, the first steps on the road towards second order results are taken by just pressing the above argument a bit harder: SV, being an i.i.d. sum, also allows a classical BE bound, while generally not merely RN = op(\TN\), but in fact RN = OP(N-ι/2\TN\) will hold. Let GN be the d.f. of the standardized version T^ = (TN — J5Tτv)/σ(TΛr), then we simply use, for some sequence e^v > 0,
together with a similar inequality in the opposite direction. The last probability in (10) can be bounded by E\RN\r/(€N&(TN)Y for some large r. As RN = OP{N-ι/2\TN\), this will typically be of order N~rl2e^. The first
9
From AtoZ
probability on the right-hand side of (10) will equal Φ(x + e^r) + O{N~ιl2) by virtue of the classical BE bound. Hence GN(X) — Φ(x) will be of order eN + N-W + eJfN-*'2.
(11)
This sketch shows that this type of argument is not only simple (apart from the technicalities involved!), but unfortunately also simply not good enough: no matter how we choose the €N in (11), we will never get the "right" rate N~1/2. Something like N~ll2+* or maybe N~ιl2\ogN will be the best (11) gets us. Of course, one could object that for practical purposes it really does not matter that much whether the error behaves like N~ιl2 or like N~ιl2\ogN. The point is, however, that a method which already falls short of providing the right answer in the first improvement step, will be quite useless to get any further, i.e. to obtain asymptotic expansions. To get rid of the final £, we need to replace a crude inequality like (10) by a more delicate analysis using the smoothing lemma from section 2: just use (4) for pN(t) = exp(-£ 2 /2) and T ~ Nιl2. To begin with, replace RN in (9) by QN + RN, with e.g. N
(12)
1
QN = N- Σ
CjWXj))
{Rj -
and RN as the new remainder T/γ — (SN+QN) For the standardized version, write Ttf = SN + QN + RN and use that its ch.f. satisfies (13)
pN(t) = EeιtbN + itEeιtbNQN
+O ί -
The first term in (13) equals exp(-ί 2 /2) + O(Λ^- 1 / 2 exp(-ί 2 /4)) because of the classical theory. As Sjy and QΛΓ are sums of independent r.v.'s and as such are almost independent, the expectation in the second term can be shown to be O{N-ιl2e~t2lA) as well. Finally, since both EQ2N and E\RN\ 1 are O(ΛΓ~ ), it follows that the integral in (4) can be made O(N~1'2) for T ~ iV1/4, rather than for T = δ*Nιl2 for some δ* > 0, which is what we really need. To bridge the gap, we expand PN in this interval much farther, producing a remainder term \t\mE\Qjsf\m/m\. As E\Qisr\m can be shown to behave like (cm)mN~m/2 for some constant c, it follows that this remainder term leads to a contribution of order (δ*ce)m in (4). Hence for δ* sufficiently small and e.g. m = log iV, this can be made negligible and the desired result follows. This ingenious argument, which we have discussed in a little bit more detail to convey the flavor of the techniques used, was mentioned by Bickel (1974) in connection with [/-statistics, applied by his Ph.D. student Bjerve
10
W. Albers
to L-statistics and used by Huskova on simple linear rank statistics. Nevertheless, it may have brought us at the right BE rate of order iV"1/2, but there it stops again: the trick with m = logiV works only up to T ~ N1/2. As we already mentioned in section 2, what is really needed to obtain asymptotic expansions, is a way to deal with the integral in (5). This is precisely what we find in van Zwet (1982): he essentially shows that there exist positive β and b such that (14)
\pN{t)\ = O(N-βlo&N)
for logiV < \t\ < WV"3/2.
Clearly (14) amply suffices to show (5). The techniques used to derive this smoothness property depend on the particular structure of TJV They are related to the arguments used in Albers, Bickel and van Zwet (1976) and Bickel and van Zwet (1978), according to which some variation in the summands already suffices to keep their lattice character from destroying the smoothness. Using van Zwet (1982) as a starting point, Does could make remarkable progress in his 1982 Ph.D. thesis. To begin with, he obtained a BE bound under weaker conditions, allowing unbounded score functions, such as the important special case Φ" 1 , which is optimal for normal underlying distributions (see Does (1982a,b)). But he also obtained the desired expansions to o{N~ι), both under the hypothesis (Does (1983)) and under contiguous alternatives (Does (1984)). In view of (14), the emphasis in this work lies on studying the integral from (4) over the region |t| < logiV. This is a highly technical matter, using more sophisticated versions of (12) and (13). A large part of the effort required is due to the desire to not merely prove the result, but to do so under mild conditions which allow direct verification in applications. 5
Linear combinations of order statistics
Since we started with β-statistics in section 3 for the one- and two-sample case, it made sense to continue this development in the next section for the case of the simple linear rank statistic. As a consequence, our changing from β-statistics to L-statistics in the present section means going back in time a little: the developments for linear combinations of order statistics were parallel to or even preceded those for Tjy from (8). Incidentally, the fact that this is not some matter of chance, is discussed by van Zwet (1983). He observes that there exists a striking similarity between the techniques employed in both areas, and uses the image of two armies marching on parallel roads. In fact, they are basically going the same place, in the sense that he manages to show under very general conditions that asymptotic normality of a two-sample rank statistic under a fixed alternative follows from a similar result for an appropriate L-statistic. Another occasion where the two areas
From AtoZ
11
meet, was encountered in Albers, Bickel and van Zwet (1976). Here it was observed that an asymptotic expansion for the d.f. of the one sample rank statistic under fixed alternatives would require such an expansion for the d.f. of an L-statistic. For this reason, attention was restricted to contiguous alternatives. Let Xi,..., J*0v be i.i.d. r.v.'s from some d.f. F, then we replace (8) by N 1
(15)
TN = Nth
where X{:N is the i order statistic of Xi,..., XN and the CM are weights. Just as was the case with i?-statistics, asymptotic results for L-statistics are available under varying sets of conditions. Typically, either the weights are smooth, i.e. C^AΓ = J(i/{N + 1)) for some smooth function on (0,1), or F is supposed to be smooth. In the latter case, however, attention has to be restricted to trimmed L-statistics, i.e. with c^v = 0 for i < Na or i > Nβ, for certain 0 < a < β < 1. Again we begin with the BE case. As we already mentioned in the previous section, Bjerve obtained such a result for L-statistics, using an argument due to Bickel. He considered the trimmed case, while Helmers (1977) applied the same type of approach to smooth weights. The result of Bickel for {/-statistics was further improved by Callaert and Janssen (1978). Using this latter paper, Helmers (1981) improved his previous result by proving it under weaker conditions. Next we move to asymptotic expansions. Here the pattern is again the same as in the previous section: a special argument is required to deal with (5), and here as well this is provided by van Zwet. To be more precise, by van Zwet (1977) it is shown that the ch.f. p^ of the standardized Tjy- = — -ET/v)/cr(T/v) satisfies, for every positive integer r and for t φ 0, (16)
|p,v(ί)|
where 7 > 0 depends on r. Using (16), Helmers (1980) obtained an EE to o(N~1) for L-statistics with smooth weights; the companion result from the trimmed case is contained in Helmers (1979) (the special case of the trimmed mean was already covered by Bjerve). Just as in the case of Rstatistics, it is a highly laborious and technical matter to achieve all this under reasonably mild conditions. Collected together, all this material can be found in Helmers' Ph.D. thesis (1978). An additional remark is that van Zwet (1979) demonstrated that for the special case of uniform underlying distributions stronger results can be obtained than in general. As was discussed in sections 2 and 3, the interest in second order analysis of i2-statistics was stimulated by the desire to obtain deficiencies. Similarly,
12
W. Albers
one can wonder about deficiencies of first order efficient tests based on Lstatistics. For results of this nature we refer to Bening (1995). 6
[/-statistics
After R- and L-statistics, we shall now consider [/-statistics. Let again XI,...,XN be i.i.d. r.v.'s, but this time introduce for symmetric h (i.e. h(x,y) = h(y,x)) the [/-statistic ΛΓ-1
(17)
UN = Σ Σ κχ»Xi)i
where we assume Eh(X\,X2)
E(h(Xι,X2)\Xι (18)
N
= 0 and Eh?{Xι,X2) < oo. Defining g(x) =
= x) and ψ{x,y) = h(x,y) - g(x) - g(y), we can write UN
with UN = (N-l)Σ?=19(Xi) and Δ * = Σ ^ Σ & i + i !«**,*;)• Provided that Eg2(X\) > 0, we have that [/jv/σ([/jv) is asymptotically standard normal. As was mentioned in section 4, the first BE bound for [/-statistics was already obtained by Bickel (1974). Moreover, in the previous section we discussed how this result was used by Helmers to obtain a BE bound for L-statistics, and that he subsequently sharpened his result by using an improved version of the BE bound for [/-statistics due to Callaert and Janssen. The final step in this apparent interplay between rate of convergence results for L- and [/-statistics was due to Helmers and van Zwet (1982), who obtained the BE-bound for [/jv/σ([/;v) under the natural condition that The situation for asymptotic expansions to o(N~1) is as follows: the first result on EE's for [/-statistics was obtained by Callaert, Janssen and Veraverbeke (1980) (also see Janssen's Ph.D. thesis from 1978). However, they had to impose a complicated smoothness condition on the distribution of /i, which was difficult to verify, and also clearly more strict than necessary. But, just as in section 3, it turns out that problems caused by a possible lattice character, become less, rather than more pronounced as the situation gets more complicated. In the former case, the i.i.d. sum was least favorable and some variation in the summands already sufficed to obtain the required smoothness. Here we observe that in going from single to double sums, like those in (17), the magnitude of the jumps in the d.f. for the lattice case ι 2 3 2 typically goes down from N~ l to N~ / (cf. the "bad" sign statistic to the "good" Wilcoxon or signed rank statistic, which falls under (17)). Consequently, Bickel, Gδtze and van Zwet (1986) succeeded in establishing the EE to o(N~λ) under very mild conditions that are easy to verify and
13
From AtoZ
do not involve smoothness of the d.f. of Λ(-XΊ, X2), but only of the d.f. of g(X\). In fact, conditions on g are given such that UN from (18) admits an EE, supplemented with a moment condition on ^(JYi, JΓ2) to control the behavior of the remainder AN in (18). However, one awkward condition remains. Let u>i,W2,... be some orthonormal sequence of eigenfunctions of the kernel φ with respect to the d.f. F of the Xi, and let λi, λ2,... be the corresponding eigenvalues, i.e. (19)
ίφ(x,y)ωj(x)dF{x)
= XjUjίy).
Then it is assumed that a sufficient number of these λj are nonzero. The meaning of this condition only becomes clear during the proof. Again, the source of trouble is the behavior of the ch.f. PN(Ϊ) for large |t|, making it hard to prove (5). In the present case the problem is that for these large |ί| this behavior is no longer governed by UN, but instead by the remainder AN- TO avoid degeneration in the subsequent analysis, a certain number of eigenvalues should be nonzero.
7
Efficiency of first and second order
After completion of sections 3-6, we have reached the level where BE bounds and EE's to o(N~λ) are available for iZ-, U- and L-statistics. Before climbing on to the next level, we briefly pause to contemplate the phenomenon of first order efficiency implying second order efficiency, which we encountered in section 3 in connection with two-sample rank tests. In the mean time, several other groups, such as Pfanzagl and his students, had also made significant contributions to higher order theory. Here we merely mention that Pfanzagl (1979) demonstrated that this phenomenon happens in general when first order efficient tests are compared. The powers of such tests typically agree to second, rather than merely to first order. Now it is one thing to observe this state of affairs, but because of the technicalities involved, it is quite something else to understand why it does happen. Fortunately, Bickel, Chibisov and van Zwet (1981) provide a nice intuitive explanation of the phenomenon. The idea (very roughly!) is as follows. For N = 1,2,..., let XN be the outcome of an experiment and suppose that this XN has density either pNfi or PNI- (Usually N simply stands for the number of independent r.v.'s in the Nth testing problem.) The test function of the most powerful level-αw test in this case is . ,A v ί 1 iϊAN>cN, /oπ, (20) v
J
ΦN\AN) v
— S
n
IL
^ ' \ 0 otherwise, where AN is the log likelihood ratio log{pN,i(XN)/PN,o(XN)}' Typically, we are interested in the contiguous case, where CXN = EN,OΦN(AN) re-
14
W. Albers
mains bounded away from zero, while the power π ^ = EN,IΦN(ΔN) remains bounded away from one. Under these circumstances, ΛΛΓ is generally asymptotically normal and moreover usually admits an EE for π ^ like (21)
π*N
Let ZΛΓ be a competing first order efficient test statistic, with level test function ΨN{ZN) = 1 for ZN > d/v and ΨN{ZN) = 0 otherwise, and power 7πv admitting an EE TΠV = c0 + c'ιN~ιl2 + o(N~1/2). Note that we use the same co here as in (21) by virtue of the first order efficiency. However, calculation for explicit examples invariably shows that also c[ = c\, implying that ZJV is in fact second order efficient. To understand why 7Γy — ΈN = o(N~1/2), rather than of the exact order iV" 1 / 2 , we observe that this power difference equals (22) Note that the contribution involving edN in (22) can be smuggled in because both tests have level a^ and thus EN,QΦ{AN) = EN,OΨN(ZN) Since ZN is first order efficient, we can write ZN = AN + ΔJV, with AN = 0p(|ΛjvΊ) (cf.(9)). The factor (ΦN(AN) - ΨN{ZN)) in (22) will be non-zero only on the set where AN is between CAT and (IN — AN- In view of the first order equivalence of the tests, CN and (IN are close and therefore AN is with large probability close to (IN on this set. Consequently, when the second factor in (22) is non-zero - which happens with small probability - the first factor will typically be small. This provides the acceleration from precise order N~ιl2
to
o{N-λl2).
As a final remark in this section we mention that Bickel, Gόtze and van Zwet (1983) have extended the approach above to the study of thirdorder efficiency of maximum likelihood-type estimates.
8
Symmetric statistics
Nowadays, many scientists are thrilled by studies of the expanding universe. Some, however, seem to have reversed preferences and rather pursue universal expansions! As van Zwet (1984) pointed out, the multitude of results obtained till then (and described in the previous sections) may have been extremely useful for statistical applications, but from a probabilistic perspective it still looks rather ad hoc, without much hope for a general theory. Consequently, he started the development of a general second order theory for asymptotically normal statistics. As the statistics involved are functions of i.i.d. r.v.'s X\,..., XN , it can be assumed without loss of generality that the functions involved are symmetric. But this restriction to symmetric statistics is the only limitation imposed. Nevertheless, even this limitation
15
From A to Z
can be avoided, but for arbitrary functions the conditions involved will be much more complicated and difficult to verify. Consider T — t{X\,... ,XN), where the function t is symmetric in its N arguments. As we have seen, a common approach towards second order results involved Taylor expansion of T (cf. e.g. (9) and (12)). But the smoothness of ί, which is needed for this method, does not seem to be essential. The proper approach for the general case is Hoeffding's decomposition, which expands T in a series of [/-statistics of increasing order. Assume that 2 ET < oo and write N
(23)
N-l
N
T
where Ti = E(T\Xi)-ET To illuminate the idea behind (23), let Tm be the L2-projection of T on the linear space spanned by functions of at most m r.v.'s from then N
N-l
f
N
f
N
τ
* - * =Σ Σ ^ i=l
j=i+l
R
f
=Σ( i
- fi-ι)
j=3
(The alternative term ANOVA-type decomposition is sometimes used, in view of these repeated orthogonal projections.) Using (23) and properties of L2-projections, van Zwet (1984) obtained the BE bound for T, assuming that E\E(T\Xι)\3 = O(ΛT 3 / 2 ), together with a simple moment condition to control the behavior of T — f\. If this result is applied to special cases like U- and L-statistics, it reproduces the optimal results for these situations (e.g. E\g(Xι)\* < oo and Eh2(XuX2) < oo for ^-statistics, cf. section 6). For the present general case, the step from the BE bound to an appropriate EE, is essentially more complicated than in the special cases studied before. In view of the similarity between (18) and (23), at first sight one would expect that the approach of Bickel, Gόtze and van Zwet from section 6 for [/-statistics, would lead in a rather straightforward manner to an EE to o(N~λ) here as well. Unfortunately, the behavior of the "sole" difference R between (23) and (18) turns out to be extremely complex. In a sense, this is not completely surprising: the term preceding R in (23) corresponds to ΔJV from (18), and already AN required a peculiar eigenvalue condition (cf. (19)) to ensure its proper behavior. Hence, for terms of still higher order, things probably get even worse. The situation at present is as follows: an EE to o(N~1) does exist (see Gδtze and van Zwet (1991)), but is as yet not in a form fit for publication.
16
W. Albers
Bentkus, Gotze and van Zwet (1997) present an EE to O(ΛΓ~1), which thus not includes the terms of order iV"1, but does attain the right order, and not something like O(N~ι+δ) (cf. the discussion following (11)). Incidentally, they also show that without the eigenvalue condition, the need of which was in some doubt, the EE to o(N~1) for [/-statistics is not necessarily valid. The result obtained looks quite natural: take the one-step EE (cf. (2)) and use for KS simply the third cumulant of N
N-l
N
i.e. neglect R in (23). This leads to an error O(N~ι) under appropriate moment conditions: a fourth for 2\, a third for Tyi and a relatively simple one to control the behavior of R. In addition, as expected, a Cramer-type condition on T\ is needed. Just as in the BE case, the general result obtained here turns out to be comparable to the best available results for special cases. The proof is long and tedious, among others since the traditional smoothing lemma (cf. (4)) does not seem to work anymore; its role is now played by a nonstandard smoothing inequality, on which a technique called data-dependent smoothing is based. 9
Empirical Edgeworth expansions
In this final section we consider the results obtained by Putter in his '94 Ph.D. thesis. He studies substitution estimators (formerly known as plug-in estimators!), with the bootstrap as the most prominent example. Besides results on consistency of such substitution estimators (see Putter and van Zwet (1996)), he also pays ample attention to so-called empirical Edgeworth expansions (EEE's), which provide the link to the present review. In analogy to our observation in section 5 about R- and L-statistics, the existence of such a link is no coincidence: the closer one looks, the better one sees the relation between bootstrap and EEE. To begin with, practitioners often hope that the bootstrap automatically works, and thus effectively replaces the need for statistical thinking by routine application of simulation, but (un?)fortunately, this is not the case. Van Zwet in particular has shown that typically the bootstrap requires asymptotically linear and asymptotically normal statistics. Moreover, finer properties such as second order correctness, which have made the bootstrap even more popular, typically require the validity of an Edgeworth expansion. Hence it seems that the bootstrap and appropriate expansion techniques work under similar circumstances. In addition, the use of expansions helps to understand the behavior of the bootstrap. Consider for example the second order correctness property,
17
From AtoZ
which means that the error of the bootstrap approximation can actually be of a smaller order of magnitude than the error in the customary normal approximation. Specifically, let ΛΊ,... ,Xχ be i.i.d. r.v.'s from a d.f. F and let TN = tpι(Xi,- . ,XN) be a symmetric statistic (cf. section 8). Suppose the d.f. GN of the standardized version (24)
T*N =
(TN-ETN)/σ(TN)
admits an EE. Then the bootstrap approximation G*N for GN relies on replacing F by some empirical version, like the empirical d.f. FN Consequently, the coefficients in the EE for G*N are just the empirical counterparts of the corresponding coefficients in the EE for GN. But now a similar argument applies as in section 7: these coefficients are ι 2 ι of order N~ l (or even N~ ) to begin with, and estimation errors are op(ί) 1 2 (typically even Op(N~ / )), which in combination leads to an approximation error op{N~ιl2) (or even Op(N~1)), rather than merely Op(7V~1/2), for this EEE, and thus for the bootstrap. Incidentally, do note that we have considered the standardized version Tjy. For Γ/v itself, σ(TJv) will occur already in the leading term of the EE, leading to an estimation error of at least order N~ιl2. As σ(T/v) is typically unknown in practice, the statistic of real interest is neither T^ from (24) nor TV, but a Studentized version (25)
fN = (TN -
ETN)/SN,
where Sjj is some appropriate estimator of σ2(Tjy). The above immediately prompts the following question: instead of merely using the EEE to explain the bootstrap, can't we use it to replace the bootstrap altogether? In this way, a lot of simulation effort can be avoided. This attractive idea is studied extensively by Putter. Generally speaking, it turns out that both bootstrap and EEE indeed outperform the ordinary normal approximation. In the mutual comparison, the bootstrap seems to be slightly better than the EEE, which agrees with intuition as the bootstrap also estimates higher order coefficients, whereas the EEE stops after one (or two) steps. Up to now, we have mainly outlined the motivation and the general ideas. At the end of this section, we shall briefly also consider some specific aspects, such as methods applied, types of estimators used, etc. But, as usual, we largely refer to the relevant papers, which in this case are Putter (1994) and Putter and van Zwet (1998). Consider symmetric statistics T/v with ETN = 0, then one-step EE's with error o(N~1/2) are established for TN/σ(TN) and for fN = TN/SN (cf. (25)). For S2N the well-known jackknife estimator of variance is used. Next, the coefficients in these EE's are estimated in a similar fashion, also using jackknife techniques, and it is shown that the resulting one-step EEE's have error op{N~ιl2).
18
W. Albers
As concerns the methods of proof, for the EE's the key tool again is Hoeffding's decomposition (cf. (18) and (23)). Extensive use is made of the results by Bickel, Gδtze and van Zwet (1986) on the EE for [/-statistics, which were discussed in section 6. For the step from EE's to EEE's, it suffices to show the consistency of the jackknife estimators applied. It is demonstrated that the results obtained are sufficiently general to allow application to [/-statistics, L-statistics, smooth functions of the sample mean, as well as smooth functionals of the empirical d.f. Moreover, it is also demonstrated how the results can be used to prove second order correctness of the bootstrap for Studentized [/-statistics of degree two, a case which was studied earlier by Helmers (1991) under stronger moment conditions. Here our sketch comes to an end. Maybe this comes across a little abruptly, leaving the reader out on a limb. But remember that this is the appropriate place to be at the end of a journey through a tree-like structure such as this review!
REFERENCES Albers, W. (1974). Asymptotic expansions and the deficiency concept in statistics. Mathematical Centre Tract 58. Mathematisch Centrum, Amsterdam. (Ph.D. thesis) Albers, W., Bickel, P.J., van Zwet, W.R. (1976). Asymptotic expansions for the power of distribution free tests in the one-sample problem. Annals of Statistics 4 108-156. Albers, W. (1979). Asymptotic deficiencies of one-sample rank tests under restricted adaptation. Annals of Statistics 7 944-954. Albers, W. (1991). Second order analysis of two-stage rank tests for the one-sample problem. Annals of Statistics 19 1042-1052. Bening, V.E. (1995). A formula for deficiency: one sample L- and i?-tests I, II. Mathematical Methods Statistics 4 167-188, 274-293. Bentkus, V., Gδtze, F., van Zwet, W.R. (1997). An Edgeworth expansion for symmetric statistics. Annals of Statistics 25 851-896. Bickel, P.J. (1974). Edgeworth expansions in nonparametric statistics. Annals of Statistics 2 1-20. Bickel, P.J., van Zwet, W.R. (1978). Asymptotic expansions for the power of distribution free tests in the two-sample problem. Annals of Statistics 6 937-1004. Bickel, P.J., Chibisov, D.M., van Zwet, W.R. (1981). On efficiency of first and second order. International Statistical Review 49 169-175. Bickel, P.J., Gόtze, F., van Zwet, W.R. (1983). A simple analysis of thirdorder efficiency of estimates. Proceedings of the Berkeley conference in honor of Jerzy Neyman and Jack Kiefer, Vol. II 749-768.
From AtoZ
19
Bickel, P.J., Gόtze, F., van Zwet, W.R. (1986). The Edgeworth expansion for {/-statistics of degree two. Annals of Statistics 14 1463-1484. Callaert, H., Janssen, P. (1978). The Berry-Esseen theorem for {/-statistics. Annals of Statistics 6 417-421. Callaert, H., Janssen, P., Veraverbeke, N. (1980). An Edgeworth expansion for {/-statistics. Annals of Statistics 8 299-312. Does, R.J.M.M. (1982a). Higher order asymptotics for simple linear rank statistics. Mathematical Centre Tract 151. Mathematisch Centrum, Amsterdam (Ph.D. Thesis). Does, R.J.M.M. (1982b). Berry-Esseen theorems for simple linear rank statistics under the null-hypothesis. Annals of Probability 10 982991. Does, R.J.M.M. (1983). An Edgeworth expansion for simple linear rank statistics under the null-hypothesis. Annals of Statistics 11 607-624. Does, R.J.M.M. (1984). Asymptotic expansions for simple linear rank statistics under contiguous alternatives. Asymptotic Statistics 2 (eds. P. Mandl and M. Huskova), 221-230, North Holland. Erdδs, P., Renyi, A. (1959). On the central limit theorem for samples from a finite population. Magyar Tud. Akad. Mat Kutatό Int. Kόzl. 4 49-61. Gδtze, F., van Zwet, W.R. (1991). Edgeworth expansions for asymptotically linear statistics. Preprint 91-034, Universitat Bielefeld. Helmers, R. (1977). The order of the normal approximation for linear combinations of order statistics with smooth weight functions. Annals of Probability 5 940-953. Helmers, R. (1978). Edgeworth expansions for linear combinations of order statistics. Mathematical Centre Tract 105. Mathematisch Centrum, Amsterdam. (1982; Ph.D. thesis 1978) Helmers, R. (1979). Edgeworth expansions for trimmed linear combinations of order statistics. Asymptotic Statistics 1 (eds. P. Mandl and M. Huskova), 221-232. Helmers, R. (1980). Edgeworth expansions for linear combinations of order statistics with smooth weight functions. Annals of Statistics 8 13611374. Helmers, R. (1981). A Berry-Esseen theorem for linear combinations of order statistics. Annals of Probability 9 342-347. Helmers, R., van Zwet, W.R. (1982). The Berry-Esseen bound for Ustatistics. Statistical Decision Theory and Related Topics, III, Vol. 7497-512. Helmers, R. (1991). On the Edgeworth expansion and the bootstrap approximation for a Studentized {/-statistic. Annals of Statistics 19 470-484. Hodges, J.L., Lehmann, E.L. (1970) Deficiency. Annals of Mathematical Statistics 41 783-801.
20
W. Albers
Pfanzagl, J. (1979). First order efficiency implies second order efficiency. Contributions to Statistics, 167-196, Reidel, Dordrecht. Putter, H. (1994). Consistency of resampling methods. (Ph.D. thesis) Putter, H., van Zwet, W.R. (1996). Resampling: consistency of substitution estimators. Annals of Statistics 24 2297-2318. Putter, H., van Zwet, W.R. (1998). Empirical Edgeworth expansions for symmetric statistics. Annals of Statistics 26 1540-1569. van Zwet, W.R. (1977). Asymptotic expansions for the distribution functions of linear combinations of order statistics. Statistical Decision Theory and Related Topics, 7/421-437. van Zwet, W.R. (1979). The Edgeworth expansion for linear combinations of uniform order statistics. Asymptotic Statistics 1 (eds. P. Mandl and M. Huskova), 93-101, North Holland, van Zwet, W.R. (1982). On the Edgeworth expansion for the simple linear rank statistic. Nonparametric Statistical Inference, Vol. I, II 889909. van Zwet, W.R. (1983). Ranks and order statistics. Recent Advances in Statistics 407-422, Academic Press, New York. van Zwet, W.R. (1984). A Berry-Esseen bound for symmetric statistics. Z. Wahrsch. Verw. Gebiete 66 425-440. FACULTY OF MATHEMATICAL SCIENCES
P.O. Box 217 7500 AE ENSCHEDE NETHERLANDS
albers @math. utwente. nl
LIKELIHOODS AND PSEUDOLIKELIHOODS FOR MARKOV SPATIAL PROCESSES
A.J. BADDELEY
University of Western Australia We study spatial random processes (mainly point processes in Rd) which are defined to satisfy various spatial analogues of the Markov conditional independence property. We explore some issues in statistical inference for such models, including likelihood and pseudolikelihood methods, and identifiability. AMS subject classifications: 60D05, 60G55, 62M30.
Keywords and phrases: Area-interaction process, Berman-Turner device, Conditional intensity, Directed Markov point processes, Efficiently estimable parameters, Exponential families, Gibbs processes, Identifiability, Markov point processes, Markov Chain Monte Carlo, Multiparameter counting processes, Ord's process, Pairwise interaction, Spatial clustering, Strauss process, Total variation distance, Widom-Rowlinson model.
1
Introduction
Markov point processes [75, 76] axe a rich class of stochastic models for spatial patterns, with the virtue of being relatively tractable. They are defined to satisfy one of several spatial counterparts of the Markov conditional independence property. The likelihood takes a simple explicit form, apart from a difficult normalising factor. Indeed typically the likelihood is an exponential family, and the canonical sufficient statistic is often closely related to nonparametric spatial statistics. Typically each process is the equilibrium measure of an associated space-time Markov process; thus it is amenable to Markov Chain Monte Carlo simulation and bootstrap inference. Accordingly there is much current interest in exploring the potential applications of Markov point processes, which include spatial statistics, digital image analysis, and geostatistics. The first half of this article is a condensed introduction to Markov point processes. The second half describes recent work by the author and collaborators (N.A. Cressie, N.I. Fisher, J. M0ller, G. Nair, A. Sarkka and T.R. Turner) on finding new Markov models for different types of patterns, elaborating properties of these models, and performing statistical inference for spatial datasets using bootstrap, likelihood or pseudolikelihood methods. 2
Background
This section covers basic background about point process densities, Gibbs and Markov point processes, and conditional intensities.
22
2.1
AJ. Baddeley
Point process densities
See [18, 21] for definitions and background on point processes. In order that likelihoods may exist, we shall restrict attention to finite simple point processes whose distributions are absolutely continuous with respect to the distribution of the Poisson process. Such a process may be visualised very easily as a random finite number of points at random locations in a space 5. A realisation of the point process X is a finite unordered set of points, x = {zi,...,z n },
XitS,
n>0
The space 5 in which the points lie is typically a subset of Rd, but may be any Polish space. Let X be the space of all such realisations. All the point process models X in this paper will be absolutely continuous with respect to the distribution of the Poisson point process [18, 43] with intensity measure v on S where v is a fixed, nonatomic, finite Borel measure. Then X has a probability density f : X -> [0, oo] such that (1)
F{XeA}
=
e-^ 71=0
for each AeT,
where J o (/, A) = 1 {0 E A} /(0) and for n > 1
In(f,A) = -n [... [ - Js Js
l{{x1,...,xn}eA}f({xu...,xn})dv(xι)...dv{xn).
In the simple case where S is a bounded subset of W1 and v is the restriction to S of Lebesgue measure, /({xi,...,a; n })dxi... dxn (for distinct xi,... ,xn E S) is the probability that the process consists of a point near each of the locations # i , . . . , xn and no other points. Example 2.1 (Poisson process) Let n(x) denote the number of points in a realisation x E X. If α, β > 0 are constants, /(x) = aβn{^ is recognised from (1) as the density of the Poisson process with intensity measure βv( ), and the normalising constant a equals exp{—(β — l)ι/( Example 2.2 For a function β : S -+ [0, oo), n(x)
(2)
/(x) = a Π β{*i)
Likelihoods and Pseudolikelihoods
23
is the density of the "inhomogeneous" Poisson process with intensity measure κ(B) = fB β(u) du(u) on 5 and the normalising constant is
2.2
Interpoint interactions
Definition 2.1 A finite Gibbs point process is α finite simple point process with α density f(x) satisfying the positivity condition (3)
/(x) > 0
=> /(y) > 0 for all y C x.
See [68, 81] and the excellent surveys by Ripley [74, 75]. By an application of the Mδbius inversion formula or "inclusion-exclusion" [14, chap. 5,12] the density of any finite Gibbs point process can be written in the form n(x)
(4)
where VQ is constant and Vk : Sk —>MU {—oo} are symmetric functions. Thus the log likelihood of a particular configuration x is a sum of penalties incurred for the presence of each point Xi G x, for the interaction between each pair of points x^ Xj E x, for the interaction between each triple of points Xi,Xj,Xk E x, and so on. The sum can be interpreted as the physical "potential energy" of the configuration. This interpretation is familiar in statistical physics [77, 69]; the individual functions Vk are called "interaction potentials". Example 2.3 (Pairwise interaction) A pairwise interaction process on S has a density of the form n(x)
(5)
/(x) = α
where 6 : 5 ->• M+ is the 'activity' and h : S x S -> M+ the 'interaction' function, and a > 0 is the normalising constant. The terms b(x{) in (5) influence the intensity and location of points, while the terms h(xi,Xj) introduce dependence ('interaction') between different points of the process X. Note that conditions must be imposed on 6, h to ensure (5) is integrable. Typically a is not known explicitly; this is the 'partition function'.
24
AJ. Baddeley
Example 2.4 (Strauss process [82, 38]) This is a pairwise interaction process with constant activity b(u) = β and a 'threshold' interaction function h(u,υ) =< ' v ' '
\ 1
'
— vll < r .
otherwise
where r > 0 is a fixed interaction distance and 0 < 7 < 1 is the interaction parameter. Hence the probability density is (6)
/(x) = α
(taking 0° = 1) where β ( x ) = #{{i,j)
:i<j,
0<\\xi-
Xj\\ < r}
is the number of unordered pairs of close points in x. The Strauss process is well-defined for all 7 G [0,1]; the density is not integrable for 7 > 1. If 7 = 1, this reduces to the Poisson process with intensity βv. If 7 < 1 the process exhibits "repulsion" or "inhibition" between points, since s(x) tends to be smaller than under the Poisson model. Example 2.5 (Hard core process) If 7 = 0 the Strauss density (6) reduces to
(7)
/(x) ί
[ί)
/ W
~ \ 0 ifβ(x)>0 called a classical hard core process with hard core diameter r. It is equivalent to a Poisson process with intensity βv( ) conditioned on the event that there are no points closer than r units apart. Interpoint interactions of higher order also arise naturally. Example 2.6 (Widom-Rowlinson process) Let 5 be a compact subset of R 2 . The Widom-Rowlinson penetrable sphere model [85], or 'areainteraction' process [7], has density (8)
f{x) = α /f»(*) 7 -*M
where β,j > 0 are parameters, a > 0 is the normalizing constant, and A(x) is the area of
c/ r (x)= I JJ where B{xι\r) is the disc of radius r centred at X{. The density (8) is integrable for all values of 7 > 0. The process produces clustered patterns
Likelihoods and Pseudolikelihoods
25
when 7 > 1, ordered patterns when 0 < 7 < 1, and reduces to a Poisson process when 7 = 1. The Gibbs decomposition (4) of the density (8) can be computed explicitly by applying the inclusion-exclusion formula to the area of the union of the discs B{xi\r). Interaction terms of all orders are non-vanishing, i.e. the Widom-Rowlinson model has interactions of infinite order. A simple but important relationship holds between a finite Gibbs point process and its conditional distributions. Lemma 2.1 Let X be α finite Gibbs point process on S with density f. Let Ad S be α compact subset Then the conditional distribution ofXΠA given X Π Ac is a finite Gibbs point process on A, with conditional density (9)
/Λ(z|y)=α(y)/(zUy)
(with respect to the Poisson process on A whose intensity measure is the restriction ofv to A) for finite sets z C A, y C Ac, where α(y) is a normalising constant. If / is expressed in terms of interaction potentials Vk as in (4), then the corresponding expression for /A has interaction potentials wk{z') = υk(z') + ] Γ un(y/uz')(y' u z ') y'cy
which is to say that interactions occur not only amongst the points of the configuration z but also between these random points and the 'fixed' points y. (Note that the marginal distribution of XΓ) A does not satisfy a statement similar to Lemma 2.1.) 2.3
Conditional intensities
The (Papangelou) conditional intensity of a point process is the continuous space analogue of a certain conditional probability for discrete random fields. The conditional intensity λ(u; x) of X at a location u £ S may be loosely interpreted as giving the conditional probability that X has a point at u given that the rest of the process coincides with x: \(u; x) = hm Aui{u}
y \ I/(Δu)
where the limit is taken over decreasing open neighbourhoods Δu of u E S. Formally the conditional intensity is a Radon-Nikodym derivative defined to satisfy (10)
E
A.J. Baddeley
26
(the "Nguyen-Zessin formula") for all nonnegative bounded measurable functions g : S x X -> R+. See [37] for an informal introduction, or [28, 29, 36, 44] for details. For any Gibbs process on W (see section 2.2) with density /, the conditional intensity at a point u G W equals (11)
λ(w x) =
/(x)
if ix 0 x, while for X{ G x we have λ ( ^ x) = /(x)//(x \ {xi}). In the statistical physics interpretation, logλ(ΐx x) = log/(x U {u}) — log/(x) is the energy required to add a new point u to an existing configuration x. For example, the inhomogeneous Poisson process with intensity function λ( ) has conditional intensity λ(u x) = X(u) at all points u. The general pairwise interaction process (5) has conditional intensity (x)
(12)
λ(τx x) = b(u) JJ h(u,Xi).
Note that the intractable normalising constant in (5) has been eliminated in the conditional intensity. For this reason, inference based on the conditional intensity is typically easier than maximum likelihood. 2.4
Markov point processes
A Markov point process [76, 74, 75] is one in which interpoint interactions occur only between those points which are deemed to be 'neighbours'. Example 2.7 Consider the pairwise interaction process (5) in R2. Assume the interaction function h has finite range r > 0, in the sense that /ι(u, v) = 1 whenever |τx — υ\\ > r. Declare two points u,v G S to be neighbours, written u ~ v, if they are closer than r units apart: (13)
u~υ
iff ||tx-υ|| < r.
Then interactions occur only between neighbours, i.e. (5) becomes n(x)
(14)
/(x) = a
Π' 2=1
where the second product is over all unordered pairs of neighbouring points. The conditional intensity (12) becomes (15)
λ(tx x) = b(u)
Likelihoods and Pseudolikelihoods
27
where the product is over all neighbours of u in x. Note that in this example the conditional intensity (15) depends only on u and on the neighbours of u in x. This important property signifies that interaction is "local". Definition 2.2 (Ripley &; Kelly [76]) Let ~ be a symmetric, reflexive relation on S. A Markov point process on S with respect to ~ is a finite Gibbs point process whose conditional intensity \(u; x) depends only on u and {xi £ x : x% ~ u}. For example, the inhomogeneous Poisson process (2) is a Markov point process with respect to any relation ~ since λ(w; x) = β(u) depends only on u. Example 2.8 For the Strauss process (Example 2.4) (16)
λ(u,x)
where ί(w,x) = s(x U {u}) — s(x) = #{#i G x : 0 < \\xι — u\\ < r} is the number of points Xj E x which are close to u, other than u itself. See Figure 1. Hence the Strauss process is Markov with respect to the relation - of (13).
Figure 1. Illustration of conditional intensities. Left: Strauss process; Right: WidomRowlinson process. The conditional intensity of the Strauss process at a point u (o) depends on the number of existing points (•) of the configuration x which are closer than r units distant from u. In this illustration t(u, x) = 2. The conditional intensity of the Widom-Rowlinson process at a point u (o) depends on the shaded area.
Example 2.9 For the Widom-Rowlinson process (Example 2.6) (17)
λ(u;x)
28
A.J. Baddeley
where T(u, x) = A(xU{u})-A(x) is the area of the region B(u; r)\Ur(x). See Figure 1. Clearly T(w,x) depends only on ^ and {x{ E x : ||u — a?<|| < 2r}. Thus, the Widom-Rowlinson process is Markov with respect to the relation (13) with r replaced by 2r. Definition 2.3 Let ~ be α symmetric relation on S. The neighbourhood of a set A C S is λί (A) = {u E S : u ~ v for some υ E A}. Definition 2.2 then states that a finite Gibbs point process is Markov if \(u; x) depends only on u and on λί (u) Π x. The epithet 'Markov' for these processes is justified by the following conditional independence property. Lemma 2.2 (Spatial Markov Property) Let X be a Markov point process on S. Then the conditional distribution of XΓ\A given X Γ\AC depends only on X in the neighbourhood λί {A) Π Ac: c
P{X n A\x n A ) = P{x n A\x n {λί (A) n Ac)). In (4) we saw that a Gibbs density can be decomposed into interaction terms Vk of each order k = 0,1,2,... For the case of a Markov point process, the interaction term for a particular A -tuple of points is nontrivial only if all these points are neighbours. Grimmett [30] introduced the term "clique": Definition 2.4 Let ~ be a symmetric, reflexive relation on S. A configuration x E X is a clique if all points of x are neighbours (u ~ v for all u,υ E xj. A set containing 0 or 1 points is a clique. The following is analogous to the Hammersley-Clifford theorem for discrete Markov random fields [30, 16]. Theorem 2.1 (Ripley-Kelly [76]) A finite simple point process with density f is a Markov point process iff its interaction potentials Vk satisfy Vfc(z) = 0 whenever z is not a clique. Equivalently, the density / is Markov iff
(18)
/(x) = Π ^ z ) zCx
where φ(z) = exp{υn(z)(z)} is equal to 1 unless z is a clique. For example, for the pairwise interaction density (5) the interaction potentials are VQ = logα, υι({u}) — logδ(u) for singletons, V2({u,v}) = logh(u,v) for two-point cliques, and ψ(y) = 0 for cliques y containing 3 or more points. The process is Markov iff h(u,υ) = 1 for pairs of points u,v with \\u — v\\ > r.
Likelihoods and Pseudolikelihoods
2.5
29
Markov Chain Monte Carlo
Stochastic simulation of a finite Gibbs point process cannot in practice be performed by generic Monte Carlo techniques for sampling from a densitysuch as the rejection method [32, 73]. For example, although the hard core process (Example 2.5) is the conditional distribution of a Poisson process given that no pair of points is closer than r units apart, the probability of this event for interesting cases is prohibitively small. Instead, finite Gibbs point processes can be simulated using Markov Chain Monte Carlo (MCMC) techniques. Early examples are [55, 69, 72]; see the excellent reviews [26, 27]. In brief, these techniques involve running a Markov Chain (Fj), in discrete or continuous time, with state space X (the space of all finite point patterns). The chain is designed to converge in distribution to the distribution of the point process X of interest, so that after a long run time the state of Yt can be taken as a realisation of X. The chain must also be simple and quick to run. Typically the transitions or 'updates' of (Yt) are simple operations such as the "birth" of a new point, x H* XU{U}, where x G ί , w G S ; the "death" of an existing point, x H> X \ {xi} where Xi G x; or the shifting of an existing point X{ G x to a new location u. To ensure that the stationary distribution of (Yt) is the distribution π of X, it is sufficient and convenient to require that the transition kernel P*(x, A) = ¥{Yt G A I Yo = x} be in 'detailed balance' with π,
(19)
f P\x, B) dπ(x) = f P\y, A) dπ(y) JA
JB
for all t > 0. If, for example, the only possible transitions are instantaneous births x H> x u {u} at rate b(x,u) du(u) and instantaneous deaths x *-+ x\{xi} at rate D(x,Xi), then detailed balance is equivalent to 6(x,w)/D(xU {u},u) = X(u;x) whenever /(xU {u}) > 0. This can be achieved by various schemes of Gibbs and Metropolis-Hastings type. If such a process (Yt) exists (if the backwards equations have a unique solution) then it is irreducible and time reversible, and π is its unique equilibrium distribution [69]. The convergence of (Yt) can be extremely slow, and is difficult to measure. This limitation was lifted recently following the work of Propp and Wilson [70] who developed a coupling algorithm for drawing exact simulations from the equilibrium distribution of a discrete state Markov chain. This idea has been adapted to some spatial birth-and-death processes to obtain exact simulation algorithms for certain finite Gibbs point processes [27, 40, 41, 31]. The virtues of exact simulation algorithms are that the output is guaranteed to have the correct distribution, and that the computation time is usually orders of magnitude smaller than that required for the convergence of Metropolis-Hastings algorithms.
A.J. Baddeley
30
3
Pseudolikelihood inference
This section describes Besag's concept of pseudolikelihood for point processes, and reports on recent work by the author and Rolf Turner [3] on fitting Gibbs/Markov point process models using pseudolikelihood. 3.1
Pseudolikelihood
Suppose we have data consisting of a spatial point pattern x observed in a bounded region W of Rd. Thus x = {xi,..., xn} where the number of points n > 0 is not fixed, and each x\ is a point in W. There may also be spatial covariates. The aim is to fit to the data a finite Gibbs point process model with density /^(x) governed by a parameter θ ranging over Θ C F . It is generally difficult to evaluate and maximise the likelihoods of point processes. The loglikelihood of the inhomogeneous Poisson process (2) includes an integral requiring iterative optimization methods. Even simple exponential family models such as the pairwise interaction processes (5) include a normalising constant which is an intractable function of θ. Methods for approximating α( ) and maximising likelihood include functional expansions of α( ), Monte Carlo integration, and analogues of E-M and stochastic approximation [27, 56, 57, 58, 59, 60, 63]. An alternative to the likelihood function is the pseudolikelihood [10, 11, 12, 35] which we describe here. See [22, 23, 24, 74, 75, 78, 83] for other applications. Originally Besag [10, 11] defined the pseudolikelihood of a finite set of random variables Xi,... ,X n as the product of the conditional likelihoods of each individual X{ given the other variables {Xj : j φ i}. This was extended [11, 12] to point processes, for which it can be viewed as an infinite product of infinitesimal conditional probabilities. Besag [11] defined the pseudolikelihood of a point process with conditional intensity λ^(ϋ x) to be n(x)
PL(0;x) =
(20)
exp< - / λθ(u;x)du >
I Js
)
Further theory was developed in [12, 34, 35]. If the process is Poisson the pseudolikelihood coincides with the likelihood (2) up to the factor exp(|5|). For a pairwise interaction process (5), the pseudolikelihood is PL (0 x) n(x)
(21)
=
Π t=l
exp < - / b$(u) TT hβ(u,Xi)du
31
Likelihoods and Pseudolikelihoods
in which the intractable normalising constant α(θ) appearing in the likelihood (5) has been replaced by an exponential integral in (21) as if the process were Poisson. For processes with 'weak interaction' in the sense that λ^(w x) can be approximated well by a function of u only, the process is approximately Poisson and the pseudolikelihood is an approximation to the likelihood. Hence the maximum pseudolikelihood estimator should be efficient if interaction is weak. Folklore holds that it is inefficient for strong interactions. For an exponential family model, the maximum pseudolikelihood normal equations d/dθ log PL (0; x) = 0 can be shown to be unbiased estimating equations using the Nguyen-Zessin formula (10). Diggle et αl [22] showed in the stationary case that maximum pseudolikelihood is a special case of the Takacs-Fiksel method, itself an application of the method of moments [23, 24, 83]. These estimating equations can also be derived naturally from properties of the Markov chains used in MCMC methods [4]. Jensen and M0ller [35] proved that for Gibbs point processes with exponential family likelihoods, the pseudolikelihood is log-concave and the maximum pseudolikelihood estimator is consistent as 5 / Md, under suitable conditions. Jensen and Kϋnsch [34] proved the MPLE is asymptotically normal for stationary pairwise interaction processes, under suitable conditions (see (Cl) and (C2) of [34]). There may be room for considerable generalisation, since the latter results impose strong constraints on the interaction potential which are not needed for the case of discrete random fields [17]. The pseudolikelihood of a point process is analogous to the pseudolikelihood of a discrete (Markov) random field as defined in [10]. Indeed [11, 12] certain classes of point processes can be obtained as the a.s. limit of a sequence of Markov random fields defined on discrete lattices whose spacing tends to zero; the pseudolikelihood function of the Markov random field converges pointwise to the pseudolikelihood of the point process. Recent applications include [78]. 3.2
Computational device for maximum pseudolikelihood
In [3] we proposed a computational device for obtaining approximate maximum pseudolikelihood estimates. The method is an adaptation of a technique of Berman and Turner [9]. Related ideas have been explored by Lindsey [48, 49, 50, 51]. Approximating the integral in (20) by a finite sum using any quadrature rule, we may approximate the log pseudolikelihood n(x)
(22)
m
log PL (0 x) « ] Γ log Xθ{xi x) - ^
λθ(uj
x) Wj
32
A.J. Baddeley
where Uj, j = 1,..., m are points in W and Wj > 0 are quadrature weights. Note that if the list of points {uj,j — 1,... , ra} includes all the data points {xi, i = 1,..., n}, then we can rewrite (22) as (23)
log PL (θ x) » £
where λj = λθ(uj)
an
•9 ,v J
d yj =
ZJ/WJ,
( w log λ, - λj) t ^
where
_ ί 1 if Wj is a data point, Uj E {#i,..., a;n} \ 0 if Wj is a dummy point, Uj 0 {xi,..., x n }.
The right side of (23), for fixed x, is formally equivalent to the log likelihood of independent Poisson variables Yk ~ Poisson(λfc) taken with weights Wk The expression (23) can therefore be maximised using standard software for fitting Generalised Linear Models [53]. This makes it possible to fit rapidly a wide variety of Gibbs point process models incorporating effects such as spatial trend, dependence on covariates, interpoint interaction, and mark information.
4
Identifiability
It is relatively straightforward to construct Markov point process models since there is an explicit characterisation of their densities (Theorem 2.1). The interaction potentials may be chosen virtually at will, subject to the requirement that the density be integrable. However, the behaviour of the resulting process is difficult to determine. In particular it is not clear whether the resulting process will be distinguishable from the Poisson process and whether the parameters will be identifiable. This is important in the case of the Widom-Rowlinson process. Simulated realizations of both the repulsive and attractive cases displayed in [7] and [40] do not seem to differ markedly from Poisson patterns. This has been further investigated by A. Sarkka and the author [79]. Recall that the Widom-Rowlinson density (8) involves the area A(x) of the union of discs of radius r centred at the points X{ G x, intersected with 5. If r is small, then under the reference Poisson process, there is a high probability that these discs do not overlap, so that A(X) is equal to n(X) πr2 with high probability. Thus
2
with high probability under the reference Poisson process, where δ = βη~ΈT . Thus, when r is small, the Widom-Rowlinson process is approximately Poisson with intensity δ. The parameters β,η are not identifiable, only the derived parameter δ.
Likelihoods and Pseudolikelihoods
33
Alternatively, if r is large, then either x is empty or the discs cover the whole domain 5, so that
0
ifn(X)>0 if
with high probability under the Poisson process. Thus when r is large, the Widom-Rowlinson process is approximately a mixture of a Poisson process with intensity β and the process which is a.s. empty. The parameter 7 is not identifiable. This is an instance of the general fact that in a 2-parameter exponential family
(z) + θ2T(x)), if S and T are linearly dependent statistics under the reference distribution, then the model degenerates to a 1-parameter or O-parameter family and the parameters are not identifiable. For a general exponential family fe(x) = cexp(θτB(x)), where θ and B(x) are p-dimensional, θ is efficiently estimable iff μ lies in the convex hull of the support of the distribution of B(X) under the reference distribution (θ = 0). Geyer [27] has made very similar comments in relation to Monte Carlo maximum likelihood methods for Gibbs point processes. We have investigated this aspect of identifiability for the Widom-Rowlinson process by simulation. Figure 4 shows scatterplots of the empirical distribution of (ra(JC), A(X)) under the Poisson process, for various values of r. The first and last plots, for r = 0.02 and r = 0.12, confirm the predictions that for small and large r values, respectively, the statistics n(X),A(X) are linearly dependent. In the middle of the range, r « 0.08, the statistics appear to be linearly independent. Since the Widom-Rowlinson process degenerates to a Poisson process or a Poisson/empty mixture in cases of linear dependence, the question is whether for some values of r the process is distinguishable from a Poisson process. We investigated this by computing the total variation distance between the Widom-Rowlinson process and a Poisson process with equal intensity. Let P and Q be any probability distributions having densities / and g (respectively) with respect to some reference measure μ. The total variation distance [71, sections 1.3-1.4] is (25)
\\Q - P\\ = sup \Q(B) - P(B)\ =
\
2
9(X)
f(X)
Now let Q be the distribution of the Widom-Rowlinson process with parameters β, 7 and P the distribution of the Poisson process with intensity λ.
A.J. Baddeley
34
Figure 2. Scatterplots of the observed values of the sufficient statistics n(x) and A(x) of the Widom-Rowlinson process, for various values of r, generated by the Poisson point process with β = 100. All panels show (n(X),A(X)) on the same scale. The values of r are (top row, left to right) 0.2, 0.4, 0.6, (bottom row, left to right) 0.8, 1.0 and 1.2.
Then 1 2
<x{β,Ί) fβ\ n(X)
α(λ,l) \XJ
which can be estimated by simulation as follows. First we generate an adequate number of simulated realisations from Q to estimate the intensity of the Widom-Rowlinson process. Setting λ equal to this estimated intensity, we generate an adequate number of simulated realisations x^ 1 ),..., χ( m ) of P, the Poisson process with intensity λ. We estimate the ratio of normalising constants α(/?,7)/α(λ, 1) following [27, 63] by
and estimate the total variation distance ||Q — P\\ by
Figure 3 shows estimates of total variation distance plotted against disc radius r. They refer to the Widom-Rowlinson process in the unit square
Likelihoods and Pseudolikelihoods
0.05 0.10 disc radius
35
0.05 0.10 disc radius
Figure 3. Total variation distance between some a) attractive and b) repulsive areainteraction processes and the corresponding Poisson process with the same intensity; β = 100 and the labels 20, 40, ... refer to the values of log 7.
with β = 100 and log 7 = -80, -60,..., 60,80 for various values of r, and the Poisson process with equal intensity. The curves confirm the prediction that for small r and for large r the Widom-Rowlinson process degenerates to a Poisson process. However for moderate values of r they show that the Widom-Rowlinson process is distinguishable from the Poisson process. We also calculated the Hellinger and Kullback-Leibler distances which yield very similar results. These distance curves can be used to determine reasonable values for r for data analysis. For further details see [79]. It is common in statistical physics to construct 'phase diagrams' which partition the parameter space into regions where the process exhibits different 'regimes' of behaviour. The graphs in Figure 3 could be reinterpreted as a three-dimensional surface of total variation values for each point in (7, r) space; and a level set of this surface could be regarded as a phase diagram. See also [54]. 5
'Dynamic' Markov point processes
In spatial statistics there is considerable interest in developing new point process models, because the existing repertoire is thought to be too narrow and unrealistic for many applications. In particular, clustered point pattern data are thought to be very difficult to model by Markov point processes. An extension of Markov point processes was proposed in [6]. Recall that the interpoint interactions in a Markov point process occur between all points Xi, Xj of the configuration x that are 'close' in a predefined sense (x{ ~ Xj). In [6] this is extended by allowing the definition of 'close' to depend on the configuration x. This allows interactions to occur only between those points
36
A.J. Baddeley
which are close in the context of the configuration - for example, nearest neighbours — and for interaction to occur at any distance. For example, consider a renewal process in one-dimensional time. The intervals between successive points of the process are i.i.d. random variables. The density of this process on [0, T] is of the form n(x)
(26)
/(x)=
where x\ < x^ < . . . < xn are the points of x. While the form of (26) is very similar to a pairwise interaction density (5), this is not a Markov point process with respect to any nontrivial fixed relation ^ o n i Interpoint interactions occur only between consecutive pairs of points, and occur regardless of the distance between them. Again, Ord [62] suggested that for two-dimensional point pattern data in geography, an improvement on the Strauss process would be a model taking account of the size of the 'territory' of each point as expressed by its Dirichlet cell: (27) /(x) = αβn l[g( area of C(xi\x)) i
where C(xi\x) is the Dirichlet or Voronoi cell associated with point X{ in the configuration x, C(xi I x) = {u E M2 : | | u - X i | |
=min||ii-Xj||} 3
see [61]. Sibson [80] and others have suggested using the areas of DirichletVoronoi cells and the associated neighbour distances as statistics for point pattern data. Processes such as (27) do not have the Markov property as in Definition 2.2 because C(xi\x) depends on Xi and its Dirichlet neighbours (those Xj such that C(XJ | x) has a common edge with C(xi | x)) regardless of how far away these neighbours are. In [6] we defined a Markov property analogous to the Ripley-Kelly definition 2.2, except that the concept of "neighbourhood" now depends on context and is denoted ~. X
Definition 5.1 Assume that for each x E X there is given a symmetric, reflexive relation ~ defined on x. // points ix, v E x are related under ~ we write u ~ v and say that u and v are x-neighbours. The x-neighbourhood of a subset y C x is λί (y|x) = {u : u ~ v for some υ E y}.
Likelihoods and Pseudolikelihoods
37
The subset y C x is α clique in x if all members ofy are -^-neighbours (u ~ v for all w, v G y). A set containing 0 or 1 members is a clique. Example 5.1 (Renewal process) In one dimension, given the points x = {xi,... ,xn} C R with x\ < #2 < ••• < Xn, define X{ ~ x +i for i = 1,..., n — 1 and let no other relations hold. That is, each point is a neighbour of the next and the previous elements in the sequence. Example 5.2 (Delaunay neighbours) In R2 define X{ ~ Xj iff x^Xj are neighbors in the Delaunay tessellation generated by x. That is, x^Xj are neighbours iff their Dirichlet cells C{xχ | x), C(XJ | x) have a common edge. Example 5.3 (Connected components) In R2 let ~ be any fixed relation such as (13), and define ~ to be the transitive closure of ~ on x. Thus, iff x
J
Xi ~ V\ ~ 2/2 ~
~ Um ~ Xj
for some y^ G x. Two points are ^-neighbours if they belong to the same connected component of the graph induced by ~ on x. Next we generalise Definition 2.2 of Markov point processes to the case of non-constant relations ~. This should at least embrace functions of the X
form
However since ~ now depends on x, X
the new terms arise because some pairs X{,Xj E x may be neighbours with respect to ~ but not ~ , or vice versa. If f(xUu)/f(x) is to depend only on x
xUu
"local" information, then ~ must only differ from ~ in a "neighbourhood" xUu
^
x
of u. Thus we need to impose conditions on ~. X
Definition 5.2 For a dynamic relation ~ define the clique indicator function 0 if not.
-J
Define the following conditions on ~. Let y C z G X and u,υ G S with u, v # z and z U {w, v} G X.
38
A.J. Baddeley
(Cl) χ(y|z) φ χ(y|z U {u}) implies y C M [u\τ U {u}) (C2) if u r/j v where x = z U {u,v},
then χ(y|z U {u}) + χ(y|z U {υ}) =
Note particularly the strength of the conclusion in (Cl), i.e. "all points of y are neighbours of u". It is shown in [6] that Examples 5.1-5.3 and several other examples satisfy (Cl) and (C2). Definition 5.3 A finite Gibbs point process with density f is a dynamic Markov point process with respect to ~ if the conditional intensity λ(u x) depends only on u, λί (u\xΌu) and on the restrictions of the relations ~ , X
~ to λί (u\x U u). xUu
'
In words, the conditional intensity depends only on the added point u, on the neighbours of u (after its addition), and on information whether the addition of u has altered neighbourhood relations between any of these points. If the relation ~ does not depend on x, then this is equivalent to the original Ripley-Kelly definition of a Markov point process, Definition 2.2. Dynamic Markov point processes with respect to the connected component relation (Example 5.3) are studied further in [5, 15, 84]. This class includes many cluster processes, and is closed under superposition. The following result generalises the Hammersley-Clifford theorem for Markov point processes (Theorem 2.1). Theorem 5.1 Let ~ be a system of neighbour relations satisfying
(Cl)-
X
(C2). Then a finite Gibbs point process with density f is a dynamic Markov point process with respect to ~ if and only if X
(28)
/(x) yCx
(taking 0° = 0) for all x G X, where φ : X -> [0, oo) satisfies {II) if φ(x) > 0 then φ(y) > 0 for all y C x; (72) if φ(x) > 0 and ψ (λί (u\x U {u})) > 0, then φ(x U {u}) > 0. When /(x) > 0 the decomposition (28) reduces to (29)
f{χ) = cliques
where the product is over all y C x with χ(y|x) = 1.
Likelihoods and Pseudolikelihoods
39
This larger class of dynamic Markov point processes is amenable to MCMC techniques, with some increase in complexity of the algorithms. Kendall [39] proved an analogue of the spatial Markov property (Lemma 2.2). Monte Carlo maximum likelihood or maximum pseudolikelihood techniques can be applied. Baddeley and Turner [3] use pseudolikelihood to fit Ord's process (27) to point pattern data.
6
Directed Markov point processes
The simulation of Markov point processes is still computationally expensive, despite recent advances. This is a bottleneck for many applications, and also retards the development of our mathematical understanding of such processes. This is in contrast to Markov processes in one-dimensional time, which are relatively easy to simulate using the natural ordering of the real line. One strategy for avoiding the bottleneck is to modify the interpoint interactions in a spatial Markov point processes so that they respect a partial order. This is explored in our recent papers [2, 20] for one special case. Consider point processes in the unit square S = [0,1]2 in R 2 , and define a partial order ^ on 5 by (u,υ) •< (ur,vf) iff u < v! and v < v'. Interpoint interactions will occur only between points that are related in this partial order. We might call such processes directed Markov point processes. They axe analogous to the directed Markov random fields (Markov mesh models) on a discrete lattice, studied by Abend et al. [1], Pickard [64, 65, 66, 67], Lauritzen, Spiegelhalter et al. [46, 45], Cressie [19, 52, 47] and others [25, 42, 8]. Directed Markov random fields can be simulated exactly in a single pass over the lattice, in close analogy with the simulation of Markov chains in one-dimensional time. Naively one could try to construct directed Markov point processes by writing down probability densities /(x) which are products of terms associated with subsets y C x o f points that are related in the partial order, or by similarly constraining the factorisation (18). However, this turns out to be incorrect; such densities do not have the desired dependence properties. Instead one should modify the spatial Markov property (Lemma 2.2) and the conditional intensity (section 2.3) and derive the necessary form of /(x). This approach is pursued in [2]. Another way to construct specific point process models of this directed type is to take the limit of a sequence of directed Markov fields defined on increasingly finer discrete lattices. This approach is explored in a few special cases in [20].
40
A.J. Baddeley
For the partial order considered here, general results are already known from the theory of counting processes in multidimensional time [33]. A finite point process X in S is equivalent to a counting process (Nz,z G S) where Nz counts the number of points of X dominated by z. For z = {z\,z<ι) G S let D(z) = [0, z\] x [0, Z2] = {u G 5 : u •< z) be the set of points dominated by 2, and D*(z) = ([0, zλ] x [0,1]) U ([0,1] x [0, z2]) = {u G S : z £ u} the set of points that do not dominate z. See Figure 4. Let TZ^T^ be the σ-fields of events generated by X Π D(z) and X Π D*(z) respectively.
Figure 4. The regions D(z) (left) and D*(z) (right).
Then (Nz,z G S) has a directed conditional intensity λ+(ϊz,x) if N
(30)
- f λ+{ JD(Z)
is a 'strong martingale' [33] with respect to the filtration T* = (^"*, z £ S). Equivalently (31)
E
g(u,X)\+(u,X)du
for all ^-predictable functions g(u,X). This is very similar in form to the Nguyen-Zessin identity (10) which defines the "undirected" Papangelou conditional intensity. Our paper [2] proves equivalences between several different versions of the spatial Markov property, one being the condition known as (F4) in [33], and others being similar to the conclusion of Lemma 2.2 or to the onedimensional Markov property. The following results are known from multiparameter counting process theory [33, Theorems 1.3, 2.1] and [13]. Suppose λ + exists and fs λ + (u; x) du is bounded above, uniformly in x. Then X has a probability density / which satisfies the Mazziotto-Szpirglas exponential formula (32)
/(*) =
J ] λ+fo x) exp j^[l-λ+(ti;x)]
Likelihoods and Pseudolikelihoods
41
Furthermore, a similar expression holds for the probability density fz of the subprocess XΠD(z) = {xi EX : Xidiz} with respect to the Poisson process on D(z). Conversely if λ+(u;X) is any positive, predictable process such that fs\+(u;x)du is bounded above uniformly in x, then the function / constructed by (32) is a probability density for a point process absolutely continuous with respect to the unit Poisson process, satisfying the same measurability properties [33, Theorem 1.3, p. 275]. Incidentally the undirected conditional intensity λ(u; x) is related to the directed conditional intensity λ + (u;x) through (32),
λ(u x) =
λ+(u,*υ{u})f[ x exp (- ί [λ+(ϋ,xU {u}) - λ+(v,x)] dυ J
(33)
Note the similarity of (32) to the expressions for the likelihood of a Poisson process (2) and the pseudolikelihood of a finite Gibbs process (20). It differs from the likelihood of a general finite Gibbs process (4) in that (32) contains an integral over S. The latter effectively replaces the intractable normalising constant in (4). For example if λ + is a deterministic function, λ + (n, X) = g(u), we obtain the inhomogeneous Poisson process with intensity function g. Example 6.1 ('Directed Hard Core') By analogy with the Hard Core process (Example 2.5), let L if \\z — Xi\\ > r for all x\ •< z
) otherwise where r > 0 is a fixed parameter, the interaction distance. The function /(x) resulting via (32) is where J(x) is the indicator function jγ \ '
ί 1 if \\xj —χi\ ^ \ 0 otherwise
r
f° Γ all Xi,Xj G x such t h a t X{ •< Xj
and \U(x)\ is the area of t h e region C/(x)
=
{u E S : \\u - Xi\\ > r for all x ^ G x such t h a t X{ -< u]
= S\\JCr(xi) i
where Cr(z) = b(z,r) \ R*{z). See Figure 5.
42
A.J. Baddeley i i
r 1
1 I
• ir
\ 1
** v "" 1 \ \ J %
1
Γ
\
v ^ ". -^ - 1 \
\
i 1
\
\ . 1
* " "
Figure 5. A typical realisation of the directed hard core process in Example 6.1. Filled dots: points x» G x. Dotted circular sectors: forbidden regions Cr(xi). Shaded area: permitted region U(x).
Example 6.2 ('Directed Strauss') By analogy with the Strauss process set where 0 < 7 < 1 and s(s,x) = #{xi G x : Xi -< z and \\xι - z\\ < r}
is the number of points of x which are closer to z than a fixed distance r > 0 and which precede z in the partial order. The case 7 = 0 reduces to the previous Example. One of the chief advantages of directed Markov random fields in the discrete case, as expounded by Pickard [64, 65, 66, 67] and others, is that they can be simulated directly in a single sweep of the index set. Each value Xυ is drawn from the conditional distribution given the already generated values {Xu : u -< v}. Similarly, Monte Carlo simulation of directed Markov point processes is much simpler than for their undirected counterparts. Under mild conditions, a directed Markov point process can be obtained from a Poisson process by a random (i.e. data-dependent) multidimensional time change. This can be interpreted to give a simple algorithm for generating a realisation of the desired process in a single sweep of the spatial domain. Because they are easy to simulate, directed Markov processes have numerous potential uses. They might be used as reference distributions for importance sampling, or as proposal distributions for simulating Markov point processes (either by the rejection method or for Metropolis-Hastings
43
Likelihoods and Pseudolikelihoods
algorithms). They might serve as approximations to (undirected) Markov point processes in some cases. It would also be of interest to generalise the partial order •< and indeed to allow dynamic directed graphs (partial orders which depend on the configuration x). Acknowledgements. I thank the referees for helpful advice.
REFERENCES
[1] K. Abend, T.J. Harley, and L.N. Kanal. Classification of binary random patterns. IEEE Transactions on Information Theory, IT-11:538-544, 1965. [2] A. Baddeley, N. Cressie, and G. Nair. Directed markov point processes. Manuscript in preparation, 1999. [3] A. Baddeley and R. Turner. Practical maximum pseudolikelihood for spatial point patterns. Australian and New Zealand Journal of Statistics, 2000. To appear. [4] A.J. Baddeley. Time-invariance estimating equations. Bernoulli appear.
To
[5] A.J. Baddeley, M.N.M. van Lieshout, and J. M0ller. Markov properties of cluster processes. Advances in Applied Probability, 28:346-355, 1996. [6] A.J. Baddeley and J. M0ller. Nearest-neighbour Markov point processes and random sets. International Statistical Review, 57:89-121, 1989. [7] A.J. Baddeley and M.N.M van Lieshout. Area-interaction point processes. Ann. Inst. Statist Math., 47:601-619, 1995. [8] D. Basu and C.A.B. Pereira. Conditional independence in statistics. Sάnkhyd series A, 45:324-337, 1983. [9] M. Berman and T.R. Turner. Approximating point process likelihoods with GLIM. Applied Statistics, 41:31-38, 1992. [10] J. Besag. Statistical analysis of non-lattice data. 24:179-195, 1975.
The
Statistician,
44
A.J. Baddeley
[11] J. Besag. Some methods of statistical analysis for spatial data. Bulletin of the International Statistical Institute, 47:77 - 92, 1977. [12] J. Besag, R. Milne, and S. Zachary. Point process limits of lattice processes. Journal of Applied Probability, 19:210-216, 1982. [13] T.C. Brown, B.G. Ivanoff, and N.C. Weber. Poisson convergence in two dimensions with application to row and column exchangeable arrays. Stochastic Processes and their Applications, 23:307-318, 1986. [14] P.J. Cameron. Combinatorics: topics, techniques, algorithms. Cambridge University Press, 1994. [15] Y.C. Chin and A.J. Baddeley. On connected component Markov point processes. Advances in Applied Probability, 31:279-282, 1999. [16] Peter Clifford. Markov random fields in statistics. In Disorder in physical systems, pages 19-32. Oxford Univ. Press, New York, 1990. [17] F. Comets and M. Janzura. A central limit theorem for conditionally centred random fields with an application to Markov fields. J. AppL Prob., 35:608-621, 1998. [18] D. R. Cox and V. Isham. Point processes. Chapman and Hall, London, 1980. [19] N. Cressie and J.L. Davidson. Image analysis with Partially Ordered Markov Models. Preprint 94-15, Department of Statistics, Iowa State University, June/October 1994. [20] N.A.C. Cressie, J. Zhu, A.J. Baddeley, and M.G. Nair. Directed Markov point processes as limits of partially ordered markov models. Methodology and Computing in Applied Probability, 2000. To appear. [21] D.J. Daley and D. Vere-Jones. An introduction to the theory of point processes. Springer Verlag, New York, 1988. [22] P.J. Diggle, T. Fiksel, P. Grabarnik, Y. Ogata, D. Stoyan, and M. Tanemura. On parameter estimation for pairwise interaction processes. International Statistical Review, 62:99-117, 1994. [23] T. Fiksel. Estimation of parameterized pair potentials of marked and non-marked Gibbsian point processes. Elektronische Informationsverarbeitung u. Kybernetika, 20:270-278, 1984. [24] T. Fiksel. Estimation of interaction potentials of Gibbsian point processes. Statistics, 19:77-86, 1988.
Likelihoods and Pseudolikelihoods
45
[25] M. Frydenberg. The chain graph Markov property. Scandinavian Journal of Statistics, 17:333-353, 1990. [26] Charles J. Geyer and Jesper M0ller. Simulation procedures and likelihood inference for spatial point processes. Scand. J. Statist., 21(4):359373, 1994. [27] C.J. Geyer. Likelihood inference for spatial point processes. In O.E. Barndorff-Nielsen, W.S. Kendall, and M.N.M. van Lieshout, editors, Stochastic Geometry: Likelihood and Computation, number 80 in Monographs on Statistics and Applied Probability, chapter 3, pages 79-140. Chapman and Hall / CRC, Boca Raton, 1999. [28] E. Glόtzl. Bemerkungen zu einer Arbeit von O.K. Kozlov. Mathematische Nachrichten, 94:277-289, 1980. [29] E. Glotzl. Lokale Energien und Potentiate fur Punktprozessen. Mathematische Nachrichten, 96:195-206, 1980. [30] G.R. Grimmett. A theorem about random fields. J. London Math. Soc, 5:81-84, 1973. [31] O. Haggstrόm, M.N.M. van Lieshout, and J. M0ller. Characterisation results and Markov Chain Monte Carlo algorithms including exact simulation for some spatial point processes. Bernoulli, 1998. To appear. [32] J.M. Hammersley and D.C. Handscomb. Methuen, London, 1964.
Monte
Carlo methods.
[33] B.G. Ivanoff and E. Merzbach. Intensity-based inference for planar Poisson processes. Journal of Multivariate Analysis, 32:269-281, 1990. [34] J.L. Jensen and H.R. Kϋnsch. On asymptotic normality of pseudo likelihood estimates for pairwise interaction processes. Annals of the Institute of Statistical Mathematics, 46:475-486, 1994. [35] J.L. Jensen and J. M0ller. Pseudolikelihood for exponential family models of spatial point processes. Annals of Applied Probability, 1:445-461, 1991. [36] O. Kallenberg. Random measures. Akademie Verlag/Academic Press, Berlin/New York, third edition, 1983. [37] O. Kallenberg. An informal guide to the theory of conditioning in point processes. International Statistical Review, 52:151-164, 1984. [38] F.P. Kelly and B.D. Ripley. Biometrika, 63:357-360, 1976.
On Strauss's model for clustering.
46
A.J. Baddeley
[39] W.S. Kendall. A spatial Markov property for nearest-neighbour Markov point processes. Journal of Applied Probability, 28:767-778, 1990. [40] W.S. Kendall. Perfect simulation for the area-interaction point process. Technical Report 292, Department of Statistics, University of Warwick, Coventry, 1996. [41] W.S. Kendall and J. M0ller. Perfect metropolis-hastings simulation of locally stable point processes. Research Report 347, Department of Statistics, University of Warwick, 1999. [42] H. Kiiveri, T.P. Speed, and J.B. Carlin. Recursive causal models. Journal of the Australian Mathematical Society, series A, 36:30-52, 1984. [43] J.F.C. Kingman. Poisson processes. Oxford University Press, 1993. [44] O.K. Kozlov. Gibbsian description of point random fields. Theory of probability and its applications, 21:339-355, 1976. [45] S.L. Lauritzen, A.P. Dawid, B.N. Larsen, and H.-G. Leimer. Independence properties of directed Markov fields. Networks, 20:491-505, 1990. [46] S.L. Lauritzen and D.J. Spiegelhalter. Local computations with probabilities on graphical structures and their applications to expert systems (with discussion). Journal of the Royal Statistical Society, series B, 50:157-224, 1988. [47] J. Lee, M.S. Kaiser, and N. Cressie. Multiway dependence in exponential family conditional distributions. Preprint 96-17, Department of Statistics, Iowa State University, July 1996. [48] J.K. Lindsey. The analysis of stochastic processes using GLIM. Springer, Berlin, 1992. [49] J.K. Lindsey. Fitting parametric counting processes by using linear models. Applied Statistics, 44:201-212, 1995. [50] J.K. Lindsey. Fitting bivariate intensity functions, with an application to modelling delays in reporting acquired immune deficiency syndrome. Journal of the Royal Statistical Society, Series A, 159:125-131, 1996. [51] J.K. Lindsey and G. Mersch. Fitting and comparing probability distributions with log linear models. Computational Statistics and Data Analysis, 13:373-384, 1992.
Likelihoods and Pseudolikelihoods
47
[52] C. Liu and N. Cressie. On the equivalence between symmetric Ising models and some partially ordered Markov models for binary images. Preprint 95-9, Department of Statistics, Iowa State University, April 1995. [53] P. McCullagh and J.A. Nelder. Generalized Linear Models. Chapman and Hall, second edition, 1989. [54] K. Mecke. Integralgeometrie in der Statistischen Physik. Number 25 in Reihe Physik. Verlag Harri Deutsch, 1993. [55] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. Equations of state calculations by fast computing machines. J. Chem. Phys., 21:1087-1092, 1953. [56] R.A. Moyeed and A.J. Baddeley. Stochastic approximation for the MLE of a spatial point process. Scandinavian Journal of Statistics, 18:39-50, 1991. [57] Y. Ogata and M. Tanemura. Estimation of interaction potentials of spatial point patterns through the maximum likelihood procedure. Annals of the Institute of Statistical Mathematics, B 33:315-338, 1981. [58] Y. Ogata and M. Tanemura. Likelihood analysis of spatial point patterns. Journal of the Royal Statistical Society, series B, 46:496-518, 1984. [59] Y. Ogata and M. Tanemura. Estimation of interaction potentials of marked spatial point processes through the maximum likelihood method. Biometrics, 41:421-433, 1985. [60] Y. Ogata and M. Tanemura. Likelihood estimation of interaction potentials and external fields of inhomogeneous spatial point patterns. In I.S. Francis, B.J.F. Manly, and F.C. Lam, editors, Pacific Statistical Congress, pages 150-154. Elsevier, 1986. [61] A. Okabe, B. Boots, and K. Sugihara. Spatial tessellations: concepts and applications of Voronoi diagrams. Wiley series in probability and mathematical statistics. John Wiley and Sons, Chichester, England; New York, 1992. [62] J.K. Ord. Contribution to discussion of the paper by b.d. ripley. Journal of the Royal Statistical Society, series B, 39, 1997. [63] A. Penttinen. Modelling interaction in spatial point patterns: parameter estimation by the maximum likelihood method. Number 7 in Jyvaskyla
48
A.J. Baddeley Studies in Computer Science, Economics and Statistics. University of Jyvaskyla, 1984.
[64] D.K. Pickard. A curious binary lattice process. Journal of Applied Probability, 14:717-731, 1977. [65] D.K. Pickard. Unilateral Markov fields. Advances in Applied Probability, 12:655-671, 1980. [66] D.K. Pickard. Statistical inference on discrete Markov fields. Advances in Applied Probability, 17:253-254, 1985. [67] D.K. Pickard. Inference for discrete Markov random fields: the simplest non-trivial case. Journal of the American Statistical Association, 82:9096, 1987. [68] C.J. Preston. Random fields. Springer Verlag, Berlin-Heidelberg-New York, 1976. [69] C.J. Preston. Spatial birth-and-death processes. Bulletin of the International Statistical Institute, 46:371 - 391, 1977. [70] J.G. Propp and D.B. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms, 9:223-252, 1996. [71] R.-D. Reiss. A Course on Point Processes. Springer-Verlag, New YorkBerlin-Heidelberg, 1993. [72] B.D. Ripley. Simulating spatial patterns: dependent samples from a multivariate density. Applied Statistics, 28:109-112, 1979. [73] B.D. Ripley. Stochastic simulation. John Wiley and Sons, New York, 1987. [74] B.D. Ripley. Statistical inference for spatial processes. Cambridge University Press, 1988. [75] B.D. Ripley. Gibbsian interaction models. In D.A. Griffiths, editor, Spatial statistics: past, present and future, pages 1-19. Image, New York, 1989. [76] B.D. Ripley and F.P. Kelly. Markov point processes. Journal of the London Mathematical Society, 15:188-192, 1977. [77] D. Ruelle. Statistical mechanics. John Wiley and Sons, New York, 1969.
Likelihoods and Pseudolikelihoods
49
[78] A. Sarkka. Modelling interaction in spatial point patterns: parameter estimation by the maximum likelihood method. Number 22 in Jyvaskyla Studies in Computer Science, Economics and Statistics. University of Jyvaskyla, 1993. [79] A. Sarkka and A. Baddeley. Parameter estimation of area-interaction processes. Manuscript in preparation, 1999. [80] Robin Sibson. The Dirichlet tessellation as an aid in data analysis. Scand. J. Statist, 7(l):14-20, 1980. [81] D. Stoyan, W.S. Kendall, and J. Mecke. Stochastic Geometry and its Applications. John Wiley and Sons, Chichester, second edition, 1995. [82] D.J. Strauss. A model for clustering. Biometrika, 63:467-475, 1975. [83] R. Takacs. Estimator for the pair potential of a Gibbsian point process. Statistics, 17:429-433, 1986. [84] Hakon Tjelmeland and Julian Besag. Markov random fields with higherorder interactions. Scand. J. Statist, 25(3):415-433, 1998. [85] B. Widom and J.S. Rowlinson. A new model for the study of liquidvapor phase transitions. Journal of Chemical Physics, 52:1670-1684, 1970.
DEPARTMENT OF MATHEMATICS AND STATISTICS UNIVERSITY OF W E S T E R N AUSTRALIA NEDLANDS
WA 6907
AUSTRALIA
adrian ©maths, uwa. edu. au
LASER COOLING AND STOCHASTICS
1
O.E. BARNDORFF-NIELSEN AND F.E. BENTH
1
MaPhySto , University of Aarhus In the statistical analysis of cooling and trapping of atoms by a combination of laser and magnetic field technology, Aspect, Bardou, Bouchaud and Cohen-Tannoudji (1994) showed that Levy flights is the key tool. A review of their analysis, from the point of view of renewal theory and occupation times for stochastic processes, is given here and some further analysis provided. Brief discussions of two related types of models are also given. AMS subject classiήcations: 60K05 60J25 60E07 62E20. Keywords and phrases: Laser cooling, Levy flights, occupation times, renewal theory, stable processes.
1
Introduction
Cooling and trapping of atoms, by a combination of laser and magnetic field technology, is a subject area of great current interest in physics. By directing a number of laser beams towards a chosen point in space and setting up a suitable magnetic field around the point it is possible to hold a cloud of atoms largely concentrated in a very small region around the point, as indicated in Figure 1. The basis of the techniques is the fact that light acts mechanically on material objects, such as atoms, meaning that it can change their positions and velocities. Each single atom follows a random trajectory, but is staying most of the time near the centre of the trapping region; it moves very little and is therefore 'cold'. Stochastic considerations have led to a substantially better understanding, and subsequently to a dramatic improvement in efficiency, of the cooling; see Bardou et al. (1994), Bardou (1995) and Reichel et al. (1995). Of particular interest are the questions: (i) how much of the total time of the experiment does the momentum (vector) of the atom belong to a small neighbourhood of the origin. (ii) what is the distribution of the momentum given that it belongs to such a neighbourhood. 1
MaPhySto - Centre for Mathematical Physics and Stochastics, funded by a grant from the Danish National Research Foundation.
Laser Cooling and Stochastics
51
v
Figure 1. Experimental setup for laser cooling and trapping.
These and related questions are discussed in considerable detail in a forthcoming paper by Bardou, Bochaud, Aspect and Cohen-Tannoudji (1999), a preliminary version of which has kindly been provided to us by Francois Bardou. (See also Bardou and Castin (1998)). We shall refer to their treatment as the ABBC analysis. The ABBC analysis led to the heart of the matter but through an approximate analysis, ab initio. In Sections 4 and 5 we review and extend that work in the light of the theory of renewals and occupation times for stochastic processes. In this we draw on well known results of that theory as expounded, for instance, in Bingham, Goldie and Teugels (1987). Section 2 outlines the physical setting in more detail and Section 3 specifies the resulting stochastic process model for a one-dimensional component of the momentum vector. Some analogous, but simpler, models that allow of a fairly detailed analysis are briefly treated in Section 6, and the final Section 7 contains concluding remarks. 2
Laser cooling
The four most prominent cooling techniques, listed in the order they arose chronologically, are Doppler cooling, Sisyphus cooling, VSCPT (VelocitySelective Coherent Population Trapping) and Raman ccoling. Doppler cooling and Sisyphus cooling were capable of bringing the temperature down to 1 μK, approximately, but lower limits are not achievable by these methods due to a recoil effect. With VSCPT and Raman cooling temperatures of the order 1 nK are reached. These two methods rely heavily on the effects of what in the physics literature has become to be known as 'Levy flights', and which play an important role in many other contexts in physics. In the language of stochastics the effects are those associated to the properties of the stable laws (cf. Sections 4 and 5). For an atom subjected to VSCPT, the quantum mechanical description of its behaviour is as a wave function φ and it is this function that un-
52
O.E. Barndorff-Nielsen and F.E. Benth
dergoes a random trajectory in Hubert space, the stochastic movements being caused by absorption and emission of photons. In this connection, see Cohen-Tannoudji, Bardou and Aspect (1992), Castin and M0lmer (1995) and M0lmer and Castin (1996). The models to be described and discussed in the following refer mainly to the Raman method. Under that type of experimental setup we have in mind here the momentum of the atom is accurately determined, in the sense of having a narrow probability distribution (centered on zero) and hence, due to complementarity, the position is only vaguely determined. Correspondingly, the stochastic processes we shall be discussing in the following sections are to be conceived as models for the time behaviour of the momentum rather than the position of the atom. However, this still means (recall the atomic scales) that with high probability the position will be in a (roughly) spherical region with a diameter of the order of 1 mm or less (the central region in Figure 1). Laser cooling and trapping makes it possible to measure important physical quantities with unprecedented precision and to study various types of fundamental questions in particle physics, for instance concerning atom optics, atom interferometry, atomic clocks, and high resolution spectroscopy. The 1997 Nobel Price in physics was given for research in this area, to Steven Chu, Claude Cohen-Tannoudji and William D. Phillips. The three Nobel Prize Lectures, by Chu (1998), Cohen-Tannoudji (1998) and Phillips (1998), are highly readable and informative. An earlier, less technical and very illuminating, discussion was given by Aspect and Dalibar (1994). For the future, the techniques hold much promise for the study of 'pure' situations, such as systems of a small number of atoms in well-defined states exhibiting quantum features. 3
Stochastic momentum model
As indicated in Sections 1 and 2, the basic description of the behaviour of a single atom is in terms of the random 'path' of its wave function φ. Under Raman cooling (and also under VSCPT) the description can for many purposes be reduced to the following type of model for the momentum of the atom, as a function of t. Let Yt be a Markov jump process with state space RD and transition law μ(x, άy) for jumps from x to y. The rate function for the waiting times will be denoted by λ; in other words, letting τ(y) be the generic notation for a waiting time in state y we have that the law of τ(y) is exponential with mean λ(y)" 1 . We write I\4(ί) for the occupation time in a set A up till time ί, i.e. ΓA(t) = I lAiXs)άs Jo
Laser Cooling and Stochastics
53
D
and Bx(p) will denote the ball in R with radius p and center x. We shall refer to i?o(r), for some small r, as the 'trap', this corresponding to the cold states of the atom. Finally, let At be the random variable that is 0 if the atom is in the trap at time t and is 1 otherwise, and define qt to be the conditional probability density of Yt given that At = 0, i.e. qt(y)=p(ytYt\At = o) The dimensions D = 1,2 or 3 are those of physical interest, and we shall mainly consider the one-dimensional case. The key experimental setting is such that, up to a scaling, which is unimportant in the present context (see further in Section 4, Footnote 3), λ(y) = φ Γ
for y e β 0 (1)
for some c > 0 and some 7 > 0. The parameter 7 is determined by the experimental setup 2 , the case 7 = 2 being of some special interest3. For y outside the ball BQ(1) various forms of λ(y) are considered. We shall discuss three model types: Model type I: For some R > 1 there is a reflecting barrier at the surface of the ball B0{R)> and λ(y) = c for 1 < \y\ < R. Model type II: λ(y) = c for all y with 1 < |y|. Model type III: λ(y) = 1 for 1 < \y\ < R and λ(y) = c{R/\y\γ for η > 0 and all y with R < \y\. Furthermore, under model types II and III μ(x, dy) is of the form
2
The reason for the special power form of the intensity function in a neighborhood of zero comes from physical considerations of the atoms influenced by the laser. In VSCPT cooling 7 = 2 because an atom which has absorbed a photon is in an unstable, excited state. Physical reasoning shows that the transmisson rate in this state will be proportional to the square of the momentum of the atom. This in turn leads to a transmission rate in the non-coupled, stable state with the same momentum dependency. For Raman cooling, on the other hand, 7 can be tuned by the experimenter. In this set-up the reasoning goes via the Fourier transform of light pulses. For example, a Blackman pulse gives 7 = 4, while for a rectangular pulse 7 = 2. See Aspect et al. (1988) and (1989) for more detailed physical explanations. 3 In some instances, however, a more realistic specification of λ inside the ball BQ(1) is as for co > 0 but very small. We shall not consider this possibility further here, but take it up in connection with the discrete model formulation in Subsection 6.1
O.E. Barndorff-Nielsen and F.E. Benth
54
R
r 1 Figure 2. Shape of X(y) in model type I with 7 = 2.
for some function φ, which in the one-dimensional case may be taken as Φ( ) = δ~ll[-δ/2,δ/2]( ) for some positive δ < 1; that is, for D = 1 the jump sizes are uniformly distributed between — δ/2 and J/2 4 . For model type I this form is modified near the reflecting barrier, in a natural fashion. (For model type I, only the behaviour within the ball BQ(R) is studied.) For investigations in physics, model type I is the most important and we shall mainly consider this. Furthermore, for simplicity, we largely restrict attention to the one-dimensional case, i.e. D = 1. 4
The ABBC analysis
As already indicated, in the ABBC approach one considers, in momentum space, a small ball Bo(r) - the trap - centred at 0 and with radius r. Let τi, T2,... and fi, T2,... denote the successive sojourn times in and out of the trap, respectively, the T -S constituting an i.i.d. sequence and likewise for the fi-s. It is assumed that with sufficient accuracy one can think of these two sequences as being independent. The degree of accuracy of the implied approximation depends on the size of δ. It is furthermore argued that provided r « δ/2 one can, to good approximation, assume that when the atom jumps into the trap from outside, the attained momentum y will be uniformly distributed in Bo(r). Letting λ(y) denote the rate of the exponential waiting time distribution in momentum state y one therefore has that the r^-s follow the distribution with density
= \B0(r)\-1 ί
λ(y)
JBo(r)
4
In the units chosen here, δ is of the order of h\k\ where h is Planck's constant and k is the optical wave-vector.
Laser Cooling and Stochastics
55
where |i?o(r)l * s the volume of BQ(Γ) and r is a generic random variable having the same distribution as the Tj-s. Let α = D/j, then
for some constant α. Thus, provided α < 2, the law of the Tf-s belongs to the domain of attraction of a positive α-stable distribution with a scaling constant b depending on α. We denote the distribution function of this positive α-stable law5 by Sα(x;b). In particular, if 7 = 2 and D = 1, then α = 1/2 and
p(x\τ) = -r-ιΊ{-,r2x)χ-*l2 where 7(α, x) is the incomplete gamma function
η{a,x) = I sa-le~sds Jo Hence the r^-s are in the domain of attraction of the ^-stable law with scaling constant b = 2~3r~2. As regards what happens outside the trap, it is argued that under model type I the τ;-s belong to the domain of attraction of the normal law, while under type II the domain of attraction is again that of a ^-stable law with some scaling constant 6, as is indeed plausible in view of well-known probabilistic results. Under model type III, the distribution of the fj-s is argued to belong to the domain of attraction of a ^-stable law when 77 is chosen equal to 2. In the calculations below we will frequently refer to the distributions of T{ and T{ as being α-stable and ά-stable respectively, where ά < α and ά, α G (0,1). In model type II we have α = ά while under model type III, ά < α. Mathematically, the most interesting cases are α = ά = 1/2 (model type II) and α = 1/2, ά = 1/4 (model type III).
4.1
Occupation times
We consider here the time spent by the atom in the trap between 0 and t. Model type I: Let τu\ denote the longest of the periods spent in the trap before time t. For t -> 00, Tφ is of the order of t (cf. a well-known property of the stable laws) and hence, in particular, ΓBo^(t)/t —> 1. Model type II: In this case
where b and b are the scaling constants of r and f, respectively. Model type III: In this model ά < α and hence TBo^(t)/t -> 0. 5
In a standard notation (see, e.g., Samorodnitsky and Taqqu (1994)) this law is denoted
Sa(b,β,μ)
with β = 1 and μ = 0.
56
4.2
O.E. Barndorff-Nielsen and F.E. Benth
The 'sprinkling distributions'
To obtain more precise information on the distribution of the momentum Yt at time t the authors derive the 'sprinkling distributions' SR and SE In the traditional probabilistic terminology and assuming that the atom starts outside the trap, SR and SE are, in fact, the renewal measures corresponding to the sequences {fi + ... + n + f*} and {fi + n + ... + τi + τ;}, respectively. Denoting the corresponding renewal densities by UR and UE we have
2=1
and uR{t)=p(t\τ)+
I uE{t-x)p(x\τ)άx Jo
The Laplace transforms of ΪXE(£) and u#(t) are
r r
Jo and
/o
Jo
respectively, where L{^ \ χ\ is the Laplace transform of the random variable
χat0. Now consider model type I. Then τ+τ belongs to the domain of attraction of a positive α-stable law with scale parameter b and, as the authors show and as follows also from results of Dynkin and Lamperti (see further in Section 5), we then have UB(t),UR(t)
^
for t —ϊ oo. In model type II f belongs to the domain of attraction of a positive α-stable law with scale parameter b. Thus,
(6 + o)Γ(α) when t -> oo. Under model type III, f belongs to the ά-stable domain, where ά < α. Hence, «E(t),«ϋ(t)
x^ 6Γ(α)
when t —>• oo.
Laser Cooling and Stochastics
4.3
57
Trapping probabilities
Next the authors discuss the probability of finding an atom in the trap. We give here a similar derivation of this probability: Let Q(t) = Pτ{At = 0}, i.e. be the probability of finding the atom in the trap Bo(r) at time t. We have (1)
Q(t) = G(t) + / p{x\h
+ n)Q(t - x)dx
Jo where
G(t)= f p{xtf)Pr{r Jo
>t-x}dx
Relation (1) is a renewal equation, which has the solution
(2)
Q(t) = [ G(t- x)uE{x) dx Jo = I Uβ(t — x) p{u t f) Pr{r > x — u}dudx Jo Jo
The Laplace transform of Q(t) takes the form
In order to study the asymptotics of Q(t) we need to distinguish between the different model types. First, consider model type I. Since f has finite expectation and r belongs to the domain of attraction of an α-stable law with scale 6, Γ e'etQ{t) dt ~ θ~ι - E{f} ./o when θ ->> 0. Hence, when t —> oc, Q(t) ~ 1. In model type II both f and r have distributions in the domain of attraction of an α-stable law with α E (0,1) and scale parameter b and 6, respectively. We have, for small 0,
Π
Jo
6+6
which implies
Q(t) ~ Λ6+6
when t -> oo. Finally, for model type III r and f have distributions in the domain of an α-stable and an ά-stable distribution, respectively, where ά < α and ά, α E (0,1). In this case, the small θ behaviour will be
f o Jo
e-θtQ{t)dt
b
58
O.E. Barndorff-Nielsen and F.E. Benth
which gives the large time asymptotics b
I ί - (
Ω
"
ά
)
Note that formally for ά = α this expression becomes Q(t) ~ 6/6, which differs from the correct results as given for model type II. 4.4
Momentum distribution
Finally the authors discuss the distribution of the momentum at time t inside the trap. We have,
= (2r)~ι / V φ ί fi + τ2 + . . . + h) Pr{r(y) >t-x}dx rt
= (2r)~ι / Pr{τ(y) > t - x}uR{x) dx Jo
= (2r)-λ f Jo
e-^-χ^^uR(x)dx tt
= {2r)~ι / Jo
tu) du t
The asymptotics for p(y, 0 X Yt, At) is easily studied in terms of the asymptotics of uR: uR(x) - c χ - ( 1 " ά ) where ά = α in model types I and II and ά = ά in model type III. Furthermore, c = (6Γ(α))~1 and c = ((6 + 6)Γ(α))~1 in model types I and II, respectively, and c = (6Γ(ά))~1 in model type III. Hence, for t —> oo, (3) where
p(v,0tYt,At)~±g&
(x)= ff
Jo
^^u7*-1
du
Our main interest lies in the large time behaviour of qt{y)' First, notice that qt(y) = P(y ί Yt\At = 0) = Q-\t)p{y, 0 \ Yu At)
59
Laser Cooling and Stochastics
Hence, we can give the asymptotic results for qt(y) under the three different model types appealing to the asymptotics for Q(t) derived in the subsection above: Under model type I (4)
qt(y)^(2rbT(α)r1-Gα(tλ(y))tα,
t -+ oo
For model type II, (5)
^
and, finally, for model type III,
(6)
qt{y)
It follows that through rescaling by the transformation u = βty, where βt is defined by tλ(β^1) = 1, one obtains a limit law for u (conditional on At = 0) in all three cases.
5
Further analysis
We now return to the two first themes of Section 4 in order to discuss these further in the light of existing probabilistic results on occupation times and renewal theory.
5.1
Occupation times
Let us first consider the general momentum model introduced in Section 3. From Ethier and Kurtz (1986), p.162, we know that Yt is a time-homogeneous Markov process with generator given by
(7)
Λf(x) = λ(x) [
(f{x) - f(y)) μ{x, dy)
The domain of A is the space of real-valued measurable functions on RD which are integrable with respect to the measure μ(x,dy). Assume λ( ) > 0 is bounded, and denote λ := sup 3/ λ(y). Introduce a modification of the transition probabilities μ in the following manner: (8)
μ{x, A) = (1
^- ) δA(x) + ~^-μ{x, A)
Let {xk} be the Markov chain with transition law μ. According to Ethier and Kurtz (1986), Yt has the same finite dimensional probability distributions as the process Xt := xp t , where Pt is a Poisson process with intensity λ
60
O.E. Barndorff-Nielsen and F.E. Benth
independent of {xk}- The transition probabilites for Xt are easily derived to be (9)
Pr{X t + ί € AI Xt = x} = e λ s V
^ - P φ
n
G Λ | z 0 = x}
Our main object of interest is relative occupation times for Yt. Denote the occupation time in a Borel set A given that YQ — x by
(10)
Γ*A(t)=
f Jo
The relative occupation time in question for laser cooling is
We consider the occupation time distribution of Yt by exploiting the equivalence between the processes Xt and Yt: Let A E B(RD). For n E No define (12)
NA(n) =
#{XieA:0
i.e. NA{TI) is the number of visits to the set A of the Markov chain {xi} up till time n. Denote the number of jumps of Xt between 0 and t by Nt and define (13)
NA(t) := NA(Nt) = # {Xi E A : 0 < i < Nt}
With these objects at hand, we can start to calculate an expression for the (defective) probability density of the occupation time of Yt in a set A. For s < ί, let px°(s f Γ^(ί)) be the (defective) probability density of TA(t) at s when YQ = XQ £ A. If NA(t) = fc we know that X s has spent A; exponentially distributed time periods in the set A on the time interval [0, ί]. These exponential waiting times are independent with intensity λ, and the sum of k periods will thus be gamma distributed with parameters k and λ. Hence, (14)
p^tΓΛίt^p^ Jfe=l
where g(s; fc, λ) = ΛrsSk~1e~Xs^ is the density of the gamma distribution. A straightforward calculation with conditional probabilities shows that OO
px°(k t NA(t)) = 5>*°(fc t NA(n))p(n % Nt) n=k
n=k
61
Laser Cooling and Stochastics since Nt is Possion distributed with intensity λ. Thus (15)
p*{s X ΓΛ(t))
= e~ 71=0
ni 71=0
n=0
ra=l
k=l
V
y
Hence, we see that the problem of calculating the occupation time of Yt is reduced to finding the occupation time in A for the chain {#&}. We now consider the asymptotics for the occupation time, in the framework provided by Takacs (1959): Assume we have a stochastic process which enters states A and B alternately. The states A and B are disjoint subsets of the state space of the process, and their union constitutes the whole state space. The sequences of the successive sojourn times spent in the two states are assumed to be independent positive random variables. Under some asymptotic assumptions for the sums of the sojourn times in the two states, Takacs (1959) provides explicit asymptotic results for the total sojourn time in state B (or A) during the time interval (0, ί). His results are directly applicable to the laser cooling framework. We consider this in further detail: Let state B — BQ(Γ) where r « 1 and, as in the Section above, {r^} denotes the sequence of sojourn times in 5 , while the sojourn times in state A = Bc are denoted {f^}. As we saw in Section 4, the r^-s will belong to the domain of attraction of a positive stable distribution of index α and scale parameter 6. I.e., (16)
lim Pr j ^
^
< x) = Sα(x; b)
Concerning the shape of the distribution of the fj-s, this will depend on the choice of model. In model type I, the fi-s will belong to the domain of attraction of the normal distribution: (17)
62
O.E. Barndorff-Nielsen and F.E. Benth
where σ 2 is the variance of the generic variable τ with the same distribution as the fi-s. Φ(x) is the standard normal distribution. In model type II, the f^-s belong to the domain of attraction of a stable law of index α with scale parameter b:
lim (x;b) lim Pr { ^=f^ <x\ = SSαα(x;
(18)
Finally, for model type III, the sojourn times in the 'hot' state are in the domain of attraction of a stable law Sά(x] b) (19)
lim Pr ( & ^
<x)
n-*oo
= Sά(x; b)
The conditions (16) and (17-19) are exactly what is needed in order to state the following result by Takacs (1959): Theorem 5.1 The asymptotics ofTβQ^(t)
lim P r { Γ B o ( r l ^ ~Mjt
(20)
is given by
<s} = Qjis),
j = 1,2,3
where. Model type I. m\ = α, M\ = 1 and M\ = E{τ}. Qι(s) is the distribution of —x" 1 / 2 where χ is distributed as 5 Q (s;6). Model type II. rri2 = \, M2 = 0 and M<ι = 1. Q2{s) is the distribution of x/{ζ + x) where ζ is distributed as Sa(sm, b) and χ as Sa(s; 6). Model type III. 777,3 = OL/OL, M3 = 0 and M3 = 1. Q$(s) is the distribution of χζ~ιl2 where ζ is distributed as Sά(s b) and χ as Sa(sm,b). 5.2
The 'sprinkling' distribution
Again, let {n}™ and {ri}™ be independent random variables denoting the sojourn times in the 'hot' and 'cold' states respectively. Let F(t) and F(t) be the distribution functions of T{ and f{ respectively, where we assume the tail behaviour (21)
1 - F{t)
and (22)
1 - P{t)
for positive constants 6, b and α. Note that in the case of model type II we have α = 1/2, i(t) — 1, the constants b and b being the scale parameters of
63
Laser Cooling and Stochastics
the stable distribution. For convenience, we denote the distribution function of σ := τ+τ by Fσ. Further, f and r are generic random variables distributed as Ti and T{. Introduce the two renewal processes (23) (24)
Mt = max {A; | h + n + f2 + ... + τk < t} Mt = max {k \ fi + n + f2 + ... + fk + τk < t]
By convention, we let Mt = 0 and Mt = 0 if the sets of A 's to maximize are empty. Define the process Mt
(25)
Mt
St =
The processes Mt and Mt decide the state of the cooling process. To see this, introduce the times R\ = fi, E\ = τ\ + ri, R2 = n + τi + f2, E2 = fi + τi + T2 + T2, The £7£-s denote the exzί times, i.e. the times when the process exits the cooling state. On the other hand, the Ri-s are the times the atom returns to the cooling state. It is easy to see that if Mt — ^ and Mt = n — 1, then t E [Rn, J5n), while if Mt = n and Mt = n, t G [En, i?n+i). Thus, Mt is either equal to or one less than M t . In the former case the process St is in the hot state (i.e. the waiting time to next change is distributed as fra), while in the latter St is in the cooled state (i.e. waiting time to next change is distributed according to τ n +i). In the previously introduced notation, At = 0 if and only if Mt = Mt + 1 while At — 1 if and only if Mt = Mt. We consider the asymptotic behaviour of the residual time Rt := t — St when we are in the 'cool' state, i.e. when At = 0. Motivated by the Dynkin-Lamperti theorem (see Bingham et al. (1987), p.361), it is natural to consider Rt/t and show that this has a limiting distribution. We adopt the argument in Bingham et al. (1987), p.361, to our case of two independent sequences of waiting times: Let u < v and w, v G [0,1]. We have that ut