A PRIMER ON STATISTICAL DISTRIBUTIONS
N. BALAKRISHNAN McMaster University Hamilton, Canada V. B. NEVZOROV St. Petersburg State University Russia
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, e-mail: [email protected].

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representation or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:

Balakrishnan, N., 1956-
A primer on statistical distributions / N. Balakrishnan and V.B. Nevzorov.
p. cm.
Includes bibliographical references and index.
ISBN 0-471-42798-5 (acid-free paper)
1. Distribution (Probability theory) I. Nevzorov, Valery B., 1946- II. Title.
QA273.B25473 2003
519.2'4-dc21    2003041157

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1
To my lovely daughters, Sarah and Julia (N.B.)

To my wife, Ludmila (V.B.N.)
CONTENTS

PREFACE xv

1 PRELIMINARIES 1
1.1 Random Variables and Distributions 1
1.2 Type of Distribution 4
1.3 Moment Characteristics 4
1.4 Shape Characteristics 7
1.5 Entropy 8
1.6 Generating Function and Characteristic Function 10
1.7 Decomposition of Distributions 14
1.8 Stable Distributions 14
1.9 Random Vectors and Multivariate Distributions 15
1.10 Conditional Distributions 18
1.11 Moment Characteristics of Random Vectors 19
1.12 Conditional Expectations 20
1.13 Regressions 21
1.14 Generating Function of Random Vectors 22
1.15 Transformations of Variables 24

I DISCRETE DISTRIBUTIONS 27

2 DISCRETE UNIFORM DISTRIBUTION 29
2.1 Introduction 29
2.2 Notations 29
2.3 Moments 30
2.4 Generating Function and Characteristic Function 33
2.5 Convolutions 34
2.6 Decompositions 35
2.7 Entropy 36
2.8 Relationships with Other Distributions 36

3 DEGENERATE DISTRIBUTION 39
3.1 Introduction 39
3.2 Moments 39
3.3 Independence 40
3.4 Convolution 41
3.5 Decomposition 41

4 BERNOULLI DISTRIBUTION 43
4.1 Introduction 43
4.2 Notations 43
4.3 Moments 44
4.4 Convolutions 45
4.5 Maximal Values 46
4.6 Relationships with Other Distributions 47

5 BINOMIAL DISTRIBUTION 49
5.1 Introduction 49
5.2 Notations 49
5.3 Useful Representation 50
5.4 Generating Function and Characteristic Function 50
5.5 Moments 50
5.6 Maximum Probabilities 53
5.7 Convolutions 56
5.8 Decompositions 56
5.9 Mixtures 57
5.10 Conditional Probabilities 58
5.11 Tail Probabilities 59
5.12 Limiting Distributions 59

6 GEOMETRIC DISTRIBUTION 63
6.1 Introduction 63
6.2 Notations 63
6.3 Tail Probabilities 64
6.4 Generating Function and Characteristic Function 64
6.5 Moments 64
6.6 Convolutions 68
6.7 Decompositions 69
6.8 Entropy 70
6.9 Conditional Probabilities 71
6.10 Geometric Distribution of Order k 72

7 NEGATIVE BINOMIAL DISTRIBUTION 73
7.1 Introduction 73
7.2 Notations 74
7.3 Generating Function and Characteristic Function 74
7.4 Moments 74
7.5 Convolutions and Decompositions 76
7.6 Tail Probabilities 80
7.7 Limiting Distributions 81

8 HYPERGEOMETRIC DISTRIBUTION 83
8.1 Introduction 83
8.2 Notations 83
8.3 Generating Function 84
8.4 Characteristic Function 84
8.5 Moments 84
8.6 Limiting Distributions 88

9 POISSON DISTRIBUTION 89
9.1 Introduction 89
9.2 Notations 89
9.3 Generating Function and Characteristic Function 90
9.4 Moments 90
9.5 Tail Probabilities 91
9.6 Convolutions 92
9.7 Decompositions 92
9.8 Conditional Probabilities 94
9.9 Maximal Probability 95
9.10 Limiting Distribution 96
9.11 Mixtures 96
9.12 Rao-Rubin Characterization 99
9.13 Generalized Poisson Distribution 100

10 MISCELLANEA 101
10.1 Introduction 101
10.2 Pólya Distribution 101
10.3 Pascal Distribution 102
10.4 Negative Hypergeometric Distribution 103

II CONTINUOUS DISTRIBUTIONS 105

11 UNIFORM DISTRIBUTION 107
11.1 Introduction 107
11.2 Notations 107
11.3 Moments 108
11.4 Entropy 110
11.5 Characteristic Function 110
11.6 Convolutions 110
11.7 Decompositions 111
11.8 Probability Integral Transform 112
11.9 Distributions of Minima and Maxima 112
11.10 Order Statistics 114
11.11 Relationships with Other Distributions 117

12 CAUCHY DISTRIBUTION 119
12.1 Notations 119
12.2 Moments 120
12.3 Characteristic Function 120
12.4 Convolutions 120
12.5 Decompositions 121
12.6 Stable Distributions 121
12.7 Transformations 121

13 TRIANGULAR DISTRIBUTION 123
13.1 Introduction 123
13.2 Notations 123
13.3 Moments 124
13.4 Characteristic Function 125

14 POWER DISTRIBUTION 127
14.1 Introduction 127
14.2 Notations 127
14.3 Distributions of Maximal Values 128
14.4 Moments 129
14.5 Entropy 131
14.6 Characteristic Function 131

15 PARETO DISTRIBUTION 133
15.1 Introduction 133
15.2 Notations 133
15.3 Distributions of Minimal Values 134
15.4 Moments 136
15.5 Entropy 137

16 BETA DISTRIBUTION 139
16.1 Introduction 139
16.2 Notations 140
16.3 Mode 140
16.4 Some Transformations 141
16.5 Moments 141
16.6 Shape Characteristics 147
16.7 Characteristic Function 147
16.8 Decompositions 148
16.9 Relationships with Other Distributions 149

17 ARCSINE DISTRIBUTION 151
17.1 Introduction 151
17.2 Notations 151
17.3 Moments 153
17.4 Shape Characteristics 154
17.5 Characteristic Function 154
17.6 Relationships with Other Distributions 155
17.7 Characterizations 155
17.8 Decompositions 156

18 EXPONENTIAL DISTRIBUTION 157
18.1 Introduction 157
18.2 Notations 157
18.3 Laplace Transform and Characteristic Function 158
18.4 Moments 159
18.5 Shape Characteristics 160
18.6 Entropy 162
18.7 Distributions of Minima 162
18.8 Uniform and Exponential Order Statistics 163
18.9 Convolutions 164
18.10 Decompositions 165
18.11 Lack of Memory Property 167

19 LAPLACE DISTRIBUTION 169
19.1 Introduction 169
19.2 Notations 169
19.3 Characteristic Function 170
19.4 Moments 171
19.5 Shape Characteristics 172
19.6 Entropy 172
19.7 Convolutions 173
19.8 Decompositions 174
19.9 Order Statistics 174

20 GAMMA DISTRIBUTION 179
20.1 Introduction 179
20.2 Notations 180
20.3 Mode 180
20.4 Laplace Transform and Characteristic Function 181
20.5 Moments 181
20.6 Shape Characteristics 182
20.7 Convolutions and Decompositions 185
20.8 Conditional Distributions and Independence 185
20.9 Limiting Distributions 187

21 EXTREME VALUE DISTRIBUTIONS 189
21.1 Introduction 189
21.2 Limiting Distributions of Maximal Values 190
21.3 Limiting Distributions of Minimal Values 191
21.4 Relationships Between Extreme Value Distributions 191
21.5 Generalized Extreme Value Distributions 193
21.6 Moments 194

22 LOGISTIC DISTRIBUTION 197
22.1 Introduction 197
22.2 Notations 197
22.3 Moments 199
22.4 Shape Characteristics 201
22.5 Characteristic Function 201
22.6 Relationships with Other Distributions 203
22.7 Decompositions 204
22.8 Order Statistics 205
22.9 Generalized Logistic Distributions 205

23 NORMAL DISTRIBUTION 209
23.1 Introduction 209
23.2 Notations 210
23.3 Mode 211
23.4 Entropy 211
23.5 Tail Behavior 212
23.6 Characteristic Function 214
23.7 Moments 215
23.8 Shape Characteristics 217
23.9 Convolutions and Decompositions 217
23.10 Conditional Distributions 219
23.11 Independence of Linear Combinations 220
23.12 Bernstein's Theorem 221
23.13 Darmois-Skitovitch's Theorem 224
23.14 Helmert's Transformation 226
23.15 Identity of Distributions of Linear Combinations 227
23.16 Asymptotic Relations 228
23.17 Transformations 229

24 MISCELLANEA 235
24.1 Introduction 235
24.2 Linnik Distribution 235
24.3 Inverse Gaussian Distribution 237
24.4 Chi-Square Distribution 239
24.5 t Distribution 240
24.6 F Distribution 245
24.7 Noncentral Distributions 246

III MULTIVARIATE DISTRIBUTIONS 247

25 MULTINOMIAL DISTRIBUTION 249
25.1 Introduction 249
25.2 Notations 250
25.3 Compositions 250
25.4 Marginal Distributions 250
25.5 Conditional Distributions 251
25.6 Moments 252
25.7 Generating Function and Characteristic Function 254
25.8 Limit Theorems 256

26 MULTIVARIATE NORMAL DISTRIBUTION 259
26.1 Introduction 259
26.2 Notations 260
26.3 Marginal Distributions 262
26.4 Distributions of Sums 262
26.5 Linear Combinations of Components 262
26.6 Independence of Components 263
26.7 Linear Transformations 264
26.8 Bivariate Normal Distribution 265

27 DIRICHLET DISTRIBUTION 269
27.1 Introduction 269
27.2 Derivation of Dirichlet Formula 271
27.3 Notations 272
27.4 Marginal Distributions 272
27.5 Marginal Moments 274
27.6 Product Moments 274
27.7 Dirichlet Distribution of Second Kind 275
27.8 Liouville Distribution 276

APPENDIX: PIONEERS IN DISTRIBUTION THEORY 277

BIBLIOGRAPHY 289

AUTHOR INDEX 294

SUBJECT INDEX 297
PREFACE

Distributions and their properties and interrelationships assume a very important role in most upper-level undergraduate as well as graduate courses in the statistics program. For this reason, many introductory statistics textbooks discuss in a chapter or two a few basic statistical distributions, such as binomial, Poisson, exponential, and normal. Yet a good knowledge of some other distributions, such as geometric, negative binomial, Pareto, beta, gamma, chi-square, logistic, Laplace, extreme value, multinomial, multivariate normal, and Dirichlet, will be immensely useful to those students who go on to upper-level undergraduate or graduate courses in statistics. Students in applied programs such as psychology, sociology, biology, geography, geology, economics, business, and engineering will also benefit significantly from an exposure to different distributions and their properties, as statistical modelling of observed data is an integral part of their work.

It is for this reason we have prepared this textbook, which is tailor-made for a one-term course (of about 35 lectures) on statistical distributions. All the preliminary concepts and definitions are presented in Chapter 1. The rest of the material is divided into three parts, with Part I covering discrete distributions, Part II covering continuous distributions, and Part III covering multivariate distributions. In each chapter we have included a few pertinent exercises (at an appropriate level for students taking the course) which may be handed out as homework at the end of each chapter. A biographical sketch of some of the leading contributors to the area of statistical distribution theory is presented in the Appendix to present students with a historical sense of developments in this important and fundamental area in the field of statistics. From our experience, we would suggest the following lecture allocation for teaching a course on statistical distributions based on this book:

5 lectures on preliminaries (Chapter 1)
9 lectures on discrete distributions (Part I)
17 lectures on continuous distributions (Part II)
4 lectures on multivariate distributions (Part III)
We welcome comments and criticisms from all those who teach a course based on this book. Any suggestions for improvement or "necessary" addition (omission of which in this version should be regarded as a consequence of our ignorance, not of personal nonscientific antipathy) sent to us will be much appreciated and will be acted upon when the opportunity arises.

It is important to mention here that many authoritative and encyclopedic volumes on statistical distribution theory exist in the literature. For example,

• Johnson, Kotz, and Kemp (1992), describing discrete univariate distributions
• Stuart and Ord (1993), discussing general distribution theory
• Johnson, Kotz, and Balakrishnan (1994, 1995), describing continuous univariate distributions
• Johnson, Kotz, and Balakrishnan (1997), describing discrete multivariate distributions
• Wimmer and Altmann (1999), providing a thesaurus on discrete univariate distributions
• Evans, Peacock, and Hastings (2000), describing discrete and continuous distributions
• Kotz, Balakrishnan, and Johnson (2000), discussing continuous multivariate distributions

are some of the prominent ones. In addition, there are separate books dedicated to some specific distributions, such as Poisson, generalized Poisson, chi-square, Pareto, exponential, lognormal, logistic, normal, and Laplace (which have all been referred to in this book at appropriate places). These books may be consulted for any additional information.

We take this opportunity to express our sincere thanks to Mr. Steve Quigley (of John Wiley & Sons, New York) for his support and encouragement during the preparation of this book. Our special thanks go to Mrs. Debbie Iscoe (Mississauga, Ontario, Canada) for assisting us with the camera-ready production of the manuscript, and to Mr. Weiquan Liu for preparing all the figures. We also acknowledge with gratitude the financial support provided by the Natural Sciences and Engineering Research Council of Canada and the Russian Foundation of Basic Research (Grants 01-01-00031 and 00-15-96019) during the course of this project.
N. BALAKRISHNAN
Hamilton, Canada

V. B. NEVZOROV
St. Petersburg, Russia

April 2003
CHAPTER 1
PRELIMINARIES

In this chapter we present some basic notations, notions, and definitions which a reader of this book must absolutely know in order to follow subsequent chapters.
1.1 Random Variables and Distributions
Let (Ω, ℱ, P) be a probability space, where Ω = {ω} is a set of elementary events, ℱ is a σ-algebra of events, and P is a probability measure defined on (Ω, ℱ). Further, let B denote an element of the Borel σ-algebra of subsets of the real line R.
Definition 1.1 A finite single-valued function X = X(ω) which maps Ω into R is called a random variable if, for any Borel set B in R, the inverse image of B, i.e., X⁻¹(B) = {ω : X(ω) ∈ B}, belongs to the σ-algebra ℱ. It means that for all Borel sets B, one can define the probabilities

P{X ∈ B} = P{X⁻¹(B)}.

In particular, if for any x (−∞ < x < ∞) we take B = (−∞, x], then the function

F(x) = P{X ≤ x}     (1.1)

is defined for the random variable X.
Definition 1.2 The function F(x) is called the distribution function or cumulative distribution function (cdf) of the random variable X.

Remark 1.1 Quite often, the cumulative distribution function of a random variable X is defined as

G(x) = P{X < x}.
Most of the properties of both these versions of the cdf (i.e., F and G) coincide. Only one important difference exists between the functions F(x) and G(x): F is right continuous, while G is left continuous. In our treatment we use the cdf as given in Definition 1.2.
There are three types of distributions: absolutely continuous, discrete, and singular, and any cdf F(x) can be represented as a mixture

F(x) = p_1 F_1(x) + p_2 F_2(x) + p_3 F_3(x)     (1.2)

of absolutely continuous F_1, discrete F_2, and singular F_3 cdf's, with nonnegative weights p_1, p_2, and p_3 such that p_1 + p_2 + p_3 = 1. In this book we restrict ourselves to distributions which are either purely absolutely continuous or purely discrete.
Definition 1.3 A random variable X is said to have a discrete distribution if there exists a countable set B = {x_1, x_2, ...} such that

P{X ∈ B} = 1.

Remark 1.2 To determine a random variable having a discrete distribution, one must fix two sequences: a sequence of values x_1, x_2, ... and a sequence of probabilities p_k = P{X = x_k}, k = 1, 2, ..., such that Σ_k p_k = 1. In this case, the cdf of X is given by

F(x) = Σ_{k: x_k ≤ x} p_k.     (1.3)

Definition 1.4 A random variable X with a cdf F is said to have an absolutely continuous distribution if there exists a nonnegative function p(x) such that

F(x) = ∫_{−∞}^{x} p(t) dt     (1.4)

for any real x.

Remark 1.3 The function p(x) then satisfies the condition

∫_{−∞}^{∞} p(t) dt = 1,     (1.5)

and it is called the probability density function (pdf) of X. Note that any nonnegative function p(x) satisfying (1.5) can be the pdf of some random variable X.

Remark 1.4 If a random variable X has an absolutely continuous distribution, then its cdf F(x) is continuous.
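As a small computational illustration (not part of the original text), the sketch below checks the normalization condition (1.5) and builds the cdf (1.4) numerically; the density p(x) = 2x on (0, 1) is an arbitrary choice made only for this example.

import numpy as np

# Hypothetical example density: p(x) = 2x on (0, 1); any valid pdf could be used here.
x = np.linspace(0.0, 1.0, 10_001)
p = 2.0 * x

# Normalization condition (1.5): the density should integrate to 1.
print(np.trapz(p, x))                       # ~1.0

# Cdf (1.4) by cumulative trapezoidal integration: F(x) = integral of p(t) over (0, x].
F = np.concatenate(([0.0], np.cumsum(0.5 * (p[1:] + p[:-1]) * np.diff(x))))
print(F[x.searchsorted(0.5)])               # F(0.5); the exact value is 0.25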
Definition 1.5 We say that random variables X and Y have the same distribution, and write

X =ᵈ Y,     (1.6)

if the cdf's of X and Y (i.e., F_X and F_Y) coincide; that is,

F_X(x) = P{X ≤ x} = P{Y ≤ x} = F_Y(x)  for all x.
Exercise 1.1 Construct an example of a probability space (Ω, ℱ, P) and a finite single-valued function X = X(ω), ω ∈ Ω, which maps Ω into R, that is not a random variable.

Exercise 1.2 Let p(x) and q(x) be probability density functions of two random variables. Consider now the following functions:
(a) 2p(x) − q(x);  (b) p(x) + (1/2) q(x);  (c) |p(x) − q(x)|;  (d) (1/2)(p(x) + q(x)).
Which of these functions are probability density functions of some random variable for any choice of p(x) and q(x)? Which of them can be valid probability density functions under suitably chosen p(x) and q(x)? Is there a function that can never be a probability density function of a random variable?
Exercise 1.3 Suppose that p(x) and q(x) are probability density functions of X and Y, respectively, satisfying

p(x) = 2 − q(x)  for 0 < x < 1.

Then, find P{X < −1} + P{Y < 2}.
Q ( u ) = inf{r : F ( z ) 2 u } ,
0 < ?L < 1.
In the case when X has an absolutely continuous distribution, thrn the quantile function & ( u ) inay simply be written as
Q(u)= F-yu),
0 < 7L
< 1.
The corresponding qunntrlc dcnsrt y functrori is given by
where p ( z ) is the pdf corresponding to the cdf F ( z ) . It, should be noted that just, as forms of F ( z ) may be used to propose familics of distributions, general forms of the quaiitile function Q ( u )may also be used to propose families of statistical distributions. Interested readers may refer to the recent hook by Gilchrist (2000) for a detailed discussion on statist,ical niodelling wit,li qimntile funct,ioris.
4
1.2
PRELIMINARIES
Type of Distribution
Definition 1.6 R.a.ndom variables X and Y arc said t20 belong to the sa.m,e type of di,strilmtion if there exist corist,ant,s n a.nd h > 0 such that
Y
d
= a+
hX.
(1.7)
Not,c then that, the cdf’s F=( and F y of the random variables X and satisfy tlic rtlatiori
F y ( x ) = Fx
Y
2-u ( 7 ‘d 2.)
One t ~ ~ itherefore, i , choost: a certa.in cdf F as tlie sta.nda,rd distribution fiinction of i t certain tlistribution family. Then this family would consist of all cdf’s of the form
and
F ( x ) = F ( x ,0: 1).
Tliiis, we have a two-pa.rarneter fa.mily of cdf’s F ( z ,a , h ) , where a is ca.lled thrx location pawmeter and h is t,he scale parameter. For a.bsolut,ely coiitiriuous distributions, one can introduce tlie corrcspondirig two-para.meter families of proba.bility density functions:
(1.10) wherc p ( ~ r = ) p ( a . 0 , l ) corresponds t o the random variable X with cdf E’, aiid p(x,a , I r ) cmresponds to the randoin variable Y = a h X with cdf F ( a ,(1, h )
+
1.3
Moment Characteristics
Tliare a,re soin? classical numerical c1iara.cteristics of random va.ria.blcs a.nd their dist,ribut ions. The most popular oms are expected values a.nd variances. Morc g m m d cliara.ct,rristics are the momen,ts. Among them, we emphasize rnoincnt s ;tl)out zero (about, origin) a.nd cent>ralmorrimts. Definition 1.7 For a. discrtite ra.ndorn variable X taking on va.lues 2 1 , 2 2 ; . . . wit,li proba.bilit,ies Ilk =
P{X
k
= Zk},
=
1 , 2 , . . .,
wt’ define t,lie n t h rnonaent of X about zero a.s
(1.11) k
We say tsliat oTL exists if
MOMENT CHAR.ACTERISTICS Notc that the cxpected value E X is nothing but mean of X or the mathematecal Pxpectataon o j X .
a1.
5
E X is also called the
Definition 1.8 The nth central m o m e n t of X is defined as (1.12)
c
given that,
k
1x1,- EXl"pk < 00.
If a random variable X has an absolutely continuous distribution with a pdf p(x), then the nioments about zero and t,hc central moments have the following expressions: 30
a,, = EX'l = l m z " p ( x ) dx
(1.13)
and (1.14)
We say that rnoiiients (1.13) exist if (1.15) The varzanc~of X is simply the sccond central riiornent: Var
x = p2 = E ( X
-
EX)^.
(1.16)
Central rrioriients are easily exprcssed in ternis of rnomerit,s about zero as follows:
d,,
=
E(X -- E X ) "
C(-l)" n
=
k=O
(1.17) k=O
In particular, we have Va.r X
= ,32 = a2
and
Note that tlir first central iriornent 81 = 0.
--
aI 2
(1.18)
6
PRELIMINARIES
The inverse problem cannot be solved, however, because all ccntral moments save no information about E X ; hence, the expected value cannot be expressed in terms of PTL( n = 1 , 2 . . . .). Nevertheless, the relation an
= =
=
k=O
c)
k=O
(L)ffFPn-k
EX"
9 2
=
E [ ( X- E X ) + E X ] " ( E X ) ' " E ( X- E X ) " - k (12 0 )
will cnahle us to express a , ( n = 2.3,. . .) in terms of central moments /&, . . . . In particular, we have
+3
a3 =
+
h ~ i a;
0 2 =
Pz + (27,
and
a4 =
E X and t,he
a1
(1.21)
p4+ 4 0 3 ~ ~t-16p2a: + a;.
(1.22)
Let X aiitl Y belong to the sa.rne type of distribution [see (1.7)], rnea.ning that, d Y =a, hX
+
for some constmts a and h > 0. Then, the following equalities a.llow us t o exprcss moments of Y in terms of the corresponding moments of X :
(1.23) and
E ( Y - -E Y ) " = E [ h ( X
~
E X ) ] " = h,"E(X - E X ) 7 1 .
(1.24)
Note that the centxal niomcnts of Y do not depend 011 t,ha 1oca.tioiipara.ineter a. As partic:ul;tr ca.ses of (1.23) and (1.24), wc havc
EY BY2 EY" EY'
=I
= = =
u+hEX,
(1.25)
az
(1.26) (1.27) (1.28)
+ 2ahEX + h,'EX2, Var Y = h2 Var X , + 3a2hEX + 3ah2EX2$- h 3 E X 3 , a4 + 4u'hEX + Ba2hzEX2+ 4 u h 3 E X 3 + h 4 E X " .
Definition 1.9 For ra.ndorn varia,bles takiiig oil values 0, 1 , 2 , . . ., tliejactorial momeats of pos%t.l,?ie order are defined as p,. = E X ( X
-
1). . . ( X
- 7'
+I),
'r = 1,2, . . .
~
(1.29)
while the f a c t o k l morrren,ts of negative order are defined as / L r=
E
[(X
+
1 l ) ( X '2).
+
I
. . (X + 7.) '
r
= 1,2,
(1.30)
SHAPE CHARACTERISTICS
7
While dealiiig with discrete distributions, it is quite often convenient to work with these fa.ctorial moments rather t,hari regular moments. For this reason, it is useful to note t,he following rehtionships between thc fa.ctoria1 rnoinents and the moments:
Exercise 1.4 Present, two different miidom variables having the same cxpectatioris and the same variances. Exercise 1.5 Let X be a random variable with expectation E X and variance Var X . Wha.t, is the sign of r ( X ) = E ( X - iXl)(Var X - Var IXl)? When does the qua.ntity r ( X ) eyua.1O? Exercise 1.6 Suppose tha.t X is a random variable such that P { X > 0) = 1 and that both E X and E ( l / X ) exist. Then, show that E X E ( l / X ) 2 2 .
+
Exercise 1.7 Suppose that P(0 5 X 5 l } = 1. Then, prove that E X 2 5 E X 5 E X 2 f . Also, find a.ll distributions for which the left and right bounds are attained.
+
Exercise 1.8 Construct a varia.ble X for which E X 3 = -5 and E X 6 = 24.
1.4
Shape Characteristics
For any distribution, we are often interested in some cha.ra.cteristics that are associated with t,he shape of the distribution. For example, we may be interested in finding out whether it is unimodal, or skewed, and so on. Two important measures in this respect are Pearson’s measures of skewness and kurtosis.
PRELIMINARIES
8
Definition 1.10 Pearson’s measures of skewness a.nd kurtosis are given hy
and
iij4
$2
72 = -
’
Since tliese mea.sures are functions of central moments, it is clear t,liat, they are free of t,lir. location. Siinilarly, dur to the fra.ct,ionalforni of thc rnca.sures, it can readily bt? vcrified that they are free of sca.le as well. It ca.n also he seen that tlie nieasure of skewness y1 may take on positive or nega.tive valiics depending on whtther /3:, is positive or negative, respectively. Obviously, whcn the distribiitiori is symnietric aboiit its mean, we may note that, /jn is 0, in which cast! tlie measure of skewiicss y1 is also 0. Hence, distrihiitions with y1 > 0 a.re sa.id to be positively skewed distributiorrs, while those with y1 < 0 arc said toohe n q n t i v e l y skewed distributions. Now, witliout, loss of generality, let, us consider a n arbitrary distribution with niean 0 a.nd va.riance 1. Then, by writing
and applyirig thr. Caiichy Schwarz ineqiiality, we readily obtain the inequality
Lat,er, we will observe the coefficient of kurtosis of a norma.1 distribution to hr 3 . Based on this value, distributions with 2 2 > 3 are called Zeptokwrtic distribu,tions, while those with y2 < 3 a,re ca.lled plntykurtic distributionw. Incidenta.lly, distribut,ionsfor which y2 = 3 (which clea.rly includes the normal) arc called m,esokurtic distributions.
Remark 1.5 Karl Pearson (1895) designed a system of continuous distributions wherein the pdf of every member satisfies a differential equation. By studying their moment properties and, in particular, their coefficients of skewness and kurtosis, he proposed seven families of distributions which all occupied different regions of the (γ1, γ2)-plane. Several prominent distributions (such as beta, gamma, normal, and t that we will see in subsequent chapters) belong to these families. This development was the first and historic attempt to propose a unified mechanism for developing different families of statistical distributions.
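The following short sketch (not part of the original text) computes sample analogues of Pearson's measures from simulated data; the normal and exponential samples are arbitrary choices used only to show a mesokurtic and a leptokurtic case.

import numpy as np

def skewness_kurtosis(sample):
    s = np.asarray(sample, dtype=float) - np.mean(sample)
    beta2, beta3, beta4 = np.mean(s**2), np.mean(s**3), np.mean(s**4)
    return beta3 / beta2**1.5, beta4 / beta2**2      # gamma_1, gamma_2

rng = np.random.default_rng(0)
print(skewness_kurtosis(rng.standard_normal(500_000)))   # ~(0, 3): symmetric, mesokurtic
print(skewness_kurtosis(rng.exponential(size=500_000)))  # ~(2, 9): positively skewed, leptokurtic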
1.5
Entropy
One more useful charact,erist,icof distributions (called entropy) was int,roducecl by Shannon.
ENTROPY
9
Definition 1.11 For a discrete random variable X taking on values . . . with probabilities p l , p 2 , . . . , the e72tropy H(X)is defined as
.c1,x2,
If X has an absolutely continuous distribution with pdf p(x), then t h entropy is defined as (1.39) where
D
=
{x : p(x) > O}.
In the case of discrete distributions, the transformation
Y
=u
+ hX.
-CC
< a < 00, h > 0
does not change the probabilities p , a.nd, consequently, we ha.ve
H ( Y ) = H(X). On the other h n d , if X has a pdf p ( . z ) , then Y
=a
+ hX
ha.s the pdf
and
whcre
It, is thcn easy to verify that
=
log h -tH ( X ) .
(1.40)
10
PRELIMINARIES
Generating Function and Characteristic F'unct ion
1.6
In this section we present some functions that are useful in geiicratiiig the probahilit,ies or the niornrnts of the distribution in a siiiiplc arid unified nia,nner. In addition, they ma.y also help in identifying the distribution of a.n underlying random va.ria.ble of interest.
Definition 1.12 Let X take on values 0, 1 , 2 , .. . with proba.bilit,ies p , = P { X = n } , ri = 0 , 1 , . . , . All the information a.bout this distribution is contained in tlir generntin,g function, which is defined as (1.41) n=O
with the right-hand side (RHS) of (1.41) converging a t least for / s / 5 1. Sonie iiiiportant properties of generating functions are as follows:
(a) P(1)= 1;
(11) for
Is/ <
1, t,liere exist deriva,t,ives of P ( s ) of any order;
(c) for o 5 s < 1, ~ ( sarici ) all its clcrivatives P ( ' ) ( s ) k, = 1 , 2 , .. . , a.rc: nonnegative incrmsing convex functions;
((1) the generating function P ( s ) uniquely determines protmbilities p T Ln, = 1 . 2 , . . . , and the followiiig relations are valid: PO
=
P(O),
( e ) if raridoiii va.riables X1, . . . , X, are independent, a.iid have generat,iiig finict ions Pk(s)=Esxk, k = l , . . . , n, then the gerieratirig function of thc sum Y = X 1 relation n
+ . . . + X,
Pk(S);
P y ( s ) =:
sa,tisfies t,hc
(1.42)
k=l
( f ) tlw factorial inonirnts can be determined from the generating function its
pk = E X ( X - l ) . . . ( X - k f l ) = P(')(l).
whtw
P('-)(I)= limP(')(A). ST1
(1.43)
GENERATING FUNCTION AND CHARACTERISTIC FUNCTION
11
Definition 1.13 The characteristic function f ( t ) of a random variable X is defined as
f ( t ) = Eexp{itX} If X takes on values x k ( k then
=EcostX+i
EsintX.
(1.44)
1 , 2 , .. .) with probabilities
p k = P{X = x k } ,
1
k
k
k
For a random variable having a. pdf p ( x ) , the characteristic function takes on an analogous form:
f(t)
=
J’
m
e i t z p ( z ) dx
-00
s, 00
-
cos(tz) p ( x ) dx + i
00
sin(tx) p ( z ) d x .
(1.46)
For random variables taking on values 0 , 1 , 2 ,. . . , there exists the following relationship between the characteristic function and the generating function: (1.47)
f ( t ) = P(e2t).
Some of the useful properties of characteristic functions are as follows:
f ( t ) is uniformly continuous; f ( t ) uniquely determines the distribution of the corresponding random variable X: if X has tlie characteristic fiinction f , then Y acteristic function g ( t ) = eznt f ( h t ) ;
=
a
+ h X has the char-
if ra.ndorn variables X I , . . . , X, are independent and their chamcteristic functions are fl(t), . . . f T L ( t )respectively, , then the characteristic function of the sum Y = Xi . . . X, is given by ~
+ +
PRELIMIN AR.IES
12
( g ) if the nth moment E X " of the random variable X exists, then the
charactcristic function f ( t ) of X has the first n derivatives, and
(1.49) moreover, in this situation, the following expansion is valid for the characteristic function: 7L
k=1 n
(1.50) k=l
where as
t
r, ( t )= o(t7')
+ 0;
(11) let random variables X , X I ,Xa.. . . have cdf's F , F l . F2,. . . and characteristic functions f , f l , f 2 , . . . , respectively. If for any fixrd t , as n i cx), f n ( C + f(t)l
(1.51)
&(x) + F ( z )
(1.52)
t hcn
for any 5 , where the limiting cdf is continuous. Notje that (1.52) also implies ( 1.51). There exist inversion formulas for charact,eristic functions which will enable us to determine the distribution that corresponds to a certain characteristic function. For exmiple, if
s, cx)
If(t)ldt < 00,
where f ( t ) is t h t characteristic function of a random variable X , then X has t hc pdf p ( z ) givtw by (1.53)
Remark 1.6 Instead of working with chara ristic fiinctioris, orif: c ~ u l t del fine the rmomen,t generntiny function of a ra om variable X a.s E exp{tX} (a real fiinction this time) a.nd work with it. However, t,ht:re a.re inst,arices where t,his nionicnt, genemting function may not exist, while the cha.ra. tic furict,ion a,lwa?jsexists. A classic exa.mple of this r11a.y be seen 1a.tt:r when we discuss Cauchy distxibutions. Nonethelcss, wheii the moment grnerating funct,ion does exist, it uniquely determines the dist,ribution just as the characteristic funct,ion does.
GENERATING FUNCTION AND CHARACTERISTIC FUNCTION
13
Exercise 1.9 Consider a random variable X which takes on values 0 , 1 , 2 , . . . with probabilities p , = P { X = n } ,n = 0 , 1 , 2 , . , . . Let P ( s ) be its generating find the probabilities function. If it is known that P ( 0 ) = 0 and P ( i ) = Pn.
i,
Exercise 1.10 Let P ( s ) and Q ( s )be the generating functions of the random variables X and Y . Suppose it is known that both EX and EY exist and that P ( s ) 2 Q ( s ) , 0 5 s < 1. W1ia.t can be sa.id about E(X - Y ) ? Can this expectation be positive, nega.tive, or zero? Exercise 1.11 If f ( t ) is a. chara.cteristic fiinction, then prove that, the functions
where Re f ( t ) denotes the real part of f ( t ) ,are also characteristic functions. Exercise 1.12 If f ( t ) is a characteristic function that is twice differentiable, prove that the fiinction g ( t ) = f ” ( t ) / , f ” ( 0 ) is also a cha.racteristic function. Exercise 1.13 Consider the functions f ( t ) and g ( t ) = 2 f ( t ) - 1. Then, prove that if g ( t ) is a characteristic function, f ( t ) also ought to be a characteristic function. The reverse may not be true. To prove this, construct an example of a characteristic function f ( t ) for which g ( t ) is not a characteristic function. Exercise 1.14 Find the only function among the following which is a characteristic function: f(t), f2(2t),
f“(3t),
arid
f6(6t).
Exercise 1.15 Find the only function among t h t following which is not a characteristic function: f(t), 2f(t)
-
1, 3 f ( t )
-
2,
and
4f(t)
-
3.
Exercise 1.16 It is easy to verify that f ( t ) = cos f is a characteristic function of a randoni variable that takes on values 1 and -1 with equal probability of :. Consider now the following functions:
Which of these are characteristic functions? Exercise 1.17 Prove that the functions f n ( t ) = COSY - sinTLt, n are characteristic functions only if n is an even integer.
= 1,2,. ..
14
1.7
PRELIMINARIES
Decomposition of Distributions
Definition 1.14 We say that a random va.riable X is decomposable if there d are two independent nondegenerate random variables Y and 2 such that X =
Y t-2.
Remark 1.7 An equivalent definition can be given in terms of characteristic functions as follows. A characteristic function f is decomposable if there are two nondegenemte characteristic functions f l anf f2 such that
f ( t )= fl(t)f"(t).
(1.54)
Note that degenerate characteristic functions have the form eiCt, which corresponds to the degenerate random variable taking on the value c with probability 1.
Definition 1.15 If for any n the relation
=
1 , 2 , .. . , a characteristic function f satisfies
f ( t )= {f7L(t)r
3
(1.55)
where f T 1 ( tare ) characteristic functions for any n = 1 , 2 , .. . , then we say that f is an infinitely divisible characteristic f u n c t i o n . A random variable is said to have an infinitely divisible distribution if its characteristic function f ( t ) is infinitely divisible.
Remark 1.8 Note that if a random variable has a bounded support, it cannot be infinitely divisible. It should also be noted that an infinitely divisible characteristic function cannot have zero values. Exercise 1.18 Construct a decomposable random variable X for which X 2 is indccomposa.ble. Exercise 1.19 Construct two indecomposable random variables X and Y for which X Y is decomposable.
1.8
Stable Distributions
Definition 1.16 We say that a chara.teristic function f is stable if, for any positive 0,1 a.nd n2, there exist n > 0 and h such that f ( a 1 t )f
(azt) = e"D"f(at).
(1.56)
A raridoni varia.ble is said to ha.ve a stable distribution if it,s characteristic function is stable. Remark 1.9 It is of interest t o note tha.t a.ny sta.ble distribution is a.bsolutely continuous, and is also infinitely divisible.
15
R.ANDOM VECTORS AND MULTIVARIATE DISTRIBUTIONS
1.9 Random Vectors and Multivariate Distributions

Let (Ω, ℱ, P) be a probability space where Ω = {ω} is a set of elementary events, ℱ is a σ-algebra of events, and P is a probability measure defined on (Ω, ℱ). Further, let B denote an element of the Borel σ-algebra of subsets of the n-dimensional Euclidean space Rⁿ.
Definition 1.17 An n-dimensional vector X = X(w) = (X,(W),. . . , X,,(w)) which maps R into R" is ca.lled a random vector (or an n-dimensional random variable) if, for any Borel set B in R",the inverse image of B giveii by
B} = {W : (X,(W),. . . X,(W)) t B}
= {W : X(W)E
X-l(B)
belongs to the a-algebra 7 . This niea.ns t,ha.t,for any Borel set B, we can define probability as
P{X t B} = P{X-l(B)}. In particular, for ariy x
= (XI,.
F(x) = F ( z 1 , . . .
. . , z,), the fimctioii
2,)
=
P ( X 1 5 XI,.. . , X , 5 Z,}? -CC < x i , . . . ,x, < CC,
(1.57)
is defined for the randoin vector X = (XI, . . . , Xn).
Definition 1.18 The function F ( x ) = F ( z 1 , . . . , x7,)is called the distribsution function of the ra.ridoin v Remark 1.10 The eltments X I , .. . , X , of the random vector X car1 be considered as n univariate random variables having distribution functions Fl(X)
=
F ( z :m , . . . ,m) = P ( X 1 5 x } ,
FZ(2)
=
F ( 3 0 , 2 , 0 0 , . . . , m ) = P ( X 2 5 X}; F ( o o , . . . . x , 2 ) = P { X n 5 x},
respectively. Moreover, ariy set of n random variables X I , . . . , X,, forms a random vector X = ( X I , .. . , X,). Hence,
F ( x ) = F ( z 1 , .. . , 2 , ) = P{X1 5
51,.
..
;x,5 2 , }
.
is often called t,he joint distribution, function of the variables X I ,. . . X,.
.
.T:'~)
is the joint, distribution function of the ra.ridorri variables
can obtain from it t,he joint distribiit,ion fiinction of any subset
,!)
ra.tlr1t.r ra.sily. For exa.rnple, we have
, x, 5 z:7n}= F ( s 1 , . . . as the joirit distribution function of
(XI,.
X,,,,w . . . . ; m )
. . ,X T n ) .
(1.58)
PRELIMINARIES
16
Definition 1.19 The ra.ndom variables X I . . . . X , a.re said to be independ e n t random! variables if
P(X1
5 2 1 , . . . , x , , 5 Xn}
n 7,
=
P{Xk I Zk}
(1.59)
k=l
for any -m
< % k < oc ( k = I , , . . , n ) .
.
Definition 1.20 The vectors XI = ( X I , . . . X n ) and X2 = ( X T L + l.,.. Xn+m) are said to be independent if
for any -m
< J'k < co ( k = 1,.. . , n + m ) .
In the following discussion we restrict ourselves t o the two-diniensiona.1 case. Let, ( X , Y ) be a. two-dimensional random vector and let F ( z ,y) = P ( X 5 r. Y 5 y) be the joint, distribut,ion function of ( X ,Y ) . Then, F x ( 2 ) = F ( J ,m) a.nd FlJ (y) = F ( m , y) are the marginal distribution functions. Now, as we did ea.rlier in the univariate case, we shall discuss discret,c a.nd absolutely continuous cases sepa.ra.tely. Definition 1.21 A two-diinensiona.1 random vector (X,Y )is sa.id t o have a. discrefe bi7iariate distribution if there cxists a. countable set, B = ( ( 2 1 ; y l ) , ( I C ~ , U ~. ). ,.} such that, P { ( X , Y )E B } = 1. Remark 1.11 In order to determine a.two-dimensional random vcctor ( X ,Y ) having a biva.riate discrete distribution, we need t,o fix two sequences: a. sequence of two-dimensional points ( 2 1 , y l ) , ( 2 2 , y2). . . . and a. sequcnce of proba.bilities p k = P { X = x k , Y = yk}, k = 1 , 2 , . . . , such that X k P k = 1. In this casc, the joint distribution fiinct,ion F ( z ,y) of (X,Y ) is given by (1.61)
Also, the coniponents of the vector ( X ,Y ) arc independent if
P{x
= .rk,Y = y k } =
P{X
= .ck}P{Y = y ~ }
for any k .
(1.62)
Definition 1.22 A two-dimensional random vect,or ( X ,Y ) with a joint distribution function F ( z , y) is said to have an absolutely con)tinwous biwarinte distribution, if there exists a nonnegat,ive function p ( u , such tha.t !ti)
( 1.63)
for any rcal
2
and y.
R.ANDOM VECTORS AND MULTIVARIATE DISTRIBUTIONS
17
Remark 1.12 The function p(u..rs) satisfies the condition (1.64) and it is ca.lled the probability density f u n c t i o n (pdf) of the bivariate random vector ( X , Y ) or the j o i n t probability density f u n c t i o n of the random variables X and Y.If p ( u , 21) is the pdf of' tlie bivariatr: vector ( X ,Y ) ,then tlic components X and Y have one-dimensional (margiiial) denshies (1.65)
(1.66) respect,ively. Also, the comporicnts of the absolutely continuous bivariate vector ( X ,Y ) are independent if P ( U , 2') = P X ( U ) P Y (7'1,
(1.67)
wlicrc px ( u ) and p y ( I>) are the marginal t1oiisitic.s as givcn in (1.65) and (1.66). Moreover, if the joint pdf p ( u , I > ) of ( X ,Y)admits a factorization of thp form P ( U , 2') =
ql(u)q2(v).
(1.68)
then the components X and Y are iridependent,, and tliere exists a nonzero constant c such that
Exercise 1.20 Let F ( J ,y) denote tlie distribution function of the random vector ( X , Y ) . Then, exprcss P { X 5 0,Y > l} in terms of the function
F ( x ,Y). Exercise 1.21 Let F ( z ) denote the distribution function of a random variable X . Consider the vector X, = ( X ,. . . , X ) with all its corriporimts coinciding with X . Express the distribution function of X, iri ternis of F ( x ) .
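A small numerical illustration (not part of the original text): the table below is an arbitrary discrete bivariate distribution p_ij = P{X = x_i, Y = y_j}; the sketch computes the marginal distributions, applies the independence criterion (1.62), and forms a conditional distribution of the kind treated in the next section.

import numpy as np

# Arbitrary joint probability table: rows index values of X, columns index values of Y.
p = np.array([[0.10, 0.20, 0.10],
              [0.20, 0.10, 0.30]])

p_x = p.sum(axis=1)                            # marginal distribution of X
p_y = p.sum(axis=0)                            # marginal distribution of Y
print(p_x, p_y)                                # [0.4 0.6] [0.3 0.3 0.4]

# Independence criterion (1.62): p_ij = P{X = x_i} P{Y = y_j} for all i, j.
print(np.allclose(p, np.outer(p_x, p_y)))      # False: X and Y are dependent here

# Conditional distribution of X given Y = y_1: the first column divided by its sum.
print(p[:, 0] / p_y[0])                        # [0.333... 0.666...]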
PRELIMINARIES
18
1.10
Conditional Distributions
Let ( X .k’) be a rantlorn vector having a discretc bivariat,? distribution conc w i t m t d on smie poiiit,s (xi,g,?),and Ict, p z J = P { X = ri,Y = y j } , qi
=
P{X
= xi}
> 0 , and rj
=
P(Y = yJ} > 0,
for i > j= 1 , 2 , . . . . Then, for any y.7 ( j = 1,2 , . . .), the conditional distrib,utio,ri of X . g i i w t , I-’ = yj, is defined a.s
(1.69) Siinilarly, for any :c, ( i X = r , , is dcfined a.s
P{Y
=
=
1,2: . . .), the conditional distribution of
? J ( X=xi}
=
P { X = 22,Y = yj} P{X =Xi}
-
pi, -
Y,givcn (1.70)
(12
N<.xt, lrt ( X ,Y)be an absolutely continuous hivariate random vcctor with a joint pdf p ( z .y ) and niargirial pdf’s I ) ~ ( T and ) p y ( ? j ) . In this case, for any vahic. !J at which py(y) is continuous and positive, the cond7tzonal p d f of X . gzoen Y = ~ j is, tlcfinctl as (1.71) Similarly, for any value T a t which px ( 2 ) is continuous and positivc, the ronditioncil p d f of Y,g l u m X = .r, is defined as (1.72)
Remark 1.13 Though the derivation of conditional distrihutioiis froin a. specified t)iv:i,ria,t,edistribiit,ion is rat,lier stra.ightforwa.rc1, t,lie rcversr is not, so, 1iow.t:vc~. Tlie constmiction of a. hivariat,r dist,ributioii with specified conditional distribut,ions requires solutions of fimct,iona.l equat,ioiis; for a. det,a.iled disciissioii, oiie ma,y refer to tlit: book by Arnold, Cast,illo, a.nd Sarabia (1999). In an analogous manner, wc c a r 1 also define condit,ional dist,ribiitioiis for t,hr general case o f ‘n-diiiieiisional randoni vectors (XI . X , > ) wit>li pdf px(.rl. . . . , .c,,). For example, let 11s consider the case wlie lie ?~-tliiiic.nsioiial ( X I ,. . . . X 7 ! )has an a.t)soliit,elycont,iiiiious dist,ril)iition. raadorn vector X : Let, u == ( X I , . . . , X I , , ) and v = (X7,>+1, , X r z )( 7 n < 71,) t K tjllc?random vet.tors corrcsponiliiig: t,o tlir. first, ‘/r/ ~ , i i (t.he l last T I - m conipoiicnts of t,he ra.iitlo~n vect>orX . We ( x i tlrfiiicx the ptlf’s p c ~ ( . x . l . .. . n.,) and pv(x,+l, 1 1 ) ill t,liis ( m e as in Eqs. (1.58) and (1.63). Thcn, the con,ditionnE pdf of the 7andom uector V, ,9i,oen,U = (z1, . . . x,,,),is defirictl a.s ~
~
(1.73)
MOMENT CHARACTERISTICS OF RANDOM VECTORS
1.11
19
Moment Characteristics of Random Vectors
Let, (X,Y ) be a. bivariate discrete randoni vector concentrating on points (xi,y j ) with probabilities p i j = P { X = xi,Y = yj} for i , j = 1 , 2 , . . . . For any measurable function g(x,y), we ca.n find the expected value of Z = g ( X , Y) as (1.74) Similarly, if (X,Y ) has an absolutely continuous bivariate distribution with the density fiinction p ( x ,y) , then we have the expected value of Z = g ( X , Y) a5
1,JI
m o o
EZ
= Ey(X,Y) =
co d
z . Y ) P ( T Y)
Of coiirse, as in the univariate case, we say that E Z
Eqs. (1.74) and (1.75) exist if
dx dY. =
(1.75)
E g ( X , Y ) defined in
respectively. In particular, if g(x,y) = x k y e , we obtain E Z = E g ( X ,Y ) = EX"', which is said to be the product m o m e n t of order ( k , t ) . Similarly, the moment E ( X - EX)'(Y - EY)' is said to be the central product m o m e n t of order ( k , l ) , and the specia.1 ca,se of E ( X E X ) ( Y - E Y ) is called the covariance between X and Y and is denoted by Cov(X,Y). Based on the covariance, we can define another measure of association which is invariant with respect, t o both location and scale of t,he variables X and Y (meaning that it, is not affccted if the means and the varia.nces of the variables are changed). Such a. mea.sure is the correlation coeficient between X and Y and ~
It can easily be shown tjhat 1pl 5 1. If we are dealing with a general n-dimensional random vector X = ( X I ,. . ,X,,), then the following moment characteristics of X will be of interest t o us: the iricari vector m = (ml, . . . , m n ) ,where m,= E X , ( 1 = 1,. . . , n ) , the covariance matrix C = ( ( c ~ ~ , ) ) ~ , = where ~, a,, = a,%= Cov(X,,X,) (for z # 3 ) and o,,= Var(X,), and the correlation matrix p = ( ( p z 3 ) ) F 7 = where Ll pz, = a,,/,/-. Note that the diagonal elements of the correlation matrix are all 1.
Exercise 1.22 Find all distributions of the random variable X for which the correlation coefficient \rho(X, X^2) = -1.

Exercise 1.23 Suppose that the variances of the random variables X and Y are 1 and 4, respectively. Then, find the exact upper and lower bounds for Var(X + Y).
1.12 Conditional Expectations
In Section 1.10 we introduced conditional distributions in the case of discrete as well as absolutely continuous multivariate distributions. Based on those conditional distributions, we describe in this section conditional expectations. For this purpose, let us first consider the case when (X, Y) has a discrete bivariate distribution concentrating on points (x_i, y_j) (for i, j = 1, 2, \ldots), and as before, let p_{ij} = P\{X = x_i, Y = y_j\} and r_j = P\{Y = y_j\} > 0. Suppose also that EX exists. Then, based on the definition of the conditional distribution of X, given Y = y_j, presented in Eq. (1.69), we readily have the conditional mean of X, given Y = y_j, as

E(X \mid Y = y_j) = \sum_i x_i \, P\{X = x_i \mid Y = y_j\} = \sum_i x_i \, \frac{p_{ij}}{r_j}.    (1.76)

More generally, for any measurable function h(\cdot) for which Eh(X) exists, we have the conditional expectation of h(X), given Y = y_j, as

E\{h(X) \mid Y = y_j\} = \sum_i h(x_i) \, P\{X = x_i \mid Y = y_j\} = \sum_i h(x_i) \, \frac{p_{ij}}{r_j}.    (1.77)

Based on (1.76), we can introduce the conditional expectation of X, given Y, denoted by E(X \mid Y), as a new random variable which takes on the value E(X \mid Y = y_j) when Y takes on the value y_j (for j = 1, 2, \ldots). Hence, the conditional expectation of X, given Y, as a random variable takes on values

E(X \mid Y = y_j) = \sum_i x_i \, \frac{p_{ij}}{r_j}

with probabilities r_j (for j = 1, 2, \ldots). Consequently, we readily observe that

E\{E(X \mid Y)\} = \sum_j E(X \mid Y = y_j) \, r_j = \sum_j \sum_i x_i \, p_{ij} = EX.
Similarly, if the conditional expectation E\{h(X) \mid Y\} is regarded as a random variable which takes on the value E\{h(X) \mid Y = y_j\} when Y takes on the value y_j (for j = 1, 2, \ldots) with probabilities r_j, we can show that

E[E\{h(X) \mid Y\}] = E\{h(X)\}.

Next, let us consider the case when the random vector (X, Y) has an absolutely continuous bivariate distribution with pdf p(x, y), and let p_Y(y) be the marginal density function of Y. Then, from Eq. (1.71), we have the conditional mean of X, given Y = y, as

E(X \mid Y = y) = \int_{-\infty}^{\infty} x \, p_{X|Y}(x \mid y) \, dx = \frac{1}{p_Y(y)} \int_{-\infty}^{\infty} x \, p(x, y) \, dx,    (1.78)

provided that EX exists. Similarly, if h(\cdot) is a measurable function for which Eh(X) exists, we have the conditional expectation of h(X), given Y = y, as

E\{h(X) \mid Y = y\} = \int_{-\infty}^{\infty} h(x) \, p_{X|Y}(x \mid y) \, dx.    (1.79)

As in the discrete case, we can regard E(X \mid Y) and E\{h(X) \mid Y\} as random variables which take on the values E(X \mid Y = y) and E\{h(X) \mid Y = y\} when the random variable Y takes on the value y. In this case, too, it can be shown that

E\{E(X \mid Y)\} = EX   and   E[E\{h(X) \mid Y\}] = E\{h(X)\}.

1.13 Regressions
In Eqs. (1.76) and (1.78) we defined the conditional expectation E(X \mid Y = y), provided that EX exists. From this conditional expectation, we may consider the function

a(y) = E(X \mid Y = y),    (1.80)

which is called the regression function of X on Y. Similarly, when EY exists, the function

b(x) = E(Y \mid X = x)    (1.81)

is called the regression function of Y on X. Note that when the random variables X and Y are independent, then

a(y) = E(X \mid Y = y) = EX   and   b(x) = E(Y \mid X = x) = EY

are simply the unconditional means of X and Y, and do not depend on y and x, respectively.
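To make the conditional mean, the double-expectation identity, and the regression function a(y) concrete, here is a minimal numerical sketch (not from the book); the joint probability table used below is an arbitrary illustrative choice.

    import numpy as np

    # Hypothetical joint pmf of (X, Y): rows index x-values, columns index y-values.
    x_vals = np.array([0.0, 1.0, 2.0])
    y_vals = np.array([0.0, 1.0])
    p = np.array([[0.10, 0.20],
                  [0.30, 0.10],
                  [0.20, 0.10]])       # p[i, j] = P{X = x_i, Y = y_j}

    r = p.sum(axis=0)                                     # marginal P{Y = y_j}
    a_y = (x_vals[:, None] * p).sum(axis=0) / r           # a(y_j) = E(X | Y = y_j)

    # Tower property: E{E(X|Y)} should equal EX.
    EX = (x_vals[:, None] * p).sum()
    print(a_y, (a_y * r).sum(), EX)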
1.14 Generating Function of Random Vectors
Let X = (X_1, \ldots, X_n) be a random vector, the elements of which take on values 0, 1, 2, \ldots. In this case, the generating function P(s_1, \ldots, s_n) is defined as

P(s_1, \ldots, s_n) = E \, s_1^{X_1} \cdots s_n^{X_n}.    (1.82)

Although the following properties can be presented for this general case, we present them for notational simplicity only for the bivariate case (n = 2). Let P_{X,Y}(s, t), P_X(s), and P_Y(t) be the generating function of the bivariate random vector (X, Y), the marginal generating function of X, and the marginal generating function of Y, defined by

P_{X,Y}(s, t) = E \, s^X t^Y = \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} P\{X = j, Y = k\} \, s^j t^k,    (1.83)

P_X(s) = E s^X = \sum_{j=0}^{\infty} P\{X = j\} \, s^j,    (1.84)

P_Y(t) = E t^Y = \sum_{k=0}^{\infty} P\{Y = k\} \, t^k,    (1.85)

respectively. Then, the following properties of P_{X,Y}(s, t) can be established easily:

(a) P_{X,Y}(1, 1) = 1;

(b) P_{X,Y}(s, 1) = P_X(s) and P_{X,Y}(1, t) = P_Y(t);

(c) P_{X,Y}(s, s) = P_{X+Y}(s), where P_{X+Y}(s) = E s^{X+Y} is the generating function of the variable X + Y;

(d) P_{X,Y}(s, t) = P_X(s) P_Y(t) if and only if X and Y are independent.

Next, for the random vector X = (X_1, \ldots, X_n), we define the characteristic function f(t_1, \ldots, t_n) as
f(t_1, \ldots, t_n) = E \, e^{i(t_1 X_1 + \cdots + t_n X_n)}.    (1.86)

Similarly, in the case when the random vector X = (X_1, \ldots, X_n) has an absolutely continuous distribution with density function p(x_1, \ldots, x_n), its characteristic function f(t_1, \ldots, t_n) is defined as

f(t_1, \ldots, t_n) = E \, e^{i(t_1 X_1 + \cdots + t_n X_n)} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{i(t_1 x_1 + \cdots + t_n x_n)} \, p(x_1, \ldots, x_n) \, dx_1 \cdots dx_n.    (1.87)

Once again, although the following properties can be presented for this general n-dimensional case, we present them for notational simplicity only for the bivariate case (n = 2). Let f_{X,Y}(s, t), f_X(s), and f_Y(t) be the characteristic function of the bivariate random vector (X, Y), the marginal characteristic function of X, and the marginal characteristic function of Y, defined by

f_{X,Y}(s, t) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{i(sx + ty)} \, p_{X,Y}(x, y) \, dx \, dy,    (1.88)

f_X(s) = \int_{-\infty}^{\infty} e^{isx} \, p_X(x) \, dx,    (1.89)

f_Y(t) = \int_{-\infty}^{\infty} e^{ity} \, p_Y(y) \, dy,    (1.90)

respectively. Then, the following properties of f_{X,Y}(s, t) can be established easily:

(a) f_{X,Y}(0, 0) = 1;

(b) f_{X,Y}(s, 0) = f_X(s) and f_{X,Y}(0, t) = f_Y(t);

(c) f_{X,Y}(s, s) = f_{X+Y}(s), where f_{X+Y}(s) = E e^{is(X+Y)} is the characteristic function of the variable X + Y;

(d) f_{X,Y}(s, t) = f_X(s) f_Y(t) if and only if X and Y are independent.
Exercise 1.24 Let P(s, t) be the generating function of the random vector (X, Y). Then, find the generating function Q(s, t, u) of the random vector (2X + 1, X + Y, 3X + 2Y).
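The bivariate generating-function properties above are easy to sanity-check numerically. Below is a small sketch (not from the book) using two independent nonnegative integer-valued variables; the probability tables are arbitrary illustrative choices.

    import numpy as np

    # Hypothetical joint pmf of independent X on {0,1} and Y on {0,1,2}.
    px = np.array([0.4, 0.6])              # P{X = j}
    py = np.array([0.2, 0.5, 0.3])         # P{Y = k}
    pxy = np.outer(px, py)                 # independence: P{X=j, Y=k} = P{X=j} P{Y=k}

    def P_XY(s, t):
        j = np.arange(len(px))[:, None]
        k = np.arange(len(py))[None, :]
        return (pxy * s**j * t**k).sum()

    s, t = 0.7, 1.3
    print(P_XY(1, 1))                                     # property (a): equals 1
    print(P_XY(s, 1), (px * s**np.arange(2)).sum())       # property (b): marginal of X
    print(P_XY(s, s))                                     # property (c): generating function of X + Y at s
    print(P_XY(s, t), P_XY(s, 1) * P_XY(1, t))            # property (d): factorizes under independence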
1.15 Transformations of Variables
Let the random vector X = (X_1, \ldots, X_n) have an absolutely continuous distribution with pdf p_X(x_1, \ldots, x_n), and let Y = (Y_1, \ldots, Y_n) be obtained from X by a one-to-one transformation with inverse x_i = x_i(y_1, \ldots, y_n), i = 1, \ldots, n. The Jacobian of the inverse transformation is the determinant

J = \det \begin{pmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\
\dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_2}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n}
\end{pmatrix},

and the pdf of Y is then given by

p_Y(y_1, \ldots, y_n) = p_X(x_1(y_1, \ldots, y_n), \ldots, x_n(y_1, \ldots, y_n)) \, |J|,    (1.92)
where |J| is the absolute value of the Jacobian of the transformation. Once again, the marginal pdf of any subset of the new variables may be obtained from (1.92) by integrating out the other variables. Note that, if the transformation is not one-to-one, but B is the union of a finite number of mutually disjoint spaces, say B_1, \ldots, B_\ell, then we can construct \ell sets of one-to-one transformations (one for each B_i) and their respective Jacobians, and then finally express the density function of the vector Y = (Y_1, \ldots, Y_n) as the sum of \ell terms of the form (1.92) corresponding to B_1, \ldots, B_\ell.
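As a quick illustration of the change-of-variables formula, here is a small Monte Carlo sketch (not from the book): for independent standard normals (X_1, X_2) and the linear map (Y_1, Y_2) = (X_1 + X_2, X_1 - X_2), the density predicted by (1.92) with |J| = 1/2 is compared with an empirical estimate; the sample size and evaluation point are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
    y1, y2 = x1 + x2, x1 - x2            # one-to-one linear transformation

    # Inverse map: x1 = (y1 + y2)/2, x2 = (y1 - y2)/2, so |J| = 1/2.
    def p_Y(u, v):
        a, b = (u + v) / 2.0, (u - v) / 2.0
        p_X = np.exp(-(a**2 + b**2) / 2.0) / (2.0 * np.pi)
        return p_X * 0.5                  # multiply by |J|

    # Compare the predicted density with an empirical estimate on a small cell.
    cell = (np.abs(y1 - 1.0) < 0.05) & (np.abs(y2 + 0.5) < 0.05)
    print(cell.mean() / (0.1 * 0.1), p_Y(1.0, -0.5))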
Part I
DISCRETE DISTRIBUTIONS
CHAPTER 2
DISCRETE UNIFORM DISTRIBUTION

2.1 Introduction

The general discrete uniform distribution takes on k distinct values x_1, x_2, \ldots, x_k with equal probabilities 1/k, where k is a positive integer. We restrict our attention here to lattice distributions. In this case, x_j = a + jh, j = 0, 1, \ldots, k - 1, where a is any real value and h > 0 is the step of the distribution. Sometimes, such a distribution is called a discrete rectangular distribution. The linear transformations of random variables enable us to consider, without loss of generality, just the standard discrete uniform distribution taking on values 0, 1, \ldots, k - 1, which corresponds to a = 0 and h = 1. Note that the case when a = 0 and h = 1/k is also important, but it can be obtained from the standard discrete uniform distribution by means of a simple scale change.
2.2 Notations

We will use the notation

X \sim DU(k, a, h)   if   P\{X = a + jh\} = \frac{1}{k}   for j = 0, 1, \ldots, k - 1,

and

X \sim DU(k)

for the corresponding standard discrete uniform distribution; i.e., DU(k) is simply DU(k, 0, 1).
Remark 2.1 Note that if

Y \sim DU(k, a, h)   and   X \sim DU(k),

then

X \stackrel{d}{=} \frac{Y - a}{h}   and   Y \stackrel{d}{=} a + hX,

where \stackrel{d}{=} denotes "having the same distribution" (see Definition 1.5). More generally, if Y_1 \sim DU(k, a_1, h_1) and Y_2 \sim DU(k, a_2, h_2), then

Y_1 \stackrel{d}{=} c Y_2 + d,

where c = h_1/h_2 and d = a_1 - a_2 h_1/h_2. This means that the random variables Y_1 and Y_2 belong to the same type of distribution, depending only on the shape parameter k, and do not depend on the location (a_1 and a_2) and scale (h_1 and h_2) parameters.

Discrete uniform distributions play a naturally important role in many classical problems of probability theory that deal with a random choice with equal probabilities from a finite set of k items. For example, a lottery machine contains k balls, numbered 1, 2, \ldots, k. On selecting one of these balls, we get a random number Y which has the DU(k, 1, 1) distribution. This principle, in fact, allows us to generate tables of random numbers used in different statistical simulations, by taking k sufficiently large (say, k = 10^6 or 2^{32}). For the rest of this chapter, we deal only with the standard discrete uniform DU(k) distribution.
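A short simulation sketch (not from the book) illustrating the location-scale relation above; the values of k, a, and h are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(1)
    k, a, h = 6, 2.0, 0.5
    x = rng.integers(0, k, size=100_000)    # X ~ DU(k), values 0, 1, ..., k-1
    y = a + h * x                           # Y ~ DU(k, a, h)

    # Compare simulated mean/variance of Y with a + h*(k-1)/2 and h^2*(k^2-1)/12.
    print(y.mean(), a + h * (k - 1) / 2)
    print(y.var(), h**2 * (k**2 - 1) / 12)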
2.3 Moments
We will now determine the moments of X \sim DU(k), all of which exist since X has a finite support.

Moments about zero:

\alpha_1 = EX = \frac{1}{k} \sum_{r=0}^{k-1} r = \frac{k-1}{2},    (2.1)

\alpha_2 = EX^2 = \frac{1}{k} \sum_{r=0}^{k-1} r^2 = \frac{(k-1)(2k-1)}{6},    (2.2)

\alpha_3 = EX^3 = \frac{1}{k} \sum_{r=0}^{k-1} r^3 = \frac{k(k-1)^2}{4},    (2.3)

\alpha_4 = EX^4 = \frac{1}{k} \sum_{r=0}^{k-1} r^4 = \frac{(k-1)(2k-1)(3k^2 - 3k - 1)}{30}.    (2.4)
To obtain the expressions in (2.1)-(2.4), we have used the following well-known identities for sums:

\sum_{r=0}^{k-1} r = \frac{(k-1)k}{2},   \sum_{r=0}^{k-1} r^2 = \frac{(k-1)k(2k-1)}{6},

\sum_{r=0}^{k-1} r^3 = \frac{(k-1)^2 k^2}{4},   and   \sum_{r=0}^{k-1} r^4 = \frac{(k-1)k(2k-1)(3k^2 - 3k - 1)}{30};

see, for example, Gradshteyn and Ryzhik (1994, p. 2). Note that

\alpha_n \sim \frac{k^n}{n+1}   as   k \to \infty.    (2.5)
Central moments: The variance, or the second central moment, is obtained from (2.1) and (2.2) as

\beta_2 = Var\,X = \alpha_2 - \alpha_1^2 = \frac{k^2 - 1}{12}.    (2.6)

The third central moment is obtained from (2.1)-(2.3) as

\beta_3 = E(X - \alpha_1)^3 = \alpha_3 - 3\alpha_2\alpha_1 + 2\alpha_1^3 = 0.    (2.7)

At first, (2.7) may seem surprising; but once we realize that X is symmetric about its mean value \alpha_1 = (k-1)/2, (2.7) makes perfect sense. In fact, it is easy to see that (k - 1 - X) and X take on the same values 0, 1, \ldots, k - 1 with equal probabilities 1/k. Therefore, we have

X - \alpha_1 \stackrel{d}{=} \alpha_1 - X,

and consequently,

E(X - \alpha_1)^{2r+1} = E(\alpha_1 - X)^{2r+1} = -E(X - \alpha_1)^{2r+1},

which simply implies that

\beta_{2r+1} = 0,   r = 1, 2, \ldots.
Factorial moments of positive order:

\mu_r = E\,X(X-1)\cdots(X-r+1).    (2.8)
It is easily seen that

\mu_r = 0   for   r \ge k,

and

\mu_r = \frac{1}{k} \sum_{m=r}^{k-1} \frac{m!}{(m-r)!} = \frac{(k-1)(k-2)\cdots(k-r)}{r+1}   for   r = 1, 2, \ldots, k - 1.

In deriving the last expression, we have used the well-known combinatorial identity

\sum_{m=r}^{k-1} \binom{m}{r} = \binom{k}{r+1}.

In particular, we have

\mu_1 = \alpha_1 = \frac{k-1}{2},    (2.9)

\mu_2 = \alpha_2 - \alpha_1 = \frac{(k-1)(k-2)}{3},    (2.10)

\mu_3 = \alpha_3 - 3\alpha_2 + 2\alpha_1 = \frac{(k-1)(k-2)(k-3)}{4},    (2.11)

and

\mu_{k-2} = (k-2)!,    (2.12)

\mu_{k-1} = \frac{(k-1)!}{k}.    (2.13)
Factorial moments of negative order:

\mu_{-r} = E\left\{ \frac{1}{(X+1)(X+2)\cdots(X+r)} \right\}.

In particular,

\mu_{-1} = \frac{1}{k} \sum_{m=0}^{k-1} \frac{1}{m+1} = \frac{1}{k}\left(1 + \frac{1}{2} + \cdots + \frac{1}{k}\right)    (2.14)

and

\mu_{-2} = \frac{1}{k} \sum_{m=0}^{k-1} \frac{1}{(m+1)(m+2)} = \frac{1}{k+1}.    (2.15)

Similarly,

\mu_{-3} = \frac{1}{k} \sum_{m=0}^{k-1} \frac{1}{(m+1)(m+2)(m+3)}    (2.16)

= \frac{1}{2k} \sum_{m=0}^{k-1} \left[ \frac{1}{(m+1)(m+2)} - \frac{1}{(m+2)(m+3)} \right] = \frac{k+3}{4(k+1)(k+2)}.    (2.17)
2.4 Generating Function and Characteristic Function
The generating function of the DU(k) distribution exists for any s and is given by

P_X(s) = E s^X = \frac{1}{k} \sum_{r=0}^{k-1} s^r.

For s \ne 1, it can be rewritten in the form

P_X(s) = \frac{1 - s^k}{k(1 - s)}.    (2.18)

For any k = 1, 2, \ldots, P_X(s) is a polynomial of (k-1)th degree, and it is not difficult to see that its roots coincide with

s_j = \exp(2\pi i j / k),   j = 1, 2, \ldots, k - 1, for k > 1.

This readily gives us the following form of the generating function:

P_X(s) = \frac{1}{k} \prod_{j=1}^{k-1} (s - s_j).    (2.19)

Another form of the generating function exploits the hypergeometric function, defined by

{}_2F_1[a, b; c; x] = 1 + \frac{ab}{c}\,\frac{x}{1!} + \frac{a(a+1)b(b+1)}{c(c+1)}\,\frac{x^2}{2!} + \cdots.    (2.20)

The generating function, in terms of the hypergeometric function, is given by

P_X(s) = {}_2F_1[-k+1, 1; 2; 1 - s].    (2.21)
Since the characteristic function and the generating function for nonnegative integer-valued random variables satisfy the relation

f_X(t) = E\exp(itX) = P_X(e^{it}),

if we change s to e^{it} in Eqs. (2.18), (2.19), and (2.21), we obtain the corresponding expressions for the characteristic function f_X(t). For example, from (2.18) we get

f_X(t) = \frac{1 - e^{ikt}}{k(1 - e^{it})}.    (2.22)
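A brief numerical sketch (not from the book; k and the evaluation points are arbitrary) checking that the closed form (2.18) matches the defining sum, and evaluating the characteristic function via f_X(t) = P_X(e^{it}).

    import numpy as np

    # Closed form (1 - s^k) / (k (1 - s)) versus the defining sum (1/k) * sum_r s^r.
    k, s = 7, 0.6
    sum_form = np.mean(s ** np.arange(k))
    closed_form = (1 - s**k) / (k * (1 - s))
    print(sum_form, closed_form)

    # Characteristic function at a few points.
    t = np.array([0.3, 1.1])
    print((1 - np.exp(1j * k * t)) / (k * (1 - np.exp(1j * t))))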
2.5 Convolutions

Let us take two independent random variables, both having discrete uniform distributions, say, X \sim DU(k) and Y \sim DU(r), with k \ge r (without loss of any generality). Then, what can we say about the distribution of the sum Z = X + Y? The distribution of Z is called the convolution (or composition) of the two initial distributions.

Exercise 2.1 It is clear that 0 \le Z \le k + r - 2. Consider the three different situations, and prove that P\{Z \le m\} is given by

(a) \frac{(m+1)(m+2)}{2kr}   if   0 \le m \le r - 1,

(b) \frac{2m - r + 3}{2k}   if   r - 1 \le m \le k - 1,

(c) 1 - \frac{(r + k - 2 - m)(r + k - 1 - m)}{2kr}   if   k \le m \le k + r - 2.    (2.23)

From (2.23), we readily obtain

P\{Z = m\} = \begin{cases} \dfrac{m+1}{kr}, & 0 \le m \le r - 1, \\ \dfrac{1}{k}, & r \le m \le k - 1, \\ \dfrac{k + r - 1 - m}{kr}, & k \le m \le k + r - 2, \\ 0, & \text{otherwise}. \end{cases}    (2.24)

One can see now that r = 1 is the only case when the convolution of two discrete uniform DU(k) and DU(r) distributions leads to the same set of distributions. Note that in this situation P\{Y = 0\} = 1, which means that Y has a degenerate distribution. Nevertheless, it turns out that the convolution of more general nondegenerate discrete uniform distributions may belong to the same set of distributions.
Exercise 2.2 Suppose that Z \sim DU(r, 0, s) and Y \sim DU(s), where r = 2, 3, \ldots, s = 2, 3, \ldots, and that Y and Z are independent random variables. Show then that U = Y + Z \sim DU(rs).

Remark 2.2 It is easy to see that Z \stackrel{d}{=} sX, where X \sim DU(r). Hence, we get another equivalent form of the statement given in Exercise 2.2 as follows. If X \sim DU(r) and Y \sim DU(s), then the sum sX + Y has the discrete uniform DU(rs) distribution. Moreover, due to the symmetry argument, we can immediately obtain that the sum X + rY also has the same DU(rs) distribution.
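An empirical sketch (not from the book; r, s, and the sample size are arbitrary choices) checking the statement of Exercise 2.2 in the form of Remark 2.2: sX + Y with independent X ~ DU(r) and Y ~ DU(s) takes each value 0, ..., rs - 1 with frequency close to 1/(rs).

    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(2)
    r, s, n = 3, 4, 600_000
    u = s * rng.integers(0, r, n) + rng.integers(0, s, n)
    freq = Counter(u.tolist())
    print(sorted(freq))                          # values 0, 1, ..., r*s - 1
    print(max(freq.values()) / n, 1 / (r * s))   # each frequency close to 1/(rs)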
2.6 Decompositions
Decomposition is an operation which is inverse to convolution. We want to know if a certain random variable can be represented as a sum of at least two independent random variables (see Section 1.7). Of course, any random variable X can be rewritten as a trivial sum of two terms a + (X - a), the first of which is a degenerate random variable, but we will solve a more interesting problem: Is it possible, for a certain random variable X, to find a pair of nondegenerate independent random variables Y and Z such that

X \stackrel{d}{=} Y + Z?

Consider X \sim DU(k), where k is a composite number. Let k = rs, where r \ge 2 and s \ge 2 are integers. It follows from the statement of Exercise 2.2 that X is decomposable as a sum of two random variables, both having discrete uniform distributions. Moreover, we note from Remark 2.2 that we have at least two different options for decomposition of DU(rs) if r \ne s.

Let k be a prime integer now. The simplest case is k = 2, when X takes on two values. Of course, in this situation X is indecomposable, because any nondegenerate random variable takes on at least two values and hence it is easy to see that any sum Y + Z of independent nondegenerate random variables has at least three values. Now one may suppose that the DU(3) distribution is decomposable. In fact, there are a lot of random variables, taking on values 0, 1, and 2 with probabilities p_0, p_1, and p_2, that can be presented as a sum Y + Z, where both Y and Z take on values 0 and 1, possibly with different probabilities. However, it turns out that one cannot decompose a random variable taking three values if the corresponding probabilities are equal (p_0 = p_1 = p_2 = 1/3).
Exercise 2.3 Prove that D U ( 3 ) distribution is indecomposable.
In the general case when k is any prime integer, by considering the corresponding generating function

P_X(s) = \frac{1}{k}\,(1 + s + \cdots + s^{k-1}),

we see that the problem of decomposition in this case is equivalent to the following problem: Is it possible to present P_X(s) as a product of two polynomials with positive coefficients if k is a prime? The negative answer was given by Krasner and Ranulac (1937), and independently by Raikov (1937). Summarizing all these, we have the following result.

Theorem 2.1 The discrete uniform DU(k) distribution (for k > 1) is indecomposable iff k is a prime integer.

It is also evident that X \sim DU(k) is not infinitely divisible when k is a prime number. Moreover, it is known that any distribution with a finite support cannot be infinitely divisible (see Remark 1.6). This means that any DU(k) distribution is not infinitely divisible.
2.7 Entropy

From the definition of the entropy H(X) in (1.38), it is clear that the entropy of any DU(k, a, h) distribution depends only on k. If X \sim DU(k), then

H(X) = \log k.    (2.25)

It is of interest to mention here that among all the random variables taking on at most k values, any random variable taking on distinct values x_1, x_2, \ldots, x_k with probabilities p_j = 1/k (j = 1, 2, \ldots, k) has \log k as the maximum possible value for its entropy.
2.8 Relationships with Other Distributions

The discrete uniform distribution forms the basis for the derivation of many distributions. Here, we present some key connections of the discrete uniform distribution to some other distributions:

(a) We have already mentioned that the DU(1) distribution takes on the value 0 with probability 1 and, in fact, coincides with the degenerate distribution, which is discussed in Chapter 3.
(b) One more special case of discrete uniform distributions is the DU(2) distribution. If X \sim DU(2), then it takes on values 0 and 1 with equal probability (of 1/2). This belongs to the Bernoulli type of distribution discussed in Chapter 4.

(c) Let us consider a sequence of random variables X_n \sim DU(n), n = 1, 2, \ldots. Let Y_n = X_n/n. One can see that for any n = 1, 2, \ldots,

Y_n \sim DU\left(n, 0, \frac{1}{n}\right),

and it takes on values 0, 1/n, 2/n, \ldots, (n-1)/n with equal probabilities 1/n. Let us try to find the limiting distribution for the sequence Y_1, Y_2, \ldots. The simplest way is to consider the characteristic function. The characteristic function of Y_n is given by

g_{Y_n}(t) = E\exp(itY_n) = E\exp\{i(t/n)X_n\} = f_n(t/n),

where f_n(t) is the characteristic function of the DU(n) distribution. Using (2.22), we readily find that

g_{Y_n}(t) = \frac{1 - e^{it}}{n(1 - e^{it/n})}.    (2.26)

It is not difficult to see now that for any fixed t,

g_{Y_n}(t) \to g(t)   as   n \to \infty,

where

g(t) = \frac{e^{it} - 1}{it}.    (2.27)

The RHS of (2.27) shows that g(t) is the characteristic function of a continuous distribution with probability density function p(x), which is equal to 1 if 0 < x < 1 and equals 0 otherwise. The distribution with the given pdf is said to be uniform U(0, 1), which is discussed in detail in Chapter 11. From the convergence of the corresponding characteristic functions, we can immediately conclude that the uniform U(0, 1) distribution is the limit for the sequence of random variables Y_n = X_n/n, where X_n \sim DU(n). Thus, we have constructed an important "bridge" between continuous uniform and discrete uniform distributions.
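A short numerical sketch (not from the book; n and t are arbitrary choices) checking that the characteristic function of Y_n = X_n/n approaches the U(0, 1) characteristic function (e^{it} - 1)/(it).

    import numpy as np

    n, t = 200, 2.5
    g_Yn = np.mean(np.exp(1j * t * np.arange(n) / n))
    g_limit = (np.exp(1j * t) - 1) / (1j * t)
    print(g_Yn, g_limit)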
CHAPTER 3
DEGENERATE DISTRIBUTION

3.1 Introduction

Consider the DU(k) distributed random variable X when k = 1. One can see that X takes on only one value x_0 = 0 with probability 1. It gives an example of the degenerate distribution. In the general case, the degenerate random variable takes on only one value, say c, with probability 1. In the sequel, X \sim D(c) denotes a random variable having a degenerate distribution concentrated at the only point c, -\infty < c < \infty.

Degenerate distributions assume a special place in distribution theory. They can be included as a special case of many families of probability distributions, such as normal, geometric, Poisson, and binomial. For any sequence of random variables X_1, X_2, \ldots and any arbitrary c, we can always choose sequences of normalizing constants \alpha_n and \beta_n such that the limiting distribution of the random variables \alpha_n X_n + \beta_n
will become the degenerate D ( c ) distribution. A very important role in queueing theory is played by degenerate distributions. Kendall (1953), in his classification of queueing systems, has even reserved a special letter to denote systems with constant interarrival tirnes or constant service times of customers. For example, M / D / 3 means that a queueing system has three servers, all interarrival times are exponentially distributed, and the service of each custonier requires a fixed nonrandom time. It should be mentioned that practically only degenerate ( D ) ,exponential ( M ) , and Erlang ( E ) have their own letters in Kendall’s classification of queueing systems.
3.2 Moments

Degenerate distributions have all their moments finite. Let X \sim D(c) in the following discussion.
Moments about zero:

\alpha_n = EX^n = c^n,   n = 1, 2, \ldots.    (3.1)

In particular,

\alpha_1 = EX = c    (3.2)

and

\alpha_2 = EX^2 = c^2.    (3.3)

The variance is

\beta_2 = Var\,X = \alpha_2 - \alpha_1^2 = 0.    (3.4)

Note that (3.4) characterizes degenerate distributions, meaning that they are the only distributions having zero variance.

Characteristic function:

f_X(t) = E e^{itX} = e^{itc},    (3.5)

and, in particular, f_X(t) = 1 if c = 0.
3.3 Independence
It turns out, that any random variable X, having degenerate D ( c ) dist,ribution, is independent of any arbitrarily choscn random variable Y . For observing this, we must, chcck that for any 2 and y,
P{X
I z,
Y 5 y}
= P{X
Equality ( 3 . 6 ) is evidently true if x
P { X 5 z, Y 5 y} If z
5 z } P { Y I Y}.
< c, in which
=0
(3.6)
ca.se
and P{X 5
.E} =
0.
2 c, then P{X 5 x }
=1
and P{X 5 2 ,
Y L v } = P{Y 5 11).
and wc scc that ( 3 . 6 ) is once again truc.
Exercise 3.1 Let Y = X. Show that if X and Y are iridcpendrrit, thcn X has a degrricratc distribution.
DECOMPOSITION
3.4
41
Convolution
It is clear that the convolution o f two degenerate distributions is degenerate; that is, if X D ( q ) and Y D(c2), then X Y D(cl ca). Note also that if X D ( c ) and Y is an arbitrary random variable, then X Y belongs t o the same type of distribution as Y . N
+
N
+
N
N
+
Exercise 3.2 Let X I and X2 be independent random variables having a common distribution. Prove t1ia.t the equality
XI implies that X1
N
+ x2 = XI d
(3.7)
D ( 0).
Remark 3.1 WP see that (3.7) characterizes the degenerate distribution concentrated at zero. If wc take X I + c iristead of X1 on the RHS of ( 3 . 7 ) ,we get a characterization of D ( c )distribution. Moreover, if X I ,X2,. . . are iridepeiident arid identically distributed random variables, then the equality x1+. ' .
+ Xe = XI + . . . + XI, + c, d
gives a. characterizat,ion of degenerate D
3.5
15 k
< e,
(
k ) distribution.
Decomposition
There is 110 doubt that any degenerate random variable can be presented only as a sum of degenerate random variables, but even this evident statement needs to be proved.
Exercise 3.3 Let X and Y be independent random varia.bles, and let X $- Y have a degenerate distribution. Show then tha.t bot,h X and Y a.re degenera.te.
It is interesting t o observe that even such a simple distribution a.s a degenerate distribution possesses its own special properties and also assumes an important role among a very large family of distributions.
CHAPTER 4
BERNOULLI DISTRIBUTION

4.1 Introduction

The next simplest case after the degenerate random variable is one that takes on two values, say x_1 < x_2, with nonzero probabilities p_1 and p_2, respectively. The discrete uniform DU(2) distribution is exactly double-valued. Let us recall that if X \sim DU(2), then
P{X
= O} = P { X =
1 1) = -. 2
(4.1)
It is easy to give a.n example of such a random variable. Let X be the number of heads tha.t have appeared after a single toss of a. bahnced coin. Of course, X can be 0 or 1 and it satisfies (4.1). Similarly, unbalanced coins result in distributions with
P{X
= 1) = 1 - P { X = 0)
=p,
(4.2)
where 0 < p < 1. Indeed, a false coin with two tails or heads can serve as a model with p = 0 or p = 1 in (4.2), but in these situations X has degenera.te D ( 0 ) or D(1) distributions. The distribution defined in (4.2) is known a.s the Bernoulli distribution.
4.2
Notations
If X is defined by (4.2), we denot,e it by
x
N
Be(p).
Any random variable Y taking on two values zo
P{Y = 2 1 )
< z1 with probabilities
= 1- P { Y = 5 0 } = p
can clearly be represented as
Y
= (51
-
z0)X
43
+
50,
(4.3)
44
BERNOULLI DISTRIBUTION
-
B e ( p ) . This means that random variables defined by (4.2) a.nd where X (4.3) have the same type of distribution. Hence, X can be called a random variuble of the Bernoulli B e ( p ) type. Moreover, any random variable ta.king on two values with positive probabilities belongs to one of the Bernoulli types. In wha.t follows, we will deal with distributions satisfying (4.2).
4.3 Moments Let X B e ( p ) , 0 < p < 1. Since X has a finite support,, we can guarantee the cxistrricc o f all its moments. N
M o m m t s about zero:
Exercise 4.1 Lvt X have a noridcgcncrate distribution and
E X 2 = EX" = E X 4 .
< p < 1, such
Show then that therc exists a p , 0
x
N
that
Be(p).
Variance: ~2 = Var
It is c1ea.r t1ia.t 0 < [32 5
x = (12 04 -
=p(1-
(4.5)
p).
$, and i j 2 attains its maximum when p
=
Gentrul moments:
From (4.4), we readily find t,ha.t /jTl= p ( 1
--
+( - ~ ) ~ p ~ - l }
p ) { (1-
for n
2 2.
The expression of the va.ria.ncc in (4.5) follows from (4.6) if we set
(4.6) 7t =
2.
M e ( ~ s ~ r eof. 9 skewness n n d kurtosis: Frorri (4.6), we find Pearson's coefficient of skewness as =
h j p
-
1 - 2p (p(1 p ) } ' P -
This exprcwion for rea.dily reveals that the distribution is negatively skewed when p > and is positively skewed when p < $. Also, y1 = 0 only w1it:ri p = (in t,liis case, the distribution is also symmetric).
CONVOLUTIONS
45
Sirnilarly, we find Pearson's coefficient of kurtosis as
i]
This expression for 7 2 [due to the fact rioted earlier that /& = p(1 - p ) 5 readily implies that 7 2 2 1 and that 7 2 = 1 only when p = Thus, in the case of B e ( ; ) distribution, we do see that, 7 2 = 7: 1, which rnearis that the inequality presented in Section 1.4 cannot, be improved in general.
4.
+
Entropy: It is easy t o see that (4.7)
H ( X ) =-plogp-(1-p)log(l-p). Indeed, the maximal value of H ( X ) is attained when p equa.ls 1.
=
i, in which case it
Characteristic function: For X Be(p),0 < p < 1, the characteristic function is of the form N
f x ( t ) = (1- p ) +peit.
(4.8)
As a special case, we can also find the characteristic function of a random variable Y , taking on values -1 and 1 with equal probabilities, since Y can be expressed as Y = - 1, where X
-
2x
B e ( ; ) . It is easy to see that f y ( t ) = cost.
4.4 Convolutions Let X I , X 2 , . . . be independent and identically distributed B e ( p ) randorn variables, and let Yn=X1+...+X, . Many methods are available to prove that (4.9)
We will u5e here the characteristic function for this purpose. Lrt gT1,f l . . . . , f n bc the characteristic functions of Y,, XI,. . . , X,, resprctively. From (4.8),we have f k ( t )= ( ~ - p + p e " ) , 1 c = 1 , 2 , . . . , n.
BER.NOULL1 DISTRIBUTION
46 Then, we get
(4.10)
One can see that the sun1 on the RHS of (4.10) coincides with the characteristic function of a discrete random variable taking on values ni ( 7 7 ~ = 0 , 1 , . . . , n ) with probabilities
Thus, the probability distribution of Y, is given by (4.9). This distribution, called the binomial distribution, is discussed in deta.il in Chapter 5. Fiirtlier, from Chapter 3, we already know tha.t a,ny X Br(p) is indecomposahle.
-
4.5
Maximal Values
We have seen above that sums of Bernoulli raridorn variables do not have the same tvpe of distribution as their suminands, but Bernoulli distributions are stable with respect to another operation.
Exercise 4.2 Lct,
be indepeiiclmt random variablcs, and lct
M,z = niax(X1,. . . ,X,,}.
Quite oftrn, Bernoiilli raridorn variables appear as randoin iridicat ors of different events.
RELATIONSHIP WITH OTHER DISTRIBUTIONS
47
Example 4.1 Let us consider a random sequence a_1, a_2, \ldots, a_n of length n, which consists of zeros and ones. We suppose that a_1, a_2, \ldots, a_n are independent random variables taking on values 0 and 1 with probabilities r and 1 - r, respectively. We say that a peak is present at point k (k = 2, 3, \ldots, n - 1) if a_{k-1} = a_{k+1} = 0 and a_k = 1. Let N_n be the total number of peaks present in the sequence a_1, a_2, \ldots, a_n. What is the expected value of N_n?

To find EN_n, let us introduce the events

A_k = \{a_{k-1} = 0, a_k = 1, a_{k+1} = 0\}

and random indicators

X_k = 1\{A_k\},   k = 2, 3, \ldots, n - 1.

Note that X_k = 1 if A_k happens, and X_k = 0 otherwise. In this case,

P\{X_k = 1\} = 1 - P\{X_k = 0\} = P\{A_k\} = P\{a_{k-1} = 0, a_k = 1, a_{k+1} = 0\} = (1 - r)r^2,

and X_k \sim Be(p), k = 2, 3, \ldots, n - 1, where p = (1 - r)r^2. Now, it is easy to see that

EX_k = (1 - r)r^2,   k = 2, 3, \ldots, n - 1,

and

EN_n = E(X_2 + \cdots + X_{n-1}) = (n - 2)(1 - r)r^2.
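A small simulation sketch (not from the book; n, r, and the number of trials are arbitrary illustrative choices) checking the expected number of peaks derived above.

    import numpy as np

    rng = np.random.default_rng(3)
    n, r, trials = 50, 0.6, 20_000                   # P{a_k = 0} = r, P{a_k = 1} = 1 - r
    a = (rng.random((trials, n)) >= r).astype(int)   # entries equal 1 with probability 1 - r
    peaks = (a[:, :-2] == 0) & (a[:, 1:-1] == 1) & (a[:, 2:] == 0)
    print(peaks.sum(axis=1).mean(), (n - 2) * (1 - r) * r**2)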
In a.ddit,ion t o the classical Bernoulli distributed random variables, there is one more cla.ss of Bcrrioulli distributions which is often encountered. These iiivolve random va.riab1t.s Y l ,Yz,. . . , which ta.ke on values *l. Based on these random va.riables, the slim Sk = Yl . . . Y, (71. = 1 , 2 , . . .) foriiis different discrete raiitloiri walks on tlie integer-valued httice and result in some interesting prolmbility problrtnis.
+
4.6
+
Relationships with Other Distributions
(a) We have sliowii that convolutions of Bernoulli distributions give rise to biriornial distribution, which is discussed in detail in Chapter 5.
(b) Let X1,X2.. . . be independent B P ( ~ 0) ,< p < 1, raridoin variables. Introduce a n ~ w random variable N a s
N
= min{.j : X,+, = O};
that is, N is simply tlic. iiuinber of 1’s in the s(qiieiicc~X1, X 2 , . . . that precede tlic’ first zmo.
BERNOULLI DISTRIBUTION
48
It, is easy to see tha.t N can take on values 0, 1, 2, . . ., and its probability distribution is
P{N
= 7L)
= = =
x,,
P{X1 = 1, x,= 1,.. . , = 1, Xn+l = 0) P { X 1 = 1}P{X2= 1). . . P{X,,, = l)P{X,,+1 (l-p)p", n=0,1,....
= 0)
This dist,sibiition, called a, geometric distribution, is discussed in drta.il in Chapter 6.
CHAPTER 5
BINOMIAL DISTRIBUTION 5.1
Introduction
As shown in Chapter 4, convolutions of Bernoulli distributions give rise t o a new distribution that takes on a fixed set of integer values 0,1, . . . , n with probabilities p , = P { X = m} =
(3
m
p m ( l -p)"-'",
= 0, I , . . . , n ,
(5.1)
where 0 5 p 5 1 and n = 1 , 2 , . . .. The parameter p in (5.1) can be equal t o 0 or 1, but in these situa.tions X has degenera.te distributions, D ( 0 ) arid D ( n ) , respectively, which a.re naturally special ca,ses of these distributions. The probability distribution in (5.1) is called a binomial distribution.
5.2
Notations
Let X have the distribution in (5.1). Then, we say that X is a binomially distributed random variable, and denote it by
X
-
B(72,p).
We know that linear transforinations
Y=u+hX,
-co
h>0,
have the sa.iiie type of distributions and, hence, we need t o study only the st,andard distribution of the given type. If X sa.tisfies (5.1) and Y = a, h X , then Y takes on values a , a h, . . . , a nh and
+
P{Y
y
u + mh,} =
J:(
+
p"'(1
-
p),-"',
m
= 0 , 1 , . . . ;n.
+
(5.2)
We say that Y twlongs t o the binomial type of distribution, and denote it by
y
B ( 7 b , P ,a,h ) ,
a a.nd h > 0 being location a.nd scale parameters, respectively. 49
50
BINOMIAL DISTRIBUTION
5.3
Useful Representation
As wc know froin Chapter 4, binomial random varia,bles c m i be expresscd as slims of independent Bernoulli random variables. Let 21,Zz. . . . be independent, Bernoulli B e ( @ )randorri variables a i d X B ( n ,p ) . Then the followiiig equalit,y holds for ally n = 1,2 , . . . : N
x= d
21
+z,+...+z,.
(5.3)
Due t o (5.3), we (mi ea.sily obtain generating function aiid charact,eristic fuiictioii of binoiniad random va.ria.blesfrom the corresporidiiig expressions of Brrnoulli randoin variables.
5.4
Generating Function and Characteristic Function
-
Let, X B ( n , p ) .It follows froin (5.3) a d the independence of Zi's that the geiiera.ting function of X is
p, (s)
EcsX
~
=
~
E8Z1+."+Zn
~
E Q B I E S Z Z , . . E,s"r
(5.4)
( P z ( s ) ) "= (1 - p + p * s ) ' " ,
+
wlicrc. Px(.s) = 1 p p s is t,he coininon generating furict>ionof the Bernoulli random va.riables ZI, 22,. . . 2,. Lct, f x ( t ) he the c1ia.racteristic fuiictioii of X. Frorii (5.4), we readily obt,ain -
A\ a rorisqueiice, if Y
5.5
B ( n ,p . a , h ) , tlieri Y
d
=a
+ hX
and
Moments
Equdit 1 (5.3) irnmtdiatcly yiclds
as
wc4l a s
Otlicr inonleiits of X caii h found by differentiating the generating finiction in (5.4).
MOMENTS
51
Factorial moments of positiue order: pk
1 ) .. . ( X
=
EX(X
=
n(n-l)-+-k+l)&
-
-
k
+ 1) = P$'(l) k=1,2 )....
(5.9)
In particiilar, we have =
lL2
=
pLy
=
71p,
(5.10)
n(n - l)p", 4 7 1 . - 1)(n- 2)p3,
(5.12)
= n! p n .
(5.13)
(5.11)
and
pn Note that
pn+1 = p n + 2 =
. . . = 0.
(5.14)
Factorial moments of nxgutive order: For any k = 1 , 2 , . . . ,
Exercise 5.1 Show that we rari use the equality i1-k
= P!p)(l),
k
=
1,2... . ,
under the assumption that we consider the derivative of a negative order - k as the fdlowirig integral:
Now we can use (5.15) t o find p-l.l
=
and
p-2
.I, (1 - p + p s ) n
as follows:
ds
(5.16)
BINOMIAL DISTRIBUTION
52
p-2
E ( (X
1'
= =
1
~ ~
-
+ 1)(X l + 2) )
l ( 1- p + p ~ )ds~ d t
+
(1 -p)"+'{l+ (71 l)p} (71 1)(n 2)p2
+
+
(5.17)
Moments about zero: Relatioils (5.10)-(5.13) readily imply that
and
Ce71 trul m omP 71 ts:
From (5.18) a i d (5.19)' we obtain [see also (5.8)] the variance as
Var X
(5.22)
= Bj2 = 7 ~ p1( - p ) .
Siinilarly, we find from (5.20) and (5.21) the third arid fourth central niomcnts as p3
=
a.-j
-
302Ql
+2
4
= np(1
-
p)(l
-
2p)
(5.23)
and
S h n p ~chuructertstzcs: Froin (5.22)- (5.24), we find Pcarsoii'b coefficients of sk(>wriessand kurtosis as (5.25) and
(5.26)
MAXIMUM PROBABILITIES
53
respectively. From (5.25), it is clear that the binomial distribution is negatively skewed when p > a.nd is positively skewed when p < f . The coefficient of skewness is 0 when p = (in this case, the distribution is also symmetric). Equation (5.25) also reveals that the skewness decreases as n increases. Furthermore, we note from (5.26) that the binomial distribution may be leptokurtic, mesokurtic, or platykurtic, depending on the value of p . However, it is clear that y1 and yz tend t o 0 and 3, respectively, as n tends t o co (which are, as mentioned in Section 1.4, the coefficients of skewness and kurtosis of a normal distribution). Plots of the binomial mass function presented in Figures 5.1 and 5.2 reflect these properties.
i
_
_
_
_
~
Exercise 5.2 Show that the binomial B ( n , p ) distribution is leptokurtic for p < platykiirtic for
$
4 (1 5 ) or -
p
>i
(1
+ &) ,
arid mesokurtic
5.6
Maximum Probabilities
Among all the binomial B ( n ,p ) probabilities,
m
p)”-”,
= 0 , 1 , . . . , n,
it will be of interest t o find the rnaxirnum values. It appears that there are two different cases: (a) If m* = ( n
+ 1)p is an integer, then p7n*-l
= p,,,.
=
max p,.
O<m
(b) If m* is not an integer, then p[,.1 is the only ma.xirnum in the sequence yo,. . . , p 7 L where , [m*]is the integer part of m* = ( n 1)p.
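A brief numerical check (not from the book; n and p are arbitrary choices) of the location of the maximum binomial probability described in (a) and (b): the index maximizing the pmf coincides with the integer part of m* = (n + 1)p when m* is not an integer.

    from math import comb

    n, p = 10, 0.3
    pm = [comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1)]
    m_star = (n + 1) * p                                        # here 3.3, not an integer
    print(max(range(n + 1), key=lambda m: pm[m]), int(m_star))  # both give [m*] = 3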
+
BINOMIAL DISTRIBUTION
54
Figure 5.1. Plots of binomial mass function when n = 10 (p = 0.25, 0.5, 0.75).
Figure 5.2. Plots of binomial mass function when n = 20 (p = 0.25, 0.5, 0.75).
56
BINOMIAL DISTRIBUTION
5.7
Convolutions
Let XI. X2.. . . be iiidcpendent and identically distributed B(rL,p) randoni variables, and SAT =
Xi + X ,
+ ... + X N ,
N
=
I,2,. ., .
For finding the distribution of SN,we may recall relation (5.3) to write
xk
d
Zlk
+ Z Z k + + Znk,
k = 1 , 2 , ., . , N ,
" '
(5.27)
where all 2 ' s in (5.27) are independent and have Bernoulli B e ( p ) distxibution. Then, (5.28) is tht. sum of nN independent and identically distributed Bernoulli raiidorn variables, and therefore,
SN
-
(5.29)
B(nN,p).
Another way of establishing (5.29) is through charactcristic functions.
Exercise 5.3 Using characteristic functions, show that the convolution of N binomial B ( n ,p ) distributions results in binomial B ( n N ,p ) distribution.
5.8
Decompositions
We know from (5.3) that any binomial B(n.p) randoin variable can be expressed a5 a siini of n independent terms, each having a Bernoulli Be@) distribution. This simply means that any B ( n ,p ) distribution is decorriposable for 0 < p < 1 and n 2 2. Moreover, X B ( n , p )can be represented as a sum U + V of two binomial iandom variables U B ( m ,p ) and V I3 (n m. p ) . Since U V = (U - a ) ( V a ) for any -00 < a < m, we can get for X a Inow gencral decomposition in terms of binomial distributions. In fact,
+
+
- -
+
-
-
for any pair of independent random variables U and V , where
U
-
B ( m , p ,a , l ) ,
V
-
B ( n - m , p , -a,l ) ,
(5.30)
describes a set of all possible decompositions of the binomial B(71,p) distribution.
57
MIXTURES
Exercise 5.4 Show that if X B ( n , p ) , n 2 2, 0 < p < 1, then it can be decomposed into components of binomial type only as in (5.30). N
Furthermore, since any binomial distribution has a finite support, it is not, infinitely divisible.
5.9
Mixtures
Let us consider a sequence of binomial random variables X , B ( n , p ) ,n = 1 , 2 , . . . . Let one of these random variables be randomly selected. It means that we have a new random variable N
where N is an integer-valued random variable which does not depend on X ' s . We are now interested in the distribution of Y . The distribution of Y is a mixture of binomial distributions since it can coincide with the distribution of the random variable X,, with probability P { N = n}. Of course, Y can take on values 0 , 1 , 2 , . . . , and we wa.nt to find the probabilities pTn = P { Y = m}, m = 0 , 1 , 2 , . . . . These probabilities depend on the distribution of N . Suppose that N B ( M ; q )is also a binomial random variable. Since N can take 011 zero value, let us define Xo = 0. Then, the theorem of total probability enables us to obtain the necessary probabilities as follows: N
pm
=
P{Y
= 71Lj =
P{XN
= rn}
A4
=
CP{X,= n =m
AI
P { X n = mlN
=
= n}P{N = n j
n=m
M
=
C P{X,,
= m}P{N = nj.
(5.31)
n=m
In (5.31)' the independence of X ' s arid N has been used. As the distributions of X ' s and N are known to us, we find that
BINOMIAL DISTRIBUTION
58
(5.32)
It follows from (5.32) that Y
5.10
N
B(M,pq).
Conditional Probabilities
-
Consider the following situation. Let. X I B ( n , p ) and Xa B ( r n : p ) be independent raiidorri variables. Then, we have already seen that, Y = X I + Xa has a. binoiriial B ( m + n , p )dist,rihution. Suppose it, is known tha.t Y has some fixed va.lue T , 0 5 T 5 nzfn. Then, we are interested in finding the conditional distribution of X I , given that Y = r. For this, we need t o find the conditiona.1 probabilities P { X 1 = t / Y = T } for all integers t in the interval max(O, r For c.xamplr, if r that
-
N
ni) 5 t 5 min(71,r ) .
(5.33)
+ m, wc must consider only the value t = n arid get P ( X 1 = nlX1+ x,= n + m } = 1.
= 7z
Now, for any t satisfying ( 5 . 3 3 ) ,we find that
P{XI
= t/Y = r }
=
P ( X 1 = t , Y = ,r} P{Y = r }
(5.34)
It is curious that the RHS of (5.34) does riot depend on p . Let us now consider t,he following classical problem from combinatorics: A box contains 7~ red a.nd n~bla.ck balls. A random selection of T ba.lls is made from this box. What is the probability of getting exactly t red balls in our sample'! It is easy t o see that this probability coincides with the RHS of (5.34). To continue the problem, introduce a random variable N for the niirriber of red balls taken from thc box. Indeed, N takes on the values in (5.33) with proba.bilities as iii (5.34). Thiis, the conditional distribution of X I , given t1ia.t X I X 2 is fixed,
+
TAIL PROBABILITIES
59
coincides with the distribution of N , called a hypergeometric distribution, which is discussed in detail in Chapter 8.
Tail Probabilities
5.11
Let X B ( n , p ) . Quite often, one has to calculate probabilities P { X or P { X 2 m } . Of course, we can write that N
P { X 2 m} = 1 - P { X < m } =
c
< m}
n
(5.35)
k=m
but the computation of the RHS of (5.35) ma.y become difficult for large values of m arid n. For this purpose, a simpler integration formula can be given.
Exercise 5.5 Prove that
(5.36)
We will recall equality (5.36) in Chapter 16 when we discuss the beta distribution.
5.12
Limiting Distributions
There are two important distributions which appear a.s the limit for some sequences of binomial distributions. (a) For a. fixed positive X and n > A, consider a sequence of random variables X n B ( n ,X / n ) . In this case N
BINOMIAL DISTRIBUTION
60
Exercise 5.6 Show that, for any fixed
7n = 0 , l . .
..,
lirn P { X , = m } = p,(X),
(5.37)
Tl--I"LI
where p,(X)
=
A" , e- x m!
m=0,1, ....
(5.38)
Note that a random variable taking on values 0 , 1 , 2 , . . . with probabilities p,(X) is said t o have a Poisson distribution, and it is discussed in detail in Chapter 9.
(b) Let X ,
- B(n,p),
n = 1 , 2 , . . . . We know that
EX,
and
=np
Var X ,
= np(1 - p ) .
Consider a new sequence of normalized random variablcs
Using notations introduced in Section 5.2, we ca.n write
where
a.nd
1
hrL=
Jm.
For the new ra.ndoni va.riables, we rea.dily find t1ia.t
EW,
=0
and
Var W , = 1, n = 1 , 2 . .
Let .fn(t)and g,(t) be the characteristic functions of X , and W,, respectively. From (5.5) and (5.6), it follows that
f T L (= t ) (1 - - p + p r " ) r L a.nd
( 5.40)
LIMITING DISTRIBUTIONS
61
From (5.40), we obta.in tha.t for m y fixed t ,
g n ( t ) --f e - t 2 / 2
as n --t m .
(5.41)
Since
(5.42) is the chara.cteristic function of thc density fiiriction
(5.43) we conclude that the sequence of distributions of W , converges as n --t m to a limiting distribution with density function p(x) as in (5.43). This continuous distribution, called the standard normal distribution,is discussed in detail in Chapter 23.
Thus, we have investiga.ted the asymptotic behavior of two sequences of binomial random variables and obtained t,wo different limiting distributions, Poisson and normal.
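Both limit results can be checked numerically. The sketch below (not from the book; all parameter values are arbitrary choices) compares a B(n, lambda/n) probability with its Poisson limit, and a standardized binomial probability P{W_n <= x} with the standard normal cdf.

    from math import comb, exp, factorial, erf, sqrt

    # Poisson limit: P{X_n = m} for X_n ~ B(n, lam/n) versus exp(-lam) lam^m / m!
    n, lam, m = 2000, 3.0, 4
    p = lam / n
    print(comb(n, m) * p**m * (1 - p)**(n - m), exp(-lam) * lam**m / factorial(m))

    # Normal limit: P{(X_n - np)/sqrt(np(1-p)) <= x} versus the standard normal cdf at x
    n, p, x = 500, 0.4, 1.0
    mu, sd = n * p, sqrt(n * p * (1 - p))
    cdf = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(int(mu + x * sd) + 1))
    print(cdf, 0.5 * (1 + erf(x / sqrt(2))))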
CHAPTER 6
GEOMETRIC DISTRIBUTION 6.1
Introduction
Consider the following scenario. There is a sequence of independent trials in each of which a, certain event A can occur with a const,ant probability p , and its cornplerncnt A can occur with probability 1 - p . Let us call A and A success’’ and “failure”, respectively. Now, let Xk ( k = 1 , 2 , . . .) be the number of “successes” in the kth trial. Indecd, for any k , X I , = 1 1 ~ )and X I , takes on two values, 0 and 1, with probabilities 1 - p and p , respectively. This means that X I , B e ( p ) , k = 1,2,. . . . Let Y, be the total number of “successes~’in the first n trials; i.e., Y, = X I X z . . . X,. We know tha.t the convolution of Bernoulli B e ( p ) random variables generates the binomial ) any n = 1 , 2 , . . . . Now, let B ( n , p ) distribution. Hence, Y, N B ( 7 ~ , pfor N bc the nuniher of “successes” until the appearance of the first “failure”. Then, this exa.mple with a sequence of ones and zeros considered in Cha.pter 4 revealed that (1
+ + +
N
P { N = m } = ( I - p)p”,
7n
= 0, I , . . . .
This probability distribution is the subject matter of this chapter.
6.2
Not at ions
Let a random variable X take on values 0 , 1 , 2 , . . . with probabilities p , = P { X = n} = (1- p)p”,
71
= 0,1,.. .,
(6.1)
where 0 < p < 1. We then say that X has a geometric distribution with parameter p , and denote it by
x
G(P).
The geometric type of distributions include all linear transforniations Y = a h X , - m < a < 30 and h > 0, of the geometric random variable. In this case we denote it by Y G ( p ,a , h ) , which is a random variable concentrated at points a , a $- h , a 2h,. . . with probabilities
+
+
P{Y
-
= a+71h} =
(1 - p ) p ” ,
63
R =O,l,.
...
(6.2)
GEOMETRIC DISTRIBUTION
64
Tail Probabilities
6.3 If X
N
G ( p ) ,then the tail probabilities of X are very simple. In fa.ct,
Formula (6.3) can be used to solve the following problem.
Exercise 6.1 Let X I , random variables, and
N
G ( p k ) ,k
=
1 , 2 , . . . , n, be a seyuencc of independent
mrL= mi11 &. l
Prove that m,
G ( p ) ,where p
=
nz=,
pk.
Generating Function and Characteristic Function
6.4 If X
N
N
G ( p ) ,then (6.4)
and 1-P f x ( t ) = EeZtX= Px(e") = ___
1 - peat
6.5
.
Moments
The generatiiig function in (6.4) has a rather simple form which we will use first to derive cxpressions for factorial moments. Factorial moments of positive order: Pk =EX(X
-
1 ) .. . ( X
-
k: + 1) =
pp(q
MOMENTS
65
In particular, we have Pl
=
P2
=
P3
=
~
P 1- p ’
2P2 (1 - PI2 ’ 6P3 (1- p ) 3 ,
____
and
(6.10)
Factorial m o m e n t s of negative order: From formula (5.15) we can give the general form of moments:
which is as follows:
Now,
(6.12)
and
(6.13)
M o m e n t s about zero: It follows from (6.7)-(6.10) that (6.14) (6.15)
(6.16)
66
GEOMETRIC DISTR.IBUTION
Gmtrul moments: From (6.14) and (6.15)’ we readily obtain the variance to be (6.18) Sirnilarly, frorn (6.14)-(6.17), we obtain the third and fourth cent,ral niornents to he P3 =
P(1 +PI (1 - p ) 3
(6.19)
~
and D4
=
P ( 1 + 71, + P2) (1 - PI4
’
(6.20)
respect ivcly.
Shape churrLcterrstacs: From (6.18)-(6.20), we find Pearson’s coefficients of skewness and kurtosis
as
(6.21) arid
(6.22) respcc:tively. It is quite clear from (6.21) tha.t the geometric distribution is positively skewed. Furthermore, we observe from (6.22) tha.t the distribution is leptokurtic for all values of the pa.ra.nietcr p. Figure 6.1 presents plots of the geoiriet,ric iiia.ss function for some choices of p.
Exercise 6.2 Prove that the geometric G ( p ) distribution is leptokurtic for all values of thr. parameter p .
MOMENTS
67
Figure 6.1. Plots of geometric mass function (p = 0.25, 0.5, 0.75).
68
GEOMETRIC DISTRIBUTION
Convolutions
6.6
Let XI G(p1) and X2 G ( p 2 ) be independent random variables. Without loss of generality, let us assume that p l > pz. We are then interested in the distribution of Y = X 1 + X 2 ; see Sen and Balakrishnan (1999) for a discussion on a more general problem. From (6.4), we get N
P ~ ( s )= Es* = Px,(s)Px,(s) 1-m 1-P2 x-. 1- P I S 1- p z s
(6.23)
The RHS of (6.23) can be simplified and written as (6.24)
Now it, follows from (6.24) that
P{Y = m }
=
Pl(1 - P2) P { X 1 = m } -
Pl - P2
P l -P2
(6.25) Indeed, (6.25) is also valid if p2 > PI. If PI = pa, we must let p2 + PI in (6.25). As a result, the distribution of the sum of two independent G ( p ) random vxiables is determined as
P{Y
= 7n) =
(rn + 1)(1 p)2p'7L, -
WL =
0,1,. ...
(6.26)
Below, we will obtain (6.26) as a special case of more general convolutions. Now, let XI, X 2 , . . . , X , be independent G ( p )ra.ndom varkbles, and Y, = X 1 $- . . . + X,. The generating function of Y, is given by (6.27) Since (6.28) we simply get (6.29) where
( - n ) ( - n - 1). . . (-n =
n(ri +
(-l)m
m! 1) . . . ( n 7n!
-
m + 1)
+ 7n
-
1)
DECOMPOSITIONS
69
hence,
P{Y,
(’”:,”,
= m} =
m = 0,1,.. .
l ) P V -PI”,
(6.30)
In particular, if n = 2, we a.rrive at (6.26). In the scheme of independent trials, the random variable Y, can be interpreted as the total number of “successes” until the appearance of n t h “failure”. Since the generating function of Y, is a binomial of the negative order ( -n), the distribution in (6.30), called the negative binomial distribution, is discussed in detail in Chapter 7. The geometric distribution is naturally a particular case of the negative binomial distribution when n = 1.
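A quick empirical sketch (not from the book; p, n, and the sample size are arbitrary choices) that the sum of n independent G(p) variables has the probabilities given in (6.30).

    import numpy as np
    from math import comb

    rng = np.random.default_rng(4)
    p, n, size = 0.35, 3, 400_000
    # G(p) counts the number of "successes" before the first "failure"; numpy's
    # geometric counts trials until the first success (with prob 1 - p), so subtract 1.
    x = rng.geometric(1 - p, size=(size, n)) - 1
    y = x.sum(axis=1)

    for m in range(5):
        theo = comb(n + m - 1, m) * (1 - p)**n * p**m
        print(m, (y == m).mean(), theo)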
6.7
Decompositions
It is easy to see that geometric G(p) distribution can be represented as a sum of two or more independent random variables. In fact, the generating function P x ( s ) = (1 - p ) / ( l - p s ) of the random variable X N G ( p ) can be rewritten as a product of two generating functions:
where (6.31) and (6.32) It is easy t o see that the generating function in (6.31) corresponds to a random variable U , which ha.s a geometric distribution, denoted by G(p2,0,2); i.e., U takes on values 0 , 2 , 4 , . . . , with probabilities T,
Note that U
= P{U
d
=am} = (1- p 2 ) p2m ,
2W, where W
-
= l} =
(6.33)
G(p2). Further, the genemting function
P v ( s ) in (6.32) corresponds to Bernoulli Be E’{V
m = O , 1 , ....
1 - P{Y
(
__ p)
= O} =
~
distribution; i.e.,
P . l+P
(6.34)
Thus, for a G(p) distributed random variable X , we get the representation
70
GEOMETRIC DISTRIBUTION
where the distributions of the independent random variables U and V are as given in (6.33) a.nd (6.34). We can also write it equivalently as
x = 2UI+ VI, d
- G(p),
where X
U1
N
G(p2) and Vl
(6.35)
Be
N
(
P).
Moreover, U, in
(6.35) is also a geometrically distributed random variable and itself admits the representation
where U:,
-
Ul G(p4) and &
N
+
d
= 2u2
v2,
. Combining these, we have a new
Be
decomposition for X as
x
+ 2vz $- VI.
4u2
This process can be continued and for any n position holds:
x d 2nUn +
=
1 , 2 , . . . , the following decom-
c n
2'"-1vk,
(6.36)
k= 1
where
Note that all the random variables on the RHS of (6.36) are independent. In Chapter 7, when we introduce the general form of nega.tive binomial distribut>ions,it will be showri that any geometric distribution is infinitely divisible.
6.8
Entropy
For geornct,ric distributions, one can easily derive the entropy.
Exercise 6.3 Show that the entropy of X
H ( X )= -ln(l - p ) in particular, H ( X ) = 2 I n 2 i f p =
+.
-
-
G ( p ) ,0 < p < 1, is given by
-]rip;
1-P
(6.37)
CONDITIONAL PROBABILITIES
6.9
71
Conditional Probabilities
In Chapter 5 we derived the hypergeometric distribution by considering the conditional distribution of X I , given that X1 X2 is fixed. We will now try t o find the corresponding conditional distribution in the case when we have geometric distributions. Let X 1 and X2 be independent and identically distributed G ( p ) random variables, arid Y = X I X,. In our case, Y = 0 , 1 , 2 , . . . and [see (6.30) when n = 21 r = 0,1,2,. . .. P{Y = r } = ( r 1)(1- p l 2 p r ,
+
+
+
We can find the rcquired conditional probabilities as
P{X1
= tjXl+
X,
=r}
=
P(X1 = t , X , = T t } P { X 1 + X2 = r } ~
(6.38)
+
It follows from (6.38) that the conditional distribution of X I , given that X1 X2 = T , becomes the discrete uniform DU(r 1) distribution. Geometric distributions possess one more interesting property concerning conditional probabilities.
+
Exercise 6.4 Let X
N
G ( p ) ,0 < p < 1. Show that
P { X 2 71 holds for any n
= 0,1,.. .
+ V7lX 2}.7
and nz
=P{X
2 71)
(6.39)
= 0 , 1 , . . ..
Remark 6.1 Imagine that in a sequence of independent trials, we have m “successes” in a row and no “failures”. It turns out that the additional number of “successes” iiiitil the first “failure” that we shall observe now, has the same geometric G(p) distribution. Among all discrete distributions, the geometric distribution is the only one which possesses this lack of m e m o r y property in (6.39). Remark 6.2 Instead of defining a geometric random variable as the number of “successes” until the a.ppearance of the first “failure” (denoted by X ) , we could define it altmmtively as the number of “trials” required for the first “failure” (denoted by Z ) . It is clear that 2 = X 1 so that we readily have the mass function of 2 as [see (6.l)]
+
P { Z = n} = (1 -p)p”-1,
n = 1 , 2 , .. ..
(6.40)
72
GEOMETRIC DISTRIBUTION
From (6.4), wc then have the generating function of 2 as
P ~ ( s=) EsZ
=E
S ~ ” = sEsX
1 if /sI < -.
= ___ ’(’-’)
P
1-ps
(6.41)
Many authors, in fact, take these as ‘standard form’ of the geomrtric distribution [instead of (6.1) arid (6.4)].
6.10
Geometric Distribution of Order k
Consider a sequence of Bernoulli(p) trials. Let 21,denote the nurnber of “tria.ls” required for ‘‘k consecutive failures” to appear for the first time. Then, it can be sliown tha,t the generating function of‘ 21, is given by
PZk(,s)= E s Z k =
+
(1 - p ) k s k ( l - s p s ) 1 - s +p(l - p ) kSk+l ’
(6.42)
The corresponding distribution has been named the geometric distribution of order k . Clearly, the genemting function in (6.42) reduces t o t1ia.t of the geometric distribution in (6.41) when k = 1. For a review of this and many other “run-related” distributions, one may refer t o the recent book by Balakrishnari and Koi1tra.s (2002).
Exercise 6.5 Show tl-1a.t (6.42) is indeed the generating function of Exercise 6.6 From the generating function of arid variance of Zk.
z k
Zk.
in (6.42). dcrivc the mean
CHAPTER 7
NEGATIVE BINOMIAL DISTRIBUTION 7.1
Introduction
In Chapter 6 we discussed the convolution of n geometric G ( p ) distributions and derived a new distribution in (6.30) whose generating function ha.s the form P,(s) = (1 - p)"(l - ps)-".
As we noted there, for any positive integer n, the corresponding random variable ha.s a suitable interpretation in the scheme of independent trials as the distribution of the total number of "successes" until the nth "failure". For this interpretation, of course, n has to be an integer, but it will be interesting to see wha.t will happen if we take the binomial of arbitrary negative order as the gt:nera.ting function of some ra.ndom variable. Specifically, let 11s consider the function
m(s) = (1
-
p),(l
-
ps)-,,
cy
> 0,
(7.1)
which is a generating function in the case when all the coefficients ~ 0 . ~ 1. ., . in the power expansion x y = O p k s k of the RHS of (7.1) are nonnega.tive and their sun1 cquals 1. The second condition holds since C y = O p k = P,(l) = 1 for any Q > 0. The function in (7.1) has derivatives of all orders, and simple calculations enable us to find the: coefficients p k in the expansion
k=O
as pk
= -
r,(k)(0) ~~
k!
N(cy
+ 1) ' .
(1 -p)"(-p)"-ai)(-a
( a+ k k! '
-
1)
-
k! (1- d " P k
73
1).. . (-.
-
k
+ 1)
74
NEGATIVE BINOMIAL DISTRIBUTION
a.nd p ~ .> 0 for a.ny k = 0 , 1 , 2 , . . . . Thus, we have obtained the following assertion: For any CY > 0, the function in (7.1) is generating the distribution concentrat,ed at, points 0, 1 , 2 , . . . with probabilities as in (7.2). It should be meritioried here that negative binomial distributions and some of their genedized forms have found importa.nt applicat,ioris in actuarial w.na.lysis; see, for example, the book of Klugman, Panjer and Willmot (19%).
7.2
Notations
Let a random variable X take on values 0 , 1 , . . . with probabilities pk =
P{X
=
A"}
=
(a
+
;
-
1) (1 - p)"p'" ,
cy
> 0.
We say that X lias the negatzve bznom7al distrabutaon with parameters p (0 > 0, 0 < p < l ) ,and denote it by
x
N
o/
and
NB((L.,p).
Note t,lia.t,N B ( 1 . p ) = G ( p ) and, hencc, the geomet,ric distribution is a pa.rticiilar case of the negative binomial distribution. Sometimes, the negativr: binomial clist,ribution with an integer parameter 0 is called a PG,SUJ,~ distrihu-
tiol?,.
7.3
Generating Function and Characteristic Function
There is no necessity to calculate the generating function of this distribution. Unlike the earlier situations, the generating function was thv prirnary object in this cabe, which then yielded the probabilities. Recall that if X NB(a.p), then N
P x ( s )= EsX
=
( - ) O .
1- p s
(7.3)
From ( 7 . 3 ) ,we inimediatcly have the characteristic function of X as
7.4
Moments
From the expression of the generating function in (7.3),we can readily find tlic factorial niotnents.
MOMENTS
75
Factorial moments of positive order: pk
=
where
E X ( X - l ) - . . ( X - k + 1) = PC'(1)
r(s)=
./o
00
e--zxs-l dx
is the complete gamma function. In particular, we have
P4
a(a
+ l ) ( a+ 2 ) ( a+ 3)p4
(7.9) (1 - PI4 Note that if a = 1, then (7.5)-(7.9) reduce to the corresponding moment,s for the geometric distribution given in (6.6)-(6.10), respectively. =
Factorial moments of negative order:
and in particular,
(7.11) if a > 0 and a (6.12).
#
1. The expression for p-1 for the case a = 1 is as given in
(7.12)
76
NEGATIVE BINOMIAL DISTRIBUTION
Central moments: From (7.12) and (7.13), we readily find the variance of X as
$$\mu_2 = \sigma^2 = \mathrm{Var}\,X = \alpha_2 - \alpha_1^2 = \frac{\alpha p}{(1-p)^2}\,. \tag{7.16}$$
Similarly, from (7.12)-(7.15), we find the third and fourth central moments of X as
$$\mu_3 = \alpha_3 - 3\alpha_2\alpha_1 + 2\alpha_1^3 = \frac{\alpha p(1+p)}{(1-p)^3} \tag{7.17}$$
and
$$\mu_4 = \alpha_4 - 4\alpha_3\alpha_1 + 6\alpha_2\alpha_1^2 - 3\alpha_1^4 = \frac{\alpha p}{(1-p)^4}\{1 + (3\alpha+4)p + p^2\}, \tag{7.18}$$
respectively.

Shape characteristics: From (7.16)-(7.18), we find Pearson's coefficients of skewness and kurtosis as
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{1+p}{\sqrt{\alpha p}} \tag{7.19}$$
and
$$\gamma_2 = \frac{\mu_4}{\mu_2^2} = \frac{1 + (3\alpha+4)p + p^2}{\alpha p} = 3 + \frac{1 + 4p + p^2}{\alpha p}\,, \tag{7.20}$$
respectively. It is quite clear from (7.19) that the negative binomial distribution is positively skewed. Furthermore, we observe from (7.20) that the distribution is leptokurtic for all values of the parameters α and p. We also observe that as α tends to ∞, γ₁ and γ₂ in (7.19) and (7.20) tend to 0 and 3 (the values corresponding to the normal distribution), respectively. Plots of the negative binomial mass function presented in Figures 7.1-7.3 reveal these properties.
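As a rough numerical illustration (not part of the original text), the moment formulas above can be checked by simulation. The mapping to NumPy's generator is an assumption spelled out in the comments.

```python
# Illustrative Monte Carlo check of (7.12), (7.16), (7.19): mean = alpha*p/(1-p),
# variance = alpha*p/(1-p)^2, skewness = (1+p)/sqrt(alpha*p).
# NumPy's negative_binomial(n, q) counts failures before the n-th success with
# success probability q, so the book's NB(alpha, p) is assumed to correspond to
# n = alpha and q = 1 - p (the roles of "success" and "failure" are swapped).
import numpy as np

alpha, p = 3.0, 0.4
rng = np.random.default_rng(0)
x = rng.negative_binomial(alpha, 1 - p, size=1_000_000)

mean, var = x.mean(), x.var()
skew = ((x - mean) ** 3).mean() / var ** 1.5
print(mean, alpha * p / (1 - p))            # both close to 2.0
print(var, alpha * p / (1 - p) ** 2)        # both close to 3.33
print(skew, (1 + p) / np.sqrt(alpha * p))   # both close to 1.28
```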
7.5
Convolutions and Decompositions
Let X₁ ~ NB(α₁, p) and X₂ ~ NB(α₂, p) be two independent random variables, and let Y = X₁ + X₂.

Exercise 7.1 Use (7.3) to establish that Y has a negative binomial NB(α₁ + α₂, p) distribution.
Figure 7.1. Plots of negative binomial mass function when r = 2 [panels: NegBin(2,0.25), NegBin(2,0.5), NegBin(2,0.75); horizontal axis k].
Figure 7.2. Plots of negative binomial mass function when r = 3 [panels: NegBin(3,0.25), NegBin(3,0.5), NegBin(3,0.75); horizontal axis k].
Figure 7.3. Plots of negative binomial mass function when r = 4 [panels: NegBin(4,0.25), NegBin(4,0.5), NegBin(4,0.75); horizontal axis k].
Remark 7.1 Now we see that the sum of two or more independent random variables X_k ~ NB(α_k, p), k = 1, 2, ..., also has a negative binomial distribution. On the other hand, the assertion above enables us to conclude that any negative binomial distribution admits a decomposition with negative binomial components. In fact, for any n = 1, 2, ..., a random variable X ~ NB(α, p) can be represented as
$$X \stackrel{d}{=} X_{1,n} + \cdots + X_{n,n},$$
where $X_{1,n}, \ldots, X_{n,n}$ are independent and identically distributed random variables having the NB(α/n, p) distribution. This means that any negative binomial distribution (including the geometric) is infinitely divisible.
7.6
Tail Probabilities
Let X ~ NB(n, p), where n is a positive integer. The interpretation of X based on independent trials (each resulting in a "success" or a "failure") gives us a way to obtain a simple form for tail probabilities as
$$P\{X \ge m\} = \sum_{k=m}^{\infty}\binom{n+k-1}{k}(1-p)^n p^k. \tag{7.21}$$
In fact, the event {X ≥ m} is equivalent to the following: if we fix the outcomes of the first m + n − 1 trials, then the number of "successful" trials must be at least m. Let Y be the number of "successes" in m + n − 1 independent trials. We know that Y has the binomial B(m + n − 1, p) distribution and
$$P\{X \ge m\} = P\{Y \ge m\}. \tag{7.22}$$
Moreover, (5.36) expresses the RHS of (7.22) as an incomplete beta integral (7.23). Upon combining (7.21)-(7.23), we obtain
$$P\{X \ge m\} = \frac{(m+n-1)!}{(m-1)!\,(n-1)!}\int_0^p t^{m-1}(1-t)^{n-1}\,dt. \tag{7.24}$$
As a special case of (7.24), we get equality (6.3) for the geometric G(p) distribution when n = 1.
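The tail identity (7.22) is easy to verify numerically. The sketch below (illustrative only, not from the book) compares the negative binomial tail sum with the corresponding binomial tail.

```python
# Illustrative check of (7.22): P{X >= m} for X ~ NB(n, p) equals P{Y >= m}
# for Y ~ B(m + n - 1, p).
from math import comb

def nb_tail(m, n, p, terms=2000):
    return sum(comb(n + k - 1, k) * (1 - p) ** n * p ** k for k in range(m, terms))

def binom_tail(m, N, p):
    return sum(comb(N, r) * p ** r * (1 - p) ** (N - r) for r in range(m, N + 1))

m, n, p = 4, 3, 0.35
print(nb_tail(m, n, p))             # the two printed values agree
print(binom_tail(m, m + n - 1, p))
```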
7.7
Limiting Distributions
For λ > 0, let us consider X_α ~ NB(α, λ/α), where λ/α < 1. The generating function of X_α in this case is
$$P_\alpha(s) = \left(\frac{1-\lambda/\alpha}{1-\lambda s/\alpha}\right)^{\alpha}. \tag{7.25}$$
We see immediately that
$$P_\alpha(s) \to e^{\lambda(s-1)} \qquad \text{as } \alpha \to \infty. \tag{7.26}$$
Note that
$$e^{\lambda(s-1)} \tag{7.27}$$
is the generating function of a random variable Y taking on values 0, 1, 2, ... with probabilities
$$p_k = \frac{e^{-\lambda}\lambda^k}{k!}\,, \qquad k = 0, 1, 2, \ldots. \tag{7.28}$$
In Chapter 5 we mentioned that Y has the Poisson distribution. Now relation (7.26) implies that for any k = 0, 1, ...,
$$P\{X_\alpha = k\} \to \frac{e^{-\lambda}\lambda^k}{k!} \qquad \text{as } \alpha \to \infty; \tag{7.29}$$
that is, the Poisson distribution is the limit for the sequence of NB(α, λ/α) distributions as α → ∞. Next, let X_α ~ NB(α, p) and
$$W_\alpha = \frac{X_\alpha - EX_\alpha}{\sqrt{\mathrm{Var}\,X_\alpha}}\,. \tag{7.30}$$
Let f_α(t) be the characteristic function of W_α, which can be derived by standard methods from (7.4). It turns out that for any t,
$$f_\alpha(t) \to e^{-t^2/2} \qquad \text{as } \alpha \to \infty. \tag{7.31}$$
Comparing this with (5.43), we note that the sequence W_α, as α → ∞, converges in distribution to the standard normal distribution, which we came across in Chapter 5 in a similar context.
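The Poisson limit in (7.29) can be seen directly by tabulating the NB(α, λ/α) probabilities for increasing α, as in the following illustrative sketch (not from the book).

```python
# Illustrative check of (7.29): NB(alpha, lambda/alpha) probabilities approach
# Poisson(lambda) probabilities as alpha grows.
from math import lgamma, log, exp, factorial

def nb_pmf(k, alpha, p):
    log_coef = lgamma(alpha + k) - lgamma(alpha) - lgamma(k + 1)
    return exp(log_coef + alpha * log(1 - p) + k * log(p)) if k > 0 else (1 - p) ** alpha

lam = 2.0
for alpha in (5, 50, 500):
    print(alpha, [round(nb_pmf(k, alpha, lam / alpha), 4) for k in range(5)])
print("poisson", [round(exp(-lam) * lam ** k / factorial(k), 4) for k in range(5)])
```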
CHAPTER 8
HYPERGEOMETRIC DISTRIBUTION
8.1
Introduction
In Chapter 5 we derived the hypergeometric distribution as the conditional distribution of X₁, given that X₁ + X₂ is fixed, where X₁ and X₂ were binomial random variables. A simpler situation wherein hypergeometric distributions arise is in connection with classical combinatorial problems. Suppose that an urn contains a red and b black balls. Suppose that n balls are drawn at random from the urn (without replacement). Let X be the number of red balls in the sample drawn. Then, it is clear that X takes on an integer value m such that
$$\max(0, n-b) \le m \le \min(n, a), \tag{8.1}$$
with probabilities
$$P\{X = m\} = \binom{a}{m}\binom{b}{n-m}\Big/\binom{a+b}{n}\,. \tag{8.2}$$
8.2
Notations
In this case we say that X has a hypergeometric distribution with parameters n, a, and b, and denote it by
$$X \sim Hg(n, a, b).$$

Remark 8.1 Inequalities (8.1) give those integers m for which the RHS of (8.2) is nonzero. We have from (8.2) the following identity:
$$\sum_{m=\max(0,\,n-b)}^{\min(n,\,a)}\binom{a}{m}\binom{b}{n-m} = \binom{a+b}{n}\,. \tag{8.3}$$
To simplify our calculations, we will suppose in the sequel that n ≤ min(a, b), and hence the probabilities in (8.2) are positive for m = 0, 1, ..., n only. Identity (8.3) then becomes the following useful equality:
$$\sum_{m=0}^{n}\binom{a}{m}\binom{b}{n-m} = \binom{a+b}{n}\,. \tag{8.4}$$
8.3
Generating Function
If X ~ Hg(n, a, b), then we can write a formal expression for its generating function as
$$P_X(s) = \frac{n!\,(a+b-n)!}{(a+b)!}\sum_{m=0}^{n}\frac{a!\,b!\;s^m}{m!\,(a-m)!\,(n-m)!\,(b-n+m)!}\,. \tag{8.5}$$
It turns out that the RHS of (8.5) can be simplified if we use the Gaussian hypergeometric function
$${}_2F_1[\alpha, \beta;\, \gamma;\, x] = \sum_{k=0}^{\infty}\frac{(\alpha)_k(\beta)_k}{(\gamma)_k}\,\frac{x^k}{k!}\,,$$
which was introduced in Chapter 2. Then, the generating function in (8.5) becomes
$$P_X(s) = \frac{{}_2F_1[-n, -a;\, b-n+1;\, s]}{{}_2F_1[-n, -a;\, b-n+1;\, 1]}\,, \tag{8.6}$$
from which it becomes clear why the distribution has been given the name hypergeometric distribution.
8.4
Characteristic Function
On applying the relation between the generating function and the characteristic function, we readily obtain the characteristic function from (8.6) to be
$$f_X(t) = Ee^{itX} = P_X(e^{it}) = \frac{{}_2F_1[-n, -a;\, b-n+1;\, e^{it}]}{{}_2F_1[-n, -a;\, b-n+1;\, 1]}\,. \tag{8.7}$$
8.5
Moments
To begin with, we show how we can obtain the most important moments α₁ = EX and μ₂ = Var X using the "urn interpretation" of X. We have a red balls in the urn, numbered 1, 2, ..., a. Corresponding to each of the red balls, we introduce the random indicators Y₁, Y₂, ..., Y_a as follows:
$$Y_k = \begin{cases} 1 & \text{if the $k$th red ball is drawn} \\ 0 & \text{otherwise.} \end{cases}$$
Note that
$$EY_k = P\{Y_k = 1\} = \frac{n}{a+b}\,, \qquad k = 1, 2, \ldots, a, \tag{8.8}$$
and
$$\mathrm{Var}\,Y_k = \frac{n}{a+b}\left(1 - \frac{n}{a+b}\right), \qquad k = 1, 2, \ldots, a. \tag{8.9}$$
It is not difficult to see that
$$X = Y_1 + Y_2 + \cdots + Y_a. \tag{8.10}$$
It follows immediately from (8.10) that
$$EX = E(Y_1 + \cdots + Y_a) = aEY_1 = \frac{an}{a+b}\,. \tag{8.11}$$
Now,
$$\mathrm{Var}\,X = \sum_{k=1}^{a}\mathrm{Var}\,Y_k + \sum_{j\neq k}\mathrm{Cov}(Y_j, Y_k). \tag{8.12}$$
Using the symmetry argument, we can rewrite (8.12) as
$$\mathrm{Var}\,X = a\,\mathrm{Var}\,Y_1 + a(a-1)\,\mathrm{Cov}(Y_1, Y_2). \tag{8.13}$$
Now we only need to find the covariance on the RHS of (8.13). For this purpose, we have
$$\mathrm{Cov}(Y_1, Y_2) = E(Y_1Y_2) - EY_1\,EY_2 = P\{Y_1Y_2 = 1\} - \left(\frac{n}{a+b}\right)^2. \tag{8.14}$$
Note that
$$P\{Y_1Y_2 = 1\} = \binom{a+b-2}{n-2}\Big/\binom{a+b}{n} = \frac{n(n-1)}{(a+b)(a+b-1)}\,. \tag{8.15}$$
Finally, (8.9) and (8.13)-(8.15) readily yield
$$\mathrm{Var}\,X = \frac{abn(a+b-n)}{(a+b)^2(a+b-1)}\,. \tag{8.16}$$
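The urn interpretation also lends itself to a direct simulation check of (8.11) and (8.16); the following Python sketch (illustrative only, with arbitrary parameter values) simulates the draws without replacement.

```python
# Illustrative Monte Carlo check of (8.11) and (8.16): a red and b black balls,
# n drawn without replacement, X = number of red balls drawn.
import numpy as np

a, b, n = 7, 5, 4
rng = np.random.default_rng(1)
urn = np.array([1] * a + [0] * b)          # 1 = red, 0 = black
x = np.array([rng.permutation(urn)[:n].sum() for _ in range(100_000)])

print(x.mean(), a * n / (a + b))                                        # (8.11)
print(x.var(), a * b * n * (a + b - n) / ((a + b) ** 2 * (a + b - 1)))  # (8.16)
```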
Factorial moments of positive order: For k ≤ n,
$$\mu_k = EX(X-1)\cdots(X-k+1) = \frac{n!\,(a+b-n)!}{(a+b)!}\sum_{m=k}^{n}\frac{a!\,b!}{(m-k)!\,(a-m)!\,(n-m)!\,(b-n+m)!}\,. \tag{8.17}$$
From (8.4), we know that
$$\sum_{m=0}^{n-k}\binom{a-k}{m}\binom{b}{n-k-m} = \binom{a+b-k}{n-k}\,,$$
using which we readily obtain
$$\mu_k = \frac{a!\,n!\,(a+b-k)!}{(a-k)!\,(n-k)!\,(a+b)!} \qquad \text{for } k \le n. \tag{8.18}$$
Note also that $\mu_k = 0$ if k > n. In particular, the first four factorial moments are as follows:
$$\mu_1 = \frac{an}{a+b}\,, \tag{8.19}$$
$$\mu_2 = \frac{a(a-1)n(n-1)}{(a+b)(a+b-1)}\,, \tag{8.20}$$
$$\mu_3 = \frac{a(a-1)(a-2)n(n-1)(n-2)}{(a+b)(a+b-1)(a+b-2)}\,, \tag{8.21}$$
$$\mu_4 = \frac{a(a-1)(a-2)(a-3)n(n-1)(n-2)(n-3)}{(a+b)(a+b-1)(a+b-2)(a+b-3)}\,. \tag{8.22}$$

Factorial moments of negative order: Analogous to (8.17) and (8.18), we have
$$\mu_{-k} = E\left[\frac{1}{(X+1)(X+2)\cdots(X+k)}\right] = \frac{a!\,n!\,(a+b-n)!}{(a+k)!\,(a+b)!}\left[\binom{a+b+k}{n+k} - \sum_{m=0}^{k-1}\binom{a+k}{m}\binom{b}{n+k-m}\right]. \tag{8.23}$$
In particular, we find
$$\mu_{-1} = E\left[\frac{1}{X+1}\right] = \frac{a+b+1}{(a+1)(n+1)} - \frac{a!\,n!\,(a+b-n)!\,b!}{(a+1)!\,(a+b)!\,(n+1)!\,(b-n-1)!} = \frac{a+b+1}{(a+1)(n+1)} - \frac{b!\,(a+b-n)!}{(a+1)(n+1)\,(a+b)!\,(b-n-1)!}\,. \tag{8.24}$$
Moments about zero: Indeed, the RHSs of (8.11) and (8.19) coincide, and we have
$$\alpha_1 = EX = \mu_1 = \frac{an}{a+b}\,.$$
Furthermore, we can show that
$$\alpha_2 = EX^2 = \mu_1 + \mu_2 = \frac{an}{a+b} + \frac{a(a-1)n(n-1)}{(a+b)(a+b-1)} \tag{8.25}$$
and
$$\alpha_3 = EX^3 = \mu_1 + 3\mu_2 + \mu_3 = \frac{an}{a+b} + \frac{3a(a-1)n(n-1)}{(a+b)(a+b-1)} + \frac{a(a-1)(a-2)n(n-1)(n-2)}{(a+b)(a+b-1)(a+b-2)}\,. \tag{8.26}$$

Remark 8.2 If n = 1, then the hypergeometric Hg(1, a, b) distribution coincides with the Bernoulli Be(a/(a+b)) distribution.

Exercise 8.1 Check that the expressions for the moments are the same for the Hg(1, a, b) as those for the Be(a/(a+b)) distribution.

Exercise 8.2 Derive the expressions of the second and third moments in (8.25) and (8.26), respectively.
8.6
Limiting Distributions
Let X_N ~ Hg(n, pN, (1−p)N), N = 1, 2, ..., where 0 < p < 1 and n is fixed. Then, we have
$$P\{X_N = m\} = \frac{n!}{m!\,(n-m)!}\cdot\frac{pN(pN-1)\cdots(pN-m+1)\;(1-p)N((1-p)N-1)\cdots((1-p)N-n+m+1)}{N(N-1)\cdots(N-n+1)}\,,$$
from which it is easy to see that for any fixed m = 0, 1, ..., n,
$$P\{X_N = m\} \to \binom{n}{m}p^m(1-p)^{n-m} \qquad \text{as } N \to \infty. \tag{8.27}$$
Thus, we get the binomial distribution as a limit of a special sequence of hypergeometric distributions.

Exercise 8.3 Let X_N ~ Hg(N, λN², N³), N = 1, 2, .... Show then that for any fixed m = 0, 1, ...,
$$P\{X_N = m\} \to \frac{\lambda^m e^{-\lambda}}{m!} \qquad \text{as } N \to \infty. \tag{8.28}$$
The Poisson distribution, which is the limit for the sequence of hypergeometric random variables presented in Exercise 8.3, is the subject of discussion in the next chapter.
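The binomial limit (8.27) is easy to see numerically as well; the following sketch (illustrative only, not from the book) tabulates the hypergeometric probabilities for growing population size N.

```python
# Illustrative check of (8.27): Hg(n, pN, (1-p)N) probabilities approach
# binomial B(n, p) probabilities as N grows.
from math import comb

def hg_pmf(m, n, a, b):
    return comb(a, m) * comb(b, n - m) / comb(a + b, n)

n, p = 5, 0.3
for N in (20, 200, 2000):
    a = int(p * N)
    print(N, [round(hg_pmf(m, n, a, N - a), 4) for m in range(3)])
print("binomial", [round(comb(n, m) * p ** m * (1 - p) ** (n - m), 4) for m in range(3)])
```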
CHAPTER 9
POISSON DISTRIBUTION
9.1
Introduction
The Poisson distribution arises naturally in many instances; for example, as we have already seen in preceding chapters, it appears as a limiting distribution of some sequences of binomial, negative binomial, and hypergeometric random variables. In addition, due to its many interesting characteristic properties, it is also used as a probability model for the occurrence of rare events. A book-length account of Poisson distributions, discussing in great detail their various properties and applications, is available [Haight (1967)].
9.2
Notations
Let a random variable X take on values 0, 1, ... with probabilities
$$p_m = P\{X = m\} = \frac{e^{-\lambda}\lambda^m}{m!}\,, \qquad m = 0, 1, \ldots, \tag{9.1}$$
where λ > 0. We say that X has a Poisson distribution with parameter λ, and denote it by
$$X \sim \pi(\lambda).$$
If Y = a + hX, −∞ < a < ∞, h > 0, then Y takes on values a, a+h, a+2h, ... with probabilities
$$P\{Y = a + mh\} = P\{X = m\} = \frac{e^{-\lambda}\lambda^m}{m!}\,, \qquad m = 0, 1, \ldots.$$
This distribution also belongs to the Poisson type of distribution, and it will be denoted by π(λ, a, h). The standard Poisson π(λ) distribution is nothing but π(λ, 0, 1).
9.3
Generating Function and Characteristic Function
Let X ~ π(λ), λ > 0. From (9.1), we obtain the generating function of X as
$$P_X(s) = Es^X = \sum_{m=0}^{\infty}\frac{e^{-\lambda}\lambda^m}{m!}s^m = e^{\lambda(s-1)}. \tag{9.2}$$
Then, the characteristic function of X has the form
$$f_X(t) = Ee^{itX} = P_X(e^{it}) = \exp\{\lambda(e^{it}-1)\}. \tag{9.3}$$
9.4
Moments
The simple form of the generating function in (9.2) enables us to derive all the factorial moments easily.

Factorial moments of positive order:
$$\mu_k = EX(X-1)\cdots(X-k+1) = P_X^{(k)}(1) = \lambda^k, \qquad k = 1, 2, \ldots. \tag{9.4}$$
In particular, we have μ₁ = λ (9.5), μ₂ = λ² (9.6), μ₃ = λ³ (9.7), and μ₄ = λ⁴ (9.8).

Factorial moments of negative order:
$$\mu_{-k} = E\left[\frac{1}{(X+1)(X+2)\cdots(X+k)}\right] = \frac{1}{\lambda^k}\left\{1 - e^{-\lambda}\sum_{j=0}^{k-1}\frac{\lambda^j}{j!}\right\}. \tag{9.9}$$
In particular, we obtain
$$\mu_{-1} = E\left[\frac{1}{X+1}\right] = \frac{1 - e^{-\lambda}}{\lambda} \tag{9.10}$$
and
$$\mu_{-2} = \frac{1 - e^{-\lambda}(1+\lambda)}{\lambda^2}\,. \tag{9.11}$$
Moments about zero: From (9.5)-(9.8), we immediately obtain the first four moments about zero as follows:
$$\alpha_1 = EX = \lambda, \tag{9.12}$$
$$\alpha_2 = EX^2 = \lambda + \lambda^2, \tag{9.13}$$
$$\alpha_3 = EX^3 = \lambda + 3\lambda^2 + \lambda^3, \tag{9.14}$$
$$\alpha_4 = EX^4 = \lambda + 7\lambda^2 + 6\lambda^3 + \lambda^4. \tag{9.15}$$

Central moments: From (9.12) and (9.13), we readily find the variance of X as
$$\mu_2 = \mathrm{Var}\,X = \alpha_2 - \alpha_1^2 = \lambda. \tag{9.16}$$
Note that if X ~ π(λ), then EX = Var X = λ. Further, from (9.12)-(9.15), we also find the third and fourth central moments as
$$\mu_3 = E(X - EX)^3 = \alpha_3 - 3\alpha_2\alpha_1 + 2\alpha_1^3 = \lambda \tag{9.17}$$
and
$$\mu_4 = E(X - EX)^4 = \alpha_4 - 4\alpha_3\alpha_1 + 6\alpha_2\alpha_1^2 - 3\alpha_1^4 = \lambda + 3\lambda^2, \tag{9.18}$$
respectively.

Shape characteristics: From (9.16)-(9.18), we obtain the coefficients of skewness and kurtosis as
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{1}{\sqrt{\lambda}} \tag{9.19}$$
and
$$\gamma_2 = \frac{\mu_4}{\mu_2^2} = 3 + \frac{1}{\lambda}\,, \tag{9.20}$$
respectively. From (9.19), we see that the Poisson distribution is positively skewed for all values of λ. Similarly, we see from (9.20) that the distribution is also leptokurtic for all values of λ. Furthermore, we observe that as λ tends to ∞, the values of γ₁ and γ₂ tend to 0 and 3 (the values corresponding to the normal distribution), respectively. Plots of the Poisson mass function presented in Figure 9.1 reveal these properties.
9.5
Tail Probabilities
Let X ~ π(λ). Then
$$P\{X \ge m\} = \sum_{k=m}^{\infty}\frac{e^{-\lambda}\lambda^k}{k!}\,. \tag{9.21}$$
The RHS of (9.21) can be simplified and written in the form of an integral.
Exercise 9.1 Show that for any m = 1, 2, ...,
$$P\{X \ge m\} = \frac{1}{(m-1)!}\int_0^{\lambda}e^{-u}u^{m-1}\,du. \tag{9.22}$$
Remark 9.1 We will recall the expression on the RHS of (9.22) later when we discuss the gamma distribution in Chapter 20.
9.6
Convolutions
Let X₁ ~ π(λ₁) and X₂ ~ π(λ₂) be independent random variables. Then, by making use of the generating functions P_{X₁}(s) and P_{X₂}(s), it is easy to show that Y = X₁ + X₂ has the generating function
N
+
-
+
and, hence, Y .(XI A 2 ) . This simply means that convolutions of Poisson distributions arc also distributed a s Poisson.
9.7
Decompositions
Due to the result just stated, we know that for any X ~ π(λ) we can find a pair of independent Poisson random variables U ~ π(λ₁) and V ~ π(λ − λ₁), where 0 < λ₁ < λ, to obtain the decomposition
$$X \stackrel{d}{=} U + V. \tag{9.23}$$
Hence, any Poisson distribution is decomposable. Moreover, for any n = 1, 2, ..., the decomposition
$$X \stackrel{d}{=} X_1 + X_2 + \cdots + X_n \tag{9.24}$$
holds with the X's being independent and identically distributed as π(λ/n). This simply implies that X is infinitely divisible for any λ. Raikov (1937b) established that if X admits decomposition (9.23), then both the independent nondegenerate components U and V necessarily have a Poisson type of distribution; that is, there exist constants −∞ < a < ∞ and λ₁ < λ such that
$$U \sim \pi(\lambda_1, a, 1) \qquad \text{and} \qquad V \sim \pi(\lambda - \lambda_1, -a, 1).$$
Thus, convolutions and decompositions of Poisson distributions always belong to the Poisson class of distributions.
Figure 9.1. Plots of Poisson mass function [panels: Poisson(1), Poisson(4), Poisson(10); horizontal axis n].
9.8
Conditional Probabilities
Let X₁ ~ π(λ₁) and X₂ ~ π(λ₂) be independent random variables. Consider the conditional distribution of X₁ given that X₁ + X₂ is fixed. Since X₁ + X₂ ~ π(λ₁ + λ₂), we obtain for any n = 0, 1, 2, ... and m = 0, 1, ..., n that
$$P\{X_1 = m \mid X_1 + X_2 = n\} = \frac{P\{X_1 = m,\, X_1 + X_2 = n\}}{P\{X_1 + X_2 = n\}} = \frac{P\{X_1 = m\}\,P\{X_2 = n-m\}}{P\{X_1 + X_2 = n\}}$$
$$= \frac{\dfrac{e^{-\lambda_1}\lambda_1^m}{m!}\cdot\dfrac{e^{-\lambda_2}\lambda_2^{n-m}}{(n-m)!}}{\dfrac{e^{-(\lambda_1+\lambda_2)}(\lambda_1+\lambda_2)^n}{n!}} = \binom{n}{m}\left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^m\left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{n-m}. \tag{9.25}$$
Thus, the conditional distribution of X₁, given that X₁ + X₂ = n, is simply the binomial B(n, λ₁/(λ₁+λ₂)) distribution.
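The conditional result (9.25) can be illustrated by simulation; the sketch below (not from the book, with arbitrary parameter values) conditions on a fixed value of the sum.

```python
# Illustrative simulation of (9.25): conditionally on X1 + X2 = n, X1 is
# binomial B(n, lam1/(lam1 + lam2)).
import numpy as np
from math import comb

lam1, lam2, n = 2.0, 3.0, 6
rng = np.random.default_rng(2)
x1 = rng.poisson(lam1, 500_000)
x2 = rng.poisson(lam2, 500_000)
sel = x1[x1 + x2 == n]

q = lam1 / (lam1 + lam2)
for m in range(n + 1):
    emp = (sel == m).mean()
    theo = comb(n, m) * q ** m * (1 - q) ** (n - m)
    print(m, round(emp, 3), round(theo, 3))
```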
Now, we will try to solve the inverse problem. Let X₁ and X₂ be independent random variables taking on values 0, 1, 2, ... with positive probabilities. Then, Y = X₁ + X₂ also takes on values 0, 1, 2, ... with positive probabilities. Suppose that for any n = 0, 1, ..., the conditional distribution of X₁, given that Y = n, is binomial B(n, p) for some parameter p, 0 < p < 1. Then, we are interested in determining the distributions of X₁ and X₂. It turns out that both distributions are Poisson. To see this, let
$$P\{X_1 = m\} = r_m > 0, \quad m = 0, 1, \ldots, \qquad \text{and} \qquad P\{X_2 = l\} = q_l > 0, \quad l = 0, 1, \ldots.$$
As seen above, the conditional probabilities $P\{X_1 = m \mid Y = n\}$ result in the expression $r_m q_{n-m}/P\{Y = n\}$. In this situation, we get the equality
$$\frac{r_m q_{n-m}}{P\{Y = n\}} = \frac{n!}{m!\,(n-m)!}\,p^m(1-p)^{n-m}, \qquad m = 0, 1, \ldots, n. \tag{9.26}$$
Compare (9.26) with the same equality written for m + 1 in place of m, which has the form
$$\frac{r_{m+1} q_{n-m-1}}{P\{Y = n\}} = \frac{n!}{(m+1)!\,(n-m-1)!}\,p^{m+1}(1-p)^{n-m-1}. \tag{9.27}$$
It readily follows from (9.26) and (9.27) that
$$\frac{r_{m+1}\,q_{n-m-1}}{r_m\,q_{n-m}} = \frac{n-m}{m+1}\cdot\frac{p}{1-p} \tag{9.28}$$
holds for any n = 1, 2, ... and m = 0, 1, ..., n − 1. In particular, we can take m = 0 to obtain
$$\frac{q_n}{q_{n-1}} = \frac{\lambda}{n}\,, \qquad n = 1, 2, \ldots, \tag{9.29}$$
where we have denoted λ = r₁(1 − p)/(r₀p). Then, (9.29) implies that
$$q_n = \frac{\lambda}{n}\,q_{n-1} = \frac{\lambda^2}{n(n-1)}\,q_{n-2} = \cdots = \frac{\lambda^n}{n!}\,q_0, \qquad n = 1, 2, \ldots, \tag{9.30}$$
and, consequently, since the probabilities $q_n$ must sum to one, we immediately get $q_0 = e^{-\lambda}$ and
$$q_n = \frac{e^{-\lambda}\lambda^n}{n!}\,, \qquad n = 0, 1, \ldots, \tag{9.31}$$
where λ is some positive constant. On substituting $q_n$ into (9.28) and taking m = n − 1, we obtain
$$\frac{r_n}{r_{n-1}} = \frac{\lambda p}{n(1-p)}\,.$$
Hence,
$$r_n = \frac{1}{n!}\left(\frac{\lambda p}{1-p}\right)^n r_0,$$
and consequently,
$$r_n = \frac{e^{-\lambda p/(1-p)}}{n!}\left(\frac{\lambda p}{1-p}\right)^n, \qquad n = 0, 1, \ldots. \tag{9.32}$$
Thus, only for X₁ and X₂ having Poisson distributions can the conditional distribution of X₁, given that X₁ + X₂ is fixed, be binomial.
+
9.9
Maximal Probability
We may often be interested to know which of the probabilities $p_m = e^{-\lambda}\lambda^m/m!$ (m = 0, 1, ...) are maximal.

Exercise 9.2 Show that there are the following two situations for maximal Poisson probabilities: If m₀ < λ < m₀ + 1, where m₀ is an integer, then $p_{m_0}$ is maximal among all probabilities $p_m$ (m = 0, 1, ...). If λ = m₀, then $p_{m_0} = p_{m_0-1}$, and in this case both these probabilities are maximal.
9.10
Limiting Distribution
Let X_n ~ π(n), n = 1, 2, ..., and
$$W_n = \frac{X_n - EX_n}{\sqrt{\mathrm{Var}\,X_n}} = \frac{X_n - n}{\sqrt{n}}\,.$$
Using characteristic functions of Poisson distributions, it is easy to find the limiting distribution of the random variables W_n.

Exercise 9.3 Let g_n(t) be the characteristic function of W_n. Then, prove that for any fixed t,
$$g_n(t) \to e^{-t^2/2} \qquad \text{as } n \to \infty. \tag{9.33}$$

Remark 9.2 As is already known, (9.33) means that the standard normal distribution is the limiting distribution for the sequence of Poisson random variables W₁, W₂, .... At the same time, the Poisson distribution itself is a limiting form for the binomial and some other distributions, as noted in the preceding chapters.
9.11
Mixtures
In Chapter 5 we considered the distribution of binomial B(N, p) random variables in the case when N itself is a binomial random variable. Now we discuss the distribution of a B(N, p) random variable when N has the Poisson distribution. Later, we deal with more general Poisson mixtures of random variables. Let X₁, X₂, ... be independent and identically distributed random variables having a common generating function $P(s) = \sum_{k=0}^{\infty} p_k s^k$, where $p_k = P\{X_m = k\}$, m = 1, 2, ..., k = 0, 1, 2, .... Let S₀ = 0 and $S_n = X_1 + X_2 + \cdots + X_n$ (n = 1, 2, ...) be the cumulative sums of the X's. Then, the generating function of S_n has the form
$$P_{S_n}(s) = Es^{S_n} = Es^{X_1+\cdots+X_n} = Es^{X_1}Es^{X_2}\cdots Es^{X_n} = P^n(s), \qquad n = 0, 1, \ldots.$$
Consider now an integer-valued random variable N taking on values 0, 1, 2, ... with probabilities
$$q_n = P\{N = n\}, \qquad n = 0, 1, \ldots.$$
Suppose that N is independent of X₁, X₂, .... Further, suppose that $Q(s) = \sum_{n=0}^{\infty} q_n s^n$ is the generating function of N. Let us introduce now a new random variable, Y = S_N. The distribution of Y is clearly a mixture of the distributions of S_n taken with probabilities q_n. Let us find the probabilities $r_m = P\{Y = m\}$, m = 0, 1, .... Due to the theorem of total probability, we readily have
$$r_m = P\{Y = m\} = P\{S_N = m\} = \sum_{n=0}^{\infty}P\{S_N = m \mid N = n\}P\{N = n\}. \tag{9.34}$$
Since the random variables $S_n = X_1 + \cdots + X_n$ and N are independent,
$$P\{S_n = m \mid N = n\} = P\{S_n = m\};$$
and hence we may write (9.34) as
$$r_m = \sum_{n=0}^{\infty}P\{S_n = m\}\,q_n, \qquad m = 0, 1, \ldots. \tag{9.35}$$
Then, the generating function of Y has the form
$$R(s) = \sum_{m=0}^{\infty}r_m s^m = \sum_{m=0}^{\infty}s^m\sum_{n=0}^{\infty}P\{S_n = m\}q_n = \sum_{n=0}^{\infty}q_n\sum_{m=0}^{\infty}P\{S_n = m\}s^m. \tag{9.36}$$
We note that the sum
$$\sum_{m=0}^{\infty}P\{S_n = m\}s^m$$
is simply the generating function of S_n and, as noted earlier, it is equal to $P^n(s)$. This enables us to simplify the RHS of (9.36) and write
$$R(s) = \sum_{n=0}^{\infty}q_nP^n(s) = Q(P(s)). \tag{9.37}$$
Relation (9.37) gives the generating function of Y, which provides a way to find the probabilities r₀, r₁, ....
Suppose that we take V_n ~ B(n, p), and we want to find the distribution of Y = V_N, where N is a Poisson π(λ) random variable, which is independent of V₁, V₂, .... We can then apply relation (9.37) to find the generating function of Y. In fact, for any n = 1, 2, ..., due to the properties of the binomial distribution, we have
$$V_n \stackrel{d}{=} X_1 + X_2 + \cdots + X_n,$$
where X₁, X₂, ... are independent Bernoulli Be(p) random variables. Therefore, we can consider Y as a sum of N independent Bernoulli random variables,
$$Y \stackrel{d}{=} X_1 + X_2 + \cdots + X_N. \tag{9.38}$$
In this case,
$$P(s) = Es^{X_k} = 1 - p + ps, \qquad k = 1, 2, \ldots,$$
and therefore,
$$Q(s) = Es^N = e^{\lambda(s-1)}.$$
Hence, we readily obtain from (9.37) that
$$R(s) = Es^Y = Q(P(s)) = \exp\{\lambda(ps - p)\} = \exp\{\lambda p(s-1)\}. \tag{9.39}$$
Clearly, the RHS of (9.39) is the generating function of the Poisson π(λp) random variable, so we have
$$P\{Y = m\} = \frac{e^{-\lambda p}(\lambda p)^m}{m!}\,, \qquad m = 0, 1, 2, \ldots.$$
Exercise 9.4 For any n = 1 , 2 , . . ., let the random variable X , take on values 0 . 1 , . . . , n - 1 with equal probabilities l / n , and let N be a randorn variable independent of X ’ s having a Poisson .(A) distribution. Then, find the generating fiiriction of Y = X N arid hericc P{Y = 0).
More generally, when ATis distributed a,s Poisson with parameter X so that Q(.) = ex(”-’)
[see (9.2)],(9.37) becomes
qs)= e A { l Y s ) - l - l .
This t,hen is the generating function of the distribution of the sum of a Poisson number of i.i.d. random variables with generating function P ( s ) . The corrcspondiiig distributions are called Poisson-stopped-sum distributionq a na,nie introduced by Godainbe and Patil (1975) and adopted since then by a nurnber of authors including Doug1a.s (1980) a.nd Johnson, Kotz, and Kemp (1992). Some other na,mes such a.s generalized Poisson [Feller (1943)], stuttering Poisson [Kemp (1967)] and compound Poisson [Feller (1968)l have also been i i s e d for these distributions.
Exercise 9.5 Show that the negative binomial distribution in (7.2) is a Poisson-stopped-sum distribution with P(s) being the generating function of a logarithmic distribution with mass function
$$P\{X = k\} = \frac{-p^k}{k\,\log(1-p)}\,, \qquad k = 1, 2, \ldots.$$
9.12
Rao-Rubin Characterization
In this section we present (without proof) the following celebrated Rao-Rubin characterization of the Poisson distribution [Rao and Rubin (1964)]. If X is a discrete random variable taking on only nonnegative integral values and the conditional distribution of Y, given X = x, is binomial B(x, p) (where p does not depend on x), then the distribution of X is Poisson if and only if
$$P(Y = y \mid Y = X) = P(Y = y \mid Y \neq X). \tag{9.40}$$
An interesting physical interpretation of this result was given by Rao (1965) wherein X represents a naturally occurring quantity with some of its components not being counted (or destroyed) when it is observed, and Y represents the value remaining (that is, the components of X which are actually counted) after this destructive process. In other words, suppose that X is the original observation having a Poisson π(λ) distribution, and the probability that the original observation n gets reduced to x due to the destructive process is
$$P\{Y = x \mid X = n\} = \binom{n}{x}p^x(1-p)^{n-x}, \qquad x = 0, 1, \ldots, n. \tag{9.41}$$
Now, if Y represents the resulting random variable, then
$$P(Y = y) = P(Y = y \mid \text{destroyed}) = P(Y = y \mid \text{not destroyed}) = \frac{e^{-p\lambda}(p\lambda)^y}{y!}\,; \tag{9.42}$$
furthermore, the condition in (9.42) also characterizes the Poisson distribution. Interestingly, Srivastava and Srivastava (1970) established that if the original observations follow a Poisson distribution and if the condition in (9.42) is satisfied, then the destructive process has to be binomial, as in (9.41). The Rao-Rubin characterization result above generated a lot of interest, which resulted in a number of different variations, extensions, and generalizations.
9.13
Generalized Poisson Distribution
Let X be a random variable defined over the nonnegative integers with its probability function as
$$p_m(\theta, \lambda) = P\{X = m\} = \frac{\theta(\theta + m\lambda)^{m-1}e^{-\theta - m\lambda}}{m!}\,, \qquad m = 0, 1, 2, \ldots, \tag{9.43}$$
where θ > 0, max(−1, −θ/ℓ) < λ ≤ 1, and ℓ (≥ 4) is the largest positive integer for which θ + ℓλ > 0 when λ is negative. Then, X is said to have the generalized Poisson (GPD) distribution. A book-length account of generalized Poisson distributions, discussing in great detail their various properties and applications, is available and is due to Consul (1989). This distribution, also known as the Lagrangian-Poisson distribution, is a Poisson-stopped-sum distribution. The special case when λ = αθ in (9.43) is called the restricted generalized Poisson distribution, and the probability function in this case becomes
$$p_m(\theta, \alpha) = P\{X = m\} = \frac{1}{m!}\,\theta^m(1 + m\alpha)^{m-1}e^{-\theta - m\alpha\theta}, \qquad m = 0, 1, \ldots, \tag{9.44}$$
where max(−1/θ, −1/4) < α < 1/θ.

Exercise 9.6 For the restricted generalized Poisson distribution in (9.44), show that the mean and variance are given by
$$\frac{\theta}{1 - \alpha\theta} \qquad \text{and} \qquad \frac{\theta}{(1 - \alpha\theta)^3}\,, \tag{9.45}$$
respectively.
CHAPTER 10
MISCELLANEA
10.1
Introduction
In the last eight chapters we have seen the most popular and commonly encountered discrete distributions. There are a few natural generalizations of some of these distributions which are of interest not only due to their mathematical niceties but also due to their interesting probabilistic basis. These are described briefly in this chapter, and their basic properties are also presented.
10.2
Pólya Distribution
Pólya (1930) suggested this distribution in the context of the following combinatorial problem; see also Eggenberger and Pólya (1923). There is an urn with b black and r red balls. A ball is drawn at random, after which c + 1 (where c ≥ −1) new balls of the same color are added to the urn. We repeat this process successively n times. Let X_n be the total number of black balls observed in these n draws. It can easily be shown that X_n takes on values 0, 1, ..., n with the following probabilities:
$$p_k = \binom{n}{k}\frac{b(b+c)\cdots\{b+(k-1)c\}\;r(r+c)\cdots\{r+(n-k-1)c\}}{(b+r)(b+r+c)(b+r+2c)\cdots\{b+r+(n-1)c\}}\,, \tag{10.1}$$
where k = 0, 1, ..., n and n = 1, 2, .... Let us now denote
$$p = \frac{b}{b+r}\,, \qquad q = 1 - p = \frac{r}{b+r}\,, \qquad \alpha = \frac{c}{b+r}\,. \tag{10.2}$$
Then, the probability expression in (10.1) can be rewritten in the form
$$p_k = \binom{n}{k}\frac{p(p+\alpha)\cdots\{p+(k-1)\alpha\}\;q(q+\alpha)\cdots\{q+(n-k-1)\alpha\}}{(1+\alpha)(1+2\alpha)\cdots\{1+(n-1)\alpha\}}\,, \tag{10.3}$$
wherein we can forget that p, q, and α are quotients of integers and suppose only that 0 < p, q < 1, and α > −1/(n−1). Note that if α > 0, then (10.3),
for 1 < k < n, can be expressed in terms of complete beta functions as
$$p_k = \binom{n}{k}\frac{B(p/\alpha + k,\; q/\alpha + n - k)}{B(p/\alpha,\; q/\alpha)}\,, \tag{10.4}$$
where the complete beta function is given by
$$B(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt.$$
The distribution in (10.3) is called the Pólya distribution with parameters n, p, and α, where n = 1, 2, ..., 0 < p < 1, and α > −1/(n−1). If α = 0, then the probabilities in (10.3) simply become the binomial B(n, p) probabilities. As a matter of fact, in the Pólya urn scheme, we see that this case corresponds to the number of black balls in a sample of size n with replacement from the urn, in which the ratio of black balls equals p. Next, we know from Chapter 8 that a hypergeometric random variable can be interpreted as the number of black balls in a sample of size n without replacement from the urn. Clearly, this corresponds to the Pólya urn scheme with c = −1, that is, α = −1/(b + r). Thus, Pólya distributions include binomial as well as hypergeometric distributions as special cases, and hence may be considered as a "bridge" between these two types of distributions. Note also that if a random variable X has distribution (10.4), then
$$EX = np \tag{10.5}$$
and
$$\mathrm{Var}\,X = np(1-p)\,\frac{1+\alpha n}{1+\alpha}\,. \tag{10.6}$$
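The closed form (10.1) can be checked against a direct simulation of the urn scheme; the sketch below is illustrative only (parameter values and helper names are not from the book).

```python
# Illustrative check of (10.1): simulate the Polya urn (each draw adds c balls
# of the drawn color, net of the ball removed) and compare frequencies of
# X_n = k with the closed-form probability.
import numpy as np
from math import comb

def polya_pmf(k, n, b, r, c):
    num = 1.0
    for i in range(k):
        num *= b + i * c
    for j in range(n - k):
        num *= r + j * c
    den = 1.0
    for i in range(n):
        den *= b + r + i * c
    return comb(n, k) * num / den

b, r, c, n, trials = 3, 4, 2, 5, 100_000
rng = np.random.default_rng(4)
counts = np.zeros(n + 1)
for _ in range(trials):
    bb, rr, black = b, r, 0
    for _ in range(n):
        if rng.random() < bb / (bb + rr):
            black += 1
            bb += c
        else:
            rr += c
    counts[black] += 1

for k in range(n + 1):
    print(k, round(counts[k] / trials, 3), round(polya_pmf(k, n, b, r, c), 3))
```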
10.3
Pascal Distribution
From Chapter 7 we know that the negative binomial NB(m, p) distribution with integer parameter m is the same as the distribution of the sum of m independent geometrically distributed G(p) random variables. If X ~ NB(m, p), then
$$X \stackrel{d}{=} X_1 + X_2 + \cdots + X_m, \qquad X_i \sim G(p) \ \text{independent}. \tag{10.7}$$
The appearance of this class of distributions in this manner was, therefore, quite natural. Later, the family of distributions (10.7) was enlarged to the family of negative binomial distributions NB(α, p) for any positive parameter α. Negative binomial distributions with integer parameter α are sometimes (especially if we want to emphasize that α is an integer) called the Pascal distributions.
10.4
Negative Hypergeometric Distribution
The Pascal distribution (i.e., the negative binomial distribution with integer parameter α) has the following "urn and balls" interpretation. In an urn that contains black and red balls, suppose that the proportion of black balls is p. Balls are drawn from the urn with replacement, which means that we have the Pólya urn scheme with c = 0. The sampling process is considered to be completed when we get the mth red ball (i.e., the mth "failure"). Then, the number of black balls (i.e., the number of "successes") in our random sample has the distribution given in (10.7). Let us now consider an urn having r red and b black balls. Balls are drawn without replacement (i.e., the Pólya urn scheme with c = −1). The sampling process is once again considered to be completed when we get the mth red ball. Let X be the number of black balls in our sample. Then, it is not difficult to see that X takes on values 0, 1, ..., b with probabilities
$$P\{X = k\} = \binom{m+k-1}{k}\binom{r+b-m-k}{b-k}\Big/\binom{r+b}{b}\,, \qquad k = 0, 1, \ldots, b. \tag{10.8}$$
Such a random variable X is said to have the negative hypergeometric distribution with parameters b = 1, 2, ..., r = 1, 2, ..., and m = 1, 2, ..., r.

Exercise 10.1 Show that the mean and variance of the random variable above are given by
$$EX = \frac{mb}{r+1} \tag{10.9}$$
and
$$\mathrm{Var}\,X = \frac{m(b+r+1)b(r-m+1)}{(r+1)^2(r+2)}\,, \tag{10.10}$$
respectively.
Part II
CONTINUOUS DISTRIBUTIONS
CHAPTER 11
UNIFORM DISTRIBUTION
11.1
Introduction
The uniform distribution is the simplest of all continuous distributions, yet it is one of the most important distributions in continuous distribution theory. As shown in Chapter 2, the uniform distribution arises naturally as a limiting form of discrete uniform distributions.
11.2
Notations
We say that the random variable X has the standard uniform distribution if its pdf is of the form 1 0
ifO
(11.1)
ifx<0 if O j x l l if z > 1.
(11.2)
The corresponding cdf is given by 0 z
1
We use the notation X U ( 0 , l ) for the standard uniform distribution, concentrated on the interval [0,1]. As we well know, the linear transformation Y = a + h X , h > 0, preserves the type of distributions and gives us a new random variable Y , which takes on values in the interval [a,a h ] . It is easy to see that N
+
FY(Z)
=
P{Y 5 }.
= P{a
+ hX
5 x}
(11.3)
107
UNIFORM DISTRIBUTION
108
and the corresponding pdf is of the form (11.4)
For h = b - a ( b > a ) , we obtain the distribution with pdf ifa<x
(11.5)
otherwise, and cdf
(11.6) We denote this distribution by U ( a ,b ) , and call it the uniform U ( a ,b) distribution. Some authors call it the rectnngular distribution.
Moments
11.3
Since the random variable X exist.
-
U ( a ,b) has a finite support, all its moments
-
Moments about zero: If X U ( a ,b ) , then
-
bnfl
+
- an+l
(n l)(b
~
(11.7)
a)
and, in pa.rticular, we have the first four moments as ~1
=
a+b EX=----
a2
=
EX2=
ag = C Y ~
=
2
a2 +
' ab
(11.8)
+ b2
+
(11.9)
>
3 a' a2b + ab2 b3 EX3= , 4 a4 + a3b + a2b2 + ab3 EX4= 5
+
(11.10)
+ b4
(11.11)
respectively. From (11.7), we also note that (11.12)
MOMENTS
109
for the standard uniform U ( 0 , l ) distribution.
Central moments: For any n = 1 , 2 , .. . , we obtain from (11.7) that ijjn
=
E(X
-
EX)"
=
b-a
en dv -
( b - a)" ( 1 - (-1)"+1} ( n 1) 2n+l
+
(11.13)
Thus, we simply obtain for any n = 1 , 2 , . . . , P2n-1 P2n
,
=
0
=
(b( 2 n + 1) 4"'
(11.14)
From (11.14), we readily have the variance of X t o be Var X
= pz =
and, in particula.r, Var X
~
(b- u ) ~ 12 '
(11.15)
1 12
=-
for the standard uniform U ( 0 , l ) distribution.
Exercise 11.1 Let X and Y be independent random variables with uniform U ( 0 , l ) distribution. Then, find the cumulative distribution function, probability density function, and expectation of the random variable Z = X Y .
Shape characteristics: From (11.14), we note that the uniform U ( a ,b) distribution has its Pearson coefficient of skewness y1 t o be zero. In fact, in this case, the distribution is also symmetric about the mean ( a + b ) / 2 . Furthermore, we obtain the coefficient of kurtosis as (11.16) Hence, the uniform U ( a ,b) distribution is symmetric platykurtic for any choice of a a.nd b.
UNIFORM DISTRIBUTION
110
11.4
Entropy
Recall from Chapter 1 that if a random variable X has pdf p ( z ) , then its entropy is defined as (11.17)
H(X)= -
11 (A) log
b-a
dz = log(b - a )
(11.18)
Remark 11.1 It is of interest t o note that among all the absolutely continuous distributions having support [a,b], the maximal value of the entropy is attained by the uniform U ( a ,b ) distribution.
11.5 If X
N
Characteristic Function U ( a ,b ) , then its characteristic function has the form
,itb -
-
,its
( b - a)it
(11.19) '
In particular, we have (11.20)
for the standard uniform U ( 0 , l ) distribution, and ,it
-
,-it
-
f x ( t )=
sin t
~
(11.21)
t
for the symniet,ric uniform U (-1,1) distribution.
11.6
Convolutions
Let XI and X 2 be independent random variables with pdf's p l (z) and p z ( z ) , respectively. To find the pdf p y ( x ) of the sum Y = X I Xa, we may use the following equalities:
s,
+
c4.
PY (z)
=
PXl
(J
-
YIP& ( 9 ) 4 4
P c c
(11.22)
DECOMPOSITIONS
111
Exercise 11.2 Suppose that X I U ( 0 , l ) and X2 are independent. Show then that tJhepdf of Y = X I N
U ( 0 ,l ) , and that they
+ X z is given by
N
if O < n : < l if 1 < z < 2 otherwise.
(11.23)
Remark 11.2 The distribution with pdf (11.23) is called the triangular distribution, and the shape of p y ( z ) gives the obvious reason for this name.
+
+ +
The pdf of the sum Y, = X I X z . . . X , of independent U ( 0 , l ) distributed random variables X I , X z , . . . X , has the following more complicated form: (11.24)
+
if k 5 x 5 k 1 for k = 0 , 1 , . . . , n - 1, and 0 otherwise. Indeed, for the case when ri = 2, (11.24) reduces to the simple expression in (11.23).
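The triangular pdf (11.23) and its general form (11.24) can be checked by simulation, as in the following illustrative sketch (not from the book).

```python
# Illustrative check of (11.23)/(11.24): compare a histogram-based density
# estimate of X1 + ... + Xn, for independent U(0,1) variables, with the
# closed-form Irwin-Hall density.
import numpy as np
from math import comb, factorial

def sum_uniform_pdf(x, n):
    k = int(np.floor(x))
    terms = sum((-1) ** j * comb(n, j) * (x - j) ** (n - 1) for j in range(k + 1))
    return terms / factorial(n - 1)

rng = np.random.default_rng(5)
n = 3
y = rng.uniform(size=(500_000, n)).sum(axis=1)
for x in (0.5, 1.0, 1.5, 2.5):
    emp = (np.abs(y - x) < 0.01).mean() / 0.02    # density estimate near x
    print(x, round(emp, 3), round(sum_uniform_pdf(x, n), 3))
```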
11.7 Decompositions
-
i),
Let X N U ( 0 ,l ) ,V U ( 0 , and Y N B e ( $ ) ,where V and Y are independent random variables. It then turns out that d Y X=V+-.
(11.25)
2
In fact, the characteristic functions of X , V , and Y/2 have the following form:
fx(t)
=
fv(t)
=
1 -(e it
2
it
-
-
l), l),
1
+ 1) f s ( t ) = -(8 2 Since
fx ( t )= fv(t)fs( t ) ,
relation (11.25) follows immediately. Developing this idea further, we can use the equality
(11.26)
UNIFORM DISTRIBUTION
112
t o show that for m y n = 1 , 2 , . . . , Yl 2
d
X=V,+-++..+-,
-
Yn 2"
(11.27)
where X U ( 0 ,I ) , V, N U ( 0 ,2 ~ - n ) ,and Yl,Y,, . . . all have Bernoulli B e ( $ ) distribution, a.iid all the random variables on the R.HS of (11.27) are independent. We thus see that X U ( 0 , l ) can be expressed as a sum of a.ny number of indepeiiderit random variables. Yet its distribution is not infinitely divisible, since X has a finite support.
-
11.8
Probability Integral Transform
Consider a random variable Y ha.ving a continuous cdf F . Let G be the inverse of F , defined as
G ( u )= inf{z : F ( z ) 2 u}, 0 < u < 1.
(11.28)
Notc t,lia.t F ( G ( z ) )= z for 0 < z < 1. Lct us now introduce a new random variable,
x =F(Y). It is then clear tha.t 0 5 X 5 1. Also, it is ea.sy t o find the cdf of X as FX(2)
= =
P { X 5 z} = P { F ( Y )5 z} P { Y 5 G ( z ) }= F ( G ( z ) )= z for 0 < z < 1.
This simply means t h a t X U ( 0 , l ) . Hence, for any random variable Y with a. continuous cdf F , the transformation X = F ( Y ) results in a standard uniform U ( 0 ,1) random varia.ble. On the other hand, let F be any distribution function and let G be its inverse a.s defined in (11.28). Then, it turns out that the transformation Y = G ( X ) , where X U ( 0 ,l), will yield a random varia.ble Y with the prefixed cdf E'. N
-
Remark 11.3 This property is very useful in simulating samples from any coritiiiuoiis distribution F , as the problem can be reduced t o siinulation from sta.nda.rd uniform U ( 0 ,1) distribution. Thus, with the a.id of an eficicrit uriiform random number generator and a method t o determine G, we can easily coiist>ructan algorithm for simula.ting samples from F .
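A minimal sketch of the method described in Remark 11.3 is given below (not from the book); the exponential distribution is used only as an example of a target cdf F with an explicit inverse G.

```python
# Inverse transform sampling: Y = G(X) with X ~ U(0,1) has cdf F.  Here
# F(y) = 1 - exp(-y) (standard exponential), so G(u) = -log(1 - u).
import numpy as np

rng = np.random.default_rng(6)
u = rng.uniform(size=200_000)
y = -np.log(1.0 - u)          # G(u) for the exponential distribution

print(y.mean(), y.var())      # both close to 1, as for a standard exponential
```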
11.9
Distributions of Minima and Maxima
Consider a srquencr of indrpendent random variables XI, Xz, . . . having the standard uniform U ( 0 ,1) distribution. Let
rn,
= min(X1, X,, . . . , X,}
(11.29)
DISTRIBUTIONS OF MINIMA AND MAXIMA
113
and
M,
n = 1,2,....
= max{X1,X2,. . . ,X,},
(11.30)
It is then easy to see that for 0 5 x 5 1,
G,(z)
> z}
=
P{m, 5
=
l-P(X1 >z, ...,X , > X } 1- (1-z)"
=
Z} =
1 - P{m,
(11.31)
and
H n ( z )= P{Mn I X } Note that
= P(X1
1
1
EM,
=
5
2,.. .
, X, 5 z> = zn.
n zd(z") = n+l '
(11.32)
(11.33) (11.34)
and 1- E M n
1
+1+ O
= Em, = ___ 71
asn-tm.
(11.35)
This shows that for large n, 1- Mn and m, are close to zero. To examine the behavior of m, a.nd 1- Mn for large n, we will now determine the asymptotic distributions of these ra.ndom variables, after suitable normalization. It turns out that
Gn(:)
":I
=
P{m,< n =P{~zm,<x}
-
1 - (1 -
37''
1' 1 -
e P
for z
> 0,
(11.36)
as n t 00. Thus, we have established that the asymptotic distribution function of n min(X1, X z , . . . , X,) is given by
G ( x )=
1 - e-"
ifx
(11.37)
This is the exponential distribution, which is discussed in detail in Chapter 18.
UNIFORM DISTRIBUTION
114
Exercise 11.3 Show that
P { n ( M , - 1) < x}
+ e5
for x
<0
(11.38)
asn+w.
Remark 11.4 Consider the distribution functions
,
O < z < n , n=1,2, ...
which we encoiintered in (11.36). Following Proctor (1987), we can say that they form the family of generalized uniform distributions as the sequence R l ( x ) ,R2(x), . . . forms a. “bridge” between the uniform distribution [ R ~ ( I= L’) IC, 0 5 x 5 11 and t,he exponential distribution [R,(Ic) = limn+coRn(z) = 1 - e-z, x 2 01.
11.10
Order Statistics
Let us consider n independent standard uniform U ( 0 , l ) random variables X I , X2,. . . , X r L . In Section 11.9 we investigated the maximal and minimal values of X I , . . . , X,. We can similarly consider other values among the ordering of XI, . . . , X,. A result of such an ordering gives the order statistics (in this case, uniform order statistics)
In pa.rticiilar, =
and
min(X1,. . . , X,}
X,,n = rnax(X1,. . . , X , } ,
which were denoted earlier by m, and Mn, respectively. Let, F k , J x ) = P { X k , , 5 x}
for 0
5 IC 5
1, 1
n,
71 =
1,2,... ,
denote the ctlf of the order statistic Xk,,. Simple probability arguments yield an exprcssion for Fk,n(x). In fact, the event A t ; = {Xk,, 5 x} coincides with thc cvcrit &(.I-), where
u:==,
BT(x) = {exactly r of the variables X I , . . . , X , are at most x}. It is then easy to see that
P{B,(x)} = p
l
- x)n-T
ORDER STATISTICS
115
and, consequently, (11.40) Using the equality in (5.36),we readily obtain for 0 5 z 5 1,
=
2
(;)ql-
x y r
r=k
Note that the cdf given in (11.41) belongs t o the family of beta distributions, which is discussed in detail in Chapter 16. Upon differentiating (11.41) with respect to z,we immediately obtain the pdf of X k , , as Pk,n(x) =
n! xk-'(l( k - l ) ! ( n- k ) !
x y ,
05
From (11.42), we readily find the mean and variance of
EXk,,
=
Xk,n
5 1.
k n+l
(11.43)
+
k ( n - k 1) ( n 1)2(n 2)
+
(11.42)
as
=-
and Var X k , ,
2
+
(11.44)
Exercise 11.4 Derive the formulas presented in (11.43) and (11.44).
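As a numerical companion to Exercise 11.4, the moment formulas (11.43)-(11.44) can be verified by direct simulation of uniform order statistics (illustrative sketch, not from the book).

```python
# Illustrative check of (11.43)-(11.44) for the k-th order statistic of n
# independent U(0,1) random variables.
import numpy as np

n, k = 7, 3
rng = np.random.default_rng(7)
xkn = np.sort(rng.uniform(size=(200_000, n)), axis=1)[:, k - 1]

print(xkn.mean(), k / (n + 1))                                  # (11.43)
print(xkn.var(), k * (n - k + 1) / ((n + 1) ** 2 * (n + 2)))    # (11.44)
```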
Using similar probability arguments, we can derive the joint pdf of and X!,, (for 1 5 k < C 5 n ) . For this purpose, let
Xk,,
F k , g , , ( s , y ) = P { X l c , n I z , X e , n < ~ }for O < z < y / < l , l < k < C < n denote the joint cdf of the order statistics Xk,, and XC,,. We then observe that for 0 5 z < y 5 1, Fk,!,n (2, y)
=
-
P {at least k of X I , . . . , X , are at most z and at least C of X I , . . . , X , are at most y}
c Cr! n
s
s=t?
r=k
n! xT(y - s)s-T(l - y)"-". ( s - r - l ) ! (TL-s)! (11.45)
116
UNIFORM DISTRIBUTION
Upon using the identity that the RHS of (11.45) equals
t?-l(t,
n!
(k-l)! (t-k-l)!
-
tl)f-k-l
x (1- t2)n'-e dtz dtl, we obtain the joint density of Lsnas Fk,',n(z'y)
=
xk,,
for 0 5 x
and
( k - I)! (.(
-
n! k I)! ( n - j ) ! -
x ( t Z - tl)e-k-l(l
-
(11.46)
< y 5 1 and
/x
/yt l
tz)npedt, dt,.
15 k
<
(11.47)
Upon differentiating (11.47) with respect to y and x,we obtain the joint pdf of Xk,, a.nd Xg,, as p k , t , n ( z ,V )
=
n! z"'(y - .)!-k-1(1 ( k - l ) ! (1- k - l)! (n- e)! O<x
-
p, (11.48)
Exercise 11.5 Prove the combinatorial identity in (11.46).
From (11.48) we can show that
so that the covariance between Xk,, and X!,, is given by
c o v (Xk,n,Xe,n)
E (Xk,nXt.n) - EXk,n EXe,TL kL k ( l + 1) -___ ( n l ) ( n 2) (n k ( n -e 1) =
+
-
+
+
(n+l),(n+2)'
+
(11.49)
Fiirthermore, combining the forniiilas in (11.44) and (11.49), we obtain the correlation coefficient betwecn Xk,, and Xt,,, for 1 5 k < t 5 n as (11.50)
RELATIONSHIP WITH OTHER DISTRIBUTIONS
117
Exercise 11.6 Derive the formulas presented in (11.49) and (11.50).
Remark 11.5 Let Yl,. . . , Y, be a random sample from a continuous pdf p(y) and cdf F ( y ) , and Yl,, 5 Y Z ,5~... ~ 5 Y,,, denote the corresponding d
order shtistics. Since F ( Y , ) = X i , where X I , . . . , X , is a random sample from the standard uniform U ( 0 , l ) distribution (see Section 11.8) and that the transformation is order-preserving, we readily obtain from (11.42) the pdf of Y k , , n as
-m < x
and from (11.48) the joint pdf of Yk,, and
11.11
Ye,,
(1 5 k
< m,
(11.51)
< l? 5 n) as
Relationships with Other Distributions
We have already seen the relationship of uniform distribution to discrete uniform, triangular, exponential, and beta distributions. Now, we will show that Cauchy distributions are closely related to the uniform distribution. Let 0 = (O,O), A = (1,0), E = (0, -1) and D = ( 0 , l ) . Let the point B be chosen uniformly on the semicircle DAE meaning that 8, the angle between O E and O B , has the uniform U(-7r/2,7r/2) distribution. Let C = ( 1 , Y ) be the point of contact of the extension of the line OB on the line Y = 1. The second coordinate of C is a random variable Y which is negative if -7r/2 < 6' < 0, and is positive if 0 < 0 < n/2. Exercise 11.7 Show that the cdf and pdf of Y are given by
1 1 2 7 r
1
-m
< y < m. (11.53)
The distribution of Y in (11.53) is called the Cauchy distribution, which is discussed in detail in Cha.pter 12.
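The geometric construction above is easy to reproduce numerically; the following sketch (illustrative only, not from the book) draws the angle uniformly and compares the empirical cdf of the tangent with (11.53).

```python
# Illustrative check of (11.53): with Theta uniform on (-pi/2, pi/2),
# Y = tan(Theta) has cdf 1/2 + arctan(y)/pi (the standard Cauchy cdf).
import numpy as np

rng = np.random.default_rng(8)
theta = rng.uniform(-np.pi / 2, np.pi / 2, 400_000)
y = np.tan(theta)

for v in (-2.0, 0.0, 1.0):
    print(v, round((y <= v).mean(), 3), round(0.5 + np.arctan(v) / np.pi, 3))
```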
118
UNIFORM DISTRIBUTION
Let X be distributed as standard uniform U ( 0 , l ) . Then, Tukey (1962) studied the transformation
Y=
UXX
-
(1-
x
xp
1 U u > 0 , x f 0 , - - < Y < - for X > O ,
x-
with the transitional transformation (case A
= 0)
x
(11.54)
being
Y = l o g1-x ( Z ) , a>O
(11.55)
The distribution of Y is called as Tukey ’s lambda distribution. The case u is referred to as Tukey’s symmetric lambda distribution, in which case
Y=
XX
-
(1 - X)X
x
3
xfo,
=
1
(11.56)
and (11.57)
Exercise 11.8 If Y follows Tukey’s lambda distribution as in (11.54), show that its nth moment about zero is given by l
n E(-l)’
EYn = X” 3=0
un-’B ( x ( n- j )
+ 1,X j + 1),
where B ( . , .) denotes the complete beta function. Show, in particular, that the mean and va.riance of Y are given by
EY and
respectively.
=
a-1 X(X 1)
+
CHAPTER 12
CAUCHY DISTRIBUTION 12.1
Notations
In Chapter 11, in Exercise 11.7, we found a new distribution with pdf 1
p ( z ) = (. 1 + $) ’
-00
< x < 00,
(12.1)
and cdf
+
1 1 ta.n- 1 z, (12.2) F(z)= -00 < 2 < 00. 2 7 r A random variable X with pdf (12.1) and cdf (12.2) is said to have the standard Cauchy distribution. A linear transforrnation
Y=a+hX gives a random variable with pdf 1
PY(X) =
(x- a)’
7r
K h { l + T } and cdf 1 1 2 7 r
F ~ ( z=) - + - tan-’
h
-
{h2
+ (2
-00 -
< x < m, (12.3)
(xia). __
--o<x<m, - m < a < w , h>0. (12.4)
The random variable Y belongs to the Cauchy type of distributions, and we will denote it by Y C ( a ,h ) . N
Indeed, the random variable X C ( 0 , l ) then has the standard Cauchy distribution defined by (12.1) and (12.2). Note that X G ( 0 , l ) is expressed in terms of the uniform U ( 0 , l ) random variable U as N
N
x
N
{
tan r ( U
-
119
a)}.
(12.5)
120
12.2 Let X
CAUCHY DISTRIBUTION
-
Moments C ( 0 , l ) . Then, p x ( z )
EIXI*
-
=
1/(7rz2)as z -+
DC
or z
+ -m.
Hence,
oc)
Lot) /zl"p<(z)dx
=M
2 1, and EIXI" < 00 if Q < 1. This mea.ns that the moments of order a < 1 do exist, while the moments of order a 2 1 do not exist for Cauchy distributions. if a
12.3
Characteristic Function
Let us consider
roo
,ztx
which is the chara.cteristic function of the random variable X C(0,l). With the help of complex contour integration, it can he shown tha.t N
(12.6)
f x ( t ) = e+l.
Then the characteristic function of Y , having the Cauchy C ( u ,h ) distribution, becomes (12.7)
f Y ( t ) = e zta-hit1
12.4
-
Convolutions
-
Let U C(a1,h l ) a.nd V C(a2,hz) be independent random variables, and let Y = U V. Then the characteristic function of Y is given by
+
fY(t)
=
f u ( t ) f v ( t= ) eztal--hlltl +az)-(h1+hz) It/
-
(12.8)
1
which simply implies that
Y
eitaz--hzltl
N
C(a1
+
U'L, lL1
+ hz).
-
(12.9)
Generalizing this result, if we assume Y k C ( Q , h k ) ( k = 1 , 2 , .. . ,n ) t o be indepcndent random variables, and Y = Y1 . . . Y,, then Y C ( a ,h ) , where a = a1 . . . a7Land h = hl . . . h,. N
+
+
Exercise 12.1 Let Y k variables, and Y = (Yl C(n,h,) distribution.
+ +
N
C(a,h ) ( k
=
+ +
1 , 2 , . . . , n ) be independent ra.ndom
+ . . . + Y7,,)/n.Show then tha.t Y also has the Cauchy
E x e r c i s e 12.2 Let Y k C ( a ,h ) ( k = 1 , 2 , . . . , r i ) be independent random variables. Find the distribution of a linear conibination Y = clY1+. . . c,Y,. N
+
TRANSFORMATIONS
121
Decompositions
12.5
The result in (12.9) implies that any Cauchy distribution can be represented as a sum of two or more random variables also having the Cauchy distributions. Moreover, for any n = 1 , 2 , . . ., the Cauchy C(a,h ) distribution is expressed as a siini of n independent random variables, each of them having the Cauchy C ( a / n ,h / n ) distribution (see Exercise 12.1). This means that any Cauchy distribution is infinitely divisible.
Stable Distributions
12.6
A characteristic function f ( t ) and the corresponding dist,ribution are said to be stable if for any a1 > 0 and a2 > 0, there exist constants a > 0 and b such that
Exercise 12.3 Show that any Cauchy distribution is stable.
Transformat ions
12.7
1/X. It turns out that Y also has the standard Let X C ( 0 , l ) and Y Cauchy distribution. In fact, the pdf’s of X and Y satisfy the relationship N
Z2PY(Z) =P X ( l / Z ) .
(12.11)
From (12.11), we simply obtain
Remark 12.1 The fact that
x 2 1/X
(12.12)
does not, however, imply that X must have the Cauchy C(0,l) distribution. To see this, let. us take V to be any symmetric random variable, and X = ev. Then, since
v = -v, d
we readily have
x = ,v d ,--v
=
1/X.
CAUCHY DISTRIBUTION
122
Exercise 12.4 Use (12.5) t o show that if X also has the Cauchy C(0,l) distribution. Exercise 12.5 Let X
N
C ( a ,h) and Y
=
N
C ( 0 ,l),then V
:
l / X . Show then that
(X
~
+)
CHAPTER 13
TRIANGULAR DISTRIBUTION 13.1 Introduction In Chapter 11 we derived the pdf of the sum of two independent standard uniform U ( 0 , l ) random variables as (13.1)
Because of the triangular shape of p ( z ) , the corresponding distribution is called as a triangular distribution.
13.2
Notations
The standard triangular distribution has its pdf as
where 0 < p in (13.2) by
< 1. We will denote the random X
The linear transformation Y = a gular density functions, since
-
variable X which has its pdf as
Tr(p, 0 , l ) .
+ hX
then provides a general form of trian-
We denote this general triangular random variable by
Y-Tr(p,a,h),
0 < p < l , --oo
h>0,
124
TRIANGULAR DISTRIBUTION
which has the pdf
If p = f , the distribution of Y is called the symmetrical triangular distribution (symmet,ric around the point a h / 2 ) .
+
Moments
13.3
Let X Tr(p, 0 , l ) . All moments of X exist since it is a bounded random variable. N
Moments about zero:
(13.4) In particular, we have the following expressions for the first four moments: a1
=
a2
=
E X = -1 + P 3 ' 1+p+p2 EX2 = 6 '
(13.5) (13.6) (13.7)
a4
=
EX4=
1+P+P2+P3+p4 15
(13.8)
Central moments: From (13.5) and (13.6), we readily find the variance of X to be (13.9) Similarly, from (13.5)- (13.8)' we find the third and fourth central moments of X to be
(13.10)
(13.11)
CHARACTERISTIC FUNCTION
125
respectively.
Moments about p : For the triangular distributions, due to the form of the pdf in (13.2), it is easier to find the moments about p rather than the central moments [moments about the mean E X = (1 p)/3]. The nth moment about p is obtained from (13.2) to be
+
E ( X -p)"
=
2 { (-l)"p"+l+ (1 - p)"+l} ( n l ) ( n 2)
+
+
(13.12)
Shape characteristics: From (13.9) arid (13.10) we obtain the coefficient of skewness as
TI=--P3
p;/2
-
5(1 -
-
+P)(2 +p 2 ) 3 / 2
-
p)
(13.13)
It is then clear from (13.13) that the distribution is positively skewed for p < $ and is negatively skewed for p > $. The coefficient of skewness is 0 when p = (in this case, the distribution is in fact symmetric). From (13.9) and (13.11) we obtain the coefficient of kurtosis as
which reveals that the distribution is platykurtic for all choices of p .
13.4 Let X
N
Characteristic Function Tr(p,O, 1). Then, the characteristic function f x ( t )has the form (13.15)
In the general case when X function is given by
N
Tr(p, a,, h ) , the corresponding characteristic
(13.16) The most interesting case is Y
N
Tr
(i,-1,2),
P Y ( ~= ) max{0,1- Izl),
and the characteristic function of Y is
when the pdf of Y is (13.17)
126
TRIANGULAR DISTRIBUTION
Note that f ( t ) = (2/t) sin(t/2) is the characteristic function corresponding to the symmetric uniform U $) distribution and, hence, the triangular Tr($, -1,2) distribution is the convolution of two uniform distributions concentrated on the same interval (- ;, ;).
(-i,
Consider the following expression, which relates the pdf and the characteristic function of a certain random variable Y : (13.19) In the case when Y
N
Tr (;,-1,2),
(13.19) gives
which is equivalent, to (13.21) Setting t
=0
in (13.21), we simply obtain (13.22)
and, consequently, the nonnegative function p ( z ) = (2/7r) sin2(z/2) is the pdf of some random variable V . We can, .therefore, see from (13.21) that f ( t ) = max(0,l Iti} is indeed the characteristic function corresponding to the pdf p(x) = (2/7r)sin2(x/2). Hence, for any A > 0, the function f A ( t ) = f ( A t ) = max(0,l - Alt/} is also a characteristic function. Characteristic functions fA(t) are used to prove the Pdlya criterion, giving a sufficient condition for a real function to be a characteristic function: -
If a real even continuous function g(t) [with g(0) = 11 is convex on (0, m) and limt+m g ( t ) = p , 0 5 p 5 1, then g ( t ) is a characteristic function of some random variable.
CHAPTER 14
POWER DISTRIBUTION 14.1
Introduction
In Chapter 11 [Eq. (11.32)] we found that the distribution of maximal value M ( n ) = max{ U1, . . . , U n } of n independent uniform U ( 0 , l ) random variables U l >.. . , Un has the following cdf: &(z) = 9 ,
0 < 2 < 1.
(14.1)
Note that if U U ( 0 , l ) and X = U'/", then also X has cdf (14.1), which is a special case of the power distribution. N
14.2
Notations
The standard power distribution function has the form
0 < x < 1,
F,(2) = z a ,
Q
> 0,
(14.2)
and the corresponding pdf is pa(.)
0 < 2 < 1, Q > 0.
= ad-1,
(14.3)
In the special case when a = 1, we have the standard uniform distribution. The linear transformation gives us a general form of the power distribution. The corresponding cdf is
2-a G a ( z ) =( T ) a ,
a < z < a + h , a>0,
(14.4)
and the pdf is
ga(x) = ah-"(x
-
ay-1,
(14.5)
where Q > 0, -cc < a < 03 and h > 0 are the shape, location, and scale parameters of the power distribution, respectively. We will use the notation
x
-
P o ( a , a, h )
to denote the random variable X having cdf (14.4) and pdf (14.5).
128
POWER DISTRIBUTION
Remark 14.1 Let U have the standard uniform U ( 0 , l ) distribution, and X Po(a,a , h ) . Then, N
X
14.3
d
a
+ hU'l*.
(14.6)
Distributions of Maximal Values
Let X I , X 2 , . . . , X , be independent random variables having the same power Po(cu,a , h ) distribution, and M , = max(X1, X 2 , . . . , X,}. It is then easy to see that,
P { M , 5 x} m,
=
(xha)"'L, __
M,
N
a<x
Po(an,a , h ) .
Thus, the power distributions are closed under maxima. Exercise 14.1 Let X I , X z , . . . X, be a random sample from the standard power Po(cu,O:1) distribution] and let XI.,^ < X2,n < . . . < X,,, be the corresponding order statistics. Then, from the pdf of X k , , ( k = 1 , 2 . . . , n) given by pk,n(r) =
n! {F,(x)}k-l(1 ( k - l)! ( n- k ) !
-
F * ( r c ) } n - k p , ( 2 ) , 0 < x < 1,
where F,(x) and pN(x) are as in (14.2) and (14.3), derive the following formulas: and
Exercise 14.2 Let X I ,X 2 , . . . , X,, be a random sample from the standard power Po(a,O, 1) distribution, and let XI., < Xz,, < . . . < X,,,, be the corresponding order statistics. Further, let x2,n Xn - l, , w,= -,xX1,n w,= , .. . , w,-1= , wn = Xn,,. 2,n x3,n xn,n _^__
Prove that W ,, W 2 , . . . , W, are independent random variables with tribukd as P o ( k a ,0 , l ) .
Wk
dis-
Exercise 14.3 By making use of the distributional result above, derive the expressions for EXk., and E(X,&) presented in Exercise 14.1.
MOMENTS
14.4
129
Moments
The relation in (14.6) enables us t o express moments of the power distribution in terms of moments of the uniform distribution.
Moments about zero: Let X N Po(a,a , h ) . Then,
=
5(L)
hkanPk.
(14.7)
In particular, we have
a1 = E X = a
rLu: +cut1
(14.8)
and =EX2
2aha h2a = a 2 + -+ ___ u:+l a+2'
(14.9)
From (14.8) and (14.9), we also obtain the variance of X as (14.10) Plots of power density function are presented in Figure 14.1 for some choices of a.
Exercise 14.4 Froni (14.7), derive the expressions for the coefficients of skewness and kurtosis (71 and 7 2 ) . Exercise 14.5 From the expression of y1 derived in Exercise 14.1, discuss the nature of the skewness of the power distribution.
POWER DISTRIBUTION
130
Power(0 5) Power(2) Power(4)
I
I
I
I
I
I
00
02
0.4
06
08
10
X
Figure 14.1. Plots of power density function
131
CHARACTERISTIC FUNCTION
14.5
Entropy
Recall from Section 1.5 that the entropy H ( X ) of a random variable having a pdf p(x) is defined as
H ( X )= where
P(X)logp(x) dx,
D = {X : p(x) > 0).
Exercise 14.6 For X
Po(a,a, h), show that
N
H ( X ) = logh
14.6
L -
loga
+1
-
1
(14.11)
-
a
Characteristic F'unct ion
Let X follow the standard power distribution P o ( a ,0 , l ) . Then, the corresponding characteristic function has t h e form
fa(t)
=
-
a/
1 0
O0
k=O
00
eitXxa-' dx
=a
x
dx
k=O
(it)k
(k+a)'
(14.12)
The RHS of (14.12) can be simplified for integer a. For example,
( 14.13) is the characteristic function of the standard uniform U ( 0 , l ) random variable. It is also easy t o check that the function fn(t) satisfies the following recurrence relation: (14.14)
POWER DISTRIBUTION
132
Exercise 14.7 Prove (14.14) and then show that for any n = 1 , 2 , . . . the following expression for f n ( t ) is true: n--1
C (-it)k k ! + ( - l ) n n ! (it)-".
f n ( t )= (-l)"+'n!
(it)-ne2t
~
(14.15)
k=O
In particular, deduce that
f2(t) =
2eit
(1 - it
- e-if)
(14.16)
t2
and
(14.17)
Since we know the characteristic function of the standard power distribution, it will be possible t o write the characteristic function of the general power distribution. In fact, the relation
Y where X
N
Po(a,0, I ) and Y
N
d
=a
+hX,
Po(a,a , h ) , readily implies that
f y ( t )= Ee itY
-
eiatfa(ht),
with f C Y ( t = ) EeZtXas given in (14.12) for any a integer values of a .
> 0, and as in (13.15) for
CHAPTER 15
PARETO DISTRIBUTION 15.1
Introduction
The standard power distribution Po(a,0, l), N > 0, which we discussed in Chapter 14, was noted earlier [see (14.6)] to be the same as the distribution of U1/"(for QI > 0), where U is the standard uniform U ( 0 , l )random variable. Let us now consider the distribution of U1/" for negative values of Q. Let
x 2 u-'/",
N
> 0.
We can then see that the corresponding cdf and pdf are given by
F,(x)= P{U-'/" < X}
= P{V
> X-"}
=
1 - X-",
x >_ 1,
(15.1)
and (15.2)
A book-length account of Pareto distributions, discussing in great detail their various properties and applica.tions, is available [Arnold (1983)].
15.2
Notations
A random variable X with cdf (15.1) and pdf (15.2) is said t o have the standard Pareto distribution. A linear transformation Y=a+hJ gives a general form of Pareto distributions with pdf
ah"
if z > a + h if z < a + h ,
1 0
and cdf
FY(z) = 1-
(")
2-a
a
,
x >_ a t h , 133
--M
< a < 00. h > 0.
(15.3)
(15.4)
134
PARETO DISTRIBUTION
We will use the notation
a > o , -cw
YNPa(a,a,h),
h>0,
t o denote the random variable Y having the Pareto distribution with pdf (15.3) and cdf (15.4). Then,
x
N
a > 0,
Pa(a,O,l ) ,
corresponds to the standard Pareto distribution with pdf (15.2) and cdf (15.1). Note that if Po(a,0, l),
v
then
N
1 x =V
N
Pa(a,O,1).
It is easy t o see that if Y Pa(a,a , h ) , then the following representation of Y in terms of the standard uniform random variable U is valid: N
Y
1 a + hU-'la.
(15.5)
Exercise 15.1 Prove the relation in (15.5).
15.3
Distributions of Minimal Values
Just as the power distributions have simple form for the distributions of maximal values M , = max(X1, X,, . . . , X,}, the Pareto distributions possess convenient expressions for minimal values. Let X1 , X2, . . . , X , be independent random variables, and let
XI,
N
P u ( ~ Ia,, ,h ) ,
k = 1 , 2 , . . . , R,
i.e., these random variables have the same location and scale parameters but have different values of the shape parameter QI,. Now, let mTL = min{X1, X 2 , . . . ,X n } .
It is then ea.sy to see that
J:
where
a(n) = a1
+ a2 + . . . + an.
We siniply note from (15.6) that
rn,
N
Pa(a(n),a, h).
Thus, the Pareto distributions are closed under minima.
2 a + h,
(15.6)
DISTRIBUTIONS OF MINIMAL VALUES
135
Exercise 15.2 Let X I ,X2, . . . ,X , be a random sample from the standard Pareto Pa(a,O,1) distribution, and let X I , , < X Z , , < ... < X,,, be the corresponding order statistics. Then, from the pdf of x k , , ( k = 1 , 2 . . . , n) given by
pk,n(x) = ( I c
n! {F,(2)}k-1 (1 - F,(x)}n--kp,(2), 0 < 2 < 1; - l ) ! ( n - Ic)!
where F,(s) and pa(.) mulas:
are as in (15.1) and (15.2), derive the following for-
and
Exercise 15.3 Let X I ,X 2 , . . . ,X , be a random sample from the standard pareto Pa(a,O,1) distribution, and let X I , , < Xz,, < ... < X,,, be the corresponding order statistics. Further, let
Prove that W1,W2,.. . ,W, are independent random variables with tributed as P a ( ( n- k 1)a,0 , l ) .
+
w k
dis-
Exercise 15.4 By making use of the distributional result above, derive the expressions for EXk,n and E(XZ%,)presented in Exercise 15.2. Exercise 15.5 Let V1,Vz,. . . ,V, be a random sample from the standard power Po(a,0 , l ) distribution, and let Vl,, < VZ,,,. . . , V,,, be the corresponding order statistics. Further, let X I ,X 2 , . . . , X , be a random sample from the standard Pareto Pa(a,O,l)distribution, and let X I , , < Xz,, < . . . < X,,, be the corresponding order statistics. Then, show that d
Xk,n =
1
vn-k+ 1 ,n
for I c = 1 , 2 ,..., n.
Exercise 15.6 By using the propoerty in Exercise 15.5 along with the distributional result for order statistics from power function distribution presented before in Exercise 14.2, establish the result in Exercise 15.3.
136
PARETO DISTRIBUTION
15.4 Moments
+
Let X Pa(a,O,1) and Y = a hX P a ( a , a , h ) . Unlike the power dist,ributions which have bounded support and hence possess finite moments of all orders, the Pareto distribution Pa(cy,a , h ) takes on values in an infinite interval ( a h, m) and its moments &IP and central moments E ( Y - EY)O are finite only when ,B < a. N
N
+
-
Moments about zero: If X Pa(a,O,l ) ,then we know that X can be expressed in terms of the standard uniform random variable U as
Hence. an = E X " =EU-"I"
1
1
=
and
x-nla d x =
if n
a , = oc)
I n the general case when Y consequently, we obtain EYn
=
0,
a a-n
__
if n < a ,
(15.7)
2 a.
P a ( a , a , h ) ,the relation in (15.5) holds and =E(u
+ hU-'/O)"
(15.8)
a.nd a ,
= 00
for n
2 a. In particular, we have a1
EY
1
ha a-1
= a + __
if a > l
(15.9)
and (15.10)
ENTROPY
137
Central moments:
The relation in (15.5) can also be used t o find the central moments of a Pareto distribution, as follows:
Pn
=
E(Y - EY)" = h"E(U-l/"
-
EU-l/")"
(15.11) for n < a. From (15.11) we find the variance of Y t o be Var Y
= /32
=
a-2
(a-1)2 (15.12)
Plots of Pareto density function are presented in Figure 15.1 for some choices of CY.
15.5
Entropy
The entropy N ( Y ) of Y
N
P a ( a ,a,, h ) is defined by
where p(x) is as given in (15.3).
Exercise 15.7 If Y
-
Pu(a,a , h), show that
H ( Y ) = -(I
+ 2a)logh
-
logs
+ 1-+ o!1 . -
( 15.13)
PARETO DISTRIBUTION
138
-3
Pareto(0 5) Pareto(2) Pareto(4)
m
U
n
a
m I I
,, .
I
7
.
, \
1
.- I . - _
0
..--.__ .--.---.._.__._.....___
I
I
I
I
I
1
2
3
4
5
X
Figure 15.1. Plots of Pareto density function
CHAPTER 16
BETA DISTRIBUTION 16.1
Introduction
+
Let a stick of length 1 be broken at random into ( n 1) pieces. In other words, this means that n breaking points U l , . . . , U, of the unit interval are taken independently from the uniform U ( 0 , l ) distribution. Arranging these random coordinates U 1 , . . . , Vn in increasing order of magnitude, we obtain the uni,form order statistics
[see (11.39)], where, for instance,
U1,n = min(U1,. . . , Un} and
Un,+ = max(U1,. . . ,77,).
The cdf Fk,n(z) of
U k , n was
obtained earlier in (11.41) as
where (16.1)
denotes the incomplete beta function, and (16.2)
is the complete bet,a function. The corresponding pdf form fk,n(IC) =
n! &'(l - z)~-', ( k - l ) ! ( n- k ) !
139
fk,n(z)
of
Uk,n
0 < IC < 1.
has the (16.3)
140
BETA DISTRIBUTION
Notations
16.2
We say that the order statistic Uk,n has the beta distribution with shape parameters k and n - k 1. More generally, we say that a random variable X has the standard beta distribution with parameters p > 0 and q > 0 if its pdf is given by
+
A linear transformation Y=a+hX,
-m
h>O
gives us the general form of beta distributions with pdf
a<x
(16.5)
The random variable with pdf (16.5) is denoted by
Y
N
beta(p, q, a,h ) .
For the random variable X beta(p, q , 0 , l ) having the standard beta distribution with pdf (16.4), we will use the simplified notation N
X
N
beta(p, 4 ) .
Thc iriost important special cases of beta distributions are the uniform ( p = 1, Q = 1) and power ( p > 0, q = 1) distributions. We also make a notrt here that the special case of beta distributions with p = q = $ is known as the arcsine distribution (discussed in Chapter 17), and the case q = 1~-p , 0 < p < 1, corresponds to the generalized arcsine distribution. The uniform and arcsine distributions are symmetric (about their expectations) distributions. The same property also holds for all beta(p, p , a , h ) distributions (that is, with equal parameters p and q ) , a.nd they are symmetric about their expectation a + h/2.
Mode
16.3
If p > 1 and q > 1, then the beta(p, q ) distribution is unimodal, and its mode (the point a t which the density function p x ( z ) takes its maximal value) is at 2 =
’ ~
p+q-2
, a.nd the density function in this case is a bell-shaped curve. If
SOME TRANSFORMATIONS
141
p < 1 (or q < l),then p x ( z ) tends t o infinity when z + 0 (or when z + 1). If p < 1 and q < 1, then p x ( z ) has a U-shaped form and it tends to infinity when 2 --t 0 as well as when x -+ 1. If p = 1 and q = 1, then px(x) = 1 for all 0 < z < 1 (which, as noted earlier, is the standard uniform density). Plots of beta density function are presented in Figures 16.1-16.3 for different choices of p and q .
16.4
Some Transformations
-
It is quite ea5y to see that if X N beta(p, q ) , then 1 - X beta(q,p). Now, let us find the distribution of the random variable V = 1/X.
Exercise 16.1 Show that (16.6)
Taking 4 = 1 in (16.6), we obtain the pdf of Pareto Pa(p, 0 , l ) distribution. The density function p w ( z ) of the random variable
takes on the form 1
p w ( 2 )=
rJyV-1
(I + x ) P + 4 '
(16.7)
> O.
Distribution with pdf (16.7) is sometimes called the beta distribution of the second kind.
16.5
Moments
-
Since the beta distribution has bounded support, all its moments exist. Consider the standard beta distributed random variable X bet+, q ) . Since 0 5 X 5 1, we can conclude that
and
EIX for any a
2 0.
-
EX/" 5 1
BETA DISTRIBUTION
142
c :
LD
I
Beta(0 5 , 0 5) Beta(0 5 , 2) Beta(0 5 4)
U
I
m U
n
a
c.l
7
0
I
00
I
02
I
I
04
06 X
I
I
08
10
Figure 16.1. Plots of beta density function when p = 0.5
MOMENTS
,
f
00
02
143
04
06 X
I
I
08
10
Figure 16.2. Plots of beta density function when p
= 2.0
BETA DISTRIBUTION
144
I 00
I
I
I
I
I
02
04
06
08
10
Figure 16.3. Plots of beta density function when p = 4.0
MOMENTS
145
-
Moments about zero: Let X beta(p,q). Then
(16.8) and, in particular,
= a+
EY
hEX
hP
(16.13)
= a+ -
P+4
and
EY2
=
oZ2+ 2ahEX
+ h2EX2 (16.14)
-
Central moments: If X bet& q ) , then we readily find from (16.9) and (16.10) the variance of t o be
<
0 2 = Var
X
=E(X
~
E X ) 2 = a2
-
a:
=
P4 . ( P + d 2 ( P + 4 + 1)
(16.15)
Similarly, for the beta(p, 4, a,h ) distributed random variable Y , we find from (16.13) and (16.14) the variance of Y t o be
Var Y = h2 Var X
=
h"P4 ( P + d 2 ( P + 4 f 1)
(16.16) '
BETA DISTRIBUTION
146
-
Proceeding in a similar fashion, when X beta(p,q), we can show from (16.9) (16.12) that the third and fourth central moments of X are given by (16.17)
(16.18) respectively. ~~
~
Exercise 16.2 Derive the expressions in (16.17) and (16.18). Exercise 16.3 At, the beginning of this chapter we considered a stick of length 1, which was broken at random into (n 1) pieces, having respective lengths
+
D1
=
ul,n> 0
Dn
=
u n , n - Un-1,nr
2 =
u',n
- ul,n,..
Dn+l =
.,
Dk = u k , n - UI,-l,n,. . . >
1- un,n.
These random variables DI, ( k = 1 , 2 , .. . ,n + 1) are called spacings of the uniform order sta.tistics, or simply uniform spacings. Then, by making use of relations (16.3) and (16.9), show that,
for any 1 5 k 5 n
+ 1.
Moments of nryatave order: If X N beta(p, q ) , then it is casy to find moments about zero of negative order. In fact, the moment
E X p n = ___ exists if n
< p and is given by (16.19)
In particular, we obtain
and
SHAPE CHARACTERISTICS
16.6
-
147
Shape Characteristics
Let X beta(p,q). Then, from the expressions in (16.15), (16.17), and (16.18), we obtain Pearson's coefficients of skewness and kurtosis of X as (16.22)
respectively. From (16.22), we see immediately that the beta(p, q ) distribution is negatively skewed for q < p and positively skewed for q > p ; further, the coefficient of skewness is zero when q = p (in fact, the distribution is symmetric in this case). Also, from (16.23), we see readily that the beta(p, q ) distribution is platykurtic when p = q . More generally, the distribution is platykurtic, mesokurtic, and leptokurtic depending on whether ( P - d 2 ( P + (1 + 1) - P d P
+ Q + 2)
< 0, = 0, and > 0, respectively.
16.7 If X
N
Characteristic Function beta(p, q ) , its characteristic function is (16.24)
If p and q are integers, then the RHS of (16.24) can be written as a finite sum of terms eit/(it)kand (it)-", taken with some coefficients. For example,
if p
= 1 and q =
if p = 2 and q
=
1 [the case of the uniform U ( 0 , l ) distribution];
1 [the power P0(2,0,1) distribution], and
148
BETA DISTRIBUTION
if p
= 2 and q = 2. For noniritegral values of p and/or q , we can use the following expression of the characteristic function:
where
F(a,b;z)
=
a 1 a ( a + 1) z 2 I+- z+b 2 ! b(b+ 1) 1 4 a+ + 2) z 3 + . . 3! b(b l ) ( b 2) ~
+-
+
(16.26)
+
is the Kummer confluent hypergeometric function. If Y beta(p, q , a , h ) , then its characteristic function is simply given by N
f Y ( t ) = Ee itY
16.8
-
Eeit("+hX)
-
eiatF(p,p
+ q; i t h ) .
(16.27)
Decompositions
Since the beta distributions have bounded support, they cannot be infinitely divisible. Yet, we can give an example of a random variable X having a beta distribution which can be represented as a sum X = XI X, . . . X, of n independent ternis [see relation (11.27), which is valid for X beta(1, l)]. Indeed, in this example, the random variables XI, X2,. . . ,X , have different distributions. On the other hand, there are indecomposable beta distributions as well. For example, it is known that the beta(p, q ) distribution is indecomposable if p t-q < 2. Recently, Krysicki (1999) showed that if X beta(p, q ) , then for any n = 2,3, . . . , it can be represented in the following form:
+
+ + N
-
(16.28) where
are independent random variables. An analogous result was obtained by Johnson and Kotz (1990). They showed that if independent random variables Xo,X I .,. . have a conmion
RELATIONSHIPS WITH OTHER DISTRIBUTIONS
149
beta(p, q ) distribution, then cc
j
j=O
i=O
x 2 C(-l)jnx, has a beta(p,p
16.9
+ q ) distribution.
Relationships with Other Distributions
As we mentioned earlier, beta distributions include uniform, power, and arcsine distributions as special cases. Also, Eqs. (5.35) and (5.36) show that if V has the binomial B ( n , p ) distribution and X has the beta(m,n - m + 1) distribution, where m 5 n are integers and 0 < p < 1, then P{V 2 m } = P { X < p } .
( 16.29)
This Page Intentionally Left Blank
CHAPTER 17
ARCSINE DISTRIBUTION 17.1
Introduction
As mentioned in Chapter 16, the arcsine distribution is an important special case of the beta distribution. In this chapter we focus our attention on this special case and discuss many of its features and properties.
17.2
Notations
(i,i)
We say that a random variable X N beta has the standard arcsine distribution. It follows from (16.4) that the pdf of the standard arcsine distribution is given by
Since I?(
i)= fi,we obtain (17.1)
The distribution of X is called the arcsine distribution since its cdf has the form FX(2)=
2 -arcsin&,
O5
7r
II:
5 1.
(17.2)
The density function in (17.1) is U-shaped with p x ( z ) tending to infinity when x + 0 as well as when 2 + 1, and its minimal value of 2/7r is attained a t 2 = see Figure 16.1. Linear transformations Y = a + h X of the random variable X b'rive us a family of all arcsine distributions, which is just the family of beta(;, u, h ) distributions. Note that if
i;
i,
151
152
ARCSINE DISTRIBUTION
then its cdf is given by
Fy (x) = - arcsin
-
7r
a<x
(17.3)
If we need to use the general arcsine density in (17.3), we denote it by A S ( a ,h ) , and for the standard arcsine density in (17.1) we use the notation X AS(0,l). The importance of the arcsine distribution is due to its applications in the theory of random walks. Let us consider sums So = 0, Sk = Yl . . . Y k , lc = 1 . 2 . . . . , 7 2 , where independent random variables Y l ,Yz, . . . take on values -1 and 1 with equal probabilities; let us denote by V, the number of positive sums among So, S1,. . . , S,. Then, 2, = V,/n is simply the fraction of positive sums. It turns out tha.t as n + 00,
-
+
2
P{Z, 5 x} + -arcsin&, 7r
+
o 5 x 5 1.
Furthermore, let
T,, = min{k : SI, = max(So,S1,. . . , S,)) be the index of the maximal sum. Then, as n 4co,
P
:{
-
}
:
5 x + -arcsin&,
o5x
(17.4)
once again. Another similar example a.rises in the theory of random processes. It is well known that the standard Brownian motion W ( t )can be regarded as a limit process for random broken lines
where the sums SI, are as defined above. Let 6 be the time spent by W ( t ) , 0 5 t 5 1, on the positive half-line, a.nd 7 be a random point, on [0,1],where the nia.ximum value of W ( t ) ,0 5 t 5 1, is attained. Then
P{fi L x}
= P{T 5
x}
=
2
-
7r
arcsin
&?
o 5 2 5 1.
(17.5)
Indeed, the arcsine distribution, being a special case of the beta distribution, inherits all the properties of that beta distribution. Note, for example, that X AS(O.1) is syninietric with respect to $:
-
x =cl 1 - x .
(17.6)
153
MOMENTS
17.3
Moments
Relation (17.6) enables us to simplify expressions for the moments of X .
Moments about zero: Let X N AS(0,l). Then, it follows from (16.8) that
-
w(an)! = z ( 1 n 2n)
,
n = 1 , 2, . . . .
(17.7)
In particular, we obtain ~1
= =
Q~
=
1 2 E X 2 = -3 8 '
(17.8)
EX=-,
(17.9)
E X 4 = - 35 128
(17.11)
-
Central moments: Since X A S ( 0 , l ) is symmetric with respect t o its mean E X immediately have P2,+1 = E
(X -
a)
2n+1 =0,
=
i, we
n=1,2, ...
For central moments of even order, we have 2n
P2n
=
E(X-5)
( 17.12)
154
ARCSINE DISTRIBUTION
Comparing (17.7) and (17.12), we see that (17.13) It readily follows from (17.13) that the variance of
Var
17.4
x =E(X
-
EX)^
=
"-s' -
1
< is given by (17.14)
Shape Characteristics
As remarked earlier, the standard arcsine distribution is symmetric about its mean $ and, consequently, its coefficient of skewness is zero. Further, we deduce from (17.13) that the fourth cent,ra.lmoment of X is given by P4
=
3
and as a result,, the Pearson coefficient of kurtosis is
Thus, we observe that the arcsine distribution is a symmetric platykurtic distribution which is U-shaped.
17.5 If X
N
Characteristic Function AS(0,I), its characteristic function is given by
f x ( t )= EeitX =
(it)k
k=O
(17.15) k=O
Comparing this expression with (16.25), we see that the RHS of (17.15) can be rewritten as F ( $, 1;it), where F ( a ,b; z ) is the Kumrner confluent hypergeoniet,ric function defined earlier in (16.26).
155
CHARACTERIZATIONS
17.6
Relationships with Other Distributions
As mentioned already, arcsine distribution is a special case of the beta distribution. We may also recall that beta distributions with parameters q = 1- p , 0 < p < 1, are called generalized arcsine distributions. Indeed, the arcsine distribution is a special case ( p = q = f ) of generalized arcsine distributions. Using the probability integral transformation, we have the random variables X N A S ( 0 , l ) and U U ( 0 , l ) satisfying the relation
-
arcsin
Exercise 17.1 Show that if Y
‘Xu . Jx =d 2
-
U ( O , x ) ,then
COSY AS(-l,2). N
17.7
(17.16)
(17.17)
Characterizations
-
If Y U ( 0 ,x),it is easy to verify that cos2 U and (1+cos U)/2 have the same distribution, meaning that
( 17.18) when V
N
AS(-l,2). Note that (17.18) is equivalent to the relation 4
when X
N
(x - -
:)2
=x
(17.19)
AS(0,l).
Exercise 17.2 Show that (17.19) is valid for the standard arcsine distribution AS(0, l), comparing central moments 02n and moments about zero an ( n = 1 , 2 , . . .) given in (17.13) and (17.7), respectively.
Note that property (17.19) characterizes A S ( 0 , l ) distribution while the relation (17.18) characterizes AS( -1’2) distribution.
ARCSINE DISTRIBUTION
156
17.8
Decompositions
+
As mentioned in Chapter 16, beta(p,q) distributions with p q < 2 are indecomposable. Hence, the arcsine distribution is indecomposable; that is, we cannot find two nondegenerate independent random va.riables U and V such that U V AS(0,l). However, the following relation holds for any n = 2 , 3 , . . . [see Krysicki (1999)l:
+
N
(17.20)
where X ,,, A S ( 0 , l ) and
Xk
N
beta
("",, ', a,), k --
=
1 , 2 , . . . , n , are in-
dependent random variables. Hence, X cannot be presented as a. sum of independent variables while it can be represented as a product of independent beta random variables.
CHAPTER 18
EXPONENTIAL DISTRIBUTION 18.1 Introduction When we discussed distributions of the uniform order statistics Uk,n (1 5 k 5 n) in Chapter 11, it was shown that (as R + co)
which is the standard exponential distribution function. A book-length account of exponential distributions, discussing in great detail their various properties and applications, is available [Balakrishnan and Basu (1995)l.
18.2
Notations
We say that a random variable X has the standard exponential distribution if its cdf Fx(x)is given by (18.1)
The corresponding probability density function has the form (18.2)
+
Linear transformations Y = a AX, -m < a < co, X family of exponential distributions to those with cdf
> 0, enlarge the
(18.3)
157
158
EXPONENTIAL DISTRIBUTION
Note that F y ( 2 ) in (18.3) can be rewritten as
{
(
F y ( 2 ) = max 0 , l - exp - “ x u ) } , ~
-m<xz<.
(18.4)
The corresponding pdf is given by (18.5) In the sequel, we will use the notation Y E ( a , X ) to denote a random variable Y having the exponential distribution with location parameter a and scale parameter X > 0. In many situations, we deal with nonshifted ( a = 0) exponential distribution E(0,A). For the sake of simplicity] we will denote it by Y E ( X ) ,and in this case N
Fy(x)= max 0 , 1 - exp
{
(-
and
Note that if X
-
31
(18.6)
(18.7)
E(1),then Y
= AX
N
E(X).
E(1) and Y E(X) be independent random variExercise 18.1 Let X ables. Then, find the value of the parameter X such that P { X 2 Y } = N
i.
N
Exercise 18.2 Let X and Y be independent standard exponentia.1 random variables. Find the distribution of X / Y .
18.3 Laplace Transform and Characteristic F‘unct ion If Y form:
N
E ( X ) , then its Laplace transform py(s) = Ee-SY has the following 1 1fXs.
If V
=a
+ Y , then V
N
(18.8)
E ( u ,A), and consequently, (18.9)
MOMENTS
159
To obtain the characteristic function of an exponentially distributed ran-
dom variable, we can use the following relation between Laplace transforms
cpx(s)= EecSX and characteristic functions f x ( t )= EeitX:
fx ( t )= cpx (-it).
(18.10)
Using (18.10) and the expression for the Laplace transform of exponential distribution given above, we ca.n write the characteristic functions of X E(l),Y E(A), and V E ( a , X ) as N
N
N
(18.11)
( 18.12) (18.13) respectively.
Moments
18.4
The exponential decay of the pdf (18.5) provides the existence of all the moments of exponential distribution.
Moments about zero: If Y E ( A ) ,then N
a,=EY"
xn exp
=
(- ):
dx
z"e-" dx = Anr(n =
+ 1)
n = 1 , 2 , . .. .
Ann!,
(18.14)
In particular, we have (18.15) (18.16) ( 18.17)
EY = A, E Y 2 = 2A2, E Y 3 = 6A3, E Y 4 = 24A4 In the general case when V
EV"
= E(u
+ U)"
N
( 18.18)
E(a,A), we obtain =
(;)U"-~EY
n!xm, Xran--r
-
r=O
n = 1 , 2, . . . .
(18.19)
160
EXPONENTIAL DISTRIBUTION
Central moments:
-
Indeed, the central moments coincide for random variables V N E ( a , A ) and Y E(A). Let X have the standard exponential E(1) distribution. Then
v - EV 5 Y
-
EY
d
= X(X
-
EX),
and therefore
pn = E(V - EV)"
=
E ( Y -- A)"
=
A"E(X - 1)"
r=O
c r=O
=
Ann!
( n- r ) ! (-1)T
PI
'
N7e can show from (18.20) tjhat central moments recurrence relations: @I /?,+I
= =
0, (n
+ l)A@, + (-l)"+'Xn+',
n = 1 , 2 ,. . . . (18.20)
fin satisfy the following
TL
=
1,2,.. . .
(18.21)
In particular, we obtain froin (18.20) the variance of Y to be Va.r Y = p2 = x2.
18.5
-
( 18.22)
Shape Characteristics
Let Y E(X). Then, from (18.7), we see readily that the distribution is unimodal with the mode at 0. It has an exponential decay forni. Also, from (18.20), we obta.in the third and fourth central moments of Y as p3
= 2 3 ~ and
p4=9x4.
We then readily find Pearson's measures of skewness arid kurtosis as
respectively. Thus, the exponential distribution is a positively skewed and leptokurtic distribution which is reverse J-shaped. Plots of exponential density function are presented in Figure 18.1 for some choices of A.
SHAPE CHARACTERISTICS
161
0 r-
03 0
W
0
CL
n Q U 0
c.I 0
0
0
I
I
I
I
I
I
0
1
2
3
4
5
X
Figure 18.1. Plots of exponential density function
162
EXPONENTIAL DISTRIBUTION
18.6
Entropy
Since the random variable V H ( V ) is defined by
N
E ( a ,A) has pdf as given in (18.5)' its entropy
-
x
") dx
=1
+ log A.
(18.23)
Consider the set of all probability density functions p ( z ) satisfying the following restrictions: (a) p ( z ) > 0 for z 2 0 and p ( z ) = 0 for z < 0; z p ( z ) dx = C, where C is a positive constant. (b) It happens that the ma.xima1value of entropy for this set of pdf's is attained for
that is, for an exponential distribution with mean C
18.7
Distributions of Minima
Let Y k N E(Xk),k min(Y1,. . . , Y,}.
=
1 , 2 , . . . , n, be independent random variables, and mn =
Exercise 18.3 Prove that m, (for any n = 1 ' 2 , . . .) also has the exponential E(X) distribution, where
x =A1
+
" '
+A,.
The statement of Exercise 18.1 enables us to show that d
yl
min(Y1,. . . , Y ? }= -, n
n = 1 , 2 , . ..
( 18.24)
when Yl,Y2,.. . are independent and identically distributed as E ( X ) . It is of interest to mention that property (18.24) characterizes the exponential E(X) (A > 0) distribution.
UNIFORM AND EXPONENTIAL ORDER STATISTICS
163
Uniform and Exponential Order Stat istics
18.8
In Chapter 11 [see (11.39)] we introduced the uniform order statistics
. . 5 Un,n
U1,n L u2,n I .
arising from independent and identically distributed random variables Uk U ( 0 ,I), k = 1 , 2 , . . . ,n. The analogous ordering of independent exponential E ( l ) random variables X I , X z , . . . , X , gives us the exponential order statistics N
X1,n Note that
I X2,n 5 . . 5 Xn,n'
= min(X1,
and
X2,. . . , X n }
Xn,n = max(X1, X 2 , . . . , X n } .
There exist useful representations of uniform as well as exponential order statistics in terms of sums of independent exponential random variables. To be specific, let X I , X,, . . . be independent exponential E(1) random variables, and S k = x1
+ x,+ . + X k , '
k = 1,2, . ..
'
Then, the following relations are valid for any n = 1 , 2 , . . . : (18.25) and
Note that relation (18.24) is a special case of (18.26). The distributional relations in (18.25) and (18.26) enable us t o obtain some interesting corollaries. For example, it is easy t o see that the exponential spacings
D2,n = X2,n
D1,n = X1,ni
are independent, and that
( n- k
-
X l , n , . . . 3 Dn,n = Xn,n - X n - ~ , n
+ I)&,,
N
E(1).
It also follows from (18.26) that
-
1 1 -+n n-1
+ . - . +n - k1+ l '
1 5 k 5 n.
(18.27)
Upon setting k = n = 1 in (18.25), we obtain the following interesting result: If X1 and X , are independent exponential E(l) random variables, then Z = X , / ( X l X,) has the uniform U ( 0 , l ) distribution.
+
164
EXPONENTIAL DISTRIBUTION
Exercise 18.4 Using the representation in (18.26), show that the variance of x k , , is given by
and that the covariance of
and Xe,, is simply
Xk,,
COv(Xk,n,Xe,+)= Var
Y1
< t 5 72 .
Convolutions
18.9 Let Yl
15 k
Xk,n,
+ Y2.
N
E(A1) and
Y2
-
E(A2) be independent random variables, and V
=
Exercise 18.5 Show that the pdf pv(x) of V has the following form:
e -x/x1 PV(2) =
if A1
# Az,
__ e - x / x z
z>0
1
A 1 - A2
(18.28)
and p v ( x )=
X
z
--e-"/X, A2
20
(18.29)
if A 1 = A2 = A.
Equation (18.29) gives the distribution of the sum of two independent, raridon1 variables having the same exponential distribution. Now, let us consider the sum
v,,= XI + ' . + X, '
of n independent exponential random variables. For the sake of simplicity, we will suppose that XI, N E ( l), k = 1 , 2 , . . . , n. It follows from (18.8) that the Laplace transforni p(s) of any XI, has the form (18.30)
Then the Laplace transform p,(s) positive values, is given by p,(.)
00
=
of the sum V,, which also takes on only
e - S Z p n ( x ) dx = (1
+
S ) y ,
(18.31)
DECOMPOSITIONS
165
where p,(x) denotes the pdf of V,. Comparing (18.30) and (18.31), we can readily see that
a.nd, hence,
It then follows from (18.32) that (18.33) Probability density function in (18.33) is a special case of g a m m a distribu t i o n s , which is discussed in detail in Chapter 20.
Exercise 18.6 Let X1 and X2 be independent random variables having the standard E(1) distribution, and W = XI - X2. Show that the pdf of W has the form 1 p w ( x ) = - e-Iz1, 2
-00
< x < 00.
(18.34)
The probability density function in (18.34) corresponds to the standard Laplace distribution, which is discussed in detail in Chapter 19.
18.10 Decompositions Let X N E(1). We will now present two different ways to express X as a sum of two independent nondegenerate random variables. (1) Consider random variables V = [XI and U = { X } , which are the integer and fractional parts of X , respectively; for example, V = n and U = x if X = n x,where n = 0 , 1 , 2 , . . . and 0 5 x < 1. It is evident that
+
X=V+U.
(18.35)
Let us show that the terms on the RHS of (18.35) are independent. It is not difficult to check tha.t V takes on values 0 , 1 , 2 , . . . with probabilities p , = P{V = n } = P { n 5 X
< n + 1) = (1 - X)X",
(18.36)
166
EXPONENTIAL DISTRIBUTION where X = l/e. This simply means that V has the geometric G(l/e) distribution. We see that the fractional part of X takes values on the interval [0, 1) and 03
Fu(x)
P{U 5 z}
=
=
C P { n5 x 5 n + z }
n=O
To check the independence of random variables V and U , we must show that for any n = O , 1 , 2 , . . . and 0 5 z 5 1, the following condition holds:
P{V = n,
u 5 .}
(18.38)
= p,Fv(z).
Relation (18.38) becomes evident once we note that
P{V
= n,
U 5 x}
= P{n 5
X < n + x}
= eFn(1- e-").
Thus, we have expressed X having the standard exponential distribution as a. sum of two independent random variables. Representation (18.35) is also va.lid for Y E ( X ) ,X > 0. In the general case, when Y E ( a ,A), the following general form of (18.35) holds: N
N
Y =a
+ [Y
-
a] + {Y - a } ,
where the integer and fractional parts ([Y- a] and {Y random variable Y - a are independent.
-
a } ) of the
(2) Along with the random variable X N E(l),let us consider two independent random variables Yl and Y2 with a common pdf
(18.39) It is easy to prove that the nonnegative function g ( x ) above is really a probability density function because of the fact that
Exercise 18.7 Show that the sum Y1f yZ has the exponential E(1) distribution.
Thus, we have established that the exponential distributions are decomposable. Moreover, in Chapter 20 we show that any exponential distribution is infinitely divisible.
LACK OF MEMORY PROPERTY
18.11
167
Lack of Memory Property
In Chapter 6 [see (6.39)], we established that the geometric G ( p ) random variable X for any n = 0, 1,. . . and m = 0,1, . . . satisfies the following equality:
P { X 2 n + m I X 2 m} = P { X 2 n}. Furthermore, among all discrete distributions taking on integer values, geometric distribution is the only distribution that possesses this “lack of memory” property. Now, let us consider Y E ( X ) , X > 0. It is not difficult to see that the equality N
P{Y 2 z + y
1 Y 2 y}
= P{Y
2 z}
(18.41)
holds for any z 2 0 and y 2 0. This “lack of memory” property characterizes the exponential E(X) distribution; that is, if a nonnegative random variable Y with cdf F ( z ) [such that F ( z ) < 1 for any z > 01 satisfies (18.41), then
F ( z ) = 1 - e-+, for some positive constant A.
II:
> 0,
This Page Intentionally Left Blank
CHAPTER 19
LAPLACE DISTRIBUTION 19.1 Introduction In Chapter 18 (see Exercise 18.4) we considered the difference V = XI - X , of two independent random variables with X I and X , having the standard exponential E(1) distribution. It wa.s mentioned there that the pdf of V is of the form 1 pv(x) = - e + ~ , -co < x < 00. 2 This distribution is one of the earliest in probability theory, and it was introduced by Laplace (1774). A book-length account of Laplace distributions, discussing in great detail their various properties and applications, is available and is due to Kotz et al. (2001).
19.2
Notations
We say that a random variable X has the Laplace distribution if its pdf px (x) is given by (19.1)
-
We use X L ( a ,A) to indicate that X has the Laplace distribution in (19.1) with a 1oca.tionpa.rameter a and a scale parameter X (-co < a < 00, X > 0). In the special case when X has a symmetric distribution with the pdf (19.2)
-
we denote it by X L(X) for the sake of simplicity. For insta.nce, V denotes that V has the standard Laplace distribution with pdf 1
pv(x) = - e-I21, 2
--M
169
< x < 00,
-
L(1)
(19.3)
170
LAPLACE DISTRIBUTION
and its cdf Fv(x)has the form (19.4)
+
Indeed, if V L(1), then Y = XV L(X),and X = a XV L ( a ,A). Laplace distributions are also known under different names in the literature: the first law of Laplace (the second law of Laplace is the standard normal distribution), double exponential (the name dou.ble exponential or, sometimes, doubly exponential is also applied to the distribution having the cdf N
N
-co < x < 00,
F ( z ) = exp(-e-2),
which is one of the three types of extreme value distributions), two-tailed exponen,tial, and bilateral exponential distributions.
19.3
Characteristic Function
Recall that if V L ( 1 ) , it can be represented as V = X I - X,, where XI and X z are independent random variables and have standard exponential E( 1) distribution. Consequently, the characteristic function of V is N
f v ( t )= EeitV = EeitX1Ee-itX2 and, therefore, we obta.in from (18.11) that fV(t) =
1 1 1 t2 (1 - it)(l +it)
+
Since the linear transformation X = a distribution, we readily get
.
(19.5)
+ XV leads to the Laplace L ( a ,A) (19.6)
It is of interest t o mention here that (19.5) gives a simple way to obtain the characteristic function of the Cauchy distribution [see also (12.6) and (12.7)]. In fact, (19.5) is equivalent to the following inverse Fourier transform, which relates the characteristic function f v ( t )with the probabilit.7 density function PV (XI:
(19.7) It t,hen follows from (19.7) that (19.8) Now wt' can see from (19.8) that the Cauchy C(0.1) distribution with pdf
has characteristic function
f ( t ) = e-lt'.
MOMENTS
19.4
171
Moments
The exponential rate of the decreasing nature of the pdf in (19.1) entails the existence of all the moments of the Laplace distribution.
-
Moments about zero: Let Y L ( A ) . Since Y has a symmetric distribution, we readily have =
a2n-1
EY2"-
=o,
1
For moments of even order. we have ( ~ 2 " = EY2"
1"
n=1,2)....
(- y )dx (- T) dx
x2nexp 2A -m 1 " x2"exp
=
x.6
=
dx
x2n e -x
=
A2"Iw
=
x2" 1 ? ( 2 ~ +1) = x2" ( 2 4 4
=
i , 2 , . . . . (19.9)
In particular, we have
E Y 2 = 2A2
( 19.10)
E Y 4 = %A4.
(19.11)
and
In the general case when X = a + Y
N
L ( a ,A), we also have
n
EX"
= E(u
+ Y)" = C
an-'EYT,
n = 1,2,.. .,
(19.12)
r=O
where
EY'
=
if r = 1 , 3 , 5 , ... if T = 0 , 2 , 4 , . . . .
0 A'r!
-
Central moments: Let Y L(X) and X = a + Y L ( a ,A). It is clear that central moments of X and moments about zero of Y are the same since N
,& = E ( X - E X ) "
= E(Y
-
EY)"
= EY",
n = 1 , 2 , .. . .
From (19.13) and (19.10), we immediately find that
(19.13)
172
LAPLACE DISTRIBUTION
19.5
-
Shape Characteristics
Let Y L(X). Then, due to the symmetry of the distribution of Y , we readily have the Pearson’s coefficient of skewness as y1 = 0. Further, from (19.11), we find the fourth central moment of Y as p4
=
a4
=
a4 =
-
+
4 ~ 3 ~ 61 ~
~ -2 ~ ~C X 4:
(19.15)
From Eqs. (19.15) and (19.14), we readily find the Pearson’s coefficient of kurtosis as (19.16)
Thus, we find the Laplace distribution t o be a symmetric leptokurtic distribution.
19.6
Entropy
The entropy of the random variable X
-
J’, O
+ =
1
N
L ( a ,A) is given by
ex ( l o g ( 2 ~ )- x log e> tin:
lrr f + e-” (log(2X)
log(2Xe) = 1
3: loge}
dx
+ log(2X).
( 19.17)
Consider the set of all probability density functions p(x) satisfying the following restriction: (19.18) where C is a positive constant. Then, it has been proved that the maximal value of entropy for this set of densities is attained if p ( x ) = - exp 2c that is, for a Laplace distribution.
( ;.) -
-
,
173
CONVOLUTIONS
19.7
-
Convolutions
-
Let Y1 L(X) and Y 2 L(X)be independent random variables, W = Yl +Y2, and T = Y1- Y 2 . Since the Laplace L(X)distribution is symmetric about zero for any X > 0, it is clear that the random variables W and T have the same distribution. Exercise 19.1 Show that densities p w ( z ) and p ~ ( zof) the random variables W and T have the following form:
-
Next, let Y1 N L(X1) and Y 2 L(X2) be independent random variables each having Laplace distribution with different scale parameters. In this case, too, the random variables W = Yl +Y2 and T = Y1 -Y2 have a common distribution. To find this distribution, we must recall that characteristic functions fl(t) = EeitY1 and f ~ ( t )= EeitY2have the form
and hence the characteristic function fw(t) is given by
It follows from (19.20) that the probability density functions
of the random variables W , Yl,and YLsatisfy the relationship
Hence, we obtain
--oo
< 2 < 00.
(19.22)
174
LAPLACE DISTRIBUTION
Decompositions
19.8
Coming back to the representation V = X1 - X2,where V L ( 1 ) and independent random variables X1 and X2 have the standard exponential E ( 1 ) distribution, we easily obtain that the distribution of V is decomposable. Recall from Chapter 18 that the random variables X1 and X , are both decomposable [see, for example, (18.35) and Exercise 18.51. This simply means that there exist independent random variables Yl,Y2,Y3, and Y4 such that N
and
d
x2 = Y3
+ y4,
and hence
V = (Yl - y3) + (Y2 - Y4), d
(19.23)
where the differences Y1 - Y3 and Y2 - Y4 are independent random variables and have nondegenerate distributions. Therefore, V L( 1) is a decomposable random variable. Furthermore, a linear transformation a XV preserves this property and so the Laplace L ( a ,A) distribution is also decomposable. In addition, since exponential distributions are infinitely divisible, the representation V = X 1 - X 2 enables us to note that Laplace distributions are also infinitely divisible. N
19.9
+
Order Statistics
Let Vl, Vl, . . . , V, be independent and identically distributed as standard Laplace L(1) distribution. Further, let V1,n < Vz,, < . . . < V,,n be the corresponding order statistics. Then, using the expressions of p v ( x ) and F v ( z ) in (19.3) and (19.4), respectively, we can write the kth moment of V?,, (for 1s T 5 n) as
-
n! ( r - l ) ! ( n- T ) !
x k {Fv(x)}'-l (1 - Fv(2)}"-"pv(z) dx
--oi)
n!
-
-
/"
an(?--
n! I)! ( n- T ) !
xk
(2 - e - x } T - '
{e-x}n--r
ePx dx (19.24)
ORDER STATISTICS
175
+
Now, upon writing 2- e-" in the two integrands as 1 ( 1- ec") and expanding the corresponding terms binomially, we readily obtain
E (Vtn)
l n n! + (-1)"E 2n ( T - I ) (! n - r ) ! . z=r
-
1
c (3
r-l
2n i=O
2 k (T -
( n - i)! 1 - i)! (TL - T ) ! { I - e-x}r-l-i
i!
z=r
x { e - ~ > ~ - ' e - X dz.
(19.25)
Noting now that if X I ,X,, . . . ,X , is a random sample from a standard exponential E ( l ) distribution and that X I , , < X,,, < ... < Xm,m are the corresponding order statistics, then
(19.26) Upon using this formula for the two integrals on the RHS of (19.25), we readily obtain the following relationship:
( 19.27) Thus, the results of Section 18.8 on the standard exponential E ( l ) order statistics can readily be used to obtain the moments of order statistics from the standard Laplace L ( 1 ) distribution by means of (19.27).
176
LAPLACE DISTRIBUTION
Similarly, by considering the product moment E (V,,, K , n ) ,splitting the double integral over the range (-cc < z < y < cc) into three integrals over the ra.nge (-cc < z < y < 0), (-cc < z < 0, 0 < y < m), and (0 < z < y < oo), respectively, and proceeding similarly, we can show that the following relationship holds:
for 1 5 r < s 5 n, where, as before, E ( X k , m Land ) E ( X k , m Xe,,) denote the single and product moments of order statistics from the standard exponential E ( 1) distribution.
Exercise 19.2 Prove the relation in (19.28).
Remark 19.1 As done by Govindarajulu (1963), the approach above can be generalized to any symmetric distribution. Specifically, if &,,'s denote the order statistics froin a distribution symmetric about 0 arid Xz,7L's denote the order statistics from the corresponding folded distribution (folded about 0), then the two relationships above continue to hold.
Exercise 19.3 Let Vl,V2,. . . , V, be a random sample from a distribution F ( z ) synirnctric about 0, and let Vl,, < V2,, < . . . < Vn,nbe the corresponding order statistics. Further, let Xe,, denote the l t h order statistic from a random sample of size nz from the corresponding folded distribution with cdf G(z) = 2 F ( z ) - 1 (for z > 0). Then, prove the following two rclationships between the moments of these two sets of order statistics: For 1 5 r 5 n and k 2 0,
(19.29)
ORDER STATISTICS for 1 5 r
177
< s 5 n,
E(V,,,K,,)
=
1 2"
-
The rehtionship in (19.27) can also be established by using simple probability arguments as follows. First, for 1 5 T 5 n, let us consider the event Vr,, 2 0. Given this event, let i (0 5 i 5 r - 1) be the number of V's (among V1,V2,.. . , V,) which are negative. Then, since the remaining ( n - i) V ' s (which are nonnegative) form a random sample from the standard exponential E(1) distribution, conditioned on the event V,,,, 2 0, we have d
V,,,= Xr-i,,-i wit,h binomial probabilities
(3I
for i
= 0 , 1 , . . . ,r
-
1
2". Next, lct us consider the event
(19.31)
V,,,< 0.
Given this event, let i ( r 5 i 5 7 ~ )be the number of V's(among V1,V2,.. . , Vn) which are negative. Then, since the negative of these i V ' s (which are negative) form a random sample from the standard exponential E(1) distribution, conditioned on the event Vr,, < 0, we also hame d
V,.,, = -Xi-,+l,i
for i
with binomial probabilities (:)/27L.
= r, r
+ 1 , .. . , n
(19.32)
Combining (19.31) and (19.32), we
readily obtain the relation in (19.27).
Exercise 19.4 Using a similar probability argument, prove the relation in (19.28).
This Page Intentionally Left Blank
CHAPTER 20
GAMMA DISTRIBUTION 20.1
Introduction
In Section 18.9 we discussed the distribution of the sum
w,= x1+. . . + x, of independent random variables X I , (k = 1 , 2 , . . . ,n ) having the standard exponential E(1) distribution. It was shown that the Laplace transform p,(s) of W , is given by p,(s) = (1+s)-,,
(20.1)
and the corresponding pdf p,(x) is given by
p,(.)
1
=
xn-l
e --I
if
5
> 0.
(20.2)
It was also shown in Chapter 18 (see Exercise 18.5) that the sum of two positive random variables Yl and Y2 with the common pdf
p+(x)= 1 ~
J;;
2
-112
e --I ,
(20.3)
x>O,
has the standard exponential distribution. In addition, we may recall Eq. (9.22) in which cumulative probabilities of Poisson distributions have been expressed in terms of the pdf in (20.2). Probability density functions in (20.2) and (20.3) suggest that we consider the family of densities
pa(.)
=C(a)xa-leP,
x > 0,
where C ( a )depends on cr only. It is easy to see that pa(.)
where
r(a)is the complete gamma function. 179
(20.4) is a pdf if
GAMMA DISTRIBUTION
180
Notations
20.2
We say that a random variable X has the standard gamma distribution with parameter a > 0 if its pdf has the form
(20.5) The linear transformation Y = a PY(X) =
+ AX yields a random variable with pdf
1 (x - a)a-1 exp qQ)XN
~
{ -x}> a. 2-a
if z
(20.6)
We will use the notation Y r(a,a,A) to denote a random variable with pdf (20.6). Hence, X r(a,O,1) corresponds t o the standard gamma distribution with pdf (20.5). Note that when QI = 1, we get the exponential distributions as a subset of the gamrria distributions. The special gamma dist,ribut,ions r (71/2,0, a ) , when n is a positive integer, are called chi-square distributions (xzdistribution) with n degrees of freedom. These distributions play a very important role in statistical inferential problems. From (20.5), we have the cdf of X r ( a ,0 , l ) (when Q > 1) as N
N
N
=
-
+ F X [(21,
px (x)
where X ' is a r(n - 1 , 0 , 1) random variable. Thus, the expression above for the cdf of X presents a recurrence relation in Q. Furthermore, if a equals a positive integer 7 1 , then repeated integration by parts a s above will yield
1-x7p n-1
=
e-Xz2
i=O
which is precisely the relationship between tlie cumulative Poisson proba.bilities arid tlie gamma distribution noted in Eq. (9.22).
Mode
20.3
Let, X r(a,O,1). If Q = 1 (the case of the exponential distribution), the pdf px (x)is a monotonically decreasing function, as we have seen in Cha,pter 18. If Q > 1, the gamma r(a,0 , l ) distribution is unimodal and it,s mode [the point wkiert: density function (20.5) takes the maximal value] is at 5 = cy 1. 011th: othcr hand, if Q < 1, then p x ( z ) tends to infinity as z + 0. N
-
MOMENTS
20.4 Let X
-
181
Laplace Transform and Characteristic Function r(a,0 , l ) . Then, the Laplace transform ' p ~ ( s )of X is given by (20.7)
Then the characteristic function f x ( t ) = EeitX has the form
fx ( t )= px ( - i t )
=
1 (1- it)"
(20.8)
As a result, we can write the following expression for the characteristic function of the random variable Y == a A X , which has the general gamma r ( a ,a,A) distribution:
+
(20.9)
20.5
Moments
The exponentially decreasing nature of the pdf in (20.5) entails the existence of all the moments of the gamma distribution.
-
Moments about zero: Let X r ( a ,0 , l ) . Then, an=EXn
=
(20.10) In particular, we have
E X = a, E X 2 = a ( a + l), EX3 = a(a+l)(a+2), E X 4 = C X ( Q+ l)(a+ 2)(a + 3).
(20.11) (20.12) (20.13) (20.14)
Note that (20.10) is also valid for moments E X " of negative order n > -a. Let us now consider Y N I ' ( a , a , A ) . Since Y can be expressed as Y = a A X , where X r ( a ,0, l),we can express the moments of Y as
+
N
EY" = E ( a + AX)"
=
-
GAMMA DISTRIBUTION
182
Central moments: Central moments of the random variables Y r(a,a , A) and V r(Q, 0, A) are the same. Now, let X have the standard gamma r(a,O, 1) distribution. Then, Y - EY 5 v - EV 5 A(X - E X ) N
N
and hence
p"
Y
=
E(Y
-
EY)"
= E(V
-
EV)"
= XnE(X
-
EX)"
As a special case of (20.16), we obtain the variance of the random variable r(a, a , A) as
N
Var Y = L,ZI = ax2.
20.6
(20.17)
Shape Characteristics
From (20.16), we get the third central moment of Y t o be p3
= 2aX
3
using which we readily obtain Pearson's coefficient of skewness as (20.18)
This reveals that the gamma distribution is positively skewed for all values of the shape parameter a. Next, we get the fourth central moment of Y from (20.16) to be
p4 = ( 3 2 + 6 4 x 4 , using which we readily obtain Pearson's coefficient of kurtosis as 7 2
=
D4
Pz"
-
=3
+ -.a6
(20.19)
This reveals tha.t the gamma distribution is leptokurtic for all values of the shape parameter a. Furthermore, we note from (20.18) and (20.19) that as Q m, y1 and 7 2 tend to 0 and 3, respectively (which are the values of skewness and kurtosis of the normal distribution). As we shall see in Section 20.9, the normal distribution is, in fact, the limiting distribution of gamma. distributions as the shape parameter a + 00. Plots of gamma density function presented in Figure 20.1 reveal these properties.
SHAPE CHARACTERISTICS
183
LD 0
U
0
m 0
U
n
a
c.l 0
7-
0
0
0
0
5
10
15
Figure 20.1. Plots of gahrna density function
20
184
GAMMA DISTRIBUTION
Remark 20.1 From (20.18) and (20.19), we observe that y1 and the rela.tionship 72 = 3
72
+1.5~;~
satisfy (20.20)
which is the Type I11 line in the Pearson plane (that corresponds to the gamma faniily of distributions).
Exercise 20.1 Generalizing the pdf in (20.5),we may consider the pdf of the generalized ga’mmn distributions as
p x , ( z )= ~ ( a :h)z6ep1e--rs: ,
z
> 0, a: > 0, 6 > 0.
Then, find the normalizing constant C(a:,6). Derive the moments and discuss the shape charackristics of this generalized gamma family of distributions.
+
Consider the transformation Z = y logX’ when X’ has a generalized gamma distribution with pdf p x (z) ~ as given above. Then, we readily obtain the pdf of 2 a.s cy6(*--y)
p ~ ( z )= C(cr,6)e
e
&(Z-Y)
--cx)
1
a>O, 6 > 0 , --cx,
and this distribution is called the log-gamma dastrabution. In the above density function, -/ is the location parameter, 6 is the scale parameter, and a: is the shape parameter. The family of log-gamma distributions has found applications in the analysis of life-time data.
Exercise 20.2 For the log-gamma density function presented above, determine thc: riornializing constant C ( n ,6). Derive the moment generating funct,iori of 2 and use it to obtain E Z and Var 2.
CODITIONAL DISTRIBUTIONS AND INDEPENDENCE
20.7
185
Convolutions and Decompositions
-
Consider independent random variables Yl r(a,a , A) and Y2 r ( p ,b, A), having gamma distributions with the same scale parameter A > 0. Let W = N
Yl + Y2. Since the characteristic functions of Y1 and Yz are eibt
(1 - i A t ) P ' sum as
eiat
(1 - iAtp
and
respectively, we obtain the characteristic function f w ( t ) of their
-
(20.21)
+
+
Comparing (20.9) and (20.21), we simply see that W r ( a p, a b, A). This result is valid for sums of any number of independent random variables Yl,Y2,. . , , Y,, having gamma distributions with the same scale parameter. To be specific, if
yk
r ( a k , a k , A ) , IC = 1 , 2 , . . . ,
then
w = yl + Y~+ . . . + Y, q y , , d,, where yn = a1 + . . . + an and 6, = a1 + . . . + a,.
(20.22)
A),
-
Moreover, it follows from (20.22) that for any n = 1 , 2 , . . . , a gamma distributed random variable W r(a,a , A) can be expressed as a sum of n independent and identically distributed random variables having r (cu/n,u / n , A) distribution. For instance, if W r(l,a , A) [which means that W has the exponential E ( a ,A) distribution], then W has the same distribution as that of the sum Yl Y2 . . . Y,, where Yl,Y,,. . . ,Y, are independent and Yk r ( l / n , a / n , A), k = 1 , 2 , . . . , n. This simply means that all gamma distributions (with exponential among them) are infinitely divisible. Also, an example given in Chapter 18 for the exponential distribution (see Exercise 18.5) shows that gamma distributed random variables can admit representations as a sum of non-gamma distributed variables. N
-
20.8
+ +
+
Conditional Distributions and Independence
-
Let us consider independent random variables X r(a,0 , l ) and Y r(p,0 , l ) . Further, let V = X Y and U = X / ( X Y ) . It is then easy t o see that the joint density function p x , (2, ~ y) of X and Y is given by
+
P X , Y (z, Y)
=
+
N
PX ( Z ) P Y (Y)
(20.23)
GAMMA DISTRIBUTION
186
Taking into account that X = U V and Y = V ( l - U ) and the Jacobian of the corresponding change of variables (x,y) = (vu, v(1 - u ) ) equals z', we obtain from (20.23) the joint density function of the random variables U and V as
'L'
> 0, 0 < u < 1.
(20.24)
From (20.24), we observe immediately that the random variables U and V are independent, with V having the gamma I'(a+P, 0 , l ) distribution (which is to be expected as V is the sum of two independent gamma random variables), and U ha.ving the standard beta(cu, P ) distribution. We need to mention here two interesting special cases. If X r ( 0 , l ) and Y I' 0, l ) , then U = X/(X Y ) has the standard arcsine distribution; and if X and Y have the standard exponential E(1) distribution, then U = X / ( X Y ) is a standard uniform U ( 0 , l ) random variable. Of course, we will still have the independence of U and V, and the b e t a ( a , p ) distribution of U if we take X r ( a ,0 , A) and Y r ( p ,0, A). Lukacs (1965) has proved the converse result: If X and Y are independent Y positive random variables and the random variables X / ( X Y ) and X are also independent, then there exist positive constants a , p, and A such that X r(a,0, A) and Y r(p,0, A). It was also later shown by Marsaglia (1974) that the result of Lukacs stays true in a more general situation (i.e., without the restriction that X and Y are positive random variables). N
+
(i,
N
+
-
N
+
N
+
N
+
Exercise 20.3 Show tha.t the independence of U = X / ( X Y ) and V = X + Y also implies the independence of the following pairs of random variables: (a)
V
X
f_
a.nd x
+
+Y ;
X 2 Y2 and X XY (d)
(xX Y*I2 +
+Y ;
and ( X
+ Y)'.
+
Exercise 20.4 Exploiting the independence of U = X / ( X Y ) and V = X Y , the fact that X and Y are both gamma, and the moments of the gamma distribution, derive the moment of the beta distribution in (16.8).
+
187
LIMITING DISTRIBUTIONS
Once again, let X r ( a ,0, A) and Y r(p,0, A) be independent random variables, and let V = X Y. Consider now the conditional distribution of X given that V = v is fixed. The results presented above enable us to state that conditional distributions of X/v and X / V are the same, given that V = ‘u. Thus, the conditional distribution of X/v, given that X Y = ‘u, is beta(a,p). If a = p = 1 [which means that X and Y have the common exponential E(A) distribution], the conditional distribution of X , given that X Y = v, becomes uniform U ( 0 ,v). The following more general result is also valid for independent gamma random variables. Let x k r ( a k , 0, A), k = 1 , 2 , . . . ,n, be independent random variables. Then, N
N
+
+
+
N
vx1
(X,+...+X,
.’.+X, ’ ... ’ X l +vxn
d
= {Xi, . . . , X,
1
1 V = Xi + . . . + X ,
= U} .
(20.25)
Recalling now the representation (18.25) for the uniform order statistics Ul,,,. . . , U,,,, which has the form
where
s k =
x1 + . . . + x k , and xk-r(i,o,i),
k = 1 , 2 ,...
are the standard exponential E(1) random variables, we can use (20.25) to obtain another representation for the uniform order statistics as {Ul,n,..
20.9
un,n}=d {SI,.. . , Sn I Sn+l
I}.
(20.26)
Limiting Distributions
Consider a sequence of random variables Yl,Y2, . . . , where
~ , ~ r ( n , o , i ) , n = 1 , 2 ,..., and with it, generate a new sequence
( 20.2 7)
with its characteristic function f n ( t )being
(20.28)
GAMMA DISTRIBUTION
188
Exercise 20.5 Show that for any fixed t , as n + M;
(20.29)
Earlier, we encountered the characteristic function e-"l2 [for example, in Eqs. (5.43), (7.31) and (9.33)] corresponding to the standard normal distribution. Wc have, therefore, just esta.blished that the normal distribution is t,he limiting distribution for the sequence of gamma random variables W, in (20.27).
Exercise 20.6 Let a(.) denote the cumulative distribution function of a random variable with the characteristic function e-"/'. Later, in Chapter 23, we will find that
1
/'
@(x) = -
v%
-m
ePt2/' d t .
Making use of the limiting relation in (20.29), prove that
s";"
( n - l)! as n + 00.
Zn-l
0
e -x d z + @ ( 1 ) = 0 . 8 4 1 3 4 ...
(20.30)
CHAPTER 21
EXTREME VALUE DISTRIBUTIONS 2 1.1 Introduction In Chapter 11we considered the minimal value m, = min(U1,. . . , U,} of i.i.d. uniform U ( 0 , l ) random variables U,, Uz, . . . and determined [see Eq. (11.36)] that the asymptotic distribution of n rn, (as n + m ) becomes the standard exponential distribution. Instead, if we take independent exponential E ( 1) random variables XI, X z , . . ., and generate a sequence of minimal values
z, = min{XI,Xz,. . . ,X,},
n = 1 , 2 , ... ,
then, as seen in Chapter 18 [see Eq. (18.24)], the sequence n z, converges in distribution (as n + m ) t o the standard exponential distribution. Consider now the corresponding maximal values
M,
= max(U1,. . . , U,}
and
2, = max(X1, X,, . . . , X,},
n = 1 , 2 , .. . .
Then, as seen earlier in Eq. (11.38),as n + m,
P[n{Mn,- I} < z] + e",
z
< 0.
This simply means that the sequence n(1- M,} converges asymptotically t o the same standard exponential distribution. This fact is not surprising to us since it is clear that the uniform U ( 0 ,1)distribution is symmetric with respect to and, consequently, d
l-M,=rn, Let us now find the asymptotic distribution of a suitably normalized maximal value 2,. We have in this case the following result:
P{Z, - I n n < z}
=
( I - exp{-(z
+ Inn)})" (21.1)
189
190
EXTREME VALUE DISTRIBUTIONS
as n + 00. Thus, unlike in the previous cases, we now have a new (nonexponential) distribution for normalized extremes. The natural question that arises here is regarding the set of all possible limiting distributions for maximal and minimal values in a sequence of independent and idetically distributed (i.i.d.) random variables. The evident relationship d
max{-Yl, -Y2,. . . , -YrL} = - min{YI,Yz,.
. . , Y,}
(21.2)
requires us to find only one of these two sets of asymptotic distributions. Indeed, if some cdf H(x)is the limiting distribution for a sequence max(Y1, Y2, . . . ,Y,}, n = 1 , 2 , . . . , then the cdf G(x) = 1 - H ( - z ) would be the limiting distribution for the sequence min{-Yl, -Y2,. . . , -Y,}, n = 1 , 2 , . . . , and vice versa.
21.2
Limiting Distributions of Maximal Values
We are interested in all possible limiting cdf's H(x)for sequences =
{F(a,z
+ b,)),,
(21.3)
where F is the cdf of the underlying i.i.d. random variables Y1, Y2,. . . , and a, > 0 and b, ( n = 1 , 2 , . . .) are some normalizing constants; hence, H,(x)is the cdf of max(Y1, Y2,. . . , Yn} - b,
v, =
an Of course, for any F , we can always find a sequence a, ( n = 1 , 2 , .. .) which provides the convergence of the sequence V, to a degenerate limiting distribution. Therefore, our aim here is to find all possible nondegenerate cdf's H(x). Lemma 21.1 I n order for a nondegenerate cdf H(x)to be the limit of sequence (21.3) for some cdf F and normalizing constants a, > 0 and b, ( n = 1 , 2 , . . .), it is necessary and suficient that f o r any s > 0 and x,
H"[A(s)x + B ( s ) ]= H ( z ) ,
(21.4)
i h r e A ( s ) > 0 and B ( s ) are some functions defined for s > 0. Thus, our problem of finding the asymptotic distributions of maxima is reduced to finding all solutions of the functional equation in (21.4). It turns out that all solutions H ( z ) (up to location and scale parameters) are as follows: (21.5)
H3(x) =
e-e-"
, -m<x<<.
(21.7)
LIMITING DISTRIBUTIONS OF MINIMAL VALUES
191
can also be limiting distributions of normalized maxima for any values of the parameters -cm < a < cc and X > 0. HI is called the Fre‘chet-type distribution, H2 the Weibull-type distribution, and H3 the extreme value distribution. H3 is also referred to in the literature as the log- Weibull, double exponential, and doubly exponential distribution. As mentioned in Chapter 19, the name double exponential distribution is sometimes used for the Laplace distribution.
21.3
Limiting Distributions of Minimal Values
As mentioned above, there exists a relation G ( z ) = 1 - H ( - s ) between the limiting distributions of maximal ( H ( z ) )and minimal ( G ( z ) )values. Hence, the sct of all possible nondegenerate limiting distributions of the normalized minimal values min(Y1, Y2, . . . , Y,} are of the following forms:
(21.9)
G3(5)
=
1 - e p e X , -co
< s < cm.
(21.10)
All cdf’s where -cc < a < cc and X > 0, can also be limiting distributions of minimal values. Gz is commonly known as the Weibull distribution.
21.4
Relationships Between Extreme Value Distributions
As seen in the preceding two sections, we have three types of extreme value distributions for maxima and three corresponding types of extreme value distributions for minima. The term extreme value distributions includes all distributions with cdf’s
with the standard members (when a = 0 and X = 1) being as given in (21.5)(21.10). Often, the name extreme value distribution has been used in the
EXTREME VALUE DISTRIBUTIONS
192
literature only for distributions with cdf's H3
( "). -
It is useful to remem-
ber tha.t all six types of extreme value distributions given above are closely connected with exponential distributions.
Exercise 21.1 Let X have a standard exponential distribution. Then, show that the random variables
-x-'/",
X I / " , logX,
x-'/", -XI/",
-1ogx
have, respectively, the distributions
Linear transformations of random variables mentioned in Exercise 21.1 enable us to express the distribution of any random variable with cdf's
via the standard exponential distribution. Note also that the exponential E ( a ,A) distribution is a special case of the Weibull distribution because its cdf coincides with Gz,l ( x ~
more, if we take Y =
a,where X
Fy(x) = P { Y < .}
N
=
')
. Further-
E(l),then it is easy to show that
1 - e-z2,
x > 0.
(2 1.11)
We see that the RHS of (21.11) coincides with the Weibull cdf G 2 , 2 ( x ) .This distribution of Y is called the standard Rayleigh distribution, while linear transformations a XY yield the general two-parameter Rayleigh distribution with cdf
+
(21.12)
Exercise 21.2 If X denotes a standard exponential random variable and Y = show that the cdf of Y is as given in (21.11). Then, derive the incan and variance of Y .
a,
GENERALIZED EXTREME VALUE DISTRIBUTIONS
21.5
193
Generalized Extreme Value Distributions
It turns out, that all the limiting distributions of maxima as well as all the limiting distributions of minima can be presented in a unified form. For this purpose, let us introduce the faniily of cdf’s H(z,,!?) (-0s < ,d < m) which are defined as
+
H(X, P) = exp{-(l+ zp)-’’D}
(21.13)
in the domain 1 zp > 0,and we suppose that H(z,,B) equals 0 or 1 (depending on the sign of p) if 1 z,LJ< 0. For p = 0, H ( z ,0) means the limit of H(x,,B)as /3 t 0. Let us first consider the case P = l / a , where a > 0. Then,
+
(21.14)
It is easy to see that the cdf (21.14) coincides with Next, if
p = -l/cx, where a > 0,then
This cdf coincides with H2,a ( XiQ). Finally, for /3
= 0,we
have
H(x,O)= e-e-5
= H3(x).
(21.16)
Thus, the derivations above show that the three-parameter faniily of cdf’s
H ( z ,p, a, A)
=H
(xiu,A ) , --oo
-cm
X>O, (21.17)
where
H(z,b)= e - (l+zP)- I / @
in the domain 1
(21.18)
+ zp > 0,includes all the cdf’s
as special cases. Equation (21.17) defines the generalized extreme value distributions for maxima, while H ( z ,p) in (21.18) correspond to its standard form. Similarly,
G(z, P, a, A)
=
1- H ( - T
P, a , A)
(21.19)
EXTREME VALUE DISTRIBUTIONS
194
defines the generalized extreme value distributions for minima, and (212 0 )
G ( x , P ) = 1- H ( - x , P )
correspond to its standard form.
21.6
Moments
Making use of the representation in Exercise 21.1, we can express moments of the extreme value distributions in terms of moments of the standard exponential distribution. Let random variables Y , W , and V have cdf's H I , ~ ( X ) , respectively. Then, from Exercise 21.1, we have the following H Z . ~ ( XH3(x), ), relations:
y
d
=x-l/a,
wd
-XI/",
v =d
-
logX,
(21.21)
where X has the standard exponential distribution. Hence, we have
EYk
=
EX-'/"
-1
cc x-k/a
e --I dx,
(21.22)
and (21.24) It readily follows from (21.22) that moments E Y k exist if k
k).
E Y k =I- (1-
< Q and that (21.25)
Relation (21.23) reveals that moments EW' exist for any 1 , 2 , . . . , and
Q
> 0 and k
=
(21.26) From (21.25) and (21.26), we also obtain (21.27)
Var for any a
> 0.
w = r (1 + :)
-
{r (1 +
:)}
2
(21.28)
195
MOMENTS It is known that Euler's constant y = 0.57722.. . is defined as lim
=
n+cc
(2
-
log n )
(21.29)
k=l
rw
=
-
Jo
logx e-" dx.
(21.30)
Comparing (21.24) and (21.30), we immediately see that
EV
= y = 0.57722 ....
(21.31)
Another way to obtain (21.31) is through the characteristic function fv(t) of the random variable V . We have
fv(t)
EeitV - ~ ~ - i t l o g X EX-it
lee
=
=
x-it
e -x dx
= r ( l-it).
(21.32)
From (21.32) and the relation
f'"(0)
= i k E V k,
we readily find that
k = 1,2,... .
E V ~= ( - i ) k r ( k ) ( i ) ,
(21.33)
The following useful identity, which is valid for positive z , helps us to find the necessary derivatives of the gamma function:
(21.34) Since
we obtain from (21.33) and (21.34) that
EV
=
- q i ) = -+(I) = 7,
(21.36)
(21.37)
It follows now from (21.36) and (21.37) that n
Var V
7r4
=-
6
(21.38)
EXTREME VALUE DISTRIBUTIONS
196
), Now, let, the random varia.bles Yl, W1, and V1 have cdf's G I , ~ ( XGz,a(x), and G3 ( x ) , respectively. Since Yl
d
=
-Y,
d
WI =
-w,v1 = -v, d
(21.39)
we immediately obtain EY:
5)'
=
Q
k
(21.40) (21.41)
for cv > 2, (21.42) Var Wl
=
Var W =I'
(
l +-
(21.43) (21.44)
and ?I-2
Var V1 = Var V = 6
.
(21.45)
It is important t o mention here that extreme value distributions discussed in this chapter have assumed a very important role in life-testing and reliability probltnis besides being used as probabilistic models in a variety of other problems.
CHAPTER 22
L 0GISTIC DISTRIBUTION 22.1
Introduction
Let Vl and V2 be i.i.d. random variables having the extreme value distribution with cdf [see (21.7)] &(2)
Let V
= V1 -
= e-e-",
-02
< 2 < 00.
V2. Then, the cdf Fv(x)of V is obtained as roo
J-00
This distribution is a particular case of the logistic distribution, which has been known since the pioneering work of Verhulst (1838, 1845) on demography. A book-length account of logistic distributions, discussing in great detail their various properties and applications, is available [Balakrishnan (1992)].
22.2
Notations
A random variable X is said to have a logistic distribution if its pdf is given by
The corresponding cdf is given by
197
198
LOGISTIC DISTRIBUTION
We will use X L o ( p , a 2 ) to denote the random variable X which has the logistic distribution with pdf and cdf as in (22.2) and (22.3), respectively. It is evident that p (-03 < p < cc) is the location parameter while u ( u > 0) is the scale parameter. Shortly, we will show that p and a2 are, in fact, the mean and variance of this logistic distribution. The standard logistic random variable, denoted by Y Lo(0, l), has its pdf and cdf as N
N
7r
e-XX/&
PY(Z)= -
--03<2<03
2’
(22.4)
and
-cc < 2 < 03,
(22.5)
respectively. Although we will see later that this is the standardized form of the distribution (i.e., having zero mean and unit variance), yet it is often convenient to work with the random variable V having the logistic Lo (0, 7r2/3) distribution as it possesses the following simple expressions for the pdf and cdf: (22.6) and FV(2) =
1
1+>-m
< 2 < 00.
(22.7)
Since the logistic density function in (22.2) can be rewritten as
P x ( z )=
a { 7r
sech2
7 r ( 2-
2ud3
},
-m
< 2 < oc,
(22.8)
the logistic distribution is also sometimes referred to as the sech-squared distribution. If X Lo(p,a2),then we note from (22.2) and (22.3) that the cdf F x ( 2 ) and the pdf p x ( 2 ) satisfy the relationship N
7r
p x ( z ) = -Fx(2)
06
(1 - F x ( 2 ) } ,
Of course, in the special case when V reduces to
N
-
-03
< 2 < 00.
(22.9)
Lo (0, 7r2/3),the relationship in (22.9)
p v ( z ) = Fv(2)(1 - Fv(2)},
-cc < 2 < cc.
(22.10)
When X Lo(p,a2),we may observe from (22.3) that the cdf of X has exponentially decreasing tails. Specifically, we find that
Fx(-z) as z + 00.
N
e--.rr(X+P)/(u&)
and
1 - Fx(2)
e-X(z-P)/(u&)
199
MOMENTS
22.3
Moments
Let V Lo (0, 7-r2/3). The simple form of its pdf p v ( x ) in (22.6) enables us to obtain the moments of V. Then, upon exploiting the obvious relationship N
x=-%IT+ p ,
(22.11)
we can readily obtain the moments of X Lo ( p ,0 ' ) . Of course, the exponential rate of decrease of the pdf entails the existence of all the moments of the logistic distribution. N
-
Moments about zero: Let v LO (0,7 r 2 / 3 ) . Since
all the odd-order moments of V are zero. Hence, we need to determine only the even-order moments of V. Now, making use of the identity that
C 00
(1+ x ) - 2 =
00
=
(;)xn
n=O
C(-l)n(n+ n=O
(22.12)
l)Zn,
we derive for m = 0 , 1 , 2 , . . . ,
1, 00
EVZ"
=
=
2Jd
x2m(1
00
e-" +
e-z)2
dx
e-"
x2m(1 + e - q 2 dx
n=O
=
{
2(2m)! C ( 2 k
-
l)-2m - C(2k)-2m
}
(22.13)
200
LOGISTIC DISTRIBUTION
where ((s) is the Riernann zeta function defined by
c
k-".
C(s) =
(22.14)
k=l
Using now the well-known facts that ((2) readily have E V 2 = Var V
=2
=
7r2/6 and ((4) =: 7r4/90, we
7r2
((2)
(22.15)
=-
3
and
E V 4 = 42 ((4) In the general case when X we readily have
N
77r4 15
(22.16)
= __ .
Lo ( p , a 2 ) ,using the relationship in (22.11),
In particulax, we find from (22.17) that
EX
=p+
*EV IT
(22.18)
=p
and 3a2 + -EV2 IT2
E X = p2 + 2p-EV 2
V5U
7r
= p2
+ a2.
(22.19)
Central moments: From (22.11), we find for n = 0 , 1 , 2 , . . . ,
=
In pa.rticular, we have
($)
n
EV".
(22.20)
CHARACTERISTIC FUNCTION
201
Also, due to the symmetry of the distribution of X (about p ) , we have /32n-1
E (X
-
E ( X - P ) ~ ~ -= 'O
E X )2n-1
for n==1 , 2, . . . .
22.4
(22.23)
Shape Characteristics
Let X be a logistic Lo(p, 0 2 )random variable. As noted earlier, the distribution of X is symmetric about p and, consequently, the Pearson coefficient of skewness of X is [see also (22.23)] y1 = 0. In addition, we find from (22.21) and (22.22) the Pearson coefficient of kurtosis of X to be (22.24)
Thus, the logistic distribution is a symmetric leptokurtic distribution. Plots of logistic density function presented in Figure 22.1 (for p = 0 a.nd different values of 0 ) reveal these properties.
22.5
Characteristic Function
-
For simplicity, let us consider V Lo (0,7r2/3) with pdf and cdf as given in (22.6) and (22.7), respectively. Then, the characteristic function of V is determined to be
f"(t)
= EeitV
dx
=
= = =
1
uit(l-
d7~
B(1+ it, 1 - i t ) r(i +it) r(i - i t ) .
(22.25)
Comparing the characteristic function of V Lo (0, 7r2/3) in (22.25) with the clmracteristic function r(l - it) of a random variable Vl which has the extreme value distribution H 3 ( 5 ) in (21.7) [see, for example, (21.32)], we readily observe the fact that the logistic Lo (0, 7r2/3) random variable V has the same distribution as the difference V, - V2, where Vl and Vz are i.i.d. random variables with the extreme value distribution H 3 ( 2 ) in (21.7). Note that this result was derived in Section 22.1 using the convolution formula. N
LOGISTIC DISTRIBUTION
202
-5
0
5
X Figure 22.1. Plots of logistic density function when p
=0
RELATIONSHIPS WITH OTHER DISTRIBUTIONS
as
203
From (22.25), we readily find the characteristic function of X
N
Lo(p, 02)
Now, making use of the facts that r(Z
+ 1) = z r(z)
and
r(Z)r(l-Z) =
n sin(nz)
~
'
we can rewrite the characteristic functions in (22.25) and (22.26) as fv(t)
= -
r(i+ it)r(i
it) = it r(it)r(i- it) 7rt sinh(nt) -
nit sin(7rit)
-
( 22.27)
and (22.28)
Exercise 22.1 From the expression of the characteristic function in (22.28), show that the mean and variance of X are p and a2,respectively.
22.6
Relationships with Other Distributions
For simplicity, let us once again consider V Lo (0, n2/3) with pdf and cdf as given in (22.6) and (22.7), respectively. Then, using the probability integral transformation, we readily observe the relationship N
v 2 log (")1 - u
,
(22.29)
where U has the standard uniform U ( 0 , l ) distribution. Since the distribution of V is symmetric about zero, we may consider the folded form of this logistic distribution, termed the half logistic distribution [see, for example, Balakrishnan (1992)l. Specifically, the random variable IVI has the half logistic distribution with pdf and cdf
LOGISTIC DISTRIBUTION
204
Exercise 22.2 For the half logistic distribution defined in (22.30), derive the mean a.nd variance.
-
Realizing that the half logistic distribution in (22.30) is simply the lefttruncated (a.t zero) distribution of V Lo (0, 7r2/3), we can introduce a general truncated logistic distribution with pdf (22.31) where a and b are the lower and upper points of truncation of the distribution of V , and A = 1/(1 e-a) and B = 1/(1 e d ) . Note that the half logistic density function in (22.30) is a special case o f t h e pdf in (22.31) when u = 0 and b = cx) so that A = and B = 1.
+
22.7
+
Decompositions
-
As already observed, the logistic random variable V Lo (0, 7r2/3) can be represented as the difference Vl - 14 of two i.i.d. random variables having the extreme value distribution H ~ ( xin) (21.7). This readily reveals that the logistic random variable is decomposable. Using Euler’s formula on the gamma function given by
S,
00
=
,-t
t
-1
dt
71!
=
z(z+ l)(z+2)...(z+n)
we have
r(l + it) r(l
-
it)
=
limnim
=
n 03
+
j=1
+
(1 P ) ( 4 1
14-( t / j ) 2 ‘
n”, (22.32)
+
{ ( n 1)!j2
+ t 2 ) . ’ . { ( n+ 1 ) 2 + t 2 } (22.33)
Recalling that 1/(1 t 2 )is the characteristic function of the standard Laplace L ( 0 , l ) distribution [see, for example, (19.5)], we obtain the following decorriposition result from (22.33): (2 2.34)
-
where V Lo ( 0 , x 2 / 3 ) and Yl,Y2,.. . are i.i.d. standard Laplace L ( 0 ;1) random variables with density function
205
ORDER STATISTICS
22.8
Order Statistics
~ Let Vl,V2,.. . ,V, be i.i.d. Lo (0,7r2/3) random variables, and let V I , < V2,n< . . . < Vn,, denote the corresponding order statistics. Then, the density function of Vr,%(for 1 T n) is given by
< <
pvJz)
=
n! (. - I)! ( n - T ) !
(6) + n -r
1
(=>'
-m
e-"
(1
< 5 < m.
e-z)2 '
(22.35)
From (22.35), we derive the characteristic function of Vr,nas
fv,,,(t)
=
EeitKsn
n!
-
-
-
eitx
n! ( T - I)! ( n - T ) ! r(T+ i t ) r ( n -
r(n
r(T)
s',+ -T
n-r+l
(e-") (1+ e-x)n+l
ur+zt- 1
dx
(1- 4n-r+if du
1 - it) 1)
+
(22.36)
Note that we have fv,,,(t) = f v n ~ y + l , n due ( - t )to the symmetry of the logistic distribution and, consequently,
EV& for 1 5 T 5 n and k
=
=
(22.37)
(-l)kEvL.+l,n
1,2,. .. .
Exercise 22.3 From the characteristic function of Vr,nin (22.36), derive expressions for the mean and variance of Vr,, for 1 T n.
< <
22.9
Generalized Logistic Distributions
Due to the simple form of the logistic distribution, several generalizations have been proposed in the literature. Four prominent types of generalized logistic densities are as follows: The Type I generalized logistic density function is (22.38)
The Type 11 generalized logistic density function is (22.39)
LOGISTIC DISTRIBUTION
206
T h e Type 111 generalized logistic density f u n c t i o n is
and the Type IV generalized logistic d e n s i t y f u n c t i o n is IV PV
ecbX
qa+b)
=
r(a)r(b)(1 + e-z)a+b’ -W
< 2 < 0 0 ~ > 0 , b > 0.
(22.41)
It should be mentioned that all these forms are special cases of a very general family proposed by Perks (1932). Exercise 22.4 Show that the characteristic functions of these four generalized logistic distributions are
r(i
-
i t ) r ( a+ it)
r(i + i t ) r ( a+ i t )
1
J3a)
1
respectively. Exercise 22.5 From the characteristic functions in (22.42), derive the expressions of the moments and discuss the shape characteristics. Exercise 22.6 If V has the Type I generalized logistic density function in (22.381, then prove the following: The distribution is negatively skewed for 0 < a < 1 and positively skewed for a > 1; The distribution of -aV behaves like standard exponential E ( 1) when a + 0; The distribution of V - log a behaves like extreme value distribution H s ( 2 ) in (21.7) when a 400;
(d) -V has the Type I1 generalized logistic density function in (22.39). Exercise 22.7 Let V have the Type I generalized logistic density function in (22.38). Let Y, given T , have the extreme value density function
re
-.x
e
--Te--z
where r has a gamma distribution with density function
Then, show that the marginal density function of Y is the same as that of V .
GENERALIZED LOGISTIC DISTRIBUTIONS
207
Exercise 22.8 If V has the Type I11 generalized logistic density function in (22.40), then prove the following: (a) The distribution of @V behaves like standard normal distribution when a + 00; (b) If Y1 and Yz are i.i.d. random variables with log-gamma density function
then the difference Y1 - Y2 is distributed as V; and (c) Let Y1 and Y2 be i.i.d. random variables with the distribution of - log 2,where 2 given T is distributed as gamma I?(&, a b) and T is distributed as beta(a, b). Then, the difference Y1 - Y2 has the same distribution as V.
+
Exercise 22.9 If V has the Type IV generalized logistic density function in (22.41), then prove the following: (a) The distribution is negatively skewed for a < b, positively skewed for a > b, and is symmetric for a = b (in this case, it simply becomes Type I11 generalized logistic density function); (b) If V is distributed as beta(a, b ) , then log ( Y / ( l- Y ) )is distributed as
and
v;
(c)
-V has the Type IV generalized logistic density function in (22.41) with parameters a and b interchanged.
This Page Intentionally Left Blank
CHAPTER 23
NORMAL DISTRIBUTION 23.1
Introduction
In Chapter 5 [see Eq. (5.39)] we considered a sequence of random variables
x,
-
np
wn=&$Tq,
n = 1,2,...]
where X , has binomial B ( n ,p ) distribution and showed that the distribution of W, converges, as n + 03, to a limiting distribution with characteristic function (23.1)
f ( t )= e-t2/2.
Similar result was also obtained for some sequences of negative binomial, Poisson, and ga.mma distributions [see Eqs. (7.30), (9.33), and (20.27)]. Since M
s,
If(t)l d t
< 00,
the corresponding limiting distribution has a density function p(x), which is given by the inverse Fourier transform 1 cp(x) = 2.ir
Lm "
eCitZ"ft) d t .
(23.2)
Since f ( t ) in (23.1) is an even function, we can express p(x) in (23.2) as (23.3) Note that p(x) is differentiable a.nd its first derivative is given by
=
-x p(x). 209
(23.4)
210
NORMAL DISTRIBUTION
We also have from (23.3) that
(23.5) since r ($) = f i . Upon solving the differential equation in (23.4) using (23.5), we readily obtain
as the pdf corresponding to the characteristic function f(t) = e p f 2 / 2 ,i.e.,
Lm 00
f ( t ) = e--t2/2 =
eit"cp(x) dx
A book-length account of normal distributions, discussing in great detail
their various properties and applications, is available [Patel and Read (1997)].
23.2
Notations
We say tha,t a random variable X has the standard n o r m a l distribution if its pdf is as given in (23.6), and its cdf is given by (23.7) The linear transformation Y = u nornial random variable with pdf
-
and cdf
1
$-O X
(-cc
(x - u)2
< a < co, o > 0) genera.tes a
,
--oo
< x < co,
(23.8)
ENTROPY
211
In the sequel, we will use the notation Y N ( a ,0 2 )to denote a random variable Y having the normal distribution with location parameter a and scale parameter cr > 0. Shortly, we will show that a and a2 are, in fact, the mean N ( 0 , l ) will denote that X has and variance of Y , respectively. Then, X the standard normal distribution with pdf and cdf as in (23.6) and (23.7), respectively. The normal density function first appeared in the papers of de Moivre a t the beginning of the eighteenth century as an auxiliary function that approximated binomial probabilities. Some decades later, the normal distribution was given by Gauss and Laplace in the theory of errors and the least squares method, respectively. For this reason, the normal distribution is also sometimes referred t o as Gaussian law, Gauss-Laplace distribution, Gaussian distribution, and the second law of Laplace. N
N
23.3
Mode
It is easy t o see that normal N ( a ,0') distribution is unimodal. From (23.8), we see that
which when equated to 0, yields the mode to be the location parameter a, and the maximal value of the pdf p ( a , 0 , x) is then readily obtained from (23.8) to be 1/(0&).
23.4
Entropy
The entropy of a normal distribution possesses an interesting property.
Exercise 23.1 Let Y
N
N ( a ,r 2 ) Show . that its entropy H ( Y ) is given by
1 H ( Y )= 2
+ log(aJ2.rr).
(23.10)
It is of interest to mention here that among all distributions with fixed mean a and vxiance 02, the maximal value of the entropy is attained for the normal N ( a ,0 ' ) distribution.
212
NOR,MAL DISTRIBUTION
23.5
Tail Behavior
The normal distribution function has light tails. Let X N ( 0 , l ) . From Table 23.1, which presents values of the standard normal distribution function a(.), we have N
a(-l) + 1 2{1 @(a)}
@(1)= 2{1
P{l[l > 1) P{ > 2)
=
P{I 3)
=
., 2{1 - @ ( 3 ) }= 0.0027.. . ,
P{I[I > 4)
=
2{1
=
-
-
-
-
Q(1)) = 0.3173.. . ,
= 0.0455..
@(4)} = 0.000063.. . .
It is easy t o obtain that for any x
> 0,
(23.11) Simila.rly, for any x
> 0,
I-
x3
1
d%
p
/
2
(23.12)
Using (23.12), we get the following lower bound for the tail of the normal distribution function: 1- @(x)
=
1
.i,
”
c t 2 l 2d t
(23.13) Hence, the asympt,otic behavior of 1 - @(x) is determined by the inequalities
TAIL BEHAVIOR
J:
0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 0.22 0.24 0.26 0.28 0.30 0.32 0.34 0.36 0.38 0.40 0.42 0.44 0.46 0.48 0.50 0.52 0.54 0.56 0.58 0.60 0.62 0.64 0.66 0.68 0.70 0.72 0.74 0.76 0.78 0.80 0.82 0.84 0.86
@.(XI
0.5000 0.5080 0.5160 0.5239 0.5319 0.5398 0.5478 0.5557 0.5636 0.5714 0.5793 0.5871 0.5948 0.6026 0.6103 0.6179 0.6255 0.6331 0.6406 0.6480 0.6554 0.6628 0.6700 0.6772 0.6844 0.6915 0.6985 0.7054 0.7123 0.7190 0.7257 0.7324 0.7389 0.7454 0.7517 0.7580 0.7642 0.7704 0.7764 0.7823 0.7881 0.7939 0.7995 0.8051
x
0.88 0.90 0.92 0.94 0.96 0.98 1.00 1.02 1.04 1.06 1.08 1.10 1.12 1.14 1.16 1.18 1.20 1.22 1.24 1.26 1.28 1.30 1.32 1.34 1.36 1.38 1.40 1.42 1.44 1.46 1.48 1.50 1.52 1.54 1.56 1.58 1.60 1.62 1.64 1.66 1.68 1.70 1.72 1.74
@(X)
0.8106 0.8159 0.8212 0.8264 0.8315 0.8365 0.8413 0.8461 0.8508 0.8554 0.8599 0.8643 0.8686 0.8729 0.8770 0.8810 0.8849 0.8888 0.8925 0.8962 0.8997 0.9032 0.9066 0.9099 0.9131 0.9162 0.9192 0.9222 0.9251 0.9278 0.9306 0.9332 0.9357 0.9382 0.9406 0.9429 0.9452 0.9474 0.9495 0.9515 0.9535 0.9554 0.9573 0.9591
X
2 13
@(XI
1.76 0.9608 1.78 0.9625 1.80 0.9641 1.82 0.9656 1.84 0.9671 1.86 0.9686 1.88 0.9699 1.90 0.9713 1.92 0.9726 1.94 0.9738 1.96 0.9750 1.98 0.9761 2.00 0.9772 2.02 0.9783 2.04 0.9793 2.06 0.9803 2.08 0.9812 2.10 0.9821 2.12 0.9830 2.14 0.9838 2.16 0.9846 2.18 0.9854 2.20 0.9861 2.22 0.9868 2.24 0.9875 2.26 0.9881 2.28 0.9887 2.30 0.9893 2.32 0.9898 2.34 0.9904 2.36 0.9909 2.38 0.9913 2.40 0.9918 2.42 0.9922 2.44 0.9927 2.46 0.9931 2.48 0.9934 2.50 0.9938 2.52 0.9941 2.54 0.9945 2.56 0.9948 2.58 0.9951 2.60 0.9953 2.62 0.9956
X
2.64 2.66 2.68 2.70 2.72 2.74 2.76 2.78 2.80 2.82 2.84 2.86 2.88 2.90 2.92 2.94 2.96 2.98 3.00 3.02 3.04 3.06 3.08 3.10 3.12 3.14 3.16 3.18 3.20 3.22 3.24 3.26 3.28 3.30 3.32 3.34 3.36 3.38 3.40 3.42 3.44 3.46 3.48 3.50
@(x)
0.9959 0.9961 0.9963 0.9965 0.9967 0.9969 0.9971 0.9973 0.9974 0.9976 0.9977 0.9979 0.9980 0.9981 0.9982 0.9984 0.9985 0.9986 0.9987 0.9987 0.9988 0.9989 0.9990 0.9990 0.9991 0.9992 0.9992 0.9993 0.9993 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
Table 23.1: Standard Normal Cumulative Probabilities
214
NORMAL DISTRIBUTION
which are valid for any positive
2.
It readily follows from (23.14) that
1- @ ( T )
1
N
(23.15)
- cp(.) X
a.s x 4cu, where is the pdf of the standard normal distribution [see (23.6)].
23.6
Characteristic F’unct ion
Frorn (23.1), we have the characteristic function of the standard normal N(O.1) distribution to be
f ( t ) = e--t2/2.
If Y has the general N ( u ,a 2 )distribution, we can use the relation Y = a+aX, where X N ( 0 ,l),in order to obtain the characteristic function of Y as N
“1
(23.16)
2
We see that fy(t) in (23.16) has the form exp{Pz(t)}, where Pz(t) is a polynomial of degree two. Marcirikiewicz (1939) has shown that if a characteristic function g ( t ) is expressed as exp{P,(t)}, where P,(t) is a polynomial of degree n, then there are only the following two possibilities: (a) n = 1, in which case g ( t ) = eiat (degenerate distribution); (b) n = 2, in which case g ( t ) = exp{iat
-
a2t2/2}(normal distribution).
In Chapter 12 we presented the definition of stable characteristic functions and stable distributions [see Eq. (12.10)].
Exercise 23.2 Prove that the characteristic function in (23.16) is stable, and hence all normal distributions are stable distributions. Exercise 23.3 Find the characteristic function of independent standard normal random variables.
XY whcri X a.nd Y are
Exercise 23.4 Let X I ,X z , X3, and X 4 be independent standard normal ra.ndom variables. Then, show that the random variable Z = X I X 2 X s X , has the Laplace distribution.
+
MOMENTS
23.7
215
Moments
The exponentially decreasing nature of the pdf in (23.8) entails the existence of all the moments of the normal distribution.
Moments about zero: Let X N ( 0 , l ) . Then, it is easy to see that the pdf p(z) in (23.6) is symmetric about 0, and hence N
= EX2"+l = 0
CYZn+l
(23.17)
for n = 1 , 2 , . . . . In particular, we have from (23.17) that a1
=EX =0
(23.18)
and (23.19) Next, the moments of even order are obtained as follows:
-
(an)! 2" n!'
~
n = l , 2 , ....
(23.20)
In part,icular, we obtain from (23.20) that
EX2=1
(23.21)
= E X 4 = 3.
(23.22)
a2 =
and
In general, if Y the formula
EY"
= E(a
=
a
+ OX
+ax)"=
N
N ( a , a 2 ) ,we can obtain its moments using
n.
a'a"-'EX"-', r=O
n = 1,2,.. .,
(23.23)
NORMAL DISTRIBUTION
216 which immediately yields
(23.24)
EY=u and
E Y 2 = a2
+ u2.
(23.25)
-
-
Central moments: Let X N ( 0 , l), V N ( 0 , u 2 ) ,and Y N ( a ,a2). Then, we have the following relations between central moments of these randoni variables: N
E ( Y - EY)"
= =
E ( V - EV)" = u " E ( X - E X ) " u"EX" = u r b a n , n = 1 , 2, . . . ,
(23.26)
where an are as given in (23.17) and (23.20). In particular, we have Var Y
= u2
Var
x = u2 a2 = u 2 .
(23.27)
We have thus shown that the location paranieter a and the scale parameter a2 of riornial N ( a ,a 2 )distribution are simply the mean and variance of the distribution, respectively.
Cumulants: In situations when the logarithm of a characteristic function is simpler to deal with than the characteristic function itself, it is convenient to use the cuniulants. If f ( t ) is the characteristic function of the random variable X , then the cumulant 7 k of degree k is defined as follows: (23.28) In particular, we have 71
=
72
=
73
=
(23.29)
EX, Var X , E(X-EX)3
-
(23.30) (23.31)
If the moment E X k exists, then all cumulants 71,72, . . . ,7 k also exist. Let us now consider Y N ( a ,a 2 ) .Since its characteristic function ha.s the form f y ( t ) = exp
{
iat
(t)
,
-
we have log f y ( t )= iat
F}
fy
a2tl
.
2 Hence, we find y1 = a , 7 2 = a ' , and yk = 0 for Ic = 3 , 4 , . . . . Moreover, recalling the Marcinkiewicz's result for characteristic functions mentioned earlier, we obtain the following result: If for some n all cumulants 7 k ( k = n,n+ 1.. . .) of a random variable X are equal to zero, then X has either a degenerate or a norriial distribution. ~
~
CONVOLUTIONS AND DECOMPOSITIONS
Shape Characteristics
23.8 Let
Y
N
217
N ( a , ( s 2 ) .Then, from (23.26), we readily have
E (Y -
= 0 ~ ~ x=3 0,
(23.32)
from which we immediately obtain the Pearson coefficient of skewness to be 0. Next, we have from (23.26) and (23.22) that
E ( Y - EY)4= 04a4 = 304 ,
(23.33)
from which we immediately obtain the Pearson coefficient of kurtosis to be 3 . Thus, we have the normal distributions to be symmetric, unimodal, bellshaped, and mesokurtic distributions. Plots of normal density function presented in Figure 23.1 (for a = 0 and different values of 0 ) reveal these properties.
Remark 23.1 Recalling now that (see Section 1.4) distributions with coefficient of kurtosis smaller than 3 are classified as platykurtic (light-tailed) and those with larger than 3 are classified as leptokurtic (heavy-tailed), we simply realize that a distribution is considered to be light-tailed or heavy-tailed relative t o the normal distribution.
23.9
Convolutions and Decompositions
+
In order to find the distribution of the sum Y = Yl Y2 of two independent random variables Yk N ( a k , o;),k = 1,2, we must recall from (23.16) that characteristic functions f k ( t ) of these random variables have the form N
and hence
where a = a1 normal N ( u l also valid.
+ a2 and o2 = a? + 0.22.This immediately implies that Y has + a2, 0::+ 0,”)distribution. Of course, a more general result is
+
Exercise 23.5 For any n = 1 , 2 , . . ., the sum Yl Y2 + . . . + Y, of independent random variables Yk N ( a k ,o;),k = 1 , 2 , . . . ,n, has normal N ( a ,c2) distribution with mean a = a1 + . . . + a, and variance o2 = of . . . 0:. N
+ +
NORMAL DISTRIBUTION
218
- - - . N(0.4)
I
I
I
I
I
-4
-2
0
2
4
X
Figure 23.1. Plots of normal density function when a
=0
CONDITIONAL DISTRIBUTIONS
219
We can now state that any normal N ( a ,u 2 )distribution is decomposable, because it can be presented as a convolution of two normal N ( a l , a ! ) and N ( u - a l , a 2 - a!) distributions, where CT! < a2 and a1 may take on any value. In fact, Cram& (1936) has proved that only "normal" decompositions are possible for normal distributions, viz., if Y N ( u , a 2 ) ,-cc < a < 00, a2 > 0, and Y = Y1 Y2, where Y1 and Y2 are independent nondegenerate random variables, then there exist -cc < a1 < cc and 0 < af < u2 such that Yl N ( a l , a : ) and yZ N ( u - u1,a2 - a:). Thus, the family of normal distributions is closed with respect to the operations of convolution and decomposition. Recall that families of binomial and Poisson distributions also possess the same property. Furthermore, the characteristic function of normal N ( a ,a 2 ) distribution which is
-
-
+
-
{
f ( t ) = exp iat
-
can be presented in the form
T}
where
is also the characteristic function of a normal distribution. Thus, we have established that any normal distribution is infinitely divisible.
23.10
Conditional Distributions
-
Consider independent random variables X N ( 0 , l ) and Y N ( 0 , l ) . Then, V = X Y has normal N(O,2) distribution. Probability density functions of X , Y , and V are given by
+
N
(23.34) and
(23.35) from which we can find the conditional distribution of X given V conditional pdf p X l v ( z ( w ) is given by
u. The
(23.36)
NORMAL DISTRIBUTION
220
Observing that the RHS of (23.36) is the pdf of normal N (z3/2, l / 2 ) distribution, we conclude that the conditional distribution of X , given V = w, is normal with mean w/2 and variance 1/2. A similar result is valid in the general ca.se too.
Exercise 23.6 Find the conditional pdf p x , v ( z l v )for the case when X N ( u l , o f ) ,Y N N(a2,a5),with X and Y being independent, arid V = X + Y .
N
23.11
Independence of Linear Combinations
-
As noted earlier (see Exercise 23.3), the sum mally distributed random variables X I , N ( u k ,.;), normal
Xk of independent nork = 1 , 2 , . . . , n,,also has
distribution. The following genera.1result can also be proved similarly.
Exercise 23.7 For any coefficients b l , . . . , bn, prove that, the linear combination n k=l
of independent normally distributed random variables XI, 1 , 2 , . . . , n, also has normal / n
n
\
\k=l
k=l
1
N
N ( a k ,o f ) , k =
distribution.
Let us now- consider two different linear combinations n
n
k=l
k=l
arid find the restriction on the coefficients b l , . . . , b,, e l , . . . , c,,, and parameters a l , . . . , a, and a:, . . . , a : , which provides the independence of L1 and La. It, is clear that the location parameters a l , . . . , an cannot influence this independence, so we will suppose that uk = 0 for k = 1 , 2 , . . . , n without loss
BERNSTEIN’S THEOREM
221
of any generality. Then, the linear forms L1 and L2 are independent if and only if EeiuLl+iwLz
holds for any real u and v. Using the fact that X k that
-
EeiUL1EeiVL2
N ( 0 ,o:), k
N
fk(t) = E e i t X k
=
(23.37)
1 , 2 , . . . , are independent, and
= e-t2l2
we get the following expression for the joint characteristic function of L1 and
L2;
f
(%V)
-
~ ~ i u L l + i v Lz ~~i
x:=,(bkU+Clcv)Xk
n
k=l
(23.38) Also, the characteristic functions of L1 and La are given by (23.39) and
(23.40) Equations (23.38)-(23.40) then imply that the condition for independence of L1 and L2 in (23.37) holds iff n
(23.41) k=l
23.12
Bernstein’s Theorem
If we take a pair of i.i.d. normal random variables X and Y and construct linear combinations L1 = X Y and L2 = X - Y , then the condition (23.41) is certainly valid and so L1 and L2 are independent. Bernstein (1941) proved the converse of this result.
+
222
NORMAL DISTRIBUTION
+
Theorem 23.1 Let X and Y be i.i.d. random variables, and let L1 = X Y and L2 = X - Y a l s o be independent. Then, X and Y have either degenerate or normal distribution. We will present here briefly the main arguments used t o prove this theorem. (1) Indeed, the statement of the theorem is valid if X and Y have degenerate distribution. Hence, we will focus only on the nontrivial situation wherein X and Y have a nondegenerate distribution.
(2) Without loss of generality, we can suppose that X and Y are symmetric random variables with a common nonnegative real characteristic function f ( t ) . This is so because if X and Y have any characteristic function y ( t ) , we can produce symmetric random variables V = X - XI and U = Y - Yl, where X I and Yl have the same distribution as X and Y , respectively, and the random variables X , X 1 , Y , and Y1 are all independent. This symmetrization procedure gives us new independent ra.ndom variables with the common characteristic function f ( t ) = g ( t ) y ( - t ) = 1g(t)I2, which is real and nonnegative. Due to the conditions of the theorem, X Y and X - Y as well as X1 Yl and XI- Yl are independent, and so X U and X - U are also independent. Thus, the random variables V and U with a common real nonnegative characteristic function satisfy the conditions of the theorem. Suppose now that we have proved that V and U are normally distributed random variables. Sirice V is the sum of the initial independent random variables X and Y , we can apply CramWs result on decompositions of normal distributions stated earlier and obtain immediately that X and Y are also normally distributed random variables.
+
+
+
(3) Since X and Y have a. common real characteristic function f ( t ) , we have the cha,ract,eristicfunctions of L1 and L2 as
f l ( u )= Ee
ZUL1
- Eei7LXEeiUY -
-
f2(4
(23.42)
arid fi(v) = Ee i u L 2 - E ~ ~ . u X E ~ - ~= U fY (
~ ) f ( - v= ) fz(t/).
We also have the joint characteristic function of
L1
(23.43)
and Lz as
(23.44)
(23.45)
BERNSTEIN’S THEOREM
(4) Taking u = nt and
2)
=t
223
in (23.45), we get
+
f [ ( n l)t]f[(n- l ) t ]= f 2 ( n t ) f 2 ( t ) ,
n
=
1 , 2 , .. .
.
(23.46)
In particular, we have f ( 2 t ) = f4(Q
(23.47)
It follows immediately from (23.47) that f ( t ) # 0 for any t. Also, since f(0) = 1 and f ( t ) is a continuous function, there is a value a > 0 such that f ( t ) # 0 if ltl < a. Then, f(2t) # 0 if It1 < a, which means that f ( t ) # 0 for all t in the interval (-2a,2a). Proceeding this way, for any n = 1 , 2 , . . . , we obtain that f ( t ) # 0 if jtl < 2na, and hence the nonnegative function f ( t ) must be strictly positive for any t. Now, from the equality [see (23.46) and (23.47)], f ( 3 t ) f ( t ) = f 2 ( 2 t ) f 2 ( t )= {f4(t)Kf2(t) = flO(t), we get the relation
f(3t) = f9(t).
(23.48)
Following this procedure, we obtain
f(m t ) = f r n 2 ( t ) which is true for any m
=
(23.49)
1 , 2 , . . . and any t.
(5) The standard technique now allows us t o get from (23.49) that (23.50) and (23.51) for any integers m and n, where c = f ( l ) ,0 < c 5 1. Since f ( t )is a continuous function, (23.51) holds true for any positive t:
f ( t ) = ct 2 .
(23.52 )
Since any real characteristic function is even, (23.52) holds for any t.
If c = 1, we obtain f ( t ) = 1 (the characteristic function of the degenerate distribution). If 0 < c < 1, then
f ( t )= e- 2 t 2 / 2 , where u2 = -2 log c > 0, and the theorem is thus proved.
224
NORMAL DISTRIBUTION
Darmois-Skitovitch’s Theorem
23.13
In Bernstein’s theorem, we considered the simplest linear combinations L1 = X f Y and L2 = X - Y . Darmois (1951) and Skitovitch (1954) independently proved the following more general result. Theorem 23.2 L e t X I , (k = 1 , 2 , . . . , n ) be i n d e p e n d e n t nondegenerate ran-
d o m variables, a n d let
where bl; and c k (k = 1 , 2 , . . . , n) are n o n z e r o real coeficients. I . L1 and L2 are independent, t h e n t h e r a n d o m variables X I , . . . , X , all have n o r m a l distributions. A very interesting corollary of Theorem 23.2 is the following. Let -
L1=X= We see t1ia.t
XI+...+X, n
and
L2
= X1
-
X.
n
k=l
k=l
where
c1 Ck
= =
1 1--,
n
1
- -
n
,
k = 2 , 3 , . . . ,n.
(23.53)
It then follows from Darmois-Skitovitch’s theorem that independence of L1 and L2 implies that X’s are all normally distributed. If X I , N ( Q , a;), k = 1 , 2 , . . . , n,then (23.41) shows that for the coefficients in (23.53), linear combinations L1 and L2 are independent if and only if
-
-
(23.54)
For example, if X I , N ( Q , a 2 ) ,k = 1 , 2 , . . . , n, then 0%= cr2 ( k = 1 , 2 , .. . ,n ) , in which case (23.54) holds, and so linear combinations L1 and L2 are independent. More generally, in fact, for independent normal random variables Xi; N ( a k , a2),k = 1 , 2 , . . . , n, the random vector ( X I - X , X2 - X , . . . , X , - X ) and X = ( X I . . . X , ) / n are independent. It is clear that in order t o prove this result,, we can take (without loss of any generality) X’s t o be standard
-
+ +
DARMOIS-SKITOVITCH’S THEOREM
225
normal N ( 0 , l ) random variables. Let us now consider the joint characteristic function of the random variables X , X1 - X , X2 - X , . . . , X n - X :
n
=
t - (tl
E e x p { ik=l xXk (tk+
+ + . .+ tn) t2
’
n
(23.55) Since X’s in (23.55) are independent random variables having a common characteristic function f ( t )= e-t2/2, we obtain
t - (tl
t2
n
k=l n
t - (tl
+ t 2 + . . . + tn) n
k=l
n
=
+ + .. .+ tn)
e x p { - ; qk=l t,+
t - (tl + t 2 + . . . + tn) n
(23.56) We can rewrite (23.56) as
where (23.58)
Equation (23.57) then readily implies the independence of X and the random vector ( X I - X , X2 - X,. . . , X n - X ) . Furthermore, X and any random variable T ( X 1 - X , . . . , X , are also independent. For example, if X I , X2, . . . , X n is a random sample from normal N ( a ,0 2 )distribution, then we can conclude that the sample mean X and the sample variance
x)
(23.60) are independent. Note that the converse is also true: If X I ,X 2 , . . . , X n is a random sample from a distribution and that X and S 2 are independent, then X’s are all normally distributed.
NORMAL DISTRIBUTION
226 Now, let
XI,^ 5 X2,n 5 . . . 5 Xn,n
be the order statistics obtained from the random sample XI, X2,. . . , X,. Then, the random vector XI.^ - X ,. . . , X,,,, - X ) and the sample mean X are also independent. This immediately implies, for example, that the sample range Xn,n
-
XI,,
== (Xn,n - X )-
(X1,n - X )
and the sample mean X are independent for samples from normal distribution.
23.14
Helmert’s Transformation
Helmert (1876) used direct transformation of variables to prove, when XI, . . . , X, is a random sample from normal N ( a , a 2 )distribution, the results that X and S2 are independent and also that ( n - 1)S2/a2has a chi-square distribution with n - 1 degrees of freedom. Helmert first showed that if YI, = XI, - X ( k = 1,.. . ,n ) , then the joint density function of Yl, . . . , Y,-l (with Y, = -Yl
-
. . . - Y,-1)
and
X
is proportional t o
thus establishing the independence of X and any function of X1 - X ,. . . , Xn X,including S2. In order to derive the distribution of S2, Helmert (1876) introduced the transformation
...
Then, froni the joint density function of Y1, . . . , Y,-l
fi
(&-)
n-l
exp
{
-
1 (YH
given by
}
+ .. . + Y 3 ,
we obtain the joint density function of wl,. . . , wn-l as
IDENTITY OF DISTRIBUTIONS O F LINEAR COMBINATIONS
227
Since this is the joint density function of n - 1 independent normal N ( 0 ,u 2 ) random variables, and that 1
n
1 ” C y; = y C(.k a
,.EWE=,. n-l
k= 1
k=l
k=l
-
2)2 =
( n- 1)s’ 0-2
’
Helmert concluded that the variable ( n- 1)S2/a2 has a chi-square distribution with n - 1 degrees of freedom. The elegant transformation above given is referred to in the literature as Helmert’s transformation.
23.15
Identity of Distributions of Linear Combinations
Once again, let X I , variables, and
N
N ( a k , a z ) , Ic = 1 ’ 2 , . .. ,n, be independent random n k=l
k=l
It then follows from (23.39) and (23.40) that these linear combinations have the same distribution if n
n
bkak = k=l
k=l
n,
n
(23.61)
Ckak
and (23.62) If X ’ s have a common standard normal distribution, then the condition in (23.62) is equivalent to n
n
k= 1
k= 1
(23.63) For example, in this situation, (23.64) for any integers m and n and, in particular, (23.65) P6lya (1932) showed that if X1 and X2 are independent and identically distributed nondegenerate random variables having finite variances, then the equality in distribution of the random variables X1 and ( X I X z ) / f i characterizes
+
NORMAL DISTRIBUTION
228
the normal distribution. Marcinkiewicz (1939) later proved, under some restrictions on the coefficients bk and ck ( k = 1,.. . , n ) , that if X I , .. . , x, are independent random variables having a common distribution and finite momerits of a.11order, then
k=l
k=l
implies that X ' s all have normal distribution.
23.16
Asymptotic Relations
As already seen in Chapters 5, 7, 9, and 20, the normal distribution arises naturally a.s a limiting distribution of some sequences of binomial, negative binomial, Poisson, and gamma distributed random variables. A careful look at these situations reveals that the normal distribution has appeared there as a limiting distribution for suitably normalized sums of independent random variables. There are several modifications to the central limit theorem, which provide (under different restrictions on the random variables XI, X 2 , . . .) the convergence of sums S, - ES,
~TGzz'
+
+
where S, = X I . . . X,, to the normal distribution. This is the reason why the normal distributions plays an important role in probability theory and mat hematical statistics. Cha.nging the sum S, by the maxima Ad, = max(X1, X 2 , . . . , X n } , n = 1 , 2 , . . ., we get a different limiting scheme with the extreme value distributions (see Chapter 21) determining the asymptotic behavior of the normalized random variable Adn. It should be mentioned here that if X I , N ( 0 , I ) , k = 1 , 2 , . . ., are independent random variables, then
-
(23.66) as n
4 00, where
a,
=
42-
-
+
log log n log 47r 2&i-i@7i
(23.67)
and (23.68)
TRANSFORMATIONS
23.17
229
Transformations
Consider two independent random variables R and 4,where 4 has the uniform U(0127r)distribution and R has its pdf as pR(r)= r
r
e-r2/2,
2o .
(23.69)
Note that
P { R < r ) = 1- e - r 2 I 2 ,
r
2 0,
(23.70)
which means that R has the Rayleigh distribution [see (21.12)], which is a special case of the Weibull distribution. Moreover, R can be expressed as (23.71)
R=J2X,
where X has the standard exponential E(l) distribution. Then, the joint density function of R and q5 is given by 1
r 2 0, o 5 cp 5 27r.
pR,a(r,p)= - r e - ~ ’ / ~ , 27r
(23.72)
Let us now consider two new random variables V = R sin 4 and W = R cos 4. In fact, R and q5 are the polar coordinates of the random point (V,W ) .Since the Jacobian of the polar transformation (TJ = r sin cp, w = r cos cp) equals T , we readily obtain the joint pdf of V and W as (23.73) Equation (23.73) implies that the random variables V and W are independent a.nd both have standard normal distribution. This result, called Box-Muller ’s transformation [Box and Muller (1958)], shows that we have the following representation for a pair of independent random variables having a common standa.rd normal N ( 0 , l ) distribution:
(v,W ) 5
( J 2 X s i n 2 7 r ~ ,d E c o s 2 7 r ~ ) ,
(23.74)
where X and U are independent, X has the standard exponential E(1) distribution, and U has the standard uniform U ( 0 , l ) distribution. We can obtain some interesting corollaries from (23.74). For example,
V2fW2 2
=x d
(23.75)
and so
V
VV-TTF’d m 2
=
(sin 27rU, cos 27rU).
(2 3.76)
We see from (23.75) and (23.76) that the vector on the left-hand side of (23.76) does not depend on V 2 W 2and it has the uniform distribution on the unit circle. From (23.76), we also have
+
Z
=
v d
-
W
=
tan27rU.
(23.77)
NORMAL DISTRIBUTION
230 It is easy to see that
d
tan2rrU = t a n r a.nd so
z =d
ta.nr ( U -
t)
.
(23.78)
Comparing (23.78) with (12.5), we readily see that Z has the standard Cauchy C ( 0 , l ) distribution. Next, let us consider the random variables Yl =
2vw
VVT-w
and
Y2 =
w2 v2 -
vP-Tiv‘
(23.79)
Shepp (1964) proved that Yl and Y2 are independent random variables having standard normal distribution. This result can easily be checked by using (23.74). We have
(Y~,
d
(2m sin 27ru cos 2 7 r ~ ,2J2X(cos2 27ru - sin2 27ru)
5
(msin47r~m , c o s 4 7 r ~ .j
(23.80)
Taking into account that (sin 4rU, cos 47rU) and hence
2
( ~ 1 ~, 2 )
d
=
(sin 27rU, cos 27rU)
(asin 2 7 r ~ ,
cos 27ru) ,
we immediately obt,ain from (23.74) that
which proves Shepp’s (1964) result. Bansal et al. (1999) proved the following converse of Shepp’s result. Let V and W be independent and identically distributed random variables, and let Y1 and Y2 be as defined in (23.79). If there exist real u and b with u2 b2 = 1 such that aY1 bY2 has the standard normal distribution, then V and W are standard normal N ( 0 , l ) random variables. Now, summarizing the results given above, we have the following characterization of the normal distribution: “Let V and W be independent and identically distributed random variables. Then, V N(O,l) and W N ( 0 , l ) iff Yl has the standard normal distribution. The same is valid for Y2 as well.” Conling back to (23.75), we see that X = ( V 2 W 2 ) / 2has the standard exponential E ( 1) distribution with characteristic function (1- it)-’. Since X is a sum of two independent and identically distributed random variables V2/2
+
+
-
-
+
TRANSFORMATIONS
231
and W 2 / 2 ,we obtain the characteristic function of V2/2 t o be (1 which corresponds to r 0 , l ) with pdf
(a,
1
-e-”,
6
x >O.
Then, the squared standard normal variable V 2 has
&
e-+,
r (i,0,2) with pdf
x > 0.
If we now consider the sum (23.81) k=l
where Vl, V2,. . . ,Vn are i.i.d. normal N ( 0 , l ) random variables, then the reproductive property of gamma distributions readily reveals that Sn has I‘ (n/2,0,2) distribution. In Chapter 20 we mentioned that this special case of r (n/2,0,2) distributions, where n is an integer, is called a chi-square ( x 2 ) distribution with n degrees of freedom. Based on (20.24), the following result is then valid: For any k = 1 , 2 , . . . , n, the random variables
and Sn are independent, and Tk,n has the standard beta ution. In particular, if n = 2, T1,2 =
v?
v,”+ vz”
and
T2,2 =
(i, 2 ’) distrib-
-
vz” v,2 + vz”
have the standard arcsine distribution.
Exercise 23.8 Show that the quotient (V,” form U ( 0 , l ) distribution.
+ K2)/Sq has the standard uni-
Exercise 23.9 Show that the pdf of (23.82) is given by (23.83)
NORMAL DISTRIBUTION
232
The distribution with pdf (23.83) is called as chi distribution (x distribution) with n degrees of freedom. Note that (23.83), when n = 2, corresponds to the Rayleigh density in (23.69). The case when n = 3 with pdf
is called the standard Maxwell distribution. Consider now the quotient
T=
d
V
m
=
V
m
l
(23.85)
where V, V l , . . . , V, are all independent random variables having standard normal distribution. The numerator and denominator of (23.85) are independent arid have normal distribution and chi distribution with n degrees of freedom, respect>ively.
Exercise 23.10 Show tha.t the pdf of T is given by
The distribution with pdf (23.86) is called Student’s t distribution with n degrees of freedom. Note that Student’s t distribution with one degree of freedom (i.e.] case n = 1) is just the sta.ndard Cauchy distribution.
Exercise 23.11 As n + 30, show that the t density in (23.86) converges to the standard normal density function.
Now, let Y = 1/V2 be the reciprocal of a squared standa.rd normal N ( 0 , l ) random variablc. Then, it can be shown that the pdf of Y is
( 23.87) It can also he shown that the characteristic function of Y is given by
(23.88)
TRANSFORMATIONS
233
It is not difficult t o check that f y ( t ) is a stable characteristic function, and the random variable with pdf (23.87) has a stable distribution. It is of interest to mention here that there are only three stable distributions that possess pdf's in a simple analytical form: normal, Cauchy, and (23.87). Of course, Y as well as any other stable random variable has an infinitely divisible distribution. We may observe that many distributions related to the normal distribution are infinitely divisible. However, in order to show that not all distributions related closely to normal are infinitely divisible or even decomposable, we will give the following example. It is easy to check that if a random variable X with a characteristic function f ( t ) and a pdf p(x) has a finite second moment then f*(t)= f ( 2 ) ( t ) is indeed a characteristic function which corresponds .f (2) (0) to the pdf a2,
Now, let X
-
p*(x)
X2P(X) =ff2
N ( 0 , l ) . Then,
consequently, (23.89)
f * ( t ) = (1- t 2 ) e - t 2 / 2 is the characteristic function of a random variable with pdf
(23.90) The characteristic function in (23.89) and hence the distribution in (23.90) are indecomposable. Let X N(0,l) and X = a+ blog Y . Then, Y is said t o have a lognormal distribution with parameters a and b. By a simple transformation of random variables, we find the pdf of Y as
-
b
(23.91)
We may take b to be positive without loss of any generality, since - X has the same distribution as X . An alternative reparametrization is obtained by replacing the parameters a and b by the mean m and standard deviation 0 of the random va.riable l o g y . Then, the two sets of parameters satisfy the relationships 7n =
U
- -
b
and
1 b'
(23.92)
L T =-
so that we have X = (log Y - m ) / a ,and the lognormal pdf under this reparametrization is given by 1
1 (logy - m)2 2 02
>
Y>O.
(23.93)
234
NORMAL DISTRIBUTION
The logiiormal distribution is also sometimes called the Cobb-Douglas distribution in the economics literature.
Exercise 23.12 Using the relationship X = ( l o g y - m ) / a , where X is a standard normal variable, show that, the kth moment of Y is given by
E ( Y k )= E (e k ( r n + o X ) )
(23.94)
= ekm+Bk2a2
Then, deduce that
EY where w
= em
&
and
Var Y = eZmu(u - l),
= euL.
Lognormal distributions possess many interesting properties and have also found important applications in diverse fields. A detailed account of these developments on lognormal distributions can be found in the books of Aitchison and Brown (1957) and Crow and Shimizu (1988). Along the same lines, Johnson (1949) considered the following transformations:
X
=a
+ blog (L) and 1-Y
X
=a
+ bsinh-I
Y,
(23.95)
where X is once again distributed as standard normal. The distributions of Y in these two ca.ses are called Johnson's Sg and Su distributions, respectively. These distributions have been studied rather extensively in both the statistical and applied literature .
Exercise 23.13 By transformation of variables, derive the densities of Johnson's Sn and S" distributions. Exercise 23.14 For the Su distribution, show that the mean and variance are
(23.96) respectively.
CHAPTER 24
MISCELLANEA 24.1
Introduction
In the preceding thirteen chapters, we have described some of the most important and fundamental continuous distributions. In this chapter we describe briefly some more continuous distributions which play an important rolc either in statistical inferential problems or in applications.
24.2
Linnik Distribution
In Chapter 19 we found that a random variable V with pdf 1 p v ( x ) = - e?, 2
-m
< x < m,
has as its characteristic function [see Eq. (19.5)] (24.1) Such a random variable V can be represented as
vgaw,
(24.2)
where X and W are independent random variables with X having the standard exponential E ( 1) distribution and W having the normal N(O,2) distribution with characteristic function
gw(t) = EeitW
(24.3)
= e-t2.
This result can be established through the use of characteristic functions as follows:
fv(t)
=
Ee itV
-
EeitJSTW
e-XEeitfiW
235
dz
236
MISCELLANEA =
1,
e-"gw(t&) dx
(24.4)
a
Thus, from (24.4), we observe that V = W has the standard Laplace L( 1) distribution. Similarly, let us consider the random variable
Y
=xz>
where X and Z are independent random variables with X once again having the standard exponential E(1)distribution and 2 having the standard Cauchy C(0,l) distribution with characteristic function gz(t) =
(24.5)
= e-lti.
Exercise 24.1 Show that the characteristic function of Y 1 f Y ( t ) = EeitY = 1 It1 .
=X
Z is
+
(24.6)
Note that in the two examples above, W N(O,2) and Z C(0,l) have symmetric stable distributions with characteristic functions of the form exp(-ltla), where (Y = 2 in the first case and (Y = 1 in the second case. Now, more generally, starting with a stable random variable W ( a )(for 0 < Q 5 2) with characteristic function N
N
let us consider the random variable
Y(a!)= X""W(a),
(24.8)
where X a.nd W ( a ) are independent random variables with X once again having the standard exponential E ( 1) distribution. Then, we readily obtain the chara.cteristic function of Y ( a )as
(24.9)
INVERSE GAUSSIAN DISTRIBUTION
237
Thus, we have established in (24.9) that (24.10) (for any 0 < Q 5 2) is indeed the characteristic function of a random variable Y ( a )which is as defined in (24.8). The distribution with characteristic function (24.10) is called the Linnik distribution, since Linnik (1953, 1963) was the first t o prove that the RHS of (24.10) is a characteristic function for any 0 < Q 5 2. But this simple proof of the result presented here is due t o Devroye (1990). In addition, from (24.8), we immediately have (24.11)
E {Y(a))' = EXe/" E { W ( Q ) } ' . Since
exists for any C > 0, the existence of the moment E {Y(a))' is determined by the existence of the moment E { W ( Q )e .} Hence, we have
{ ly(a)le}< 02 if and only if 0 < !< Q = 2.
24.3
Q
(24.12)
< 2 , a.nd that (24.12) is true for any !> 0 when
Inverse Gaussian Distribution
The two-parameter inverse Gaussian distribution, denoted by I G ( p ,A), has its pdf as
p x ( z ) = /=exp
271.x3
{
-
~
A
2P
2
(- p) z
and the corresponding cdf as
Fx(z)
=
a)
(E(;
-
1))
+
,
z
> 0, A, p > 0, (24.13)
(-8 (; +
1))
z
> 0.
, (24.14)
The characteristic function of IG (p ,A) can be shown t o be k.
(24.15)
238
MISCELLANEA
Exercise 24.2 From the characteristic function in (24.15), show that E X p and Var X = p3/X.
=
Exercise 24.3 Show that Pearson coefficients of skewness and kurtosis are given by
respectively, thus revealing that IG(p,A) distributions are positively skewed and leptokurtic. Note that these distributions are represented by the line 7 2 = 3 -t- 5$/3 in the ($, y2)-plaiie.
By taking X = p2 in (24.13), we obtain the one-parameter inverse Gaussian distribution with pdf
,
x > 0,p > 0,
(24.16)
denoted by I G ( p ,p’). Another one-parameter inverse Gaussian distribution may be derived from (24.13) by let,ting p + 00. This results in the pdf
Exercise 24.4 Suppose X I ,X 2 , . . . ,X , are independent inverse Gaussian random variables with X , distributed as IG(p,, A,). Then, using the characteristic function in (24.15), show that X,X,/p: is distributed as I G ( p ,p 2 ) , where p = C:=l A z / p z . Show then, when p, = p and A, = X for all i = 1 . 2 . . . . , n, that the sample mean X is distributed as I G ( p ,nA).
x;&
Inverse Gaussian distributions have many properties analogous to those of normal distributions. Hence, considerable attention has been paid in the literature t o inferential procedures for inverse Gaussian distributions as well as their applications. For a detailed discussion on thew developments, one may refer to the books by C h h i h r a and Folks (1989) and Seshadri (1993, 1998).
CHI-SQUARE DISTRIBUTION
24.4
239
Chi-square Distribution
In Chapter 20 (see, for example, Section 20.2), we made a passing remark that the specia.1case of gamma r (n/2,0,2) distribution (where n is a positive integer) is called the chi-square (x2) distribution with n degrees of freedom. We shall denote the corresponding random variable by x;. Then, from (20.6), we have its density function as 1 (n/2)
Pxz,(.) =
e--2/2
x(n/2)--1
,
o<x<m.
From (20.9), we also have the characteristic function of
(24.17)
xi as
fxz, ( t )= (1- 2it)
(24.18)
From (20.15) and (20.17), we have the mean and variance of
Ex: = n
and
Var
x:
xn2 = 2n.
as (24.19)
Furthermore, from (20.18) and (20.19), we have the coefficients of skewness and kurtosis of as
xi
(24.20) Also, as shown in Chapter 20, the limiting distribution of the sequence (x: - n ) / G is standard normal. Next, let and x i be two independent chi-square random variables, and x i . Then, from (24.18), we obtain the characteristic function let x2 = of x2 as
xi +
fX*(t) = Eeitx2 = EeitXz EeitXi
=
4-(nfm)/2
(1 - 2.
(24.21)
which readily implies that x2 has a chi-square distribution with (n+m)degrees of freedom. On the other hand, if x i and X are independent random variables with X having an arbitrary distribution, and if x2 = X is distributed as chi-square with ( n m ) degrees of freedom (where m is a positive integer), then the characteristic function of X is
xi +
+
xk.
which implies that X is necessarily distributed as Let X I , . . . , X,n be independent standard normal N ( 0 , l ) random variables. Then, as noted in Chapter 23 [see, for example, Eq. (23.81)],
k= 1
follows a chi-square distribution with n degrees of freedom. More generally, the following result can be established.
240
MISCELLANEA
Exercise 24.5 Let Y l ,. . . , Y, be a random sample from the normal N ( a ,g2) Yk denote the sample mean. Then, show that distribution, and = CE==, [see Eq. (23.60)]
v
( 24.23)
It is because of this fact that chi-square distributions play a very important role in statistical inferential problems. A book-length account of chi-square distributions, discussing in great detail their various properties and applications, is available [Lancaster (1969)l.
24.5
t Distribution
Let X , X I , . . . , X , be i.i.d. random variables having standard normal N ( 0 , l ) distribut,ion. Then, consider the random variable [see also Eq. (23.85)] (24.24) where the numerator and denominator are independent with the numerator having a standard normal distribution and S, having a chi-square distribution with n degrees of freedom. Then, as given in Exercise 23.8, the pdf of this random variable is given by [see Eq. (23.86)]
which is called Student’s t distribution with n degrees of freedom. Let us denote this distribution by t,. This is a special form of Karl Pearson’s Type VII distribution. Since “Student” (1908) was the first to obtain this result, it is called Student’s distribution. But sometimes this distribution is called Fisher’s distribution. More generally, the following result can be established. Exercise 24.6 Let Yl,. . . , Y, be a random sample from the normal N ( a ,g 2 ) distribution, and = C;=,Y k / n denote the sample mean. Then, with S2 as defined in (24.23), show that
t DISTRIBUTION
24 1
It is for this reason that t distributions play a very important role in statistical inferential problems. d From the density function of X = t, in (24.25), it can be shown that the r t h moment of X is finite only for r < n. Since the density function is symmetric about IC = 0, all odd moments of X are zero. If r is even, it can be shown that the r t h moment of X (which are also central moments) is given by
-
~ _ _ _ _ _
n'/2
1 . 3 . . . ( r - 1)
( n - r ) ( n- r
+ 2 ) . . . (n
-
2)
( 24.2 7)
~
Exercise 24.7 Derive the formula in (24.27).
From the expressions above, we readily obtain the mean, variance, and d coefficients of skewness and kurtosis of X = t, as
EX n(X)
=
0,
=
0
n a), n-2 3 ( n - 2) and y2(X) = ( n > 4). n-4 Var X
= -(n, >
(24.28)
It is evident that the t distributions are symmetric, unimodal, bell-shaped and leptokurtic distributions. In addition, as mentioned earlier in Exercise 23.9, t , distributions converge in limit (as n + m) to the standard normal distribution. Plots of the t density function presented in Figures 24.1 and 24.2 reveal these properties.
Exercise 24.8 Let X and Y be i.i.d. random variables with t , distribution. Then, show that
(24.29) a result established by Cacoullos (1965).
MISCELLANEA
242
In a recently published article, Jones (2002) observed that the t 2 distribution has simple forins for its distribution and quantile functions which lead to simple calculations for many properties and measures relating to this distributiori. The t density in (24.25) reduces, for the case n = 2, simply to P2(t) =
1
< t < 00,
-m
(2 + t 2 ) 3 / 2 '
(24.30)
from which we readily obta.in the cdf as
-
L2
-
I,,,
( t / 4)
tan
1 h s e c 2 6 dQ 23/2(sec26 ) 3 / 2
(setting 7~ =
tan-'(t/JZ)
Jztan 6 )
d6
-C O S ~
2
; { l + + j (v5 ) )} 1
=
tan2 (tan-'
-m
< t < 00.
Exercise 24.9 From (24.31), show that the quantile function of the tribution is 2u - 1 FT'(u) = O
(24.31)
t2
dis-
&qKq'
Proceeding similarly, the following expressions can be obtained correspondirig to n = 3, n = 4, n = 5, and 'ri = 6: ~ 3 ( t )=
F,(t)
=
Fs(t)
=
F6(t)
=
(5)+
+
-tan-' 1 2 7 r 1 t ( 6 $- t 2 ) 2 2(4+t2)3I2 ' 1
d3t 7r(3
+
1 2 7 1 -+ 2 -
1 + -tan-' r
+ + +
t(135 30t2 2t4) 4(6 t 2 ) 5 / 2
+ t2)
'
t DISTRIBUTION
243
LD
0
U 0
'
.,'
m
0
LL
n
a
N 0
-
0
0
.
0 I
-6
I
-4
.......
I
I
I
I
I
-2
0
2
4
6
t
Figure 24.1. Plots of t density function when n = 1 , 2 , 4
MISCELLANEA
244
Lo
0
U
0
m
0
LL
n Q
rn
0
7
0
0 0
I
-6
1
-4
1
1
1
1
1
-2
0
2
4
6
t
Figure 24.2. Plots of t density function when n = 10,20,30
F DISTRIBUTION
24.6 Let V
-
245
F Distribution xk and W x i be independent random variables. N
x=-V / m W/n
-
Further, let
-n. - V
W
m
(24.32) '
Then, it can be shown that the pdf of X is given by
This is called Snedecor's F distribution with ( m , n ) degrees of freedom, as its original derivation is due t o Snedecor (1934), and we shall denote it by Fm,,. This is related to Karl Pearson's Type VI distribution, which is a beta distribution of the second kind mentioned earlier (see Section 16.4).
Exercise 24.10 Derive the density function in (24.33).
From (24.32), we obtain the r t h moment of X
EX'
= =
for r
X
N
(t)' (t)'
N
Fm,7L as
E(V') E(W-')
+
+
m ( m 2). . . (rn 2r - 2) (n - 2)(n - 4) ... ( n- 2r)
(24.34)
< n/2. From (24.34), we immediately obtain the mean and variance of Fm3nas EX=-
n
n-2
(n > 2) and Var X
Exercise 24.11 If X
d
=
=
+
2n2(m 12 - 2) (n> 4). m(n - 2)2(n - 4)
F,,,, then show that
fi
-
(24.35)
l/a) /2 d t,.
246
24.7
MISCELLANEA
Noncentral Distributions
In Section 24.4 we noted that when X I , X 2 , . . . , X, are i.i.d. N ( 0 , l ) random variables, the variable S, = CE=,X z has a chi-square distribution with n degrees of freedom. Now, consider the distribution of the variable n
sk c ( x k + a k ) 2 .
(24.36)
k=l
The distribution of Sk depends on a l , a2, . . . , a, only through X = C:=,a:, and is called thc noncentral chi-square distribution with n degrees of freedom and Iioncentality parameter X = la^. When X = 0 (i.e., when all a l , . . . , a, are zero), this noncentral chi-square distribution becomes the (central) chi-square distribution in (24.17). Exercise 24.12 Let Y1,Y2,. . . , Yn be independent random variables with Y k distributed as N ( Q , g 2 ) ,and Y = Cy=,Y k / n denote the sample mean. Then, show that (n-l)S2 1 7L 02
=02
C(Yk -
Y ) 2
k=l
x:=l
is distributed as noncentral chi-square with n - 1 degrees of freedom and noncentmlity parameter X = x:=,(ak - u ) 2 / a 2where , zi = akin. Exercise 24.13 From (24.36), derive the mean and variance of the noncentral chi-square distribution with n degrees of freedom and noriceritrality parameter x = u;.
x:=,
In a similar manner, we can define noncentral t and noncentral F distributions which are useful in studying the power properties o f t and F tests. For example, in Eq. (24.32), we defined the F distribution with ( m ,n) degrees of freedom as thn distribution of the variable
where V arid W are independent chi-square random variables with m and n degrees of freedom, respectively. Now, consider the distribution of the variable
(24.3 7 ) where V’ and W’ are independent noncentral chi-square random variables with m and n degrees of freedom and noncentrality parameters XI and X2, respectively. The distribution of X’ is called the doubly noncentral F distribution witjh (m, n ) degrees of freedom and noncentrality parameters ( X I , X2). In the special case when A 2 = 0 (i.e., when there is a central chi-square in the denominator), the distribution of X’ is called the (singly) noncentrul F distribution with ( m ,n) degrees of freedom and noncentrality parameter XI.
Part I11
MULTIVARIATE DISTRIBUTIONS
This Page Intentionally Left Blank
CHAPTER 25
MULTINOMIAL DISTRIBUTION 2 5.1
Introduction
The multinomial distribution, being the multivariate generalization of the binomial distribution discussed in Chapter 5, is one of the most important and interesting multivariate discrete distributions. Consider a sequence of independent and idential trials each of which can result in one of k possible mutually exclusive and collectively exhaustive events, say, A l , Az, . . . , A k , with respectively probabilities p1 , p 2 , . . . ,p k , where p l pa . .. p k = 1. Such trials are termed multinomial trzals. Let Ye = ( Y I ,Yz,!, ~ , . . . , Y k , e ) , 1 = 1 , 2 , . . . , be the indicator vector variables, that is, Y,,e takes on the value 1 if the event A, ( j = 1 , 2 , . . . , k ) is the outcome of the 4th trial and the value 0 if the event A, is not the outcome of the l t h trial. Note that the variables Yl,e,YZ,e,.. . , Yk,! (which are the components of the vector Ye) are dependent, and that
+
+
+
Y1,e
+ Y2,e + . . + Y k , ! = 1, ’
e = 1,2,....
For any n = 1 , 2 , . . . , let us now define the random vector X,
(25.1) =
(XI,,, X Z , ~ ,
,Xk,n) as
x,
(25.2) + YZ + . . . + Y,, n = 1 , 2 , . . . , = y 3 , ~+ y3,2 + . . . + %,, ( j = 1 . 2 , . . . , k ) is the number of occur= Y1
where X,,, rences of event A, in the n multinomial trials. In other words, the random vector X, is simply a counter which gives us the number of occurrences of the events A l , A z , . . . , Ak in the n multinoniial trials, and hence
X l , , + X z , , + ~ ~ ~ + X k ,=, n , n = 1 . 2 , . . . .
(25.3)
Then, simple probability arguments readily yield
P, ( m l , m z , . . , n
~ k ) =
Pr { X I , , = m l , X ~=, m2,. ~ .., X ~ C= ,, m k }
m,
= 0,. ..,n,
249
ml
+ . . . + mk = n.
(25.4)
250
25.2
MULTINOMIAL DISTRIBUTION
Notations
A random vector X, = ( X l , , , x,,,,. . . , X k , , ) having the joint probability mass function as in (25.4) is said t o have the multinomial M ( n ,p l , p2, . . . ,p k ) distribution. In the case when k = 3, the distribution is also referred to a.s the
trinomial distribution.
-
Remark 25.1 The random vector x, M ( n , p l , p z , . . . , p k ) is actually ( k 1)-dimensional since its components XI,^, Xz,,, . . . , X k , , satisfy the relationship XI,, X 2 , n . . f x k , n = n
+
+
'
and, consequently, one of the components (say, x k , n ) can be expressed as xk,, =n
-
XI,,
-
X z , , - . . . - xk- 1 ,n.
Hence, the distribution of the random vector X, = (XI,,, X 2 , , , . . . , Xk,lL) is completely determined by the distribution of the ( k - 1)-dirnensional random vector XI,^, X2,,,. . . , Xk-l,?L). For cxaniple, when k = 2, the probabilities P,(ml, m.2) in (25.4) simply become
which are the binomial probabilities.
25.3
Compositions
Due to the probability interpretation of multinomial distributions given above, it readily follows that if independent vectors Y1, Y2,. . . ,Y, all have multinomial h ' ( l , p 1 , p 2 , . . . , p k ) distribution, then t h e s u m x , = Y 1 + Y 2 + . . . + Y n has the iiiult,iriomial M ( n , p l , p 2 , . . . , p k ) distribution. In addition, if X M(n1,p1.p2.. . . , p k ) and Y M ( n 2 ,p l , p a , . . . , p k ) are independent multinomial random vectors, then X Y is distributed as the sum Y1 Y2 . . . Y,,+,, of i.i.d. multinomial M ( l , p l , p z , . . . ,p,+)random vectors, and hence, is distributed as multinomial M(nl n2,p1,pzr.. .,pk).
-
-+
+
+ +
+
25.4
Marginal Distributions
The fact that the multinomial distribution is the joint distribution of the number of occurrences of the events Al, A2,. . . ,A,+ in n rnultinornial trials enables ub to derive easily any marginal distribution of interest. Suppose that we are interested in finding the marginal probabilities
P r ( X 1 , , = m ~ , X a=, m ~,...,X,,,=m,}
CONDITIONAL DISTRIBUTIONS for j
< k , when X,
-
25 1
M ( n , p l , p z , .. . , p k ) . We first note that
P r { X I , , = m l , . . . ,X j , ,
=mj}
= Pr {XI,, = m l , . . . , X j , , = mj, V = m } ,
(25.6)
where
evidently, V denotes the number of occurrences of the event A = Aj+l U Aj+2 U.. .U A k in the n multinomial trials with the corresponding probability of occurrence being
Pr{A} = pj+l
+ ... + p k
=
1- p l
-
. . . - p .3
-
P (say) .
Then, the random vector (XI,,,. . . , Xj,,, V) clearly has the multinomial M ( j l , p l , . . . , p j , p ) distribution; then, using (25.4) and (25.6), we have
+
Pr
= m l , . .., X j , , = m j }
=
P, ( m l ,. . . ,mj,m )
j
j
m + C m i = n and p + C p i = l . i=l i=l (25.7) In particular, for j of XI , , as Pr
=
1, we simply obtain from (25.7) the marginal distribution
= rnl} =
n!
p Y l ( 1 - pl)n--ml , ml
ml! ( n- ml)!
= 0 ,..., n,
(25.8)
which simply reveals that the marginal distribution of X I , , is binomial B ( n , p l ) Similarly, we have X , , B(n,p,) for any r = 1 , 2 , . . . , k . N
2 5.5
-
Conditional Distributions
M ( n , p l , p z , .. . , p k ) . Consider now the conditional distribution of (X,+i,,,. . . , X k , 7 L )given , ( X i , , = m i , . . . ,X,,, = m,) , defined by
Let X,,
Pr{X,+i,,
= m3+1,.
.. ,Xk,,
= mk
1 XI,,
= m l , .. .
,x,,,= m,} (25.9)
Substituting the expressions in (25.4) and (25.7) into (25.9), we obtain
MULTINOMIAL DISTRIBUTION
252
Pr {Xj+l,, = n ~ j +. .~. ,,X k , ,
=mk
1 X1,,
= m l , . . . ,Xj,,, = m j ]
(25.10)
+. +
+. +
for mj+l . . mk = n - (ml . . m j ) ,and 0 otherwise. From (25.10), we readily observe that the conditional distribution of ( X j + l , n ,. . . ,Xk,7L),given = m l , . . . ,Xj,, = m j , is multinomial M ( n - m , y j t l , . . . , y k ) , where m = 'tn1 . . . m j , yi = p i / p (for i = j 1 , .. . , k ) , and p = p j + l + . . . + p k . Since the dependence on ml, m2,. . . , mj in (25.10) is only through the sum 7n1 ni2 . . . mj, we readily note t1ia.t
+ + + + +
+
P r {Xj+l,, = mj+l,.. . , X k , , = m
k
I
= m l , .. . . X j , , = mj)
(25.11) hence, the conditional distribution of (x2+l,n, . . . , X k , l L ) , given Xl,+ . . . X 2 , n= r n , is also the same multinomial M ( n - 7)2,y J + l , . . . , y k ) distribution.
+
25.6
+
Moments
Let x,, :( X I,,,, . . . , x k , n ) M ( n . p l . . . . , p k ) . Then, since the marginal distribution of X,,, ( r = 1 , 2 , . . . , k ) is binomial B(n,p,), we readily have N
EX,,,,
= 7lp, = e,
and Var Xr,n = np,(l
-
p,) = (T, 2 .
(25.12)
Next, in order t o derive the correlation between the variables X,,,, ( r 1.. . . , k ) , we shall first, find the covariance using the formula
where the last equality follows from the fact that
=
MOMENTS
j
i
i
j
=
253
Cj ~r { x ~ j ,> E~(Xr,nIXs,n =
j
=E
=j
) (25.14)
{Xs,n E (Xr,nlXs,n)}.
Now, we shall explain how we can find the regression E (Xr,nlXs,,) required in (25.13). For the sake of simplicity, let us consider the case when T = 2 and s = 1. Using the fact that the conditional distribution of the vector (X2,,,. . . , X k , n ) , given = m, is multinomial M ( n - m, q 2 , . . . , q k ) , where qi = pi/( 1 - p l ) (i = 2,. . . , k ) , we readily have the conditional distribution of X 2 , n , given XI,^, = m, is binomial B ( n - m, q 2 ) ; hence, (25.15) from which we obtain the regression of
X2,,
to be
on
Using this fact in (25.13), we obtain 012
= -
E XI,^ XZ,,)e1e2 E 1x1,~ E (X2,nlt1,n)}- e1e2 -EP2 1-P1 Pa l-pl
-
( n - Xl,n)P2 1 -P1
~
( n- XI,,)} 2
-
e1e2
{ n Pl -nP1(1-P1)-n2P?}-
nPlP2.
7L2PlP2
(25.16)
Similarly, we have
From (25.12) and (25.17), we thus have the covariance matrix of X, (Xl,n,.. . , X k , n ) to be
=
(25.18) for 1 5 T , S 5 k . Furthermore, from (25.12) and (25.17), we also find the correlation coefficient between Xr,n and X s , n (1 5 T < s 5 k ) to be (25.19)
254
MULTINOMIAL DISTRIBUTION
It is important t o note that all the correlation coefficients are negative in a multinomial distribution.
Exercise 25.1 Derive the multiple regression function of . .. Xy,n.
XT+1,7Lon
XI,^,
1
25.7
Generating Function and Characteristic Function
Consider a random vector Y = (Y1,. . . , Yk)having M ( l , p l ,. . . ,pk) distribution. As pointed out earlier, the distribution of this vector is determined by the nonzero probabilities (for T = 1,. . . , k ) py = P r {Yl = 0 , . . . , Yr-l= 0, Yr 1,Yy+1= 0 , . . . , Yk = o} .
Then, it is evident that the generating function
$ ( s 1 , s2,..
(25.20)
. , sk) of Y is (25.21)
+
+
Since X, N M ( n , p l , .. . ,pk) is distributed as the sum Y1 . . . Y, [see Eq. (25.2)], where Y1,.. . , Y n are i.i.d. multinomial M ( l , p l , .. . ,pk) variables with generating function as in (25.21), we readily obtain the generating function of X, as
n
=
(Q(S1,
. . . ,Sk)}n
=
From (25.22), we deduce the generating function of m = l , . . . , k - 1 ) as Rn(s1,.
. . ,S r n )
=
Pn(S1,.
. . , sm, 1,.. . ,I)
In paxticular, when m = 1, we obtain from (25.23)
. . , X m , 7 1 (for )
GENERATING FUNCTION AND CHARACTERISTIC FUNCTION 255 which readily reveals that X I , , is distributed as binomial B ( n , p l ) (as noted earlier). Further, we obtain from (25.23) the generating function of the sum X I , , . . . Xm,n (for m = I , . . . , k - 1) as
+ +
EsX13,+...+X-,,
=Rn(S,.
. . ,s ) =
+
{+ 1
(8 -
1)e p r } ; ' ,
r=l
+ +
(25.25)
which reveals that the sum XI,^ . . . Xm,n is distributed as binomial B (n, p r ) . Note that when m = k , the sum XI,^ . . . Xk,n has a degenerate distribution since X I , , + . . . Xk,, = n.
c7==l
+
+
Exercise 25.2 From the generating function of X, M ( n , p l , . . . , p k ) in (25.22), establish the expressions of means, variances, and covariances derived in (25.12) and (25.17). N
Exercise 25.3 From the generating function of (XI,,, . . . , X m , n ) in (25.23), prove that if m > n and m 5 k , then E (XI,,.. . Xm,+)= 0. Also, argue in this case that this expression must be true due to the fact that at least one of the X r , n ' ~ must be 0 since . . . Xk,, = n.
+
+
From (25.22), we immediately obtain the characteristic function of X n M ( n , ~ l ., .., ~ l i as ) N
E
f n ( t l , .. . , tk) =
{
ei(tlXl,~+'..+tkxk,n)
P, (eitl, . . . ,e i t k )
3
(25.26) In addition, from (25.23), we readily obtain the characteristic function of (XI,,, . . . , X m , n ) (for m = I , . . . ,k - 1) as g,(tl,. . . , tm)
= =
R, ( e i t l , .. . ,eitm)
{+ 1
n
m
x p r
(,it,
-
1)) .
(25.27)
r=l
Exercise 25.4 From (25.27), deduce the characteristic function of the sum XI,^ . . . Xm,, (for m = 1 , 2 , . . . , k - 1) and show that it corresponds to that of the binomial B (n, pr).
+ +
xr=l
256
25.8
MULTINOMIAL DISTRIBUTION
Limit Theorems
Let 11s now consider the sequence of random vectors
Xn
= ( X 1 , n r . .. , X k , n ) N
M(n,pl,.. . ,
(25.28)
~ k ) ,
where p k = I - C“-’ p,. Let p , = A T / n for T = 1,.. . , k - 1. Then, for nz = k-1, the characteristic function of . . , X ~ - I , ~in&(25.27) ) becomes n
(25.29) Letttirig n
4
00
in (25.29), we observe that
where h,(t) = exp {A, (ezt 1)) is the characteristic function of the Poisson .(A,.) distribution (for T = 1,.. . , k - 1). Hence, we observe from (25.30) the components XI,,, . . . , X k - l , , of the multinomial random tha.t, a.s ‘n+ x, vector X, in (25.28) are asymptotically independent and that the marginal distribution of X , , converges to the Poisson .(A,) distribution for any r = 1 , 2 , . . . , k - 1. ~
+
+
Exercise 25.5 Using a similar argument, show that XI,, . . . X m , , (for rn = 1,.. . , k - 1) converges to the Poisson 7r ( C r = l A,) distribution.
Next, let tors
11s corisider
the sequence of the ( k - 1)-dimensional raridorri vec-
Let h,,(tl.. . . , t k - 1 ) be the characteristic function of W, in (25.31). Then, it follows froni (25.27) that
(25.32)
Exercise 25.6 As n to
LIMIT THEOREMS
257
show that h,(tl,.
. . , t k - l ) in (25.32) converges
4 m,
where
is the correlation coefficient between Xr , n and X s , n derived in (25.19).
From (25.33), we see that the limiting characteristic function of the random variable XT.n - npr Wr,n = nP ( 1 - P 1
dy
(-it:),
becomes exp which readily implies that the limiting distribution of the random variable Wr,, is indeed standard normal (for T = 1,.. . , k - 1). Furthermore, in Chapter 26, we will see that the limiting characteristic function of W, in (25.33) corresponds to that of a multivariate normal distribution with mean vector (0,. . . , 0) and covariance matrix ifi=i
(25.34)
for 1 5 i , j 5 k - 1. Hence, we have the asymptotic distribution of the random vector W, in (25.31) to be multivariate normal.
This Page Intentionally Left Blank
CHAPTER 26
MULTIVARIATE NORMAL DISTRIBUTION 26.1
Introduction
The multivariate normal distribution is the most important and interesting multivariate distribution and based on it, a huge body of multivariate analysis has been developed. In this chapter we present a brief description of the multivariate normal distribution and some of its basic properties. For a detailed discussion on multivariate normal distribution and its properties, one may refer t o the book by Tong (1990). At the end of Chapter 25 (see Exercise 25.6), we found that the limiting distribution of a sequence of multinomial random variables has its characteristic function as [see Eq. (25.33)]
h ( t l , .. . ,tk-l)
= exp
{
-
1
Q ( t l , . . . , tk-1)) ,
(26.1)
where Q (tl, . . . ,tk-1) is the quadratic form
Q(t1,. . . d k - 1 ) =
k-1
Ct:+ 2 C
Prstrts.
(26.2)
l
r=l
The quadratic form Q (tl, . . . ,tk-1) in (26.2) can be written in matrix notation as
Q(t) = tCt’,
(26.3)
where t is a row vector ( t l , . . . , tk-l), and C is a (k-1) x (k-1) real symmetric positive definite matrix with (i, i)th element as 1 and (i, j ) t h element as p i J . In fact, C is the covariance matrix of a random vector (Y1,. . . , Yk-1) with variances 1 and covariances p i j (which are also the correlation coefficients). It can be shown that (we state this result without proof)
259
260
MULTIVARIATE NORMAL DISTRIBUTION
where
with 1x1 denoting the determinant of the matrix C, Q-'(t) = tC-'t/, and C-' denoting the inverse of the matrix C. Eqmtion (26.4) implies that the nonnegative function p ( z 1 , . . . , Zk-1) satisfies the condition
(26.6)
= 1
using (26.2). Hence, p ( z 1 , . . . , ~ random variable, and
h ( t l , . .. , t k - 1 )
k - 1 )
is the pdf of some ( k - 1)-dimensional
Q ( t l ,... . t k - I ) }
= exp
(26.7)
is indeed the characteristic function of this distribution
26.2
Notations
Let C = (crLj)tj=' be a (,a x n) real symmetric positive definite matrix. Note that such a matrix may be a matrix of second-order moments of somc ndimensional distribution. Let us now define the quadratic form corresponding to C as
Q(t) = t C t '
c n
=
orrt;
+2
r=l
1
(26.8)
17rstrts.
ljr<s
As twforc:, let C-' denote the inverse of the matrix C and Q-' (t) = tC-lt'. Then, a random vector Y = (Yl,. . . , Yn)is said to have a multivariate normal MlV(0,C ) distribution, if it,s pdf is of the form g(z1,.. . , z),
=
r nexp { 1
-
. . , z,)
~-1(zl,.
},
(26.9)
and its characteristic function is (26.10) where the quadratic form Q(t1,.. . , t n ) is as in (26.8). In this case, the first parameter 0 in the notation MlV(0, C) is a row vector of dimension ( n x 1)
NOTATIONS
261
with all of its elements being 0. From the characteristic function of Y in (26.10), we then readily find that EY,
for
=0
T
= 1 , .. . , n ,
(26.11)
and that C is indeed the matrix of the second-order moments of Y , that is,
Va.r Y, Cov(Y,,Ys)
= =
oTr for 1 5 T 5 n, urS for 1s r < s 5 n.
(26.12)
Using the random vector Y, we can now introduce a new random vector
where m = ( m l ,. . . , m,) is the vector of means of the components of the random vector X.
Exercise 26.1 From (26.9), show that the pdf of the random vector X is given by p(z1,. . . , z,)
=
1
m
(26.14)
Exercise 26.2 From (26.10), show that the characteristic function of the random vector X is given by
(26.15) where the quadratic form Q(t1,.. . , t,) is as in (26.8).
We then say that such a random vector X = ( X l , . . . , X,) has a multivariate normal distribution (in n dimensions, of course) with mean vector m and covariance matrix C, and we denote it by X M N ( m ,C). N
262
26.3
MULTIVARIATE NORMAL DISTRIBUTION
Marginal Distributions
Let X = ( X I , .. . , X n ) MN(m,C), and V = ( X I , .. . , X e ) , !< n. Then, from the characteristic function of X in (26.15), we readily obtain the characteristic function of V as N
~ei("xl+-.+tPxe)
= f ( t 1 , . . . ,t,,O, . . . , O )
(26.16) Equation (26.16) immediately implies that the random vector V .t < n, is distributed as MN(m('), C(')), where
=
( X I ,. . . , X e ) ,
In particular, we observe that marginally X I N(m1,nl1),where ml E X 1 and 011 = Var X I . Similarly, it can be shown that marginally, X,. N ( ~ , . . O ; for ~ ) any T = 1,.. . ,n. N
26.4
= N
Distributions of Sums
Let X and Y be two independent (n-dimensional) multivariate normal random vectors with parameters (m('),C(')) and (m('),C(2)),respectively. Further, let V be a new random vector defined as V = X Y .
+
Exercise 26.3 Then, using Eq. (26.15), prove that V MN(m(1)+m(2), C(')t C(')). More generally, if X, M N ( m ( J )C")) , ( j = 1 , . . . , k ) are independent k random vectors, prove that C,=, X, MN(m, C), where N
26.5
N
-
k
k
j=1
j=1
Linear Combinations of Components
-
Lct X = ( X I , .. . , X n ) MN(m, C). Now, let us consider the linear combination L = C:==, c,X, = Xc', where c = ( e l , . . . , cn). Then, from the
INDEPENDENCE OF COMPONENTS
263
characteristic furictiori of X in (26.15), we readily obtain the characteristic function of L as
=
f ( C l t , .. . , cnt)
(26.17) Equation (26.17) readily reveals that the linear combination L is distributed as normal N(nz,0 2 ) ,where n
m=
C crmr = mc’ T=l
and
26.6
Independence of Components
It is easy t o find conditions under which the components of the multivariate normal vector X = ( X I , .. . , X,) M N ( m ,C) are all independent. Since in this case, when X I , .. . , X , are all independent, the characteristic function of X in (26.15) must satisfy the condition N
n n
f ( t l ,‘ . .
ltn)
=
f ( 0 ,. . . 10,tT,o,. . . > O),
(26.18)
r=l
we immediately observe that the components of the random vector X are independent if and only if uij = 0
for 1 5 i
< j 5 n.
(26.19)
In other words, the components X I , . . . , X , are all independent if and only if the covariance matrix C is a diagonal matrix.
MULTIVARIATE NORMAL DISTRIBUTION
264
26.7
Linear Transformations
-
Let X = (Xl,. . . , X n ) M N ( m ,C ) . Further, let Y = (Yl,. . . , Y,) = XC be a linear transformation of X, where C is a ( n x n ) non-singular matrix.
Exercise 26.4 Then, prove that Y
= (Y1,.
. . , Y,)
N
MN(mC,C’CC).
It is well known from matrix theory that for any symmetric ( nx n) matrix C, there exists an ( n x n ) orthogonal matrix C such that C’CC is a diagonal matrix; be reminded that a matrix C is said to be an orthogonal matrix if CC’ = I,, where I, denotes an identity matrix of order ( n x n ) . From this property, it is clear that if C’CC is a diagonal matrix, then the components of the random vector Y = ( Y l , . . , Y,) MN(mC,C’CC) will all be iridependent. Hence, if X = ( X l , .. . , X,) M N ( m ,C), then there exists an orthogonal linear transformation Y = XC that generates independent normal random variables
-
N
y,. = q r X 1 + ’ .
‘
+ c,,x,,
T
=
1... . ,n.
(26.20)
Moreover, we see that X can be expressed as X = YB, where B = C-’ is also an orthogonal matrix. This implies that the components X I , .. . , X, of any multivariate normal random vector X can be expressed as linear combinations
x,= b,lYI + . . . + b,,Y,,
T
= 1 , .. . , n
(26.21)
of independent normal random variables Yl, . . . , Y,. The orthogonality of the transformations in (26.20) and (26.21) can be exploited t o obtain the following relations:
c n
Y,” = YY’
1.=1
=
xc (XC)’ = X(CC’)X’ = XInX’ = xx’ =
c n
r=l
X; (26.22)
a.nd n
(Y - EY) (Y - EY)’ r=l
(XC - EXC) (XC - EXC)’ (X - EX) CC’ (X - EX)’ (X - EX) I, (X - EX)’ (X - EX) (X - EX)’
c n
(Xr -
r=l
’
(26.23)
BIVARIATE NORMAL DISTRIBUTION
265
It follows, for example, from (26.22) and (26.23) that n
n
r=l
r=l
(26.24) and (26.25) r=l
26.8
r=l
Bivariate Normal Distribution
In this section we discuss in more detail the special case when n = 2, that is, the two-dimensional random vector ( X I ,X2) having a bivariate normal distribution. In this case, we denote the mean vector by (ml,m2)and the covariance matrix by (26.26) where 02 = Var X I , ( k = 1,2), p0102 = Cov(X1,X2), and p is the correlation coefficient between X1 and X z . Since this bivariate normal distribution depends on five parameters, we shall denote it by B N ( m 1 ,m2, a:, 02, p ) .
Characteristic function: The characteristic function for this special case is deduced from (26.15) t o be h(t1,t 2 )
= exp
{i (rn1tl-t mZt2)
1 (oft: 2
- -
+ 2palaztlt2 + a,2t;)
1
. (26.2 7)
Note that the expressions of the mean vector and the covariance matrix given above can also be obtained easily from the characteristic function in (26.27).
Density function: From (26.26), we note that the determinant of the matrix C is 1x1 = afa,2(1 - p 2 ) , which is positive if as > 0, a$ > 0, and IpI < 1. If IpI = 1, then there exists a linear dependence between the variables X I and X2 and, therefore, the vector ( X I ,X 2 ) has a degenerate normal distribution. The inverse of the matrix C in (26.26) can easily be shown to be (26.28) Upon substituting for these expressions in (26.14) and simplifying, we obtain the pdf of ( X 1 , X z )to be
266
MULTIVARIATE NORMAL DISTRIBUTION
-2p(y) (F(%2)2 ) +
(26.29)
Exercise 26.5 By integrating the pdf p ( z 1 , z2) in (26.29) with respect to 2 2 and z l , show tha.t the marginal distributions of X1 and X z are N ( n ~ 1a:,) and N(rn2,a;), respectively.
It should, however, be mentioned here that if the marginal distributions and X2 are both normal, it does not necessarily imply that the vector ( X I ,X z ) should be distributed as bivariate normal. In order to see this, let us consider the bivariate density function
of
X1
+exp{
-
1 2 ( 1 - p2)
(4
-
2P2122
+ z;)
11
,
(26.30) which is a mixture of the densities of BN(O,O,1,1,- p ) and BN(O,O,1,1,p).
Exercise 26.6 If the bivariate random vector (X1,Xa) has its pdf as in (26.30), then show that XI N ( 0 ,l),X2 N ( 0 ,l ) ,and Cov(X1, X 2 ) 0. N
N
But, as seen earlier, among all bivariate normal distributions, only BN(O,O,1,I, 0) with pdf (26.31) can provide such properties for the marginal distributions. Since the pdf h ( s l ,~ 2 in) (26.31) does not equal the pdf g(z1, z2) in (26.30), we can conclude that g ( z l , Q) in (26.30) is not a bivariate normal density function, but it does have both its marginal distributions to be normal.
Some relationships: Let V and W be independent standard normal variables. Further, let XI = V
and
XZ=pV+ J v W ,
(26.32)
BIVARIATE NORMAL DISTRIBUTION
267
where IpI < 1. Then, we readily have the characteristic function of the bivariate random vector ( X I ,X 2 ) to be
f(t1,ta)
=
~eitlXl+it2X2
{ +t2p)V +
=
E exp
-
Eei(tl+t2P)VEeit2J1-p2W
=
exp
=
exp
{ {
i(t1
-
-
1 2 1 - (t: 2
-(tl+ t2p)2
i t 2 J-W}
-
1 Z t i ( l- p2)}
+ 2pt1t2 + t;)
I
,
(26.33)
which, when compared with (26.27), readily implies that ( X I ,X 2 ) is distributed as BN(O,O,1,1, p).
Exercise 26.7 Establish this result by using the Jacobian method on the density function. Exercise 26.8 More generally, prove that the bivariate random vector ( X I ,X 2 ) , where
X1 =ml + a l V
and
X2 = r n z + u 2
is distributed as B N ( m l , m 2 , a ~ , a & p ) .
Conditional distributions: Let the random vector ( X l r X 2 )be distributed as BN(rn1, m2, uY,u;,p). In this case, as seen earlier, X1 N(m1,n:) and X2 N(rn2,a;). From (26.29), we can then obtain the conditional density function of X I , given X z = 5 2 , as N
-
268
MULTIVARIATE NORMAL DISTRIBUTION
(26.35) where (26.36) Froin (26.35), it is clear t,ha.tthe conditional distribution of X I , given X , = 2 2 , is simply N ( X ( I C . L ) , (-Tp~2() ~ ) , where X ( z 2 ) is as given in (26.36).
Exercise 26.9 Proceeding similarly, prove that the conditional distribution of X,, given X I = 2 1 , is N(X*(z1),0$(1- p 2 ) ) , where X*(ZC,) = m2
+ pa2
(26.37)
Regressions: From Eq. (26.35), we immediately have the conditiona.1 mean of
which is a linear function in
z2. Similarly,
X1
to be
we have from Exercise 26.9 that
which is a linear function in 2 1 . Thus, for a bivariate normal distribution, both regression functions (of X1 on X2 and of X 2 on X I ) are linear. Furthermore, we note that the conditional variances are at most as large as the corresponding uncondit,ional variances. In fact, the conditional variances a.re strictly snialler than the corresponding unconditional variances whenever thp correlation coefficient lpi z 0.
CHAPTER 27
DIRICHLET DISTRIBUTION 27.1
Introduction
In Chapter 18, when dealing with exponential order statistics, we established an important property of order statistics U1,n 5 UZ,, 5 . . . 5 as [see Eq. (18.25)] (27.1)
+ + +
+
where S k = Yl YZ . . . Y k ( k = 1 , 2 , .. . , n 1) is a sum of independent and identically distributed standard exponential random variables. Now, let T k , n = Uk>n- Uk-l,, ( k = 1 , 2 , . . . ,n), with the convention that Uo,, 3 0, denote the uniform spacings. Then, from (27.1) it is evident that (27.2) Clearly, therefore, the joint distribution of the uniform spacings TI,^, . . . , Tn,%is the same as the joint distribution of Y1/Sn+l,.. . , Yn/Sn+l.In the simplest case of k = n = 1, it is clear that we have only one variable Yl/(Yl Y z ) and that its distribution is the same as that of TI,^ = U1,1 = U1; in other words, Yl/(YI Y z )has the standard uniform U ( 0 , l ) distribution. In the general case, however, the distribution is somewhat involved and we will now proceed to derive it. With Y1,Y2.. . . , Y,+l being independent standard exponential random variables, we have their joint density function as
+
+
P Y ~ , , ~ , + l ( Y l , . . . , Y Y n + l ) =e--("+
+Yn+l),
2
~ l , . . . , ~ n + l0.
(27.3)
Consider now the transformation V1 = Y1,V2 = Yz,
. . . , V,
= Yn and Vn+l= Yl
+ . . . + Yn+l.
(27.4)
Then, after noting that the Jacobian of this transformation is 1, we obtain from (27.3) the joint density function of V1,V2,.. . , Vn+l as P v ~ , ,v,,,
,
( c 1 , .. . tJn+1)=
e-""+',
211,.
. . ,V n 2 0 ,
n
C Z=1
269
5 v,+I.
21%
(27.5)
270
DIRICHLET DISTRIBUTION
Now, consider the transformation
x1 =
Vl V,+l
~
,
... ,X,=-
V, Vn+l
Xn+l = K+l
and
(27.6)
or equivalently,
Vl
= XIXn+l, . . .
, Vn = Xn Xn +lr and
Vn+l = Xn+l.
(27.7)
From (27.7), it is clear that the Jacobian of this transformation is Xz+l. Then, we obtain from (27.5) the joint density function of XI, X 2 , . . . ,Xn+l as P X ,..., ~ x,+, (21,. . . ,2n+1) = e-""+'
05
21,.
Integrating out the variable function of X I , . . . , X , as
.., 2 ,
z,+1
< 1, 0 5
n
x:+11
5 1, ~ , + 12 0.
(27.8)
i=l
from (27.8), we then obtain the joint density n
~x~,...,x~(21,...,2,)=~!, 0 < ~ 1 , . . . , 2 , I1, 0 < c ~ I 1 .
(27.9)
i=l
In addition, from the density functions in (27.8) and (27.9), we readily observe that the random vector (XI,. . . ,X n ) and the random variable Xn+l are statistically independent. We thus have the joint density function of the uniform spacings TI,,, . . . , T,,n to be given by (27.9). If A, denotes the region
A,=
i
n
I
, . . . ,x,): O < Z ~, . . . ,~ ~ 50 1 < C , ~ i < l,
(21
i=l
(27.10)
then we imniediately have from the joint density function in (27.9) that (27.11)
It turns out that (27.11) is the simplest case of the well-known Dirichlet integral formula, which, in its general form, is [Dirichlet (1839)]
for ak's positive, where I?(.) denotes the complete gamma function. For details on the history of the Dirichlet integral formula above and its rolo in probability and statistics, one may refer to Gupta and Richards (2001).
DERIVATION OF DIRICHLET FORMULA
27.2
271
Derivation of Dirichlet Formula
In deriving the simpler formula in (27.11), we started with the random variables Y1, . . . , Yn+l as independent standard exponential random variables. As a natural generalization, let us now assume that Y1,. . . , Yn+l are independent gamma random variables with Yk r ( a k ,0, l),where ak > 0. Then, the joint density function of Y1, . . . ,Yn+lis N
With the transformation in (27.4), we obtain the joint density function of Vl,. . , Vn+l as '
Now making the transformation in (27.7) (with the Jacobian as X:+l), obtain from (27.14) the joint density function of X I , . . . ,Xn+las P X 1 , ...,Xn+1 (51,.. '
x (1
-
we
,Zn+l)
p)an+l-', n
Upon integrating out the variable xn+l in (27.15), we then obtain the joint density function of X I , . . . , Xn as PXl,...,x, (21, . . . , Z n )
c n
0 5 21, . . . , 2 n 5 1, 0 5
22
i=l
I 1.
(27.16)
272
DIRICHLET DISTRIBUTION
Indeed, from the fact that (27.16) represents a density function, we readily obtain
which is exactly the Dirichlet integral formula presented in (27.12). We thus have a multivariate density function in (27.16) which is very closely related to the multidimensional integral in (27.12) evaluated by Dirichlet (1839).
Notations
27.3
A random vector X = ( X I ,. . . , X,) is said to have an n-dimensional standard Dirichlet distribution with positive parameters a l , . . . , a,+l if its density function is given by (27.16) and is denoted by X D,(al,. . . , a,+l). Note that when
71. =
-
1, (27.16) reduces to
which is nothing but the standard beta distribution discussed in Chapter 16. Indeed, the linear transformation of the random va.riables X I , . . . ,X , will yield the random vector (bl c1X1,. . . , b, cnXn) having an n-dimensional general Dirichlet distribution with shape parameter ( a l ,. . . , a,+l), location parameters ( b l , . . . , b,), and scale parameters (c1,. . . , c,). However, for the rest, of this chapter we consider only the standard Dirichlet distribution in (27.16), due t o its simplicity.
+
+
Marginal Distributions
27.4
Let X D n ( a l , . . . ,a,+l). Then, as seen in Section 27.2, the components XI, . . . , X , admit the representations N
(27.17) and yk
d
XI, =
-
~
s,.+1
,
k = 1 , . . . ,n,
(27.18)
where Y1. . . . , Yn+l are independent standard gamma random variables with Yk r ( a k ,0,1) and = Y1 .. Y,+l. From the properties of gamma distributions (see Section 20.7), it is then known that
sn+l
S,,,
-
+. +
r ( a , 0 , l ) with a = a1
+ . . . + a,+l,
MARGINAL DISTRIBUTIONS
Sn+l-
Yk
N
r ( a - a k , 0,1) (independent of
and d
y k
&
XI, = -sn+l
273
y k
y k
+ (!%+I
-Yk)
-
Yk)
(27.19)
B e ( a k , u - arc);
that is, the margina.1 distribution of X k is beta B e ( a k , a - u k ) , k = 1,.. . ,n (note that this is just a Dirichlet distribution with n = 1). Thus, the Dirichlet distribution forms a natural mult,ivariate generalization of the beta distribution. Exercise 27.1 From the density function of X in (27.16), show by means of direct integration that the marginal distribution of x k is B e ( a k , a - a k ) .
For two-dimensional marginal distributions, it follows from (27.17) that for any 1 5 k < e 5 n,
where Yk a t , 0, l), and that N
(27.20)
-
0, I ) , Z = - Y k - 5 r(a- ak 0, I),5 and 2 are independent. Thus, (27.20) readily implies
Y k , Ye,
N
(27.21) In a similar manner, we find that
for any
T
= 1 , . . . ,n. and 1 5 k(1)
< k ( 2 ) < . . . < k ( r ) 5 n.
Exercise 27.2 If (XI,.. . ,XS)
N
(Xl,x2 + x3,x4
D ~ ( a 1 ,. . , a 7 ) , show that
+ x5 + X S )
Exercise 27.3 Let (XI,.. . , X n ) of Wk = XI + . . . X k .
+
N
DD3(Ul,a2
+ a3, + a5 + a(j,a7). a4
D n ( a l , . . . ,u,+l). Find the distribution
Exercise 27.4 Let (XI,X 2 ) Dz(a1, a2, a 3 ) . Obtain the conditional density function of XI, given X2 = 2 2 . Do you observe some connection to the beta distribution? Derive an expression for the conditional mean E(XIIX2 = 2 2 ) and comment. N
2 74
DIRICHLET DISTRIBUTION
Marginal Moments
27.5
Let X V , ( a l , . . . ,a,+l). In this case, as shown in Section 27.4, the marginal distribution of Xk is beta Be(ak, a - a k ) , where a = a1 . . . a,+l. Then, from the formu1a.s of moments of beta distribution presented in Section 16.5, we immediately have
+ +
N
27.6
-
Product Moments
Let X D n ( a l , .. . , a,+l). Then, from the density function in (27.16), we have the product moment of order ( a 1 , .. . , a,) as
E ( X p l .. . X:")
where the last equality follows by an application of the Dirichlet integral formula. in (27.12). In particula.r, we obtain from (27.23) that
and
Exercise 27.5 Let X N D T L ( a l., . , lation coefficient between X I , and X!.
Derive the covariance and corre-
2 75
DIRICHLET DISTRIBUTION OF SECOND KIND
27.7
Dirichlet Distribution of Second Kind
In Chapter 16, when dealing with a beta random variable X probability density function
-
B e ( a , b) with
by considering the transformation Y = X/(1 - X ) or equivalently X = Y / ( 1 Y ) ,we introduced the beta distribution of the second kind with probability density function
+
-
In a similar manner, we shall now introduce the Dirichlet distribution of the second kind. Specifically, let X D,(al,. . . , a,+l), where a k > 0 ( k = 1 , . . . , n + 1) are positive parameters. Now, consider the transformation
x1 . . . ,y, Y1 = 1 - x1- . . . Xn ’
1
1-
x n
x1
-
. . . - xrl
(27.24)
or equivalently, x1=
Yl ... l+Y1+...+Yn’
,x, 1 + Y1 +Yn. . ’ + Y,‘ 1
(27.25)
+ +
Then, it can be shown that the Jacobian of this transformation is (1 Y1 . . . + Y,)p(n-tl).Then, we readily obtain from (27.16) the density function of Y = (YI,.. . ,Yn)as
The density function (27.26) is the Dirichlet density of the second kind.
Exercise 27.6 Show that the Jacobian of the transformation in (27.25) is (1+ Yl + . . . + Yrl)-(n+l) (use elementary row and column operations). Exercise 27.7 Suppose that Y has a Dirichlet distribution of t,he second kind in (27.26). Derive explicit expressions for EYk, Var Yk, cov(Yk,f i ) , and correlation p(Yk,Ye).
276
27.8
DIRICHLET DISTRIBUTION
Liouville Distribution
Liouville (1839) generalized the Dirichlet integral formula in (27.12) by establishing that
where a l , . . . , u, are positive parameters, 2 1 , . . . , 2 , are positive, and f ( . ) is a suitably chosen function. It is clear that if we set h = 1 and choose f ( t ) = (1- t)an+l-l, (27.27) readily reduces to the Dirichlet integral formula in (27.12). Also, by letting h + 00 in (27.27), we obtain the Liouuille integral formula
where ~ 1 ,. .. , ulL> 0 and tal+."+an-l f ( t )is integrable on ( 0 ,co). The Liouville integral formula in (27.28) readily yields Liouwillr distribution with probability density function
p x l ,...)x n ( Z 1 , . . . , 2 , ) 21,
...,% > O ,
= a1
c f(21 + . ' . + 2 , )
,..., a,>O,
2y1-1
..' x a n T L - l
l
(27.29)
where C is a normalizing constant a.nd f ( . ) is a nonnegative function such that f ( t ) t " i + ' . ' f a . z - 1 is integrable . on (0, co). For a historical view a.nd details on the Liouville distribution, one may refer to Gupta and Richards (2001). '
Exercise 27.8 Show that the Dirichlet distribution of the second kind in (27.26) is a Liouville distribution by choosing the function f ( t ) appropriately and then determining the constant C from the Lioiiville integral forniiila in (27.28).
APPENDIX PIONEERS IN DISTRIBUTION THEORY
As is evident from the preceding chapters, several prominent mathematicians and statisticians have made pioneering contributions to the area of statistical distribution theory. To give students a historical sense of developments in this important and fundamental area of statistics, we present here a brief biographical sketch of these major contributors. Bernoulli, Jakob Born - January 6,1655, in Basel, Switzerland Died - August 16, 1705, in Basel, Switzerland Jakob Bernoulli was the first of the Bernoulli family of Swiss mathematicians. His work Ars Conjectandi (The Art of Conjecturing),published posthumously in 1713 by his nephew N. Bernoulli, contained the Bernoulli law of large numbers for Bernoulli sequences of independent trials. Usually, a random variable, taking values 1 and 0 with probabilities p and 1 - p , 0 p 1, is said to have the Bernoulli distribution. Sometimes, the binomial distributions, which are convolutions of Bernoulli distributions, are also called the Bernoulli distributions.
< <
Burr, Irving W. Born April 9, 1908, in Fallon, Nevada, United States Died - March 13, 1989, in Sequim, Washington, United States ~
Burr, in a famous paper in 1942, proposed a number of forms of explicit cumulative distribution functions that might be useful for purposes of graduation. There were 12 different forms presented in the paper, which have since come to be known as the B w r system of distributions, have been studied quite extensively in the literature. A number of well-known distributions such as the uniform, Rayleigh, logistic, and log-logistic are present in Burr’s system as special cases. In the years following, Burr worked on inferential problems and fitting methods for some of these forms of distributions. In one of his last 277
278
APPENDIX
papers (co-authored with E. S. Pearson and N. L. Johnson), he also made an extensive comparison of different systems of frequency curves.
Cauchy, August in-Louis Born Died
~~
August 21, 1789, in Paris, France May 23, 1857, in Sceaux (near Paris), France
Augustin-Louis Cauchy was a renowned French mathematician. He investigated the so-called Cauchy functions p(x,h, a ) , which had Fourier transformations of the form
f ( t ) = exp (-hiti") ,
h > 0, a
> 0.
It was proved later that p(x,h, a ) , 0 < Q 5 2, are indeed probability density functions. The special case of the distribution with pdf
and with characteristic function
f(t) = exP(-ltl) is called the standard Cauchy distribution. The general Caiichy distribution, of course, has the pdf p((x - u ) , h, 1). Dirichlet, Johann Peter Gustav Lejeune
Born Died
~
~
February 13, 1805, in Duren, French Empire (now Germany) May 5, 1859, in Gottingen, Hanover, Germany
Lejcune Dirichlet's family came from the neighborhood of Ligge in Belgium and not, as many had claimed, from France. Dirichlet had some of the renowned mathematicians as teachers and profited greatly from his contacts with Biot, Fourier, Hachette, La.place, Legendre, and Poisson. Dirichlet made pioneering contributions to different areas of mathematics starting with his famous first paper on Ferrnat's last theorem. In 1839, Dirichlet established the general integral formula
The above n-dimensional integral is now known as the Dirichlet integral and the probability distribution arising from the integral formula as the Dirichlet distribution.
APPENDIX
2 79
Fisher, Ronald Aylmer Born Died
~
February 17, 1890, in London, England July 29, 1962, in Adelaide, Australia
Fisher was a renowned British statistician and geneticist and is generally regarded as the founder of the field of statistics. The Fisher's F-distribution, having the pdf
plays a very important role in many statistical inferential problems. One more distribution used in the analysis of variance, called the Fisher's z-distribution, has the pdf
where m, 72
= 1 , 2 , .. .
.
F'r&chet,Re&-Maurice Born Died
-
September 2, 1878, in Maligny, France June 4, 1973, in Paris, France
Frkhet was a renowned French mathematician who successfully combined his work in the areas of topology and the theory of abstract spaces to make his essential contribution to statistics. In 1927, he derived one of the three possible limiting distributions for extremes. Hence, the family of distributions with cdf G(z,cr) = exp(--2-a), 2 > 0, a! > 0, is sometimes referred to as the Fre'chet type of distributions.
Gauss, Carl F'riedrich Born April 30, 1777, in Braunschweig, Duchy of Brunswick, Germany Died - February 23, 1855, in GGttingen, Hanover, Germany ~
Gauss is undisputably one of the greatest mathematicians of all time. In his theory of errors in 1809, Gauss suggested that the normal distributions for erros with density
For this rea.son, the normal distribution is often called the Gaussian law or as the Gaussian distribution.
280
APPENDIX
Gnedenko, Boris Vladimirovich
Born Died
January 1, 1912, in Simbirsk (Ulyanovsk), Russia December 27, 1995, in Moscow, Russia
~
~
In the first half of the twentieth century, foundations for the theory of extreme values were laid by R.-M. Frkchet, R. von Mises, R. A. Fisher, and L. H. C. Tippett. Consolidating these works with his own, Gnedcnko produced his outstanding paper in 1943 t,hat discussed at great length the asymptotic behavior of extremes for independent and identically distributed random variables. For this reason, sometimes the limiting distributions of maximal values are called three types of Gnedenko ’s limiting distributions. Gosset, William Sealey
Born Died
June 13, 1876, in Canterbury, England October 16, 1937, in Beaconsfield, England
~
Being a chemist and later a successful statistical assistant in the Guiniiess brewery, Gosset did important ea.rly work on statistics, which he wrote under the pseudonym Student. In 1908, he proposed the use of the t-test for quality control purposes in brewing industry. The corresponding distribution has pdf
and it is ca.lled Student’s t distribution with n degrees offreedom. Helmert, F’riedrich Robert
Born Died
July 31, 1843, in Freiberg (Saxony), Germany June 15, 1917, in Potsdam (Prussia.), Germany
~
~
Helmert was a famous German mathematical physicist whose main research was in geodesy, which led him to investigate several statistical problems. In a famous paper in 1876, Helmert first proved, for a randoni sample X I ,. . . ,X , from a norrrial N ( a ,0 2 ) distribution, the independence of X arid any function of X I - X ,. . . , X , - X ,including the variable S2 = C,”=,(X, X ) 2 / a 2 .Then, using a very interesting transformation of variables, which is now referred t o in the literature as Helmert’s transformataon, he proved that S2 is distributed as chi-square with n - 1 degrees of freedom. ~
Johnson, Norman Lloyd
Born
~
January 9, 1917, in Ilford, Essex, England
Johnson, having started his statistical career in London in 1940s, came into close contact and collabora.tion with such eminent statisticians as B. L.
APPENDIX
281
Welch, Egon Pearson, and F. N. David. Motivated by the Pearson family of distributions and the idea that it would be very convenient to have a family of distributions, produced by a simple transformation of normal variables, such that for any pair of values (&, 7 2 ) there is just one member of this family of distributions, Johnson in 1949 proposed a collection of three transformations. These resulted in lognormal, S g , and Su distributions, which are now referred to as Johnson's system of distributions. In addition, his book System of Frequency Curves (co-authored with W. P. Elderton), published by Cambridge University Press, and the series of four volumes on Distributions in Statistics (co-authored with S . Kotz), published by John Wiley & Sons and revised, have become classic references to anyone working on statistical distributions and their applications.
Kolmogorov, Andrey Nikolayevich Born Died
April 25, 1903, in Tambov, Russia October 20, 1987, in Moscow, Russia
~
~
Kolmogorov, one of the most outstanding mathematicians of the twentieth century, produced many remarkable results in different fields of mathematics. His deep work initiated developments on many new directions in modern probability theory and its applications. In 1933, he proposed one of the most popular goodness-of-fit tests called the Kolmogorov-Smirnov test. From this test procedure, a new distribution with cdf
originated, which is called the Kolmogorov distribution in the literature.
Kotz, Samuel Born
-
August 28, 1930, in Harbin, China
Kotz has made significant contributions to many areas of statistics, most notably in the area of statistical distribution theory. His four volumes on Distributions in Statistics (co-authored with N. L. Johnson), published by John Wiley & Sons in 1970 and revised, have become classic references to anyone working on statistical distributions and their applications. His 1978 monograph Characterszation of Probability Distributions (co-authored with J. Galambos), published by Springer-Verlag, and the 1989 book Multivariate Symmetric Distributions (co-authored with K. T. Fang and K. W. Ng), published by Chapman & Hall, have become important sources for researchers working on characterization problems and multivariate distribution theory. A family of elliptically symmetric distributions, which includes t,hP multivariate normal distribution as a special case, are known by the name Kotz-type elliptical distributions in the literature.
APPENDIX
282 Laplace, Pierre-Simon
Born Died
March 23, 1749, in Beaumount-en-Auge, France Ma.rch 5, 1827, in Paris, France
~
~
Laplace was a renowned French astronomer, mathematician, and physicist. In 1812, he published his famous work The'orie analytique des probabilite's (A n alytic Theory of Probabilrty), wherein he rationalized the necessity to consider and investigate two statistical distributions, both of which carry his name. The first one is the distribution with pdf
where -m < n < 00 and X > 0, which is called the Laplace distribution and sometimcs the first law of Laplace. The second one is the normal, which is sometimes ca.lled the second law of Laplace or Gauss--Laplace distribution. Linnik, Yuri Vladimirovich
Born Died
January 21, 1915, in Belaya Tserkov, Russia (Ukraine) June 30, 1972 , in Leningrad (St. Petersburg), Russia
~
~~
Linnik was the founder of the modern St. Petersburg (Leningrad) school of probability and mathematical statistics. He was the first to prove tha.t
f a ( t ) = (1 + Itla)-' (for any 0 < Q 5 2 ) is indeed a characteristic function of some random variable. For this reason, distributions with characteristic functions f a ( t ) are known in the literature as the Linnik distributions. His famous book Charmterization Problems of Mathematical Statistics (co-authored with A. Ka.gan and C. R. Rao), published in 1973 by John Wiley & Sons, also became a basic source of reference, inspiration, and ideas for many involved with research on charact*erizations of probability distributions. Maxwell, James Clerk
Born Died
~
June 13, 1831, in Edinburgh, Scotland November 5, 1879, in Cambridge, England
Jaiiics Maxwell was a farnous Scottish physicist who is regarded as the founder of classical thermodynamics. In 1859, Maxwell was first to suggest that the velocities of molecules in a gas, previously assumed to be equal, must follow the chi distribution with three degrees of freedom having the density
This distribution, therefore, is aptly called the Maxwell distribution.
APPENDIX
283
Pareto, Vilfredo Born - July 15, 1848, in Paris, France Died - August 20, 1923, in Geneva, Switzerland Pareto was a renowned Italian economist and sociologist. He was one of the first who tried t o explain and solve economic problems with the help of statistics. In 1897, he formulated his law of income distributions, where cdf’s of the form
played a very important role. For this reason, these distributions are referred to in the literature as the Pareto distributions.
Pascal, Blaise Born Died
-
June 19, 1G23, in Clermont-Ferrand, France August 19, 1662, in Paris, France
Pascal was a famous French mathematician, physicist, religious philosopher, and one of the founders of probability theory. A discrete random variable taking on values 0 , 1 , . . . with probabilities
where 0 < p < 1 and r is a positive integer, is said to have the Pascal distribution with parameters p and r. This distribution is, of course, a special case of the negative binomial distributions.
Pearson, Egon Sharpe Born August 11, 1895, in Hampstead, London, England Died - June 12, 1980, in Midhurst, Sussex, England -
Egon Pearson, the only son of the eminent British statistician Karl Pearson, was influenced by his father in his early academic career and later by his correspondence and association with “Student” (W. S. Gosset) and Jerzy Neyman. His collaboration with Neyman resulted in the now-famous NeymanPearson approach to hypothesis testing. He successfully revised Karl Pearson’s Tables for Statisticians and Biometricians jointly with L. J. Comrie, and later with H. 0. Hartley producing Biometrika Tables for Statisticians. Egon Pearson constructed many important statistical tables, including those of percentage points of Pearson curves and distribution of skewness and kurtosis coefficients. Right up to his death, he continued his work on statistical distributions, and, in fact, his last paper (co-authored with N. L. Johnson and I. W. Burr) dealt with a comparison of different systems of frequency curves.
284
APPENDIX
Pearson, Karl Born Died
-
March 27, 1857, in London, England April 27, 1936, in Coldharbour, Surrey, England
Karl Pearson, an eminent British statistician, is regarded as the founding father of statistical distribution theory. Inspired by the famous book Natural Inheritance of Francis Galton, published in 1889, he started his work on evolution by examining large data sets (collected by F. Galton), when he noted systematic departures from normality in most cases. This led him to the development of the Pearson system of frequency curves in 1895. In addition, he also prepared Tables for Statisticians and Biometricians in order to facilitate the use of statistical methods by practitioners. With the support of F. Galton and W. F. R. Weldon, he founded the now-prestigious journal Biometrika in 1901, which he also edited from its inception until his death in 1936.
Poisson, Siméon-Denis
Born - June 21, 1781, in Pithiviers, France
Died - April 25, 1840, in Sceaux (near Paris), France
Poisson was a renowned French mathematician who did fundamental work on applications of mathematics to problems in electricity, magnetism, and mechanics. In 1837, in his work Recherches sur la probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités, the Poisson distribution first appeared as an approximation to the binomial distribution. A random variable has the Poisson distribution if it takes on values $0, 1, \ldots$ with probabilities
$$p_k = \frac{\lambda^{k}}{k!}\, e^{-\lambda}, \qquad k = 0, 1, \ldots,\ \lambda > 0.$$
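As a quick numerical check of this binomial-approximation origin, the following sketch (an illustration with arbitrarily chosen $\lambda$ and $n$, not an example from the book) compares binomial probabilities with large $n$ and small $p = \lambda/n$ against the Poisson probabilities above; it uses only the Python standard library.

```python
from math import comb, exp, factorial

lam, n = 2.0, 1000        # large n, small success probability p = lam/n
p = lam / n

for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = lam**k / factorial(k) * exp(-lam)
    print(k, round(binom, 5), round(poisson, 5))   # the two columns nearly coincide
```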
Pólya, George
Born - December 13, 1887, in Budapest, Hungary
Died - September 7, 1985, in Palo Alto, California, United States
Pólya was a famous Hungarian-born U.S. mathematician who is known for his significant contributions to combinatorics, number theory, and probability theory. He was also the author of some popular books on the problem-solving process, such as How to Solve It and Mathematical Discovery. In 1923, Pólya discussed a special urn model which generated a probability distribution with probabilities
$$p_k = \binom{n}{k}\, \frac{p(p+a)\cdots\{p+(k-1)a\}\; q(q+a)\cdots\{q+(n-k-1)a\}}{(1+a)(1+2a)\cdots\{1+(n-1)a\}}$$
corresponding to the values $0, 1, \ldots, n$, where $0 < p < 1$, $q = 1 - p$, $a > 0$, and $n = 1, 2, \ldots$. For this reason, this distribution is called the Pólya distribution.
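A quick way to see where this pmf comes from is to simulate the urn itself: start with $b$ white and $r$ black balls and, after every draw, return the drawn ball together with $c$ extra balls of the same colour; the number of white balls drawn in $n$ draws then follows the pmf above with $p = b/(b+r)$, $q = r/(b+r)$, and $a = c/(b+r)$. The Python sketch below is only an illustration (the urn composition, $n$, the number of trials, and the seed are assumptions of mine, not values from the book); it compares simulated frequencies with the formula.

```python
import numpy as np
from math import comb, prod

rng = np.random.default_rng(3)
b, r, c, n, trials = 3, 2, 2, 5, 100_000

def polya_pmf(k, n, p, q, a):
    """Pólya distribution: probability of k white balls in n draws."""
    num = prod(p + i * a for i in range(k)) * prod(q + j * a for j in range(n - k))
    den = prod(1 + m * a for m in range(n))
    return comb(n, k) * num / den

p, q, a = b / (b + r), r / (b + r), c / (b + r)

counts = np.zeros(n + 1)
for _ in range(trials):
    white, black, k = b, r, 0
    for _ in range(n):
        if rng.random() < white / (white + black):   # a white ball is drawn
            white += c
            k += 1
        else:                                        # a black ball is drawn
            black += c
    counts[k] += 1

for k in range(n + 1):
    print(k, round(counts[k] / trials, 4), round(polya_pmf(k, n, p, q, a), 4))
```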
Rao, Calyampudi Radhakrishna
Born - September 10, 1920, in Huvvinna Hadagalli (Karnataka), India
Rao is considered to be one of the most creative thinkers in the field of statistics and one of the few pioneers who brought statistical theory to its maturity in the twentieth century. He has made numerous fundamental contributions to different areas of statistics. The Rao-Rubin characterization of the Poisson distribution and the Lau-Rao characterization theorem based on the integrated Cauchy functional equation, published in 1964 and 1982, generated great interest in characterizations of statistical distributions. Another significant contribution he made in the area of statistical distribution theory is on weighted distributions. In addition, his books Characterization Problems of Mathematical Statistics (co-authored with A. Kagan and Yu. V. Linnik), published in 1973 by John Wiley & Sons, and Choquet-Deny Type Functional Equations with Applications to Stochastic Models (co-authored with D. N. Shanbhag), published in 1994 by John Wiley & Sons, have also become a basic source of reference, inspiration, and ideas for many involved with research on characterizations of probability distributions.
Rayleigh, 3rd Baron (Strutt, John William)
Born - November 12, 1842, in Langford Grove (near Maldon), England
Died - June 30, 1919, in Terling Place (near Witham), England
Lord Rayleigh was a famous English physicist who was awarded the Nobel Prize for Physics in 1904 for his discoveries in the fields of acoustics and optics. In 1880, in his work connected with the wave theory of light, Rayleigh considered a distribution with pdf of the form
$$p(x) = \frac{x}{\sigma^{2}}\, e^{-x^{2}/(2\sigma^{2})}, \qquad x \ge 0,\ \sigma > 0.$$
This distribution is aptly called the Rayleigh distribution in the literature.
Snedecor, George Waddel
Born - October 20, 1881, in Memphis, Tennessee, United States
Died - February 15, 1974, in Amherst, Massachusetts, United States
Snedecor was a famous U.S. statistician and was the first director of the Statistical Laboratory at Iowa State University, the first of its kind in the United States. In 1948, he also served as the president of the American Statistical Association. The Fisher F-distribution (mentioned earlier), having the pdf
$$p(x) = \frac{\Gamma\!\left(\frac{\alpha+\beta}{2}\right)}{\Gamma\!\left(\frac{\alpha}{2}\right)\Gamma\!\left(\frac{\beta}{2}\right)}\left(\frac{\alpha}{\beta}\right)^{\alpha/2}\frac{x^{\alpha/2-1}}{\left(1+\frac{\alpha}{\beta}x\right)^{(\alpha+\beta)/2}}, \qquad x > 0,\ \alpha > 0,\ \beta > 0,$$
is sometimes called the Snedecor distribution or the Fisher-Snedecor distribution. In 1937, this distribution was tabulated by Snedecor.
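The F law can be represented as a ratio of two independent chi-square variables, each divided by its degrees of freedom. The sketch below is a small illustration under assumptions of my own (the degrees of freedom, sample size, and seed are arbitrary); it builds the ratio from chi-square draws and checks the sample mean against the known value $\beta/(\beta-2)$.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta, n = 5, 10, 500_000   # illustrative degrees of freedom

# F(alpha, beta) = (chi2_alpha / alpha) / (chi2_beta / beta),
# the two chi-square variables being independent.
f = (rng.chisquare(alpha, n) / alpha) / (rng.chisquare(beta, n) / beta)

print(f.mean(), beta / (beta - 2))   # sample mean vs. beta/(beta-2) = 1.25
```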
Student (see Gosset, William Sealey)

Tukey, John Wilder
Born - June 16, 1915, in New Bedford, Massachusetts, United States
Died - July 26, 2000, in Princeton, New Jersey, United States
Tukey was a famous American statistician who made pioneering contributions to many different areas of statistics, most notably on robust inference. To facilitate computationally easy Monte Carlo evaluation of the robustness properties of normal-based inferential procedures, Tukey in 1962 proposed the transformation
$$Y = \begin{cases} \dfrac{X^{\lambda} - (1-X)^{\lambda}}{\lambda} & \text{if } \lambda \ne 0, \\[2mm] \ln \dfrac{X}{1-X} & \text{if } \lambda = 0, \end{cases}$$
where the random variable $X$ has a standard uniform $U(0,1)$ distribution. For different choices of the shape parameter $\lambda$, the transformation above produces light-tailed or heavy-tailed distributions for the random variable $Y$, in addition to providing very good approximations for normal and $t$ distributions. The distributions of $Y$ have come to be known as Tukey's (symmetric) lambda distributions.
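For instance, values of $\lambda$ near $0.14$ are often cited as giving a close approximation to a rescaled normal distribution; this is an assumption worth checking numerically rather than a statement taken from the text. The Python sketch below (sample size, seed, and the specific value of $\lambda$ are arbitrary choices) applies the transformation to uniform draws, standardizes the result, and compares sample quantiles with standard normal quantiles from the standard library.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(5)
lam, n = 0.14, 500_000            # lambda near 0.14 mimics a normal shape
x = rng.uniform(size=n)

y = (x**lam - (1 - x)**lam) / lam  # Tukey's lambda transformation
z = (y - y.mean()) / y.std()       # standardize the generated sample

for p in (0.75, 0.90, 0.975, 0.995):
    print(p, round(np.quantile(z, p), 3), round(NormalDist().inv_cdf(p), 3))
```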
Weibull, Waloddi
Born - June 18, 1887, in Schleswig-Holstein, Sweden
Died - October 12, 1979, in Annecy, France
Waloddi Weibull was a famous Swedish engineer and physicist who, in 1939, used distributions of the form
$$F(x) = \begin{cases} 1 - e^{-\lambda x^{\alpha}}, & x > 0, \\ 0, & x \le 0, \end{cases} \qquad \lambda > 0,\ \alpha > 0,$$
to represent the distribution of the breaking strength of materials. In fact, these are limiting distributions of minimal values. In 1951, Weibull also demonstrated a close agreement between many different sets of data and those predicted with the fitted Weibull model, with the data sets used in this study relating to characteristics as diverse as the strength of Bofors' steel, fiber strength of Indian cotton, length of cyrtoideae, fatigue life of ST-37 steel, statures of adult males born in the British Isles, and breadth of the beans Phaseolus vulgaris. For this reason, the distributions presented
above are called Weibull distributions, and they have become the most popular and commonly used statistical models for lifetime data. Sometimes, the distributions of maximal values with cdf's
$$F(x) = \begin{cases} e^{-\lambda |x|^{\alpha}}, & x < 0, \\ 1, & x \ge 0, \end{cases} \qquad \lambda > 0,\ \alpha > 0,$$
are also called Weibull-type distributions.
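To illustrate the statement that these laws arise as limiting distributions of minimal values, the sketch below is a hedged illustration under assumptions of my own (the parent distribution, sample sizes, and seed are arbitrary): it takes observations whose cdf behaves like $x^{2}$ near zero, so the normalized sample minimum should be approximately Weibull with $\alpha = 2$, and it compares the empirical cdf of the normalized minima with $1 - e^{-x^{2}}$.

```python
import numpy as np

rng = np.random.default_rng(6)
reps, m = 10_000, 500

# Each observation is U**0.5 with U ~ Uniform(0,1), so its cdf is x**2 near 0;
# then sqrt(m) * (minimum of m observations) is approximately Weibull with
# shape alpha = 2, i.e., limiting cdf 1 - exp(-x**2).
mins = np.sqrt(rng.uniform(size=(reps, m))).min(axis=1) * np.sqrt(m)

for x in (0.5, 1.0, 1.5):
    empirical = np.mean(mins <= x)
    limit_cdf = 1 - np.exp(-x**2)
    print(x, round(empirical, 3), round(limit_cdf, 3))
```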
BIBLIOGRAPHY
Aitchison, J. and Brown, J. A. C. (1957). The Lognormal Distribution, Cambridge University Press, Cambridge, England.
Arnold, B. C. (1983). Pareto Distributions, International Co-operative Publishing House, Fairland, MD.
Arnold, B. C., Castillo, E. and Sarabia, J.-M. (1999). Conditionally Specified Distributions, Springer-Verlag, New York.
Balakrishnan, N. (Ed.) (1992). Handbook of the Logistic Distribution, Marcel Dekker, New York.
Balakrishnan, N. and Basu, A. P. (Eds.) (1995). The Exponential Distribution: Theory, Methods and Applications, Gordon & Breach Science Publishers, Newark, NJ.
Balakrishnan, N. and Koutras, M. V. (2002). Runs and Scans with Applications, John Wiley & Sons, New York.
Bansal, N., Hamedani, G. G., Key, E. S., Volkmer, H., Zhang, H. and Behboodian, J. (1999). Some characterizations of the normal distribution, Statistics & Probability Letters, 42, 393-400.
Bernstein, S. N. (1941). Sur une propriété caractéristique de la loi de Gauss, Transactions of the Leningrad Polytechnical Institute, 3, 21-22.
Box, G. E. P. and Muller, M. E. (1958). A note on the generation of random normal deviates, Annals of Mathematical Statistics, 29, 610-611.
Burr, I. W. (1942). Cumulative frequency functions, Annals of Mathematical Statistics, 13, 215-232.
Cacoullos, T. (1965). A relation between t and F distributions, Journal of the American Statistical Association, 60, 528-531. Correction, 60, 1249.
Chhikara, R. S. and Folks, J. L. (1989). The Inverse Gaussian Distribution: Theory, Methodology and Applications, Marcel Dekker, New York.
Consul, P. C. (1989). Generalized Poisson Distributions: Properties and Applications, Marcel Dekker, New York.
Cramér, H. (1936). Über eine Eigenschaft der normalen Verteilungsfunktion, Mathematische Zeitschrift, 41, 405-414.
Crow, E. L. and Shimizu, K. (Eds.) (1988). Lognormal Distributions: Theory and Applications, Marcel Dekker, New York.
Darmois, D. (1951). Sur diverses propriétés caractéristiques de la loi de probabilité de Laplace-Gauss, Bulletin of the International Statistical Institute, 23(II), 79-82.
Devroye, L. (1990). A note on Linnik's distribution, Statistics & Probability Letters, 9, 305-306.
Dirichlet, J. P. G. L. (1839). Sur une nouvelle méthode pour la détermination des intégrales multiples, Liouville, Journal de Mathématiques, Series I, 4, 164-168. Reprinted in Dirichlet's Werke (Eds., L. Kronecker and L. Fuchs), 1969, Vol. 1, pp. 377-380, Chelsea, New York.
Douglas, J. B. (1980). Analysis with Standard Contagious Distributions, International Co-operative Publishing House, Burtonsville, MD.
Eggenberger, F. and Pólya, G. (1923). Über die Statistik verketteter Vorgänge, Zeitschrift für Angewandte Mathematik und Mechanik, 1, 279-289.
Evans, M., Peacock, B. and Hastings, N. (2000). Statistical Distributions, 3rd ed., John Wiley & Sons, New York.
Feller, W. (1943). On a general class of "contagious" distributions, Annals of Mathematical Statistics, 14, 389-400.
Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd ed., John Wiley & Sons, New York.
Gilchrist, W. G. (2000). Statistical Modelling with Quantile Functions, Chapman & Hall/CRC, London, England.
Godambe, A. V. and Patil, G. P. (1975). Some characterizations involving additivity and infinite divisibility and their applications to Poisson mixtures and Poisson sums, in Statistical Distributions in Scientific Work - 3: Characterizations and Applications (Eds., G. P. Patil, S. Kotz and J. K. Ord), pp. 339-351, Reidel, Dordrecht.
Govindarajulu, Z. (1963). Relationships among moments of order statistics in samples from two related populations, Technometrics, 5, 514-518.
Gradshteyn, I. S. and Ryzhik, I. M. (Eds.) (1994). Table of Integrals, Series, and Products, 5th ed., Academic Press, San Diego, CA.
Gupta, R. D. and Richards, D. St. P. (2001). The history of the Dirichlet and Liouville distributions, International Statistical Review, 69, 433-446.
Haight, F. A. (1967). Handbook of the Poisson Distribution, John Wiley & Sons, New York.
Helmert, F. R. (1876). Die Genauigkeit der Formel von Peters zur Berechnung des wahrscheinlichen Beobachtungsfehlers directer Beobachtungen gleicher Genauigkeit, Astronomische Nachrichten, 88, columns 113-120.
Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation, Biometrika, 36, 149-176.
Johnson, N. L. and Kotz, S. (1990). Use of moments in deriving distributions and some characterizations, The Mathematical Scientist, 15, 42-52.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1994). Continuous Univariate Distributions, Vol. 1, 2nd ed., John Wiley & Sons, New York.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, Vol. 2, 2nd ed., John Wiley & Sons, New York.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1997). Discrete Multivariate Distributions, John Wiley & Sons, New York.
Johnson, N. L., Kotz, S. and Kemp, A. W. (1992). Univariate Discrete Distributions, 2nd ed., John Wiley & Sons, New York.
Jones, M. C. (2002). Student's simplest distribution, Journal of the Royal Statistical Society, Series D, 51, 41-49.
Kemp, C. D. (1967). 'Stuttering-Poisson' distributions, Journal of the Statistical and Social Enquiry Society of Ireland, 21, 151-157.
Kendall, D. G. (1953). Stochastic processes occurring in the theory of queues and their analysis by the method of the imbedded Markov chain, Annals of Mathematical Statistics, 24, 338-354.
Klugman, S. A., Panjer, H. H. and Willmot, G. E. (1998). Loss Models: From Data to Decisions, John Wiley & Sons, New York.
Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione, Giornale dell'Istituto Italiano degli Attuari, 4, 83-93.
Kotz, S., Balakrishnan, N. and Johnson, N. L. (2000). Continuous Multivariate Distributions, Vol. 1, 2nd ed., John Wiley & Sons, New York.
Kotz, S., Kozubowski, T. J. and Podgórski, K. (2001). The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance, Birkhäuser, Boston.
Krasner, M. and Ranulac, B. (1937). Sur une propriété des polynômes de la division du cercle, Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences, Paris, 204, 397-399.
Krysicki, W. (1999). On some new properties of the beta distribution, Statistics & Probability Letters, 42, 131-137.
Lancaster, H. O. (1969). The Chi-Squared Distribution, John Wiley & Sons, New York.
Laplace, P. S. (1774). Mémoire sur la probabilité des causes par les événements, Mémoires de Mathématique et de Physique, 6, 621-656.
Linnik, Yu. V. (1953). Linear forms and statistical criteria I, II, Ukrainian Mathematical Journal, 5, 207-243, 247-290.
Linnik, Yu. V. (1963). Linear forms and statistical criteria I, II, in Selected Translations in Mathematical Statistics and Probability, Vol. 3, American Mathematical Society, Providence, RI, pp. 1-90.
Liouville, J. (1839). Note sur quelques intégrales définies, Journal de Mathématiques Pures et Appliquées (Liouville's Journal), 4, 225-235.
Lukacs, E. (1965). A characterization of the gamma distribution, Annals of Mathematical Statistics, 26, 319-324.
Marcinkiewicz, J. (1939). Sur une propriété de la loi de Gauss, Mathematische Zeitschrift, 44, 612-618.
Marsaglia, G. (1974). Extension and applications of Lukacs' characterization of the gamma distribution, in Proceedings of the Symposium on Statistics and Related Topics, Carleton University, Ottawa, Ontario, Canada.
Patel, J. K. and Read, C. B. (1997). Handbook of the Normal Distribution, 2nd ed., Marcel Dekker, New York.
Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew variations in homogeneous material, Philosophical Transactions of the Royal Society of London, Series A, 186, 343-414.
Perks, W. F. (1932). On some experiments in the graduation of mortality statistics, Journal of the Institute of Actuaries, 58, 12-57.
Pólya, G. (1930). Sur quelques points de la théorie des probabilités, Annales de l'Institut H. Poincaré, 1, 117-161.
Pólya, G. (1932). Herleitung des Gauss'schen Fehlergesetzes aus einer Funktionalgleichung, Mathematische Zeitschrift, 18, 185-188.
Proctor, J. W. (1987). Estimation of two generalized curves covering the Pearson system, Proceedings of the ASA Section on Statistical Computing, 287-292.
Raikov, D. (1937a). On a property of the polynomials of circle division, Matematicheskii Sbornik, 44, 379-381.
Raikov, D. A. (1937b). On the decomposition of the Poisson law, Doklady Akademii Nauk SSSR, 14, 8-11.
Raikov, D. A. (1938). On the decomposition of Gauss and Poisson laws, Izvestia Akademii Nauk SSSR, Serija Matematičeskie, 2, 91-124.
Rao, C. R. (1965). On discrete distributions arising out of methods of ascertainment, in Classical and Contagious Discrete Distributions (Ed., G. P. Patil), Pergamon Press, Oxford, England, pp. 320-332; see also Sankhya, Series A, 27, 311-324.
Rao, C. R. and Rubin, H. (1964). On a characterization of the Poisson distribution, Sankhya, Series A, 26, 295-298.
Sen, A. and Balakrishnan, N. (1999). Convolution of geometrics and a reliability problem, Statistics & Probability Letters, 43, 421-426.
Seshadri, V. (1993). The Inverse Gaussian Distribution: A Case Study in Exponential Families, Oxford University Press, Oxford, England.
Seshadri, V. (1998). The Inverse Gaussian Distribution: Statistical Theory and Applications, Lecture Notes in Statistics No. 137, Springer-Verlag, New York.
Shepp, L. (1964). Normal functions of normal random variables, SIAM Review, 6, 459-460.
Skitovitch, V. P. (1954). Linear forms of independent random variables and the normal distribution law, Izvestia Akademii Nauk SSSR, Serija Matematičeskie, 18, 185-200.
Snedecor, G. W. (1934). Calculation and Interpretation of the Analysis of Variance, Collegiate Press, Ames, Iowa.
Srivastava, R. C. and Srivastava, A. B. L. (1970). On a characterization of the Poisson distribution, Journal of Applied Probability, 7, 497-501.
Stuart, A. and Ord, J. K. (1994). Kendall's Advanced Theory of Statistics, Vol. I, Distribution Theory, 6th ed., Edward Arnold, London.
"Student" (1908). On the probable error of the mean, Biometrika, 6, 1-25.
Tong, Y. L. (1990). The Multivariate Normal Distribution, Springer-Verlag, New York.
Tukey, J. W. (1962). The future of data analysis, Annals of Mathematical Statistics, 33, 1-67.
Verhulst, P. J. (1838). Notice sur la loi que la population suit dans son accroissement, Correspondance Mathématique et Physique, 10, 113-121.
Verhulst, P. J. (1845). Recherches mathématiques sur la loi d'accroissement de la population, Académie de Bruxelles, 18, 1-38.
Wimmer, G. and Altmann, G. (1999). Thesaurus of Univariate Discrete Probability Distributions, STAMM Verlag, Essen, Germany.
AUTHOR INDEX
Aitchison, J., 234, 289 Altmann, G., xv, 293 Arnold, B. C., 18, 133, 289
Folks, J. L., 238, 289 Fréchet, R.-M., 279, 280 Fuchs, L., 290
Balakrishnan, N., xv, xvi, 68, 72, 157, 197, 203, 289, 291, 293 Bansal, N., 230, 289 Basu, A. P., 157, 289 Behboodian, J., 230, 289 Bernoulli, J., 277 Bernoulli, N., 277 Bernstein, S. N., 221, 289 Box, G. E. P., 229, 289 Brown, J. A. C., 234, 289 Burr, I. W., 277, 283, 289
Galambos, J., 281 Galton, F., 284 Gauss, C. F., 279 Gilchrist, W. G., 3, 290 Gnedenko, B. V., 280 Godambe, A. V., 98, 290 Gosset, W. S., 280, 283, 290 Govindarajulu, Z., 176, 290 Gradshteyn, I. S., 31, 290 Gupta, R. D., 270, 276, 290 Haight, F. A., 89, 291 Hamedani, G. G., 230, 289 Hartley, H. O., 283 Hastings, N., xv, 290 Helmert, F. R., 226, 280, 291
Cacoullos, T., 241, 289 Castillo, E., 18, 289 Cauchy, A.-L., 278 Chhikara, R. S., 238, 289 Comrie, L. J., 283 Consul, P. C., 100, 290 Cramér, H., 219, 290 Crow, E. L., 234, 290
Johnson, N. L., xv, 98, 148, 234, 278, 280, 281, 283, 291 Jones, M. C., 242, 291 Kagan, A., 282, 285 Kemp, A. W., xv, 291 Kemp, C. D., 98, 291 Kendall, D. G., 39, 291 Key, E. S., 230, 289 Klugman, S. A., 74, 291 Kolmogorov, A. N., 281, 291 Kotz, S., xv, 98, 148, 169, 281, 290, 291 Koutras, M. V., 72, 289 Kozubowski, T. J., 169, 291 Krasner, M., 36, 291 Kronecker, L., 290 Krysicki, W., 148, 156, 291
Darmois, D., 224, 290 David, F. N., 281 Devroye, L., 237, 290 Dirichlet, J. P. G. L., 270, 272, 278, 290 Douglas, J. B., 98, 290 Eggenberger, F., 101, 290 Elderton, W. P., 281 Evans, M., xv, 290 Fang, K. T., 281 Feller, W., 98, 290 Fisher, R. A., 279, 280
Lancaster, H. O., 240, 292 Laplace, P. S., 169, 282, 292 Linnik, Yu. V., 237, 282, 285, 292 Liouville, J., 276, 292 Lukacs, E., 186, 292
Rayleigh, 285 Read, C. B., 210, 292 Richards, D. St. P., 270, 276, 290 Rubin, H., 99, 293 Ryzhik, I. M., 31, 290
Marcinkiewicz, J., 214, 228, 292 Marsaglia, G., 186, 292 Maxwell, J. C., 282 Muller, M. E., 229, 289
Sarabia, J.-M., 18, 289 Sen, A., 68, 293 Seshadri, V., 238, 293 Shanbhag, D. N., 285 Shepp, L., 230, 293 Shimizu, K., 234, 290 Skitovitch, V. P., 224, 293 Snedecor, G. W., 245, 285, 286, 293 Srivastava, A. B. L., 99, 293 Srivastava, R. C., 99, 293 Stuart, A., xv, 293 "Student", 240, 286, 293
Nevzorov, V. B., xvi Neyman, J., 283 Ng, K. W., 281
Ord, J. K., xv, 290, 293 Panjer, H. H., 74, 291 Pareto, V., 283 Pascal, B., 283 Patel, J. K., 210, 292 Patil, G. P., 98, 290, 293 Peacock, B., xv, 290 Pearson, E. S., 278, 281, 283 Pearson, K., 8, 283, 284, 292 Perks, W. F., 206, 292 Podgórski, K., 169, 291 Poisson, S.-D., 284 Pólya, G., 101, 227, 284, 290, 292 Proctor, J. W., 114, 292 Raikov, D., 36, 92, 292 Ranulac, B., 36, 291 Rao, C. R., 99, 282, 285, 293
Tippett, L. H. C., 280 Tong, Y. L., 259, 293 Tukey, J. W., 118, 286, 293 Verhulst, P. J., 197, 293 Volkmer, H., 230, 289 von Mises, R., 280 Weibull, W., 286 Welch, B. L., 281 Weldon, W. F. R., 284 Willmot, G. E., 74, 291 Wimmer, G., xv, 293 Zhang, H., 230, 289
SUBJECT INDEX
Absolutely continuous bivariate distribution, 16, 18, 19, 21 Absolutely continuous distribution, 2, 9 Actuarial analysis, 74 Arcsine distribution, 140, 149, 151-156, 186, 231 Characteristic function, 154 Characterizations, 155 Decompositions, 156 Introduction, 151 Moments, 153, 154 Notations, 151, 152 Relationships with other distributions, 155 Shape characteristics, 154 Asymptotic relations, 228 Auxiliary function, 211
Moments, 141 Notations, 140 Relationships with other distributions, 149 Shape characteristics, 147 Some transformations, 141 Beta distribution of the second kind, 140, 275 Beta function, 101, 102, 118, 139 Bilateral exponential distribution, 170 Binomial distribution, 39, 46, 4961, 63, 69, 80, 83, 88, 89, 94-96, 98, 99, 102, 211, 219, 228, 249, 252, 253, 255, 277 Conditional probabilities, 58, 59 Convolutions, 56 Decompositions, 56, 57 Generating function and characteristic function, 50 Introduction, 49 Maximum probabilities, 53 Mixtures, 57, 58 Moments, 50-53 Notations, 49 Tail probabilities, 59 Useful representation, 50 Binomial expansion, 175 Borel set, 1, 15 Borel a-algebra, 1, 15 Box-Muller’s transformation, 229 Brownian motion, 152 Burr system of distributions, 277
Bell-shaped curve, 140, 241 Bernoulli distribution, 37,43-50,56, 63, 69, 72, 87, 98, 112, 2 77 Convolutions, 45, 46 Introduction, 43 Maximal values, 46, 47 Moments, 44, 45 Notations, 43, 44 Relationships with other distributions, 47, 48 Bernstein’s theorem, 221-223 Beta distribution, 8, 59, 115, 117, 139-149, 151, 152, 155, 156, 186, 207, 231, 245, 2722 75 Characteristic function, 147 Decompositions, 148, 149 Introduction, 139 Mode, 140
Cauchy distribution, 12, 117, 119-122, 170, 230, 232, 233, 236, 278
Characteristic function, 120 Convolutions, 120 Decompositions, 121 Moments, 120 Notations, 119 Stable distributions, 1 2 1 Transformations, 121 Cauchy function, 278 Cauchy-Schwarz inequality, 8 Central limit theorem, 228 Characteristic function, 10-14, 22, 23, 33, 34, 37, 40, 45, 46, 50, 56, 60, 61, 64, 74, 81, 90, 96, 97, 110, 111, 120, 121, 125, 126, 131, 132, 147, 148, 154, 158, 159, 170, 173, 181, 185, 188, 195, 201, 203-206, 209, 210, 214, 216, 217, 219, 222, 223, 225, 230-239, 254-263, 265, 267, 278, 282 Chi distribution, 232, 282 Ch-square distribution, 180, 226, 227, 231, 239, 240, 245, 246, 280 Closed under minima, 134 Cobb-Douglas distribution, 234 Compositions, 250 Compound Poisson distribution, 98 Conditional distribution, 18, 20, 58, 71, 83, 94, 95, 99, 185187, 219, 220, 251-253, 267, 268, 273 Coriditional expectation, 20, 21 Conditional mean, 20, 21, 273 Conditional probabilities, 58, 59, 71, 72, 94, 95 conditional variance, 268 Contour integration, 120 Convex function, 10 Convolution formula, 201 Convolutions, 34, 35, 41, 45, 46, 56, 68, 69, 76, 92, 111, 120, 164, 165, 173, 185, 217-219 Correlation coefficient, 19, 20, 116, 252-254, 257, 259, 265, 268, 274, 275
Correlation matrix, 19 Countable set, 16 Covariance, 19, 116, 164, 252, 255, 259, 274 Covariance matrix, 19, 253, 257, 259, 261, 263, 265 Cramkr’s result, 222 Curnulants, 216 Darmois-Skitovitch’s theorem, 224226 Decompositions, 14, 35, 36, 41, 56, 57, 69, 70, 76, 80, 92, 111, 121, 148, 149, 156, 165, 166, 174, 185, 204, 217219 Degenerate distribution, 14,36,3941, 49, 214, 216, 222, 223 Convolution, 41 Decomposition, 41 Independence, 40 Introduction, 39 Moments, 39, 40 Degenerate normal distribution, 265 Destructive process, 99 Determinant, 24, 260, 265 Diagonal matrix, 263, 264 Differential equation, 8, 210 Dirichlet distribution, 269-276, 278 Derivation of Diriclllet formula, 271, 272 Dirichlet, distribution of the second kind, 275 Introduction, 269, 270 Liouville distribution, 276 Marginal distributions, 272 Marginal moments, 274 Notations, 272 Product moments, 274 Dirichlet distribution of the second kind, 275, 276 Dirichlet integral formula, 27-272, 274, 276, 278 Discrete bivariate distribution, 16, 18-20 Discrete random walk, 47 Discrete rectangular distribution, 29
SUBJECT INDEX Discrete uniform distribution, 2937, 43, 71, 107, 117 Convolutions, 34, 35 Decompositions, 35, 36 Entropy, 36 Generating function and characteristic function, 33, 34 Introduction, 29 Moments, 30-33 Notations, 29, 30 Relationships with other distributions, 36, 37 Distributions Absolutely continuous biva.riate Arcsine Bernoulli Beta Bilateral exponential Binomial Bivariate normal Burr system of Cauchy Chi Chi-square Cobb-Douglas Compound Poisson Conditional Degenerate Degenerate normal Dirichlet Dirichlet of the second kind Discrete bivariate Discrete rectangular Discrete uniform Double exponential Doubly exponential Elliptically symmetric Erlang Exponential Extreme value F First law of Laplace Fisher-Snedecor Fisher’s Fisher’s z Folded Frkhet-type
Gamma Gaussian Gauss-Laplace Generalized arcsine Generalized extreme value Generalized gamma Generalized logistic Type I Type I1 Type I11 Type IV Generalized Poisson Generalized uniform Geometric Geometric of order k Half logistic Hypergeometric Infinitely divisible Inverse Gaussian Johnson’s system of Kolmogorov Lagrangian-Poisson Laplace Lattice Limiting Linnik Liouville Logarithmic Log-gamma Logistic Log-logistic Lognormal Log-Weibull Maxwell Mixture Mult inomial Mu1t ivariat e normal Negative binomial Negative hypergeometric Noncentral Chi-square
F
t
Normal Pareto Pareto of the second kind Pascal Pearson’s family of
Poisson Poisson-stopped-sum Pólya Power Rayleigh Rectangular Restricted generalized Poisson Run-related
SB SU Sech-squared Second law of Laplace Snedecor Stable Stuttering Poisson Symmct ric symmetric uniform t Triangular Trinomial Truncated Tukey’s lambda Tukey’s symmetric lambda Type VI Type VII Two-tailed exponential Uniform Weibull-type Weighted Double exponential distribution, 170, 191 Doubly exponential distribution, 191 Elementary events, 15 Elliptjicallysymmetric distributions, 281 Entropy, 8, 9, 36, 45, 70, 110, 131, 137, 162, 172, 211 Erlang distribution, 39 Euclidean space, 15 Euler’s constant, 195 Euler’s formula, 204 Expectation, 5, 7, 50, 109 Exponential decay, 160 Exponential distribution, 39, 113, 114, 117, 157-167, 169, 170, 174, 175, 177, 179, 180, 186, 187, 189, 192, 194,
229, 230, 235, 236, 269, 271 Convolutions, 164, 165 Decompositions, 165, 166 Distributions of minima, 162 Entropy, 162 Introduction, 157 Lack of memory property, 167 Laplace transform and characteristic function, 158, 159 Moments, 159, 160 Notations, 157, 158 Shape characteristics, 160 Uniform and exponential order statistics, 163, 164 Exponentially decreasing tails, 198, 199 Extremes, 280 Extreme value distributions, 170, 189-197,201,204,206,228 Generalized extreme value distributions, 193, 194 Introduction, 189, 190 Limiting distributions of maximal values, 190, 191 Limiting distributions of minimal values, 191 Moments, 194-196 Relationships between extreme value distributions, 191
F distribution, 245, 246, 279, 285 Fermat’s last theorem, 278 First law of Laplace, 170 Fisher-Snedecor distribution, 286 Fisher’s distribution, 240 Fisher’s z-distribution, 279 Folded distribution, 176 Fourier transform, 278 Fractional part, 166 FrGchet-type distribution, 191, 279 F test, 246 Functional equation, 18, 190 Gamma distribution, 92, 165, 179188, 206, 207, 208, 228, 231, 239, 271, 272
SUBJECT INDEX Conditional distributions and independence, 185, 187 Convolutions and decompositions, 185 Introduction, 179 Laplace transform and characteristic function, 181 Limiting distributions, 187, 188 Mode, 180 Moments, 181, 182 Notations, 180 Shape characteristics, 182 Gamma function, 75,179, 195,204, 2 70 Gaussian distribution, 211 Gaussian law, 211 Gauss-Laplace distribution, 2 11 Generalized arcsine distribution, 140, 155 Generalized extreme value distributions, 193, 194 Generalized gamma distribution, 184 Generalized logistic distributions, 205-207 Type I, 205, 206 Type 11, 205, 206 Type 111, 206, 207 Type IV, 206, 207 Generalized Poisson distribution, 98, 100 Generalized uniform distribution, 114 Generating function, 10, 11, 13, 22, 24, 33, 34, 36, 50, 64, 68, 69, 72-74, 81, 84, 90, 92, 96, 98, 99, 254, 255 Geodesy, 280 Geometric distribution, 39, 48, 6375, 80, 102, 166 Conditional probabilities, 71, 72 Convolutions, 68, 69 Decompositions, 69, 70 Entropy, 70 Generating function and characteristic function, 64 Geometric distribution of order k , 72 Introduction, 63
Moments, 64-67 Notations, 63 Tail probabilities, 64 Goodness-of-fit, 281 Half logistic distribution, 203, 204 Heavy-tailed distribution, 286 Helmert’s transformation, 226, 227, 280 Hypergeometric distribution, 59, 71, 83-89, 102 Characteristic function, 84 Generating function, 84 Introduction, 83 Limiting distributions, 88 Moments, 84-87 Notations, 83, 84 Hypergeometric function, 33, 84, 148. 154 Identity matrix, 264 Incomplete beta function, 139 Indecomposable, 14, 35, 36, 156 Independent random variables, 16, 17, 21-23, 40, 128, 135, 148, 152, 156, 162, 166, 185-187, 220-229, 263,264, 270 Inferential problems, 180, 240, 241 Infinitely divisible characteristic function, 14 Infinitely divisible distribution, 14, 57, 70, 80, 92, 166, 174, 185, 219, 233 Integer part, 166 Integrated Cauchy functional equation, 285 Interarrival times, 39 Inverse distribution function, 112 Inverse Fourier transform, 170, 209 Inverse Gaussian distribution, 237, 238 Inverse transformation, 24 Jacobian, 24,25, 186,229, 267,269271, 275 Johnson’s system of, 234, 281 Joint characteristic function, 221, 222, 225
Joint probability density function, 17, 18, 115, 116, 185-187, 226, 227, 229, 270, 271 Kolmogorov distribution, 281 Kolmogorov-Smirnov test, 281 Kurtosis, 7, 8, 44, 45, 52, 53, 66, 76, 91, 110, 125, 129, 147, 154, 160, 172, 182, 201, 238, 241, 283 Lack of memory property, 71, 167 Lagrangian-Poisson distribution, 100 Laplace distribution, 165, 169-177, 191, 204, 214, 236, 282 Characteristic function, 170 Convolutions, 173 Decompositions, 174 Entropy, 172 Introduction, 169 Moments, 171 Notations, 169, 170 Shape characteristics, 172 Laplace transform, 158, 159, 164, 179, 181 Lattice distribution, 29 Lau-Rao characterization theorem, 285 Law of large numbers, 277 Least squares method, 211 Leptokurtic distribution, 8, 53, 66, 76, 91, 147, 160, 172, 182, 201, 238, 241 Life-time data analysis, 184, 196 Light-tailed distribution, 286 Limiting distribution, 37, 39, 5961, 81, 88, 96, 182, 187193,209,228, 256-259, 280, 286 Limit, process, 152 Limit theorems, 256, 257 Linear combinations, 220-228, 262, 263 Linear dependence, 265 Linear regression, 268 Linnik distribution, 235-237, 282 Liouville integral formula, 276
Location parameter, 4, 6, 30, 127, 134, 158, 169, 184, 190, 198, 211, 216, 220, 272 Logarithmic distribution, 99 Log-gamma distribution, 184, 207 Logistic distribution, 197-207, 277 Characteristic function, 201 Decompositions, 204 Generalized logistic distributions, 205-207 Introduction, 197 Moments, 199-201 Notations, 197, 198 Order statistics, 205 Relationships with other distributions, 203 Shape characteristics, 201 Log-logistic distribution, 277 Lognormal distribution, 233, 234, 281 Log-Weibull distribution, 191 Marginal distribution function, 1618, 21, 250-252, 266, 273 Maxima, 46,47, 113, 114, 127, 128, 189-191, 193, 228, 280 Maximal sum, 152 Maxwell distribution, 232, 282 Mean, 5, 8, 30, 72, 100, 103, 115, 118, 153, 154, 192, 203, 205, 211, 216, 217, 220, 233, 234, 239, 241, 245, 246, 255 Mean vector, 19, 257, 261, 265 Measurable function, 20, 21 Mesokurtic distribution, 8, 53, 147 Minima, 113, 114, 134, 162, 189191, 286 Mixture distribution, 57, 58, 96-99 Moment generating function, 12, 184 Moments, 4, 7, 8, 12, 30-33, 39, 40, 44, 45, 50-53, 64-67, 7476, 84-87,90,91, 108-110, 118, 120, 124, 125, 129, 136, 137, 141-146, 153-155, 159, 160, 171, 174, 175, 181, 182, 184, 186, 194196, 199-201, 206, 215,216,
SUBJECT INDEX
228, 234, 237, 241, 245, 252-254 about p , 125 about zero, 4-6, 30, 44, 50, 52, 65, 75, 87, 91, 108, 124, 129, 136, 145, 153, 155, 159, 171, 181, 199, 200, 215, 216 Central, 5, 6, 8, 30, 44, 50, 52, 66, 76, 91, 109, 124, 125, 136, 137, 145, 153, 155, 160, 171, 182, 200, 216 Factorial, 6, 7, 10, 30, 31, 51, 64, 65, 74, 75, 86, 90 Marginal, 274 of negative order, 146, 181 Product, 19, 176, 274 Second-order, 260, 261 Monte Carlo evaluation, 286 Multinomial distribution, 249-257, 259 Compositions, 250 Conditional distributions, 251, 252 Generating function and characteristic function, 254, 255 Introduction, 249 Limit theorems, 256, 257 Marginal distributions, 250, 251 Moments, 252-254 Notations, 250 Multinomial trials, 249-251 Multiple regression, 254 Multivariate normal distribution, 257, 259-268 Bivariate normal distribution, 265-268 Distributions of sums, 262 Independence of components, 263 Introduction, 259, 260 Linear combinations of components, 262, 263 Linear transformations, 264, 265 Marginal distributions, 262 Notations, 260, 261 Negative binomial distribution, 69,
303
70, 73-81, 89, 102, 209, 228 Convolutions and decompositions, 76-80 Generating function and characteristic function, 74 Introduction, 73, 74 Limiting distributions, 81 Moments, 74-76 Notations, 74 Tail probabilities, 80 Negative hypergeometric distribution, 102, 103 Neyman-Pearson approach, 283 Noncentral distributions, 246 Chi-square, 246 F , 246 t , 246 Noncentrality parameter, 246 Normal distribution, 8, 39, 53, 61, 76, 81, 91, 96, 170, 182, 188, 207, 209-235, 239-241, 246, 257, 266, 279, 282, 286 Asymptotic relations, 228 Bernstein’s theorem, 221-223 Characteristic function, 214, 215 Conditional distributions, 219, 220 Convolutions and decompositions, 217-219 Darmois-Skitovitch’s theorem, 224, 226 Entropy, 211 Helmert’s transformation, 226, 227 Identity of distributions of linear combinations, 227, 228 Independence of linear combinations, 220, 221 Introduction, 209, 210 Mode, 211 Moments, 215, 216 Notations, 210, 211 Shape characteristics, 217 Tail behavior, 212-214 Transformations, 229-234 Normalized extremes, 189-191
SUBJECT INDEX
Normalized sums, 228 Normalizing constants, 189-191 Order-preserving transformation, 117 Order statistics, 114-117, 128, 135, 139, 140, 157, 163, 164, 174-176, 187,205,226, 269 Orthogonal linear transformation, 264 Orthogonal matrix, 264 Pareto distribution, 133-138, 141, 283 Distributions of minimal Values, 134, 135 Entropy, 137 Introduction, 133 Moments, 136, 137 Notations, 133 Pascal distribution, 74, 102, 283 Pearson family of distributions, 281, 283, 284 Pearson plane, 184 Platykurtic distribution, 8, 53, 110, 125, 147, 154 Poisson distribution, 39, GO, 61, 81, 88-100, 179, 180, 209, 219, 228, 256, 284, 285 Conditional probabilities, 94, 95 Convolutions, 92 Decompositions, 92 Generating function and characteristic function, 90 Introduction, 89 Limiting distribution, 96 Maximal probability, 95 Mixtures, 96-99 Moments, 90, 91 Notations, 89 Rao-Rubin characterization, 99 Tail probabilities, 91, 92 Poisson-stopped-sum distribution, 98-100 Polar coordinates, 229 Polar transformation, 229 P6lya criterion, 126 P6lya distribution, 101, 102, 284
P6lya urn scheme, 102, 103 Power, 246 Power distribution, 127-133, 135, 136, 140, 147, 149 Characteristic function, 131 Distributions of maximal Values, 128 Entropy, 131 Introduction, 127 Moments, 129 Notations, 127 Probability integral tmnsform, 112, 155, 203 Probability measure, 1, 15 Probability space, 1, 3, 15 Quadratic form, 259-261 Quantile density function, 3 Quantile function, 3, 242 Queueing theory, 39 Random number generation, 30, 112 Random processes, 152 Rao-Rubin characterization, 99, 285 Rayleigh distribution, 192, 229, 232, 277, 285 Rectangular distribution, 108 Recurrence relation, 131, 160, 175177, 180 Regression, 21, 253, 268 Reliability problems, 196 Reproductive property, 231 Restricted generalized Poisson distribution, 100 Reverse J-shaped, 160 Riemann zeta function, 200 Robustness, 286 Run-related distributions, 72
Sg distribution, 281 Su distribution, 281
Same type of distribution, 4, 6, 30 Sample mean, 225, 238, 240, 246 Sample variance, 225 Scale parameter, 4, 30, 127, 134, 158, 169, 173, 184, 185, 190, 198, 211, 216, 272 Sech-squared distribution, 198
SUBJECT INDEX Second law of Laplace, 211 Service times, 39 Shape parameter, 127,134,182,184, 286 Shepp’s result, 230 a-algebra, 1, 15 Skewed distribution, 7, 8, 66, 76, 91, 125, 147, 160, 182,206, 207, 238 Negatively, 8, 125, 147, 206, 207 Positively, 8, 66, 76, 91, 125, 147, 160, 182, 206, 238 Skewness, 7, 8, 44, 52, 53, 66, 76, 91, 110, 125, 129, 147, 154, 160, 172, 182, 201, 238, 241, 283 Snedecor distribution, 286 Spacings, 146, 163 Stable distribution, 14, 121, 214, 233, 236 Standard deviation, 233 Statistical modelling, 3 Stuttering Poisson distribution, 98 Symmetric distribution, 8, 44, 140, 154, 169, 171-173, 176, 201, 205, 207, 241 Symmetric positive definite matrix, 259, 260 Symmetric uniform distribution, 110, 126 Tail probabilities, 59, 64, 80, 91, 92
t distribution, 8, 232, 240-245, 280, 286
Truncated distribution, 204 t test, 246, 280 Tukey’s lambda distribution, 118 Tukey’s symmetric lambda distribution, 118, 286 Two-tailed exponential distribution, 170 Type I11 line, 184 Type VI distribution, 245 Type VII distribution, 240 Uniform distribution, 37, 107-119, 123, 127, 128, 131, 133, 136, 139, 140, 147, 149, 163, 164, 186, 187, 189, 203, 229, 231, 269, 270, 277, 286 Characteristic function, 110 Convolutions, 111 Decompositions, 111 Distributions of niinima and rnaxima, 113 Entropy, 110 Introduction, 107 Moments, 108 Notations, 107 Order statistics, 114- 117 Probability integral transform, 112 Relationships with other distributions, 117, 118 Uniform spacings, 146, 269, 270 Unimodal distribution, 7 Urn interpretation, 84 U-shaped curve, 141, 151, 154 Variance, 5, 7, 20, 30, 40, 44, 50, 52, 72, 100, 103, 109, 115, 118, 124, 129, 137, 145, 154, 160, 164, 171, 182, 192, 203, 205, 211, 216, 217, 220, 227, 234, 239, 241, 245, 246, 255, 259, 268 Weibull-type distribution, 191, 229, 287 Weighted distribution, 285