Sequential Statistics
Zakkula Govindarajulu
University of Kentucky, USA
World Scientific
New Jersey · London · Singapore · Beijing · Shanghai · Hong Kong · Taipei · Chennai
Published by World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
SEQUENTIAL STATISTICS
Copyright © 2004 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-238-905-9
Printed in Singapore.
“Education without wisdom and wisdom without humility are husks without grain.”
Sri Sathya Sai Baba
Dedicated
To the Memory of My parents
Preface

Sequential statistics is concerned with the treatment of data when the number of observations is not fixed in advance. Since Wald (1947) wrote his celebrated book, the subject has grown considerably, especially in the areas of sequential estimation and biostatistics. The author's book The Sequential Statistical Analysis of Hypothesis Testing, Point and Interval Estimation, and Decision Theory (available from American Sciences Press, Inc., 20 Cross Road, Syracuse, New York 13224-2104, U.S.A.), ISBN 0-935950-17-6, is a comprehensive 680-page reference to the field of sequential analysis; everyone reading the present new work will likely want to see that there is at least a copy of that comprehensive book in their institution's library, and many serious researchers will want a copy in their personal libraries. Of that previous book, reviewers said "There are plenty of examples and problems" and "The presentation is clear and to the point." In contrast, the present book is designed for a semester's course and is thus less than half the length of the previous book. Other books, by Ghosh (1970), Siegmund (1985), Wetherill and Glazebrook (1986) and Ghosh, Mukhopadhyay and Sen (1997), are either too theoretical or limited in scope. It is easy for an instructor to get side-tracked, become bogged down in details and run out of time to cover interesting topics. In this new version, I have tried to select only those topics that can be covered in a semester's course. Still, the instructor may not be able to cover all the topics in the book in one semester, and thus has some flexibility in the choice of topics. Straightforward and elementary proofs are provided, and for more details the reader is referred to the author's earlier book. Thus, the mathematical and statistical level of the book is maintained at an elementary level.
This book is geared to seniors and first-year graduate students who have had a semester's course in each of advanced calculus, probability and statistical inference. A semester's course can be based on chapter 1, chapter 2 (excluding section 2.7), chapter 3 (excluding sections 3.7 and 3.8), chapter 4 (excluding section 4.9) and chapter 5 (excluding sections 5.5 and 5.6). The instructor might devote three 50-minute lectures to chapter 1, ten lectures to chapter 2, nine lectures to each of chapters 3 and 4, and five lectures to chapter 5, with the remaining lectures devoted to sections of the instructor's and students' interests.
The chapter on applications to biostatistics is new, and a supplement containing computer programs for certain selected sequential procedures is also provided. Useful illustrations and numerical tables are provided wherever possible. Problems, identified by the section to which they pertain, are given at the ends of all chapters. An extensive list of the references cited in the book is given at the end; this list of references is by no means complete.

April 2004
Z. Govindarajulu Professor of Statistics University of Kentucky Lexington, KY
Acknowledgments

I have been inspired by the celebrated book on this topic by the late Abraham Wald, and I am grateful to Dover Publications for their kind permission to use certain parts of Wald's book as a source. I am equally grateful to the American Sciences Press for permission to use several sections of my earlier book as a source for the present book. I am grateful to Dr. Hokwon Cho of the University of Nevada, Las Vegas for putting the entire manuscript into LaTeX and cheerfully making all the necessary changes in the subsequent revisions. I am thankful to the Department of Statistics for its support and other help. I thank Professors Rasul Khan of Cleveland State University and Derek S. Coad of the University of Sussex (England) for their useful comments on an earlier draft of the manuscript. I also wish to express my thanks to Ms. Yubing Zhai and Ms. Tan Rok Ting, editors of World Scientific Publishing Co., for their encouragement, cooperation and help. My thanks go to the American Statistical Association for its kind permission to reproduce table 3.9.3 and tables 3.9.1 & 3.9.2 from their publications, namely the Journal of the American Statistical Association, Vol. 65, and the American Statistician, Vol. 25, respectively. To Blackwell Publishing Ltd. for its kind permission to reproduce tables 2.7.1 and 2.7.2 from the Journal of the Royal Statistical Society, Series B, Vol. 20, and tables 3.2.1 & 3.2.2 and table 3.3.1 from the Australian Journal of Statistics, Vols. 31 and 36 respectively. To the University of Chicago Press for their kind permission to reproduce a data set from Olson & Miller, Morphological Integration, p. 317, and for a brief excerpt from Kemperman, J. H. B. (1961), The Passage Problem for a Stationary Markov Chain. To Springer-Verlag GmbH & Co. to use Theorem 8.25 and Corollary 8.33 of Siegmund, D. (1985), Sequential Analysis, as a source for Theorem 3.4.4 of the present book.
To Professor Erich Lehmann for his kind permission to use his book Testing Statistical Hypotheses (1959) as a source for the proof of Theorem 2.9.1 and the statement of Theorem 2.9.2. To Oxford University Press, with the permission of the Biometrika Trustees, to reproduce Table 1 of Lindley and Barnett (1965), Biometrika, Vol. 52, p. 523. To Professor Donald Darling and Mrs. Carol Robbins for their kind permission to reproduce Table 3.8.1 from the Proceedings of the National Academy of Sciences, Vol. 60. To Professor Thomas Ferguson for his kind permission to use his distribution as Problem 2.1.6. To CRC Press for its kind permission to use Sections 1.2 and 1.3 of B. Wetherill (1975) as a source for sections 1.1 and 1.2 of this book. To the Institute of Mathematical Statistics for its kind permission to reproduce Tables 2.6.1, 2.10.1, 2.10.2 and 3.8.1 from the Annals of Mathematical Statistics and the Annals of Statistics. To the Taylor & Francis Group for their kind permission to reproduce tables 5.4.1 and 5.4.2 from Statistics, Vol. 33. To John Wiley & Sons for their kind permission to use Whitehead (1983), sections 3.7 and 3.8, as a source for section 5.6 of this book.
Contents

Preface
Acknowledgments

1 Preliminaries
  1.1 Introduction to Sequential Procedures
  1.2 Sampling Inspection Plans
      1.2.1 Sample Size Distribution
  1.3 Stein's Two-stage Procedure
      1.3.1 The Procedure

2 The Sequential Probability Ratio Test
  2.1 The Sequential Probability Ratio Test (SPRT)
  2.2 SPRT: Its Finite Termination and Bounds
  2.3 The Operating Characteristic Function
  2.4 The Average Sample Number
  2.5 Wald's Fundamental Identity
      2.5.1 Applications of the Fundamental Identity
  2.6 Bounds for the Average Sample Number
  2.7 Improvements to OC and ASN Functions
      2.7.1 The OC Function
      2.7.2 The Average Sample Number
  2.8 Truncated SPRT
  2.9 Optimal Properties of the SPRT
  2.10 The Restricted SPRT
  2.11 Large-Sample Properties of the SPRT
  2.12 Problems

3 Tests for Composite Hypotheses
  3.1 Method of Weight Functions
      3.1.1 Applications of the Method of Weight Functions
  3.2 Sequential t and t² Tests
      3.2.1 Uniform Asymptotic Expansion and Inversion for an Integral
      3.2.2 Barnard's Versions of Sequential t- and t²-tests
      3.2.3 Simulation Studies
      3.2.4 Asymptotic Normality of the Statistic T
      3.2.5 Finite Sure Termination of Sequential t- and t²-tests
      3.2.6 Sequential t²-test (or t-test for Two-sided Alternatives)
      3.2.7 The Sequential Test T
      3.2.8 An Alternative Sequential Test T′
  3.3 Sequential F-test
      3.3.1 Inversion Formula
  3.4 Likelihood Ratio Test Procedures
      3.4.1 Generalized Likelihood Ratio Tests for Koopman-Darmois Families
  3.5 Testing Three Hypotheses about Normal Mean
      3.5.1 Armitage-Sobel-Wald Test
      3.5.2 Choice of the Stopping Bounds
      3.5.3 Bounds for ASN
      3.5.4 Testing Two-sided Alternatives for Normal Mean
  3.6 The Efficiency of the SPRT
      3.6.1 Efficiency of the SPRT Relative to the Fixed-Sample Size Procedure at the Hypotheses
      3.6.2 Relative Efficiency at θ ≠ θ₀, θ₁
      3.6.3 Limiting Relative Efficiency of the SPRT
  3.7 Bayes Sequential Procedures
      3.7.1 Bayes Sequential Binomial SPRT
      3.7.2 Dynamic Programming Method for the Binomial Case
      3.7.3 The Dynamic Programming Equation
      3.7.4 Bayes Sequential Procedures for the Normal Mean
  3.8 Small Error Probability and Power One Tests
  3.9 Sequential Rank Test Procedures
      3.9.1 Kolmogorov-Smirnov Tests with Power One
      3.9.2 Sequential Sign Test
      3.9.3 Rank Order SPRT's Based on Lehmann Alternatives: Two-Sample Case
      3.9.4 One-Sample Rank Order SPRT's for Symmetry
  3.10 Appendix: A Useful Lemma
  3.11 Problems

4 Sequential Estimation
  4.1 Basic Concepts
  4.2 Sufficiency and Completeness
  4.3 Cramér-Rao Lower Bound
  4.4 Two-Stage Procedures
      4.4.1 Stein's Procedure for Estimating the Mean of a Normal Distribution with Unknown Variance
      4.4.2 A Procedure for Estimating the Difference of Two Means
      4.4.3 Procedures for Estimating the Common Mean
      4.4.4 Double-Sampling Estimation Procedures
      4.4.5 Fixed Length Confidence Intervals Based on SPRT
  4.5 Large-Sample Theory for Estimators
  4.6 Determination of Fixed-width Intervals
  4.7 Interval and Point Estimates for the Mean
      4.7.1 Interval Estimation for the Mean
      4.7.2 Risk-efficient Estimation of the Mean
  4.8 Estimation of Regression Coefficients
      4.8.1 Fixed-Size Confidence Bounds
  4.9 Confidence Intervals for P(X < Y)

5 Applications to Biostatistics
  5.1 The Robbins-Monro Procedure
  5.2 Parametric Estimation
  5.3 Up and Down Rule
  5.4 Spearman-Karber (S-K) Estimator
  5.5 Repeated Significance Tests
  5.6 Test Statistics Useful in Survival Analysis
  5.7 Sample Size Re-estimation Procedures
      5.7.1 Normal Responses
      5.7.2 Formulation of the Problem
      5.7.3 Binary Response
  5.8 Problems

6 Matlab Programs in Sequential Analysis
  6.1 Introduction
  6.2 Sequential Procedures
      6.2.1 Sequential Probability Ratio Test (SPRT)
      6.2.2 Restricted SPRT (Anderson's Test)
      6.2.3 Rushton's Sequential t-test
      6.2.4 Sequential t-test
      6.2.5 Sequential t²-test
      6.2.6 Hall's Sequential Test
      6.2.7 Stein's Two-Stage Procedure (Confidence Interval)
      6.2.8 Stein's Two-Stage Test
      6.2.9 Robbins' Power One Test
      6.2.10 Rank Order SPRT
      6.2.11 Cox's Sequential Estimation Procedure
  6.3 Distribution

Referenced Journals
Bibliography
Subject Index
Chapter 1

Preliminaries

1.1 Introduction to Sequential Procedures
Sequential procedures differ from other statistical procedures in that the sample size is not fixed in advance. The experimenter has the option of looking at a sequence of observations one (or a fixed number) at a time and deciding whether to stop sampling and take a decision, or to continue sampling and make a decision some time later. The order of the sequence of observations which the experimenter will take is specified in advance. Decision problems in which the experimenter may sequentially vary the treatments are of a higher order of difficulty and are called sequential design problems. For example, consider the following problem.
Problem 1.1.1 If we wish to compare several drugs or treatments (as in sequential screening of cancer drugs), then it should be possible to drop some drugs out of the trials at an early stage if the results from these are very poor when compared with the others.

Thus, an essential feature of a sequential procedure is that the number of observations required to terminate the experiment is a random variable, since it depends on the outcomes of the observations. Sequential procedures are of interest because they are economical, in the sense that we may reach a decision earlier via a sequential procedure than via a fixed-sample-size procedure. In sequential experiments we need to specify:
1. the initial sample size

2. a rule for termination of the experiment
3. the additional number of observations to take if the experiment is to be continued; and
4. a terminal decision rule.
Notice that (2) and (3) can be combined into a single rule. Experiments in which only the number of observations is sequentially dependent require simpler theory, and are of more general applicability, than the sequential design problem, in which not only the number of trials but also the number of treatments is sequentially dependent.

If the experiment has been continued until we observe $X_1, X_2, \ldots, X_m$, a sequential test is completely defined by specifying the disjoint subsets $R_m^0$, $R_m^1$ and $R_m^c$ of the $m$-dimensional Euclidean space $R_m$, for $m = 1, 2, \ldots$. If $(X_1, X_2, \ldots, X_m)$ belongs to $R_m^0$ we accept the hypothesis $H$, we reject $H$ when it belongs to $R_m^1$, and we continue sampling if it falls within the region $R_m^c$. Since the above sets are mutually exclusive and have union $R_m$, it suffices to specify any two of the three sets. The basic problem is a suitable choice of these sets. The criteria for the choice of these sets will be dictated by the operating characteristic (OC) and the average sample number (ASN) functions, which are elaborated in the following.

Suppose that the underlying distribution function is indexed by a real-valued parameter $\theta$, and suppose that the statistician has to choose between two hypotheses, $H_0$ and $H_1$. The function $\mathrm{OC}(\theta)$ is defined as the probability of accepting $H_0$ when $\theta$ is the value of the parameter. It is desirable that the OC function be high for values of $\theta$ that are consistent with $H_0$ and low for values of $\theta$ that are consistent with $H_1$. For instance, one may require $\mathrm{OC}(\theta) \geq 1 - \alpha$ for all $\theta$ in $H_0$ and $\mathrm{OC}(\theta) \leq \beta$ for $\theta$ in $H_1$, where $\alpha$ and $\beta$ denote the error probabilities. A sequential test $S$ is said to be admissible if its OC function meets the above criteria.

As noted earlier, the number of observations required by a sequential procedure is a random variable, and of much interest is its expected value when $\theta$ is the true value of the parameter. This expected value is typically a function of $\theta$, and is called the ASN function.
It is desirable to have a small ASN function for given $\alpha$ and $\beta$. We also desire the expected sample size to be smaller than that required by the fixed-sample-size procedure. Let $\nu(\theta|D)$ denote the expected sample size of procedure $D$ when $\theta$ is the true value of the parameter. If $D_0$ is admissible and if $\nu(\theta|D_0) = \min_D \nu(\theta|D)$, then $D_0$ is considered to be a "uniformly best" test. However, in general, no uniformly best test exists. It is possible to find an optimal sequential procedure when $H_0$ and $H_1$ are simple hypotheses. Wald's sequential probability ratio test (SPRT) gives the minimum ASN at both $H_0$ and $H_1$. The efficiency of a procedure $D$ at $\theta$ is defined as the ratio of the minimum expected sample number at $\theta$ to the expected sample number of $D$ at $\theta$. Wald's SPRT has efficiency equal to 1 at both $H_0$ and $H_1$.
1.2 Sampling Inspection Plans
The earliest sequential procedure is the double sampling plan of Dodge and Romig (1929) for sampling inspection. Consider a single sampling plan: a sample of $n$ items is taken from the lot, and the lot is rejected (accepted) if the number of defectives in the sample is $\geq c$ ($< c$). The drawback of this scheme is that we might have seen more than $c$ defectives earlier than sample size $n$. An alternative scheme is: sample one item at a time, reject the lot as soon as the number of defectives in the sample is $\geq c$, and accept the lot as soon as the number of effective (non-defective) items in the sample is $\geq n - c + 1$. The required sample size is at least $c$ and at most $n$. This scheme is called curtailed inspection.
1.2.1 Sample Size Distribution

Let $N$ denote the random sample size required to terminate the experiment, and let $\theta$ denote the probability that an item is defective. Then

$$P_\theta(N = c \text{ and reject } H_0) = \theta^c, \tag{1.2.1}$$

$$P_\theta(N = c + r \text{ and reject } H_0) = \binom{c+r-1}{c-1}\theta^c(1-\theta)^r, \quad r = 1, 2, \ldots, n-c, \tag{1.2.2}$$

$$P_\theta(N = n - c + 1 + s \text{ and accept } H_0) = \binom{n-c+s}{s}\theta^s(1-\theta)^{n-c+1}, \quad s = 0, 1, \ldots, c-1. \tag{1.2.3}$$

Now

$$E_\theta(N) = \sum_{m=1}^{n} m\,p_m,$$

where $p_m$ denotes the probability that a decision is reached at the $m$th trial. (Note that $P(N = m \text{ and reject } H_0) = 0$ for $m < c$, and $P(N = m \text{ and accept } H_0) = 0$ for $m < n - c + 1$.) Further,

$$p_m = P(\text{reject at stage } m) + P(\text{accept at stage } m). \tag{1.2.4}$$
Hence

$$E_\theta(N) = \sum_{r=0}^{n-c}(c+r)\binom{c+r-1}{c-1}\theta^c(1-\theta)^r \tag{1.2.5}$$

$$\qquad + \sum_{s=0}^{c-1}(n-c+1+s)\binom{n-c+s}{s}\theta^s(1-\theta)^{n-c+1}. \tag{1.2.6}$$
One should prefer the curtailed sampling plan to an equivalent single sampling plan because $E(N|\theta)$ for the former lies below the sample size for the single sampling plan. Consider the case $c = 1$. Then

$$E(N|\theta) = \theta\sum_{r=0}^{n-1}(r+1)(1-\theta)^r + n(1-\theta)^n = \sum_{j=0}^{n-1}(1-\theta)^j = \frac{1-(1-\theta)^n}{\theta}, \tag{1.2.7}$$

which is increasing in $q = 1-\theta$. Hence $E(N|\theta)$ is decreasing in $\theta$ when $c = 1$. However, this is not true for $c > 1$ (see Table 1.2.1 and the case $c = 4$).

Table 1.2.1 $E(N|\theta)$ for various values of $n$, $c$, and $\theta$
              c = 1                    c = 2                    c = 4
  θ      n=10  n=20  n=25        n=10  n=20  n=25        n=10  n=20  n=25
 .01     9.56 18.20 22.22        9.07 19.06 24.01        7.07 17.17 22.22
 .10     6.51  8.78  9.28        8.76 14.73 16.49        7.74 18.10 22.58
 .20     4.46  4.94  4.98        7.45  9.58  9.84        8.34 16.15 18.02
 .30     3.24  3.33  3.33        6.03  6.64  6.66        8.50 12.77 13.17
 .40     2.48  2.50  2.50        4.86  5.00  5.00        8.13  9.94  9.99
 .50     2.00  2.00  2.00        3.97  4.00  4.00        7.39  8.00  8.00
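The stopping-time distribution (1.2.1)-(1.2.3) and the expectation $E_\theta(N)$ are easy to tabulate directly. The following Python sketch (our illustration, not part of the book; the function names are ours) enumerates the distribution of $N$ under curtailed inspection and reproduces entries of Table 1.2.1.

```python
from math import comb

def curtailed_dist(n, c, theta):
    """Distribution of the stopping time N under curtailed inspection,
    per (1.2.1)-(1.2.3): reject once c defectives are seen, accept once
    n - c + 1 non-defective items are seen; theta = P(item defective)."""
    dist = {}
    # Reject at N = c + r: the c-th defective occurs on trial c + r.
    for r in range(n - c + 1):
        dist[(c + r, "reject")] = comb(c + r - 1, c - 1) * theta**c * (1 - theta)**r
    # Accept at N = n - c + 1 + s: the (n - c + 1)-th good item occurs on that
    # trial, having seen s defectives so far.
    for s in range(c):
        dist[(n - c + 1 + s, "accept")] = (
            comb(n - c + s, s) * theta**s * (1 - theta)**(n - c + 1)
        )
    return dist

def expected_n(n, c, theta):
    """E(N | theta), i.e. the sum of m * p_m as in (1.2.5)-(1.2.6)."""
    return sum(m * p for (m, _), p in curtailed_dist(n, c, theta).items())
```

For $n = 10$, $c = 1$, $\theta = .4$ this gives $E(N|\theta) \approx 2.48$, and for $n = 10$, $c = 4$, $\theta = .5$ it gives $\approx 7.39$, matching the table; running it over a grid of $\theta$ also exhibits the non-monotonicity in $\theta$ for $c = 4$.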
Let
$$P_1(\theta) = P(\text{accept lot using the fixed-sample procedure} \mid \theta) \tag{1.2.8}$$

and

$$P_2(\theta) = P(\text{accept lot using the sequential rule} \mid \theta) = \sum_{m=n-c+1}^{n} P(\text{accept lot and } N = m \mid \theta) = (1-\theta)^{n-c+1}\sum_{s=0}^{c-1}\binom{n-c+s}{s}\theta^s. \tag{1.2.9}$$
Then we have the following lemma.

Lemma 1.2.1 $P_1(\theta) = P_2(\theta)$ for all $n$ and $c$.

Proof. For $c = 1$, $P_1(\theta) = P_2(\theta) = (1-\theta)^n$. For $c = 2$, $P_1(\theta) = P_2(\theta) = (1-\theta)^n + n\theta(1-\theta)^{n-1}$. Now assume the assertion is true for a given $c$ and consider the case $c + 1$. That is, assume

$$\sum_{k=0}^{c-1}\binom{n}{k}\theta^k(1-\theta)^{n-k} = (1-\theta)^{n-c+1}\sum_{r=0}^{c-1}\binom{r+n-c}{r}\theta^r, \tag{1.2.10}$$

and we wish to show that

$$\sum_{k=0}^{c}\binom{n}{k}\theta^k(1-\theta)^{n-k} = (1-\theta)^{n-c}\sum_{r=0}^{c}\binom{r+n-c-1}{r}\theta^r. \tag{1.2.11}$$

Subtracting (1.2.10) from (1.2.11) and cancelling the common factor $(1-\theta)^{n-c}$, it suffices to show that

$$\binom{n}{c}\theta^c = \sum_{r=0}^{c}\binom{r+n-c-1}{r}\theta^r - (1-\theta)\sum_{r=0}^{c-1}\binom{r+n-c}{r}\theta^r,$$

or, collecting the coefficient of $\theta^c$ on the right and moving it to the left,

$$0 = -\sum_{r=0}^{c-1}\binom{r+n-c}{r}\theta^r + \sum_{s=0}^{c-1}\left[\binom{s+n-c-1}{s} + \binom{s+n-c-1}{s-1}\right]\theta^s,$$

which is obviously true by Pascal's rule. ∎
Remark 1.2.1 Lemma 1.2.1 can also be established by showing that all the sample paths leading to accepting the lot are exactly the same in both sampling schemes.
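Lemma 1.2.1 can also be checked numerically. The sketch below (illustrative Python, ours) compares the fixed-sample acceptance probability $P_1(\theta)$ of (1.2.8), a binomial tail probability, with the curtailed-plan acceptance probability $P_2(\theta)$ of (1.2.9).

```python
from math import comb

def p_fixed(n, c, theta):
    """P1(theta): accept under the single-sampling plan, i.e. fewer than
    c defectives among all n items (equation (1.2.8))."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k) for k in range(c))

def p_curtailed(n, c, theta):
    """P2(theta): accept under curtailed inspection, summing (1.2.3)
    over s = 0, ..., c - 1 (equation (1.2.9))."""
    return sum(
        comb(n - c + s, s) * theta**s * (1 - theta)**(n - c + 1) for s in range(c)
    )
```

The two agree to machine precision for every $(n, c, \theta)$ with $n \geq c$, as the lemma asserts.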
1.3 Stein's Two-stage Procedure
In this section we present a certain hypothesis-testing problem for which meaningful fixed-sample procedures do not exist; however, a two-stage procedure has been given for this problem. Consider the following. Let $X$ be distributed as normal with mean $\theta$ and variance $\sigma^2$, where $\theta$ and $\sigma^2$ are both unknown. We wish to test $H_0: \theta = \theta_0$ against the alternative hypothesis $H_1: \theta > \theta_0$; this is known as Student's hypothesis. It is well known that, given a random sample $X_1, X_2, \ldots, X_n$, the uniformly most powerful unbiased test of $H_0$ against $H_1$ is to reject $H_0$ when

$$T = \frac{(\bar{X} - \theta_0)\sqrt{n}}{s} > t_{n-1,\,1-\alpha}, \tag{1.3.1}$$

where $\bar{X}$ and $s$ denote the mean and the standard deviation of the observed $X_i$'s, and $t_{n-1,\,1-\alpha}$ denotes the $100(1-\alpha)$th percentile of the $t$-distribution with $n-1$ degrees of freedom. If $1 - \pi(\theta, \sigma)$ denotes the power of the test in (1.3.1), then $\pi(\theta_0, \sigma) = 1 - \alpha$, irrespective of the value of $\sigma$. However, when one is planning an experiment, one is interested in knowing the probability with which the statistical test will detect a difference in the mean when it actually exists. The power function of "Student's" test depends on $\sigma$, which is unknown. Hence, it is of interest to devise a test of $H_0$ versus $H_1$ whose power does not depend
on $\sigma$. However, Dantzig (1940) has shown the nonexistence of meaningful fixed-sample test procedures for this problem. Stein (1945) proposed a two-sample (or two-stage) test having the above desirable property, where the size of the second sample depends on the outcome of the first.
1.3.1 The Procedure

A random sample of $n_0$ observations $X_1, X_2, \ldots, X_{n_0}$ is taken, and the variance $\sigma^2$ is then estimated by

$$s^2 = \frac{1}{n_0 - 1}\sum_{i=1}^{n_0}\left(X_i - \bar{X}_{n_0}\right)^2. \tag{1.3.2}$$

Then calculate $n$ as

$$n = \max\left\{\left[\frac{s^2}{z}\right] + 1,\; n_0 + 1\right\}, \tag{1.3.3}$$

where $z$ is a previously specified constant and $[y]$ denotes the largest integer less than $y$, and draw the additional observations $X_{n_0+1}, X_{n_0+2}, \ldots, X_n$. Evaluate, according to any specified rule that depends only on $s^2$, real numbers $a_i$ $(i = 1, 2, \ldots, n)$ such that

$$\sum_{i=1}^{n} a_i = 1, \qquad a_1 = a_2 = \cdots = a_{n_0}, \qquad s^2\sum_{i=1}^{n} a_i^2 = z. \tag{1.3.4}$$

This is possible since

$$\min \sum_{i=1}^{n} a_i^2 = \frac{1}{n} \leq \frac{z}{s^2} \tag{1.3.5}$$

by (1.3.3), the minimum being taken subject to the conditions $a_1 + a_2 + \cdots + a_n = 1$, $a_1 = a_2 = \cdots = a_{n_0}$.
Define $T_1$ by

$$T_1 = \frac{\sum_{i=1}^{n} a_i X_i - \theta_0}{\sqrt{z}}. \tag{1.3.6}$$

Then $U = \left(\sum_{i=1}^{n} a_i X_i - \theta\right)\big/\sqrt{z}$ is such that, conditionally on $s^2$,

$$U \mid s^2 \sim \text{normal}(0, \sigma^2/s^2),$$

since $\mathrm{var}\left(\sum a_i X_i \mid s^2\right) = \sigma^2\sum a_i^2 = \sigma^2 z/s^2$ by (1.3.4). Also, it is well known that $V = (n_0-1)s^2/\sigma^2$ is distributed as central chi-square with $n_0 - 1$ degrees of freedom. Hence

$$\frac{s}{\sigma}\,U \,\Big|\, s^2 \sim \text{normal}(0, 1).$$

Since the conditional distribution of $Us/\sigma$ given $s^2$ does not involve $s^2$, we infer that $Us/\sigma$ is unconditionally distributed as normal(0,1) and is independent of $s^2$. Consequently

$$U = \frac{Us/\sigma}{\sqrt{V/(n_0-1)}}. \tag{1.3.7}$$

If $f(x, y)$ denotes the joint density of $Us/\sigma$ and $s^2$, then $f(x, y) = g(x)\,h(y)$, where $g(x)$ is the density of $Us/\sigma$, because the conditional density of $Us/\sigma$ given $s^2$ does not depend on $s^2$. So $Us/\sigma$ and $s^2$ are stochastically independent; i.e., by (1.3.7), $U$ is the ratio of a standard normal variable to the square root of an independent chi-square variable divided by its degrees of freedom, and thus $U$ has the $t$-distribution with $n_0 - 1$ degrees of freedom, irrespective of the value of $\sigma$. Hence the test based on $T_1$ is unbiased and has power free of $\sigma$. Then, in order to test for the one-sided alternative $\theta > \theta_0$, the critical region of size $\alpha$ is defined by

$$T_1 > t_{n_0-1,\,1-\alpha}. \tag{1.3.8}$$

The power function is then

$$\pi(\theta) = P\left(U > t_{n_0-1,\,1-\alpha} - \frac{\theta - \theta_0}{\sqrt{z}}\right), \tag{1.3.9}$$

where $U$ has the $t$-distribution with $n_0 - 1$ degrees of freedom.
An analogous critical region, with a similar power function independent of $\sigma$, holds for the two-sided alternative $\theta \neq \theta_0$. As mentioned earlier, the above test is not used in practice. However, a simpler, and slightly more powerful, version of the test is available, as we now show. (Intuitively, Stein's test wastes information in order to make the power of the test strictly independent of the variance.) Instead of (1.3.3), take a total of

$$n = \max\left\{\left[\frac{s^2}{z}\right] + 1,\; n_0\right\} \tag{1.3.10}$$
observations and define

$$T_1' = \frac{(\bar{X}_n - \theta_0)\sqrt{n}}{s}, \tag{1.3.11}$$

where $\bar{X}_n$ denotes the sample mean of all $n$ observations. One can easily establish that $U_1 = (\bar{X}_n - \theta)\sqrt{n}/s$ has a $t$-distribution with $n_0 - 1$ degrees of freedom. Since $n \geq s^2/z$, we have $\left|(\theta - \theta_0)\sqrt{n}/s\right| \geq \left|(\theta - \theta_0)/\sqrt{z}\right|$. So, if we employ the critical region $T_1' > t_{n_0-1,\,1-\alpha}$ instead of (1.3.8), the power of the test will always be increased. Also, the number of observations will be reduced by 1 or left the same. Suppose we want the power to be $1 - \beta$ when $\theta = \theta_0 + \delta$, where $\delta$ is specified. Then the power at $\theta_0 + \delta$ is

$$P\left(U_1 > t_{n_0-1,\,1-\alpha} - \delta\sqrt{n}/s\right) = 1 - \beta,$$

provided $t_{n_0-1,\,1-\alpha} - \delta\sqrt{n}/s = -t_{n_0-1,\,1-\beta}$. Now, solving for $n$, we obtain
$$n = \left[\frac{s^2\left(t_{n_0-1,\,1-\alpha} + t_{n_0-1,\,1-\beta}\right)^2}{\delta^2}\right] + 1. \tag{1.3.12}$$

Similarly, in the two-sample case, let $X \overset{d}{=} \text{normal}(\mu_1, \sigma^2)$ and $Y \overset{d}{=} \text{normal}(\mu_2, \sigma^2)$, with $X$ and $Y$ independent. Suppose we wish to test $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_2 > \mu_1$, with error probability $\alpha$ when $H_0$ is true and power $1 - \beta$ when $\mu_2 - \mu_1 = \delta$. In the first stage we observe $(X_1, X_2, \ldots, X_{n_0})$ and $(Y_1, Y_2, \ldots, Y_{n_0})$ and compute

$$s^2 = \frac{1}{2(n_0 - 1)}\sum_{i=1}^{n_0}\left[(X_i - \bar{X})^2 + (Y_i - \bar{Y})^2\right]. \tag{1.3.13}$$

Then the total sample size to be drawn from each population is $n = \max(n', n_0)$,
where

$$n' = \left[\frac{2s^2\left(t_{2(n_0-1),\,1-\alpha} + t_{2(n_0-1),\,1-\beta}\right)^2}{\delta^2}\right] + 1. \tag{1.3.14}$$
Moshman (1958) has investigated the proper choice of the initial sample size $n_0$ in Stein's two-stage procedure, and believes that an upper percentage point of the distribution of the total sample size $n$, when used in conjunction with the expectation of the sample size, is a rapidly computable guide to an efficient choice of the size of the first sample. However, the optimum initial sample size that maximizes a given criterion involves an arbitrary parameter which has to be specified by the experimenter from non-statistical considerations. If the initial sample size is chosen poorly in relation to the unknown $\sigma^2$, the expected sample size of Stein's procedure can be large in comparison to the sample size which would be used if $\sigma^2$ were known (which it is not). For example, this can occur if $\sigma^2$ is very small; then (if $\sigma^2$ were known) a small total sample size would suffice, but one may use an $n_0$ much larger than needed (hence being inefficient). However, this problem is not of practical significance.
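The total sample size of the simpler version, equations (1.3.10)-(1.3.12), is a one-line computation once the first-stage variance and two $t$-percentiles are in hand. A minimal Python sketch (the function name is ours, and the illustrative values $t_{9,\,.95} \approx 1.833$ and $t_{9,\,.90} \approx 1.383$ below are standard table values, quoted as an example, not taken from the book):

```python
from math import floor

def stein_total_n(s2, n0, t_alpha, t_beta, delta):
    """Total sample size n = max(n0, [s2 (t_alpha + t_beta)^2 / delta^2] + 1)
    per (1.3.12), where s2 is the first-stage sample variance on n0 - 1 df,
    t_alpha = t_{n0-1,1-alpha}, t_beta = t_{n0-1,1-beta}, and delta is the
    mean shift at which power 1 - beta is required.  ([y] is taken here as
    floor(y); the book's 'largest integer less than y' differs only when y
    is itself an integer.)"""
    n_prime = floor(s2 * (t_alpha + t_beta) ** 2 / delta**2) + 1
    return max(n0, n_prime)
```

With $n_0 = 10$, $\delta = 1$ and a first-stage $s^2 = 2.0$ this gives $n = 21$; with $s^2 = 0.1$ the first stage already suffices and $n = n_0 = 10$.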
Chapter 2

The Sequential Probability Ratio Test

2.1 The Sequential Probability Ratio Test (SPRT)
During World War II, Abraham Wald and others began working on sequential procedures and developed what is called the sequential probability ratio test, which can be motivated as follows. Neyman and Pearson (1933) provided a method of constructing a most powerful test for a simple versus simple hypothesis-testing problem. Suppose $X$ has p.d.f. $f(x; \theta)$ and we wish to test $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$.

Lemma 2.1.1 (Neyman and Pearson, 1933). Let $X_1, X_2, \ldots, X_n$ be a random sample, and let

$$\lambda_n = \prod_{i=1}^{n}\frac{f(X_i; \theta_1)}{f(X_i; \theta_0)}.$$
Then the most powerful test of $H_0$ against $H_1$ is obtained by rejecting $H_0$ if $\lambda_n \geq K$, and accepting $H_0$ if $\lambda_n < K$, where $K$ is determined by the level of significance.

Wald proposed the following sequential probability ratio test, which was obviously motivated by Lemma 2.1.1: choose two constants $A$ and $B$ such that $0 < B < A < \infty$; when the experiment has proceeded up to stage $n$ $(n = 1, 2, \ldots)$, accept $H_0$ if $\lambda_n \leq B$, reject $H_0$ if $\lambda_n \geq A$, and continue sampling if $B < \lambda_n < A$.
Example 2.1.1 Consider the exponential (Koopman-Darmois) family

$$f(x; \theta) = \exp\{Q(\theta)R(x) + S(x) + T(\theta)\},$$

where $Q(\theta)$ is monotonically increasing in $\theta$.
For this family the graph is as shown in Figure 2.1.1(a). At stage $n$ the rule is equivalent to: continue sampling if

$$C_1 + nD < \sum_{i=1}^{n} R(X_i) < C_2 + nD,$$

where $C_1$, $C_2$ and $D$ depend on $\theta_0$ and $\theta_1$. We can draw the parallel lines which correspond to the linear functions of $n$ in the extremes of the above inequality. In practice, one can then simply cumulate the $R(X_i)$ graphically as a function of $n$ as the sampling proceeds (see Figure 2.1.1(b)).
Figure 2.1.1 The Process of Sampling
As special cases of Example 2.1.1, consider:

Example 2.1.2 (Exponential distribution) Let

$$f(x; \theta) = \theta^{-1}\exp(-x/\theta), \quad x > 0,\ \theta > 0.$$

We wish to test $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$ $(\theta_1 > \theta_0)$. Then

$$\ln\lambda_n = n\ln\left(\frac{\theta_0}{\theta_1}\right) + \left(\frac{1}{\theta_0} - \frac{1}{\theta_1}\right)\sum_{i=1}^{n} X_i.$$

The continue-sampling inequality after taking the $n$th observation is

$$\left(\frac{\theta_0\theta_1}{\theta_1 - \theta_0}\right)\left[\ln B + n\ln\left(\frac{\theta_1}{\theta_0}\right)\right] < \sum_{i=1}^{n} X_i < \left(\frac{\theta_0\theta_1}{\theta_1 - \theta_0}\right)\left[\ln A + n\ln\left(\frac{\theta_1}{\theta_0}\right)\right].$$
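The continue-sampling interval of Example 2.1.2 is linear in $n$, so the boundaries at any stage are immediate. In the Python sketch below (ours, for illustration; the function name is hypothetical), the constants $A$ and $B$ are set by Wald's customary approximations $A \approx (1-\beta)/\alpha$ and $B \approx \beta/(1-\alpha)$, which are discussed later in this chapter.

```python
from math import log

def exp_sprt_interval(n, theta0, theta1, alpha, beta):
    """Continue-sampling interval for S_n = X_1 + ... + X_n at stage n in
    the SPRT of H0: theta = theta0 vs H1: theta = theta1 (theta1 > theta0)
    for the exponential density of Example 2.1.2, using Wald's approximate
    boundaries A ~ (1 - beta)/alpha and B ~ beta/(1 - alpha)."""
    A = (1 - beta) / alpha
    B = beta / (1 - alpha)
    scale = theta0 * theta1 / (theta1 - theta0)  # positive, since theta1 > theta0
    drift = log(theta1 / theta0)
    return scale * (log(B) + n * drift), scale * (log(A) + n * drift)
```

Note that the width of the interval, $\text{scale}\cdot\ln(A/B)$, does not depend on $n$: the two boundaries are parallel lines in the $(n, S_n)$ plane.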
Example 2.1.3 For the binomial distribution a SPRT for HO: 8 = 80 versus HI : 0 = 81, (01 > 00) is defined by two constants B and A. After n observations, we continue sampling if
where m is the number of defectives or successes ( X i = 1) among the n observations. Alternatively, at stage n the continue-sampling region is: Q
+ s n < m < c1+ sn
where
co
=
In B InK’C1=-
In A 81 (1 - 00) 1nK’ K = 00 (1- 01) -
In the plane of n and m, the continue-sampling region lies between two lines having common slopes and intercepts ~0 and c1. Each sample point (n,m) when plotted in this plane, has integer-valued coordinates. Two procedures, defined by are equivalent if there is no point (n,m) pairs of intercepts (q,c1) and (4,~;) n 2 m 2 0, between the lines y = q sx and y = c; sx and between the lines y = c1+ sz and y = c;I sx. Anderson and Fkiedman (1960) have shown that if the slope is rational there is a denumerable number of SPRT’s, and if the slope is irrational, there is a nondenumerable number. Let s = M / R where M and R are relatively prime integers. Then a point (n,m) is on the line y = c sx for a value of c = (mR - n M ) / R which is rational. All the lines required for defining SPRT’s in this case have intercepts of the form shown above. There is a denumerable number of such lines, and hence a denumerable number of pairs of such lines.
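The slope and intercepts of the binomial continue-sampling region are easy to compute; the following sketch (function name hypothetical, boundaries A and B taken as given constants) returns the two parallel lines for the numbers used later in Example 2.2.2.

```python
import math

def binomial_sprt_lines(theta0, theta1, A, B):
    """Slope s and intercepts (c0, c1) of the continue-sampling region
    c0 + s*n < m < c1 + s*n for the binomial SPRT of Example 2.1.3."""
    lnK = math.log(theta1 * (1 - theta0) / (theta0 * (1 - theta1)))
    s = math.log((1 - theta0) / (1 - theta1)) / lnK   # common slope of both lines
    c0 = math.log(B) / lnK                            # intercept of the lower line
    c1 = math.log(A) / lnK                            # intercept of the upper line
    return c0, c1, s

# With theta0 = 0.5, theta1 = 0.8 and boundaries A = 4, B = 1/4,
# the region is -1 + 0.661 n < m < 1 + 0.661 n.
c0, c1, s = binomial_sprt_lines(0.5, 0.8, 4.0, 0.25)
```

Sampling continues while the plotted count m stays strictly between the two lines.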
2.2
SPRT: Its Finite Termination and Bounds
The reason we resort to a sequential procedure is that we may be able to terminate the experiment earlier than a fixed-sample size procedure. Then we should ensure that the sequential procedure will terminate finitely with probability one. Towards this we have the results of Stein (1946) and Wald (1947).

Theorem 2.2.1 Let Z = ln[f(X; θ₁)/f(X; θ₀)], where we are interested in testing H₀ : f(x) = f₀(x) versus H₁ : f(x) = f₁(x). Then Wald's SPRT terminates finitely with probability one provided P(Z = 0) < 1.
CHAPTER 2. THE SEQUENTIAL PROBABILITY RATIO TEST
We will omit part of the proof because it is somewhat technical. When we are dealing with a family of densities indexed by a parameter θ, Z = ln{f(x; θ₁)/f(x; θ₀)}, where f(x; θ₀) and f(x; θ₁) are the hypothesized density functions under H₀ and H₁ respectively. In general, it can be shown that if N is the stopping time of the SPRT,

    P(N > kr) ≤ (1 − q^r)^k,

where k and r are positive integers. Since P(Z = 0) < 1, there exist a d > 0 and q > 0 such that either P(Z > d) ≥ q or P(Z < −d) ≥ q. In the former case choose the integer r such that rd > ln(A/B); in the latter case choose the integer r such that −rd < −ln(A/B). Now

    {N = ∞} = ∩ₙ₌₁^∞ {N > n},  where the sequence of events {N > n} is monotone decreasing.

Hence

    P(N is not finite) = lim_{n→∞} P(N > n) = lim_{k→∞} P(N > kr).

Note that {P(N > n)} is a monotone decreasing sequence of probabilities bounded below, and hence has a limit. This limit is also the limit of any subsequence, in particular of the sequence consisting of every rth element of the original sequence. Thus

    lim_{k→∞} P(N > kr) ≤ lim_{k→∞} (1 − q^r)^k = 0.
Remark 2.2.1 Wald (1947, pp. 157–158) has established Theorem 2.2.1 under the assumption that var(Z) is positive.

Next, we shall explore whether it is possible to solve for A and B explicitly for specified α and β. We have

    α = P(reject H₀ | H₀) = Σᵢ₌₁^∞ P_{H₀}(B < Rⱼ < A, j = 1, 2, …, i − 1 and Rᵢ ≥ A),

    β = P(accept H₀ | H₁) = Σᵢ₌₁^∞ P_{H₁}(B < Rⱼ < A, j = 1, 2, …, i − 1 and Rᵢ ≤ B).
However, these expressions do not easily lend themselves to evaluating A and B in general, where Rⱼ = Πᵢ₌₁^j f₁(Xᵢ)/f₀(Xᵢ) denotes the likelihood ratio at stage j.

Theorem 2.2.2 For Wald's SPRT, A ≤ (1 − β)/α and B ≥ β/(1 − α).

Proof. Let X = (X₁, X₂, …, X_k) and let E_k be the set of all points (in k-dimensional Euclidean space R^k) for which we reject H₀ using the SPRT. Also, let F_k be the set of all points for which we accept H₀. Notice that {E_k, k = 1, 2, …} are mutually disjoint and {F_k, k = 1, 2, …} are also mutually disjoint (draw pictures in R¹ and R²). Assume, without practical loss, that P_{H_i}({∪E_k} ∪ {∪F_k}) = 1, i = 0 and 1. That is, P(N = ∞) = 0, which is satisfied when P(Z = 0) < 1 (see Theorem 2.2.1). Notice that Z will be identically zero if and only if f₁(x) and f₀(x) agree at each point x which can occur. The mild condition P(Z = 0) < 1 will be satisfied provided the random variable X is not concentrated on the set of points x for which f₁(x) = f₀(x). Then
since f₁(x) ≥ A f₀(x) holds at every point x ∈ E_k, we obtain

    1 − β = Σₖ₌₁^∞ P_{H₁}(E_k) = Σₖ₌₁^∞ ∫_{E_k} f₁ ≥ A Σₖ₌₁^∞ ∫_{E_k} f₀ = A α.  (2.2.1)

Hence

    1/A ≥ α/(1 − β).  (2.2.2)

Similarly,

    β = Σₖ₌₁^∞ P_{H₁}(F_k) ≤ B Σₖ₌₁^∞ P_{H₀}(F_k) = B(1 − α),

since 1 − α = P_{H₀}(accept H₀) = Σₖ₌₁^∞ P_{H₀}(F_k). Consequently

    B ≥ β/(1 − α).  (2.2.3)
Corollary 2.2.1 A = (1 − β)/α and B = β/(1 − α) imply that α = (1 − B)/(A − B) and β = B(A − 1)/(A − B).
Remark 2.2.2 In obtaining Wald's bounds for A and B it is assumed that the SPRT terminates finitely with probability one. However, when P_{H_i}({∪E_k} ∪ {∪F_k}) ≤ 1, i = 0 and 1, the last equality in (2.2.1) can be replaced by the inequality ≥. Hence, the inequalities relating A and B and the error probabilities constitute approximate bounds irrespective of whether termination is certain or not. Also, B = β and A = 1/α can be reasonable bounds. The inequalities obtained here are almost equalities, since Λ_N does not usually attain either a value far above A or a value far below B. So, suppose we take A = A′ = (1 − β)/α and B = B′ = β/(1 − α). When we use the approximate values A′ and B′ for A and B respectively, we may not have the same probabilities of Type I and Type II errors, namely α and β. Let the effective probabilities of error be α′ and β′. Consider the SPRT based on (B, A′). Then there may be some sequences which call for rejection of H₀ by the test based on (B, A) and for acceptance of H₀ by the test (B, A′). So α′ ≤ α, β′ ≥ β for (B, A′). Similarly α′ ≥ α, β′ ≤ β for the SPRT (B′, A). However, if the SPRT (B′, A′) is used instead of (B, A), it is not obvious whether the error probabilities are decreased or increased. However, applying Theorem 2.2.2 to the (B′, A′) test,

    α′/(1 − β′) ≤ 1/A′ = α/(1 − β)  and  β′/(1 − α′) ≤ B′ = β/(1 − α),

i.e.,

    α(1 − β′) ≥ α′(1 − β)  and  β(1 − α′) ≥ β′(1 − α).

Adding these two we obtain

    α + β ≥ α′ + β′.

That is, at most one of the error probabilities could be larger than the nominal error probability. Further,

    α′ ≤ α/(1 − β′) ≤ α/(1 − β) ≈ α(1 + β)  and  β′ ≤ β/(1 − α′) ≤ β/(1 − α) ≈ β(1 + α);
hence any increase in error size in α′ [β′] is not beyond a factor of 1 + β [1 + α]. These factors are close to unity when α and β are small. If α = β = .05, then α′ = β′ ≤ .0525. If both α′ < α and β′ < β, it would usually mean the (B′, A′) test required substantially more observations than the (B, A) test. Since the true boundaries satisfy B ≥ β/(1 − α) = B′ and A ≤ (1 − β)/α = A′, using (B′, A′) widens the continue-sampling region. There are several reasons for believing the increase in the necessary number of observations caused by the approximations to be only slight. First, the sequential process may terminate at the nth stage if f₁(x)/f₀(x) ≥ A or f₁(x)/f₀(x) ≤ B. If at the final stage f₁/f₀ were exactly equal to A or B, then the inequalities for A and B would be exact equalities. A possible excess of f₁/f₀ beyond the boundaries A and B at termination of the test procedure is caused by the discreteness of the number of observations: if n were continuous, then f₁/f₀ would be continuous in n and the ratio could exactly achieve A or B at the time of termination. Wald (1947, Section 3.9) has shown that the increase in the expected sample number caused by using the inequalities is slight. A nice feature of the SPRT is that the approximations to A and B are functions of α and β only and can be computed once and for all, free of f; whereas the critical values in Neyman–Pearson formulations of fixed-sample procedures depend on f and α. So, in the SPRT no distributional problems are involved except where one is interested in finding the distribution of the number of trials required to terminate the experiment. However, this is of secondary importance if we know that the sequential test on the average leads to a saving in the number of observations. Note that when B = β/(1 − α) and A = (1 − β)/α, it is trivial to show that B < 1 < A (since α + β cannot exceed unity).
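The effect of using the approximate boundaries A′ = (1 − β)/α and B′ = β/(1 − α) can be illustrated by simulation. The sketch below (function names hypothetical) runs the SPRT for a normal-mean test with nominal α = β = 0.05 and estimates the effective error rates, which come out below the nominal values because of the boundary overshoot discussed above.

```python
import math, random

def run_sprt(theta, rng, theta0=0.0, theta1=1.0, alpha=0.05, beta=0.05):
    """One SPRT run for N(theta, 1) data with Wald's approximate boundaries;
    returns True when H0 is rejected."""
    lnA = math.log((1 - beta) / alpha)   # ln A' = ln 19
    lnB = math.log(beta / (1 - alpha))   # ln B' = -ln 19
    s = 0.0
    while lnB < s < lnA:
        x = rng.gauss(theta, 1.0)
        # Z_i = ln f1(x)/f0(x) = x(theta1 - theta0) - (theta1^2 - theta0^2)/2
        s += x * (theta1 - theta0) - (theta1**2 - theta0**2) / 2
    return s >= lnA

rng = random.Random(1)
reps = 20000
alpha_eff = sum(run_sprt(0.0, rng) for _ in range(reps)) / reps
beta_eff = sum(not run_sprt(1.0, rng) for _ in range(reps)) / reps
# Both effective error rates fall below the nominal 0.05.
```

The estimated α′ and β′ are each below 0.05, consistent with α′ + β′ ≤ α + β.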
Example 2.2.1 Let θ be the probability of an item being defective. At the nth stage, take one more observation if

    B < [θ₁^r (1 − θ₁)^{n−r}] / [θ₀^r (1 − θ₀)^{n−r}] < A,

that is, if

    nC + D′ < r < nC + D,

where r denotes the number of defectives and C, D and D′ are functions of A, B, θ₀ and θ₁.

Example 2.2.2 Let θ₀ = 1/2, θ₁ = 0.8 in Example 2.2.1. Then

    Λ_n = [(.8)^r (.2)^{n−r}] / [(.5)^r (.5)^{n−r}],

so that

    Λ_{n+1} = (8/5) Λ_n if the (n + 1)st trial results in a defective,
    Λ_{n+1} = (2/5) Λ_n if the (n + 1)st trial results in a non-defective.

Letting α = 0.2 and β = 0.2,

    B = .2/.8 = 1/4,  A = 0.8/0.2 = 4.

Suppose we observe D G G D D D D D D, where D denotes a defective item and G denotes a non-defective (or good) item. The continue-sampling region is

    n ln(5/2)/(2 ln 2) − 1 < r < n ln(5/2)/(2 ln 2) + 1,

that is, approximately

    0.66n − 1 < r < 0.66n + 1.
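With these boundary lines, the observed path can be replayed step by step (a sketch; variable names are mine):

```python
import math

s = math.log(5 / 2) / (2 * math.log(2))   # slope ~ 0.661 of both boundary lines
sequence = "DGGDDDDDD"                     # D = defective, G = good item
r, decision = 0, None
for n, item in enumerate(sequence, start=1):
    r += (item == "D")
    if r >= s * n + 1:                     # at or above the upper line: reject H0
        decision = ("reject H0", n)
        break
    if r <= s * n - 1:                     # at or below the lower line: accept H0
        decision = ("accept H0", n)
        break
# The walk first leaves the continue-sampling band at the ninth observation.
```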
Hence, we reject H₀ on the ninth observation.

Example 2.2.3 (Fixed-sample size procedure for Example 2.2.2) If a fixed sample of size n were used in Example 2.2.2, the specifications are

    P(r > k | θ₀ = 1/2) = 0.2  and  P(r ≤ k | θ₁ = 0.8) = 0.2,

i.e., using the normal approximation,

    Φ([k − n/2]/(√n/2)) = 0.8 = Φ(0.84)  and  Φ([k − 0.8n]/(.16n)^{1/2}) = 0.2 = Φ(−0.84),

i.e.,

    k − n/2 = 0.84√n/2  and  k − 0.8n = −0.84√n(0.4),

or

    0.3n = 0.84√n(0.9),  √n = 0.84(3) = 2.52,

that is, n ≈ 7 and k ≈ 4.2.

The exact values, using binomial tables, are n = 10, k = 6.
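The exact values can be confirmed by a direct search over binomial tail probabilities (a small sketch; the search strategy is mine, not the book's):

```python
from math import comb

def tail_gt(n, k, p):
    """P(r > k) for r ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1, n + 1))

found = None
for n in range(1, 30):                    # smallest n with a k meeting both error bounds
    for k in range(n + 1):
        if tail_gt(n, k, 0.5) <= 0.2 and 1 - tail_gt(n, k, 0.8) <= 0.2:
            found = (n, k)
            break
    if found:
        break
# found == (10, 6), matching the exact values from the binomial tables.
```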
2.3
The Operating Characteristic Function
Wald (1947) devised the following ingenious method of obtaining the operating characteristic (OC) (probability of accepting H₀) of an SPRT. Consider an SPRT defined by given constants A and B, with B < 1 < A, in order to test H₀ : f = f₀(x) = f(x; θ₀) against H₁ : f = f₁(x) = f(x; θ₁). If θ₀ and θ₁ are the only two states of nature, then there is no point in considering the operating characteristic function (OC function). However, if the above hypothesis-testing problem is a simplification of, for example, H₀ : θ ≤ θ* versus H₁ : θ > θ*, then one would be interested in OC(θ) for all possible values of θ. Let θ be fixed and determine, as a function of that θ, a value of h (other than 0) for which

    E_θ{[f₁(X)/f₀(X)]^h} = 1.

This expectation is 1 when h = 0, but there is one other value of h for which it is also 1. For example, h = 1 if θ = θ₀ and h = −1 if θ = θ₁. The above formula can be written as

    ∫ [f₁(x)/f₀(x)]^h f(x; θ) dx = 1.

Define the density function

    f*(x; θ) = [f₁(x)/f₀(x)]^h f(x; θ).

Consider the auxiliary problem of testing

    H : f = f(x; θ) vs. H* : f = f*(x; θ),

which are simple hypotheses for fixed h and θ. So, one continues sampling (in testing H vs. H*) if

    B^h < Πᵢ₌₁ⁿ [f*(xᵢ; θ)/f(xᵢ; θ)] < A^h.

After taking the 1/h-th power (assuming h > 0) throughout, we obtain the same inequality that was used for continuing sampling in testing H₀ against H₁. Hence P_θ(accept H₀) = P_θ(accept H) = P_H(accept H) = 1 − α*, where α* is the size of the type I error for the auxiliary problem. However, solving the equations

    A^h = (1 − β*)/α*  and  B^h = β*/(1 − α*),
we find that α* = (1 − B^h)/(A^h − B^h). Hence

    OC(θ) = (A^h − 1)/(A^h − B^h),

which is a function of h. If h < 0, we set B* = A^h and A* = B^h. Then P_θ(accept H₀) = P_θ(reject H) = P_H(reject H) = α*, where

    α* = (A^h − 1)/(A^h − B^h),

yielding the same expression for OC(θ) as in the case of h > 0. However, h is a function of θ, and these two relations define the operating characteristic curve OC(θ) parametrically. Each value of h determines a θ and a value of P_θ(accept H₀), a point on the OC curve. (For exponential models, one obtains an explicit expression for θ in terms of h.) The equation relating h and θ does not provide a well-defined value of θ when h = 0, since the relation is satisfied by all θ. However, one can evaluate the limit of OC(θ) as h → 0 by using l'Hospital's rule. Thus

    lim_{h→0} OC(θ) = ln A/(ln A − ln B).
We know OC(θ₀) = 1 − α, OC(θ₁) = β,

    lim_{h→∞} OC(θ) = 1  and  lim_{h→−∞} OC(θ) = 0,

since B < 1 < A. Thus we obtain the following table of approximate values, where θ* denotes the value of θ corresponding to h = 0 and, in general, OC = (A^h − 1)/(A^h − B^h):

    h     −∞    −1    0                      1       ∞
    θ     —     θ₁    θ*                     θ₀      —
    OC    0     β     ln A/(ln A − ln B)     1 − α   1
Example 2.3.1 Consider the problem of testing θ = θ₀ vs. θ = θ₁ > θ₀ in a Bernoulli population. Here

    E_θ{[f₁(X)/f₀(X)]^h} = θ(θ₁/θ₀)^h + (1 − θ)[(1 − θ₁)/(1 − θ₀)]^h.

Setting this equal to 1 and solving for θ, we obtain

    θ = {1 − [(1 − θ₁)/(1 − θ₀)]^h} / {(θ₁/θ₀)^h − [(1 − θ₁)/(1 − θ₀)]^h};

as h → 0 this becomes

    θ = ln[(1 − θ₀)/(1 − θ₁)] / ln{θ₁(1 − θ₀)/[θ₀(1 − θ₁)]}.

Also one can easily see that

    lim_{h→∞} θ = 0  and  lim_{h→−∞} θ = 1.

If θ₀ = 0.5, θ₁ = 0.8, and α = β = 0.01, we obtain

    θ = [1 − (2/5)^h] / [(8/5)^h − (2/5)^h] = (5^h − 2^h)/(8^h − 2^h),

and the table

    h     −∞    −1    0      1     ∞
    θ     1     .8    .661   .5    0
    OC    0     .01   .5     .99   1
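The parametric OC construction can be traced numerically; the sketch below (helper name hypothetical) reproduces the h = ±1 columns of the table above.

```python
def oc_point(h, theta0=0.5, theta1=0.8, alpha=0.01, beta=0.01):
    """One point (theta, OC) of Wald's parametric OC curve for the Bernoulli
    SPRT, using A = (1-beta)/alpha and B = beta/(1-alpha); h must be nonzero."""
    A = (1 - beta) / alpha
    B = beta / (1 - alpha)
    r1 = (theta1 / theta0) ** h                  # (8/5)^h
    r0 = ((1 - theta1) / (1 - theta0)) ** h      # (2/5)^h
    theta = (1 - r0) / (r1 - r0)
    oc = (A**h - 1) / (A**h - B**h)
    return theta, oc

t_neg, oc_neg = oc_point(-1.0)   # recovers theta1 = .8 and OC = beta = .01
t_pos, oc_pos = oc_point(1.0)    # recovers theta0 = .5 and OC = 1 - alpha = .99
```

Sweeping h over a grid gives the whole OC curve between these anchor points.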
2.4
The Average Sample Number
The sample size needed to reach a decision in a sequential or a multiple sampling plan is a random variable N. The distribution of this random variable depends on the true distribution of the observations during the sampling process. In particular, we are interested in evaluating E(N), the average sample number (ASN). In Section 2.2 it was shown that for the SPRT, N is finite with probability one. Thus N can take on the values 1, 2, 3, … with probabilities p₁, p₂, …, where Σ pₙ = 1. The moments of N cannot be explicitly computed in general. However, one can show (assuming P(Z = 0) < 1) that E(N^i) < ∞ for all i. Towards this end, consider

    E(N^i) = Σₙ₌₁^∞ n^i P(N = n) = Σₖ₌₁^∞ Σₙ₌₍ₖ₋₁₎ᵣ₊₁^{kr} n^i P(N = n) ≤ Σₖ₌₁^∞ (kr)^i P(N > (k − 1)r),

which follows from the inequality obtained in Section 2.2, namely P(N > kr) ≤ (1 − q^r)^k, k = 1, 2, …. Consequently,

    E(N^i) ≤ r^i [Σₖ₌₁^∞ k^i (1 − q^r)^{k−1}].

Now, the series in brackets can be shown to be convergent by using the ratio test for 0 < q ≤ 1: the ratio of the (n + 1)th term to the nth term in the above series is [(n + 1)/n]^i (1 − q^r), the limit of which is less than unity. Hence E(N^i) < ∞. In fact, one can show (assuming P(Z = 0) < 1) that the moment-generating function of N is finite. Towards this end, consider
    M_N(t) = E(e^{Nt}) = Σₙ₌₁^∞ e^{nt} P(N = n) ≤ Σₖ₌₁^∞ e^{krt} (1 − q^r)^{k−1} < ∞,

provided e^{rt}(1 − q^r) < 1, that is, for all t < t₀, where t₀ = −r⁻¹ ln(1 − q^r) > 0. If a decision is reached at the Nth stage, Λ_N is approximately distributed as a two-valued variable taking the values B and A, and

    E(ln Λ_N) ≈ (ln B) P(ln Λ_N = ln B) + (ln A) P(ln Λ_N = ln A)
             = (ln B) P(accept H₀) + (ln A) P(reject H₀),

where the expectation and the probabilities are with respect to the true distribution. So

    E_{θ₀}(ln Λ_N) ≈ (ln B)(1 − α) + (ln A) α,

and

    E_{θ₁}(ln Λ_N) ≈ (ln B) β + (ln A)(1 − β).

However,

    ln Λ_N = Z₁ + Z₂ + ⋯ + Z_N,
a random sum of i.i.d. random variables, where Zᵢ = ln[f(Xᵢ; θ₁)/f(Xᵢ; θ₀)]. Now, using the simple method of Wolfowitz (1947) and Johnson (1959), we will show that E(ln Λ_N) = E(N)E(Z). Let Z, Z₁, Z₂, … be a sequence of independent, identically distributed random variables and N a random variable with values 1, 2, … such that the event {N ≥ i} is independent of Zᵢ, Z_{i+1}, …. Let yᵢ be zero if N < i and 1 if N ≥ i. Then the event {N = i} is determined by the constraints on Z₁, Z₂, …, Zᵢ and hence is independent of Z_{i+1}, …, for i = 1, 2, …. Also {N ≥ i} = (∪ⱼ₌₁^{i−1} {N = j})^c is independent of Zᵢ, Z_{i+1}, …. Thus

    E(Z₁ + Z₂ + ⋯ + Z_N) = E(Σᵢ₌₁^∞ yᵢ Zᵢ) = Σᵢ₌₁^∞ E(yᵢ Zᵢ) = Σᵢ₌₁^∞ E(yᵢ)E(Zᵢ),

since yᵢ depends only on Z₁, Z₂, …, Z_{i−1} and hence is independent of Zᵢ, provided the interchange of infinite summation and expectation is justified, and

    Σᵢ₌₁^∞ E(yᵢ)E(Zᵢ) = E(Z) Σᵢ₌₁^∞ P(N ≥ i) = E(Z)E(N).

This completes the proof of the assertion that

    E(ln Λ_N) = E(Z)E(N).  (2.4.1)

The interchange of summation and expectation is valid if the series is absolutely convergent. Consider

    Σᵢ₌₁^∞ E(yᵢ |Zᵢ|) = E(|Z|) Σᵢ₌₁^∞ P(N ≥ i) = E(|Z|)E(N) < ∞,

provided E(|Z|) and E(N) are finite. Thus it follows, as an application of the last result to the sequence Z₁, Z₂, …,
that E(ln Λ_N) = E(N)E(Z) under each hypothesis. Hence

    E_{θ₀}(N) = [α ln A + (1 − α) ln B] / E_{θ₀}(Z),  (2.4.2)

and

    E_{θ₁}(N) = [(1 − β) ln A + β ln B] / E_{θ₁}(Z).  (2.4.3)

Example 2.4.1 Let X be normal with mean θ and variance 1. Let θ₀ = 0 and θ₁ = 1 and α = β = 0.01. Then

    A = 99 = 1/B,  ln A = 4.595,  Z = X − 1/2,

so E₀(ln Λ_N) = −(1 − 2α) ln 99 = −4.5031. Hence

    E_{θ₀}(N) = 4.5031/(1/2) ≈ 9  and  E_{θ₁}(N) ≈ 9.

For a fixed-sample size procedure n = 22 is needed. The expected sample size can be computed for states of nature other than H₀ and H₁ via

    E_θ(N) = [π(θ) ln B + (1 − π(θ)) ln A] / E_θ(Z),  (2.4.4)

where π(θ) = P_θ(accept H₀) = OC(θ).
Example 2.4.2 Let X be a random variable distributed uniformly on [θ, θ + 2]. We wish to test H₀ : θ = 0 (density f₀) against H₁ : θ = 1 (density f₁). We will obtain Wald's SPRT, the exact average sample number, and the error probabilities. Let I(a, b) = 1 if a ≤ b and zero otherwise. Then

    Λ_n = Πᵢ₌₁ⁿ [f₁(Xᵢ)/f₀(Xᵢ)] = [I(1, X₍₁₎) I(X₍ₙ₎, 3)] / [I(0, X₍₁₎) I(X₍ₙ₎, 2)],

where X₍ₙ₎ and X₍₁₎ respectively denote the largest and smallest observations in the sample. Hence the rule is: at stage n accept H₀ if X₍₁₎ < 1 and X₍ₙ₎ < 2 (then Λ_n = 0); at stage n take one more observation if 1 ≤ Xᵢ ≤ 2 (i = 1, 2, …, n) (then Λ_n = 1); at stage n reject H₀ if X₍₁₎ > 1 and X₍ₙ₎ > 2 (then Λ_n = ∞). Let N denote the random sample size required. Then

    p₀(n) = P(N = n | H₀) = P(N = n, reject H₀ | H₀) + P(N = n, accept H₀ | H₀)
          = P(1 ≤ Xᵢ ≤ 2, i ≤ n − 1, X₍ₙ₎ > 2 | H₀) + P(1 ≤ Xᵢ ≤ 2, i ≤ n − 1, X₍₁₎ < 1 | H₀)
          = (1/2)^{n−1}(0) + (1/2)^{n−1}(1/2) = (1/2)^n.

Similarly,

    p₁(n) = P(1 ≤ Xᵢ ≤ 2, i ≤ n − 1, X₍ₙ₎ > 2 | H₁) + P(1 ≤ Xᵢ ≤ 2, i ≤ n − 1, X₍₁₎ < 1 | H₁)
          = (1/2)^{n−1}(1/2) + (1/2)^{n−1}(0) = (1/2)^n.
Hence

    E(N | H₀) = Σₙ₌₁^∞ n(1/2)^n = 2,  and  E(N | H₁) = 2

(because Σₙ₌₁^∞ nθ^n = θ (∂/∂θ) Σₙ₌₀^∞ θ^n = θ/(1 − θ)²). Moreover,

    α = P(Type I error) = P(reject H₀ | H₀) = 0,

since rejection requires an observation exceeding 2, which has probability zero under H₀.
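These exact values are easy to corroborate by simulation (a sketch with hypothetical names; under H₀ each observation continues the test with probability 1/2 and can never cross the rejection boundary):

```python
import random

def uniform_sprt(theta, rng):
    """One run of the SPRT of Example 2.4.2 for X ~ U[theta, theta + 2];
    returns (decision, sample size)."""
    n = 0
    while True:
        n += 1
        x = rng.uniform(theta, theta + 2)
        if x < 1:
            return "accept H0", n     # Lambda_n drops to 0
        if x > 2:
            return "reject H0", n     # Lambda_n jumps to infinity

rng = random.Random(7)
runs = [uniform_sprt(0.0, rng) for _ in range(20000)]
avg_n = sum(n for _, n in runs) / len(runs)                       # near E(N|H0) = 2
alpha_hat = sum(d == "reject H0" for d, _ in runs) / len(runs)    # exactly 0 under H0
```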
Similarly, β = 0. Higher moments of randomly stopped sums have been derived by Chow, Robbins and Teicher (1965). In the following we shall state their results for the second moment.
Let (Ω, F, P) be a probability space and let Z₁, Z₂, … be a sequence of random variables on Ω. A stopping variable (of the sequence Z₁, Z₂, …) is a random variable N with positive integer values such that the event {N = n} depends only on (Z₁, Z₂, …, Zₙ) for every n ≥ 1. Let Sₙ = Σᵢ₌₁ⁿ Zᵢ; then

    S_N = Σᵢ₌₁^N Zᵢ

is a randomly stopped sum. Wald's (1947) theorem states that for independent and identically distributed (i.i.d.) Zᵢ with E(Zᵢ) = 0, E(N) < ∞ implies that E(S_N) = 0. We have the following results of Chow, Robbins and Teicher (1965).

Theorem 2.4.1 Let Z₁, Z₂, … be independent with E(Zₙ) = 0, E|Zₙ| = aₙ, E(Zₙ²) = σₙ² < ∞ (n ≥ 1), and let Sₙ = Σᵢ₌₁ⁿ Zᵢ. Then if N is a stopping variable, either of the two relations in (2.4.6) implies that E(S_N) = 0 and

    E(S_N²) = E(Σᵢ₌₁^N σᵢ²).  (2.4.7)

If σₙ² = σ² < ∞, then E(N) < ∞ implies that

    E(S_N²) = σ² E(N).  (2.4.8)

Corollary 2.4.1.1 If E(Zₙ) = 0 and E(N) < ∞, then E(N) = E(S_N²)/E(Z²), which is known as Wald's second equation.
One disturbing feature of the expected sample size of the SPRT is the following: if one is really interested in testing H₀* : θ ≤ θ* against the alternative H₁* : θ > θ*, then one would set up H₀ : θ = θ₀ (θ₀ ≤ θ*) and H₁ : θ = θ₁ (θ₁ ≥ θ*), the zone between θ₀ and θ₁ being the "indifference zone." If the population is indexed by a θ belonging to this indifference zone, that is, near θ*, E(N) tends to be largest. Thus the test tends to take a larger stopping time to reach a decision when θ is near θ*, where there is hardly any concern as to which decision is made. This is also clear intuitively: it should not take very long to discover that a population is overwhelmingly of one kind or the other, whereas it takes long to discover which kind it is when it is near the borderline. What is annoying, then, is that wrong decisions are least likely to be costly in the borderline case, whereas it is precisely in that situation that a large sample is likely to be required in order to reach a decision.

Example 2.4.3 For the Bernoulli problem of θ₀ against θ₁, we have shown that

    θ = {1 − [(1 − θ₁)/(1 − θ₀)]^h} / {(θ₁/θ₀)^h − [(1 − θ₁)/(1 − θ₀)]^h}

and

    π(θ) = OC(θ) = (A^h − 1)/(A^h − B^h).

Now using θ₀ = .5, θ₁ = .9, α = β = .05, one obtains

    θ = (5^h − 1)/(9^h − 1)  and  E(Z | θ) = θ ln 9 − ln 5.

As h → 0, θ and π(θ) tend to be indeterminate. Also both E(Z) and E(ln Λ_N) tend to zero, but their ratio can be computed by evaluating its limit as h → 0. We have Table 2.4.1, built from Example 2.3.1. So, let us find lim_{h→0} E(N | θ).

    Table 2.4.1 OC Function and Expected Sample Size for the Bernoulli Problem

    h         −∞                −1                            0       1                      ∞
    θ         1                 .9                            .732    .5                     0
    π(θ)      0                 .05                           .5      .95                    1
    E(Z|θ)    ln 1.8            .9 ln 9 − ln 5                0       ln 0.6                 −ln 5
    E(N|θ)    ln 19/ln 1.8      .9 ln 19/(.9 ln 9 − ln 5)     9.16    −.9 ln 19/ln 0.6       ln 19/ln 5
              = 5.01            = 7.2                                 = 5.2                  = 1.83
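The non-limiting entries of Table 2.4.1 can be reproduced directly from the parametric formulas (a sketch; the helper name is hypothetical):

```python
import math

def asn_point(h, theta0=0.5, theta1=0.9, alpha=0.05, beta=0.05):
    """(theta, pi(theta), E(N|theta)) at the parameter value indexed by a
    nonzero h, using (2.4.4) with Wald's approximations A = 19, B = 1/19."""
    A = (1 - beta) / alpha
    B = beta / (1 - alpha)
    r0 = ((1 - theta1) / (1 - theta0)) ** h     # (1/5)^h
    r1 = (theta1 / theta0) ** h                 # (9/5)^h
    theta = (1 - r0) / (r1 - r0)
    pi = (A**h - 1) / (A**h - B**h)             # OC(theta)
    ez = theta * math.log(9) - math.log(5)      # E(Z | theta)
    en = (pi * math.log(B) + (1 - pi) * math.log(A)) / ez
    return theta, pi, en

theta, pi, en = asn_point(-1.0)
# The h = -1 column of Table 2.4.1: theta = .9, pi = .05, E(N) = 7.2
```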
To evaluate the h → 0 limit, write

    lim_{h→0} E(N | θ) = lim_{h→0} { (ln 19) [(1 − 19^{−h}) − (19^h − 1)] / (19^h − 19^{−h}) } / { [1 − (.2)^h]/[(1.8)^h − (.2)^h] (ln 9) − ln 5 }.

Using the expansion

    a^h − b^h = e^{h ln a} − e^{h ln b} = h(ln a − ln b) + (h²/2)[(ln a)² − (ln b)²] + ⋯,

we have

    [(1 − 19^{−h}) − (19^h − 1)] / (19^h − 19^{−h}) = [−h²(ln 19)² − ⋯] / [2h ln 19 + (h³/3)(ln 19)³ + ⋯]
        = −(h ln 19)/2 · [1 + ⋯][1 + (h²/6)(ln 19)² + ⋯]^{−1}.

Similarly, the denominator works out to

    (h/2)(ln 5)[ln(5/9)] + ⋯.

Thus

    lim_{h→0} E(N | θ) = −(ln 19)² / [(ln 5) ln(5/9)] = (2.9444)² / [(1.6094)(0.5878)] = 9.16.
Alternatively, since E_θ(Z) = 0 when h = 0, we use Wald's second equation and obtain

    E_θ(N) = [(ln A)² P(S_N ≥ ln A) + (ln B)² P(S_N ≤ ln B)] / E_θ(Z²),

where θ = 0.732. We note that α = β = 0.05 implies that A = B⁻¹ = 19, and π(θ) = (19^h − 1)/(19^h − 19^{−h}).
Here

    Z = X ln 9 − ln 5 = (X − θ) ln 9 + (θ ln 9 − ln 5),

so at θ = ln 5/ln 9,

    E_θ(Z²) = θ(1 − θ)(ln 9)² = 0.9460,

and the numerator becomes

    [(1 − B^h)/(A^h − B^h)](ln A)² + [(A^h − 1)/(A^h − B^h)](ln B)² = (ln A)² = 8.6697,

since A = B⁻¹ = 19. Thus

    E_θ(N | h = 0) = (ln 19)²/0.9460 = (2.9444)²/0.9460 = 9.16,

which agrees with the value in Table 2.4.1. Although the SPRT terminates finitely, in any single experiment N could be very large. Hence, one might establish a bound n₀ and terminate sampling at n₀. If no decision is reached by the n₀th stage, sampling is stopped anyway (and H₀ accepted if Λ_{n₀} < 1 and rejected if Λ_{n₀} > 1). The truncation of the procedure would certainly affect the error sizes of the test; this aspect will be studied in Section 2.8, and the effect is slight if n₀ is fairly large.
2.5
Wald's Fundamental Identity
In this section we shall give an identity of Wald (1947) which plays a fundamental role in deriving the moments of the sample size required to terminate the SPRT for testing H₀ : θ = θ₀ against H₁ : θ = θ₁, where X has the probability density function f(x; θ).

Theorem 2.5.1 (Wald, 1947) Let Z = ln[f(X; θ₁)/f(X; θ₀)] and let P(Z = 0) < 1. Then

    E{e^{S_N t} [C(t)]^{−N}} = 1

for every t in D, where

    S_N = Σᵢ₌₁^N Zᵢ,  C(t) = E(e^{tZ}),

and D is the set of points in the complex plane such that C(t) is finite and |C(t)| ≥ 1. Under some mild regularity assumptions, the above identity can be differentiated under the expectation sign any number of times with respect to t at any real value t such that C(t) ≥ 1.
2.5.1
Applications of the Fundamental Identity

Differentiating with respect to t and setting t = 0, we obtain

    E[S_N − N C′(0)] = 0,

that is,

    E(S_N) = E(N)E(Z).  (2.5.1)

Differentiating twice and setting t = 0, we obtain

    E{[S_N − N C′(0)]² − N C″(0) + N [C′(0)]²} = 0.

That is, E{[S_N − N E(Z)]²} = E(N) var(Z). If E(Z) = C′(0) = 0, then

    E(S_N²) = E(N)E(Z²),

and hence

    E(N) = E(S_N²)/E(Z²),  (2.5.2)

which is known as Wald's second equation, where, approximately,

    P(S_N ≥ ln A) ≈ (1 − B^h)/(A^h − B^h)  and  P(S_N ≤ ln B) ≈ (A^h − 1)/(A^h − B^h),

with h the non-zero root of the equation C(t) = 1, i.e., of E(e^{hZ}) = 1. The above formula for the crossing probabilities was derived under the assumption that E(Z) ≠ 0. If E(Z) = 0, then h = 0, as is proven in the following lemma.
Lemma 2.5.1 If E(Z) = 0, then h = 0, provided P(Z = 0) < 1 (that is, Z is not almost surely trivial) and differentiation of E(e^{hZ}) underneath the integral sign is permissible.

Proof. Under the assumption of the lemma, E(e^{hZ}) = 1, i.e., E(e^{hZ} − 1) = 0. This together with E(Z) = 0 implies that

    E(e^{hZ} − 1 − hZ) = 0.

Now, using the mean value theorem, we obtain

    0 = (h²/2) E[Z² e^{γZh}]  for some 0 < γ(Z) < 1.

Since P(Z = 0) < 1, we infer that Z² e^{γZh} > 0 with positive probability; that is, E(Z² e^{γZh}) > 0. Hence h = 0. This completes the proof. ∎

We also have
    lim_{h→0} (1 − B^h)/(A^h − B^h) = −ln B/(ln A − ln B)

and

    lim_{h→0} (A^h − 1)/(A^h − B^h) = ln A/(ln A − ln B).

Thus

    E(S_N²) ≈ [−(ln A)² ln B + (ln B)² ln A]/(ln A − ln B) = −ln A ln B.

Hence

    E(N) = −ln A ln B / E(Z²),  when E(Z) = 0.  (2.5.3)
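Formula (2.5.3) ignores the excess of S_N over the boundaries, so a simulation typically produces a somewhat larger value. The sketch below (names hypothetical) treats the symmetric normal case with α = β = 0.05, where (2.5.3) gives (ln 19)² ≈ 8.67, and illustrates the overshoot effect that Section 2.7 corrects for.

```python
import math, random

lnA = math.log(19.0)            # alpha = beta = 0.05 gives A = 1/B = 19
approx = lnA * lnA              # -ln A ln B / E(Z^2) with E(Z^2) = 1, per (2.5.3)

rng = random.Random(3)
def one_run():
    """Steps Z_i = X_i - 1/2 with X_i ~ N(1/2, 1), so E(Z) = 0, E(Z^2) = 1."""
    s, n = 0.0, 0
    while -lnA < s < lnA:
        s += rng.gauss(0.5, 1.0) - 0.5
        n += 1
    return n

sim = sum(one_run() for _ in range(20000)) / 20000
# sim exceeds approx (about 8.67) because the walk overshoots the boundaries.
```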
Example 2.5.1 Let X be normal(θ, 1). We wish to test H₀ : θ = 0 against H₁ : θ = 1. Also set α = β = 0.05. Then

    A ≈ 19  and  B ≈ 1/19.

Hence ln A = 2.9444 = −ln B. Computations yield Z = X − 0.5. Suppose we are interested in E_{0.5}(N). Since E_{0.5}(Z) = 0, we infer from Lemma 2.5.1 that h = 0. Hence, since E_{0.5}(Z²) = var(X) = 1,

    E_{0.5}(N) = −ln A ln B / E_{0.5}(Z²) = (2.9444)² ≈ 8.67.
Also, in Example 2.4.3,

    Z = X(ln 9) − ln 5 = ln 9 [X − (ln 5)/(ln 9)],

and

    E(Z²) = (ln 9)² θ(1 − θ),  where θ = ln 5/ln 9 = 0.7325,
          = (4.8278)(0.7325)(0.2675) = 0.9460.

Hence

    E(N | h = 0) = (ln 19)²/0.9460 = (2.9444)²/0.9460 = 9.16,

which agrees with the value in Table 2.4.1. Wald (1947, Appendix A4) obtained exact formulas for the OC and ASN functions when Z = ln[f(X; θ₁)/f(X; θ₀)] takes on as values only a finite number of integral multiples of a constant d. Ferguson (1967) was able to obtain exact expressions for the error probabilities and the expected stopping time for a particular density of X
when we are interested in testing H₀ : θ = −1/2 vs. H₁ : θ = 1/2. Consider the exponential density for X given by

    f(x; θ) = θ⁻¹ e^{−x/θ} if x > 0,  and 0 if x ≤ 0,

and suppose we are interested in testing H₀ : θ = θ₀ against H₁ : θ = θ₁, with θ₁ > θ₀. Kemperman (1961, pp. 70–71) obtained upper and lower bounds for the error probabilities, given by

    (1 − εB)/(A − ε²B) ≤ α ≤ (1 − εB)/(A − εB)

and

    (1 − A⁻¹)/(B⁻¹ − εA⁻¹) ≤ β ≤ (1 − εA⁻¹)/(B⁻¹ − ε²A⁻¹),

where ε = θ₀/θ₁ < 1. By considering lower bounds for (1 − β)/α we obtain

    (1 − β)/α ≥ A.

Also, considering upper bounds for β/(1 − α), we have

    β/(1 − α) ≤ [(1 − εA⁻¹)/(B⁻¹ − ε²A⁻¹)] / [1 − (1 − εB)(A − εB)⁻¹]
             = B(A − ε)(A − εB) / [(A − 1)(A − ε²B)]
             ≤ B(A − ε)/(A − 1) = B + (1 − ε)B/(A − 1).

These results suggest that we modify Wald's approximations to the boundary values A and B as follows:

    A = (1 − β)/α,  B = [(A − 1)/(A − ε)] β/(1 − α).  (2.5.4)
2.6
Bounds for the Average Sample Number
When both ln A and ln B are finite, Stein (1946) has shown that E(N) is bounded. However, when one of the boundaries (say ln A) is infinite, we cannot show that E(N) is bounded. When E(N) exists, the formula E(N) = E(S_N)/E(Z) will hold and be meaningful. M. N. Ghosh (1960) has shown that E(N) is bounded if a certain condition is satisfied.

Theorem 2.6.1 Let the random variable Z be such that E(Z) < 0. Then E(N) is bounded if

    ∫_{−∞}^{−x} z dG(z) / ∫_{−∞}^{−x} dG(z) = E(Z | Z < −x) ≥ −x − c  (2.6.1)

for some c and k with x > k > 0, c > 0, where G(z) denotes the distribution function of Z.
Special Case 1: If Z is normal with mean μ and variance σ², then we can take c = (2/3)σ and k = 2σ − μ. Special Case 2: If Z has a standard double exponential distribution, then we can take c = 1 and k = 1.
Next we consider lower bounds for the ASN required by an arbitrary sequential test. Let X₁, X₂, … be a sequence of i.i.d. random variables having the density or probability function f(x; θ), where θ ∈ Ω. Suppose we wish to test H₀ : θ ∈ ω₀ versus H₁ : θ ∈ ω₁, where ω₀ and ω₁ are two disjoint subsets of Ω. Let D denote an arbitrary (possibly randomized) sequential test of H₀ vs. H₁. Let 0 < α, β < 1 be such that

    P_θ(D accepts H₁) ≤ α, if θ ∈ ω₀,  and  P_θ(D accepts H₀) ≤ β, if θ ∈ ω₁.  (2.6.2)

Then Hoeffding (1953, pp. 128–130) obtained lower bounds (2.6.3) and (2.6.4) for E_θ(N). Notice that inequalities of the type (2.6.3) and (2.6.4) were obtained by Wald (1947, Appendix A.7) (see also (2.4.2) and (2.4.3)) when H₀ and H₁ are simple hypotheses.
Special Case: Suppose that f(x; θ) is the normal density with mean θ and variance 1 and ω₁ = {θ ≥ δ}. Then the quantity e₁(θ) appearing in the bound reduces to

    e₁(θ) = (θ − δ)²/2, if θ < δ;  e₁(θ) = 0, if θ ≥ δ.

Further, if α = β, the bounds take a correspondingly symmetric form.
Hoeffding (1960) derived improved lower bounds, and we will provide one of them. Let X₁, X₂, … be a sequence of i.i.d. random variables having the density (or probability function) f (which could be indexed by a parameter θ). Consider sequential tests for which α (β) denotes the error probability when f = f₀ (f₁). Let N denote the stopping time of the sequential test.
Theorem 2.6.2 Let the sequential test terminate with probability one under each of f₀, f₁ and f₂. Also assume that E₂(N) < ∞, where E₂(N) denotes the expected stopping time when f = f₂. Further, let α + β < 1. Then E₂(N) satisfies the lower bound (2.6.5), where

    g = max(g₀, g₁),  gᵢ = ∫ f₂(x) ln[f₂(x)/fᵢ(x)] dx,  i = 0, 1,  (2.6.6)

and the remaining constant in (2.6.5) is defined by (2.6.7).
Special Case: Let f₀, f₁ and f₂ be normal densities with variance 1 and respective means −ξ, ξ, 0 (ξ > 0). Then (2.6.5) takes the form of (2.6.8) when α = β. Note that when α is small, an approximation to (2.6.8) can be obtained by first squaring and then using the inequality

    (−2 ln α)^{1/2} − (1 − 2 ln 2)^{1/2} ≤ [1 − 2 ln(2α)]^{1/2} ≤ (−2 ln α)^{1/2} + (1 − 2 ln 2)^{1/2}.

Next, consider the SPRT which stops as soon as 2ξ|Σᵢ₌₁ⁿ Xᵢ| ≥ ln A (> 0), since Zᵢ = 2ξXᵢ, where A = (1 − α)/α. Hence

    E₂(N) = (ln A)²/(4ξ²) = [ln((1 − α)/α)]²/(4ξ²).  (2.6.10)
    Table 2.6.1 Values of E₂(N) and of the Lower Bound in (2.6.8) for ξ = 0.1

    α                     .01     .05     0.1     0.2    0.3
    Fixed-sample size     541.2   270.6   164.3   70.8   27.5
    SPRT                  527.9   216.7   120.7   48.0   17.9
    Lower Bound (2.6.8)   388.3   187.0   111.1   46.6   17.8

When α is close to its upper bound 1/2 and ξ is small, the lower bound in (2.6.8) is nearly achieved by the SPRT.
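The first two rows of Table 2.6.1 can be reproduced from closed forms (a sketch: the fixed-sample size is the usual normal-theory formula (z_{1−α} + z_{1−β})²/(2ξ)², and the SPRT row is the ASN approximation (2.6.10)); a 0.1 difference in one entry is just rounding in the printed table.

```python
import math
from statistics import NormalDist

def fixed_n(alpha, xi=0.1):
    """Fixed-sample size for N(-xi, 1) vs N(xi, 1) with alpha = beta."""
    z = NormalDist().inv_cdf(1 - alpha)
    return (2 * z / (2 * xi)) ** 2

def sprt_asn(alpha, xi=0.1):
    """E2(N) = (ln A)^2 / (4 xi^2) with A = (1 - alpha)/alpha, as in (2.6.10)."""
    return math.log((1 - alpha) / alpha) ** 2 / (4 * xi ** 2)

alphas = (0.01, 0.05, 0.1, 0.2, 0.3)
row_fixed = [round(fixed_n(a), 1) for a in alphas]   # 541.2, 270.6, 164.2, 70.8, 27.5
row_sprt = [round(sprt_asn(a), 1) for a in alphas]   # 527.9, 216.7, 120.7, 48.0, 17.9
```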
2.7
Improvements to OC and ASN Functions

Page (1954) and Kemp (1958) have improved Wald's approximations for the OC and ASN functions. In the following we will give Kemp's (1958) results, which are better than Page's (1954).

2.7.1 The OC Function

Wald's approximate formula (see Section 2.3) for the operating characteristic of a SPRT is equivalent to the probability that a particle performing a linear random walk between two absorbing barriers is absorbed by the lower barrier. This formula is valid if the starting point of the test is not near either boundary and if the mean path is inclined to the boundaries at not more than a small angle, so that the overshoot beyond the boundary at the end of the test is negligible. Page (1954) derived expressions for the OC function and the ASN of a SPRT that are closer to the true values. Kemp (1958) obtains even better approximations by using the same method as Page (1954) but different assumptions, and we will present Kemp's results. Suppose that a Wald SPRT is to be carried out on a population for which the scores Zᵢ assigned to the observations are independent, having a continuous density function g(z). Note that in our case

    Zᵢ = ln[f(Xᵢ; θ₁)/f(Xᵢ; θ₀)],  i = 1, 2, …,

and we assume that we take at least one observation (that is, n ≥ 1). Consider a sequential testing procedure with horizontal decision lines a distance w apart. Take the lower line as the line of reference. Also let P(z) be the
probability that a sequential test starting at a point z above the line of reference will end on or below the lower boundary. Then Kemp (1958) shows that P(z) satisfies

    P(z) = ∫_{−∞}^{−z} g(x) dx + ∫_{−z}^{w−z} P(z + x) g(x) dx.  (2.7.1)

If P(x) ≈ 1 when x ≤ 0 and P(x) ≈ 0 when x ≥ w, equation (2.7.1) can be approximately written as

    P(z) = ∫_{−∞}^{∞} P(x) g(x − z) dx.  (2.7.2)

Then P(x) satisfying (2.7.2) is of the form

    P(x) = C + D e^{xh},  (2.7.3)

where h is the solution of the equation

    ∫_{−∞}^{∞} e^{zh} g(z) dz = 1.  (2.7.4)

Also, C and D can be solved for in terms of P(0) and P(w):

    C = [P(w) − P(0)e^{wh}]/(1 − e^{wh})  and  D = [P(0) − P(w)]/(1 − e^{wh}).

Now, substituting (2.7.3) into (2.7.2) and carrying out the integration, we obtain simultaneous equations to solve for P(0) and P(w) by setting x = 0 and x = w.
Special Case: If Zᵢ is normally distributed with mean θ and variance one and w ≥ 3, then h = −2θ. Kemp (1958) obtains expressions (2.7.5) and (2.7.6) for P(0) and P(w) in terms of constants K₁ and K₂, where

    K₂ = [1 − Φ(w + θ) − Φ(w − θ) + 2Φ(θ)][1 − e^{−2wθ}]⁻¹.  (2.7.7)
Also note that, by symmetry,

    P(z | θ) = 1 − P(w − z | −θ).  (2.7.8)

When θ = 0, the limiting form of the operating characteristic is

    P(z) = P(0) + [1 − 2P(0)] z/w.
    Table 2.7.1 Comparison of the values P(z) when w = 10

    θ         z = 2.5                     z = 5.0
              Wald    Kemp     True       Wald    Kemp    True
    −1.00     1.00    1.00     1.00       1.00    1.00    1.00
    −0.50     0.99    0.9998   0.9997     0.993   0.997   0.996
    0.0       0.75    0.716    0.721      0.500   0.500   0.500
    0.125     0.494   0.406    0.428      0.223   0.190   0.199
    0.250     0.282   0.190    0.211      0.076   0.052   0.058
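The "Wald" columns of Table 2.7.1 come from the two-barrier absorption formula with overshoot ignored; a sketch:

```python
import math

def wald_P(z, theta, w=10.0):
    """Wald's approximation to P(z): probability of absorption at the lower
    barrier for a walk with N(theta, 1) increments started at height z
    between barriers 0 and w, overshoot ignored (h = -2*theta)."""
    if theta == 0:
        return (w - z) / w
    h = -2.0 * theta
    return (math.exp(h * w) - math.exp(h * z)) / (math.exp(h * w) - 1.0)

col = [round(wald_P(2.5, t), 3) for t in (-1.0, -0.5, 0.0, 0.125, 0.25)]
# col == [1.0, 0.999, 0.75, 0.494, 0.282], the z = 2.5 Wald column
# (0.999 appears as 0.99 in the table's two-decimal rounding).
```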
2.7.2 The Average Sample Number

If the sequential procedure starts at a point a distance z from the line of reference, then the expected sample number n(z) satisfies the equation

    n(z) = 1 + ∫₀^w n(x) g(x − z) dx.  (2.7.9)

If the probabilities in (−∞, −z) and (w − z, ∞) are negligible, then one can approximately write (2.7.9) as

    n(z) = 1 + ∫_{−∞}^{∞} n(x) g(x − z) dx,  (2.7.10)

which is satisfied by the solution

    n(z) = C* + D* e^{zh} − z/E(Z),  (2.7.11)

where h is defined by (2.7.4).
Special Case: If Zᵢ is normal with mean θ and unit variance and w ≥ 3, then h = −2θ and

    n(z) = n(0) − z/θ + [n(w) − n(0) + w/θ] (1 − e^{−2θz})/(1 − e^{−2θw}).  (2.7.12)
Substituting (2.7.11) into (2.7.10), integrating, and setting z = 0 and z = w, one can obtain simultaneous equations for n(0) and n(w); in particular,

    K₁ n(0) + K₂ n(w) = 1 − Φ(w − θ) + Φ(−θ) − K₂ w/θ,  (2.7.14)

where K₁ and K₂ are given by (2.7.5) and (2.7.6). Also note that it is necessary to calculate n(0) and n(w) only for positive (or only for negative) θ, since n(0 | θ) = n(w | −θ). For θ = 0, the limiting forms of n(z) and n(0) are obtained by letting θ → 0 in (2.7.12) and (2.7.14).
Table 2.7.2 Comparison of the values of n(z) when ω = 10

θ        −1.00   −0.50   0.0     0.5     1.0
z = 2.5
Wald     2.5     5.0     18.8    12.3    8.8
Kemp     3.8     7.0     27.7    16.2    9.3
True     3.4     6.4     25.2    15.4    8.4
z = 5.0
Wald     5.0     9.9     25.0    9.9     5.0
Kemp     6.3     12.0    34.0    12.0    6.3
True     5.9     11.4    31.4    11.4    5.9
Note that the true values in Tables 2.7.1 and 2.7.2 are obtained by solving the exact equations for P(z) and n(z). Tallis and Vagholkar (1965) also obtain improvements to the OC and ASN approximations which are comparable to those of Kemp (1958). However, they are too complicated to be presented here.
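The Wald entries of Table 2.7.2 follow in the same way from Wald's identity: n(z) = [ω(1 − P(z)) − z]/θ for θ ≠ 0, with the driftless limit n(z) = z(ω − z). A sketch under the same no-overshoot assumption (function names are ours; the θ ≤ 0 and θ = 0 entries match this computation most closely):

```python
import math

def wald_oc(z, theta, omega=10.0):
    # probability of absorption at 0 before omega (no-overshoot approximation)
    if theta == 0.0:
        return (omega - z) / omega
    num = math.exp(-2 * theta * z) - math.exp(-2 * theta * omega)
    den = 1.0 - math.exp(-2 * theta * omega)
    return num / den

def wald_asn(z, theta, omega=10.0):
    # expected sample number via Wald's identity E(S_N) - z = theta * n(z)
    if theta == 0.0:
        return z * (omega - z)              # driftless limiting form
    p_lower = wald_oc(z, theta, omega)
    return (omega * (1.0 - p_lower) - z) / theta

print(round(wald_asn(2.5, 0.0), 2))   # 18.75  (printed as 18.8 in Table 2.7.2)
print(round(wald_asn(5.0, 0.0), 2))   # 25.0
print(round(wald_asn(2.5, -0.5), 1))  # 5.0
```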
2.8 Truncated SPRT
Although SPRT's enjoy the property of terminating finitely with probability one, often, due to limitations of cost or of the available number of experimental units, we may set a definite upper limit, say n0, on the number of observations to be taken. This may be achieved by truncating the SPRT at n = n0. Thus, Wald's (1947) truncated SPRT is formulated as follows. If the sampling procedure has progressed to the nth stage (n ≤ n0),

reject H0 if ∑_{i=1}^{n} Z_i ≥ ln A,
accept H0 if ∑_{i=1}^{n} Z_i ≤ ln B,

and take one more observation if ln B < ∑_{i=1}^{n} Z_i < ln A. If the SPRT does not lead to a terminal decision for n < n0,

reject H0 if 0 < ∑_{i=1}^{n0} Z_i < ln A, and
accept H0 if ln B < ∑_{i=1}^{n0} Z_i ≤ 0.

By truncating the sequential process at the n0-th stage, we change the probabilities of type I and type II errors. The following theorem provides upper bounds for the modified probabilities of errors.
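The truncated test above is mechanical to implement: apply the (ln B, ln A) boundaries up to n0, and at n0 decide by the sign of the accumulated log-likelihood ratio. A minimal sketch for i.i.d. increments (the helper name and the illustrative data are ours):

```python
import math

def truncated_sprt(z_stream, alpha, beta, n0):
    """Wald's SPRT on log-likelihood-ratio increments z_stream, truncated
    at n0: at n0, reject H0 if the sum is positive, otherwise accept."""
    ln_a = math.log((1.0 - beta) / alpha)   # upper boundary
    ln_b = math.log(beta / (1.0 - alpha))   # lower boundary
    s = 0.0
    for n, z in enumerate(z_stream, start=1):
        s += z
        if s >= ln_a:
            return "reject H0", n
        if s <= ln_b:
            return "accept H0", n
        if n == n0:                          # truncation rule
            return ("reject H0", n) if s > 0 else ("accept H0", n)
    raise ValueError("stream exhausted before a decision")

# Example: N(theta, 1), H0: theta = -1/2 vs H1: theta = 1/2, so Z_i = X_i.
xs = [0.9, 0.4, 1.1, 0.2, 0.8, 1.0]          # hypothetical observations
print(truncated_sprt(xs, alpha=0.05, beta=0.05, n0=5))  # ('reject H0', 5)
```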
Theorem 2.8.1 Let α and β be the nominal probabilities of errors of the first and second kinds for the SPRT. Let α(n0) and β(n0) respectively denote the modified α and β for the SPRT truncated at n0. Then

α(n0) ≤ α + ∫_0^{ln A} [1 − α*(Be^{−y}, Ae^{−y})] dG0(y)    (2.8.1)

and

β(n0) ≤ β + ∫_{ln B}^{0} [1 − β*(Be^{−y}, Ae^{−y})] dG1(y),    (2.8.2)

where α*(Be^{−y}, Ae^{−y}) [β*(Be^{−y}, Ae^{−y})] denotes the type I [type II] error probability of the SPRT having stopping bounds (Be^{−y}, Ae^{−y}), G_j(y) denotes the distribution of ∑_{i=1}^{n0} Z_i over sample paths for which the SPRT reaches no decision before n0, and P_j denotes the probability computed when H_j is true (j = 0, 1). When n0 is sufficiently large, we have

α(n0) ≤ α + [(A − 1)/(A − B)] [Φ((ln A − n0 μ0)/(σ0 √n0)) − Φ(−n0 μ0/(σ0 √n0))]    (2.8.3)

and

β(n0) ≤ β + [A(1 − B)/(A − B)] [Φ(−n0 μ1/(σ1 √n0)) − Φ((ln B − n0 μ1)/(σ1 √n0))],    (2.8.4)

where μ_j = E_{H_j}(Z), σ_j² = var(Z | H_j), j = 0, 1.
Proof. Let p0(n0) denote the probability under H0 of obtaining a sample such that the SPRT does not lead to a terminal decision for n ≤ n0 and the truncated process leads to rejection of H0, while sampling beyond n0 would lead to acceptance of H0. Let C1, C2, C3 respectively denote the sets of sample points whose probability contents, when f0 is true, are α(n0), p0(n0) and α. Also let C4 denote the set of outcomes for which one continues sampling forever if one does not make a decision at n0. Then C1 ⊂ C2 ∪ C3 ∪ C4, because any sample point belonging to C1 is also in C2 ∪ C3 ∪ C4, while a sample point belonging to C3 (and hence to C2 ∪ C3) for which

ln B < ∑_{i=1}^{n0} Z_i < 0

does not belong to C1. Hence the inclusion, and consequently (since P_i(C4) = 0, i = 0, 1)

α(n0) ≤ α + p0(n0).    (2.8.5)
Next we derive an upper bound for p0(n0), which is the probability under H0 that the sequence of observations Z1, Z2, ... satisfies the following three conditions:

(i) ln B < ∑_{i=1}^{n} Z_i < ln A for n = 1, 2, ..., n0 − 1,
(ii) 0 < ∑_{i=1}^{n0} Z_i < ln A,
(iii) when the sequential process is continued beyond n0, it terminates with the acceptance of H0.

Condition (iii) requires that

∑_{i=1}^{n} Z_i < ln A for all n > n0 and ∑_{i=1}^{n} Z_i ≤ ln B for some n > n0.

Thus, since the Z's are i.i.d. random variables,

p0(n0) = ∫_0^{ln A} [1 − α*(Be^{−y}, Ae^{−y})] dG0(y),

where α*(Be^{−y}, Ae^{−y}) denotes the type I error probability of the SPRT having stopping bounds (Be^{−y}, Ae^{−y}). Using Corollary 2.2.1, we have 1 − α* = (A − e^{y})/(A − B). Thus

p0(n0) ≤ [(A − 1)/(A − B)] G0(ln A),    (2.8.7)

since e^{y} ≥ 1. Analogously, one can obtain an upper bound for p1(n0), the probability under H1 that the observations satisfy (i),

(iv) ln B < ∑_{i=1}^{n0} Z_i < 0, and
(v) when the sampling process is continued beyond n0, it terminates with the rejection of H0.

Condition (v) requires that

ln B < ∑_{i=1}^{n} Z_i for all n > n0 and ∑_{i=1}^{n} Z_i ≥ ln A for some n > n0.    (2.8.8)

Since the Z's are i.i.d. random variables,

p1(n0) = ∫_{ln B}^{0} [1 − β*(Be^{−y}, Ae^{−y})] dG1(y),

where β*(Be^{−y}, Ae^{−y}) denotes the type II error probability of the SPRT having stopping bounds (Be^{−y}, Ae^{−y}). Again from approximations (2.2.1) and (2.2.2) we obtain

1 − β*(Be^{−y}, Ae^{−y}) ≈ A(A − B)^{−1}(1 − Be^{−y}) with y < 0.

Thus

p1(n0) ≤ [A(1 − B)/(A − B)] [G1(0) − G1(ln B)].    (2.8.9)

Next, let us consider the case where n0 is sufficiently large. By the central limit theorem, for any constant c,

P_j(∑_{i=1}^{n0} Z_i < c) ≈ Φ((c − n0 μ_j)/(σ_j √n0)),

where μ_j = E_{H_j}(Z), σ_j² = var(Z | H_j), j = 0, 1; combining this with (2.8.5)-(2.8.9) yields (2.8.3) and (2.8.4).
Remark 2.8.1 Wald's (1947, p. 64) upper bounds for p0(n0) and p1(n0) are given by P0[(ii)] and P1[(iv)] respectively, using normal approximations.

Example 2.8.1 Let f_j(x) = φ(x − θ_j), j = 0, 1, with θ1 > θ0. Then Z = δ[X − (θ0 + θ1)/2], δ = θ1 − θ0, and hence μ0 = −δ²/2, μ1 = δ²/2 and σ² = δ². Hence, from (2.8.3) and (2.8.4) we obtain bounds valid for all n0.

Special Case: α = β = .05 and n0 = 25, θ0 = −1/2 and θ1 = 1/2. Then

p0(n0) ≤ 0.95 [Φ(3.08) − Φ(2.5)] ≈ 0.005,

and also p1(n0) ≤ 0.005.
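The arithmetic of this special case is easy to verify: with n0 = 25, μ0 = −1/2, σ0 = 1, A = (1 − β)/α = 19 and B = β/(1 − α) = 1/19, the bound (2.8.3) evaluates to about 0.005. A quick check using only the standard library (variable names are ours):

```python
import math

def phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

alpha = beta = 0.05
n0, mu0, sigma0 = 25, -0.5, 1.0
A = (1 - beta) / alpha          # 19
B = beta / (1 - alpha)          # 1/19
factor = (A - 1) / (A - B)      # exactly 0.95 here
upper = (math.log(A) - n0 * mu0) / (sigma0 * math.sqrt(n0))  # about 3.09
lower = (-n0 * mu0) / (sigma0 * math.sqrt(n0))               # 2.5
bound = factor * (phi(upper) - phi(lower))
print(round(bound, 3))          # 0.005
```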
Example 2.8.2 Let X1, X2, ..., be an i.i.d. sequence of random variables having the probability density function f_j(x) = θ_j^{−1} exp(−x/θ_j), x, θ_j > 0, j = 0, 1. Then Z = δX − ln(θ1/θ0), where δ = (θ1 − θ0)/(θ0 θ1) and, without loss of generality, we assume that θ1 > θ0. Hence μ_j = E_{H_j}(Z) = δθ_j − ln(θ1/θ0) and σ_j² = var(Z | H_j) = δ²θ_j². Now, note that 2∑_{i=1}^{n0} X_i/θ_j is, under H_j, distributed as chi-square with 2n0 degrees of freedom, so straightforward computations yield an exact chi-square expression (2.8.10) for P_j(∑_{i=1}^{n0} Z_i < c), for any constant c. Using (2.8.10) with c = ln A and j = 0 in (2.8.1), one obtains an upper bound for α(n0) for the exponential distribution. Also, using (2.8.10) with c = ln B and j = 1 in (2.8.2), we get an upper bound for β(n0) for the exponential case. If n0 is large, we obtain the upper bounds by substituting the relevant quantities for μ_j and σ_j in (2.8.3) and (2.8.4).

Aroian (1968) proposes a direct method for evaluating the OC and ASN functions of any truncated sequential test procedure once the acceptance, rejection and continuation regions are specified for each stage. His method involves repeated convolution and numerical integration. For instance,

OC(θ) = ∑_{i=1}^{n0} p_{i0}(θ),

where n0 is the truncation point and p_{i0}(θ) [p_{i1}(θ)] denotes the probability of accepting (rejecting) H0 at the ith stage (i = 1, 2, ..., n0). His method is amenable to testing simple hypotheses about parameters in exponential families (since then the SPRT reduces to a random walk), and it is especially promising when the underlying distributions are discrete. Aroian (1968) illustrates his method by evaluating the OC and ASN functions of Wald's SPRT for the normal mean with known variance with n0 = 7 and 14. For the binomial case, by choosing an arbitrary continuation region, he obtains exact expressions for the OC and ASN functions. So far we have some idea about the performance of Wald's SPRT. Now we would like to ask whether the SPRT has any optimal properties.
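For the exponential example, the probabilities needed in (2.8.10) are chi-square probabilities with the even number 2n0 of degrees of freedom, for which the CDF has the closed Erlang form P(χ²_{2m} ≤ x) = 1 − e^{−x/2} ∑_{k=0}^{m−1} (x/2)^k / k!. A sketch (the function name is ours):

```python
import math

def chi2_cdf_even(x, two_m):
    """CDF of a chi-square variable with an even number two_m = 2m of
    degrees of freedom, via the Erlang closed form."""
    if two_m % 2 != 0:
        raise ValueError("degrees of freedom must be even")
    m = two_m // 2
    term_sum = sum((x / 2.0) ** k / math.factorial(k) for k in range(m))
    return 1.0 - math.exp(-x / 2.0) * term_sum

# P_j(sum of n0 exponential(theta_j) observations <= t)
# equals chi2_cdf_even(2 * t / theta_j, 2 * n0).
print(round(chi2_cdf_even(2.0, 2), 4))   # 1 - e^{-1} ~ 0.6321
```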
2.9 Optimal Properties of the SPRT

The sequential probability ratio test (SPRT) for testing a simple hypothesis against a simple alternative was first proved to be optimal in a certain sense by Wald and Wolfowitz (1948) (see Wolfowitz, 1966, for additional details). Another proof has been given by LeCam, which appears in Lehmann (1959). Matthes (1963) has given a proof which relies on a mapping theorem. Let X1, X2, ... be an i.i.d. sequence of random variables having the density function f(x; θ). We wish to test H0 : θ = θ0 versus H1 : θ = θ1. Let N denote the stopping time of Wald's SPRT for testing H0 against H1. Then we have
Theorem 2.9.1 (Wald and Wolfowitz, 1948). Among all tests (fixed-sample or sequential) for which P(reject H0 | θ0) ≤ α, P(accept H0 | θ1) ≤ β and for which E(N | θ_i) < ∞, i = 0, 1, the SPRT with error probabilities α and β minimizes both E(N | θ0) and E(N | θ1).

Proof. The main part of the proof of Theorem 2.9.1 consists of finding the solution to the following auxiliary problem. Let w_i denote the loss resulting from a wrong decision under H_i (i = 0, 1), and let c denote the cost of each observation. Then the risk (expected loss) of a sequential procedure is

αw0 + cE(N | θ0)

when H0 is true, and

βw1 + cE(N | θ1)

when H1 is true, where α, β are the error probabilities. If the state of nature θ is a random variable such that P(θ = θ0) = π and P(θ = θ1) = 1 − π, then the total average risk of a procedure δ is

r(π, w0, w1, δ) = π[αw0 + cE(N | θ0)] + (1 − π)[βw1 + cE(N | θ1)].
The proof of Theorem 2.9.1 consists of determining the Bayes procedure for this problem, that is, the procedure which minimizes r(π, w0, w1, δ), and showing that Wald's SPRT is Bayes in the following sense: given any SPRT and any π with 0 < π < 1, there exist positive constants c and w such that Wald's SPRT is Bayes relative to π, c, w0 = 1 − w, w1 = w. (It is important to note that π can be chosen arbitrarily.) From the Bayes character of Wald's SPRT, one can show its optimum property as follows. Let δ* be any other competitive procedure having error probabilities α* ≤ α, β* ≤ β, and expectations of sample size E(N* | θ_i) < ∞ (i = 0, 1). Since Wald's SPRT minimizes the Bayes risk, it satisfies

π[αw0 + cE(N | θ0)] + (1 − π)[βw1 + cE(N | θ1)] ≤ π[α*w0 + cE*(N | θ0)] + (1 − π)[β*w1 + cE*(N | θ1)],

hence, since α* ≤ α and β* ≤ β,

πE(N | θ0) + (1 − π)E(N | θ1) ≤ πE*(N | θ0) + (1 − π)E*(N | θ1).

Since the inequality is valid for all 0 < π < 1, it implies

E(N | θ0) ≤ E*(N | θ0) and E(N | θ1) ≤ E*(N | θ1),

which establishes the optimum property of Wald's SPRT. Next we consider the monotonicity property (Property M).
Definition 2.9.1 An SPRT is said to have Property M if at least one of the error probabilities decreases when the upper stopping bound of the SPRT is increased and the lower stopping bound is decreased, unless the new test and the old test are equivalent, in which case the error probabilities are unchanged. (Two tests are said to be equivalent if their sample paths differ on a set of probability zero under both hypotheses.) We have the following result regarding the uniqueness of an SPRT.

Theorem 2.9.2 There is at most one sequential probability ratio test for testing H0 : f(x) = f0(x) vs. H1 : f = f1 that achieves a given α and β, provided one of the following conditions holds:

(i) f1(X)/f0(X) has a continuous distribution with positive probability on every interval in (0, ∞);
(ii) the SPRT has stopping bounds which satisfy 0 < B < 1 < A;
(iii) the SPRT has monotonicity Property M.

For (i) see Weiss (1956) and for (ii) see Anderson and Friedman (1960). Wijsman (1960) has shown that the SPRT has Property M.
Definition 2.9.2 A density or probability function f(x; θ), where θ is real, is said to have monotone likelihood ratio (MLR) in t(x) if the distributions indexed by different θ's are distinct and if f(x; θ)/f(x; θ′) is a nondecreasing function of t(x) for θ > θ′.

The following result (see Lehmann, 1959, p. 101) pertains to the monotonicity of the power function (or the OC function).

Theorem 2.9.3 Let X1, X2, ... be i.i.d. random variables with the density f(x; θ) which has MLR in T(x). Then the SPRT for testing H0 : θ = θ0 vs. H1 : θ = θ1 (θ0 < θ1) has a non-decreasing power function.

Corollary 2.9.3.1 If f(x; θ) has MLR in T(x), then the SPRT is unbiased [that is, OC(θ0) > OC(θ1)].
2.10 The Restricted SPRT

Although the SPRT has the optimum property, in general its expected sample size is relatively large when the parameter lies between the two values specified by the null and alternative hypotheses (that is, a large number of observations is expected precisely in cases where it does not make much difference which decision is taken). One can then ask whether there are other sequential procedures which will reduce the expected number of observations for parameter values in the middle of the range without increasing it much at the hypothesized values of the parameter. Another difficulty with the SPRT is that in most cases its number of observations (which is random) is unbounded and has a positive probability of being greater than any given constant. Since it is not feasible to take an arbitrarily large number of observations, often the SPRT is truncated. A truncated SPRT with the same error probabilities may have a considerably increased expected sample size at the hypothesized values of the parameter. As an alternative to the SPRT, Armitage (1957) has proposed certain restricted SPRT's, leading to closed boundaries, for testing hypotheses regarding the mean of a normal population. He converted the boundaries to the Wiener process; however, he only approximated the probabilities and expected time of the procedure based on the Wiener process. Donnelly (1957) has proposed straight-line boundaries that meet, converted them to the Wiener process, and obtained certain results. Anderson (1960) has also considered a modified SPRT for testing hypotheses about the mean of a normal population with known variance and has derived approximations to the operating characteristic (or power) function and the average sample number. We now present Anderson's procedure, which is similar to Armitage's and Donnelly's procedures. Anderson's (1960) method consists of replacing the straight-line boundaries by boundaries that are linear in the sample size.
The method is easily applicable to the exponential family of distributions, because the SPRT can be based on a sum of i.i.d. random variables, each of which has a distribution belonging to the exponential family. Assume that observations are drawn sequentially from a normal distribution with mean θ and known variance σ². We wish to test H0 : θ = θ0 vs. H1 : θ = θ1 (θ1 > θ0) with a procedure which either minimizes E_θ(N) at θ = (θ0 + θ1)/2 or (alternatively) minimizes the maximum of E_θ(N). Replacing the observation X by the transformed observation [X − (θ0 + θ1)/2]/σ and calling θ* = (θ1 − θ0)/2σ, the hypotheses become H0 : θ = −θ* and H1 : θ = θ* (θ* > 0) when sampling from the normal population having mean θ and variance 1.
Restricted SPRT Procedure: Let c1 > 0 > c2. Take (transformed) observations X1, X2, ... sequentially. At the nth stage:

reject H0 if ∑_{i=1}^{n} X_i ≥ c1 + d1 n,
accept H0 if ∑_{i=1}^{n} X_i ≤ c2 + d2 n,

and take one more observation if neither of the above occurs. If n0 observations are drawn, stop sampling, and

reject H0 if ∑_{i=1}^{n0} X_i ≥ k,
accept H0 if ∑_{i=1}^{n0} X_i < k.

To avoid intersection of the lines before the truncation point, one requires

c2 + d2(n0 − 1) < c1 + d1(n0 − 1).

Also, since we wish the lines to converge, we require d1 < 0 < d2. Because of the symmetry of the hypotheses about θ = 0, consider the case when the error probabilities are equal. Since the problem is symmetric, it is then reasonable to consider only symmetric procedures, that is, procedures with c1 = −c2 = c, −d1 = d2 = d and k = 0. To calculate the probabilities and expected values that are of interest is complicated. However, one can calculate such quantities if ∑_{i=1}^{n} X_i is replaced by an analogous X(t) (0 ≤ t < ∞), the Wiener process with E[X(t)] = θt and var[X(t)] = t. Anderson (1960) has derived expressions for the probability of rejecting H0 as a function of θ, and for the expected length of time. That is, he proposed to obtain a specified significance level at −θ* and a specified power at θ* with some minimization of the expected time. The OC and expected time so obtained are approximations to the OC and expected sample number when observations are taken discretely. One might hope that the approximations are as good as the corresponding quantities for the SPRT, which is the special case with d1 = d2 = 0 and n0 = ∞, T = ∞, where T is the truncation point of the Wiener process considered here.
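The symmetric restricted procedure (c1 = −c2 = c, −d1 = d2 = d, k = 0) is easy to simulate for the transformed normal observations. A Monte Carlo sketch, with seed and parameter values of our choosing that roughly mimic the 5% design of Table 2.10.1 (boundaries converging to a point at truncation):

```python
import random

def restricted_sprt(theta, c, d, n0, rng):
    """Symmetric restricted SPRT on the cumulative sum of N(theta, 1)
    observations: reject H0 when S_n >= c - d*n, accept H0 when
    S_n <= -c + d*n, deciding by the sign of S_{n0} at truncation (k = 0)."""
    s = 0.0
    for n in range(1, n0 + 1):
        s += rng.gauss(theta, 1.0)
        if s >= c - d * n:
            return "reject", n
        if s <= -c + d * n:
            return "accept", n
    return ("reject", n0) if s > 0 else ("accept", n0)

rng = random.Random(1)
c, n0 = 19.9, 600           # illustrative: lines meet at the truncation time
d = c / n0
reps = 400
power = sum(restricted_sprt(0.1, c, d, n0, rng)[0] == "reject"
            for _ in range(reps)) / reps
print(power)                # estimated power at theta = theta* = 0.1
```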
Anderson (1960) obtains the probabilities and expected times as infinite series of terms involving Mill's ratio. Subject to the condition that the error probabilities are the same, the constants c and d are varied so as to obtain the smallest expected observation time at θ = 0. The line x = c − dt has intercept c at t = 0 and c − dT at t = T. When c − dT = 0, the two lines converge to a point. For each of several values of the ratio of these two intercepts [(c − dT)/c = 0, 0.1, 0.2], Table 2.10.1 gives the c and T (and hence d) that approximately minimize the expected observation time at θ = 0.
Table 2.10.1 α = β = .05 (.01), θ = −.1 and .1

Condition      c             T               Expected time    Expected time
                                             at θ = 0         at θ = −.1, .1
Fixed size     -             270.6 (541.2)   270.6 (541.2)    270.6 (541.2)
SPRT           14.7 (23.0)   ∞ (∞)           216.7 (528.0)    132.5 (225.2)
c − dT = 0     19.9 (35.5)   600.3 (870.3)   192.2 (402.2)    139.2 (249.4)
c − dT = .1c   20.1 (35.5)   529.0 (783.2)   192.2 (402.2)    139.3 (249.4)
c − dT = .2c   20.3 (35.5)   457.1 (700.0)   192.2 (402.8)    139.8 (249.8)
In Table 2.10.1, the values inside parentheses correspond to error probabilities α = β = .01. These computations suggest that the convergent-line procedures show a considerable improvement over the SPRT at θ = 0 with a moderate decrease in efficiency at θ = −.1 and θ = .1. When the error probabilities are each 0.05, the expected time at θ = 0 is 24.5 less than for the SPRT and is 6.7 more at θ = ±0.1 (a ratio of 3.7 to 1); at the 1% levels it is 125.7 less at θ = 0 and 24.2 more at θ = ±0.1 (a ratio of 5.2 to 1). When we operate at the 5% levels we are better off with the modified SPRT if intermediate values of θ occur at least 1/4 of the time, and when we operate at the 1% level if intermediate values occur at least 1/6 of the time. The difference in the expected times at θ = 0 when the ratios of the intercepts are 0 and 0.1 is not significant, because in the latter case the probability of reaching a decision at t = T is almost zero. Bartlett (1946) has obtained the probability of absorption before time n0 (which is approximately equal to the probability of crossing the upper boundary with not more than n0 transformed observations):
P0(θ, n0) = 1 − Φ[(c − (θ + d)n0)/√n0] + exp[2c(θ + d)] Φ[(−c − (θ + d)n0)/√n0],    (2.10.1)
with P0(−θ*, n0) = α, where Φ denotes the standard normal distribution function and P0 denotes the probability of crossing the upper boundary. Armitage (1957) suggested using

c = λ^{−1} ln[(1 − α)/α],   d = λ/2,    (2.10.2)

where λ is the solution of an equation chosen so that the design attains the prescribed error probability, P0(−θ*, n0) = α.
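Bartlett's expression (2.10.1) is a first-passage probability: crossing the line c − dt by time n0 is the first passage of a Brownian motion with drift θ + d to the level c. A sketch in that form (our notation); at θ = d = 0 it reduces to the reflection-principle value 2Φ(−c/√n0):

```python
import math

def phi(x):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def crossing_prob(theta, c, d, n0):
    """P(Brownian motion with drift theta crosses the line c - d*t by t = n0),
    computed as first passage of a drift-(theta + d) Brownian motion to c."""
    mu = theta + d
    rt = math.sqrt(n0)
    return (1.0 - phi((c - mu * n0) / rt)
            + math.exp(2.0 * mu * c) * phi((-c - mu * n0) / rt))

# Sanity check: d = 0, theta = 0 gives the reflection-principle probability.
print(round(crossing_prob(0.0, 2.0, 0.0, 4.0), 4))  # 2*Phi(-1) ~ 0.3173
```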
Let p(θ, d, c) denote the probability of accepting H0 (for the transformed observations). One may think of p(θ, d, c) for θ ≥ 0 as a probability of an incorrect decision. Analogously, let q(θ, d, c) denote the probability that the continuous Brownian motion X(t) on [0, ∞) exits I(t) : (−c + dt, c − dt) through the lower boundary. The quantity q(θ, d, c) is an approximation to p(θ, d, c); Anderson (1960, p. 170) remarks that it is actually an upper bound for p(θ, d, c) when θ ≥ 0. Fabian (1974) has derived a simple and explicit expression for q(θ, d, c) when θ and d are such that θ/d is an even integer. Interpolation methods can be used for other values of θ/d. Also, one can choose d in some optimal sense. In general, when testing that the normal mean is θ = −θ* vs. θ = θ* (θ* > 0) at a prescribed level, the asymptotically optimal value of d (that d which makes the expected sample size at θ = 0 minimal) is d = θ*/2 (which is easy to see by using the strong law of large numbers). We now present the results of Fabian (1974) and Lawing and David (1966).
Theorem 2.10.1 gives explicit expressions (2.10.3)-(2.10.5) for q(θ, d, c) when θ/d is an even integer; in these expressions δ_{ij} is Kronecker's delta.

Proof. See Theorem 2.2 of Fabian (1974).

Often, with a preassigned γ and given θ, we wish to determine d and c so that

q(θ, d, c) = γ.    (2.10.6)
If we also specify d , then the value of c satisfying (2.10.6) is uniquely determined and is given by C=
In q-l 2 (8 - d )
(2.10.7)
2.1 1. LARGESAMPLE PROPERTIES OF THE SPRT
51
where XP is the solution of q(0,d, c) = y with q (0, d, c) given by Equation (2.10.4). (Notice that Paulson’s (1964) bound, q(0,d, c) 5 exp [-2c (0 - d ) ] , yields c given by (2.10.7). Fabian (1974) has computed the values of XP for given values of y and Bid, and these are given in Table 2.10.2. Table 2.10.2 Values of Q for which q(B,d,c) = y y = .1 .05 .01 .005 .001 Old 2 .2 .1 .02 .010 .002 4 .13443 .06237 .01126 .00548 .00105 6 .12367 .05742 -01055 .00518 .00101 8 .11957 .05571 .01035 -00511 .00101 10 .11745 .05487 .01027 .00508 .00100 12 .11617 .05439 -01023 .00506 .00100 00 .11111 .05263 .01010 .00502 .00100 ~
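Given γ and d, a tabulated Ψ converts to the intercept via (2.10.7), c = ln(1/Ψ)/[2(θ − d)]; for the θ/d = 2 row of Table 2.10.2, Ψ = 2γ. A small arithmetic sketch (parameter values are ours):

```python
import math

def intercept_c(psi, theta, d):
    """Solve (2.10.7): c = ln(1/psi) / (2*(theta - d))."""
    return math.log(1.0 / psi) / (2.0 * (theta - d))

gamma = 0.05
theta, d = 1.0, 0.5          # illustrative values with theta/d = 2
c = intercept_c(2 * gamma, theta, d)   # psi = 2*gamma for theta/d = 2
print(round(c, 4))           # ln(10) ~ 2.3026
```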
2.11 Large-Sample Properties of the SPRT

In this section we consider the asymptotic behavior of the error rates and ASN of Wald's SPRT, which were studied by Berk (1973). Assume that X, X1, ... are i.i.d. with common pdf f_i under hypothesis H_i, i = 0, 1. Wald's SPRT of H0 vs. H1 uses the stopping time

N = inf{n : S_n ∉ (b, a)},    (2.11.1)

where S_n = ∑_{j=1}^{n} Z_j, Z_j = ln[f1(X_j)/f0(X_j)] and b < 0 < a are two numbers. (Assume that Z is finite with probability one (wp1).) The error rates are denoted by α and β, i.e. α = P0(S_N ≥ a) and β = P1(S_N ≤ b). Sometimes (α, β) is called the strength of the test. Wald's (1947) inequalities for (α, β) (see Theorem 2.2.2, p. 12) may be written as

α ≤ (1 − β)e^{−a},   β ≤ (1 − α)e^{b}.    (2.11.2)

Note that these inequalities are general and do not depend on the common distribution of the X_i. Further, it was shown in Section 2.4 that if X, X1, ... are i.i.d. and P(Z = 0) < 1, then E{exp(tN)} < ∞ for some t > 0. (Here P and E refer to the true distribution of X, which need not be either f0 or f1.) In particular E(N) < ∞. We assume throughout this section that P(Z = 0) < 1. Suppose that c = min(−b, a) → ∞, and write lim_c for lim_{c→∞}. Then lim_c α = lim_c β = 0 and wp1 lim_c N = ∞; consequently lim_c E(N) = ∞. The following theorem states precisely the asymptotic behavior of N and E(N).
Theorem 2.11.1 (Berk, 1973 and Govindarajulu, 1968c). Suppose that X, X1, ... are i.i.d. with finite μ = E(Z). Then if μ > 0, wp1

(i) lim_c I{S_N ≥ a} = lim_c P(S_N ≥ a) = 1, and
(ii) lim_c N/a = lim_c E(N)/a = 1/μ.

If μ < 0, then wp1

(i′) lim_c I{S_N ≤ b} = lim_c P(S_N ≤ b) = 1, and
(ii′) lim_c N/b = lim_c E(N)/b = 1/μ.
A result of Siegmund (1967) will be used, which we state without proof.

Lemma 2.11.1 (Siegmund, 1967). Let X1, X2, ..., be an i.i.d. sequence of random variables having a finite mean μ. Let S_n = X1 + ··· + X_n (n ≥ 1). For any positive non-decreasing, eventually concave function g defined on the positive real numbers and c > 0, let

τ = τ(c) = first n ≥ 1 such that S_n > c g(n); ∞, if no such n exists.

Let λ = λ(c) be the solution of the equation μλ = c g(λ). Assume that g(n) = o(n) and that λ(c) is unique for sufficiently large c. Also, for some δ ∈ (0, 1) and L slowly varying (that is, lim_{t→∞}[L(zt)/L(t)] = 1 for every z ∈ (0, ∞)) we assume that g(n) ~ n^δ L(n). Then

lim_{c→∞} λ^{−1} E(τ) = 1.
Proof of Theorem 2.11.1. Consider the case μ > 0. Since lim_n S_n/n = μ wp1, lim_n S_n = ∞ wp1. Thus S* = min_n S_n is finite wp1. We then have I{S_N ≤ b} ≤ I{S* ≤ b} → 0 wp1 as c → ∞. Thus lim_c I{S_N > a} = 1 and, by the dominated convergence theorem, lim_c P(S_N ≥ a) = 1. Since wp1 lim_c N = ∞, lim_c S_N/N = μ wp1. By the definition of N,

S_{N−1} I{S_N > a} < a I{S_N > a} ≤ S_N I{S_N > a}.

On dividing throughout by N and letting c → ∞, the extreme terms both approach μ wp1; thus wp1 lim_c a/N = μ, or lim_c N/a = 1/μ. By Fatou's lemma, lim inf_c E(N)/a ≥ 1/μ. Now, let t = inf{n : S_n ≥ a}. Clearly N ≤ t. From Lemma 2.11.1 it follows that lim_c E(t)/a = 1/μ. Hence we have lim sup_c E(N)/a ≤ 1/μ, and the proof is complete for the case μ > 0. The proof of the theorem for μ < 0 is analogous. This completes the proof of Theorem 2.11.1.

Theorem 2.11.1 shows that Wald's approximation to the ASN is asymptotically correct. This approximation (see Wald, 1947, p. 53) applies when 0 < |μ| < ∞ and may be written as

E(N) = [bP(S_N ≤ b) + aP(S_N ≥ a)]/μ.    (2.11.3)
According to Theorem 2.11.1, the ratio of the two sides of (2.11.3) approaches one as c → ∞. From Wald's inequalities (2.11.2) one can obtain the cruder inequalities

α ≤ exp(−a),   β ≤ exp(b).    (2.11.4)

The next theorem shows that, asymptotically, the inequalities in (2.11.4) are, in some sense, equalities.

Theorem 2.11.2 (Berk, 1973). Let X1, X2, ..., be i.i.d. and E_i|Z| < ∞ for i = 0, 1. Then

lim_c a^{−1} ln(1/α) = 1 = lim_c (−b)^{−1} ln(1/β).

Proof. See Berk (1973) or Govindarajulu (1987, pp. 123-124).

Remark 2.11.1 This result also shows that Wald's approximations for the error rates [obtained by treating the relations in (2.11.2) as equalities and solving for (α, β)] are asymptotically correct in the sense of the theorem.
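Theorem 2.11.1 can be checked by simulation: for Z ~ N(μ, 1) with μ > 0 and boundaries b = −a, N/a should approach 1/μ. A small Monte Carlo sketch (seed, parameters and function name are ours):

```python
import random

def sprt_stopping_time(mu, a, b, rng):
    """Accumulate N(mu, 1) increments until the sum leaves (b, a)."""
    s, n = 0.0, 0
    while b < s < a:
        s += rng.gauss(mu, 1.0)
        n += 1
    return n, s

rng = random.Random(0)
mu, a = 0.5, 20.0
mean_n = sum(sprt_stopping_time(mu, a, -a, rng)[0] for _ in range(2000)) / 2000
print(mean_n / a)   # near 1/mu = 2, slightly above because of overshoot
```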
An approximation for the power curve of the SPRT is obtained (see Sec. 2.3) under the additional assumption that, for some (necessarily unique) real number h ≠ 0, E{exp(hZ)} = 1. In the above notation, the approximation to the power may be written as

P(S_N ≥ a) ≈ (1 − e^{hb})/(e^{ha} − e^{hb}).

Theorem 2.11.2, together with Wald's method of considering the auxiliary SPRT generated by S̃_n = hS_n with stopping boundaries

(b′, a′) = (hb, ha) if h > 0;  (ha, hb) if h < 0,

establishes the following corollary.

Corollary 2.11.3.1 Let X, X1, X2, ..., be i.i.d., E_i|Z| < ∞ for i = 0, 1, and (for some h ≠ 0) E(exp hZ) = 1. Then

lim_c (−ha)^{−1} ln P(S_N ≥ a) = 1 if h > 0,

and

lim_c (−hb)^{−1} ln P(S_N ≤ b) = 1 if h < 0.

Proof. The power of the SPRT is equal to the probability of type I error for the auxiliary SPRT. That is,

P(S_N ≥ a) ≈ e^{−ha}  (h > 0).

Similarly, power of the SPRT = 1 − P(S_N ≤ b) ≈ 1 − e^{−hb} (h < 0), after using Theorem 2.11.2, since

P(S_N ≤ b) ≈ (1 − e^{ha})/(e^{hb} − e^{ha}) ≈ e^{−hb}

when h < 0 and c is large. When h = 0, the limiting power is −b/(a − b).
Example 2.11.1 Let X be normally distributed with mean θ and variance 1. We wish to test H0 : θ = 0 versus H1 : θ = 1. Let α = β = 0.01. Then

a = −b = ln 99,   Z = X − 1/2,   E(Z) = θ − 1/2 = μ,

and hence h = 1 − 2θ. Then

E_θ(N) ≈ (ln 99)/(θ − 1/2) if θ > 1/2,   E_θ(N) ≈ (ln 99)/(1/2 − θ) if θ < 1/2,

and

Power at θ = P_θ(S_N ≥ a) ≈ exp(−ha) = 99^{−(1−2θ)} if θ < 1/2;  ≈ 1 if θ > 1/2.
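The asymptotic ASN and power expressions of Example 2.11.1 are quickly evaluated (function names are ours; the h = 0 case uses the limiting power −b/(a − b) = 1/2 noted above):

```python
import math

a = math.log(99)                      # = -b, from alpha = beta = 0.01

def asn(theta):
    # asymptotic E_theta(N) = a / |theta - 1/2|  (theta != 1/2)
    return a / abs(theta - 0.5)

def power(theta):
    # approximate power via h = 1 - 2*theta
    h = 1.0 - 2.0 * theta
    if h > 0:
        return math.exp(-h * a)       # theta < 1/2
    if h < 0:
        return 1.0                    # theta > 1/2 (asymptotic)
    return 0.5                        # h = 0: limiting power -b/(a - b)

print(round(asn(0.0), 2), round(asn(1.0), 2))   # 9.19 9.19
print(power(0.0))                                # 1/99 ~ 0.0101
```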
2.12 Problems
2.1-1 Let the random variable X have the density function f(x; δ). We wish to test H0 : δ = 1 versus H1 : δ = 2. Construct Wald's sequential probability ratio test.

2.1-2 Let X have density function θ exp(−θx), x > 0. We wish to test H0 : θ = 2 versus H1 : θ = 1 with α = .05 and β = .10. Construct Wald's SPRT.
2.1-3 Let .7, .8, .9, .9, .85 denote the squares of observations drawn randomly from the density f(x; θ) (0 < x < 1). We wish to test H0 : θ = 1 versus H1 : θ = 3 with α = β = .10. Carry out the SPRT for the above data and see whether you accept H0 or H1.

2.1-4 Let X be distributed uniformly on [θ, θ + 2]. We wish to test H0 : θ = 0 vs. H1 : θ = 1. Let 1.2, 1.5, 1.8, 1.4, .9 denote a random sample drawn sequentially from the above density. Carry out Wald's SPRT.
2.1-5 Let X be a non-negative random variable having the probability function

f(x; θ) = 1 − θa(1 − a)^{−1} for x = 0;  f(x; θ) = θa^{x} for x = 1, 2, ...,

where a is a known constant (1/2 < a < 1). We wish to test H0 : θ = θ0 versus H1 : θ = θ1 (0 < θ0 < θ1 < (1 − a)/a). Construct Wald's SPRT using α = β = .05. [Hint: Let m denote the number of zero observations in a random sample of size n and, without loss of generality, set x1 = x2 = ··· = x_m = 0, m ≤ n.]

2.1-6 Let the common distribution of X1, X2, ... have a density f(x; δ). Construct the sequential probability ratio test of the hypothesis H0 : δ = −1/2 versus H1 : δ = 1/2.

2.3-1 Obtain the relation between θ and h in the SPRT for θ0 = 0 vs. θ1 = 1 in a normal population with unit variance. Plot the OC function of the test with α = β = .01.

2.3-2 Show that, in testing θ = θ0 vs. θ = θ1 in a Poisson population, the relation between θ and h is

θ = h(θ1 − θ0) / [(θ1/θ0)^{h} − 1].

2.3-3 Obtain the graph of OC(θ) in the SPRT for testing the density θ0 e^{−θ0 x} vs. θ1 e^{−θ1 x} (x > 0), using θ0 = 2, θ1 = 1, α = .05, and β = .01.

2.3-4 Let X be normally distributed with mean θ and known variance σ². We are interested in testing H0 : θ = θ0 vs. θ = θ1. Show that

h(θ) = (θ1 + θ0 − 2θ)/(θ1 − θ0).
2.3-5 Show that for Wald's SPRT, B = 1/A when α = β.

2.3-6 Obtain the relation between θ and h in the SPRT for testing H0 : θ = θ0 vs. θ = θ1 for the exponential family given by

f(x; θ) = B(θ) exp[θR(x)] h(x).

[Hint: Note that E_θ{exp(hZ)} = 1, where Z = ln[f(X; θ1)/f(X; θ0)].]

2.4-1 Let f(x; σ) = σ^{x}(1 − σ), x = 0, 1, ..., 0 < σ < 1. We wish to test H0 : σ = σ0 vs. σ = σ1. Tabulate some values of the OC function with σ0 = 1/2, σ1 = 3/4, α = β = .05. Obtain an expression for the OC function. Also find E_σ(N).

2.4-2 Let {X_i} be independent and identically distributed according to the Pareto density θa^{θ}/x^{θ+1} for x ≥ a. Here a is known and we wish to test H0 : θ = θ0 vs. H1 : θ = θ1 (0 < θ0 < θ1). Construct the SPRT and obtain its ASN and OC(θ).

2.4-3 Let X have the probability density function f(x; θ). We wish to test H0 : θ = 1 vs. θ = 2. Construct Wald's SPRT and find its ASN and OC(θ).

2.4-4 Find the ASN curve for testing H0 : f(x; θ0) = θ0 e^{−xθ0} vs. H1 : f(x; θ1) = θ1 e^{−xθ1} (x > 0), using θ0 = 2, θ1 = 1, α = .05 and β = .10. Notice that for one of the five points used E(N) = 0. In view of the fact that N ≥ 1, this result must be wrong. Explain.

2.4-5 Find the ASN of the SPRT for testing θ = θ0 vs. θ = θ1 in a normal population with unit variance (use α = β = .01).

2.4-6 Let Z1, Z2, ... be an i.i.d. sequence of random variables and let W_i = |Z_i| (i = 1, 2, ...) and S_N = ∑_{i=1}^{N} Z_i, where N is any stopping variable such that the event {N ≥ i} is independent of Z_i, Z_{i+1}, .... Show that E(W_i^{k}) ≤ δ_k < ∞ (i = 1, 2, ...) and E(N^{k}) < ∞ imply E(S_N^{k}) < ∞. [Hint: Ericson (1966, Theorem 1).]
2.5-1 Let the common pdf of X1, X2, ... be (1 − θ²) exp(−|x| + θx)/2 for |θ| < 1, −∞ < x < ∞. Evaluate E(N) for the SPRT of H0 : θ = −1/2 vs. H1 : θ = 1/2 with α = β = .01.

2.5-2 For the pdf considered in Example 2.5.1, derive the exact upper and lower bounds for the OC and ASN functions.

2.5-3 Let X take on values −1, 0, 2 with probabilities θ1, 1 − θ1 − θ2, and θ2, respectively. We wish to test H0 against H1 : θ1 = θ2 = 1/6. Using α = β = 0.05, find the exact values for OC(θ) and E(N), where θ = θ1 + θ2.

2.5-4 Let X take on values −2, −1, 1, 2 with probabilities θ1, θ2, 1 − 2θ1 − θ2, and θ1, respectively. We wish to test H0 against H1 : θ1 = θ2 = 1/6. Using α = β = 0.05, find the exact values for OC(θ) and E(N), where θ = (θ1, θ2). [Hint: Z(X) takes on values ln 2, 0, −ln 2 with probabilities θ2, 1 − 2θ1 − θ2, and 2θ1, respectively.]
2.8-1 Let f(x; θ) = θ^{x}(1 − θ)^{1−x}, x = 0, 1, and 0 < θ < 1. We wish to test H0 : θ = 0.5 versus H1 : θ = 0.75. Using α = β = .05 and n0 = 25, compute the bounds for the error probabilities when Wald's truncated SPRT is employed.

2.8-2 In Problem 2.8-1 show that, for the SPRT truncated at n = n0 = 5, α(n0) ≤ 0.2281 and β(n0) ≤ 0.3979.

2.8-3 Let f(x; θ) = e^{−θ}θ^{x}/x!, x = 0, 1, ... and θ > 0. We wish to test H0 : θ = 1 versus H1 : θ = 2. Using α = β = .05 and n0 = 25, evaluate the bounds for the error probabilities of the truncated SPRT.

2.8-4 Let f(x; θ) = θ(1 − θ)^{x}, x = 0, 1, ... and 0 < θ < 1. We wish to test H0 : θ = 0.5 versus H1 : θ = 0.75. Using α = β = .05 and n0 = 30, compute the bounds for the error probabilities of the truncated SPRT.

2.8-5 Let X be distributed as normal (θ, 1). We wish to test H0 : θ = −0.5 versus H1 : θ = 0.5 with α = β = .05. Using n0 = 25, find upper bounds on the effective error probabilities when Wald's truncated SPRT is used.

2.8-6 Let X be Bernoulli with θ denoting the probability of a success. Suppose we wish to test H0 : θ = 0.5 versus H1 : θ = 0.9 with α = β = .05. Using n0 = 25, find upper bounds on the effective error probabilities when Wald's truncated SPRT is used.

2.10.1 Let θ denote the probability of obtaining a head in a single toss of a coin. Suppose we wish to test H0 : θ = 1/2 vs. H1 : θ = 3/4. Can you obtain a restricted SPRT for this binomial problem?
[Hint: The binomial tends to the normal distribution when suitably standardized. Also see Armitage (1957) .] 2.11.1 Let X be a normal ( 8 , l ) variable. We wish to test HO : 8 = -1 versus HI : 0 = 1. Using a = p = .01, find the asymptotic expressions for E ( N ( H i )i = 0 , l and the power function. 2.11.2 Let X have the logistic distribution function
We wish to test H0 : θ = −0.5 versus H1 : θ = 0.5. Using α = β = .01, find the asymptotic expressions for E(N|Hi), i = 0, 1, and the power function.

2.11.3 Let the random variable X have the density function
f(x; σ) = σ²x e^(−σx) for x, σ > 0, and 0 elsewhere. We wish to test H0 : σ = 1 versus H1 : σ = 2. Using α = β = .01, find the asymptotic expressions for E(N|Hi), i = 0, 1, and the power function.
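The truncated-SPRT exercises above can be explored by simulation. The following sketch (Python, standard library only; the rule of accepting H1 at truncation iff the log-likelihood ratio is positive is one common convention, not necessarily the one intended in the text) estimates the attained error probabilities for the Bernoulli setting of Problem 2.8-1.

```python
import math
import random

def truncated_sprt_errors(p0, p1, alpha, beta, n0, reps=20000, seed=1):
    """Monte Carlo estimate of the two error probabilities of a Bernoulli SPRT
    truncated at n0 observations (boundaries a = ln A, b = ln B)."""
    a = math.log((1 - beta) / alpha)
    b = math.log(beta / (1 - alpha))
    up = math.log(p1 / p0)                 # log-LR increment for x = 1
    down = math.log((1 - p1) / (1 - p0))   # log-LR increment for x = 0

    def decide(p, rng):
        z = 0.0
        for _ in range(n0):
            z += up if rng.random() < p else down
            if z >= a:
                return 1                   # accept H1
            if z <= b:
                return 0                   # accept H0
        return 1 if z > 0 else 0           # truncation rule (one convention)

    rng = random.Random(seed)
    err1 = sum(decide(p0, rng) for _ in range(reps)) / reps        # type I
    err2 = sum(1 - decide(p1, rng) for _ in range(reps)) / reps    # type II
    return err1, err2

print(truncated_sprt_errors(0.5, 0.75, 0.05, 0.05, 25))
```

The estimates can then be compared with the analytical upper bounds asked for in the exercises.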
Chapter 3
Tests for Composite Hypotheses

3.1  Method of Weight Functions
In Chapter 2 we considered SPRT's for testing a simple hypothesis against a simple alternative. In practical situations, however, the simple null hypothesis is only a representative of a set of hypotheses, and the same can be said of the simple alternative. Thus we are faced with the problem of testing a composite hypothesis against a composite alternative. The compositeness of the hypotheses can arise in two situations: (i) the composite hypotheses concern the parameters of interest and there are no nuisance parameters, and (ii) the hypotheses may be simple or composite, but one or more nuisance parameters are present. Let f(x; θ) denote the probability function (or probability density function) of X, indexed by the unknown parameter θ (which may be vector-valued). In general, we wish to test the composite hypothesis H0 : θ ∈ ω0 against the composite alternative H1 : θ ∈ ω1. Let S1 denote the boundary of ω1. Wald (1947) proposed the method of "weight functions" (prior distributions) as a means of constructing an optimum SPRT. Assume that it is possible to construct two weight functions g0(θ) and g1(θ) such that
∫_{ω0} g0(θ) dθ = 1,   ∫_{S1} g1(θ) dS = 1,   (3.1.1)
where dS denotes the infinitesimal surface element. Then the SPRT is based on the ratio
(3.1.2)

and satisfying the conditions:
(i) the probability of type I error, α(θ), is constant on ω0; (ii) the probability of type II error, β(θ), is constant over S1; and (iii) for any point θ in the interior of ω1, the value of β(θ) does not exceed its constant value on S1. Wald (1947, Section A.9) claims that the weight functions gi(θ) (i = 0, 1) are optimal in the sense that, for any other weight functions h0(θ) and h1(θ), the associated error probabilities α*(θ), β*(θ), satisfying (as good approximations) (3.1.3), are such that:
max_{θ∈ω0} α*(θ) ≥ (1 − B)/(A − B) = max_{θ∈ω0} α(θ)   (3.1.4)

and

max_{θ∈ω1} β*(θ) ≥ B(A − 1)/(A − B) = max_{θ∈ω1} β(θ).   (3.1.5)
3.1.1  Applications of the Method of Weight Functions

(a) Sequential Binomial Test
Let X take the values 1 or 0 with probabilities θ and 1 − θ respectively. We wish to test H0 : θ = 1/2 against the two-sided alternative H1 : |θ − 1/2| ≥ δ > 0. So, let g0(1/2) = 1 and g0(θ) = 0 for θ ≠ 1/2, and g1(θ1) = g1(1 − θ1) = 1/2 and g1(θ) = 0 otherwise, where θ1 = 1/2 + δ. Then, if m denotes the number of positive observations in a total sample of size n, the continuation region of the SPRT is given by

B < 2^(n−1) [θ1^m (1 − θ1)^(n−m) + θ1^(n−m) (1 − θ1)^m] < A.   (3.1.6)
Note that the sequential binomial test given by (3.1.6) is not optimal in the sense of (3.1.5) since it may not satisfy (iii).
(b) Sequential Chi-square Test

Let the Xi be independent and normally distributed with unknown mean θ and variance σ². We wish to test H0 : σ = σ0 against H1 : σ = σ1 > σ0. Here choose g(θ) = 1/(2c) for −c ≤ θ ≤ c, and zero otherwise. The ratio of the modified likelihoods under H1 and H0 tends to (as c → ∞)
(3.1.7)
It should be noted that the ratio in (3.1.7) indicates that the problem of testing H0 against H1, with θ as a nuisance parameter, is equivalent to the problem of testing a simple hypothesis about σ against a simple alternative, with known mean zero, from a sample of size n − 1. This can be established using Helmert's transformation:
Yj = [X1 + ⋯ + X_{j−1} − (j − 1)Xj] / [j(j − 1)]^(1/2),   j = 2, 3, ..., n.   (3.1.8)
Hence
Thus the properties of the SPRT based on the ratio (3.1.7) can be studied via the properties of Wald's SPRT considered in Chapter 2, and it is optimal in the sense of (3.1.5) (apply Theorem 2.9.1).
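Helmert's transformation is easy to check numerically. The sketch below (Python; our construction, not from the text) verifies the identity Σ_{j=2}^n Yj² = Σ_{i=1}^n (Xi − X̄n)², which is what reduces the chi-square testing problem to n − 1 observations with known mean zero.

```python
import math
import random

def helmert(xs):
    """Helmert's transformation (3.1.8): maps X1,...,Xn with a common mean
    into n-1 variables Y2,...,Yn that are i.i.d. normal(0, sigma^2)."""
    ys = []
    for j in range(2, len(xs) + 1):
        num = sum(xs[:j - 1]) - (j - 1) * xs[j - 1]
        ys.append(num / math.sqrt(j * (j - 1)))
    return ys

rng = random.Random(7)
theta, sigma, n = 5.0, 2.0, 10
xs = [rng.gauss(theta, sigma) for _ in range(n)]
ys = helmert(xs)

# the sum of squares of the Y's equals the centered sum of squares of the X's
xbar = sum(xs) / n
ss_x = sum((x - xbar) ** 2 for x in xs)
ss_y = sum(y * y for y in ys)
print(abs(ss_x - ss_y) < 1e-8)
```

Because the transformation is orthogonal, the Yj carry no information about θ, only about σ².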
3.2  Sequential t and t² Tests
Here we provide a practical situation in which the following hypothesis-testing problem naturally arises. Given a random variable X and a given number M, we are often interested in knowing whether P(X < M) is equal to p or p′, where p and p′ are specified. For instance, X might be the tensile strength of a steel rod and M a lower limit below which the tensile strength should not fall; a rod is classified as defective if its tensile strength is less than M, and then P(X < M) would be the proportion of defective rods in a large batch. We might wish to know whether this proportion of defective rods is equal to a low value p or to a relatively high value p′. Since the tensile strength can reasonably be assumed to be normally distributed with mean μ and variance σ², P(X < M) = Φ[(M − μ)/σ]. Since we can shift the origin of measurements to M, we can, without loss of generality, set M = 0; then P(X < 0) = Φ(−γ), where γ = μ/σ. If σ is known, one can easily set up a sequential test.

Let X be normally distributed with mean θ and unknown variance σ². We wish to test H0 : θ = θ0 vs. H1 : |θ − θ0| ≥ δσ, where δ > 0. Then the boundary S1 consists of all points (θ, σ) for which |θ − θ0| = δσ; i.e., it consists of two points for each fixed σ. Define g0(θ, σ) = 1/c if 0 ≤ σ ≤ c, θ = θ0 (and zero elsewhere), and g1(θ, σ) = 1/(2c) if 0 ≤ σ ≤ c, θ = θ0 ± δσ (and zero elsewhere). One can
easily obtain
where

S = Σ_{i=1}^n (Xi − θ0)² = (n − 1)sn² + n(X̄n − θ0)²

and X̄n and sn² (respectively) denote the sample mean and sample variance. Letting S/(2σ²) = v, one can show that the integral in the denominator is equal to 2^((n−3)/2) Γ[(n − 1)/2] S^(−(n−1)/2). Also, letting S/(2σ²) = v and (X̄n − θ0)/S^(1/2) = T, we have
ψ(T; δ, n) = lim_{c→∞} f1n/f0n = [e^(−nδ²/2) / Γ((n − 1)/2)] ∫₀^∞ v^((n−3)/2) e^(−v) cosh(δT* (2v)^(1/2)) dv.
Thus, the limit of the modified likelihood ratio is a function of T only. Also, since ψ(−T) = ψ(T), it is a function of |T|. Furthermore, ψ(T) and (X̄n − θ0)²/S are single-valued functions of |X̄n − θ0|/sn. Now, since the joint distribution of {|X̄n − θ0|/sn, n = 2, 3, ...} depends only on |η| = |θ − θ0|/σ, it follows that (i) α(θ, σ) is constant on ω0 and (ii) β(θ, σ) is a function of |η| = |θ − θ0|/σ. Analogously, for the one-sided sequential t-test, taking g1(θ, σ) = 1/c if 0 ≤ σ ≤ c and θ = θ0 + δσ (and zero elsewhere), we obtain the limit of the modified likelihood ratio to be
Thus the sequential procedures can be based on tn, where tn = √n (X̄n − θ0)/sn, with X̄n denoting the sample mean and sn the sample standard deviation based on n observations. That is, the sequential t (or t²) test of H0 : θ = θ0 vs. the alternative H1 : θ − θ0 ≥ δσ (|θ − θ0| ≥ δσ) can be described as follows: if the experiment has proceeded to the nth stage, the sampling continuation region is given by
where the constants Bn and An (Bn′ and An′) are obtained by inverting the inequality B < ψ1(T; δ, n) < A [B < ψ(T; δ, n) < A] in terms of tn [|tn|]. David and Kruskal (1956) obtained an asymptotic expression for ψ1(T; δ, n) and, appealing to the asymptotic normality of T when suitably standardized, showed that the sequential t-test terminates finitely with probability one.
Rushton (1950) has obtained an asymptotic approximation to ln ψ1. Let
where t n denotes the Student’s t-statistic based on a random sample of size n. If the sampling continuation region for the sequential t-test is (approximately) of the form (3.2.3) then Rushton (1950) has obtained, for large n
From a numerical study, Rushton (1950) concludes that one can use the approximation (3.2.4) with confidence, and that one should add the terms [4(n − 1)]⁻¹ and δ²T*²[24(n − 1)]⁻¹ only when one is about to reach a decision. Rushton's approximation to Wald's t²-test is
(3.2.5)
3.2.1  Uniform Asymptotic Expansion and Inversion for an Integral

Let

L1 = [e^(−nδ²/2) / Γ((n − 1)/2)] ∫₀^∞ x^((n−3)/2) e^(−x + δT*(2x)^(1/2)) dx
and
where T* = nT and δ, L1 and L2 are given positive numbers and n is large. We want to solve these equations for T* as a function of n, L1 or L2, and δ; that is, T* = T*(n, Li, δ). We will show that the solution of the second equation is closely related to the solution of the first. Towards the solution of the first equation, let p² = 2n − 3 and −pt√2 = δT*. Now we solve for t. Employing some uniform asymptotic expansions, Govindarajulu and Howard (1989) obtain, after writing
t = t0 + t1 p⁻² + t2 p⁻⁴ + ⋯   (3.2.6)
that t0 (t0 < 0) is the solution of the transcendental equation
t1 is given by

and

t2 = t2(t0, t1) = { [(1 + t0²)^(1/2) − t0] t1² − t1 + 2u1(t0)/(1 + t0²) } / { 2 [t0 − (1 + t0²)^(1/2)]² } + t0 t1 (1 + t0²)^(−1/2),

where u1(t) = −(t³ + 6t)/24. In the following we provide a table of values of t0 corresponding to various choices of δ.

Table 3.2.1 Negative Root: −t0(δ)
δ      −t0(δ)          δ      −t0(δ)
0.1    2.59 × 10⁻³     1.1    0.26
0.2    9.95 × 10⁻³     1.2    0.31
0.3    2.22 × 10⁻²     1.3    0.35
0.4    3.92 × 10⁻²     1.4    0.40
0.5    6.06 × 10⁻²     1.5    0.45
0.6    8.62 × 10⁻²     1.6    0.50
0.7    0.116           1.7    0.55
0.8    0.148           1.8    0.60
0.9    0.184           1.9    0.65
1.0    0.223           2.0    0.70
Quadratic interpolation in Table 3.2.1 will yield at least two significant figures of accuracy. If the sampling continuation region for the sequential test is given by
with the constants A_{n,1} and B_{n,1} given by
where t0 = t0(δ), t1 = t1(t0, B) and t2 = t2(t0, t1), and A_{n,1} is given by the same formal expression as B_{n,1} except that A replaces B everywhere. For the sequential t²-test, if the continuation region is
B_{n,2} < |T*| < A_{n,2},

where B_{n,2} = B_{n,1} except that 2B replaces B, and where A_{n,2} = A_{n,1} except that 2A replaces A.

3.2.2  Barnard's Versions of the Sequential t- and t²-tests
The test criteria are given by

W1(T*, δ, n) = exp(−nδ²/2) { F(n/2, 1/2; δ²T*²/2) + √2 (δT*) [Γ((n + 1)/2)/Γ(n/2)] F((n + 1)/2, 3/2; δ²T*²/2) }

and

W2(T*, δ, n) = exp(−nδ²/2) F(n/2, 1/2; δ²T*²/2),

where F(a, c; x) denotes the confluent hypergeometric function. For the sequential t, Govindarajulu and Howard (1989) and Rushton (1952, p. 304) show that

ψ1(T*, δ, n) = exp(−nδ²/2) { F((n − 1)/2, 1/2; δ²T*²/2) + √2 (δT*) [Γ(n/2)/Γ((n − 1)/2)] F(n/2, 3/2; δ²T*²/2) }

and

ψ2(T*, δ, n) = exp(−nδ²/2) F((n − 1)/2, 1/2; δ²T*²/2).
That is, Barnard's criteria use the parameter n in the first argument of the F function and in the gamma functions, whereas Wald's criteria use the parameter n - 1 in the same places.
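Under the reconstruction of Wald's criterion ψ1 given above (the exact coefficients should be checked against Rushton (1950)), the criterion can be evaluated directly from the power series of the confluent hypergeometric function. The sketch below is illustrative only; `hyp1f1` and `psi1` are names chosen here, not from the text.

```python
import math

def hyp1f1(a, c, x, terms=200):
    """Kummer's confluent hypergeometric function F(a, c; x) by power series
    (adequate for the moderate arguments used here)."""
    s, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * x / ((c + k) * (k + 1))
        s += term
        if abs(term) < 1e-15 * abs(s):
            break
    return s

def psi1(t_star, delta, n):
    """Wald-version sequential t criterion, as reconstructed in Section 3.2.2:
    e^{-n d^2/2} { F((n-1)/2, 1/2; y) + sqrt(2) d T* G(n/2)/G((n-1)/2) F(n/2, 3/2; y) },
    with y = d^2 T*^2 / 2."""
    y = 0.5 * (delta * t_star) ** 2
    g = math.gamma(n / 2) / math.gamma((n - 1) / 2)
    val = (hyp1f1((n - 1) / 2, 0.5, y)
           + math.sqrt(2) * delta * t_star * g * hyp1f1(n / 2, 1.5, y))
    return math.exp(-n * delta ** 2 / 2) * val

# sanity check: at T* = 0 the criterion reduces to exp(-n * delta^2 / 2)
print(abs(psi1(0.0, 0.5, 10) - math.exp(-10 * 0.25 / 2)) < 1e-12)
```

Sampling would continue while B < ψ1 < A, in the notation of the text.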
3.2.3  Simulation Studies

Govindarajulu and Howard (1989) carried out simulation studies in order to compare their approximations and Rushton's (1950, 1952) approximations for the sequential t- and t²-tests, using Wald's versions of the test criteria. The following results are based on 500 replications with α = β = 0.05.
Table  Mean Stopping Times (R = Rushton; G&H = Govindarajulu and Howard)

Test     δ      R       G&H     d = R − G&H   s.e. of d
t-test   0.25   100.11   97.63    2.48        0.78
         0.50    28.18   27.44    0.74        0.24
         1.00    10.55   10.16    0.39        0.06
         1.50     7.82    7.06    0.77        0.06
         1.75     7.53    6.42    1.11        0.07
t²-test  0.25   114.86  115.03   −0.17        0.04
         0.50    32.62   33.02   −0.40        0.10
         1.00    11.59   11.80   −0.21        0.03
         1.50     8.35    8.26    0.09        0.06
         1.75     7.85    7.37    0.48        0.04
On the basis of these simulation studies, we note that the average stopping time based on Govindarajulu and Howard's approximation for the sequential t-test is consistently smaller than that based on Rushton's (1950) approximation. In the case of the t²-test, Rushton's (1952) approximation is slightly better than Govindarajulu and Howard's for small δ (δ ≤ 1), while the latter is slightly better than Rushton's for large δ (δ > 1).
Remark 3.2.3.1 The corresponding approximations to Barnard's versions of the sequential t- and t²-test criteria can be obtained by making the following changes in the expressions for A_{n,i} and B_{n,i} (i = 1, 2): (i) set p² = 2n − 1; (ii) change A to A exp(−δ²/2); (iii) change B to B exp(−δ²/2).

3.2.4  Asymptotic Normality of the Statistic T
Toward this we need the following lemma.
Lemma 3.2.1 Let χν² be a chi-square variable with ν degrees of freedom. Then (2χν²)^(1/2) − (2ν)^(1/2) is approximately standard normal for large ν.

Proof.

P{(2χν²)^(1/2) − (2ν)^(1/2) ≤ x} = P{ (χν² − ν)/(2ν)^(1/2) ≤ x + x²/[2(2ν)^(1/2)] } = Φ(x) + o(1). ∎
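Lemma 3.2.1 can be checked by simulation. The following sketch (Python, standard library only; names are ours) draws chi-square variates as sums of squared normals and confirms that (2χν²)^(1/2) − (2ν)^(1/2) has approximately zero mean and unit variance for large ν.

```python
import math
import random

def chi2_sample(nu, rng):
    # chi-square(nu) as a sum of nu squared standard normals
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

rng = random.Random(3)
nu, reps = 200, 4000
zs = [math.sqrt(2 * chi2_sample(nu, rng)) - math.sqrt(2 * nu) for _ in range(reps)]
mean = sum(zs) / reps
var = sum((z - mean) ** 2 for z in zs) / reps
print(round(mean, 2), round(var, 2))
```

The empirical mean should be near 0 and the empirical variance near 1, up to Monte Carlo noise and an O(ν^(-1/2)) bias.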
3.2. SEQUENTIAL T AND T2 TESTS
67
Corollary 3.2.1.1 If sn² denotes the sample variance in a random sample of size n, then √n (sn/σ − 1) is approximately normal with mean 0 and variance 1/2.

Lemma 3.2.2 Let tn = √n (X̄n − θ0)/sn. Then, for sufficiently large n, tn − √n η is approximately normal with mean 0 and variance 1 + η²/2, where η = (θ − θ0)/σ when θ is the true value of the parameter.

Proof. One can write

tn − √n η = (σ/sn) [ √n (X̄n − θ)/σ − √n η (sn/σ − 1) ]

in distribution; since sn/σ converges to one in probability, and using the above corollary, the distribution of tn − √n η for large n, when θ is the true value, is normal with mean 0 and variance 1 + η²/2. ∎
Next, one can write

T* = √n tn (n − 1 + tn²)^(−1/2) = tn [1 + (tn² − 1)/n]^(−1/2).

Expanding this expression and applying Lemma 3.2.2, it follows that T* − √n η (1 + η²)^(−1/2) is, for large n, asymptotically normal with mean zero and variance (1 + η²/2)(1 + η²)⁻³ when θ is the true value.

Alternatively, we can use the delta method as follows. Let

g(x) = x (1 + x²/n)^(−1/2),

so that T* ≈ g(tn). Since g′(x) = (1 + x²/n)^(−3/2), we have g′(√n η) = (1 + η²)^(−3/2), and the delta method yields the same conclusion: T* − g(√n η) is asymptotically normal with mean 0 and variance (1 + η²/2)(1 + η²)⁻³ when θ is the true value, where g(√n η) = √n η (1 + η²)^(−1/2).

3.2.5  Finite Sure Termination of the Sequential t- and t²-tests
We have shown [see also David and Kruskal (1958) or Cramér (1946, Section 28.4)] that if η = (θ − θ0)/σ, then

T* = Z σ* + n^(1/2) η (1 + η²)^(−1/2) (in distribution),   σ*² = (1 + η²/2)(1 + η²)⁻³,

where Z has asymptotically a standard normal distribution. Let N (N*) denote the stopping time for the sequential t (t²) test. Then

P(N = ∞) = lim_{n→∞} P(N > n),
P(N > n) = P(B_{k,1} < T* < A_{k,1} for all k ≤ n) ≤ P(B_{n,1} < T* < A_{n,1}),

and the right-hand side tends to zero after substituting the asymptotic expressions for A_{n,1} and B_{n,1} and noting that t0 is free of A or B. Thus P(N = ∞) = 0.
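The asymptotic mean and variance of T* quoted above can also be checked by simulation. In the sketch below (an illustration, not part of the text), samples are drawn with θ0 = 0 and σ = 1, so that η = θ, and the empirical mean and variance of T* are compared with √n η (1 + η²)^(−1/2) and (1 + η²/2)(1 + η²)⁻³.

```python
import math
import random

def t_star(xs, theta0=0.0):
    """T* = sqrt(n) t_n (n - 1 + t_n^2)^(-1/2), t_n the one-sample t statistic."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    tn = math.sqrt(n) * (xbar - theta0) / math.sqrt(s2)
    return math.sqrt(n) * tn / math.sqrt(n - 1 + tn * tn)

rng = random.Random(11)
n, reps, eta = 400, 3000, 0.5          # eta = (theta - theta0)/sigma, sigma = 1
drift = math.sqrt(n) * eta / math.sqrt(1 + eta * eta)
target_var = (1 + eta * eta / 2) / (1 + eta * eta) ** 3
vals = [t_star([rng.gauss(eta, 1.0) for _ in range(n)]) for _ in range(reps)]
mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / reps
print(round(mean - drift, 3), round(var, 3), round(target_var, 3))
```

The drift of order √n is what forces T* out of any fixed continuation band, which is the substance of the finite-termination argument.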
In order to show the finite sure termination of sequential t2-test, consider
and show that each probability on the right side tends to zero by proceeding as in the case of the sequential t.

The sequential t has the following properties.

Property I: The (one-sided) sequential t-test terminates finitely with probability one.

Let βT(θ, σ) denote the probability of type II error when the true parameter values are θ and σ. If it depends only on a single function τ(θ, σ) of the parameters, we shall denote it by βT(τ). The same convention is adopted for αT(θ, σ).

Property II: For the sequential t-test T of H0 against one-sided H1, βT(η) is a decreasing function of η = (θ − θ0)/σ > 0.
Let Cα,β denote the class of tests with average error of type I equal to α and average error of type II equal to β, the two averages being with respect to any two weight (prior density) functions defined over the regions θ = θ0 and θ − θ0 ≥ δσ, respectively. Let C*α,β be the class of weighted probability ratio tests belonging to Cα,β. Further, let C_{A,B} be the class of weighted probability ratio tests with boundaries A, B. Then we have the following theorem pertaining to the optimum property of the sequential t-test.

Theorem 3.2.1 (J. K. Ghosh, 1960). Let T be any sequential test of H0 : θ = θ0 against H1 : θ − θ0 ≥ δσ > 0 with error probabilities αT(θ0) = α and
βT = β with respect to θ − θ0 = δσ. Then T has the double minimax property in the class Cα,β; namely, for T′ ∈ Cα,β,
Corollary 3.2.1.1 If Wald's approximations to the boundaries of the probability ratio in terms of the error probabilities are allowed, T has the double minimax property in the class C_{A,B}, where A = (1 − β)/α and B = β/(1 − α).
3.2.6  Sequential t²-test (or t-test for Two-sided Alternatives)

Let us consider the sequential t-procedure as given in (3.2.1) and study its properties.

Property I: The sequential t²-test terminates finitely with probability one.

Property II: The sequential t²-test for H0 : θ = θ0 vs. H1 : |θ − θ0| = δσ has constant error probabilities α and β, where A ≤ (1 − β)/α and B ≥ β/(1 − α).

Theorem 3.2.2 (J. K. Ghosh, 1960). For H0 : θ = θ0 vs. H1 : |θ − θ0| = δσ, the sequential t²-test has the double minimax property, which in Cα,β is obtained by replacing T by T², T′ by T′², and θ − θ0 ≥ δσ by |θ − θ0| = δσ in (3.2.8), provided A and B are so chosen that T² has error probabilities α and β. Moreover, if A ≈ (1 − β)/α and B ≈ β/(1 − α), then T² is double minimax in C_{A,B}.
Proof. The proof is analogous to that of Theorem 3.2.1, and one should use Property II. ∎

Sacks (1965) proposed a sequential t-test for a one-sided alternative, with possible extensions to the two-sided situation. His procedure is as follows. Let H0 : θ = 0 vs. H1 : θ = σ, σ > 0. Let
and
Then take an (n + 1)th observation if (3.2.9) and (3.2.10) both hold; stop and accept H0 if (3.2.10) is violated, and stop and accept H1 if (3.2.9) is violated. Notice that the procedure is symmetric, with symmetric bounds A and A⁻¹ (that is, α = β). We also have the following result.

Theorem 3.2.3 (Sacks, 1965). For the above procedure, let N denote the stopping time. Then

(3.2.11)
Remarks 3.2.3.1 Similar ideas would work for H1 : θ = δσ or |θ| = δσ. Since the distribution of N depends only on θ/σ, (3.2.11) is valid for any θ and σ with θ = σ (that is, for any point in H1). Sacks (1965) points out that the moment-generating function of N is finite for t in some neighborhood of zero. The sequential test has bad asymptotic properties when θ = σ/2. When θ/σ = 0 or 1, the author claims that the error probability is o(ln A/A).

Remarks 3.2.3.2 Sacks' (1965) t-test can be obtained by employing the following class of weight functions:

g0(θ, σ) = (2σ ln c)⁻¹ if θ = δ0σ and c⁻¹ < σ < c, and 0 elsewhere in ω0,   (3.2.12)

g1(θ, σ) = (2σ ln c)⁻¹ if θ = δ1σ and c⁻¹ < σ < c, and 0 elsewhere in ω1.   (3.2.13)
Note that

= 2^((n−2)/2) S^(−n/2) Γ(n/2),

where S = Σ_{i=1}^n Xi², after substituting v = S/(2σ²) and carrying out the integration. Also,
where T = X̄n/S^(1/2). Thus Sacks' (1965) criterion is

which coincides with Barnard's criterion when δ = 1, and with Wald's criterion when we set δ = 1, B = A⁻¹, β = α and replace n by n + 1 in the exponent of v and in the gamma function occurring in Wald's criterion. The Govindarajulu–Howard approximation to Sacks' version of the sequential t can be obtained by setting p² = 2n − 1, B = 1/A and δ = 1, and changing A to A exp(−δ²/2).
Hall (1962) has given two analogues of Stein’s two-stage test procedures for testing hypotheses about the mean of a normal population with specified bounds on the error probabilities when the variance is unknown. These procedures are modifications of Baker’s (1950) procedure. They provide alternatives to the sequential normal test (variance known) or the sequential t-test. The performance of these tests does not depend on the validity of any assumption regarding the variance. Moreover, unlike the t-test, these procedures do not require that we specify the alternative hypothesis in standard deviation units.
3.2.7  The Sequential Test T

Let X1, X2, ... be an i.i.d. normal (θ, σ²) sequence of random variables with −∞ < θ < ∞ and 0 < σ < ∞. Let m (m ≥ 2) be a specified integer. We wish to test H0 : θ = 0 versus H1 : θ = δ (σ unknown) based on Xm, Xm+1, ..., having terminal boundaries Am and Bm and with σ replaced by sm, where

X̄m = (1/m) Σ_{i=1}^m Xi,   sm² = ν⁻¹ Σ_{i=1}^m (Xi − X̄m)²,   ν = m − 1,

and

a_m = ln A_m,   (3.2.14)
and similarly

b_m = ln B_m.
(3.2.15)

The first test of Hall (1962) will be called test T. Note that A_m > A ≅ 1/α and B_m < B ≅ β, with approximate equalities instead of inequalities if m is large, where A and B denote Wald's conservative stopping bounds that are appropriate if σ were known. Let

r_n(s_m) = (δ/sm²) Σ_{i=1}^n (Xi − δ/2).   (3.2.16)
Then we can describe the procedure T as follows. Take observations X1, X2, ..., Xm; then successively observe Xm+1, Xm+2, ...; and for each n ≥ m, after observing Xn,

(i) stop and take decision d0 (accept H0) if r_n(s_m) ≤ b_m;
(ii) stop and take decision d1 (accept H1) if r_n(s_m) ≥ a_m;
(iii) take one more observation if b_m < r_n(s_m) < a_m.
P(d1 using TI8,cr) < a for all 8 5 0,a > 0, and P ( d 0 using TI6,a) < ,f3 for all 6 2 S,a > 0.
(3.2.17)
That is, T has strength ( a , @ . Further, since the SPRT T (s, a ) terminates finitely with certainty for every fixed value of, ,s the test T also terminates with certainty. Hall (1962) has o’btained expansions for the OC and ASN functions of T . Further Hall (1962) has made certain comparisons of the approximate power and ASN functions of the sequential test T , Stein’s two-stage test, Wald’s SPRT and fixed-sample size test (FSST) if cr were known for = p = .05 and .01. He surmises that substantial sE,vings may be possible using T , at least if one of the hypotheses is correct. The comparison between T and Stein’s test is analogous to the comparison between the SPRT and the FSST of the same strength.
3.2.8  An Alternative Sequential Test T′

For the symmetric (α = β) one-sided case, a minimum probability ratio test (MPRT), which has converging straight-line boundaries, can be adapted. The
MPRT is equivalent to one of Anderson's tests (1960) discussed in Section 2.9. The test T′ is given by: for n ≥ m, stop sampling as soon as
and choose d1 or d0 according as Σ_{i=1}^n (Xi − δ/2) is > or < 0, where

c_ν = ν [(2α)^(−2/ν) − 1] = −2 ln(2α) + o(ν⁻¹).   (3.2.18)
If N′ is the stopping time associated with the test T′, then a lower bound for E(N′) at θ = δ/2 is given by (3.2.19)
3.3  Sequential F-test

In this section we are concerned with testing for location in several normal populations in the presence of nuisance parameters. P. G. Hoel (1955) has applied Wald's method of weight functions to the general linear hypothesis-testing problem. In the following we take a special case of Hoel's (1955) formulation of the problem. Let X1, X2, ..., Xs be independent normal variables with means θ1, θ2, ..., θs (θi = 0 for i = p + 1, ..., s) and common variance σ². We wish to test
H0 : θ1 = ⋯ = θp = 0   (σ > 0)   (3.3.1)
against the alternative

H1 : Σ_{i=1}^p θi²/σ² ≥ λ0,   (3.3.2)

where λ0 is a specified constant. Since the likelihood is a function of Σ_{i=1}^p θi²/σ², it is reasonable to consider such alternative hypotheses. Notice that σ is a nuisance parameter, which makes H0 a composite hypothesis. Let S1 denote the boundary of the region (3.3.2).
The following is a special case of Hoel's (1955) normalizing weight functions:

g1(σ) = a1 σ^(b1) if 0 ≤ σ ≤ c, and 0 otherwise;

g0(σ) = a2 σ^(b2) if 0 ≤ σ ≤ c, and 0 otherwise,   (3.3.3)
where the ai and bi are certain related constants with b1 ≤ 0 and b2 = b1 + 1 − p. Let (Xi1, Xi2, ..., Xin) be a random sample on Xi (i = 1, 2, ..., s). Let f1nc (f0nc) denote the density obtained after integrating the joint density of the Xij (i = 1, 2, ..., s; j = 1, 2, ..., n) with respect to the prior density g1(σ) (g0(σ)) on the region S1c (ω0c), which is that part of the surface S1 (ω0) on which 0 ≤ σ ≤ c, where ω0 is the region in the parameter space characterized by the null hypothesis. Notice that S1c and ω0c are truncations of S1 and ω0, which imply the existence of the necessary integrals. After forming the ratio f1nc/f0nc and letting c → ∞, the special case of Hoel's (1955) expression is given by
f1n/f0n = lim_{c→∞} f1nc/f0nc   (3.3.4)
where S is given by (3.3.5), M(a, b, u) denotes the confluent hypergeometric function, and

X̄i = (1/n) Σ_{j=1}^n Xij.
Note that if b1 = −1 and s = p, the test criterion (3.3.4) reduces to that obtained by Johnson (1953) by the application of invariance considerations proposed by Cox (1952a). If b1 = 0 and s = p = 1, the test criterion (3.3.4) reduces to the sequential t²-test criterion (see ψ(T; δ, n) in Section 3.2). If s = p = 1 and b1 = −1, the test criterion (3.3.4) yields a version of the sequential t²-test given independently by Barnard (1952) and Rushton (1952). In the following we assume that s = p and b1 = −1. Letting b = p/2, λ = np/2 and T = λ0S/p (i.e., λ0nS/2 = λT), (3.3.4) takes the form

M(λ, b, λT) e^(−λλ0/p).   (3.3.6)
If α and β denote the type I and type II error probabilities, then, setting A = (1 − β)/α and B = β/(1 − α), the sequential F-test can be described as follows: if the experiment reaches the nth stage, the sampling continuation region is given by

B < M(λ, b, λT) e^(−λλ0/p) < A.   (3.3.7)

If we invert the inequality (3.3.7), then the sampling continuation inequality becomes

B_x < λT < A_x.   (3.3.8)

Before we discuss this inversion, it is useful to consider the asymptotic distribution of nS. Towards this, Govindarajulu and Howard (1994) (abbreviated as G-H (1994)) have shown that
in distribution as n → ∞, where Z has a standard normal distribution and

δ = Σ_{i=1}^p θi²/σ².

Thus S tends in probability to δ/(p + δ) as n → ∞. Hence S remains bounded in probability as n increases, and this is useful in the forthcoming expansions. Using the inversion formula (3.3.8), G-H (1994) show that the sequential procedure terminates finitely with probability one. Also note that Ray (1957a), using the monotonic properties of the logarithm of the M function, established almost sure finite termination of the sequential F-test.
3.3.1  Inversion Formula

We are concerned with solving equations of the form

M(λ, p/2, x) = A e^(λλ0/p)   (3.3.10)

for x > 0, where 0 < λ0/p ≤ 1 with λ and p given (n large, p = 2, 3, 4, ...). We assume that p/2 is a positive integer, so that p is an even positive integer; if p is odd, the analysis becomes complicated by the appearance of "extra terms". However, the formula valid for even p also gives satisfactory results when used for odd p, so it suffices to provide the results for p even. Let k = (n − 1)p/2, so that k + b = λ. Then we have
(3.3.11)
where the C_j^k are the usual binomial coefficients. Using the leading term in the Euler–Maclaurin sum formula, G-H (1994) replace the sum involving the binomial coefficients by an integral and then, using Laplace's method, obtain an asymptotic expansion for the integral. Thus they obtain
x2 - 3b2 + 6bx + -4 + 6b12+ (xk)1/2 -
1 - 6b
+ 6~ + 3
( -~b)2 + 12xb2 + O ( k - 3 / 2 ) . (3.3.12) 48xk
Solving (3.3.10) for x is equivalent to solving for x from
G(x) = ln M(k + b, b, x) − ln A − (k + b)λ0/(2b) = 0   (3.3.13)
where G(x) can be obtained from (3.3.12) and
xG'(x)
M
+
1 3 - 8b 4 ( +~ b)2 ( k ~ ) l-/ ~ -(-I - 2 2 2b) + 4 32(~1C)l/~ 9 + 1 4 -~30b - 24x2 + 24b2 0 ( k W 3 l 2 ). 384xk
+
+
+
(3.3.14)
Now use the Newton-Raphson algorithm which yields the sequence {xn} (n = 0,1,2, ...) where
x_{n+1} = x_n − G(x_n)/G′(x_n),   (3.3.15)

where equations (3.3.12) and (3.3.14) can be used to evaluate the quotient G/G′ in (3.3.15). From a convexity argument we take x0 = 0 in (3.3.15) and obtain

x1 = [b ln A + (k + b)λ0/2] / (k + b).   (3.3.16)
Further approximations to the root x of (3.3.10) are obtained by using (3.3.15), (3.3.12) and (3.3.14) to generate x2, x3, etc. Thus we have the following algorithm.
Algorithm. To find an approximation to the solution of (3.3.10) in x > 0, use the Newton–Raphson sequence generated by (3.3.15), with G and G′ given respectively by (3.3.12) and (3.3.14), starting from x1 given by (3.3.16).
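Instead of the expansions (3.3.12) and (3.3.14), one can also run the Newton–Raphson iteration (3.3.15) with G evaluated from the power series of M and G′ approximated numerically. The sketch below (Python; `kummer_m` and `invert_criterion` are our names, and the setup is an assumption consistent with (3.3.10)) illustrates this.

```python
import math

def kummer_m(a, c, x, terms=1000):
    """Kummer's confluent hypergeometric function M(a, c; x) by power series."""
    s, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * x / ((c + k) * (k + 1))
        s += term
        if abs(term) < 1e-14 * abs(s):
            break
    return s

def invert_criterion(A, lam0_over_p, p, n, iters=60):
    """Solve M(k+b, b, x) = A * exp((k+b) * lam0/p) for x > 0 by Newton-Raphson,
    as in (3.3.15), with G' approximated by a central difference."""
    b = p / 2.0
    k = (n - 1) * p / 2.0
    target = math.log(A) + (k + b) * lam0_over_p

    def G(x):
        return math.log(kummer_m(k + b, b, x)) - target

    x = 0.0
    for _ in range(iters):
        h = 1e-6 * (1.0 + abs(x))
        gp = (G(x + h) - G(x - h)) / (2 * h)
        step = G(x) / gp
        x -= step
        if abs(step) < 1e-12:
            break
    return x

x_a = invert_criterion(19.0, 0.5, 2, 5)   # boundary A_x for A = 19
print(round(x_a, 4))
```

Running the same routine with A replaced by B gives the lower endpoint B_x of the continuation region (3.3.8).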
Note that unless the right side member of (3.3.10) is at least one, this equation has no nonnegative solutions for x, since
M(λ, p/2, x) ≥ 1 for λ > 0, p > 0, x ≥ 0.
Recall that the sampling continuation region at the nth stage is given by (3.3.7). The algorithm above can now be used to give the equivalent sampling continuation region (3.3.8), where A_x is the root x of (3.3.10) and B_x is the root x of (3.3.10) with A replaced by B. These roots are obtained using (3.3.15), (3.3.12) and (3.3.14). It is found that 4 or 5 iterations of the algorithm are needed to obtain A_x and B_x with an error of 1%. G-H (1994) carried out a simulation study (with 500 replications for each case) with α = β = 0.05 (and hence A = 1/B = 19) and θi = (λ0/p)^(1/2). Some of their values are presented in Table 3.3.1.

Table 3.3.1 Stopping Times for Some Parameter Configurations
λ0/p   p   Average Stopping Time (AST)   Standard Error of the AST
0.5    1   17.2    0.35
1.5    1    8.4    0.14
0.5    3    7.8    0.13
1.5    3    4.4    0.04
0.5    5   5.10    0.06
1.5    5   4.16    0.25
0.5    7   4.58    0.04
1.5    7   3.96    0.01
0.5    9   4.29    0.03
1.5    9   3.90    0.01

3.4  Likelihood Ratio Test Procedures
If it is not feasible to apply the method of weight functions of Wald (1947), and if invariance considerations do not apply, then one should look for a procedure in which sample estimates replace the true values of the nuisance parameters. Estimates like the BAN (best asymptotically normal) estimates will suffice. In particular, we confine ourselves to maximum likelihood estimates, which have some well-known desirable large-sample properties. In this section we implicitly assume that we are testing hypotheses that are "close" to each other; hence the need for large sample sizes to terminate the experiments. Large-sample properties of maximum likelihood estimators (mle's) in fixed-sample-size cases have been studied by Cramér (1946); the regularity conditions have been relaxed to some extent by LeCam (1970).
Let f(x; θ, η) be the common probability density (probability function) of a sequence of i.i.d. random variables {Xi}, where θ ∈ Ω, η ∈ Λ, and let Θ = Ω × Λ. Then we have the following well-known result.
Lemma 3.4.1 For each (θ, η) ∈ Θ, assume that

(i) the first partial derivatives of f with respect to θ and η exist and are bounded absolutely by some integrable functions of x, and
(ii) the second partial derivatives of f with respect to θ and η exist, are absolutely bounded by some integrable functions of x, and have finite expectations.

If

l(θ, η) = Σ_{i=1}^n ln f(Xi; θ, η),

then

n⁻¹ ∂²l(θ, η)/∂θ² → −Iθθ in probability as n → ∞,

n⁻¹ ∂²l(θ, η)/∂θ∂η → −Iθη in probability as n → ∞,   (3.4.1)

n⁻¹ ∂²l(θ, η)/∂η² → −Iηη in probability as n → ∞,

where

Iθθ = var(∂ ln f/∂θ) = −E(∂² ln f/∂θ²),   Iηη = var(∂ ln f/∂η) = −E(∂² ln f/∂η²).   (3.4.2)
Proof. Assumptions (i) and (ii) enable us to perform differentiation underneath the integral sign and imply the existence of Iθθ, Iθη and Iηη. Also, l(θ, η) is a sum of i.i.d. random variables, since the X's are. Hence an application of Khintchine's theorem, pertaining to the convergence of the arithmetic mean of an i.i.d. sequence of random variables to their mean, yields (3.4.1). ∎
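Lemma 3.4.1 can be illustrated numerically. The sketch below (our construction, not from the text) takes f to be the normal density with mean θ and standard deviation η, for which Iθθ = 1/η², and checks that −n⁻¹ ∂²l/∂θ² matches 1/η².

```python
import math
import random

def second_deriv_theta(xs, theta, eta, h=1e-3):
    """Central-difference second derivative of the log-likelihood in theta,
    for the normal(theta, eta^2) model; here l is exactly quadratic in theta."""
    def loglik(t):
        return sum(-0.5 * math.log(2 * math.pi) - math.log(eta)
                   - (x - t) ** 2 / (2 * eta ** 2) for x in xs)
    return (loglik(theta + h) - 2 * loglik(theta) + loglik(theta - h)) / h ** 2

rng = random.Random(5)
theta, eta, n = 1.0, 2.0, 2000
xs = [rng.gauss(theta, eta) for _ in range(n)]
val = -second_deriv_theta(xs, theta, eta) / n   # should be close to 1/eta^2
print(abs(val - 1 / eta ** 2) < 1e-4)
```

For this model the convergence in (3.4.1) is exact for every sample, since ∂²l/∂θ² = −n/η² identically; for other families the agreement holds only in probability as n grows.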
Lemma 3.4.2 (Cramér, 1946; LeCam, 1970). If the assumptions of Lemma 3.4.1 are satisfied and if

(3.4.3)

and analogous results hold for the mixed second derivative and for the second partial derivative of ln f with respect to η, then the mles θ̂n, η̂n, based on a random sample of size n, have an asymptotic bivariate normal distribution with means θ and η, variances n⁻¹I^θθ and n⁻¹I^ηη, and covariance n⁻¹I^θη, where

I^θθ = [Iθθ − I²θη (Iηη)⁻¹]⁻¹,   I^ηη = [Iηη − I²θη (Iθθ)⁻¹]⁻¹,   I^θη = −Iθη (IθθIηη − I²θη)⁻¹.
Remark 3.4.1 Uniform continuity of the second partial derivatives of l(θ, η) with respect to θ and η, or the existence and absolute boundedness (by integrable functions in x) of the third-order partial derivatives, could replace the assumptions in (3.4.3).

Maximum likelihood SPRT's have been proposed by Bartlett (1946) and D. R. Cox (1963). First we shall give Cox's procedure, which is a slight modification of Bartlett's procedure. Suppose we are interested in testing H0 : θ = θ0 against H1 : θ = θ1, where η is the nuisance parameter, no prior information about which is available. We further assume that θ1 − θ and θ0 − θ are of the order n^(−1/2), where θ denotes the true value. Let θ̂n and η̂n denote the mles of θ and η based on (X1, X2, ..., Xn). Then consider the sequential procedure based on the likelihood ratio

Z′n = l(θ1, η̂n) − l(θ0, η̂n).   (3.4.4)
The method of Bartlett (1946) involves the use of two mles of r] one assuming that 6' = 81 and the other with 0 = 6'0. His method will be described later on. Notice that exp (2;) is no longer a ratio of probability density functions and hence, the boundary values expressed in terms of error probabilities cannot be used without proper justification. Taylor's expansions of 1 ( O i , 9,) for i = 0 , l about the true (6',r]) yield
(3.4.5)
a
where the remainder involves the differences of the second order derivatives, and it converges to zero in probability when - 6'1, i = 0 , l are sufficiently small
CHAPTER 3. TESTS FOR COMPOSITE HYPOTHESES
and the second derivatives are smooth (see Remark 3.4.1). Next, expanding the score functions ∂l/∂θ and ∂l/∂η (which vanish at the mles (θ̂n, η̂n)) about (θ, η), we obtain

∂l/∂θ = n(θ̂n − θ) I_θθ + n(η̂n − η) I_θη + n ε_{n,1}   (3.4.7)

and

∂l/∂η = n(θ̂n − θ) I_θη + n(η̂n − η) I_ηη + n ε_{n,2},   (3.4.8)

where ε_{n,1} and ε_{n,2} converge to zero in probability and hence can be neglected for sufficiently large n. Substituting (3.4.7) in (3.4.5) and ignoring terms of order o(1), we obtain Theorem 3.4.1.
Theorem 3.4.1 (Cox, 1963). When |θi − θ| is sufficiently small (i = 0, 1) and the second partial derivatives are smooth, we have

Z′_n = (θ1 − θ0) Ĩ_θθ⁻¹ n [θ̂n − (θ0 + θ1)/2],

where n θ̂n is asymptotically normal with mean nθ and variance n Ĩ_θθ. Also, using (3.4.1) in (3.4.7) and (3.4.8), we see that for large n, n(θ̂n − θ) is the sum of i.i.d. random variables {Y_i}, i = 1, 2, ..., n, where

Y_i = Ĩ_θθ [∂ ln f(X_i; θ, η)/∂θ − (I_θη/I_ηη) ∂ ln f(X_i; θ, η)/∂η].

Also, note that var Y = Ĩ_θθ = I_ηη (I_θθ I_ηη − I²_θη)⁻¹. The asymptotic representation of Z′_n suggests that the test should be based on

T_n = n [θ̂n − (θ0 + θ1)/2],

where E(T_n) = n{θ − (θ1 + θ0)/2} and var(T_n) = n Ĩ_θθ. Thus, for large n, T_n is a random walk where the mean increment per step is θ − (θ1 + θ0)/2 and the variance per step is Ĩ_θθ. Further, Ĩ_θθ can be estimated by substituting θ = (θ1 + θ0)/2 and η = η̂. We can therefore use the theory for normally distributed
observations. (The continue-sampling inequality for testing H0: θ = θ0 versus H1: θ = θ1 based on normal observations with known variance σ² is σ² ln B < (θ1 − θ0)[Σ X_i − n(θ0 + θ1)/2] < σ² ln A.) Thus, the continue-sampling inequality becomes

b < T_n < a,   (3.4.9)

where

b = Ĩ_θθ (θ1 − θ0)⁻¹ ln[β/(1 − α)]  and  a = Ĩ_θθ (θ1 − θ0)⁻¹ ln[(1 − β)/α],

and Ĩ_θθ is replaced at stage n by its estimate c_n.
It should be pointed out that the above procedure can be carried out even if η is a vector; appropriate modifications should be made in the justification. Now, we can use Wald's (1947) approximations for the boundary values in terms of error probabilities. Furthermore, the expressions for the OC function and the ASN are also valid for sufficiently large n.

Example 3.4.1 (Cox, 1963). Suppose that the X_i are i.i.d. normal with mean μ and variance σ². We wish to test H0: θ = μ/σ = θ0 against H1: θ = θ1 (θ0 < θ1). Here θ̂n = X̄_n/s_n and η̂n = s_n. Thus

Z′_n = n(θ1 − θ0) Ĩ_θθ⁻¹ [X̄_n s_n⁻¹ − (θ0 + θ1)/2], n = 2, 3, ...,

with

I_θθ = 1,  I_ηη = (2 + θ²)/σ²,  I_θη = θ/σ,  so that  Ĩ_θθ = 1 + θ²/2.

Hence, we take c_n = (1 + X̄_n²/2s_n²) and carry out the procedure. It is of interest to note that the above procedure is asymptotically equivalent to the sequential t-test (to the asymptotic form of Rushton (1950); see Eq. (3.2.4)). One can obtain sequential likelihood ratio tests in order to test H0: μ = μ0 against H1: μ = μ1 (μ1 > μ0) when σ is unknown. Here the principle of invariance does not apply. The sequential procedure is obtained by replacing σ by s_n in Wald's SPRT with σ known. Since s_n and X̄_n are independent, there is asymptotically no change in the properties of the sequential procedure from that when σ is known. Furthermore, one can also derive a sequential likelihood ratio procedure for the Behrens-Fisher problem (in which we wish to test the equality of the means of two normal populations having unequal and unknown variances). There might be some situations where it is easier to calculate the mles of η when θ = θ0 and θ = θ1 than to compute the joint mles θ̂n and η̂n. Bartlett (1946) proposed that the sequential likelihood ratio procedure be based on

Z*_n = l(θ1, η̂1) − l(θ0, η̂0),   (3.4.10)

where η̂1 [η̂0] denotes the mle of η when θ = θ1 [θ = θ0]. Notice that the subscript n on η̂ is suppressed for the sake of simplicity.
Theorem 3.4.2 (Bartlett, 1946). Under the regularity assumptions of Theorem 3.4.1, Z*_n is asymptotically a sum of i.i.d. random variables, so that Wald's approximations to the boundary values, the OC function and the ASN remain applicable.

Proof. Let θ1 be the true value of θ. Expanding l(θ0, η̂0) in a Taylor series around l(θ1, η̂1) and using (3.4.1), we obtain

Z*_n = l(θ1, η̂1) − l(θ0, η̂0)
  = (θ1 − θ0)(∂l/∂θ)_1 + (η̂1 − η̂0)(∂l/∂η)_1
   + (n/2)[(θ1 − θ0)² I_θθ + 2(θ1 − θ0)(η̂1 − η̂0) I_θη + (η̂1 − η̂0)² I_ηη] + o_p(1),   (3.4.11)

where the subscript i denotes that the particular derivative is evaluated at θ = θ_i and η = η̂_i (i = 0, 1). Also, since η̂1 and η̂0 maximize the likelihood with θ fixed at θ1 and θ0, respectively,

(∂l/∂η)_{θ=θ1} = (∂l/∂η)_{θ=θ0} = 0.   (3.4.12)

Furthermore, expanding the score equation (∂l/∂η)(θ0, η̂0) = 0 about (θ1, η̂1) and using (3.4.1), we have

0 = (∂l/∂η)_1 − n(θ0 − θ1) I_θη − n(η̂0 − η̂1) I_ηη + o_p(n^{1/2}).   (3.4.13)

Hence, from (3.4.12) and (3.4.13) we obtain

η̂0 − η̂1 = (θ1 − θ0) I_θη/I_ηη + o_p(n^{−1/2}).   (3.4.14)

Also, (∂l/∂θ)_1 differs from ∂l/∂θ evaluated at the true (θ1, η) by the term n(η̂1 − η) I_θη + o_p(n^{1/2}),   (3.4.15)

after using (3.4.14). Now, using (3.4.13), (3.4.14) and (3.4.15) in (3.4.11), we obtain a representation of Z*_n which is a sum of i.i.d. random variables, and hence Wald's approximations to the boundary values, OC function and ASN are applicable. ■
Remark 3.4.2 Cox (1963) has applied the sequential likelihood ratio procedure in order to test certain hypotheses about two binomial proportions. If p1 and p2 are the two proportions, let θ = p2 − p1 and η = p1, or θ = ln[p2/(1 − p2)] − ln[p1/(1 − p1)] and η = p1/(1 − p1). We will be interested in testing a simple hypothesis about θ against a simple alternative. For the normal problem (see Example 3.4.1), Joanes (1972, 1975) has compared the procedures of Bartlett (1946), Cox (1963), Barnard (1952) and Wald (1947) (the test of Barnard (1952) [Wald (1947)] is a sequential t-test obtained by using the weight function g(θ) = θ⁻¹ (0 < θ < ∞) [g(θ) = 1/c (0 ≤ θ ≤ c)]). Using asymptotic expansions similar to those obtained by Rushton (1950), Joanes (1972) surmises that Bartlett's test (1946) is closer to that of Barnard than to that of Wald (1947). Cox's (1963) test procedure with the modified bounds given by (3.4.9) is asymptotically equivalent to that of Bartlett. All these statistics, when asymptotically expanded, differ only in the O(n^{−1/2}) term. Breslow (1969) provides a general theory of large-sample sequential analysis via a weak convergence approach which explicitly justifies Cox's (1963) approach. He applies this general theory to the comparison of two binomial populations (see Remark 3.4.2) and the comparison of two exponential survival curves; the latter problem can be described as follows. Let 2n0 denote the number of patients that enter the study in the time interval (0, T) (that is, the total number of observations available to the experimenter), n0 to be placed on treatment A and n0 on treatment B. Let the entry times of the former [latter] group be denoted by H_i [J_i] (i = 1, 2, ..., n0). Let X1, X2, ... [Y1, Y2, ...] denote the survival times of the patients in group A [group B]. Let

F^{(A)}(x) = P(X ≤ x) = 1 − exp(−xλ_A), x ≥ 0, λ_A > 0,

and

F^{(B)}(x) = P(Y ≤ x) = 1 − exp(−xλ_B), x ≥ 0, λ_B > 0.

We wish to test H0: θ = (λ_A/λ_B) = 1 against H_{n0}: θ = θ_{n0} = 1 + 2η n0^{−1/2}. Breslow (1969) proposes a large-sample sequential test of H0 (as n0 → ∞) based on the mle of θ.
3.4.1 Generalized Likelihood Ratio Tests for Koopman-Darmois Families

Before we present these open-ended tests for Koopman-Darmois families, we need the following result of Lorden (1970) pertaining to the excess over the boundary. In SPRTs cumulative sums play an important role. Let S_n = X1 + X2 + ··· + X_n denote the sum of n independent random variables with common mean μ > 0. The stopping time is given by N(t) = inf{n : S_n > t}. Wald's (1947) equation states μ E N(t) = E S_{N(t)} (see (2.4.10)) whenever sup_n E|X_n| and E N(t) are finite, and can be rewritten as

μ E N(t) = t + E R_t, where R_t = S_{N(t)} − t.

Definition 3.4.1 Suppose a random walk {S_n}, n = 0, 1, 2, ..., having positive drift and starting at the origin is stopped the first time S_n > t ≥ 0. Let N(t) = inf{n : S_n > t}. Then R_t = S_{N(t)} − t is called the "excess over the boundary."

The excess over the boundary is often assumed to be negligible. Wald (1947) gave an upper bound for sup_{t≥0} E R_t in the case of i.i.d. X's, namely sup_r E(X − r | X > r) (see Wald, 1947, p. 172, Eq. (A.73)). Wald's bound, which is large by at most a factor [1 − F(0)]⁻¹, may be difficult to calculate. Lorden (1970) provides an upper bound for the excess over the boundary which is intuitively clear if the X's are nonnegative random variables. Towards this, we have the following theorem, which we state without proof.

Theorem 3.4.3 (Lorden, 1970). Let X1, X2, ... be independent and identically distributed random variables with EX = μ > 0 and E(X⁺)² < ∞, where X⁺ = X if X > 0 and X⁺ = 0 if X ≤ 0. Let S_n = X1 + X2 + ··· + X_n, N(t) = inf{n : S_n > t} and R_t = S_{N(t)} − t. Then

E R_t ≤ E(X⁺)²/EX for all t ≥ 0.   (3.4.16)
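Both Wald's identity μ E N(t) = t + E R_t and the bound (3.4.16) can be checked by simulation (a sketch with an assumed Exp(1) increment distribution, for which memorylessness makes E R_t = 1 exactly while the Lorden bound gives E(X⁺)²/EX = 2):

```python
import random

def walk_until(t, draw):
    """Run S_n = X_1 + ... + X_n until S_n > t; return (n, overshoot)."""
    s, n = 0.0, 0
    while s <= t:
        s += draw()
        n += 1
    return n, s - t

random.seed(2)
t, reps = 25.0, 20_000
total_n = total_r = 0.0
for _ in range(reps):
    n, r = walk_until(t, lambda: random.expovariate(1.0))
    total_n += n
    total_r += r
mean_n, mean_r = total_n / reps, total_r / reps
lorden_bound = 2.0 / 1.0  # E[(X+)^2]/E[X] for Exp(1)
print(mean_r, lorden_bound, mean_n)  # mean_r near 1, mean_n near t + 1 = 26
```

The simulated mean excess sits well inside the bound, and μ E N(t) = t + E R_t is reproduced numerically.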
Corollary 3.4.3.1 Under the hypothesis of Theorem 3.4.3, if b < 0 < u and N* = inf{n : S_n ∉ (b, u)}, then

(3.4.17)

where β = P(S_{N*} < b). Taking b = −∞ implies that β = 0 and N = inf{n ≥ 1 : S_n > u}. For this case, Siegmund (1985, pp. 170 and 176) obtained an asymptotic expression for E N.
Theorem 3.4.4 (Siegmund, 1985). If P(X1 > 0) = 1, μ = EX1 and EX1² < ∞, then

lim_{u→∞} E(S_N − u) = EX1²/(2μ) if X1 is non-arithmetic,¹
lim_{u→∞} E(S_N − u) = EX1²/(2μ) + h/2 if X1 is arithmetic with span h.

If μ > 0 and EX1² < ∞ (the X's not necessarily positive), we have

(i) lim_{u→∞} E(S_N − u) = E S²_{N⁺}/(2 E S_{N⁺}) if X1 is non-arithmetic,
(ii) lim_{u→∞} E(S_N − u) = E S²_{N⁺}/(2 E S_{N⁺}) + h/2 if X1 is arithmetic with span h,

where N⁺ = inf{n ≥ 1 : S_n > 0} and x⁻ = −min(x, 0).

Proof. See Siegmund (1985, Chapter 8). ■

One can use Theorem 3.4.4 in the expression for the average stopping time given by

E N = (1/μ)[u + E(S_N − u)].

Remark 3.4.3 If the X's themselves are non-negative, then Σ_{n=1}^∞ n⁻¹ E S_n⁻ will be identically equal to zero.

¹A distribution on the real line is said to be arithmetic if it is concentrated on a set of points of the form 0, ±h, ±2h, .... The largest h with this property is called the span of the distribution.
Now we shall consider the generalized likelihood ratio test procedures. H. Robbins and his collaborators have proposed "open-ended tests" which, like the one-sided SPRTs, continue sampling indefinitely (with prescribed probability) when the null hypothesis is true and stop only if the alternative is to be accepted. Their methods are most effective in the case of testing a normal mean. Lorden (1973) has investigated the generalized likelihood ratio approach to the problem of open-ended tests for Koopman-Darmois families. Schwarz (1962) has shown that the approach leads to easily computable procedures. Since the tests are equivalent to simultaneous one-sided SPRTs, it is easy to obtain upper bounds on expected sample sizes. We should focus on bounding the error probabilities, since the simple Wald approach is not applicable. For the one-parameter Koopman-Darmois family, Wong (1968) has shown that the error probabilities tend to zero faster than c ln c⁻¹ (as c → 0) for simultaneous one-sided SPRTs, with c (c < 1) being the cost of each observation. Lorden (1973) obtains an explicit bound which is of the order of c ln c⁻¹. Let X1, X2, ... denote independent and identically distributed random variables having the density

f(x; θ) = exp{θx − b(θ)}   (3.4.18)

with respect to a non-degenerate σ-finite measure. The stopping time N (possibly randomized) is required to satisfy

P(N < ∞ | H0) ≤ γ for some 0 < γ < 1/3,   (3.4.19)

where H0: θ = θ0 and H1: θ > θ0. (Reparametrize the density if necessary to shift the boundary point between the null and alternative hypotheses to zero, that is, θ0 = 0. Also, without loss of generality assume that b(0) = 0.) Let S_n = X1 + X2 + ··· + X_n, n = 1, 2, ..., and note that

ln Π_{i=1}^n [f(X_i; θ)/f(X_i; 0)] = θ S_n − n b(θ),   (3.4.20)

so that one-sided SPRTs of f(x; 0) against f(x; θ), θ > 0, are given by: stop as soon as

S_n > [ln a⁻¹ + n b(θ)]/θ   (3.4.21)

for specified type I error probability a (0 < a < 1). (Notice that here we set β = 0 in A = (1 − β)/a.) The function b(·) is necessarily convex and infinitely differentiable in (0, θ̄), which need not be the entire natural parameter space² of

²The set of parameter points θ for which f(x; θ) is a probability density function is called its natural parameter space.
the family of densities considered here. One can easily show that E_θ(X) = b′(θ) and var_θ(X) = b″(θ), where primes denote the order of the derivatives. It is easy to show that the information number E_θ{ln[f(X; θ)/f(X; 0)]} is I(θ), given by

θ b′(θ) − b(θ) = I(θ),   (3.4.22)

and the variance of ln{f(X; θ)/f(X; 0)} under θ is θ² b″(θ). Define a likelihood ratio open-ended test of H0: θ = 0 against H1: 0 < θ1 ≤ θ ≤ θ̄ as a stopping time N(θ1, a): the smallest n ≥ 1 (or ∞ if there is no such n) such that

S_n > inf_{θ1 ≤ θ ≤ θ̄} [ln a⁻¹ + n b(θ)]/θ.   (3.4.23)

For an alternative on the other side, θ ≤ θ2 < 0, N(θ2, a) can similarly be defined. Although the infimum in (3.4.23) is computable in some cases, for example the case of the normal mean, it is simpler in many cases, such as the normal scale parameter and the negative exponential distribution, to formulate the continue-sampling inequality in terms of X̄_n = S_n/n and n, as in Schwarz (1962). First note that (3.4.23) is equivalent to

sup_{θ1 ≤ θ ≤ θ̄} [θ S_n − n b(θ)] > ln a⁻¹.   (3.4.24)

If b′(θ1) ≤ X̄_n < b′(θ̄), the supremum is attained at q(X̄_n), where q is the inverse of the increasing function b′. Then (3.4.24) is equivalent to

n I(q(X̄_n)) > ln a⁻¹.   (3.4.25)
If f f n < b'(01), the supremum is achieved at 01; and if X n > b' ( 8 ) , the supremum is approached as 8 --+8 (and attained at 8 if the latter belongs to the natural parameter space). Then we have the following theorem which we state without proof. Theorem 3.4.4 (Lorden, 1973). If
fi = N (01, a ) with
then N satisfies (3.4.19) and
l n y - l + lnlny-1
I (0)
+ 21n { fi [I
+ I]}
I (0)
+-e2b'' (') + 1 (3.4.26) { I (0) )2
for all 8 in [01,8]. Further, af N satisfies (3.4.21) then
(3.4.27)
Example 3.4.2 Let X be distributed normally with mean θ and variance 1. Then b(θ) = θ²/2 and q(y) = y. Hence (3.4.25) is equivalent to

X̄_n² > 2 ln(1/a)/n.

3.5 Testing Three Hypotheses about Normal Mean
In the preceding sections we have developed sequential procedures that are appropriate for testing two hypotheses (simple or composite). However, there are applications that require a choice among three or more courses of action; hence the theory developed earlier is not adequate. For example, the tolerance limits set to a machine may be too high, too low or acceptable. As another example, let X, Y denote the breaking strengths of two yarns. We may be interested in testing whether X and Y are approximately equal or whether one is stochastically larger than the other. One can formulate the above problems in terms of sequential testing among three hypotheses. If the three hypotheses can be ordered in terms of an unknown parameter, a sequential test may be devised by performing two SPRTs simultaneously, one between each pair of neighboring hypotheses after they have been ordered. Armitage (1947, 1950) and Sobel and Wald (1949) obtained sequential tests which satisfy certain conditions on the error probabilities.

3.5.1 Armitage-Sobel-Wald Test

Let a random variable X be normally distributed having mean θ and known variance which, without loss of generality, can be set to be unity. We are interested in accepting one of

H0: θ = θ0, H1: θ = θ1, H2: θ = θ2, with θ0 < θ1 < θ2,   (3.5.1)

on the basis of an i.i.d. sequence {X_n} (n = 1, 2, ...). Notice that Armitage (1947) has considered the above formulation, whereas Sobel and Wald (1949) consider H0: θ = θ0, H1: θ1 ≤ θ ≤ θ2, H2: θ = θ3. Thus we are considering the special case θ2 = θ1. Since T_n = X1 + ··· + X_n is sufficient for θ, the fixed-sample size procedure would be:

accept H0 if T_n ≤ t0, accept H1 if t0 < T_n < t1, accept H2 if T_n > t1,   (3.5.2)

where t0 and t1 are chosen subject to

P(reject H0 | H0) ≤ γ0, P(reject H1 | H1) ≤ γ1, P(reject H2 | H2) ≤ γ2.   (3.5.3)

The sequential procedure is given by: let R1 denote the SPRT for H0 versus H1 and R2 denote the SPRT for H1 versus H2. Then both R1 and R2 are carried out at each stage until
- either: one of the procedures leads to a decision to stop before the other. Then the former is stopped and the latter is continued until it leads to a decision to stop.
- or: both R1 and R2 lead to a decision to stop at the same stage, in which case no further experimentation is conducted.

The final decision rule is:
Accept H0 if R1 accepts H0 and R2 accepts H1,
Accept H1 if R1 accepts H1 and R2 accepts H1,
Accept H2 if R1 accepts H1 and R2 accepts H2.

Let the SPRT R_j be given by the stopping bounds (B_j, A_j) (j = 1, 2). Also let Δ_j = θ_j − θ_{j−1} and d_j = (θ_j + θ_{j−1})/2 for j = 1, 2. Then at stage n the continue-sampling inequality for R_j is given by

n d_j + (ln B_j)/Δ_j < Σ_{i=1}^n X_i < n d_j + (ln A_j)/Δ_j.   (3.5.4)

Now we will show that acceptance of both H0 and H2 is impossible if (B1, A1) = (B2, A2) = (B, A) and Δ1 = Δ2 = Δ. If H0 is accepted at the nth stage, then

Σ_{i=1}^n X_i ≤ n d1 + (ln B)/Δ.

However, since n d1 + (ln B)/Δ < n d2 + (ln B)/Δ, H2 is rejected at the nth stage. Next, suppose that H2 is accepted at the nth stage; that is,

Σ_{i=1}^n X_i ≥ n d2 + (ln A)/Δ.

However, since n d2 + (ln A)/Δ > n d1 + (ln A)/Δ, H0 is rejected at the nth stage. Thus, H0 and H2 cannot both be accepted. A geometrical representation of the rule R is given in Figure 3.5.1. The combined rule R can be stated as follows: continue sampling until an acceptance region (shaded area) is reached or both dashed lines are crossed. In the former case, stop and accept the relevant hypothesis as shown in Figure 3.5.1; in the latter case, stop and accept H1.

Lemma 3.5.1 (Sobel and Wald, 1949). Let Δ1 = Δ2 = Δ. A sufficient condition for the impossibility of accepting both H0 and H2 is

ln(A1/A2) ≤ Δ(θ2 − θ0)/2 and ln(B1/B2) ≤ Δ(θ2 − θ0)/2.   (3.5.5)
Proof. Acceptance of both H0 and H2 is impossible if and only if the rejection number of H0 for R1 ≤ the rejection number of H2 for R2, and the acceptance number of H0 for R1 ≤ the acceptance number of H2 for R2. That is, for every n ≥ 1,

n d1 + (ln A1)/Δ ≤ n d2 + (ln A2)/Δ and n d1 + (ln B1)/Δ ≤ n d2 + (ln B2)/Δ.

That is, for every n ≥ 1,

ln(A1/A2) ≤ nΔ(d2 − d1) and ln(B1/B2) ≤ nΔ(d2 − d1).

Since d2 − d1 = (θ2 − θ0)/2 > 0, it suffices that these hold for n = 1, which is (3.5.5), and this completes the proof. ■
Figure 3.5.1 The Sobel-Wald Procedure (rejection lines and acceptance regions of the combined rule R)
Remark 3.5.1.1 Since nΔ(θ2 − θ0)/2 > 0, the inequalities (3.5.5) are surely satisfied when A1/A2 ≤ 1 and B1/B2 ≤ 1. Hereafter, unless otherwise stated, we assume that the stopping bounds of the two SPRTs are such that it is impossible to accept both H0 and H2. Let N_j [N] denote the stopping time associated with R_j [R] (j = 1, 2). Then one can easily see that N = max(N1, N2). Hence

P(N > n) = P(N1 > n or N2 > n) ≤ P(N1 > n) + P(N2 > n) ≤ 2ρⁿ → 0

as n tends to infinity, for some 0 < ρ < 1. Thus, the combined SPRT terminates finitely with probability one.
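The combined rule R can be sketched in code (a simulation under hypothetical values θ0 = 0, θ1 = 0.5, θ2 = 1 with common Wald bounds for error probabilities of about 0.05; the tie-breaking when both SPRTs decide is exactly the final decision rule above):

```python
import math
import random

def sobel_wald(draw, thetas, A, B, max_n=100_000):
    """Combined rule R for a unit-variance normal mean: run R1 (H0 vs H1)
    and R2 (H1 vs H2) with common stopping bounds (B, A); each SPRT
    continues while (3.5.4) holds, and sampling goes on until both decide."""
    t0, t1, t2 = thetas
    d1, d2 = (t0 + t1) / 2.0, (t1 + t2) / 2.0
    delta = t1 - t0                       # here Delta1 = Delta2
    lo, hi = math.log(B) / delta, math.log(A) / delta
    dec1 = dec2 = None
    s = 0.0
    for n in range(1, max_n + 1):
        s += draw()
        if dec1 is None:
            if s <= n * d1 + lo:   dec1 = "H0"   # R1 accepts H0
            elif s >= n * d1 + hi: dec1 = "H1"   # R1 accepts H1
        if dec2 is None:
            if s <= n * d2 + lo:   dec2 = "H1"   # R2 accepts H1
            elif s >= n * d2 + hi: dec2 = "H2"   # R2 accepts H2
        if dec1 and dec2:
            if dec1 == "H0" and dec2 == "H1": return "H0", n
            if dec1 == "H1" and dec2 == "H2": return "H2", n
            return "H1", n
    return "H1", max_n

random.seed(4)
A, B = 19.0, 1.0 / 19.0   # Wald bounds for error probabilities near 0.05
counts = {"H0": 0, "H1": 0, "H2": 0}
for _ in range(500):       # true theta = theta1 = 0.5
    d, _ = sobel_wald(lambda: random.gauss(0.5, 1.0), (0.0, 0.5, 1.0), A, B)
    counts[d] += 1
print(counts)              # H1 should be accepted most of the time
```

Under the middle hypothesis, R1 drifts toward accepting H1 and R2 toward accepting H1 as well, so the combined rule accepts H1 with high probability.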
3.5.2 Choice of the Stopping Bounds

Here we indicate how to choose the constants (B_j, A_j), j = 1, 2, subject to (3.5.3). Let

L(H_i | θ_j, R) = probability of accepting H_i when H_j is true, using R (i, j = 0, 1, 2),   (3.5.6)

and express the constraints (3.5.3) in terms of these probabilities via Wald's approximations,   (3.5.7)

where

η = (1 − B2)/(A2 − B2).   (3.5.8)

Equations (3.5.6) and (3.5.7) yield A1 and B1, and the expressions for η and γ2 yield A2 = (1 − γ2)/η. Thus, by specifying η (besides γ0, γ1 and γ2), one can determine the stopping bounds A1, A2, B1 and B2. Clearly 0 < η ≤ min(γ1, 1 − γ2), provided A2 ≥ 1. If B1 ≤ 1, then η ≥ max(0, γ0 + γ1 − 1). Also, the sufficient conditions for the impossibility of accepting H0 and H2 will also lead to meaningful lower and upper bounds for η.

OC Function

Let L_j(θ | R) denote the probability of accepting H_j using R when θ is the true value. Then we have L_0(θ | R) = L_0(θ | R1), since this is the probability of a path starting at the origin and leaving the lower boundary of R1 without ever touching the upper boundary of R. Similarly,

L_2(θ | R) = L_2(θ | R2).   (3.5.9)
Also, because the combined procedure R terminates with probability one, we have

L_1(θ | R) = 1 − L_0(θ | R1) − L_2(θ | R2).   (3.5.10)

However, we know from Wald's OC formula that

L_0(θ | R1) = (A1^{h1} − 1)/(A1^{h1} − B1^{h1}), where h1 = h1(θ) = 2(d1 − θ)/Δ1,

and

L_2(θ | R2) = (1 − B2^{h2})/(A2^{h2} − B2^{h2}), where h2 = h2(θ) = 2(d2 − θ)/Δ2.
3.5.3 Bounds for ASN

Let R1* be the SPRT having stopping bounds (B1, ∞) and R2* be the SPRT associated with stopping bounds (0, A2). That is, R1* says continue taking observations until R1 accepts H0, and R2* says continue sampling until R2 accepts H2. Thus, R must terminate not later than R1* or R2*. If N1* (N2*) denotes the stopping time associated with R1* (R2*), then N ≤ N1* and N ≤ N2*, and hence we have

E_θ(N) ≤ min{E_θ(N1*), E_θ(N2*)}.   (3.5.11)

Furthermore, since N = max(N1, N2), we have

E_θ(N) ≥ E_θ(N1), because N ≥ N1.

Similarly, the other inequality can be obtained, and consequently

max{E_θ(N1), E_θ(N2)} ≤ E_θ(N) ≤ min{E_θ(N1*), E_θ(N2*)}.   (3.5.12)

Neglecting the excess over the boundary and from (2.4.4) we have

E_θ(N1*) = (ln B1)/[Δ1(θ − d1)],   (3.5.13)

since the probability of accepting H1 with R1* is zero. Similarly,

E_θ(N2*) = (ln A2)/[Δ2(θ − d2)],   (3.5.14)

since the probability of accepting H1 with R2* is zero. Sobel and Wald (1949) numerically evaluated the preceding upper and lower bounds for various values of θ. Several remarks are in order.
Remark 3.5.1 Although the Sobel-Wald procedure is not an optimum procedure (in the sense that the terminal decision is not in every case a function of only the sufficient statistic, namely the mean of all the observations), it is simple to apply, the OC function is known (after neglecting the excess over the boundary) and bounds for the ASN are available. Sobel and Wald (1949) claim that their procedure is not far from being optimum and that, when compared with a fixed-sample size procedure having the same maximum probability of making a wrong decision, the sequential procedure requires, on the average, substantially fewer observations to reach a final decision.

Remark 3.5.2 Due to the monotonicity of the OC function, the Sobel-Wald procedure is applicable to test

H0: θ ≤ θ0, H1: θ1′ ≤ θ ≤ θ1″, H2: θ ≥ θ2, where θ0 < θ1′ ≤ θ1″ < θ2.
Remark 3.5.3 Although we have formulated the procedure for the normal density, one can easily set up the procedure for an arbitrary probability density f(x; θ). However, it may not be easy to evaluate the OC function for an arbitrary f(x; θ). Furthermore, all these considerations can be extended to k-hypotheses testing problems (k > 3). Simons (1967) developed a sequential procedure for testing (3.5.1) in the particular case when {X_i} are i.i.d. normal with unknown mean θ and known variance σ². Numerical computations indicate that his procedure is as efficient as the Sobel-Wald procedure. Although Simons' (1967) procedure is a little more flexible, it would be somewhat difficult to extend it to non-normal populations. Armitage (1950), independently of Sobel and Wald (1949), proposed a sequential procedure for the k-hypotheses testing problem, which is related to classification procedures. When specialized to three hypotheses, his procedure is as follows:
Armitage’s Procedure
Let Λ_{ijn} = f(X_n; θ_i)/f(X_n; θ_j), where f(X_n; θ) denotes the joint probability or density function of X_n = (X1, X2, ..., X_n). Also, let A_{ij} (i, j = 0, 1, 2) be some positive constants. At stage n,

accept H_i if Λ_{ijn} ≥ A_{ij} for every j ≠ i;

otherwise take one more observation. It should be fairly easy to see (via the central limit theorem, or since the procedure is a combination of several SPRTs which surely terminate finitely) that Armitage's procedure also terminates finitely with probability one. Armitage (1950) provides some crude bounds for the error probabilities. Let L_i(θ_j) denote the probability of accepting H_i when H_j is true. By considering the total probability of all sample points which call for accepting H_i, we see that

1 > L_i(θ_i) > A_{ij} L_i(θ_j), (i, j = 0, 1, 2, i ≠ j).

(Proceed as in Theorem 2.2 for obtaining Wald's bounds.) That is,

L_i(θ_j) < 1/A_{ij}.   (3.5.15)

Also, if the procedure is closed (that is, L_0(θ) + L_1(θ) + L_2(θ) = 1), then L_i(θ_i) > 1 − Σ_{j≠i} 1/A_{ji}. Not much is known about the ASN for this procedure, although the Sobel-Wald bounds for the ASN might still hold. Armitage (1950) applies his procedure to a binomial testing problem.

Remark 3.5.4 Armitage's (1950) procedure has some nice features compared to the Sobel-Wald procedure. The difficulty in the dashed area (see Figure 3.5.1) is avoided by performing an SPRT of H0 versus H2 in this region. Also, sampling terminates only when all the SPRTs terminate simultaneously.
3.5.4 Testing Two-sided Alternatives for Normal Mean

Interest in testing a two-sided alternative hypothesis arises naturally in the following industrial setting. Suppose a measuring device has a zero setting which is liable to shift, so we will be interested in resetting the device or not on the basis of reported measurements on the "standard". If an appreciable bias, expressed as a standardized departure from the known true reading in either direction, is indicated, then the instrument is taken out of service and reset. Let X be normally distributed with unknown mean θ and known variance σ². Suppose we are interested in testing H0: θ = θ0 against H1: |θ − θ0| = δσ for specified δ. We can run a sequential t²-test as discussed in Section 3.2. Alternatively, as suggested by Armitage (1947), one can carry out an SPRT R+ of H0 against H+: θ = θ0 + δσ with error probabilities α/2 and β, and also carry out an SPRT R− of H0 against H−: θ = θ0 − δσ with error probabilities α/2 and β. Then the combined decision rule is:

accept H0 if R+ accepts H0 and R− accepts H0,
accept H+ if R+ accepts H+ and R− accepts H0,
accept H− if R+ accepts H0 and R− accepts H−.

Note that the acceptance regions of H+ and H− will be symmetrically placed. (Draw a diagram with n on the horizontal axis and Σ X_i on the vertical axis.) Also, acceptance of both H+ and H− will be impossible provided β + α/2 < 1. One exception to the above terminal rule is: accept H0 immediately after the path leaves the region common to both the continue-sampling regions of R+ and R−, instead of waiting till the path reaches one of the acceptance regions stated above. This test has P(rejecting H0 | H0) = α and P(accepting H0 | H+) = P(accepting H0 | H−) = β. This procedure is suitable if one is concerned not only about the magnitude of the shift but also its direction. Another situation where we will be interested in testing three hypotheses is the following. Let X be normally distributed with unknown mean μ and known variance σ². Suppose we wish to test H0: μ = μ0 against the alternative hypothesis H1: μ ≠ μ0. The composite hypothesis is usually replaced in practice by two simple hypotheses H_i: μ = μ_i (i = −1, 1), where μ−1 < μ0 < μ1. A test procedure is sought such that P(accepting H0 | H0) = 1 − α and P(rejecting H0 | μ = μ_i) = 1 − β for i = ±1. Billard and Vagholkar (1969) have proposed a sequential procedure for the above hypothesis-testing problem, which is defined as follows (see Figure 3.5.2). Let S_n = Σ_{i=1}^n X_i denote the sum of the n observations at the nth stage.
Figure 3.5.2 The Test Procedure of Billard and Vagholkar

First, let an initial sample of size n0 ≥ 2 be taken. Then continue taking observations until the sample path traced out by the point (n, S_n) crosses either of the boundary lines LA, LAP (in which case H1 is accepted), or either of the lines BC, BQ and CR (in which case H0 is accepted), or either of the lines MD and DS (in which case H−1 is accepted). This procedure will terminate finitely with probability one, since N = min(N1, N−1), where N_i is the stopping time of the SPRT for H0 versus H_i (i = ±1), and we know that the N_i are finite with probability one. Note that the procedure is completely specified in terms of the geometric parameters (n0, a, b, c, d, ψ, y), which are determined so as to minimize the ASN function subject to certain constraints on the OC function. In general, there are seven geometrical parameters which completely define the sequential test procedure. However, in the symmetrical case, that is, when μ0 = 0, −μ−1 = μ1 = μ and σ² = 1, we have c = −b, d = −a and y = −ψ. Hence there remain only four parameters, namely (n0, a, b, ψ), which need to be specified. The optimum set of values of these parameters was obtained by Billard and Vagholkar (1969) by minimizing the ASN function E_μ(N) for some specified μ, subject to the following constraints on L(μ):

L(μ0) ≥ 1 − α, L(μ_i) ≤ β, i = ±1,

where α and β are the preassigned error probabilities and L(μ) = P(accepting H0 | μ). Further, Billard and Vagholkar (1969) obtain explicit expressions for L(μ) and E_μ(N) and, on the basis of some simulation study, claim the superiority of their test over that of Sobel and Wald. They attribute this to the fact that the B-V procedure is based on the sufficient statistic, Σ X_i.
3.6 The Efficiency of the SPRT

In this section we will study the efficiency of the SPRT not only at the hypothesized values of the parameter but also at other values of the parameter. The behavior of the relative efficiency of the SPRT when compared with the fixed-sample size procedure in the case of normal populations is studied when the error probabilities are related in a certain way.

3.6.1 Efficiency of the SPRT Relative to the Fixed-Sample Size Procedure at the Hypotheses

Let {X_i} be i.i.d. having f(x; θ) for the common probability (density) function and θ ∈ Θ. For the sake of simplicity, assume that θ is real and Θ is a part of the real line. We are interested in testing H0: θ = θ0 against H1: θ = θ1, where θ0 ≠ θ1, subject to the prescribed error probabilities α and β. By Theorem 2.8.1 the SPRT minimizes the expected sample size at both hypotheses. Given any competing procedure D, one can study the efficiency of the SPRT relative to D at θ = θ0, θ1. We shall, in particular, study the amount of saving achieved by the SPRT relative to the corresponding optimum fixed-sample test for the same hypotheses. If the optimum fixed-sample size is n(α, β), the relative efficiency of the SPRT at θ ∈ Θ is defined by
R_e(θ) = n(α, β)/E_θ(N),   (3.6.1)

where N is the stopping time of the SPRT. We note that 100[R_e(θ) − 1]/R_e(θ) is the average percentage saving achieved by the SPRT over the optimum fixed-sample size test when θ is the true value of the parameter. In particular, let X be normal with mean μ and known variance σ², where we shall also assume that |μ1 − μ0| is small so that the approximations for E_{μ_i}(N) (i = 0, 1) in (2.4.2) and (2.4.3) are reasonable. For the fixed-sample size case, one can easily verify that if we reject H0 when Σ_{i=1}^n X_i > c, then

n(α, β) = [σ²(z_{1−α} + z_{1−β})²/(μ1 − μ0)²] + 1,   (3.6.2)

provided α + β < 1, where z_γ is defined by Φ(z_γ) = γ and [x] denotes the largest integer contained in x. Thus, it follows from (2.4.2), (2.4.3) and (2.10.2) that
R_e(μ0) = (z_{1−α} + z_{1−β})² / {2[(1 − α) ln((1 − α)/β) − α ln((1 − β)/α)]},   (3.6.3)

and R_e(μ1) is obtained from (3.6.3) by interchanging α and β. In particular, when α = β,

R_e(μ0) = R_e(μ1) = 2 z²_{1−α} / [(1 − 2α) ln((1 − α)/α)].   (3.6.4)

The following table shows the approximate values of the efficiency of the SPRT relative to the fixed-sample size procedure for the normal mean when σ is known. Note that (R_e − 1)100/R_e indicates the percentage of savings by the SPRT.
Table 3.6.1

α                 .005    .01     .05     .1
R_e               2.540   2.411   2.042   1.864
(R_e − 1)100/R_e  60.6    58.5    51.0    46.4
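The entries of Table 3.6.1 follow from (3.6.4); the sketch below recomputes them (agreement is to within the rounding of the normal quantiles used in the table):

```python
import math
from statistics import NormalDist

def relative_efficiency(alpha):
    # (3.6.4): Re = 2 z^2 / [(1 - 2*alpha) * ln((1 - alpha)/alpha)],
    # with z = z_{1-alpha} the standard normal quantile, for alpha = beta.
    z = NormalDist().inv_cdf(1.0 - alpha)
    return 2.0 * z * z / ((1.0 - 2.0 * alpha) * math.log((1.0 - alpha) / alpha))

for a in (0.005, 0.01, 0.05, 0.1):
    re = relative_efficiency(a)
    print(a, round(re, 3), round(100.0 * (re - 1.0) / re, 1))
```

For α = .05 this gives R_e ≈ 2.042, matching the table, and the computed values decrease in α as expected.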
From Table 3.6.1 one can reasonably guess that the percentage saving would be at least 40 for all combinations of α and β. It should be noted that the expressions in (3.6.3) serve as upper bounds for R_e, because we have used expressions for E_{μ0}(N) and E_{μ1}(N) which are essentially lower bounds. (For instance, see Theorem 2.6.3.) Under certain regularity conditions on the probability or density function f(x; θ), Paulson (1947) has shown that, when θ1 is close to θ0, the efficiency of the SPRT relative to the optimum fixed-sample size procedure is free of the particular form of f(x; θ) and the particular values θ0 and θ1. It should be noted that θ could be vector-valued.

Theorem 3.6.1 (Paulson, 1947). Let {X_i} be a sequence of i.i.d. random variables having f(x; θ) for the common probability or density function, where θ is assumed to be scalar. For θ in the neighborhood of θ0 and θ1, assume that E_θ(1) = ∫ f(x; θ) dx can be differentiated twice with respect to θ under the integral sign, and that ∂² ln f(x; θ)/∂θ² is uniformly continuous in θ. Then the formulae for R_e(θ0) and R_e(θ1) of the SPRT of θ = θ0 against θ = θ1 relative to the most powerful fixed-sample procedure are respectively given by (3.6.3) and (3.6.4) when Δ = θ1 − θ0 tends to zero.
Proof. We shall show that R_e(θ0) tends to the expression in (3.6.3), and an analogous argument leads one to assert that R_e(θ1) tends to the expression in (3.6.4).
3.6. THE EFFICIENCY OF THE SPRT

Let us denote the information contained in X about θ_j by

I(θ_j) = E_{θ_j}[∂ ln f(X; θ_j)/∂θ]².
Expanding in a Taylor series we have

Z = ln[f(X; θ1)/f(X; θ0)] = Δ ∂ln f(X; θ0)/∂θ + (Δ²/2) ∂²ln f(X; θ*)/∂θ²,

where θ0 ≤ θ* ≤ θ1, and similarly for Z². Since

E_{θ0}[∂ln f(X; θ0)/∂θ] = 0 and E_{θ0}[∂²ln f(X; θ0)/∂θ²] = -I(θ0),

we have

E_{θ0}(Z) = -(1/2)Δ²I(θ0) + o(Δ²),

using the first expansion, and

var_{θ0}(Z) = Δ²I(θ0) + o(Δ²),

using the second expansion, because the first and second derivatives of ∂ln f(x; θ)/∂θ are uniformly continuous in θ in the neighborhood of θ0. Hence from (2.4.2) we have

E_{θ0}(N) = 2[(1 - α) ln((1 - α)/β) + α ln(α/(1 - β))][1 + o(1)]/[Δ²I(θ0)].

Notice that o(1) → 0 as Δ → 0. Next, let n(α, β) denote the sample size required by the most powerful nonsequential test, which will be based on S_n = Σ_{i=1}^n Z_i. Now, n(α, β) = n is determined by
Since |Δ| is small, n will be large and hence the central limit theorem is applicable. Thus

√n [K - E_{θ0}(Z)] / [var_{θ0}(Z)]^{1/2} = z_{1-α}

and

√n [K - E_{θ1}(Z)] / [var_{θ1}(Z)]^{1/2} = -z_{1-β},

where K = c/n. Solving for n(α, β), we obtain

n(α, β) = {z_{1-α}[var_{θ0}(Z)]^{1/2} + z_{1-β}[var_{θ1}(Z)]^{1/2}}² / [E_{θ1}(Z) - E_{θ0}(Z)]².   (3.6.6)

Now, as Δ → 0, I(θ1) → I(θ0). Thus, considering n(α, β)/E_{θ0}(N), we obtain (3.6.3). The expression for R_e(θ1) can be obtained by interchanging the roles of α and β in (3.6.3) (because interchanging the roles of θ0 and θ1 is equivalent to interchanging the roles of α and β). Thus, the percentage savings indicated in Table 3.6.1 will also apply to the SPRT of θ = θ0 against θ = θ1 for arbitrary f(x; θ) provided |θ1 - θ0| is small.
3.6.2 Relative Efficiency at θ ≠ θ0, θ1
Although the null and alternative hypotheses are simple, it is conceivable that the unknown value of θ is different from θ0 and θ1. The OC function of the SPRT at values of θ lying between θ0 and θ1 is usually not of much interest, since one is indifferent as to whether θ = θ0 is accepted or rejected. The performance of the ASN function at such values, however, is of considerable interest, since the optimum property of the SPRT holds only at θ = θ0 or θ1. If E_θ(N) is a continuous function of θ, one would expect the results of the preceding subsection to hold for θ in the neighborhood of θ0 and θ1. In general, if α and β are not too small, then sup_θ E_θ(N) < n(α, β). Whether the maximum ASN of the SPRT is less than n(α, β) can be easily verified in a given situation by using Wald's approximations. We note that it is quite possible that E_θ(N) > n(α, β) for all θ ∈ Θ′, where Θ′ ⊂ Θ. In this case, if Θ ⊂ R, then Θ′ is typically an interval of values located between θ0 and θ1. We illustrate this feature by Wald's SPRT and the optimum fixed-sample size test of the normal mean μ with known variance σ², namely H0 : μ ≤ μ0 against H1 : μ ≥ μ1, -∞ < μ0 < μ1 < ∞. Let α = β < 1/2. The monotonicity and the
symmetry of the OC function yield E_μ(N) < E_{μ0}(N) = E_{μ1}(N) < n(α, α) for μ < μ0 or μ > μ1, and sup_μ E_μ(N) = E_{μ̄}(N), μ̄ = (μ0 + μ1)/2. From (3.6.2) and the relations
Z = (μ1 - μ0)(2X - μ1 - μ0)/(2σ²),

where Z is normally distributed with mean (μ1 - μ0)(2μ - μ1 - μ0)/(2σ²) and variance (μ1 - μ0)²/σ², we obtain

sup_μ E_μ(N) = E_{μ̄}(N) = σ²[ln((1 - α)/α)]²/(μ1 - μ0)²   (3.6.7)

(after noting that A = (1 - α)/α, h(μ̄) = 0 and using (2.5.8)). Consequently
inf_μ R_e(μ) = n(α, α)/sup_μ E_μ(N) = [2z_{1-α}/ln((1 - α)/α)]² = ψ(α),  for 0 < α < 1/2.   (3.6.8)

We present in Table 3.6.2 the values of ψ(α) for some values of α. From Table 3.6.2 we guess that inf_μ R_e(μ) is monotonically increasing. It is easy to verify (noting that dz/dα = -1/φ(z) and using l'Hospital's rule) that

lim_{α→0} inf_μ R_e(μ) = 0  and  lim_{α→1/2} inf_μ R_e(μ) = π/2.
Table 3.6.2 Approximate Values of inf_μ R_e(μ) for Testing the Normal Mean (σ Known)

α               .005    .01     .05     .10
inf_μ R_e(μ)    .950    1.028   1.250   1.357

In order to establish the monotonicity of inf_μ R_e(μ), it is sufficient to examine the derivative of (1/2)[ψ(α)]^{1/2}. Hence, it suffices to show that h(α) ≥ 0 for 0 < α < 1/2, where
h(α) = zφ(z) - α(1 - α) ln((1 - α)/α)  and  z = z_{1-α}.

Set

g(z) = ln(Φ(z)/[1 - Φ(z)]),

so that, since α = 1 - Φ(z), we have α(1 - α) = Φ(z)[1 - Φ(z)] and h(α) = zφ(z) - Φ(z)[1 - Φ(z)]g(z).
Then, we can rewrite h(α) as

h(α) = zφ(z) - Φ(z)[1 - Φ(z)] z g′(z*) = zΦ(z)[1 - Φ(z)] {φ(z)/(Φ(z)[1 - Φ(z)]) - φ(z*)/(Φ(z*)[1 - Φ(z*)])},

using the mean value theorem and the fact that g(0) = 0, where 0 ≤ z* ≤ z. This is possible because g(z) is continuous and differentiable. Thus, in order to show that h(α) ≥ 0, it suffices to show that φ(z)/(Φ(z)[1 - Φ(z)]) is nondecreasing in z ≥ 0. Also, since φ(z) > 0 and Φ(z)[1 - Φ(z)] > 0 for all positive finite z, and since

d/dz {φ(z)/(Φ(z)[1 - Φ(z)])} = -φ(z)H(z)/(Φ(z)[1 - Φ(z)])²,

equivalently it suffices to show that H(z) ≤ 0 for z ≥ 0, where

H(z) = zΦ(z)[1 - Φ(z)] - φ(z)[2Φ(z) - 1].

Notice that H(0) = 0 and H(∞) = 0, and

H′(z) = Φ(z)[1 - Φ(z)] - 2φ²(z).

Since H′(0) < 0, H(z) is decreasing near z = 0. Also, for large z, since Φ(z)[1 - Φ(z)] behaves like φ(z)/z while 2φ²(z) tends to zero faster, we have

H′(z) > 0 for all sufficiently large z.   (3.6.9)

Also,

H(z̃) = 0 for some z̃ > 0 implies that H′(z̃) > 0,   (3.6.10)

because H(z̃) = 0 implies that z̃Φ(z̃)[1 - Φ(z̃)] = φ(z̃)[2Φ(z̃) - 1]. Hence

H′(z̃) = φ(z̃)[2Φ(z̃) - 1]/z̃ - 2φ²(z̃) = (φ(z̃)/z̃){[2Φ(z̃) - 1] - 2z̃φ(z̃)} > 0,

by applying the mean value theorem (2Φ(z̃) - 1 = 2z̃φ(z̃*) > 2z̃φ(z̃)) and noting that 0 < z̃* < z̃. Now H(z) cannot intersect the z-axis (for z > 0) exactly once, because then H′(z) would have to be negative for large z, which contradicts (3.6.9). Also, H(z) cannot intersect the z-axis more than once, because H′(z) would have to be negative at the second intersection, and this cannot happen because of (3.6.10). In Table 3.6.3 we give some values of h(α). By using Newton's method, it is easily seen that the root of the equation ψ(α) = 1 lies between α = .0080 and α = .0081. Consequently, for α ≤ .008, Wald's SPRT is less efficient than the optimum fixed-sample size test for μ near (μ0 + μ1)/2.
Table 3.6.3 Some Values of h(α)

α       .01      .05      .10      .20      .25
h(α)    1.60 ×   2.99 ×   2.73 ×   3.69 ×   5.71 ×

α       .30      .40      .45      .49
h(α)    4.80 ×   4.46 ×   2.73 ×   1.69 ×
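The ψ(α) values of Table 3.6.2 and the location of the root of ψ(α) = 1 near α ≈ .008 can be checked directly. A sketch (assuming the form (3.6.8) of ψ and stdlib `statistics.NormalDist`; bisection is used here in place of Newton's method):

```python
import math
from statistics import NormalDist

def psi(alpha: float) -> float:
    """psi(alpha) = inf_mu R_e(mu) of (3.6.8) for the symmetric (alpha = beta) SPRT."""
    z = NormalDist().inv_cdf(1 - alpha)
    return (2 * z / math.log((1 - alpha) / alpha)) ** 2

for a in (0.005, 0.01, 0.05, 0.10):
    print(f"alpha = {a}: psi(alpha) ~ {psi(a):.3f}")

# locate the root of psi(alpha) = 1 by bisection; psi(lo) < 1 < psi(hi)
lo, hi = 0.001, 0.05
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if psi(mid) < 1 else (lo, mid)
print(f"psi(alpha) = 1 near alpha ~ {lo:.4f}")
```

The printed root lies between .0080 and .0081, in agreement with the text.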
3.6.3 Limiting Relative Efficiency of the SPRT

The limiting relative efficiency of the SPRT has also been studied by Chernoff (1956), Aivazian (1959) and Bechhofer (1960). The statement "The SPRT often results in an average saving of 50 percent in sample size" needs to be qualified. Bechhofer (1960) has studied the limiting relative efficiency of the SPRT for the normal case when the error probabilities are related to each other and they both tend to zero. He brings out some of the surprises that are never anticipated. In the following we shall present his results. Let X1, X2, ... be an i.i.d. sequence of normal variables having unknown mean μ and known variance σ². We wish to test the hypothesis H0 : μ = μ0 against the alternative H1 : μ = μ1 (μ1 > μ0). Let α and β be the bounds on the error probabilities such that 0 < α, β < 1 and α + β < 1. Bechhofer (1960) studied the efficiency (measured in terms of the ratio of average sample size to the fixed sample size) of the SPRT relative to the best competing fixed-sample procedure as α, β approach zero in a specified manner. Denote this efficiency by R_e(α, β, δ, δ*), where δ = [2μ - (μ0 + μ1)]/(2σ) and δ* = (μ1 - μ0)/(2σ). Hence μ = μ0 and μ = μ1 are equivalent to δ = -δ* and δ = δ* respectively. Then

(3.6.11)
In particular, when the error probabilities are related by β = cα (c > 0 fixed),

lim_{α→0} R_e(α, cα, δ, δ*) = δ*/(4|δ|) = (μ1 - μ0)/(4|μ0 + μ1 - 2μ|).   (3.6.12)

Thus in the limit the relative efficiency given by (3.6.12) tends to zero as μ → ±∞, and is equal to 1/4 when μ = μ0 or μ = μ1. It is greater than unity when (5μ0 + 3μ1)/8 < μ < (3μ0 + 5μ1)/8, and it is infinite if μ = (μ0 + μ1)/2. The relative efficiency of 1/4 when μ = μ0 or μ1 was previously noted by Chernoff (1956, p. 19), and the result in (3.6.11) for δ = δ* has been obtained by Aivazian (1959). Both Chernoff and Aivazian considered the general problem of testing a simple hypothesis versus a simple alternative and studied the limiting relative efficiency as the two hypotheses approach each other.
3.7 Bayes Sequential Procedures
In this section we will give Bayes sequential procedures for the binomial proportion, normal mean and some asymptotic Bayes sequential procedures.
3.7.1 Bayes Sequential Binomial SPRT

Suppose that batches of items are available for inspection, and the items must be classified as either effective or defective. Let p denote the fraction of defective items in a batch. Assume that there exists a known critical fraction defective p0 such that a batch with fraction p0 may without loss be accepted or rejected. Batches with p > p0 are considered bad and should be rejected. Let P(p) denote the prior distribution for the fraction of defective items in batches available for inspection and assume that it is of the following type:

P(p = p1) = a1,  P(p = p2) = a2,  a1 + a2 = 1,  p1 < p0 < p2.   (3.7.1)

Let W21 (W12) denote the loss incurred if a batch with fraction defective p1 (p2) is rejected (accepted). Let c denote the cost of inspecting a single item. Then we wish to determine the most economical sampling inspection plan. Vagholkar and Wetherill (1960) give a method based on the basic theory developed by Barnard (1954), which will be presented below. Because of the special form of the prior distribution, the problem of acceptance sampling is reduced to that of testing two simple hypotheses Hi (i = 1, 2), where Hi means that a batch comes from a population with fraction defective pi (i = 1, 2). We accept a batch if H1 is accepted and we reject the batch if H2 is accepted. In the sampling plan one item is inspected at a time and inspection is stopped as soon as sufficient evidence is accumulated in favor of either of the hypotheses. If the cost of the inspection depends merely on the total number of items inspected and no extra cost is involved due to sequential sampling, a sequential plan will be the most economical one. Also, because of the optimum property of the SPRT, the latter will be the optimum test procedure when the cost of inspection is linear in the number of items inspected. The optimum procedure is given by:

(i) continue inspection as long as

λ2 < Λ = (a1/a2) l(X, Y) < λ1,

(ii) stop inspection and accept the batch as soon as Λ ≥ λ1,

(iii) stop inspection and reject the batch as soon as Λ ≤ λ2,   (3.7.2)
where X and Y denote respectively the number of effectives and defectives obtained at any stage, Λ represents the ratio of the weighted likelihoods, and l(X, Y) is the likelihood ratio given by (p1/p2)^Y (q1/q2)^X, qi = 1 - pi (i = 1, 2). It is more convenient to write Inequality (3.7.2) as

Z < l(X, Y) < V,   (3.7.3)

where V = a2λ1/a1 and Z = a2λ2/a1. Let L(λ1, λ2; p) denote the probability that a batch with fraction defective p will be accepted, and n(λ1, λ2; p) denote the average number of items required to be inspected. The decision boundary (as pointed out by Barnard (1954)) is the set of points for which the expected cost of taking an immediate decision is equal to the expected cost of taking at least one more observation and continuing the test. For example, when Λ = λ1, we write down the expected cost of accepting as a function of the prior probabilities and decision costs: if we take another observation, then an effective item leads to acceptance of the batch, incurring a decision cost if H2 is true, and a defective item will lead us to a point where we either continue the test or reject the batch immediately. Then λ1 is that value for which these two costs, of accepting immediately and of taking one more observation and continuing the test if necessary, are equal. It should also be noted that, in general, the difference between the expected cost of an immediate decision and the expected cost of continuing the test should decrease as the sample size increases. For any Λ, it can easily be verified that the posterior probabilities associated with H1 and H2 are Λ/(1 + Λ) and 1/(1 + Λ) respectively. Now we are ready to derive the equations for the optimum values of λ1 and λ2. Consider Λ = λ1. Using Bayes Theorem one can show that the posterior probability of H1 is Λ/(1 + Λ).
Thus when Λ = λ1, the posterior probability that H1 is true is λ1/(1 + λ1) and the posterior probability that H2 is true is 1/(1 + λ1). Then, we accept H1 and thereby incur a loss W12 if H2 is true. Thus, the expected loss due to an immediate decision is W12/(1 + λ1). On the other hand, if we inspect one more item, incurring a cost c, and continue the procedure thereafter, we accept the batch if the next item is an effective one, since λ1 q1/q2 > λ1. The cost W12 will be incurred if H2 is true, so that the expected loss is
W12 P(H2 is true) P(next item is effective | H2 is true) = W12 q2/(1 + λ1).
If the next item is a defective, the likelihood ratio will be λ1 p1/p2, and we shall continue with a (new) SPRT starting at this point. If the number of effective and defective items obtained with this new test is (X′, Y′), then we continue sampling so long as the corresponding SPRT inequalities hold. Clearly, if λ2/λ1 ≥ p1/p2 we reject the batch, so that sampling continues only if λ2/λ1 < p1/p2. The expected cost due to this continuation, if Hi is true and ρ < δ, where δ = p1/p2 and ρ = λ2/λ1, involves the OC and ASN functions L and n of the continued test. If ρ > δ, we reject the batch, and the cost is W21 p1 λ1/(1 + λ1). Now, equating the expected loss of an immediate decision to the expected loss if at least one more item is inspected, we have, for ρ < δ,

(3.7.4)

and, for ρ > δ,

W12/(1 + λ1) = c + W21 p1 λ1/(1 + λ1) + W12 q2/(1 + λ1).   (3.7.5)

Solving for λ1 from (3.7.4) we get, for ρ < δ,

(3.7.6)

and for ρ > δ, from (3.7.5) we have

λ1 = (W12 p2 - c)/(W21 p1 + c).   (3.7.7)
A similar argument leads to the following equation for λ2:

λ2 = [W12 q2 L(· ; p2) + c + q2 c L(· ; p2)] / [W21 q1 L(· ; p1) - c - q1 c L(· ; p1)],   (3.7.8)

where the suppressed arguments of L are the boundaries of the continued test.
Dividing (3.7.8) by (3.7.6) we have, for ρ < p1/p2,

(3.7.9)

and dividing (3.7.8) by (3.7.7) we have, for ρ > p1/p2,

(3.7.10)
Using Equations (3.7.9) and (3.7.10) one can solve for ρ by the method of iteration. The usual way of determining the boundaries of a SPRT is to use Wald's approximate formulae, which assume the error probabilities to be small; this is not always true if the sampling inspection plans are designed on a minimum cost basis. Hence, we would prefer some type of exact formulae that are useful from a practical point of view. Burman (1946) provided such formulae, which will be discussed next. The SPRT for a binomial population as defined by (3.7.3) can be reduced (without much loss in accuracy) to a scoring procedure given by

(3.7.11)

where b = ln(p2/p1)/ln(q1/q2). If b, M1 and M2 are rounded off to the nearest integers, the SPRT reduces to the following scoring scheme. Start with a score M2, and add one to the score for each effective item found, subtracting b for each defective item found. Reject the batch if the score is zero or less, and accept the batch if the score reaches 2M = M1 + M2. Formulae for the ASN and OC functions of such a scheme have been given by Burman (1946), which are exact if b, M1 and M2 are integers. The error involved in rounding them off is small if b exceeds ten, which is often satisfied in practice. One can express h(ρ) and h1(ρ) given by Equations (3.7.9) and (3.7.10) in terms of the score notation by replacing the arguments of L and n by their score equivalents, and

n(b, ρb; pj) by n(b, 2M - b; pj), for j = 1, 2,

where b and 2M = -ln ρ/ln(q1/q2) are rounded off to the nearest integers.
The Methods of Calculation

In practice, fix the values of a1, a2, p1, p2, W12, W21 and c, and then compute the value of b. In order to solve Equations (3.7.9) and (3.7.10) (which are expressed in terms of the score notation) we start with some guessed value for ρ and then iterate until we get the same value of ρ. In the right-hand side expressions of the modified versions of (3.7.9) and (3.7.10), ρ enters through 2M, which is an integer, so that when we get a ρ which gives rise to the same value of 2M that was used in the previous iteration, we stop and take that as our final iterated solution. Once ρ is obtained, λ1 and λ2 can be computed from (3.7.6) and (3.7.8) or (3.7.7) and (3.7.8), as the case may be. The value of ρ is always less than one, and a lower bound for ρ has been derived by Vagholkar (1955), which is given by
(3.7.12)

The lower bound, or any number which is a little higher than the lower bound, can serve as a good first guess for ρ in order to start the iteration.
Example 3.7.1 (Vagholkar and Wetherill, 1960). Let a1 = 5/9, a2 = 4/9, p1 = .01, p2 = .10, W21 = 400, W12 = 500 and c = 1. We get b = 24 and by (3.7.12) we have .00065 < ρ < 1. Starting with a guessed value of .001 for ρ, the successive values of ρ obtained are

ρ:    .001 → .001607 → .001600
2M:   72   → 68      → 68

This gives λ1 = 30.5, λ2 = .0489, and the optimum test procedure (3.7.11) is given by M2 = 34, 2M = M1 + M2 = 68, b = 24.
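The resulting scoring scheme is easy to simulate. The sketch below is not from the book (the helper name is ours, and Monte Carlo is used in place of Burman's exact OC and ASN formulae); it uses the boundaries just computed, M2 = 34, 2M = 68, b = 24:

```python
import random

def score_scheme(p, m2=34, two_m=68, b=24, rng=None):
    """One run of the scoring scheme of Example 3.7.1: start at M2, add 1 per
    effective item, subtract b per defective; reject at score <= 0, accept on
    reaching 2M.  Returns (accepted, items_inspected)."""
    rng = rng or random
    score, n = m2, 0
    while 0 < score < two_m:
        n += 1
        score = score - b if rng.random() < p else score + 1
    return score >= two_m, n

random.seed(1)
for p in (0.01, 0.10):
    runs = [score_scheme(p) for _ in range(10000)]
    acc = sum(a for a, _ in runs) / len(runs)
    asn = sum(n for _, n in runs) / len(runs)
    print(f"p = {p}: acceptance probability ~ {acc:.3f}, ASN ~ {asn:.1f}")
```

A good batch (p = p1 = .01) is accepted with probability near one, while a bad batch (p = p2 = .10) is almost always rejected, as the plan intends.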
Remark 3.7.1 It has been assumed that the cost of inspection is proportional to the number of items inspected. Also, the difficulty in carrying out a SPRT arises in that a decision is required after every inspection of an item as to whether to continue sampling or pass a judgement on the batch. This can be taken care of in the cost function by adding an extra term, namely Dd, where D denotes the number of times the decision to continue sampling or pass a judgement on the batch has to be made, and d denotes the cost associated with stopping the sampling in order to make this decision. Further, one can also add on the cost of processing the items sampled. This may be equal to Tt, where T denotes the number of times the process is carried out and t denotes the cost associated with each processing. Thus, the total cost becomes

cost = nc + Dd + Tt.   (3.7.13)
In this case the optimum test will be a SPRT with items sampled in groups (not necessarily of a constant size). If the constant group size is k, then (3.7.13) becomes

cost = nc + (n/k)d + (n/k)t = nc′,  c′ = c + d/k + t/k,   (3.7.14)

which is linear in n. Two particular results apply to group sequential sampling which follow from the results of Wald and Wolfowitz (1948). These are stated as lemmas.
Lemma 3.7.1 The risk associated with a group sequential sampling scheme is greater than or equal to the risk for a unit-step sequential scheme, for given prior probability and loss functions.
Lemma 3.7.2 The optimum boundaries for group sequential sampling are within the optimum boundaries for the unit-step sequential sampling having the same prior probability and loss functions.

Remark 3.7.2 The above lemmas are useful in practice, because for group sequential sampling we can derive the optimum boundaries from unit-step sequential sampling using c′, the artificial cost per item given in (3.7.14). These boundaries will contain the true optimum boundaries. If the group size is large, and if we can replace the two-point binomial distribution by an equivalent two-point normal distribution, a more exact solution is possible.

Remark 3.7.3 The three-point binomial distribution is difficult to handle mathematically. However, Vagholkar and Wetherill (1960) and Wetherill (1957) provide a result pertaining to the three-point binomial distribution.
3.7.2 Dynamic Programming Method for the Binomial Case

In the previous work of Vagholkar and Wetherill (1960) the set of values of p is replaced by two special values of p, namely p1 and p2, such that one decision is clearly appropriate for p1 and the second decision is appropriate for p2. This approach is somewhat unsatisfactory and is an oversimplification of the problem. For the binomial problem Lindley and Barnett (1965) have given an optimal Bayes sequential procedure that can be implemented numerically by the backward induction techniques of dynamic programming. This will be described in the following. Without loss of generality, assume that the losses are given by

L1(p) = k_r,  L2(p) = p,   (3.7.15)

where, without affecting the problem, the scales of losses have been changed, and Li(p) is the loss associated with the terminal decision di (i = 1, 2). We shall also assume that 0 < k_r < 1; otherwise the problem is trivial (if k_r lies outside
this range, then one decision is optimum for all p and no sampling is necessary). We call k_r the critical value or the break-even value of p. If p < k_r, d2 is the optimum decision, and if p > k_r, d1 is optimum, where for convenience we let d1 be the decision to reject and d2 the decision to accept. This agrees with the industrial application to sampling inspection, where Xi = 1 if the ith individual item in the batch is defective. Let c be the constant cost of sampling an item. Notice that we cannot scale the cost of sampling since we have already scaled the loss functions. The prior distribution is the conjugate prior, namely the beta family, given by

[(a + b - 1)!/((a - 1)!(b - 1)!)] p^{a-1}(1 - p)^{b-1},  a, b > 0.   (3.7.16)
If the optimum scheme is tabulated for a = b = 1 (that is, when the prior density of p is uniform), then the optimum scheme can be known for any beta prior distribution with positive integral a and b for the following reason: the tabulated scheme for a = b = 1 tells us what to do if n observations are taken out of which r are found to be defective. Then, since the likelihood of p is proportional to p^r(1 - p)^{n-r}, the posterior distribution of p is proportional to the likelihood of p. Hence, the situation is the same as that of starting with a = r, b = n - r, and no further tabulation for this prior distribution is necessary. Therefore, the tables depend only on the parameters k_r and c.
3.7.3 The Dynamic Programming Equation

If the current probability distribution of p is beta, given by (3.7.16), then the expected loss is k_r if d1 is taken, and is equal to E(p) = a/(a + b) if d2 is taken. Hence, if only two terminal decisions are considered, we reject (take d1) if k_r < a/(a + b) and otherwise accept (take d2). If the prior distribution has a = b = 1, then this current distribution is obtained by observing r = (a - 1) defectives (Xi = 1) and n - r = (b - 1) effectives (Xi = 0). In this case we reject if k_r < (r + 1)/(n + 2). One can plot the schemes on a diagram with (a + b) along the horizontal axis and a along the vertical axis. The line k_r = a/(a + b) is called the critical line. Above it the optimum decision among all the terminal decisions is to reject; below it, it is better to accept. Then the loss incurred due to this terminal decision is given by

D(a, b) = min{k_r, a/(a + b)}.   (3.7.17)
One of the other possibilities, besides making a terminal decision, is to take one observation and then choose among the terminal decisions. Amongst all the possibilities there is one procedure that has the smallest expected loss, and this will be called best. Let B(a, b) be the expected loss of the best possible procedure when the prior distribution has values (a, b). If B(a, b) = min[k_r, a/(a + b)], then
the best procedure is to stop and take that terminal decision with the smaller loss. If B(a, b) < min[k_r, a/(a + b)], take at least one more observation and then proceed as follows. Let B*(a, b) denote the expected loss if one further observation is made, followed by the optimum procedure. If the observation is defective, then a and (a + b) will increase by 1. Hence, the expected loss obtained by adopting the optimum procedure after the observation will be B(a + 1, b). If the observation is an effective one, the expected loss will be B(a, b + 1). Consequently, if one observation is taken when the prior state of knowledge is (a, b), we have

B*(a, b) = c + [a/(a + b)] B(a + 1, b) + [b/(a + b)] B(a, b + 1).   (3.7.18)
Once we know D(a, b) and B*(a, b), the equation for B(a, b) is, since stopping or taking one more observation are the only possibilities,

B(a, b) = min[D(a, b), B*(a, b)].   (3.7.19)
If B(a, b) is known for all a, b with a + b = z0 (say), then (3.7.19) enables one to find B(a, b) for all a, b with a + b = z0 - 1. Consequently, B(a, b) is known for all a, b with z0 - a - b equal to an integer. Once B(a, b) is known, the optimum procedure at (a, b) is easily found by accepting if it is equal to a/(a + b), rejecting if it is equal to k_r, and otherwise taking one more observation. Thus, each point can be labelled as acceptance, rejection or continuation. According to Lindley and Barnett (1965) it can be shown that for fixed (a + b) the continuation region is an interval (possibly empty) between the two boundaries, with the acceptance region below and the rejection region above, and that for all sufficiently large (a + b) the interval of the continuation region is empty. Therefore, there is a least upper bound to the values of a + b in the continuation region, at which the rejection and continuation boundaries meet on the critical line. This meeting point (call it (ā, b̄)) will satisfy the following relation:

k_r = c + [ā/(ā + b̄)] k_r + [b̄/(ā + b̄)] · ā/(ā + b̄ + 1).   (3.7.20)

Letting s̄ = ā + b̄ (so that ā = k_r s̄ on the critical line) and solving, we obtain

s̄ = k_r(1 - k_r)/c - 1.   (3.7.21)

Equation (3.7.21) gives the upper bound beyond which it is never worth taking further observations. From a practical point of view, it is sufficient to start from the highest reachable point. The authors discuss how to find the highest reachable point. They also provide a computational method. They include a discussion about the OC and ASN functions of their procedure.
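Equations (3.7.17)-(3.7.19), together with the stopping bound (3.7.21), give a complete numerical recipe. A minimal sketch (not from Lindley and Barnett; k_r and c are illustrative choices, and the function names are ours):

```python
from functools import lru_cache

K_R, C = 0.5, 0.001                      # illustrative break-even value and sampling cost
S_BAR = K_R * (1 - K_R) / C - 1          # (3.7.21): beyond a + b = s_bar, always stop

@lru_cache(maxsize=None)
def best_loss(a: int, b: int) -> float:
    """B(a, b) of (3.7.19): expected loss of the best procedure at state (a, b)."""
    d = min(K_R, a / (a + b))            # D(a, b) of (3.7.17): best terminal loss
    if a + b > S_BAR:
        return d                         # never worth sampling past (3.7.21)
    b_star = (C + a / (a + b) * best_loss(a + 1, b)
                + b / (a + b) * best_loss(a, b + 1))   # B*(a, b) of (3.7.18)
    return min(d, b_star)

b11 = best_loss(1, 1)
efficiency = 2 * (K_R - b11) / K_R ** 2  # EVSI/EVPI ratio, cf. (3.7.22) below
print(f"B(1,1) = {b11:.4f}, efficiency E = {efficiency:.3f}")
```

The memoized recursion works backward from the boundary a + b > s̄, exactly as in the backward induction described above.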
The Efficiency of Their Procedure
If c were equal to zero, we could sample indefinitely until p were known. Then the value of B(1, 1) is

B(1, 1) = ∫₀¹ min(k_r, p) dp = k_r - k_r²/2.
The initial expected loss when a = b = 1 is k_r (if k_r ≤ 0.5), obtained by rejecting. The difference between these two equals k_r²/2, which is the expected value of perfect information (EVPI) in the sense of Raiffa and Schlaifer (1961). Similarly, for any other value of c, k_r - B(1, 1) denotes the expected value of sample information (EVSI). Thus a measure of the efficiency of the optimum scheme is the ratio of the EVSI to the EVPI, given by
E = 2[k_r - B(1, 1)]/k_r².   (3.7.22)

This measure is a better criterion for the performance of the scheme than B(1, 1), which depends for its interpretation on k_r and c. The following table lists the efficiencies for various values of k_r and c.

Table 3.7.1 Showing the Values of E = 2[k_r - B(1, 1)]/k_r²

k_r \ c   .05    .01    .0³1   .0⁴1   .0⁵1   .0⁶1
.1        .00    .00    .66    .91    .98    1.00
.2        .00    .52    .86    .97    .99    1.00
.3        .00    .15    .72    .93    .98    1.00
.4        .00    .45    .82    .95    .99    1.00
.5        .27    .61    .88    .97    .99    1.00

For fixed k_r, the efficiency naturally increases as the sampling cost c decreases. The limiting value of 1 is approached much more slowly for small k_r than for values of k_r near 1/2. Lindley and Barnett (1965) provide normal approximations to the boundaries, which are consistent with the results of Moriguti and Robbins (1962) and Chernoff (1960). Lindley and Barnett (1965, Section 16) also consider the normal case with H0 : μ > 0 against H1 : μ ≤ 0 and known variance, where μ has a normal prior distribution. The problem of sequential sampling has been considered in great generality for the exponential family by Mikhalevich (1956); in particular, he investigates the circumstances under which the optimum schemes terminate.
3.7.4 Bayes Sequential Procedures for the Normal Mean

This problem has been considered by Chernoff in a series of papers. The review paper by Chernoff (1968) summarizes the results pertaining to the Bayes sequential testing procedure for the mean of a normal population. We present these
results, and for details the reader is referred to the references given at the end of the review paper by Chernoff (1968). The problem can be formulated as follows. Let X be a normal random variable having unknown mean μ and variance σ². We wish to test H0 : μ ≥ 0 against H1 : μ < 0, with the cost of an incorrect decision being k|μ|, k > 0. Let c denote the cost per observation. The total cost is cn if the decision is correct and cn + k|μ| if it is wrong, where n is the number of observations taken. Thus, the total cost is a random variable whose distribution depends on the unknown μ and the sequential procedure used. The problem is to select an optimal sequential procedure. After much sampling, one is either reasonably certain of the sign of μ, or else |μ| is so small that the loss of a wrong decision is less than the cost of another observation. Here one expects the proper procedure to be such that one stops and makes a decision when the current estimate of |μ| is sufficiently large, and continues sampling otherwise. The adequacy of the largeness of |μ| depends on, and should decrease with, the number of observations, or equivalently the precision of the estimate. It can be shown that after a certain sample size it pays to stop irrespective of the current estimate of |μ|. For given values of the constants, this problem can be solved numerically by the backward induction techniques of dynamic programming employed by Lindley and Barnett (1965). However, care must be taken in initiating the backward induction at a sample size n sufficiently large so that, no matter what the estimate of μ is, the optimal procedure will lead to a decision rather than to additional sampling. The technique of the backward induction can be summarized by the equation
(3.7.23)

where ρ(ξn) is the expected cost of an optimal procedure given the history ξn up to stage n, and ξ_{n+1}(δn, ξn) describes the history up to stage n + 1, which may be random, with distribution depending on ξn and the action δn taken at stage n. It is possible to show that ξn is adequately summarized by the mean and variance of the posterior distribution of μ. The evaluation of posterior distributions when dealing with normal random variables and normal priors enables us to treat the problem without much difficulty. If it is desired to have an overall view of how the solutions depend on the various parameters, the simple though extensive numerical calculations of the backward induction are not adequate. A natural approach that is relevant to large-sample theory is that of replacing the discrete time random variables by analogous continuous time stochastic processes. The use of the Wiener process converts the problem to one in which the analytic methods of partial differential equations can fruitfully be used. So let us assume that the data consist of a Wiener process X(t) with unknown drift μ and known variance σ² per unit time. Also assume that the unknown value of μ has prior normal distribution with mean μ0 and variance σ0². One can easily
verify (or see Lemma 4.1 of Chernoff (1968)) that the posterior distribution of μ is again normal with mean Y(s) and variance s, where

Y(s) = [μ0σ0⁻² + X(t)σ⁻²]/[σ0⁻² + tσ⁻²],   (3.7.24)

s = 1/[σ0⁻² + tσ⁻²],   (3.7.25)

and Y(s) is a Wiener process in the -s scale originating at (y0, s0) = (μ0, σ0²); that is,

E[dY(s)] = 0,  var[dY(s)] = -ds.   (3.7.26)

Notice that s decreases from s0 = σ0² as information accumulates. Since the
X process can be recovered from the Y process, it suffices to deal with the latter, which measures the current estimate of μ and which is easier to analyze. The posterior expected cost associated with deciding in favor of H0 at time t (when Y(s) = y) is

k√s ψ⁺(y/√s),   (3.7.27)

where ψ⁺(u) = φ(u) - u[1 - Φ(u)]. Similarly, the posterior expected cost associated with deciding μ < 0 is k√s ψ⁻(y/√s), where ψ⁻(u) = φ(u) + uΦ(u). It is easy to see that if sampling is stopped at Y(s) = y, the decision should be made on the basis of the sign of y, and the expected cost of deciding plus the cost of sampling is given by

d{y, s} = k√s ψ(|y|/√s) + cσ²(s⁻¹ - s0⁻¹),   (3.7.28)

where

ψ(u) = φ(u) - u[1 - Φ(u)].   (3.7.29)

Thus the continuous time version of the problem can be viewed as the following stopping problem: the Wiener process Y(s) is observed; the statistician may stop at any value of s > 0 and pay d{Y(s), s}. Find the stopping procedure which minimizes the expected cost. In this version, using Y(s), the posterior Bayes estimate, the statistical aspects involving the unknown parameter μ have been abstracted away. The original discrete time problem can be described in terms of this stopping problem provided the permissible stopping values of s are restricted to s0, s1, s2, ..., where sn = (σ0⁻² + nσ⁻²)⁻¹. Now it should be straightforward to see that the discrete version can be treated numerically by backward induction in terms of the Y(s) process, starting from any sn ≤ c²/{k²ψ²(0)} = 2πc²/k².
3.8 Small Error Probability and Power One Test
Before we consider power one tests, we give a useful lemma.
Lemma 3.8.1 (Ville (1939) and Wald (1947, p. 146)). For each n ≥ 1, let f_n(x₁, ..., x_n) [f_n*(x₁, ..., x_n)] denote the joint probability density function of X₁, X₂, ..., X_n under H₀ [H₁]. Also let P_i(E) denote the probability of E computed under the assumption that H_i is true (i = 0, 1). Also, let λ_n = f_n*/f_n when f_n > 0. Then, for any ε > 1,

P₀ (λ_n ≥ ε for some n ≥ 1) ≤ 1/ε.   (3.8.1)
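The bound in (3.8.1) can be checked empirically: under H₀ the likelihood ratio λ_n is a nonnegative martingale with mean one, so the chance that it ever exceeds ε is at most 1/ε. A minimal simulation sketch (assuming, purely for illustration, N(0,1) data under H₀ and a N(θ₁,1) alternative, with a finite horizon standing in for "some n ≥ 1"):

```python
import math, random

def crossing_freq(eps, theta1=0.5, horizon=500, reps=2000, seed=1):
    """Estimate P0(lambda_n >= eps for some n <= horizon) under H0,
    where lambda_n = exp(theta1*S_n - n*theta1**2/2) is the likelihood
    ratio of N(theta1, 1) against N(0, 1) data."""
    rng = random.Random(seed)
    log_eps = math.log(eps)
    hits = 0
    for _ in range(reps):
        s = 0.0
        for n in range(1, horizon + 1):
            s += rng.gauss(0.0, 1.0)
            if theta1 * s - n * theta1 ** 2 / 2.0 >= log_eps:
                hits += 1
                break
    return hits / reps

freq = crossing_freq(eps=10.0)
print("crossing frequency:", freq, " Ville bound:", 0.1)
```

The observed crossing frequency stays below the bound 1/ε = 0.1 (slightly below it, because of the overshoot at the discrete crossing times).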
Application. Let the X's be i.i.d. normal (θ, 1), let f_{θ,n}(x₁, x₂, ..., x_n) denote the joint density of X₁, X₂, ..., X_n, and let G(θ) denote a prior distribution of θ. Now, set

f_n*(x₁, x₂, ..., x_n) = ∫_{−∞}^{∞} f_{θ,n}(x₁, x₂, ..., x_n) dG(θ)
 = f_{0,n}(x₁, x₂, ..., x_n) ∫_{−∞}^{∞} exp(θS_n − nθ²/2) dG(θ),   (3.8.2)

where S_n = X₁ + X₂ + ⋯ + X_n. Let

g(x, t) = ∫_{−∞}^{∞} exp(θx − θ²t/2) dG(θ)   (3.8.3)

and replace G(θ) by G(θ√m), where m is an arbitrary positive constant. Then the likelihood ratio λ_n = f_n*/f_{0,n} of (3.8.2) takes the form

λ_n = ∫_{−∞}^{∞} exp(θS_n − θ²n/2) dG(θ√m) = ∫_{−∞}^{∞} exp(θS_n/√m − θ²n/(2m)) dG(θ) = g(S_n/√m, n/m).   (3.8.4)

Thus, for i.i.d. normal (0, 1) variables

P₀ ( g(S_n/√m, n/m) ≥ ε for some n ≥ 1 ) ≤ 1/ε.   (3.8.5)

In order to understand the implication of (3.8.5), let G(θ) = 0 for θ < 0 so that g(x, t) is increasing in x. If A(t, ε) is the positive solution of the equation g(x, t) = ε, then

g(x, t) ≥ ε if and only if x ≥ A(t, ε).   (3.8.6)

Hence (3.8.5) becomes

P₀ ( S_n ≥ √m A(n/m, ε) for some n ≥ 1 ) ≤ 1/ε   (m > 0, ε > 1).   (3.8.7)
Remark 3.8.1 It was shown in Robbins (1970) that (3.8.7) is valid for an arbitrary i.i.d. sequence of random variables provided

E[exp(θX₁)] ≤ exp(θ²/2) for all θ ≥ 0.   (3.8.8)
Example 3.8.1 Let P(X = 1) = P(X = −1) = 1/2. Then

E[exp(θX)] = cosh θ ≤ exp(θ²/2),

so (3.8.8) holds. Robbins (1970) provides some examples where it is possible to give an explicit form for the function A(t, ε).

Example 3.8.2 Let G(θ) be degenerate at θ = 2a > 0. Then

g(x, t) = exp(2ax − 2a²t) ≥ ε if and only if x ≥ at + ln ε/(2a).

Hence (3.8.7) gives, with d = ln ε/(2a),

P₀ ( S_n ≥ an/√m + d√m for some n ≥ 1 ) ≤ exp(−2ad),   (a, d, m > 0).
Example 3.8.3 If G(y) is the folded standard normal distribution, then Robbins (1970) shows that

P₀ ( S_n ≥ {(n + m)[a² + ln(n/m + 1)]}^{1/2} for some n ≥ 1 ) ≤ exp(−a²/2)/[2Φ(a)],   (a, m > 0).   (3.8.9)
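The boundary in (3.8.9) can also be obtained numerically. For the folded standard normal prior, completing the square gives the closed form g(x, t) = 2(t + 1)^{−1/2} exp(x²/(2(t + 1))) Φ(x/√(t + 1)), so A(t, ε) can be found by bisection and compared with the explicit boundary {(t + 1)[a² + ln(t + 1)]}^{1/2}. A sketch (the choice ε = 2Φ(a)e^{a²/2} is the one that matches that boundary; the illustrative values a = 3, t = 10 are arbitrary):

```python
import math

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def g(x, t):
    # g(x,t) for the folded standard normal prior, in closed form
    return 2.0 / math.sqrt(t + 1.0) * math.exp(x * x / (2.0 * (t + 1.0))) * Phi(x / math.sqrt(t + 1.0))

def A(t, eps, lo=0.0, hi=100.0):
    # positive solution of g(x, t) = eps by bisection (g is increasing in x)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid, t) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a, t = 3.0, 10.0
eps = 2.0 * Phi(a) * math.exp(a * a / 2.0)
numeric = A(t, eps)
explicit = math.sqrt((t + 1.0) * (a * a + math.log(t + 1.0)))
print(numeric, explicit)
```

The two boundaries agree to within a fraction of a percent at these values, the explicit form being the large-x approximation in which Φ(x/√(t + 1)) ≈ Φ(a).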
Tests with small error probability

Let X₁, X₂, ... be an i.i.d. sequence of normal (θ, 1) variables, where θ is an unknown parameter (−∞ < θ < ∞). Suppose we wish to test H₋ : θ < 0 against H₊ : θ > 0 (θ = 0 is excluded). Then, define the stopping time

N = inf {n : |S_n| ≥ c_n}

and accept H₊ [H₋] when S_N ≥ c_N [S_N ≤ −c_N], where

c_n = {(n + m)[a² + ln(n/m + 1)]}^{1/2}.   (3.8.10)

(Here h(x) = x², and note that c_n/n ≈ [(ln n)/n]^{1/2} → 0 as n → ∞.) Now

P_θ (N = ∞) = lim_{n→∞} P_θ (N > n) = lim_{n→∞} P_θ (|S_j| ≤ c_j, j ≤ n) = 0 for θ ≠ 0,

since S_n/n → θ ≠ 0 under H₋ or H₊. Hence P_θ (N < ∞) = 1 for θ ≠ 0, whereas for θ > 0

P_θ (accept H₋) = P_θ (S_n ≤ −c_n before S_n ≥ c_n)
 ≤ P₀ (S_n ≤ −c_n before S_n ≥ c_n)
 = (1/2) P₀ (|S_n| ≥ c_n for some n ≥ 1)
 ≤ exp(−a²/2)/[2Φ(a)],

after using (3.8.9) with h(x) = x². Similarly (from symmetry considerations) we have

P_θ (accept H₊) ≤ exp(−a²/2)/[2Φ(a)] for θ < 0.

Thus the error probability of this test is uniformly bounded by exp(−a²/2)/[2Φ(a)], which is approximately (1/2) exp(−a²/2), for all θ ≠ 0. Hence

P₀ (N < ∞) = P₀ (|S_n| ≥ c_n for some n ≥ 1) = 2P₀ (S_n ≥ c_n for some n ≥ 1) ≤ exp(−a²/2)/Φ(a).
So the test will rarely terminate when θ = 0. However, E_θ(N) is finite for every θ ≠ 0; it approaches ∞ as θ → 0 and 1 as |θ| → ∞, because

E_θ(N) = Σ_{n=0}^{∞} P_θ (N > n)

and all the terms in the summation except the one for n = 0 tend to zero as |θ| → ∞.
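The behavior of N and of the error bound can be explored by simulation. A sketch (with a finite horizon standing in for ∞, the boundary c_n of (3.8.10), and the illustrative choices a = 2, m = 1, θ = 1):

```python
import math, random

def run_test(theta, a=2.0, m=1.0, horizon=5000, rng=None):
    """Two-sided power-one test: stop when |S_n| >= c_n,
    accept H+ if S_N >= c_N, H- if S_N <= -c_N; None if no decision."""
    rng = rng or random
    s = 0.0
    for n in range(1, horizon + 1):
        s += rng.gauss(theta, 1.0)
        c = math.sqrt((n + m) * (a * a + math.log(n / m + 1.0)))
        if abs(s) >= c:
            return ("H+" if s > 0 else "H-"), n
    return None, horizon

random.seed(7)
results = [run_test(1.0) for _ in range(500)]
errors = sum(1 for d, _ in results if d == "H-")
mean_n = sum(n for _, n in results) / len(results)
print("error frequency:", errors / 500.0, " mean N:", mean_n)
```

With θ = 1 the test terminates quickly in every replication, and the error frequency is far below the uniform bound exp(−a²/2)/[2Φ(a)] (≈ 0.069 for a = 2), which is attained only as θ → 0.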
Remarks 3.8.2 In a similar fashion one can construct an SPRT with uniformly small error probability when the X_i are i.i.d. Bernoulli variables with P(X₁ = 1) = p = 1 − P(X₁ = −1), and we set θ = 2p − 1 and H₋ : θ < 0 and H₊ : θ > 0.

Tests with Power One

Let X₁, X₂, ... be i.i.d. normal (θ, 1) and suppose we wish to test H₀ : θ ≤ 0 against H₁ : θ > 0. Let the stopping time be defined as

N = smallest n such that S_n ≥ c_n, and N = ∞ if no such n occurs,   (3.8.11)

and when N < ∞, stop sampling with X_N and reject H₀ in favor of H₁; when N = ∞, continue sampling indefinitely and do not reject H₀, where S_n and c_n are as defined in (3.8.10). For θ ≤ 0,

P_θ (reject H₀) = P_θ (N < ∞) ≤ P₀ (N < ∞) ≤ exp(−a²/2)/[2Φ(a)]   (3.8.12)

with c_n = {(n + m)[a² + log(n/m + 1)]}^{1/2}, after using Robbins (1970, equation (18)). Hence

P_θ (not reject H₀) = P_θ (S_n < c_n for all n ≥ 1) = 0 for all θ > 0,

because c_n/n → 0 and S_n/n → θ (> 0) as n → ∞. Thus the test has power one against H₁. Clearly the test rarely terminates when θ ≤ 0. Towards E_θ(N) for θ > 0, we have (after using (2.6.12) with p = 0)

E_θ(N) ≥ −2 ln P₀(N < ∞)/θ², for every θ > 0.   (3.8.13)
For example, if N is such that P₀(N < ∞) = 0.05, then E_θ(N) ≥ 6/θ². However, no such N will minimize E_θ(N) uniformly for all θ > 0. For N given by (3.8.11) and the c_n selected here (c_t is concave in t for t ≥ 1), Darling and Robbins (1968a) have shown an upper bound (3.8.14) for E_θ(N). For our choice of c_n with m = 1 and a² = 9, we obtain from (3.8.13) and (3.8.14) the following table.

Table 3.8.1. Upper and Lower Bounds for E_θ(N) with m = 1 and a² = 9

θ                      .1      1      2
Equation (3.8.13)    1040   10.4    2.6
Equation (3.8.14)    1800     15      5

Monte Carlo studies will usually yield more precise estimates of E_θ(N). However, they are not directly applicable to estimating the type I error, for which the upper bound (3.8.12) becomes 0.0056 when a² = 9. Thus the proposed test has type I error probability ≤ α uniformly and type II error probability = 0. Also, the sample size N is finite with probability one when H₁ is true, and N = ∞ with probability ≥ 1 − α when H₀ is true. The latter property is not usually acceptable in statistical inference, since Wald's SPRT was originally designed for problems in acceptance sampling, where nonterminating tests are not of much use. Darling and Robbins (1968a) provide a practical situation in which a physician has to decide whether to switch over to a new drug and where a test of power one makes sense.

Barnard (1969), independently of Darling and Robbins (1967a, 1967b, 1968a), has proposed tests of power one for the following Bernoulli problem. Suppose we wish to test H₀ : p < p₀ versus H₁ : p ≥ p₀, where p denotes the probability of a certain component being defective. Let S_n denote the number of defectives in n components. In practical applications, p₀ is small. The stopping time N is defined as:

N = smallest n for which S_n ≥ np₀ + z_{1−α}[np₀(1 − p₀)]^{1/2}, and N = ∞ if no such n occurs,
where z_{1−α} denotes the (1 − α)100% point of the standard normal distribution. After we stop, we reject H₀. The above sequential procedure has power one uniformly for all p ∈ H₁, and the type I error probability can be made arbitrarily small by choosing α small. In order to see this, let us consider the standardized variables

Z_N = (S_N − Np)/[Npq]^{1/2},

where q = 1 − p. Hence, we stop as soon as

Z_N ≥ z_{1−α}[p₀(1 − p₀)/(pq)]^{1/2} + N^{1/2}(p₀ − p)/(pq)^{1/2}.   (3.8.15)

Also, recall that the law of the iterated logarithm says that

lim sup_{n→∞} S_n/(2n log log n)^{1/2} = 1 with probability one   (3.8.16)

for any S_n = X₁ + X₂ + ⋯ + X_n where the X's are i.i.d. having mean zero and variance one. In our case, the successive maxima of the sequence Z_n increase like (2 log log N)^{1/2} with probability one. If p > p₀, then the coefficient of N^{1/2} in (3.8.15) is negative and, with probability one, we eventually stop the sampling process. If p = p₀, then the right side of (3.8.15) is a constant and again, Z_N will exceed that constant sooner or later. However, when p < p₀, the coefficient of N^{1/2} is positive and hence the right hand side of (3.8.15) grows faster than (2 log log N)^{1/2}, and thus there is a positive probability that the inequality in (3.8.15) will be violated for all N. This latter probability can be made arbitrarily close to 1, for any given value of p < p₀, by choosing α arbitrarily small.

Another possible practical application of the above formulation is the supervision of weights and measures. Sugar is packaged and sold to consumers. The weight of each packet of sugar should not fall below W₀; otherwise the manufacturer will be prosecuted. So the weight of sugar in a package, W, follows a distribution having mean W₀ and a small variance such that P(W < W₀) is very small when E(W) = W₀. Suppose we wish to test H₀ : E(W) ≥ W₀ against H₁ : E(W) < W₀. Note that the difference W − W₀ constitutes a bonus by the manufacturer to the consumers. The Weights and Measures Inspectorate should record the deviations

d_i = W₀ − W_i, i = 1, 2, ..., and let T_n = Σ_{i=1}^{n} d_i.
They should stop experimentation as soon as T_n > kn^{1/2} and then prosecute the manufacturer, where k is chosen keeping in mind the standard deviation of W_i and the amount of risk of being prosecuted the manufacturer is willing to take. In this way, a manufacturer who consistently gave below legal weight would eventually be caught and prosecuted, while honest manufacturers would not have to spend unnecessary funds on unduly elaborate weighing equipment. They could pass on some of the benefits of the resultant cost reduction to the consumers.
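Barnard's stopping rule above is straightforward to implement. A sketch (using 1.645 as an approximation to the z_{0.95} point, a finite horizon in place of ∞, and illustrative values p₀ = 0.05 and p = 0.20 or 0.02):

```python
import math, random

Z95 = 1.645  # approximate 95% point of the standard normal

def barnard_stop(p, p0=0.05, z=Z95, horizon=2000, rng=None):
    """First n at which S_n >= n*p0 + z*sqrt(n*p0*(1-p0)),
    or None if the boundary is not crossed within the horizon."""
    rng = rng or random
    s = 0
    for n in range(1, horizon + 1):
        s += 1 if rng.random() < p else 0
        if s >= n * p0 + z * math.sqrt(n * p0 * (1.0 - p0)):
            return n
    return None

random.seed(3)
bad = [barnard_stop(0.20) for _ in range(200)]    # p well above p0
good = [barnard_stop(0.02) for _ in range(200)]   # p below p0
frac_bad = sum(t is not None for t in bad) / 200.0
frac_good = sum(t is not None for t in good) / 200.0
print(frac_bad, frac_good)
```

When p > p₀ the boundary is crossed with probability one (here, quickly); when p < p₀ there is a substantial probability of never stopping, in line with the law-of-the-iterated-logarithm argument above.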
3.9 Sequential Rank Test Procedures

In this section we shall consider certain nonparametric test procedures having power one (which were proposed by Darling and Robbins, 1968b, and simplified by Robbins, 1970) and rank order test procedures for Lehmann alternatives. First we consider the test procedures proposed by Darling and Robbins (1968b) and Robbins (1970).
3.9.1 Kolmogorov-Smirnov Tests with Power One

Let X [Y] have distribution function F(x) [G(y)]. Also, let

D⁺(F, G) = sup_x [F(x) − G(x)],  D(F, G) = sup_x |F(x) − G(x)|;

assume that X and Y are independent.

(a) We wish to test H₁ : F(x) ≤ G(x) for all x, against H₁a : F(x) > G(x) for some x. Let F_n(x) [G_n(x)] denote the empirical distribution function based on a random sample X₁, X₂, ..., X_n [Y₁, Y₂, ..., Y_n]. In order to test H₁ define

N = first n ≥ m such that D⁺(F_n, G_n) ≥ g(n)/n,   (3.9.1)

where g(n) is some positive sequence such that g(n)/n → 0 as n → ∞. Let h(x) denote the inverse function of g(x)/x. If H₁ is false and D⁺(F, G) = d > 0, then by the Glivenko-Cantelli theorem, D⁺(F_n, G_n) → d with probability one while g(n)/n → 0 as n → ∞, so that P(N < ∞) = 1. Then, if we agree to reject H₁ as soon as we observe that N < ∞, while if N = ∞ we do not reject H₁, the test certainly has power 1 when H₁ is false. It remains to consider the probability of type I error. Towards this we have the following result.

Result 3.9.1 (Darling and Robbins, 1968b). Let the stopping time be defined by (3.9.1). Also let (i) g(x)/x ≤ 1, and (ii) g(x) be concave. Then

P (reject H₁ | H₁ is true) ≤ α,   (3.9.2)

where α ≥ Σ_{n=m}^{∞} exp[−g²(n)/(n + 1)]. If H₁ is false and D⁺(F, G) = d > 0 with d ≤ g(m)/m, then (3.9.3) holds.
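A sketch of the two-sample procedure (3.9.1). The boundary g(n) = {(n + 1)[a₀ + 2 ln(n + 1)]}^{1/2} used below is only one admissible illustrative choice (it is concave, g(n)/n → 0, and it makes the series Σ exp[−g²(n)/(n + 1)] finite); F and G are taken normal purely for the demonstration:

```python
import bisect, math, random

def g(n, a0=2.0):
    # illustrative boundary: concave, g(n)/n -> 0, and
    # sum_n exp(-g(n)^2/(n+1)) = e^{-a0} * sum (n+1)^{-2} < infinity
    return math.sqrt((n + 1.0) * (a0 + 2.0 * math.log(n + 1.0)))

def d_plus(xs, ys):
    # sup_z [F_n(z) - G_n(z)] attained at pooled sample points
    xs_s, ys_s = sorted(xs), sorted(ys)
    n = len(xs)
    best = 0.0
    for z in xs_s + ys_s:
        fn = bisect.bisect_right(xs_s, z) / n
        gn = bisect.bisect_right(ys_s, z) / n
        best = max(best, fn - gn)
    return best

def ks_power_one(shift, m0=10, horizon=400, rng=None):
    """First n >= m0 with D+(F_n, G_n) >= g(n)/n; X ~ N(shift,1), Y ~ N(0,1)."""
    rng = rng or random
    xs, ys = [], []
    for n in range(1, horizon + 1):
        xs.append(rng.gauss(shift, 1.0))
        ys.append(rng.gauss(0.0, 1.0))
        if n >= m0 and d_plus(xs, ys) >= g(n) / n:
            return n
    return None

random.seed(11)
stops = [ks_power_one(-1.0) for _ in range(50)]  # here F(x) > G(x): H1 false
frac = sum(t is not None for t in stops) / 50.0
print(frac)
```

With shift = −1, F(x) = Φ(x + 1) > Φ(x) = G(x), so H₁ is false with D⁺(F, G) ≈ 0.38, and the procedure stops in every replication once g(n)/n has fallen below that level.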
(b) Suppose we wish to test the hypothesis H₂ : F(t) = G(t) for all t, against H₂a : F(t) ≠ G(t) for some t. Define N as

N = smallest n ≥ m such that D(F_n, G_n) ≥ g(n)/n,

and we reject H₂ if and only if N < ∞. For the error probabilities we have

Result 3.9.2 (Darling and Robbins, 1968b). For the above stopping time,

P (reject H₂ | H₂ is true) ≤ 2α,  P (reject H₂ | H₂ is false) = 1,   (3.9.4)

where α is as defined above. If H₂ is false and D(F, G) = d > 0 with d ≤ g(m)/m, then (3.9.3) is valid.
(c) Let F₀ be any specified d.f. and consider testing the hypothesis H₃ : F(t) ≤ F₀(t) for every −∞ < t < ∞, against H₃a : F(t) > F₀(t) for some t.

Result 3.9.3 (Darling and Robbins, 1968b). Define N as the smallest n ≥ m such that D⁺(F_n, F₀) ≥ g(n)/n, and reject H₃ if and only if N < ∞. Then

P (reject H₃ | H₃ is true) ≤ 2α,  P (reject H₃ | H₃ is false) = 1.   (3.9.5)

If H₃ is false and D⁺(F, F₀) = d > 0 with d ≤ g(m)/m, then (3.9.3) holds.

3.9.2 Sequential Sign Test
Assume that X has d.f. F(x) and Y has d.f. G(y). We wish to test H₀ : F = G against the alternative H₁ : G(x) = F²(x) for all x. We observe pairs of observations (X₁, Y₁), (X₂, Y₂), ... sequentially. Assume that p = P(X < Y) is constant from stage to stage. Then the hypotheses can be replaced by H₀′ : p = 1/2 versus H₁′ : p = 2/3. The sequential sign test procedure has been proposed by Hall (1965, p. 40). Reduce the data to the signs of the differences X_i − Y_i. This may be justified by invariance under monotone transformations g_n applied to each of the observations in stage n (see Section 1.8 of Hall, Wijsman and Ghosh, 1965). The reduced data constitute a Bernoulli sequence with success (positive sign) probability p. The simple SPRT for Bernoulli data (see Example 2.1.2) is applicable, the likelihood ratio at the nth stage being that of a binomial random variable with parameters n and p.
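The resulting Bernoulli SPRT for p = 1/2 versus p = 2/3 can be sketched as follows (using Wald's approximate boundaries A = (1 − β)/α and B = β/(1 − α); the simulation of the signs directly as Bernoulli(p) draws is for illustration):

```python
import math, random

def sign_sprt(p_true, p0=0.5, p1=2.0/3.0, alpha=0.05, beta=0.05,
              horizon=10000, rng=None):
    """Wald SPRT on the signs of Y_i - X_i, i.e. on Bernoulli(p) data.
    Returns ('H0' or 'H1', stopping stage)."""
    rng = rng or random
    la, lb = math.log((1 - beta) / alpha), math.log(beta / (1 - alpha))
    llr = 0.0
    for n in range(1, horizon + 1):
        positive = rng.random() < p_true   # sign of Y_i - X_i positive?
        llr += math.log(p1 / p0) if positive else math.log((1 - p1) / (1 - p0))
        if llr >= la:
            return "H1", n
        if llr <= lb:
            return "H0", n
    return None, horizon

random.seed(5)
dec = [sign_sprt(2.0/3.0) for _ in range(300)]
power = sum(d == "H1" for d, _ in dec) / 300.0
asn = sum(n for _, n in dec) / 300.0
print("power:", power, " ASN:", asn)
```

Under p = 2/3 the test accepts H₁ with frequency close to 1 − β, at an average sample number of roughly fifty observations for these error sizes.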
Groeneveld (1971) has proposed a sequential sign test for symmetry considered by Miller (1970). Groeneveld assumes that the density f(x; θ) = f(x − θ), where f(x) > 0 for all x, f(x) = f(−x) and f′(x) < 0 for x > 0. In order to test H₀ : θ = θ₀ against H₁ : θ = θ₁ > θ₀, one can carry out an SPRT by computing the likelihood ratio Π_{i=1}^{n} f(X_i − θ₁)/f(X_i − θ₀).

Groeneveld's (1971) sequential test of the same hypotheses is based on the number of times the ratio Z = ln[f(X; θ₁)/f(X; θ₀)] is positive. Let this statistic be denoted by T_n. It is clear that this ratio is positive if and only if

X > (θ₀ + θ₁)/2 ≡ θ̄.

(Note that f(x − θ₀) = f(x − θ₁) when x = θ̄.) In this sequential test we are testing two alternative values of a binomial parameter p, H₀ : p = p₀ against H₁ : p = p₁, where

p₀ = P(X > θ̄ | θ = θ₀) = P(X − θ₀ > (θ₁ − θ₀)/2) = ∫_Δ^∞ f(x) dx

and

p₁ = P(X > θ̄ | θ = θ₁) = P(X − θ₁ > −(θ₁ − θ₀)/2) = ∫_{−Δ}^∞ f(x) dx,

with Δ = (θ₁ − θ₀)/2. It is easy to see that p₀ + p₁ = 1. It is of interest to compare the efficiency of the sequential sign test (SST) relative to the SPRT. When both tests have the same error probabilities α and β, the relative efficiency is e = E(N)_SPRT/E(N)_SST. Using the approximate formula for E(N) given by Eqs. (2.4.2) and (2.4.3), and noting that the random variable Z for the SST is given by Z = ln{(p₁/p₀)^Y [(1 − p₁)/(1 − p₀)]^{1−Y}} = (2Y − 1) ln(p₁/p₀) since p₀ + p₁ = 1, where Y takes on the value 1 with probability p and 0 with probability 1 − p, one can obtain the relative efficiency under H₀ as

e = (2p₀ − 1) ln(p₁/p₀) / E_{θ₀}(Z),

where Z = ln[f(X_i; θ₁)/f(X_i; θ₀)] and E_{θ₀}(Z) = ∫_{−∞}^{∞} ln[f(y − 2Δ)/f(y)] f(y) dy. The same value results under H₁. Table 3.9.1 gives values of these efficiencies when X is normal or Laplace (double exponential). The parameter Δ is in terms
of standard deviations.

Table 3.9.1⁴ Efficiency of SST Relative to SPRT

2Δ    Normal  Laplace
.1     .634    .978
.2     .635    .959
.4     .634    .927
.5     .632    .914
.6     .630    .902
.8     .624    .880
1.0    .618    .862

Both the expressions in the numerator and denominator of e are functions of Δ and the density function f(x). Under additional regularity conditions on f(x), both can be expanded in a Taylor series in Δ. Hence

lim_{Δ→0} e = 4f²(0) / ∫_{−∞}^{∞} [f′(x)/f(x)]² f(x) dx,

which is also the efficiency of the median as an estimator of θ in relation to the best unbiased estimator of θ [see Fisz, 1963, Chapter 13], and is also the asymptotic efficiency of the sign test relative to the most powerful test for the hypothesis of symmetry if we consider competing tests based on a fixed sample size [see Hájek and Šidák, 1967, Chapter 7]. Table 3.9.2 gives the limiting value for several distributions.

Table 3.9.2⁴ Limiting Efficiencies of SST

Normal       Laplace  Logistic  Cauchy
2/π = .637      1       .75     8/π² = .810

One would consider this SST of Groeneveld (1971) because the calculation of the statistic T_n does not require the specific form of f(x). But the value p₀ depends on the distribution of X. However, the test can be carried out by choosing a value for p₀ (say .4) and hence p₁ = .6, and the values of A and B (Wald's approximate boundary values) so that the sequential binomial test has errors α and β. If Δ is measured in standard deviation units, then by the improved Chebyshev inequality for continuous unimodal distributions [see Cramér, 1946, p. 183], P(X − θ₀ > Δ) ≤ 2/(9Δ²). Hence p₀ = .4 implies that Δ² ≤ 5/9

⁴Reproduced from The American Statistician Vol. 25, copyright (1971) by the American Statistical Association. All rights reserved.
or Δ ≤ .745. That is, the SST corresponds to an SPRT with error sizes (α, β) in which θ₁ (θ₀) differs in absolute value from θ̄ by at most .745 standard deviations, whatever the distribution of X, provided σ² is finite.
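For the normal case the entries of Table 3.9.1 and the limit 2/π can be reproduced directly, since then p₀ = 1 − Φ(Δ) and E_{θ₀}(Z) = −2Δ². A sketch:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def efficiency_normal(two_delta):
    # e = (2*p0 - 1)*ln(p1/p0) / E_{theta0}(Z), with p0 = 1 - Phi(Delta),
    # p1 = 1 - p0, and E_{theta0}(Z) = -2*Delta^2 for the normal density
    d = two_delta / 2.0
    p0 = 1.0 - Phi(d)
    p1 = 1.0 - p0
    return (2.0 * p0 - 1.0) * math.log(p1 / p0) / (-2.0 * d * d)

e_small = efficiency_normal(0.1)
print(e_small, 2.0 / math.pi)
```

For 2Δ = .1 this gives e ≈ .636, agreeing with the tabulated .634 up to rounding and approaching 2/π as Δ → 0.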
3.9.3 Rank Order SPRT's Based on Lehmann Alternatives: Two-Sample Case

Nonparametric alternatives known as Lehmann (1953) alternatives are given in terms of powers of the d.f. under the null case. For example, in the two-sample case if H₀ : F = G, then the Lehmann alternative is given by H₁ : (i) G = F^Δ, or (ii) G = 1 − (1 − F)^Δ (Δ > 0), or (iii) G = h(F) such that h(0) = 0, h(1) = 1 and h(·) is nondecreasing in (0, 1). Let (X₁, ..., X_{n₁}) denote a random sample from F and (Y₁, ..., Y_{n₂}) denote a random sample from G, where the X's and Y's are mutually independent, and F and G are assumed to be continuous. Let s₁ < s₂ < ⋯ < s_{n₂} be the ranks of Y₁, Y₂, ..., Y_{n₂} in the combined sample of size n = n₁ + n₂. Then S = (s₁, s₂, ..., s_{n₂}) is called the rank order. One can derive an explicit expression for the probability of a rank order under Lehmann alternatives (for instance, see Lehmann, 1953). Using this result, Wilcoxon, Rhodes and Bradley (1963) (see also Bradley, 1967) have developed two sequential two-sample grouped rank tests called the sequential configural rank test (SCR-test) and the rank sum test. Bradley, Merchant and Wilcoxon (1966) provide a modified version of the configural group rank test proposed earlier, which is based on rerankings of observations as new groups of observations are obtained sequentially. Monte Carlo studies carried out on the modified sequential configural rank (MSCR) test indicate a formal superiority of the MSCR-test over the SCR-test. Let us briefly describe the MSCR-test procedure. Suppose that X- and Y-observations are taken in groups of m and n, and that no group or block effects are present. Then at the tth stage of such a process, mt X-observations and nt Y-observations are ranked in a joint array. Then the likelihood ratio is the ratio of the probabilities of the ranks s_j^{(t)}, j = 1, 2, ..., nt, which constitute the ranks of the nt Y's in the joint reranking at stage t. The likelihood ratio at the tth stage is given by (3.9.6).

By invariant sufficiency, the likelihood ratio for only the last reranking is equivalent to the joint likelihood ratio for all rerankings up to stage t. Hall, Wijsman and Ghosh (1965) discuss the MSCR-test and note that its finite termination with probability one under H₀ and H₁ follows from the work of Wirjosudirdjo (1961).

Next we shall turn to the work of Savage and Sethuraman (1966). Let (X₁, Y₁), (X₂, Y₂), ... be independent and identically distributed bivariate random variables with a joint distribution H(·, ·) which has continuous marginal distributions F and G. We wish to test H₀ : X and Y are independent and G = F, against the alternative hypothesis H₁ : X, Y are independent and G = F^Δ, where Δ > 0, Δ ≠ 1 is a known constant. At the nth stage of experimentation the available information is the ranks of (Y₁, ..., Y_n) among (X₁, ..., X_n, Y₁, ..., Y_n). Let the combined sample be denoted by (W₁, W₂, ..., W_{2n}) and the ordered combined sample by W_{n,1}, W_{n,2}, ..., W_{n,2n}. Let F_n (G_n) denote the empirical distribution function of X₁, ..., X_n (Y₁, ..., Y_n). Let s₁ < s₂ < ⋯ < s_n be the ranks of the ordered Y₁, Y₂, ..., Y_n in the combined sample. Notice that the statistic (s₁, s₂, ..., s_n) is equivalent to (G_n(W_{n,i}), i = 1, ..., 2n), which in turn is equivalent to (F_n(W_{n,i}), G_n(W_{n,i}), i = 1, ..., 2n).

Lemma 3.9.1 With the above notation, the probability of the rank order under H₁ is given by (3.9.7).

Clearly, P_{H₀}(S = s) = P_{H₀}(s₁, ..., s_n) = 1/C(2n, n), which follows by setting Δ = 1 in (3.9.7) or from the fact that each rank order is equally likely under H₀, where the binomial coefficient C(2n, n) denotes the total number of distinct rank orders. Let

Λ_n = Λ_n(Δ, F_n, G_n) = P_{H₁}(S = s)/P_{H₀}(S = s).   (3.9.8)

It should be remarked that, since the product on the right side of (3.9.8) is symmetric in the W_{ni}, one can replace the W_{ni} by the W_i. Then the SPRT based on ranks for testing H₀ against H₁ is given by:

(i) take one more observation if B < Λ_n(Δ, F_n, G_n) < A,
(ii) accept H₀ if Λ_n ≤ B, and
(iii) reject H₀ if Λ_n ≥ A, n = 1, 2, ....
Let N be the stopping variable for the above SPRT and let

S_n = n⁻¹ ln Λ_n.   (3.9.9)

Then, from (3.9.7) we have

S_n = ln(4Δ) − 2 − T_n + O((log n)/n),   (3.9.10)

where T_n = n⁻¹ Σ_{i=1}^{2n} ln[F_n(W_i) + ΔG_n(W_i)]. Also, let

S(Δ, F, G) = ln(4Δ) − 2 − ∫_{−∞}^{∞} ln[F(z) + ΔG(z)] {dF(z) + dG(z)}.   (3.9.11)
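Under H₀ (F = G), (3.9.11) reduces to S(Δ, F, F) = ln[4Δ/(1 + Δ)²], since ∫ ln[(1 + Δ)F] · 2dF = 2 ln(1 + Δ) − 2. A sketch checking the statistic of (3.9.10) against this value (normal samples are used only as a convenient continuous F = G):

```python
import bisect, math, random

def s_n(xs, ys, delta):
    """S_n = ln(4*delta) - 2 - T_n, with
    T_n = (1/n) * sum_{i=1}^{2n} ln[F_n(W_i) + delta*G_n(W_i)]."""
    n = len(xs)
    xs_s, ys_s = sorted(xs), sorted(ys)
    t = 0.0
    for w in xs_s + ys_s:
        fn = bisect.bisect_right(xs_s, w) / n
        gn = bisect.bisect_right(ys_s, w) / n
        t += math.log(fn + delta * gn)
    return math.log(4.0 * delta) - 2.0 - t / n

random.seed(13)
n, delta = 2000, 2.0
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]   # F = G: the null case
value = s_n(xs, ys, delta)
limit = math.log(4.0 * delta / (1.0 + delta) ** 2)
print(value, limit)
```

For Δ = 2 the limit is ln(8/9) ≈ −0.118, and the empirical S_n agrees with it up to the O((log n)/n) bias and sampling noise.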
Also let

I(x; z) = 1 if x ≤ z, and 0 if x > z,
W(z) = F(z) + ΔG(z),
W(x, y, z) = I(x; z) + ΔI(y; z),
V(x, y) = ln[W(x)W(y)] − ln(4Δ) + 2.   (3.9.12)
Then combining the results of Savage and Sethuraman (1966) and Sethuraman (1970) we have

Result 3.9.4 The rank order SPRT based on Lehmann alternatives terminates finitely and has a finite moment-generating function provided P{V(X, Y) = 0} ≠ 1, where V(X, Y) is given by (3.9.12).

Corollary 3.9.4.1 If X and Y are independent, then P{V(X, Y) = 0} < 1.

Next, towards the asymptotic normality of S_n, we have the following result of Govindarajulu (1968a) and Sethuraman (1970). Let {R_k} denote a sequence of stopping rules and {N_k} with N₁ ≤ N₂ ≤ ⋯ denote the sequence of stopping times associated with {R_k}. Also assume that there exists a nonstochastic sequence of positive numbers {n_k}, n₁ ≤ n₂ ≤ ⋯, with n_k → ∞ as k → ∞, such that N_k/n_k → 1 in probability as k → ∞.

Result 3.9.5 If X and Y are independent and the sequences {R_k}, {n_k} and {N_k} are as defined above, then

lim_{k→∞} P { N_k^{1/2} [S_{N_k} − S(Δ, F, G)] / σ_Δ(F, G) ≤ z } = Φ(z)   (3.9.13)

for every z, where the asymptotic variance σ_Δ²(F, G) is given by (3.9.14). Note that

σ_Δ²(F, F) = 2(1 − Δ)²(1 + Δ)⁻².   (3.9.15)
3.9.4 One-Sample Rank Order SPRT's for Symmetry

Let V₁, V₂, ... be independent and identically distributed random variables observed sequentially and having a continuous d.f. F. We wish to test the hypothesis

H₀ : F(v) + F(−v) = 1, for all v,   (3.9.16)

that is, F is symmetric about zero. Let H(v) = P(V ≤ v | V ≥ 0) = [F(v) − F(0)]/[1 − F(0)] and G(v) = P(|V| ≤ v | V < 0) = [F(0) − F(−v)]/F(0) for v ≥ 0, and H(v) = G(v) = 0 for v < 0. Thus F(v) = F(0)[1 − G(−v)] for v < 0 and F(v) = H(v) + F(0)[1 − H(v)] for v ≥ 0. Then we can rewrite (3.9.16) as

H₀ : H(v) = G(v) for all v and F(0) = 1/2,

and take H₁ : H(v) ≠ G(v) for some v. However, probabilities of desired rank orders cannot be derived explicitly under H_a. Assuming that H_a : H(v) = 1 − [1 − G(v)]^Δ, v ≥ 0 with F(0) = Δ/(1 + Δ), sampling with groups of fixed size at each stage and ranking the absolute values of the observations within each group at each stage, Weed and Bradley (1971) have proposed two sequential procedures, one based on the within-group configuration of signed ranks and the other based on the within-group sums of positive signed ranks. They carried out some Monte Carlo studies in order to test whether the model is appropriate or not when the data are generated from normal populations. The choice of F(0) is obtained by forcing F(v) not to have a jump discontinuity at the origin. Consider the alternatives:

Model II : H_aII : H(v) = G^Δ(v), for all v, Δ > 0, Δ ≠ 1,   (3.9.17)

where Δ is specified and F(0) = λ₀, λ₀ specified. Weed (1968) has considered the alternative

Model I : H_aI : H(v) = 1 − [1 − G(v)]^Δ, for all v, Δ > 0, Δ ≠ 1,   (3.9.18)

where Δ is specified and F(0) = Δ/(1 + Δ). Notice that if X = −V, V < 0 and if Y = V, V ≥ 0, and if X and Y have conditional d.f.'s (G and H respectively) satisfying (3.9.17), then the conditional d.f.'s of −X and −Y would satisfy (3.9.18) provided λ₀ = Δ/(1 + Δ), and the converse statement is also true. Thus, the proof for finite termination would essentially be the same for both models (see Weed, Bradley and Govindarajulu (1974)).

In the following we shall present the main results for Model II. Let X₁, ..., X_m denote the absolute values of those V's that are negative and let Y₁, ..., Y_n denote the nonnegative V's, m + n = t. Notice that m is binomially distributed with parameters t and λ = F(0), 0 < λ < 1. Let the combined sample of X's and Y's be denoted by W₁, ..., W_t and the ordered combined sample by W_{t1} < ⋯ < W_{tt}. Let G_m and H_n respectively denote the empirical d.f.'s of X₁, ..., X_m and Y₁, ..., Y_n. Further, following Savage (1959), let Z = (Z₁, ..., Z_t), where Z_i = 1 or 0 according as W_{ti} corresponds to a negative or nonnegative V respectively. Also, let

Λ_t = P_{H_a}(Z = z)/P_{H₀}(Z = z),   (3.9.19)

where z denotes a specified value for the rank order. The SPRT for testing H₀ against H_aI or H_aII is given by:

(i) take one more observation if B < Λ_t < A,
(ii) accept H₀ if Λ_t ≤ B, and
(iii) reject H₀ if Λ_t ≥ A, t = 1, 2, ...,

where 0 < B < 1 < A are suitable constants (independent of t). Let T denote the stopping variable for the above SPRT.

Lemma 3.9.2 With the above notation and assumptions we have
Λ_t = 2^t t! λ₀^m (1 − λ₀)^n Δ^n ∏_{i=1}^{t} [mG_m(W_i) + ΔnH_n(W_i)]^{−1} for Model II,

Λ_t = 2^t t! [Δ^{n+m}/(1 + Δ)^t] ∏_{i=1}^{t} [m{1 − G_m(W_i)} + Δn{1 − H_n(W_i)}]^{−1} for Model I.   (3.9.20)
Towards the finiteness of the stopping time we have the following result. Let I(x; z) be as defined in the two-sample case (i.e., I(x; z) = 1 if x ≤ z and zero elsewhere), and let

W(z) = λG(z) + Δ(1 − λ)H(z), λ = F(0),   (3.9.21)

and

U(V) = ln 2 − 1 + ln[Δ(1 − λ₀)] − ln[λ + Δ(1 − λ)] + ln W(|V|)
 − ∫ ln W(z) d{λG + (1 − λ)H}(z) − (1 − λ)(Δ − 1) ∫₀^∞ [λG(z)/W(z)] dH(z).   (3.9.22)

Under H₀ (since λ = 1/2 and G = H), U(V) simplifies to

U(V) = 3 ln 2 − 2 ln(1 + Δ) + ln[Δ(1 − λ₀)] + ln W(|V|) − (Δ − 1)/[2(1 + Δ)].   (3.9.23)
Then combining the main results of Weed, Bradley and Govindarajulu (1974) and Govindarajulu (1984) we have
Result 3.9.6 If P(U(V) = 0) ≠ 1, then the rank order SPRT terminates finitely with probability one and the stopping time has a finite moment-generating function.

Remarks 3.9.1 Suppose λ₀ is not specified and we estimate it by λ̂_t = m/t and base the sequential procedure on the resulting likelihood ratio. This modified rank order SPRT will also have the property given in Result 3.9.6.

Miller (1970) has proposed a sequential signed rank test for symmetry. Let X₁, X₂, ... be a sequence of i.i.d. random variables having density f(x; θ) which is symmetric about θ. We wish to test H₀ : θ = 0 against the alternative H₁ : θ ≠ 0. Notice that the hypothesis is regarding the location of symmetry, and the symmetry is a part of the assumption under both H₀ and H₁. Thus tests of this hypothesis would be different from the test procedures developed earlier in Section 3.9.4. Let R_{i,n} denote the rank of |X_i| among |X₁|, ..., |X_n|, and let S_n denote the sum of the ranks of the positive X's. That is,

S_n = Σ_{i=1}^{n} I(X_i) R_{i,n},

where I(x) = 1 if x > 0 and zero otherwise. Miller's (1970) procedure is as follows: Continue sampling as long as
(i) |S_n − n(n + 1)/4| < y(α, n₀) [n(n + 1)(2n + 1)/24]^{1/2}, and

(ii) n < n₀.

Stop sampling as soon as (i) or (ii) is violated. If (i) is violated, decide in favor of H₁. If (ii) is violated and not (i), decide in favor of H₀. n₀ and α are selected by the investigator. These determine y(α, n₀) as follows. Let

Y*_{n₀} = max_{1≤n≤n₀} |S_n − n(n + 1)/4| / [n(n + 1)(2n + 1)/24]^{1/2}.

For the rejection boundary on S_n − n(n + 1)/4 defined by ±y(α, n₀)[n(n + 1)(2n + 1)/24]^{1/2}, the test will decide in favor of H₁ if and only if Y*_{n₀} ≥ y(α, n₀). Thus, y(α, n₀) should be the upper α-percentile of the distribution of Y*_{n₀}, that is, P{Y*_{n₀} ≥ y(α, n₀)} = α. The behavior of the sequence {S_n} for n = 1, 2, ..., n₀ can be approximated by the Wiener process. In particular, P{Y*_{n₀} ≥ y} can be approximated by the probability that a Wiener process crosses a square root barrier. Miller (1970) has carried out Monte Carlo studies on the distribution of Y*_{n₀} for various values of n₀ and has obtained the values of y(α, n₀). These are presented in Table 3.9.3, where K denotes the number of replications for each n₀.
Table 3.9.3⁵ Values of y(α, n₀)

α\n₀    10    15    20    25    30    40    50    60
.10    2.02  2.16  2.20  2.22  2.28  2.33  2.37  2.39
.05    2.20  2.39  2.40  2.46  2.55  2.55  2.62  2.62
.01    2.55  2.75  2.83  2.91  2.93  3.03  3.03  3.07
K/10³    2     1     3     1     3     2     3     6
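Miller's stopping rule is easy to implement. A sketch (the critical value 2.20 below is the table's entry for n₀ = 10, α = .05; ties among the |X|'s are assumed absent, and the all-positive input is an artificial example chosen to force early rejection):

```python
import math

def miller_test(xs, n0, y_crit):
    """Sequential Wilcoxon signed rank test in the style of Miller (1970):
    continue while |S_n - n(n+1)/4| < y_crit*sqrt(n(n+1)(2n+1)/24) and n < n0."""
    for n in range(1, min(len(xs), n0) + 1):
        absx = sorted(abs(x) for x in xs[:n])
        # S_n: sum of ranks of the positive observations among |X_1|,...,|X_n|
        s = sum(absx.index(abs(x)) + 1 for x in xs[:n] if x > 0)
        bound = y_crit * math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
        if abs(s - n * (n + 1) / 4.0) >= bound:
            return "reject H0", n
    return "accept H0", n0

decision, stage = miller_test([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n0=10, y_crit=2.20)
print(decision, stage)
```

With every observation positive, S_n = n(n + 1)/2, and the boundary is first crossed at n = 6.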
The truncation point n₀, especially in medical trials, is dictated by limitations on time, money, number of patients, etc. Recall that S_n denotes the sum of the ranks of the positive X's among the absolute X's. Miller (1970), by means of Monte Carlo studies, tabulates the power and average sample number of his procedure for the translation alternatives of the double exponential distribution and concludes that its power is reasonable and the ASN is less than or equal to n₀ (equal to n₀ for large n₀ and small values of the shifts in the location parameter). Miller (1970) also discusses the "inner acceptance boundary."

⁵Reproduced from the Journal of the American Statistical Association Vol. 65, copyright (1970) by the American Statistical Association. All rights reserved.

Let sgn(x) = 1 if x ≥ 0 and = −1 if x < 0. Also, let S_i = sgn(X_i)R_{ii}, where R_{ji} denotes the rank of |X_j| among |X₁|, ..., |X_i|. Then Reynolds (1975) has shown that the following statements are equivalent:

(i) S₁, S₂, S₃, ... are mutually independent,
(ii) P(S_n = s_n) = 1/(2n) for all n ≥ 1, where s_n is a non-zero integer in [−n, n],

(iii) F(−x)[1 − F(0)] = F(0)[1 − F(x)], x ≥ 0,

(iv) |X₁| and sgn(X₁) are independent, and

(v) R_{nn} and sgn(X_n) are independent for all n ≥ 1.

Reynolds (1975) proposes a sequential procedure for testing symmetry that is based on the test statistic

Z_n = Σ_{i=1}^{n} S_i/(i + 1).
The signed Wilcoxon rank statistic is

S_n = Σ_{i=1}^{n} sgn(X_i) R_{in},

on which Miller's (1970) procedure is based. Writing

S_i = Σ_{j=1}^{i} φ_{ij}, where φ_{ij} = sgn(X_i) I(|X_j| ≤ |X_i|),

one can easily compute

θ = E(φ_{jj}) = 1 − 2F(0)

and

τ² = (1/3) + 6γ − 5ξ²,

where ξ and γ denote the expectations of φ_{ij} and of products of pairs of the φ's, respectively. If f denotes the density of F, then we will be interested in testing H₀ : f is symmetric about zero against H : f is symmetric about some δ ≠ 0. Then if n₀ is the upper bound on the sample size, Reynolds' (1975) procedure is as follows: If the experiment has proceeded up to stage n, reject H₀ if Z_n ∉ (b, a) with b < 0 < a; if Z_n ∈ (b, a) and n < n₀, take one more observation. If n = n₀ and Z_n ∈ (b, a), stop sampling and accept H₀. In case we are interested in one-sided alternatives, then set either b = −∞ or a = ∞ according as δ > 0 or δ < 0.

If the value of the test statistic is close to zero in the two-sided test, then as n approaches n₀, a point is reached from which it is not possible to reach the rejection boundary irrespective of the values of the remaining S_i's. This leads us to the use of an inner acceptance boundary that enables us to accept H₀ at an early stage. At any stage n₁ < n₀ the maximum amount that Z_n can increase or decrease while taking the remaining n₀ − n₁ observations is
Σ_{i=n₁+1}^{n₀} i/(i + 1).

Thus, if for any n ≤ n₀, Z_n is such that

b + Σ_{i=n+1}^{n₀} i/(i + 1) ≤ Z_n ≤ a − Σ_{i=n+1}^{n₀} i/(i + 1),

then H₀ is accepted, since we will not be able to reject it at a later stage.
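The statistic and the inner acceptance check can be sketched as follows (the sequential signed ranks are computed directly, and the normalization S_i/(i + 1) is the one used in the display above):

```python
def sequential_signed_ranks(xs):
    """S_i = sgn(X_i) * rank of |X_i| among |X_1|,...,|X_i|."""
    s = []
    for i, x in enumerate(xs, start=1):
        rank = sum(1 for y in xs[:i] if abs(y) <= abs(x))
        s.append(rank if x >= 0 else -rank)
    return s

def z_stat(xs):
    # Z_n = sum_i S_i/(i+1)
    return sum(si / (i + 1.0) for i, si in enumerate(sequential_signed_ranks(xs), start=1))

def inner_accept(z_n, n, n0, a, b):
    """True if Z can no longer leave (b, a) by stage n0, so H0 is accepted early."""
    slack = sum(i / (i + 1.0) for i in range(n + 1, n0 + 1))
    return b + slack <= z_n <= a - slack

zs = z_stat([1.0, -2.0, 0.5])
print(zs)   # 1/2 - 2/3 + 1/4 = 1/12
```

Because the S_i are independent under H₀, the null distribution of Z_n is easy to simulate, which is exactly the practical advantage Reynolds claims for this statistic.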
Asymptotic Results

Reynolds (1975) shows that asymptotically the test statistic behaves like a Brownian motion process and derives asymptotic expressions for the power and ASN functions that are based on the Brownian motion having truncated linear barriers.

One-sided Alternatives. We have

α = P (reject H₀ | H₀) ≈ 2Φ(c),  c = −(3/n₀)^{1/2} a,

and, for drift ξ per observation,

P (reject H₀) ≈ Φ((ξn₀ − a)/(τ√n₀)) + exp(2aξ/τ²) Φ((−ξn₀ − a)/(τ√n₀)).

Let N denote the stopping time of the sequential procedure. Then E(N) admits an analogous approximation based on the drifted Brownian motion with a truncated linear barrier.
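The one-sided approximation above is the classical first-passage probability of a Brownian motion with drift ξ and scale τ absorbed at the level a before time n₀, and it can be checked against a direct random-walk simulation. A sketch (the parameter values are arbitrary illustrations):

```python
import math, random

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def crossing_prob(a, xi, tau, T):
    """P( max_{t<=T} (xi*t + tau*B_t) >= a ) for standard Brownian motion B."""
    rt = tau * math.sqrt(T)
    return Phi((xi * T - a) / rt) + math.exp(2.0 * a * xi / tau ** 2) * Phi((-xi * T - a) / rt)

def mc_crossing(a, xi, tau, T, steps=250, reps=2000, seed=17):
    # Euler discretization of the drifted Brownian motion
    rng = random.Random(seed)
    dt = T / steps
    hits = 0
    for _ in range(reps):
        w = 0.0
        for _ in range(steps):
            w += xi * dt + tau * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            if w >= a:
                hits += 1
                break
    return hits / reps

p_formula = crossing_prob(a=1.5, xi=0.2, tau=1.0, T=25.0)
p_mc = mc_crossing(a=1.5, xi=0.2, tau=1.0, T=25.0)
print(p_formula, p_mc)
```

The simulated frequency sits slightly below the closed form, since the discrete-time walk can miss excursions between grid points; the two agree as the step size shrinks.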
Two-sided Alternatives. Using Anderson's (1960) results and considering only the first term in the infinite series expressions for the power and the ASN functions of the continuous time sequential procedure, Reynolds (1975) obtains, after setting b = −a, corresponding approximations for the power and the ASN.
Reynolds (1975), via simulation methods, compares his procedure, for double exponential shift alternatives, with the Wilcoxon signed rank sequential procedure given by

SR_n = Σ_{i=1}^{n} sgn(X_i) R_{in},

which is not quite Miller's (1970) test statistic, since Miller does not include the ranks enjoyed by the observations that are equal to zero. Reynolds surmises that the procedure based on SR_n and his procedure are equivalent for reasonable values of n₀ and alternatives that are not too far from the null hypothesis. However,
the SSR statistic (i.e., Reynolds' statistic) is easier to use, and its null distribution is easier to generate, because the Sᵢ (sequential ranks) are independent under the null hypothesis. Furthermore, the Brownian motion approximation seems to over-estimate the probability of rejecting H₀ for all values of n₀.
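The contrast is easy to see computationally: the full-sample signed-rank path SR_n requires re-ranking all observations at every stage, whereas sequential ranks never change once assigned. A sketch (ties and zeros handled naively):

```python
def signed_rank_path(xs):
    """SR_n = sum_i sgn(x_i) * R_{i,n}, where R_{i,n} is the rank of
    |x_i| among |x_1|,...,|x_n|. The entire ranking is recomputed at
    each n, unlike the sequential ranks S_i, which are fixed forever
    once observation i arrives."""
    sgn = lambda v: (v > 0) - (v < 0)
    path = []
    for n in range(1, len(xs) + 1):
        seg = xs[:n]
        order = sorted(range(n), key=lambda j: abs(seg[j]))
        rank = [0] * n
        for pos, j in enumerate(order, start=1):
            rank[j] = pos
        path.append(sum(sgn(seg[j]) * rank[j] for j in range(n)))
    return path
```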
Remark 3.9.2 If the underlying density f is symmetric about some positive constant, or if f has median zero but is skewed to the right, then E(Z_n) is positive. Hence the sequential procedure based on Z_n could also be used to test the hypothesis that f is symmetric about zero against the alternative that the distribution is symmetric with a shifted mean, or is skewed with zero median.
3.10 Appendix: A Useful Lemma
Lemma 3.A.1 gives an algebraic decomposition of the sample variance s_n², splitting the sums of squares over the first n₀ observations and the remaining n − n₀ observations.
3.11 Problems
3.2-1 For the following data carry out the sequential t-test for H₀ : θ = 5.0 vs. H₁ : θ ≥ 5.0 + 0.20 (σ unknown) with α = β = 0.05:

5.4, 5.3, 5.2, 5.0, 5.4, 5.9, 5.4, 5.1, 5.4, 5.2, 5.7, 5.9, 5.0, 5.0
3.2-2 For the following data carry out the sequential t²-test for H₀ : θ = 5.0 vs. H₁ : |θ − 5.0| > 0.20 (σ unknown) with α = β = 0.05.

5.4, 5.3, 5.2, 4.5, 5.0, 5.4, 3.8, 5.9, 5.4, 5.1, 5.4, 4.1, 5.2, 4.8, 4.6, 5.7, 5.9, 5.8, 5.0, 5.0.

3.2-3 For the data in Problem 3.2-1, using Hall's procedure test H₀ : θ = 5.0 vs. H₁ : θ = 5.2 with m = 9, α = β = 0.05.

3.2-4 Let X be normally distributed with mean θ and unknown variance σ². We wish to test H₀ : θ = 0 versus H₁ : θ > δσ where δ > 0. The following is a sequence of independent observations from the above population: −1.3, .34, −.41, −.06, .94, 1.44, −.22, −.34. Carry out the sequential t-test (see Equation (3.2.4)) for δ = 1/4 and for 1/2, starting with n = 2, 3, etc., using α = β = 0.1.

3.3-1 For the following data with p = 4, carry out a sequential F-test with Δ₀ = 0.25 and α = β = 0.05. The following data from Olson and Miller (1958) (or see Sokal and James (1969))⁶ are measurements of four random samples of domestic pigeons. The measurement (in millimeters) is the length from the anterior end of the narial opening to the tip of the bony beak.

⁶Olson, E. C. and Miller, R. L. (1958), Morphological Integration, University of Chicago Press, p. 317. Sokal, R. R. and James, R. F. (1969), Biometry, W. H. Freeman and Co., San Francisco, p. 251, Problem 9.5.
Samples (20 observations each):

Sample 1: 5.4, 5.3, 5.2, 4.5, 5.0, 5.4, 3.8, 5.9, 5.4, 5.1, 5.4, 4.1, 5.2, 4.8, 4.6, 5.7, 5.9, 5.8, 5.0, 5.0
Sample 2: 5.2, 5.1, 4.7, 5.0, 5.9, 5.3, 6.0, 5.2, 6.6, 5.6, 5.1, 5.7, 5.1, 4.7, 6.5, 5.1, 5.4, 5.8, 5.8, 5.9
Sample 3: 5.5, 4.7, 4.8, 4.9, 5.9, 5.2, 4.8, 4.9, 6.4, 5.1, 5.1, 4.5, 5.3, 4.8, 5.3, 5.4, 4.9, 4.7, 4.8, 5.0
Sample 4: 5.1, 4.6, 5.4, 5.5, 5.2, 5.0, 4.8, 5.1, 4.4, 6.5, 4.8, 4.9, 6.0, 4.8, 5.7, 5.5, 5.8, 5.6, 5.5, 5.0
For each of the following problems carry out a sequential likelihood ratio test procedure.
3.4-1 Let 6, 15, 3, 12, 6, 21, 15, 18, 12, ... denote a sequence of independent observations from the normal population having unknown mean μ and variance σ². We wish to test sequentially H₀ : μ = 8 against H₁ : μ = 14.

3.4-2 As part of a learning experiment twelve subjects recited a series of digits. The number of correct responses out of 180 is given below for two groups of subjects, one group having had 20 practice trials and the other group 30 practice trials.*

Group 1, Group 2: 169, 97, 16, 113, 61, 77, 100, 141, 151, 169, 100, 166

Assuming that the proportion of correct responses is approximately normal and that the two groups have the same variability, test sequentially H₀ : p₁ = p₂ = 1/2 against H₁ : p₁ = 1/2, p₂ = 3/4.
3.4-3 The following data* give the Hemoglobin content in milligrams in six patients having pernicious anemia, before and after three months' treatment with vitamin B₁₂.

Patient:  1     2     3     4     5     6
Before:  12.2  11.3  14.7  11.4  11.5  12.7
After:   13.0  13.4  16.0  13.6  14.0  13.8

*The data in Problems 3.4-2 and 3.4-3 are taken from Kurtz, Thomas E. (1963), Basic Statistics, Prentice Hall, Englewood Cliffs, N.J., p. 257. The data in 3.4-2 were provided to Dr. Kurtz by Professor Donald C. Butler, and the data in 3.4-3 were taken from the Southern Medical Journal, Vol. 43 (1950), p. 679. Reprinted by permission of Prentice Hall Inc., Englewood Cliffs, N.J. and the Southern Medical Journal.
Assume that the data are normal and let μ_d denote the expectation of the difference between "before" and "after" Hemoglobin content. We wish to test sequentially H₀ : μ_d = 0 against H₁ : μ_d = −1.

3.4-4 Let 40, 47, 35, 60, 54, 42, 66, 51 denote independent observations from the exponential density σ⁻¹ exp[−(x − θ)/σ], x ≥ θ. Test sequentially H₀ : σ = 5 against H₁ : σ = 8. [Hint: Note that n^{1/2}(X₍₁₎ − θ) converges to zero in probability as n → ∞, where X₍₁₎ = min(X₁, ..., X_n).]

3.4-5 Derive an asymptotic expression for the OC function of Cox's likelihood ratio test procedure.

3.4-6 Derive an asymptotic expression for the ASN function of Cox's likelihood ratio test procedure.
3.4-7 Let X₁, X₂, ... be an i.i.d. sequence of normal variables having mean μ and unknown variance σ². We wish to test H₀ : μ = μ₀ against H₁ : μ = μ₁ with σ unknown. Obtain Bartlett's sequential likelihood ratio test procedure for testing H₀ against H₁.

3.4-8 Let θ₁ and θ₂ denote two binomial probabilities. Let γ = θ₂ − θ₁ and δ = θ₁. Set up Cox's likelihood ratio test procedure for testing H₀ : γ = γ₀ versus H₁ : γ = γ₁, where γ₀ and γ₁ are some specified constants.

3.4-9 With the notation of Problem 3.4-8, let γ = ln[θ₂/(1 − θ₂)] − ln[θ₁/(1 − θ₁)] and δ = θ₁/(1 − θ₁). Set up Cox's likelihood ratio test procedure for testing H₀ : γ = γ₀ versus H₁ : γ = γ₁, where γ₀ and γ₁ are some specified constants.
3.4-10 Let F(x; θ, σ) = 1 − exp[−(x − θ)/σ], x ≥ θ. Set up the sequential likelihood ratio procedure for testing H₀ : σ = σ₀ against H₁ : σ = σ₁ (with σ₁ > σ₀) when θ is unknown.
Using γ₁ = γ₂ = γ₃ = .05 and η = .01, carry out the Sobel–Wald sequential procedure in Problems 3.5-1, 3.5-2 and 3.5-3.

3.5-1 The binomial distribution with H₀ : θ = 1/4, H₁ : θ = 1/2, and H₂ : θ = 3/4.

3.5-2 The Poisson distribution with H₀ : θ = 1, H₁ : θ = 2, and H₂ : θ = 3.

3.5-3 The negative exponential distribution having density f(x; θ) = θ⁻¹e^{−x/θ} for x > 0, with H₀ : θ = 1, H₁ : θ = 2, and H₂ : θ = 3.
3.8-1 Carry out Darling–Robbins' power-one test of H₀ : θ ≤ 4.5 versus H₁ : θ > 4.5, B = 0.5 with m = 1, σ² = 9 for the following normal data:

5.4, 5.3, 5.2, 5.0, 5.4, 5.9, 5.4, 5.1, 5.4, 5.2, 5.7, 5.9, 5.0, 5.0

3.8-2 Carry out Barnard's power-one test for the Bernoulli problem with H₀ : p < 0.4 and H₁ : p ≥ 0.4 and α = .05 for the data (X₁, X₂, ...) = (1, 1, 1, 0, 1, 1, 0, 1, 1).
3.9-1 For the following pairs of data (Xᵢ, Yᵢ) carry out a sequential sign test of H₀ : G(x) = F(x) for all x versus H₁ : G = F², using α = β = .05:

(5.4, 5.2)  (5.3, 4.7)  (5.2, 4.8)  (4.5, 4.9)  (5.0, 5.9)
(5.4, 5.2)  (3.8, 4.8)  (5.9, 4.9)  (5.1, 5.0)  (5.4, 5.1)
(4.1, 4.5)  (5.7, 5.4)  (5.9, 4.9)  (5.8, 4.7)  (5.0, 4.8)

[Hint: Let p = P(X > Y). Then the problem of testing H₀ versus H₁ is equivalent to testing H₀′ : p = 1/2 versus H₁′ : p = 2/3.]

3.9-2 Carry out a two-sample sequential rank-order SPRT with α = β = .05 for the data in Problem 3.9-1.
Chapter 4

Sequential Estimation

4.1 Basic Concepts
In some applications, formulation of a problem as a hypothesis-testing one would be artificial. In some of these instances, estimation seems to be more appropriate. In the fixed-sample size situation there is a close connection between acceptance regions and confidence regions, whereas that analogy does not hold in the sequential situation. Hence, there is a need for a theory of sequential estimation. The stopping rules in sequential testing may not be meaningful in sequential estimation. In this section we formulate (1) the general loss function involved in sequential estimation and (2) certain stopping rules.

Let X₁, X₂, ... be a sequence of independent random variables having common pdf f(x; θ). Let r[δ(x); θ] denote the loss resulting from making a terminal decision δ(x) when θ is the true value of the parameter. We should add to the loss the cost of experimentation, namely C(N), the cost of taking N observations (where N is a random variable). The statistician's task lies in choosing a stopping rule and a terminal decision rule (an estimate for θ). Then, according to Lehmann (1950, Section 2.4), the statistician might be faced with the following situations: (i) Limited resources (forcing a bound on the expectation of the total cost of experimentation). He then seeks to minimize the risk function (the expectation of the loss function) subject to an upper bound n₀ on the expectation of C(N). (ii) Limit on accuracy (a bound on the risk function). He then seeks to minimize the expectation of C(N) subject to an upper bound ρ on the risk function. (iii) Both losses are economically important. Here he seeks to choose a terminal decision that will minimize the weighted sum of the risk function and E[C(N)].
In general, there does not seem to be a sequential procedure that satisfies (iii) uniformly in θ unless the criterion is modified. Such a modification is the Bayesian criterion of optimality. Regarding (i), the Cramér–Rao inequality implies that an optimal sequential procedure is one based on a fixed sample size n₀ whenever the variance of its uniformly minimum variance unbiased estimator equals the Cramér–Rao lower bound. (This is the case whenever the unbiased estimator in a fixed-sample size procedure has a density belonging to the exponential family.) If the criterion is (ii), there is a justification for a sequential procedure even in the case of exponential-type distributions. This has been shown by DeGroot (1959) and Wasan (1964) in the binomial case. We take this up in Section 4.2.
4.2 Sufficiency and Completeness
Let X₁, X₂, ... be a sequence of i.i.d. random variables having common pdf f(x; θ). We wish to estimate θ by some function δ(X₁, ..., Xᵢ), while using a stopping rule which is closed (that is, for every θ, P(N ≤ n) → 1 as n → ∞, although not necessarily uniformly in θ). The sample space is E₁ ∪ E₂ ∪ ···, where Eᵢ is contained in Rⁱ and consists of those points (X₁, ..., Xᵢ) which serve as stopping points. Again, N denotes the random number of observations taken. Throughout, we assume that the relevant conditional probabilities exist. Let T_n = T(X₁, ..., X_n) be a sufficient statistic for the joint density of X₁, ..., X_n.
Definition 4.2.1 The sequence (T₁, T₂, ...) is called a sufficient sequence for the sequential model. Then we have the following result of E. Fay (see Lehmann (1950)).

Result 4.2.1 (E. Fay). If, for each n, T_n = T(X₁, ..., X_n) is a sufficient statistic for θ in the fixed sample X₁, ..., X_n, then (T_N, N) is a sufficient statistic for θ in the sequential case.

Proof. This theorem was proved by Blackwell (1947) under the assumption that the stopping rule depends only on the T_N's. Here, we assume only the existence of conditional probabilities. Let E be any measurable set in ⋃_{i=1}^∞ Eᵢ. Then write E = ⋃_{i=1}^∞ Aᵢ with Aᵢ ⊂ Eᵢ, i = 1, 2, ..., where Eᵢ = {N = i} is the set of stopping points in Rⁱ. One can also look upon the Eᵢ as cylindrical sets in R^∞. Consider

P(E | N = n, T_n = t).   (4.2.1)
Now E ∩ E_n and E_n are sets in Rⁿ. Since T_n is sufficient for θ on the basis of X₁, ..., X_n, both the numerator and the denominator of the above expression are free of θ. Hence P(E | N = n, T_n = t) is free of θ. Thus (T_N, N) is sufficient for θ in the sequential case. □

Let {T_n} be a sufficient sequence for the sequence of random variables X₁, X₂, .... Then the SPRT at stage n depends only on the observed values of T_n. Even in sequential estimation, it is reasonable to base our decision on T_n if experimentation is stopped at the nth stage. Thus, we are led to the definition of transitivity. A sequence of sufficient statistics {T_n} is said to be a transitive sequence if for every n ≥ 1 and for each θ ∈ Θ, the conditional distribution of T_{n+1} given (X₁, ..., X_n) = (x₁, ..., x_n) is equal (in distribution) to the conditional distribution of T_{n+1} given that T_n(X₁, ..., X_n) = t_n(x₁, ..., x_n). In other words, transitivity of {T_n} implies that all the information concerning T_{n+1} contained in X_n = (X₁, ..., X_n) is also contained in the function T_n(X_n). Bahadur (1954) showed that if a sufficient and transitive sequence {T_n} exists, then any closed sequential procedure based on {X_n} is equivalent to a procedure which at stage n is based only on T_n. In the case of i.i.d. random variables, {T_n} is transitive if T_{n+1}(X_{n+1}) = q_n{T_n(X_n), X_{n+1}} for every n ≥ 1. The exponential family has this property.

Next, we shall consider completeness of (T_N, N). Assume that T_m is complete for every fixed m.
Definition 4.2.2 The family of distributions of (T_N, N) is said to be complete if E_θ[g(T_N, N)] = 0 for all θ implies that g(t_n, n) = 0 almost everywhere for all n ≥ 1.

Definition 4.2.3 The family of distributions of (T_N, N) is said to be boundedly complete if, for every bounded g(t_n, n), E_θ[g(T_N, N)] = 0 for all θ implies that g(t_n, n) = 0 almost everywhere for all n ≥ 1.
Notice that bounded completeness is clearly a weaker property than completeness. Lehmann and Stein (1950) have found a general necessary condition for completeness of the statistic (N, T_N). It is of interest to explore the stopping rules for which (N, T_N) is complete. Lehmann and Stein (1950) have examined this question when the X's are normal (θ, 1), Poisson, or rectangular on (0, θ). In the normal (θ, 1) case, (N, T_N) is complete if N ≡ m (here T_m = Σ_{i=1}^m Xᵢ). In the binomial and Poisson cases, T_m = Σ_{i=1}^m Xᵢ. Let S_m be the set of values of T_m for which we stop at the mth observation. A necessary and sufficient condition for (N, T_N) to be complete is that the S_m's be disjoint intervals, each lying immediately above the preceding one. For example, if the stopping rule is to continue sampling until T_m exceeds c (a given value), then (N, T_N) is not complete.
Similarly one can obtain a necessary and sufficient condition for the rectangular case (see Lehmann and Stein, 1950, Example 2).

Example 4.2.1 (Binomial Case). Let P(Xᵢ = 1) = θ and P(Xᵢ = 0) = 1 − θ for some 0 < θ < 1. Then T_m = X₁ + X₂ + ··· + X_m is sufficient for θ when m is fixed. Suppose we are given a closed stopping rule which depends only on the T_m's. For a given stopping rule, each point (m, T_m) can be categorized as (i) a stopping point, (ii) a continuation point, or (iii) an impossible point. The sample space of points (N, T_N) consists of all stopping points. Since we have a closed stopping rule, a continuation point is not a value that can be assumed by the sufficient statistic (N, T_N). We are interested in estimating θ unbiasedly. Since X₁ is unbiased for θ, applying the Rao–Blackwell Theorem¹, we find that

Y = E(X₁ | N, T_N)

is unbiased for θ and that var(Y) ≤ var(X₁). Each sample (X₁, X₂, ..., X_n) can be viewed as a path starting at the origin O and ending at a stopping point (n, T_n). The ith step of such a path is either to the point immediately on the right if Xᵢ = 0, or to the point immediately above that if Xᵢ = 1. Obviously, a path cannot pass through a stopping point before reaching (n, T_n). The probability of any single path from O to the stopping point (m, t) is θᵗ(1 − θ)^{m−t}. Let π(m, t) denote the number of paths leading to (m, t) starting at O, and π*(m, t) be the number of paths leading to (m, t) starting at O* = (1, 1). It would be helpful if the reader draws diagrams. Then, we have

Y = π*(N, T_N) / π(N, T_N).   (4.2.2)

Let us consider some special stopping rules:

(i) Sample of fixed size m.

¹The Rao–Blackwell Theorem states that if T is sufficient for θ and U is any unbiased estimator of θ, then V(t) = E(U | T = t) is unbiased for θ and var(V) ≤ var(U).
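The path counts π and π* in (4.2.2) can be generated by a small dynamic program over the lattice of (m, t) points; the stopping rule is supplied as an explicit set of stopping points (a hypothetical encoding, chosen for illustration):

```python
from collections import defaultdict

def path_counts(stop, m_max, start=(0, 0)):
    """Number of admissible paths from `start` to each stopping point,
    stepping (m, t) -> (m+1, t) on a failure or (m+1, t+1) on a success,
    never passing through a stopping point en route."""
    alive = defaultdict(int)    # paths reaching a continuation point
    alive[start] = 1
    reached = defaultdict(int)  # paths absorbed at a stopping point
    for m in range(start[0], m_max):
        for t in range(0, m + 1):
            c = alive[(m, t)]
            if c == 0:
                continue
            for nxt in ((m + 1, t), (m + 1, t + 1)):
                if nxt in stop:
                    reached[nxt] += c
                else:
                    alive[nxt] += c
    return reached
```

For the fixed-sample rule of size m the ratio π*/π reproduces Y = t/m, and for inverse sampling (rule (ii) below) it reproduces (c − 1)/(m − 1).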
Then

π*(m, t)/π(m, t) = C(m − 1, t − 1)/C(m, t) = t/m  (C(n, r) denoting the binomial coefficient),

and consequently Y = t/m.

(ii) Continue sampling until c successes are obtained. Then

π*(m, t)/π(m, t) = C(m − 2, c − 2)/C(m − 1, c − 1) = (c − 1)/(m − 1);

hence Y = (c − 1)/(m − 1).

(iii) Any stopping rule with O* = (1, 1) as a stopping point. Then

Y(m, t) = 1 if (m, t) = O*, and 0 otherwise.

Notice that rule (ii) with c = 1 reduces to rule (iii); that is, Y = X₁.

(iv) Curtailed simple sampling. As in Section 1.2, we accept a lot if fewer than c defectives appear in a sample of size s, and we reject the lot if c or more defectives are discovered. Thus the full sample of size s need not be taken, since we can stop as soon as c defectives or s − c + 1 non-defectives are observed. It is customary, however, to inspect all the items in the sample even if the final decision to either reject or accept the lot is made before all sample items are inspected. One reason for this is that an unbiased estimate for θ cannot be found if a complete inspection of the sample is not taken. The best unbiased estimate of θ was provided by Girshick, Mosteller and Savage (1946):

Y(x, c) = (c − 1)/(c + x − 1),

where x is the number of non-defective items examined; this is the unique unbiased estimate along the horizontal line corresponding to rejection with c > 1 defectives. Further, the unique unbiased estimate along the line corresponding to acceptance for c > 1 is

Y = m/(s − c + m),

where m is the number of defectives observed. Thus, the unique unbiased estimate is equal to the number of defectives observed divided by one less than the number of observations.
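The unbiasedness of the inverse-sampling estimate (c − 1)/(N − 1) can be checked numerically by summing against the distribution of N, without simulation (a sketch; the truncation point kmax is an arbitrary choice):

```python
from math import comb

def nbinom_pmf(c, theta, kmax=2000):
    """P(N = c + k) = C(k+c-1, c-1) * theta^c * (1-theta)^k, k = 0, 1, ...
    (number of Bernoulli trials needed to reach c successes)."""
    q = 1.0 - theta
    return [comb(k + c - 1, c - 1) * theta**c * q**k for k in range(kmax + 1)]

def expect(g, c, theta, kmax=2000):
    """E[g(N)] under inverse sampling, series truncated at kmax."""
    return sum(g(c + k) * p for k, p in enumerate(nbinom_pmf(c, theta, kmax)))
```

For c = 3 and θ = 0.4, E[(c − 1)/(N − 1)] equals θ exactly, while the naive estimator c/N is biased upward (Jensen's inequality, since 1/N is convex).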
Girshick, Mosteller and Savage (1946) give an example of a general curtailed double-sampling plan. They also provide necessary and sufficient conditions for the existence of a unique unbiased estimate of θ. Sometimes we may be interested in estimating unbounded functions of parameters, like 1/θ. Hence, completeness is more relevant than bounded completeness. DeGroot (1959) has considered unbiased estimation of a function of θ, namely h(θ). From the Cramér–Rao inequality (see Eq. (4.3.7)) we have

var_θ(g) ≥ [h′(θ)]² / {E_θ(N) E_θ[(∂ ln f(X; θ)/∂θ)²]},   (4.2.3)

where g denotes an unbiased estimator of h(θ).

Definition 4.2.4 (DeGroot, 1959). A sampling plan S and an estimator g are said to be optimal at θ = θ₀ if, among all procedures with average sample size at θ₀ no larger than that of S, there does not exist an unbiased estimator with smaller variance at θ₀ than that of g.

If a particular estimator, for a given sampling plan, attains the lower bound in (4.2.3) for its variance at θ₀, then it is concluded that the estimator and the sampling plan are optimal at θ₀, and the estimator is said to be efficient at θ₀. DeGroot (1959) has shown that the (fixed) single-sample plans and the inverse binomial sampling plans are the only ones that admit an estimator that is efficient at all values of θ. For the inverse sampling plan, DeGroot (1959) has given an explicit expression for the unique unbiased estimator of h(1 − θ). The stopping points of an inverse sampling plan are the totality of {γ | T(γ) = c}, where T denotes the number of defectives at the point γ. Then, for each nonnegative integer k there exists a unique stopping point γ_k such that N(γ_k) = c + k, and

P(N = c + k) = C(k + c − 1, c − 1) θᶜ qᵏ, q = 1 − θ.   (4.2.4)

Then, for any estimator g,

E_θ(g) = Σ_{k=0}^∞ g(γ_k) C(k + c − 1, c − 1) θᶜ qᵏ.   (4.2.5)
Result 4.2.2 (DeGroot, 1959). A function h(q) is estimable unbiasedly if and only if it can be expanded in a Taylor series in the interval |q| < 1. If h(q) is estimable, then its unique unbiased estimator is given by

g(γ_k) = b_k / C(k + c − 1, c − 1),   (4.2.6)

where the b_k are the coefficients in h(q) = θᶜ Σ_{k=0}^∞ b_k qᵏ.
Proof. h(q) can be expanded in a Taylor series in the given interval if and only if h(q)/(1 − q)ᶜ can be expanded. Then, suppose that

h(q)/(1 − q)ᶜ = Σ_{k=0}^∞ b_k qᵏ,

that is,

h(q) = θᶜ Σ_{k=0}^∞ b_k qᵏ,

and taking

g(γ_k) = b_k / C(k + c − 1, c − 1)

yields an estimator g with E(g) = h(q). Suppose now that h(q) is estimable unbiasedly. Then there exists an estimator g such that

Σ_{k=0}^∞ g(γ_k) C(k + c − 1, c − 1) θᶜ qᵏ = h(q),

or

Σ_{k=0}^∞ g(γ_k) C(k + c − 1, c − 1) qᵏ = h(q)/(1 − q)ᶜ = Σ_{k=0}^∞ b_k qᵏ.

Hence

g(γ_k) = b_k / C(k + c − 1, c − 1).

The uniqueness of g(γ_k) follows from the uniqueness of the Taylor series expansion, which is the basis of the completeness of this sampling plan. This completes the proof. □

It is often possible to find the expectation of a given estimator in closed form by using the fact that if the series

f(q) = Σ_{k=0}^∞ a_k qᵏ   (4.2.7)

is differentiated m times within its interval of convergence, then

d^m f/dq^m = Σ_{k=m}^∞ [k!/(k − m)!] a_k q^{k−m}.   (4.2.8)
As an illustration of the technique involved, the variance of an unbiased estimator of θ and the moment-generating function of N will be determined.
Result 4.2.3 (DeGroot, 1959). Let g(γ_k) = (c − 1)/(k + c − 1), which is an unbiased estimator of θ for c ≥ 2. Then E(g²) can be given in closed form, involving the term (−1)^{c−1}(c − 1)θᶜ(ln θ)/q^{c−1} together with a finite sum of lower-order terms in 1/q,   (4.2.9)

where q = 1 − θ, and the moment generating function of N is

E(e^{tN}) = (θeᵗ)ᶜ (1 − qeᵗ)^{−c}, t < ln(1/q).   (4.2.10)

Proof. We have

E(g²) = (c − 1)² θᶜ Σ_{k=0}^∞ C(k + c − 1, c − 1) qᵏ/(k + c − 1)²,

and, applying (4.2.8), the series reduces to a (c − 1)-fold derivative of a power series whose constant term is taken to be zero; its value can be assigned arbitrarily since it does not appear in the derived series. However,

Σ_{k=1}^∞ qᵏ/k = −ln(1 − q),

so the required derivative can be expressed through d^{c−2}/dq^{c−2}[−ln(1 − q)]. Using this completes the proof of (4.2.9). For t < ln(1/q),

E(e^{tN}) = Σ_{k=0}^∞ e^{(c+k)t} C(k + c − 1, c − 1) θᶜ qᵏ = (θeᵗ)ᶜ (1 − qeᵗ)^{−c}.

This completes the proof of (4.2.10). □
Remark 4.2.1.1 Haldane (1945) gives E(g²) in the form of an integral which, after repeated integration by parts, would yield (4.2.9).
Corollary 4.2.3.1 E(N) = c/θ, and var(N) = c(1 − θ)/θ², achieving the Cramér–Rao lower bound (4.2.3). Thus N is an efficient estimator of its expected value. Notice that g(γ_k) = (c − 1)/(k + c − 1) is not efficient in this sense. The efficient unbiased estimator of E(N) (with h(q) = c/(1 − q) in (4.2.6)) is given by g(γ_k) = c + k = N(γ_k).
In the following we shall state DeGroot's (1959) theorem pertaining to the optimality of the single sample plans and the inverse binomial sampling plan.
Result 4.2.4 (DeGroot, 1959). For all stopping points γ such that N(γ) = n, any non-constant function of the form a + bT_N is an efficient estimator of a + bnθ, and these are the only efficient estimators. For γ such that T(γ) = c, any non-constant function of the form a + bN is an efficient estimator of a + bc(1/θ), and these are the only efficient estimators, where T denotes the number of defective items.

For the problem (ii) posed in Section 4.1 (that is, to choose a procedure δ(X) which minimizes E(N) subject to E[δ(X) − θ]² ≤ ρ), Wasan (1964) has shown that the fixed-sample procedure is admissible and minimax [see Wasan (1964, Theorem 1, p. 261)]. Consider the following symmetric curtailed sample procedure S*(k), whose stopping points are (k + i, k) and (k + i, i), i = 0, 1, ..., k − 1. Here

P(N = k + i, T_N = k) = C(k + i − 1, k − 1) θᵏ (1 − θ)ⁱ, i = 0, 1, ..., k − 1,   (4.2.11)
and

P(N = k + i, T_N = i) = C(k + i − 1, k − 1) θⁱ (1 − θ)ᵏ, i = 0, 1, ..., k − 1.   (4.2.12)
Then the unbiased estimate of θ (proposed by Girshick et al., 1946) is given by

g(k + i, k) = (k − 1)/(k + i − 1) and g(k + i, i) = i/(k + i − 1),   (4.2.13)

for i = 0, 1, ..., k − 1. Wasan (1964) studied the asymptotic optimality of g(γ_k) for large k. This is given in the following theorem.
Theorem 4.2.5 (Wasan, 1964). We have the asymptotic expansions (4.2.14) for the moments of S*(k); hence g(γ_k) is asymptotically uniformly better than the fixed-sample size procedure.

Wasan (1964) also demonstrates that the fixed-sample size procedure with m = 1/(2√c) has the smallest total risk, equal to √c = max_{0<θ<1}[cm + θ(1 − θ)/m], among all procedures having bounded E_θ(N). Thus, the fixed-sample procedure with m = 1/(2√c) is minimax and admissible for problem (iii) posed in Section 4.1.
4.3 Cramér–Rao Lower Bound

Suppose the statistician is interested in solving problem (i) posed in Section 4.1, that the terminal decision is an estimate for θ, and that r(δ(x); θ) = [δ(x) − θ]². In this case, if we restrict ourselves to unbiased estimates of θ, we would then be interested in lower bounds for the variance of such estimates. The Cramér–Rao inequality was extended to the sequential case by Wolfowitz (1947). Towards this we need the following lemma pertaining to a random sum of i.i.d. random variables.

Lemma 4.3.1 Let S_N = X₁ + X₂ + ··· + X_N, where the Xᵢ are i.i.d.

(i) If E(N) < ∞ and E|X| < ∞, we have E(S_N) = E(N)E(X).

(ii) If E(X) = 0, E(X²) < ∞, and E(N) < ∞, then E(S_N²) = E(N)E(X²).

(iii) If E[g(X)] = E[h(X)] = 0, E[g²(X)] < ∞, E[h²(X)] < ∞ and P(X = 0) < 1, then

E[(Σ_{i=1}^N g(Xᵢ))(Σ_{i=1}^N h(Xᵢ))] = E(N) E[g(X)h(X)].
Proof. Notice that (i) and (ii) follow from Theorem 2.4.2. (iii) has been established by Lehmann (1950) under a different set of sufficient conditions. Consider

E[(Σ_{i=1}^∞ g(Xᵢ)Yᵢ)(Σ_{j=1}^∞ h(Xⱼ)Yⱼ)],

where Yᵢ = 1 if N ≥ i and zero otherwise. The left-hand expression equals

Σ_{i=1}^∞ E[g(Xᵢ)h(Xᵢ)Yᵢ] + Σ_{i<j} E[g(Xᵢ)h(Xⱼ)YᵢYⱼ] + Σ_{i>j} E[g(Xᵢ)h(Xⱼ)YᵢYⱼ] = E(N)E[g(X)h(X)],

since for i < j, Yⱼ = 1 implies that Yᵢ = 1, while Yⱼ is determined by X₁, ..., X_{j−1} and is therefore independent of Xⱼ, so the cross terms vanish because E[h(Xⱼ)] = 0 (and symmetrically for i > j); the diagonal terms sum to E[g(X)h(X)] Σᵢ P(N ≥ i) = E(N)E[g(X)h(X)].

In order to interchange expectation and infinite summation, the series should be absolutely integrable. That is,

Σ_{i=1}^∞ E|g(Xᵢ)h(Xᵢ)| P(N ≥ i) ≤ {E[g²(X)] E[h²(X)]}^{1/2} Σ_{i=1}^∞ P(N ≥ i),

after applying the Cauchy–Schwarz inequality and using the independence of Xᵢ and Yᵢ. From Theorem 2.2.1, we infer that there exists a ρ (0 < ρ < 1) such that P(N ≥ i) = O(ρⁱ), provided P(X = 0) < 1. Hence Σ_{i=1}^∞ P(N ≥ i) < ∞. □
Wolfowitz (1947) has given a generalized version of Wald's equation [Lemma 4.3.1(i)] which is valid even if the underlying variables are dependent.

Lemma 4.3.2 (Wolfowitz, 1947). Let vᵢ = E(Xᵢ | N ≥ i) exist for all i, where i ranges over the positive integers for which P(N ≥ i) ≠ 0. Write vᵢ′ = E(|Xᵢ − vᵢ| | N ≥ i), i = 1, 2, ..., and assume that the series Σ_{i=1}^∞ vᵢ′ P(N ≥ i) converges. Then

E(S_N) = Σ_{i=1}^∞ vᵢ P(N ≥ i).   (4.3.2)
Proof. Consider

E(S_N) = E[Σ_{i=1}^∞ XᵢYᵢ] = Σ_{i=1}^∞ E(XᵢYᵢ) = Σ_{i=1}^∞ E(Xᵢ | N ≥ i) P(N ≥ i) = Σ_{i=1}^∞ vᵢ P(N ≥ i),

where Yᵢ = 1 if N ≥ i and zero otherwise. The interchange of expectation and infinite summation is justified because

Σ_{i=1}^∞ E(|Xᵢ|Yᵢ) ≤ Σ_{i=1}^∞ (|vᵢ| + vᵢ′) P(N ≥ i) < ∞.
Corollary 4.3.3.1 (Wolfowitz, 1947). Let the Xᵢ be independent r.v.'s having different distributions. Suppose that all the vᵢ are equal, except perhaps for those i for which P(N ≥ i) = 0. Further assume that E(N) exists. Then (4.3.2) holds, and E(S_N) = vE(N).

Notice that Corollary 4.3.3.1 is a special case of Lemma 4.3.2 and is a generalization of Lemma 4.3.1(i). If the random variables Xᵢ are i.i.d., then v = E(Xᵢ) and v′ = E|Xᵢ − v|, i = 1, 2, ....

In the sequential sample space, let E₁ be the set of values of X₁ for which we stop after taking one observation. Let Eᵢ be the set of values of (X₁, X₂, ..., Xᵢ) for which we stop after taking i observations. Notice that (x₁, x₂, ..., xᵢ) ∈ Eᵢ implies that (x₁, x₂, ..., xⱼ) ∉ Eⱼ, j = 1, 2, ..., i − 1 (i = 1, 2, ...). The totality of stopping points is E₁ ∪ E₂ ∪ ···. Recall that N denotes the random number of observations taken. Then if we define g(x₁, x₂, ..., x_N) = g(x₁, x₂, ..., xᵢ) whenever (x₁, x₂, ..., xᵢ) ∈ Eᵢ (i = 1, 2, ...), we have

E_θ[g] = Σ_{i=1}^∞ ∫_{Eᵢ} g(x₁, ..., xᵢ) ∏_{j=1}^i f(xⱼ; θ) dx₁ ··· dxᵢ.

We assume that P_θ(⋃_{i=1}^∞ Eᵢ) = 1.
Theorem 4.3.1 (Wolfowitz, 1947). Let T_N = δ(X₁, X₂, ..., X_N) be an estimate of θ such that E_θ(T_N) = θ + b(θ). Suppose that differentiation underneath the summation and integral signs is permissible in E_θ(1) = 1 and in E_θ(T_N) = θ + b(θ), where b′(θ) exists. Then

var_θ(T_N) ≥ [1 + b′(θ)]² / {E_θ(N) E_θ[(∂ ln f(X; θ)/∂θ)²]}.   (4.3.4)
Proof. We have E_θ(S_N) = 0, where

S_N = Σ_{i=1}^N ∂ ln f(Xᵢ; θ)/∂θ,   (4.3.5)

provided ∫(∂f/∂θ) dx = 0 and E_θ|∂ ln f/∂θ| < ∞. Differentiating E_θ(T_N) = θ + b(θ) similarly yields

E_θ(T_N S_N) = 1 + b′(θ).

However, from Lemma 4.3.1(ii) we have

E_θ(S_N²) = E_θ(N) E_θ[(∂ ln f/∂θ)²],

provided E_θ(∂ ln f/∂θ)² < ∞. Now (4.3.4) follows from the Cauchy–Schwarz inequality, since [1 + b′(θ)]² = [cov_θ(T_N, S_N)]² ≤ var_θ(T_N) E_θ(S_N²). □
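As a concrete check of the bound: under inverse binomial sampling (Section 4.2), g = (c − 1)/(N − 1) is unbiased for θ, E(N) = c/θ, and the per-observation information is 1/[θ(1 − θ)], so the right side of (4.3.4) is θ²(1 − θ)/c. The exact variance strictly exceeds it, consistent with the earlier remark that g is not efficient:

```python
from math import comb

def variance_vs_bound(c=4, theta=0.3, kmax=3000):
    """Exact mean and variance of g = (c-1)/(N-1) under inverse sampling,
    compared with the sequential Cramer-Rao bound theta^2 (1-theta)/c."""
    q = 1.0 - theta
    pts = [(c + k, comb(k + c - 1, c - 1) * theta**c * q**k)
           for k in range(kmax + 1)]
    mean = sum((c - 1) / (n - 1) * p for n, p in pts)
    var = sum(((c - 1) / (n - 1) - mean)**2 * p for n, p in pts)
    bound = theta**2 * q / c  # = 1 / (E(N) * per-observation information)
    return mean, var, bound
```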
Remark 4.3.1.1 (Lehmann, 1950). If we restrict ourselves to sequential estimation procedures for which E_θ(N) ≤ n₀, and the regularity conditions of Theorem 4.3.1 hold for all such estimates, then for every unbiased estimate T_N of θ we have

var_θ(T_N) ≥ 1 / {n₀ E_θ[(∂ ln f/∂θ)²]}.   (4.3.6)

In the case of a normal distribution with mean θ and variance unity, and in the binomial and Poisson cases, one can establish the validity of (4.3.6) for all unbiased estimation procedures for which E_θ(N) ≤ n₀, provided there is an M such that P(N ≤ M) = 1. This additional restriction to bounded procedures is inconvenient from the theoretical point of view, although in practice it is no restriction at all since M could be fairly large.
Remark 4.3.1.2 If T_N is an unbiased estimator of h(θ), then one can analogously show [see, for instance, Wolfowitz, 1947] that

var_θ(T_N) ≥ [h′(θ)]² / {E_θ(N) E_θ[(∂ ln f/∂θ)²]}.   (4.3.7)

If θ = (θ₁, θ₂, ..., θ_s)′ and if we are interested in unbiased estimation of a single component, say θ₁, then Wolfowitz has also obtained

var_θ(T_{N,1}) ≥ λ¹¹ / E_θ(N),   (4.3.8)

where T_{N,1} denotes any unbiased estimate of θ₁ and λ¹¹ is the leading element of the inverse of the per-observation information matrix.
Example 4.3.1 Let X be distributed as normal with mean θ₁ and variance θ₂. Let T_{N,1} be an unbiased estimator of θ₁. Then, since the information matrix is diagonal with leading entry 1/θ₂,

var_θ(T_{N,1}) ≥ θ₂ / E_θ(N).
Blackwell and Girshick (1947) have shown that the lower bound given by (4.3.4) for the variance of an unbiased estimate of θ is attained only for the sequential procedure for which P(N = n) = 1, if the probability density function f(x; θ) of X is such that E(X) = θ and X₁ + X₂ + ··· + X_m is sufficient for θ for all integral values of m, where X₁, X₂, ..., X_m are independent observations on the random variable X.

Seth (1949) has extended Bhattacharyya's (1946) bounds to the sequential case; in some respects these are more general than those of Wolfowitz (1947). In the following we shall give his result specialized to unbiased estimates of θ on the basis of i.i.d. observations.
Theorem 4.3.2 (Seth, 1949). Let T_N = T(X₁, X₂, ..., X_N) be an unbiased estimate of θ having a finite variance, and let θ lie in an open interval I. Suppose the derivatives ∂ⁱf(x; θ)/∂θⁱ (i = 1, 2, ..., k) exist for all θ in I and almost all x. Let ((λᵢⱼ)) denote the Bhattacharyya-type information matrix for the sequential sample (4.3.9), and set

((λⁱʲ)) = ((λᵢⱼ))⁻¹.   (4.3.10)

Further assume that E_θ(1) and E_θ(T_N) are differentiable underneath the infinite summation and the integration at least k times. Then we have

var(T_N) ≥ λ¹¹.   (4.3.11)
Corollary 4.3.2.1 If k = 1, Theorem 4.3.2 reduces to Theorem 4.3.1. Seth (1949) has also obtained conditions under which the inequality in (4.3.11) (and hence in (4.3.4)) becomes an equality.
4.4 Two-Stage Procedures

4.4.1 Stein's Procedure for Estimating the Mean of a Normal Distribution with Unknown Variance

It is known that there does not exist a fixed-sample size procedure for estimating the mean of a normal population (when the variance is unknown) with a confidence interval of fixed width and specified confidence coefficient. Stein (1945) has presented a two-sample procedure, in which the size of the second sample depends upon the result of the first sample, for the problem of determining confidence intervals of preassigned length and confidence coefficient for the mean of a normal population with unknown variance. In order to make the length of the confidence interval free of the variance, it seems necessary to "waste" a small portion of the information contained in the sample. Thus, in practical applications one would, if possible, modify this procedure while still preserving this property, using an interval of the same length whose confidence coefficient (although a function of σ) is always greater than the desired value, and at the same time reducing the expected number of observations by a small amount. The two-sample procedure will be a special case of sequential estimation. It is further shown by Stein (1945) that if the variance and initial sample size are sufficiently large, the expected number of observations differs only slightly from the number of observations required by a single-sample interval estimation procedure when the variance is known.

Let Xᵢ (i = 1, 2, ...) be independent normal variables having mean θ and unknown variance σ². We wish to estimate θ by a confidence interval of specified length 2d and specified confidence coefficient 1 − α. Take a sample of n₀ observations X₁, X₂, ..., X_{n₀} and compute the sample variance given by
s² = (n₀ − 1)⁻¹ Σ_{i=1}^{n₀} (Xᵢ − X̄)².   (4.4.1)

Then, define n by

n = max{[s²/z] + 1, n₀ + 1},   (4.4.2)
where z is a previously specified positive constant and [·] denotes the largest integer less than (·). Now, take additional observations X_{n₀+1}, ..., X_n. Also, choose real numbers aᵢ (i = 1, 2, ..., n) such that
4.4. TWO-STAGE PROCEDURES
(i) Σ_{i=1}^{n} aᵢ = 1, with a₁ = a₂ = ··· = a_{n₀}, and

(ii) s² Σ_{i=1}^{n} aᵢ² = z.

This is possible since

    min Σ_{i=1}^{n} aᵢ² = 1/n ≤ z/s²  by (4.4.2),    (4.4.3)

the minimum being taken subject to the condition (i). Then define the statistic

    U = ( Σ_{i=1}^{n} aᵢXᵢ − θ ) / z^{1/2}.    (4.4.4)
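For concreteness, a set of coefficients satisfying (i) and (ii) can be computed numerically by taking the second-stage coefficients equal to one another as well (a choice (i) permits) and solving the resulting quadratic; a sketch, with a function name of our own making:

```python
import math

def stein_coefficients(s2, z, n0, n):
    """Find A = a_1 = ... = a_{n0} and B = a_{n0+1} = ... = a_n subject to
    (i)  n0*A + (n - n0)*B = 1  and  (ii)  s2*(n0*A**2 + (n - n0)*B**2) = z.
    Substituting B = (1 - n0*A)/k from (i) into (ii) gives a quadratic in A."""
    k = n - n0
    qa = s2 * (n0 + n0 * n0 / k)          # coefficient of A**2
    qb = -2.0 * s2 * n0 / k               # coefficient of A
    qc = s2 / k - z                       # constant term
    disc = qb * qb - 4.0 * qa * qc        # nonnegative when z >= s2/n, i.e. (4.4.3)
    A = (-qb + math.sqrt(disc)) / (2.0 * qa)
    B = (1.0 - n0 * A) / k
    return A, B                           # B may be negative; only realness is needed
```

Note that the later coefficients can come out negative; Stein's argument requires only that the aᵢ be real numbers satisfying (i) and (ii).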
Note that Σ_{i=1}^{n₀} aᵢXᵢ (a₁ = a₂ = ··· = a_{n₀}) is independent of s² because of normality, and Σ_{i=n₀+1}^{n} aᵢXᵢ is independent of s² because they are based on two mutually independent sets of observations. Write

    Σ_{i=1}^{n} aᵢXᵢ = Σ_{i=1}^{n₀} aᵢXᵢ + Σ_{i=n₀+1}^{n} aᵢXᵢ.

For given s, U is distributed as normal with mean 0 and variance σ²/s². It was shown in Section 1.4 that U is then distributed as Student's t with n₀ − 1 degrees of freedom. A confidence interval for θ of specified length 2d and confidence coefficient 1 − α is then given by

    Σ_{i=1}^{n} aᵢXᵢ − d ≤ θ ≤ Σ_{i=1}^{n} aᵢXᵢ + d,    (4.4.5)

where

    z = d² / t²_{n₀−1,1−α/2},    (4.4.6)

t_{n₀−1,1−α/2} = 100(1 − α/2) percentage point of the t distribution with n₀ − 1 degrees of freedom. The distribution of n, the sample size, is given by
    P(n = n₀ + 1) = P(χ²_{n₀−1} < y),

where y = (n₀² − 1)z/σ², and for v > n₀ + 1,

    P(n = v) = P( (v − 1)(n₀ − 1)z/σ² < χ²_{n₀−1} ≤ v(n₀ − 1)z/σ² ),
CHAPTER 4. SEQUENTIAL ESTIMATION
all other values of v being impossible. Hence

    E(n) = (n₀ + 1) P(n = n₀ + 1) + Σ_{v=n₀+2}^{∞} v P(n = v)
         = (n₀ + 1) P(χ²_{n₀−1} < y) + ∫_{n₀+1}^{∞} v P(n = v) dv,

where, under the integral, P(n = v) = ∫ f_{n₀−1}(u) du taken over (v − 1)(n₀ − 1)z/σ² < u ≤ v(n₀ − 1)z/σ², and f_{n₀−1}(u) denotes the chi-square density with n₀ − 1 degrees of freedom. Thus, after interchanging the variables of integration, we have

    E(n) = (n₀ + 1) P(χ²_{n₀−1} < y) + ∫_{y}^{∞} f_{n₀−1}(u) [ ∫_{uσ²/((n₀−1)z)}^{uσ²/((n₀−1)z)+1} v dv ] du.

By replacing the integrand v by the upper and lower limits of integration on v, one can get the bounds for E(n) obtained by Stein (1945). However, one can obtain an exact expression for E(n). Performing the integration on v over the unit interval gives uσ²/((n₀ − 1)z) + 1/2, and since ∫_{y}^{∞} u f_{n₀−1}(u) du = (n₀ − 1) P(χ²_{n₀+1} > y), we obtain

    E(n) = 1/2 + (n₀ + 1/2) P(χ²_{n₀−1} < y) + (σ²/z) P(χ²_{n₀+1} > y).    (4.4.7)
For moderately large n₀, one can use Fisher's approximation to the chi-square, namely

    P(χ²_v ≤ t) ≈ Φ( (2t)^{1/2} − (2v − 1)^{1/2} ).

Then (4.4.7) becomes

    E(n) ≈ 1/2 + (n₀ + 1/2) Φ( (2y)^{1/2} − (2n₀ − 3)^{1/2} ) + (σ²/z) [ 1 − Φ( (2y)^{1/2} − (2n₀ + 1)^{1/2} ) ].    (4.4.8)

Thus E(n) is a function of σ² and can be evaluated from the chi-square tables or the normal table.
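The behaviour of E(n) under rule (4.4.2) can also be checked by simulation; a sketch with names of our own making, treating [x] as the integer part (which differs from "largest integer less than" only when s²/z is an integer):

```python
import math
import random

def stein_n(sigma, z, n0, rng):
    """One realization of the Stein sample size n = max([s^2/z] + 1, n0 + 1)."""
    xs = [rng.gauss(0.0, sigma) for _ in range(n0)]
    m = sum(xs) / n0
    s2 = sum((x - m) ** 2 for x in xs) / (n0 - 1)
    return max(math.floor(s2 / z) + 1, n0 + 1)

rng = random.Random(1)
sigma, z, n0 = 5.0, 0.25, 15          # sigma^2/z = 100, well above n0
avg = sum(stein_n(sigma, z, n0, rng) for _ in range(4000)) / 4000
# avg should be close to sigma^2/z = 100, in line with E(n) ~ sigma^2/z
```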
As mentioned earlier, in practical applications, instead of (4.4.2) we take a total of

    n = max{ [s²/z] + 1, n₀ }    (4.4.9)

observations and define

    U′ = n^{1/2} (X̄_n − θ) / s,    (4.4.10)

where U′ has the t distribution with n₀ − 1 degrees of freedom. By (4.4.9), n > s²/z, so that although dn^{1/2}/s is random,

    d n^{1/2} / s ≥ d / z^{1/2} = t_{n₀−1,1−α/2}.    (4.4.11)

Thus

    P( |X̄_n − θ| ≤ d ) = P( |U′| ≤ d n^{1/2}/s ) ≥ 1 − α,

provided z is defined by (4.4.6). Thus the interval
    (X̄_n − d, X̄_n + d)    (4.4.12)

has length 2d, and the probability that it covers the true parameter is a function of σ but is always greater than 1 − α and differs only slightly from 1 − α provided σ² > n₀z. Also, E(n) will be reduced from that in (4.4.7) by P(χ²_{n₀−1} < n₀(n₀ − 1)z/σ²). Thus (4.4.12) can be used instead of the confidence interval (4.4.5). From (4.4.7) it follows that

    σ²/z < E(n) < σ²/z + n₀ + 1.    (4.4.13)

Thus the approximation E(n) ≈ σ²/z is fair provided σ² > n₀z. The length of the confidence interval is 2d = 2 z^{1/2} t_{n₀−1,1−α/2}, so that E(n) ≈ σ²/z = σ² t²_{n₀−1,1−α/2}/d². When σ² is known, the length of the single-sample confidence interval of confidence coefficient 1 − α obtained on the basis of n observations is 2 x_{1−α/2} σ/n^{1/2}, so that a length of 2d requires n = σ² x²_{1−α/2}/d² observations, where x_{1−α/2} denotes the 100(1 − α/2) percentage point of the standard normal distribution. Hence, if n₀ is moderately large (say > 30), the expected number of observations of a confidence interval of given length and confidence coefficient is only slightly larger than the fixed number of observations required in the single-sample case when the variance is known, provided the variance is moderately large.
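The whole two-stage procedure, in its modified form (4.4.9) with the interval (4.4.12), can be simulated to confirm the fixed width and the coverage guarantee; a sketch, with names of our own making and t_{14,0.975} ≈ 2.145:

```python
import math
import random

def stein_interval(theta, sigma, d, t_mult, n0, rng):
    """One run of Stein's two-stage procedure using rule (4.4.9).
    Returns (interval covers theta, total sample size)."""
    z = d * d / (t_mult * t_mult)              # (4.4.6)
    xs = [rng.gauss(theta, sigma) for _ in range(n0)]
    m = sum(xs) / n0
    s2 = sum((x - m) ** 2 for x in xs) / (n0 - 1)
    n = max(math.floor(s2 / z) + 1, n0)        # (4.4.9)
    xs += [rng.gauss(theta, sigma) for _ in range(n - n0)]
    xbar = sum(xs) / n
    return abs(xbar - theta) <= d, n           # interval (4.4.12) has width 2d

rng = random.Random(7)
t_mult = 2.145                                  # t_{14,0.975} for n0 = 15
runs = [stein_interval(0.0, 2.0, 0.5, t_mult, 15, rng) for _ in range(2000)]
coverage = sum(c for c, _ in runs) / len(runs)
avg_n = sum(n for _, n in runs) / len(runs)     # should be near sigma^2/z ~ 73.6
```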
4.4.2 A Procedure for Estimating the Difference of Two Means
Putter (1951) considered the problem of estimating the mean of a population composed of a known number of normally distributed strata whose relative proportions are known [see also Robbins (1952, p. 528)]. Ghurye and Robbins (1954) considered the problem of estimating the difference of two means by a two-stage procedure, and in the following we shall present their results. Let Πᵢ be a population with unknown mean θᵢ and variance σᵢ² (i = 1, 2); we wish to estimate the difference θ₁ − θ₂. Let X̄ᵢ(n) = (Xᵢ₁ + Xᵢ₂ + ··· + Xᵢₙ)/n be the mean of a sample of size n from Πᵢ. Then X̄₁(n₁) − X̄₂(n₂) is an unbiased estimate of θ₁ − θ₂, with variance Σᵢ σᵢ²/nᵢ. Let the cost of sampling be a known linear function of the numbers of observations: the cost of taking a sample of size n₁ from Π₁ and a sample of size n₂ from Π₂ is a₁n₁ + a₂n₂ + a₃. If there is a prescribed upper bound A₀ on the cost of sampling, n₁ and n₂ are subject to the restriction

    a₁n₁ + a₂n₂ ≤ A = A₀ − a₃.    (4.4.14)
The quantity Σᵢ σᵢ²/nᵢ, which is equal to the variance of X̄₁(n₁) − X̄₂(n₂) for integral values of the nᵢ, is minimized for continuous nᵢ > 0 subject to (4.4.14) by taking nᵢ = nᵢ⁰, where

    nᵢ⁰ = A σᵢ / [ aᵢ^{1/2} (σ₁ a₁^{1/2} + σ₂ a₂^{1/2}) ], (i = 1, 2);    (4.4.15)

the minimum value is given by

    (σ₁ a₁^{1/2} + σ₂ a₂^{1/2})² / A.    (4.4.16)

When the ratio ρ = σ₂/σ₁, on which the optimum value (4.4.16) depends, is unknown, one can use a two-stage procedure for estimating θ₁ − θ₂: first take a sample of m₁ + m₂ observations, mᵢ from Πᵢ, and then use estimates of the σᵢ obtained from this preliminary sample to distribute the remaining observations between the Πᵢ. We shall investigate the performance of this estimation procedure when the Πᵢ are normal.
Normal Populations

When the Πᵢ are known to be normal, choose positive integers mᵢ such that a₁m₁ + a₂m₂ < A and take mᵢ observations from Πᵢ. Let

    X̄ᵢ(mᵢ) = (1/mᵢ) Σ_{j=1}^{mᵢ} Xᵢⱼ,    (4.4.17)

    sᵢ²(mᵢ) = (1/(mᵢ − 1)) Σ_{j=1}^{mᵢ} [Xᵢⱼ − X̄ᵢ(mᵢ)]²,    (4.4.18)

be the first-stage sample means and variances (i = 1, 2), let

    nᵢ* = A sᵢ(mᵢ) / [ aᵢ^{1/2} (s₁(m₁) a₁^{1/2} + s₂(m₂) a₂^{1/2}) ], (i = 1, 2),    (4.4.19)–(4.4.20)

be the allocation (4.4.15) with sᵢ(mᵢ) in place of σᵢ, and set

    n̂ᵢ = [nᵢ*], (i = 1, 2),    (4.4.21)

where [x] denotes the largest integer contained in x. Having computed n̂ᵢ, we take (n̂ᵢ − mᵢ) additional observations (Xᵢⱼ, j = mᵢ + 1, ..., n̂ᵢ) from Πᵢ, and estimate θ₁ − θ₂ by

    X̄₁(n̂₁) − X̄₂(n̂₂).    (4.4.22)
Let V(A) = var{X̄₁(n̂₁) − X̄₂(n̂₂)}. Now, one can write

    X̄ᵢ(n̂ᵢ) = (mᵢ/n̂ᵢ) X̄ᵢ(mᵢ) + [(n̂ᵢ − mᵢ)/n̂ᵢ] X̄ᵢ(n̂ᵢ − mᵢ),

where X̄ᵢ(n̂ᵢ − mᵢ) denotes the mean of the second-stage observations from Πᵢ. Since the n̂ᵢ depend only on the sᵢ(mᵢ), for fixed s₁, s₂ the random variables X̄₁(m₁), X̄₂(m₂), X̄₁(n̂₁ − m₁), X̄₂(n̂₂ − m₂) are mutually independent, and the conditional distributions of X̄ᵢ(mᵢ) and X̄ᵢ(n̂ᵢ − mᵢ) are respectively normal (θᵢ, σᵢ²/mᵢ) and normal (θᵢ, σᵢ²/(n̂ᵢ − mᵢ)). Hence, for fixed s₁, s₂ the conditional distribution of X̄₁(n̂₁) − X̄₂(n̂₂) is normal (θ₁ − θ₂, Σᵢ σᵢ²/n̂ᵢ). Hence

    E[X̄₁(n̂₁) − X̄₂(n̂₂)] = θ₁ − θ₂  and  V(A) = E[ Σᵢ σᵢ²/n̂ᵢ ].    (4.4.23)
Ghurye and Robbins obtained an explicit expression for V(A) and also an approximation to it, namely V*(A), obtained when the [·]'s [the largest integers contained in (·)'s] are replaced by (·)'s. They tabulate the ratio V*/V⁰, where V⁰ denotes the minimum (4.4.16), for selected values of the parameters ρ = σ₂/σ₁, ma/A and A/a, where a = a₁ = a₂ and m = m₁ = m₂. Based on these computations, they infer that the two-stage procedure provides considerable improvement over the usual one-stage procedure for values of ρ away from 1, and the performance seems to be best for ma/A in the neighborhood of σ₁/(σ₁ + σ₂) if σ₁ < σ₂.
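The Ghurye–Robbins allocation can be illustrated by simulation, comparing the attained variance E[Σᵢ σᵢ²/n̂ᵢ] with the optimum (4.4.16); a sketch, with names of our own making, in which the allocation is truncated below at the first-stage size m:

```python
import math
import random

def two_stage_alloc(sig1, sig2, a1, a2, A, m, rng):
    """One realization of the allocation (4.4.19)-(4.4.21): estimate each
    sigma_i from a first-stage sample of size m, then allocate the budget A."""
    s = []
    for sig in (sig1, sig2):
        xs = [rng.gauss(0.0, sig) for _ in range(m)]
        mean = sum(xs) / m
        s.append(math.sqrt(sum((x - mean) ** 2 for x in xs) / (m - 1)))
    denom = s[0] * math.sqrt(a1) + s[1] * math.sqrt(a2)
    n1 = max(m, int(A * s[0] / (math.sqrt(a1) * denom)))
    n2 = max(m, int(A * s[1] / (math.sqrt(a2) * denom)))
    return n1, n2

rng = random.Random(3)
sig1, sig2, a1, a2, A, m = 1.0, 3.0, 1.0, 1.0, 200.0, 10
vs = []
for _ in range(500):
    n1, n2 = two_stage_alloc(sig1, sig2, a1, a2, A, m, rng)
    vs.append(sig1 ** 2 / n1 + sig2 ** 2 / n2)
avg_v = sum(vs) / len(vs)
opt_v = (sig1 * math.sqrt(a1) + sig2 * math.sqrt(a2)) ** 2 / A   # (4.4.16)
```

With ρ = 3 the attained variance stays close to the optimum, illustrating the "considerable improvement for ρ away from 1" reported above.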
4.4.3 Procedures for Estimating the Common Mean
Let Π₁, Π₂ be two populations having the common mean θ and variances σ₁², σ₂². We wish to estimate θ using a fixed number of observations. If the population variances are known, the efficient procedure would be to take all n observations from the population having the smallest variance. When prior information about the variances is not available or is too vague to be quantified, it is natural to consider the procedure which consists of taking a preliminary sample of size m from each population, computing estimates of the variances, and then taking the remaining n − 2m observations from the population having the smaller estimated variance. If m is too large or too small, the advantage of the two-stage sampling scheme over the procedure of simply taking n/2 observations from each population will be lost. Hence it is of interest to determine, for some good estimator, an optimum choice of m as a function of n, not dependent on the unknown variances. As an example, suppose that there are two devices for measuring a physical constant, that each measurement is expensive or time consuming so that their total number is limited, and that we wish to estimate the constant as accurately as possible. Richter (1960) considered this problem, and his results will be given below.
(4.4.24) so that l / R is the usual estimator of 7 based on 2m observations. Then take observations Xl,rn+l)Xl,m+2)..., X1,n-m if R < 1 or take observations X2,m+l, X2,m+2, .-.,X2,n-m otherwise. Write y i , =~ ~ X i j l N ; , i = 1,2. We will consider estimators 8 of 8 which are of the form
cy$
(4.4.25) where N1, N2, A1 and A2 are random variables such that N1 = n - rn if R < 1, N~=rnifR~l,N1+N2=n,O~Ai~l,(i=1,2)andA1+A2=1 probability one; and besides, where A1 and A2 are such that
(4.4.26)
EH [ytk]= E
[x;'], (i = 1,2) , for all 1 and k
where E_H[·] = E[·|H] and H = (N₁, N₂, A₁, A₂). If the Xᵢⱼ are assumed to be normally distributed, then the sample mean and sample variance are independent. Hence, the above assumption may be replaced by the assumption that A₁ and A₂ are functions of the sample variances only. Estimators of the form θ̂ seem reasonable since, if observations are available on normal variables Xᵢⱼ, j = 1, 2, ..., nᵢ, i = 1, 2, and η is known, a₁Ȳ_{1,n₁} + a₂Ȳ_{2,n₂} is the uniformly minimum variance unbiased estimator of θ, where a₁ = n₁η/(n₁η + n₂) and a₂ = 1 − a₁. Next, let V_C = (1/n) min(σ₁², σ₂²), which is the variance of the standard estimator of θ for the case when sgn(σ₂² − σ₁²) is known beforehand, and define

    Rₙ(m; η) = V_C⁻¹ E(θ̂ − θ)²

to be the risk function associated with the estimator θ̂. The first part of the following theorem implies that Rₙ(m; η) = V_C⁻¹ var(θ̂).

Theorem 4.4.1 (Richter, 1960). For any estimator of the form θ̂:

(i) θ̂ is an unbiased estimator of θ;

(ii) Rₙ(m; η) = n max(1, 1/η) E[ A₁²/N₁ + ηA₂²/N₂ ];

(iii) Rₙ(m; η) ≥ n max(1, 1/η) η E[ (N₂ + N₁η)⁻¹ ] ≥ 1.
Proof. Since E_H(θ̂) = A₁θ + A₂θ = θ by (4.4.26), (i) follows by taking expectations. Next,

    E_H[ (θ̂ − θ)² ] = σ₁² [ A₁²/N₁ + ηA₂²/N₂ ],

which proves (ii) since Rₙ(m; η) = V_C⁻¹ E[ E_H(θ̂ − θ)² ] and V_C⁻¹σ₁² = n max(1, 1/η). Finally, A₁²/N₁ + ηA₂²/N₂ has a unique minimum with respect to A₁ = 1 − A₂ at A₁ = N₁η/(N₂ + N₁η), the minimum value being η/(N₂ + N₁η), so that E[A₁²/N₁ + ηA₂²/N₂] ≥ ηE[(N₂ + N₁η)⁻¹], which proves the left-hand inequality of (iii); since N₂ + N₁η ≤ n max(1, η), the proof is complete. ∎
Let us examine the risk function for the usual one-stage experiment for estimating θ, which would be to observe n/2 of the X₁ⱼ and n/2 of the X₂ⱼ. If we confine ourselves to unbiased estimators θ̂′ and assume the variables to be normally distributed, then var(θ̂′) ≥ 2σ₂²/[n(1 + η)], since (ηX̄₁ + X̄₂)/(1 + η) is the minimum variance unbiased estimator, with variance 2σ₂²/[n(1 + η)], when the variances are known. Then

    V_C⁻¹ var(θ̂′) ≥ 2 max(1, η)/(1 + η),

since max(1, η) = 1/min(1, 1/η), and

    2 max(1, η)/(1 + η) ≥ 1,

with equality if and only if η = 1. Hence, for each fixed η ≠ 1, the risk function is bounded away from unity independent of the sample size. One would hope that the risk function for the two-stage scheme would prove to be smaller, and we shall show that it is so, for large samples at least, provided m is suitably chosen. For the two-stage experiment, it is clear that once an estimator is specified, the only variable left at the statistician's disposal is the quantity m. Then, given an estimator of the form θ̂, we may say that any real-valued function m(n) such that 4 ≤ 2m(n) < n for all n ≥ 5 is a solution to the problem. With respect to an estimator of the form θ̂, m(n) will be called a uniformly consistent solution (UCS) if sup_η Rₙ[m(n); η] → 1 as n → ∞. We shall restrict attention to such solutions if they exist. Further, if sup_η Rₙ(m; η) < ∞, a solution which minimizes sup_η Rₙ(m; η) will be called a minimax solution (MMS). If there exists a UCS, then a MMS is UC too. Hence the minimax principle provides a means of selecting one solution from the class of UC solutions.
A Simpler Estimator

In the following, we shall derive an asymptotic minimax solution for a particular unbiased estimator. For the subsequent considerations, we shall assume that the Xᵢⱼ are normally distributed. Hence ηR = (s₁²/σ₁²)/(s₂²/σ₂²) has the F-distribution with (m − 1, m − 1) degrees of freedom, and we write

    K(m; η) = P(R > 1).

Now define θ̂₁ = A₁Ȳ_{1,N₁} + A₂Ȳ_{2,N₂}, where A₁ = 1 or 0 according as R < 1 or R ≥ 1. This estimator has the form of θ̂ and, by Theorem 4.4.1, θ̂₁ is unbiased with risk

    R₁ₙ(m; η) = [ n max(1, 1/η)/(n − m) ] [ 1 + (η − 1) K(m; η) ].    (4.4.27)
It is easy to show that R₁ₙ(m; η) = R₁ₙ(m; 1/η) by using the fact that K(m; η) = 1 − K(m; 1/η); thus, when considering sup_η R₁ₙ(m; η), we can assume that η ≥ 1. Thus, we have the following result towards the MMS for θ̂₁.

Theorem 4.4.2 (Richter, 1960). The minimax solution for θ̂₁ is m(n) = (cn/2)^{2/3} + O(n^{1/3}) and

    minₘ max_η R₁ₙ(m; η) = 1 + 3(c/2)^{2/3} n^{−1/3} + O(n^{−2/3}),

where c = 2r′Φ(−r′) and r′ is the solution of the equation

    Φ(−r) − rφ(r) = 0.
Proof. See Richter (1960, Theorem 2). Note that 0.7 ≤ r′ ≤ 0.8. ∎
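The constants of Theorem 4.4.2 are easy to compute numerically; a sketch, with names of our own making, solving Φ(−r) − rφ(r) = 0 by bisection via the standard-library error function:

```python
import math

def Phi(x):   # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):   # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def g(r):     # the defining equation of r' in Theorem 4.4.2
    return Phi(-r) - r * phi(r)

lo, hi = 0.5, 1.5                  # g changes sign on this bracket
for _ in range(60):                # plain bisection
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
r_star = 0.5 * (lo + hi)
c = 2.0 * r_star * Phi(-r_star)
m_of_n = lambda n: (c * n / 2.0) ** (2.0 / 3.0)   # leading term of the MMS
```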
A Class of Estimators

One may ask if better (in the sense of smaller risk) estimators exist and, if they do, whether results like Theorem 4.4.2 can be found for such estimators. Richter (1960) provides an affirmative answer to both these questions. Let

    θ̂₂ = ( ηN₁Ȳ_{1,N₁} + N₂Ȳ_{2,N₂} ) / ( ηN₁ + N₂ ),

whose risk is R₂ₙ(m; η) = n max(1, 1/η) η E[1/(N₁η + N₂)]; then R₂ₙ(m; η) is a lower bound for the risk of all estimators of the form θ̂ by Theorem 4.4.1 (iii). However, when η is unknown, one can replace it by η̂, where η̂ → η in probability. It is mathematically convenient to use an η̂ based on the first stage only; we take η̂ = 1/R and define

    θ̂₃ = ( η̂N₁Ȳ_{1,N₁} + N₂Ȳ_{2,N₂} ) / ( η̂N₁ + N₂ ).

Another motivation for θ̂₃ is as follows. When η is known, for a one-stage experiment, the uniformly minimum variance unbiased (UMVU) estimator is (n₁ηȲ_{1,n₁} + n₂Ȳ_{2,n₂})/(n₁η + n₂). However, when n₁, n₂ and η are unknown, taking η̂ to be the usual estimator of η based on 2 min(n₁, n₂) observations, and replacing n₁, n₂ by the random variables N₁, N₂, we obtain θ̂₃. Another estimator which might be considered is θ̂₄, the grand mean of all the observations: θ̂₄ = (N₁Ȳ_{1,N₁} + N₂Ȳ_{2,N₂})/n. For θ̂₄ there exist no UC solutions and no nontrivial MM solutions. For θ̂₂, Richter (1960) obtains a theorem similar to Theorem 4.4.2.
4.4.4 Double-Sampling Estimation Procedures
Suppose that we are interested in estimating an unknown parameter θ with specified accuracy, using as small a sample size as possible. The accuracy could be in terms of the variance v(θ), some given function of θ. Another problem of interest is to estimate θ by a confidence interval having a specified width and a specified confidence coefficient γ. Since it is not possible to construct an estimate meeting the specifications on the basis of a sample of fixed size, one has to resort to some kind of sequential sampling. Cox (1952b) proposed a double-sampling procedure for the above problem. The basic idea is to draw a preliminary sample of observations which determines how large the total sample size should be. Stein's (1945) two-stage procedure for the normal mean is a special case of the double-sampling method since the underlying distribution is known. Furthermore, the double-sampling methods of Cox (1952b) differ from those used in industrial inspection because, in the latter case, the second sample is of fixed size. Although the theory of double sampling developed by Cox (1952b) is primarily for large sample sizes, hopefully it is reasonable for small sample sizes. In the following, we present an estimate of θ having bias O(n₀⁻²) and variance a(θ)[1 + O(n₀⁻²)], where n₀ is the preliminary sample size and a(θ) the specified variance.
Estimation with Given Variance: Single Unknown Parameter

Let θ be the unknown parameter, which we wish to estimate with a specified variance equal to a function of θ, namely a(θ), which is small. Assume the following: for a fixed sample size m, one can construct an estimate T^(m) of θ such that

(i) T^(m) is unbiased for θ with variance v(θ)/m;

(ii) the skewness coefficient of T^(m) is of order γ₁(θ)m^{−1/2} and the kurtosis of T^(m) is O(m⁻¹) as m becomes large;

(iii) asymptotic means and standard errors can be derived for a(T^(m)), v(T^(m)) by expansion in series.
Procedure:

(a) Take a preliminary sample of size n₀ and let T₁ be the estimate of θ from this sample.

(b) Let

    ñ(T₁) = n(T₁)[1 + b(T₁)],    (4.4.28)

where n(θ) = v(θ)/a(θ) is the sample size that would be required if θ were known, and b(θ) is a correction term of order n₀⁻¹ given by Cox (1952b) in terms of γ₁(θ), m(θ) and the derivatives of m(θ),    (4.4.29)

with m(θ) = 1/n(θ); its form in particular cases is exhibited in the examples below.

(c) Take a second sample of size max[0, ñ(T₁) − n₀] and let T₂ be the estimate of θ from the second sample.

(d) Define

    T = [ n₀T₁ + (ñ(T₁) − n₀)T₂ ] / ñ(T₁)    (4.4.30)

and

    T′ = T − m′(T)v(T) if n₀ ≤ ñ(T₁);  T′ = T₁ if n₀ > ñ(T₁).    (4.4.31)

(iv) If n₀ < n(θ) and the distribution of ñ(T₁) is such that the event ñ(T₁) < n₀ may be neglected, then T′ has bias and variance of the desired orders.
Example 4.4.1 Suppose we wish to estimate the normal mean θ with fractional standard error a^{1/2}. Here a(θ) = aθ², n(θ) = σ²/(aθ²) and b(θ) = 8a + σ²/(n₀θ²). Thus the total sample size is

    n(T₁)[1 + b(T₁)] = ñ(T₁) = σ²/(aT₁²) + 8σ²/T₁² + σ⁴/(n₀aT₁⁴),

where T₁ is the mean of the initial sample, and T′ = T(1 − 2a), where T is the mean of the combined sample. We should choose n₀ sufficiently large so that n₀ ≪ σ²/(aθ²).

Example 4.4.2 Suppose we wish to estimate the binomial proportion θ with specified variance V. Here we set a(θ) = V, v(θ) = θ(1 − θ) and γ₁(θ) = (1 − 2θ)[θ(1 − θ)]^{−1/2}. Then
    m(θ) = V/[θ(1 − θ)],  m′(θ) = V(2θ − 1)/[θ²(1 − θ)²],  m″(θ) = 2V[1 − 3θ(1 − θ)]/[θ³(1 − θ)³].

Thus

    b(θ) = V[3 − 8θ(1 − θ)]/[θ(1 − θ)] + [1 − 3θ(1 − θ)]/[n₀θ(1 − θ)],  n(θ) = θ(1 − θ)/V.

Consequently, the total sample size n is

    n = θ̂₁(1 − θ̂₁)/V + 3 − 8θ̂₁(1 − θ̂₁) + [1 − 3θ̂₁(1 − θ̂₁)]/(Vn₀),

where n₀ is the initial sample size and θ̂₁ is the estimator of θ based on the preliminary sample.
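The total-sample-size rule of Example 4.4.2, n = θ̂₁(1 − θ̂₁)/V + 3 − 8θ̂₁(1 − θ̂₁) + [1 − 3θ̂₁(1 − θ̂₁)]/(Vn₀), is straightforward to evaluate; a sketch, with a function name of our own making:

```python
def cox_binomial_total_n(theta1, V, n0):
    """Total sample size for estimating a binomial theta with specified
    variance V (Example 4.4.2); theta1 is the estimate from the
    preliminary sample of size n0."""
    q = theta1 * (1.0 - theta1)
    return q / V + 3.0 - 8.0 * q + (1.0 - 3.0 * q) / (V * n0)

n_total = cox_binomial_total_n(theta1=0.3, V=0.001, n0=50)
# leading term theta1*(1 - theta1)/V = 210; the corrections add about 8.7
```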
Example 4.4.3 (Estimation of the binomial θ with specified coefficient of variation c^{1/2}.) Here a(θ) = θ²c; v and γ₁ are as in Example 4.4.2. Thus

    m(θ) = cθ/(1 − θ).

Hence

    m′(θ) = c/(1 − θ)²,  m″(θ) = 2c/(1 − θ)³.

Then computations yield

    b(θ) = 3c/(1 − θ)² + 1/[n₀(1 − θ)]

and

    n = [(1 − θ̂₁)/(cθ̂₁)][1 + b(θ̂₁)] = (1 − θ̂₁)/(cθ̂₁) + 3/[θ̂₁(1 − θ̂₁)] + 1/(cθ̂₁n₀),

where θ̂₁ is the estimate of θ based on the preliminary sample of size n₀.
Estimation in the Presence of a Nuisance Parameter

We now suppose that, in addition to the unknown parameter θ, which is to be estimated with variance a(θ) which is small, there is an unknown nuisance parameter ψ. Assume that in samples of any fixed size m we can find estimates T^(m) and C^(m) of θ and ψ such that

(i) T^(m) and C^(m) are unbiased estimates and have variances ψ/m and τψ²/m, where τ is asymptotically constant;

(ii) if m is large, asymptotic means and standard errors can be developed for combinations of T^(m), C^(m) and a(T^(m)) by expanding them in Taylor series.

Procedure: Take a preliminary sample of size n₀ and let T₁, C₁ be the estimates of θ and ψ based on the initial sample. Set

    ñ(T₁, C₁) = n(T₁, C₁)[1 + b(T₁, C₁)],    (4.4.32)

where n(θ, ψ) = ψ/a(θ) is the sample size required when (θ, ψ) is known, and b(θ, ψ) is a correction term of order n₀⁻¹ analogous to (4.4.29), involving τ and the derivatives of m(θ, ψ) = 1/n(θ, ψ).    (4.4.33)

Take a second sample of size max[0, ñ(T₁, C₁) − n₀] and let T₂ be the estimate of θ from it. Set

    T = [ n₀T₁ + (ñ − n₀)T₂ ] / ñ    (4.4.34)

and define T′ from T by a bias correction analogous to (4.4.31).    (4.4.35)

Then, assuming as before that

(iii) the possibility that n₀ > ñ(T₁, C₁) can be neglected, one can show that T′ has bias O(n₀⁻²) and variance a(θ)[1 + O(n₀⁻²)].
Example 4.4.4 (Estimation of a normal mean with given standard error a^{1/2}.) Let the method be based on the sample mean. Then ψ is the unknown population variance σ², estimated by the usual sample variance; that is, take C₁ = s₁². Then τ = 2, a(θ) = a, and

    ñ = (s₁²/a)(1 + 2/n₀),    (4.4.36)

and the final estimate is the pooled sample mean T, which can easily be shown to be unbiased. The expected sample size is σ²(1 + 2/n₀)/a. Thus, ignorance of σ increases the sample size by the factor (1 + 2/n₀). Cox (1952b, Section 4) shows that, except when the preliminary sample size is small, the expected sample size of the best double-sampling procedure is only slightly larger than that of the best sequential procedure.
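Example 4.4.4 can be checked by simulation; a sketch, with names of our own making, verifying that the attained variance is close to the specified a and the average sample size close to σ²(1 + 2/n₀)/a:

```python
import random

def double_sample_mean(theta, sigma, a, n0, rng):
    """Cox double sampling for a normal mean with specified variance a
    (Example 4.4.4): returns (pooled mean, total sample size)."""
    xs = [rng.gauss(theta, sigma) for _ in range(n0)]
    m = sum(xs) / n0
    s2 = sum((x - m) ** 2 for x in xs) / (n0 - 1)
    n = max(n0, round((s2 / a) * (1.0 + 2.0 / n0)))   # rule (4.4.36)
    xs += [rng.gauss(theta, sigma) for _ in range(n - n0)]
    return sum(xs) / n, n

rng = random.Random(11)
a, sigma, n0 = 0.01, 1.0, 20
results = [double_sample_mean(0.0, sigma, a, n0, rng) for _ in range(2000)]
var_hat = sum(e * e for e, _ in results) / len(results)   # should be near a
avg_n = sum(n for _, n in results) / len(results)         # near 100*(1+0.1)=110
```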
Example 4.4.5 (Estimation of a normal mean with specified coefficient of variation c^{1/2}.) Here we set a(θ) = θ²c, ψ = σ², T^(m) = ȳ_m, C^(m) = s_m², and obtain the total sample size to be

    ñ = ( s₁²/(cȳ²) ) [ 1 + b(ȳ, s₁²) ],

where b is the appropriate O(n₀⁻¹) correction and ȳ and s₁² respectively denote the sample mean and sample variance based on the preliminary sample of size n₀.

Confidence Intervals
Suppose that we want to estimate θ by a confidence interval of predetermined form with confidence coefficient γ. Further, suppose that we have obtained an estimate T′ after a sampling procedure designed to give a variance a(θ). If we want a (1 − 2α)100% confidence interval for θ, we let x₁₋α denote the (1 − α)th point of the standard normal distribution. Then define θ₋, θ₊ by the equations

    T′ = θ₋ + x₁₋α a^{1/2}(θ₋),  T′ = θ₊ − x₁₋α a^{1/2}(θ₊);    (4.4.37)

(θ₋, θ₊) will be the required confidence interval. If an explicit solution is impossible, Equation (4.4.37) is solved by the successive approximation method given by Bartlett (1937).
Example 4.4.6 Suppose that T′ is the estimate obtained after the procedure of Example 4.4.1 for estimating a normal mean with given fractional standard error a^{1/2}. Then a(θ) = aθ² and (4.4.37) yields for the 95% confidence interval

    θ₋ = T′/(1 + 1.96a^{1/2}),  θ₊ = T′/(1 − 1.96a^{1/2}).

Formula (4.4.37) assumes that T′ is normally distributed. A refinement of the method depends on evaluating the skewness γ₁ and kurtosis γ₂ of T′ and making a correction for these based on the Cornish–Fisher expansion. We shall illustrate the method by the following example.
Example 4.4.7 (Confidence interval of given width for a normal mean, variance unknown.) Suppose we wish to construct a confidence interval after the procedure of Example 4.4.4 for estimating a normal mean θ with given standard error a^{1/2}. If T′ is the final estimate, in this case the sample mean, the (1 − 2α)100% confidence interval is, from (4.4.37), (T′ − x₁₋α a^{1/2}, T′ + x₁₋α a^{1/2}). Now, it can be shown that for the distribution of T′, γ₁ is zero and γ₂ is 6/n₀. Thus, the Edgeworth expansion for the distribution of the standardized T′ is given by

    F(x) ≈ Φ(x) − (γ₂/24)(x³ − 3x)φ(x).

Now, if we set F(x) = 1 − α and wish to find x, then we must solve for x. Since in the case of T′, γ₁ = 0 and γ₂ = 6/n₀, we have to solve for x such that

    F(x) ≈ Φ(x) − (1/(4n₀))(x³ − 3x)φ(x).

As a first approximation, x = Φ⁻¹(1 − α) = z (say). Thus we wish to find a refinement of z, namely z*, from the equation

    F(z*) = Φ(z),  that is,  Φ(z*) ≈ Φ(z) + (1/(4n₀))(z³ − 3z)φ(z).

Now, expanding Φ⁻¹(·) around Φ(z), we obtain

    z* = Φ⁻¹[ Φ(z) + (1/(4n₀))(z³ − 3z)φ(z) ] = z + (1/(4n₀))(z³ − 3z) + ··· .
Note that the above can also be obtained from the Cornish–Fisher inversion of the Edgeworth expansion. Thus, the normal multiplier should be replaced by x₁₋α + (x₁₋α³ − 3x₁₋α)/(4n₀) = x*₁₋α, say. If we use the normal multiplier, the width of the confidence interval is 2x₁₋α a^{1/2}, and if we use the corrected multiplier, the width is 2x*₁₋α a^{1/2}. To solve Stein's problem so that the confidence interval is of width Δ, we take a^{1/2} = Δ/(2x₁₋α) or Δ/(2x*₁₋α). The corresponding sample size functions are, from (4.4.36),

    n = (4x₁₋α²/Δ²) s₁² (1 + 2/n₀),    (4.4.38)

or, in the second case,

    n = (4x*₁₋α²/Δ²) s₁² (1 + 2/n₀).    (4.4.39)

In Stein's exact solution the corresponding sample size is 4s₁²Δ⁻²t²_{2α,n₀−1}, where t_{2α,n₀−1} is the two-sided 200α% point of the t distribution with n₀ − 1 degrees of freedom. From the exact solution we can compute the percentage error in the approximate formulae (4.4.38) and (4.4.39). Cox (1952b) finds that for .01 ≤ α ≤ .10, the percentage error based on Formula (4.4.38) is fairly small even when n₀ is as small as 10, provided that α is less than .025. The correction for kurtosis yields a significant improvement. These results indicate that Cox's (1952b) approximate formulae given in this subsection will be reasonably accurate for all n₀ unless n₀ is very small.

Remark 4.4.1 There are two situations in which sequential methods are useful. In the first, observations are available only at infrequent intervals and must be interpreted as soon as they are obtained. An example is the study of accident rates, which may only be obtainable at weekly, monthly, etc. intervals. Double-sampling procedures are not useful in such situations. The second type of situation is where the number of observations is under the experimenter's control, but observations are expensive, so that the smaller the required sample size the better. Double sampling is appropriate for this type of problem.
4.4.5 Fixed Length Confidence Intervals Based on SPRT

Franzén (2003) has given a procedure for obtaining confidence intervals for an unknown parameter θ which is based on Wald's SPRT. Let X have probability density or mass function f(x; θ) which has monotone likelihood ratio in x. Assume that we observe X₁, X₂, ... sequentially. Let xₙ = (x₁, x₂, ..., xₙ) ∈ 𝒳ⁿ and θ ∈ Ω ⊆ ℝ. The generalized probability ratio test (GPRT) defined by Lehmann (1955) is a test of H₀: θ = θ₀ against H₁: θ = θ₁ (θ₀ < θ₁) with boundaries which may vary with n, and which continues sampling as long as

    Bₙ < Π_{i=1}^{n} [ f(xᵢ; θ₁)/f(xᵢ; θ₀) ] < Aₙ.    (4.4.40)

Then we have the following lemma.
Lemma 4.4.1 Let X₁, X₂, ... be a sequence of random variables with monotone likelihood ratio. Then the power function of any generalized probability ratio test is nondecreasing.

Proof. This lemma is analogous to a result of Lehmann (1959, p. 101). ∎

Obviously the SPRT is a member of the class of GPRTs. From the above lemma it follows that the SPRT of H₀: θ = θ₀ against H₁: θ = θ₁ with error probabilities α and β will have type I error rate less than or equal to α for any parameter in the hypothesis H₀: θ ≤ θ₀, and type II error rate less than or equal to β for any parameter belonging to H₁: θ ≥ θ₁; consequently, the SPRT of H₀: θ = θ₀ versus H₁: θ = θ₁ can be used as a test of H₀: θ ≤ θ₀ versus H₁: θ ≥ θ₁.

Next, for fixed θ₀ define the two types of hypotheses H⁺_{θ₀}: θ ≥ θ₀ and H⁻_{θ₀}: θ < θ₀. Let ℋ⁺ = {H⁺_θ : θ ∈ Ω} and ℋ⁻ = {H⁻_θ : θ ∈ Ω}. For fixed Δ > 0, at each step we test at level α/2 which elements H⁺_θ in ℋ⁺ can be rejected or accepted against the corresponding elements H⁻_{θ−Δ} in ℋ⁻, and which elements H⁻_θ in ℋ⁻ can be rejected or accepted against H⁺_{θ+Δ} in ℋ⁺. Whenever a decision is reached concerning a pair of hypotheses in ℋ⁺ and ℋ⁻, these hypotheses will not be considered anymore. The use of composite hypotheses, enabling us to make a decision regarding a hypothesis H⁻_θ against H⁺_{θ+Δ}, is made possible only by the monotone likelihood ratio property. Let
    R⁺(xₙ, Δ) = { θ : H⁺_θ is rejected against H⁻_{θ−Δ} at or before time n }

and

    R⁻(xₙ, Δ) = { θ : H⁻_θ is rejected against H⁺_{θ+Δ} at or before time n }

be the sets of parameters corresponding to hypotheses that have been rejected against their alternatives when observing x₁, x₂, ..., xₙ. Let

    U(xₙ, Δ) = inf{ θ : θ ∈ R⁺(xₙ, Δ) } = the smallest parameter θ for which H⁺_θ is rejected against H⁻_{θ−Δ},

and

    L(xₙ, Δ) = sup{ θ : θ ∈ R⁻(xₙ, Δ) } = the largest parameter θ for which H⁻_θ is rejected against H⁺_{θ+Δ},

when xₙ is observed. Now we are ready to define the SPRT(Δ) confidence interval. We construct a sequence of temporary confidence intervals. Assume, for the time being (this will be established later), that Ω \ {R⁺(xₙ, Δ) ∪ R⁻(xₙ, Δ)} is an interval. Since we have a fixed-length confidence interval in mind, we call the confidence intervals produced at each step temporary confidence intervals. In this terminology, the event that there are no pairs left to test corresponds to the event that the length of the temporary confidence interval is less than or equal to Δ, and when this happens, the process is stopped.
First step. Observe x₁ and construct R⁺(x₁, Δ) and R⁻(x₁, Δ). Based on these we can compute U(x₁, Δ) and L(x₁, Δ). If U(x₁, Δ) − L(x₁, Δ) ≤ Δ, stop and declare that no confidence interval was found. If U(x₁, Δ) − L(x₁, Δ) > Δ, declare [L(x₁, Δ), U(x₁, Δ)] a 1 − α temporary confidence interval and take one more observation.
kth step. In the kth step, x_{k−1} = (x₁, x₂, ..., x_{k−1}) has already been observed and the hypotheses corresponding to parameters in R⁺(x_{k−1}, Δ) and R⁻(x_{k−1}, Δ), which yield the present temporary confidence interval [L(x_{k−1}, Δ), U(x_{k−1}, Δ)], have been rejected. Observing xₖ enables us to reject the hypotheses corresponding to parameters in R⁺(xₖ, Δ) and R⁻(xₖ, Δ). If U(xₖ, Δ) − L(xₖ, Δ) ≤ Δ, there are no pairs of hypotheses left to test, and hence we declare [L(x_{k−1}, Δ), U(x_{k−1}, Δ)] to be the smallest confidence interval one can get based on the observations xₖ using Δ as the interval parameter. However, if U(xₖ, Δ) − L(xₖ, Δ) > Δ, we declare [L(xₖ, Δ), U(xₖ, Δ)] a 1 − α temporary confidence interval and take one more observation. The SPRT(Δ) confidence interval is then denoted by

    S(xₙ, Δ) = [ L(xₙ, Δ), U(xₙ, Δ) ],    (4.4.41)

where L(xₙ, Δ) and U(xₙ, Δ) are constructed as described above. The sequence {S(xᵢ, Δ), i = 1, 2, ...} will be a sequence of temporary confidence intervals. Inherent in the construction is the property

    R⁺(xₙ, Δ) ⊆ R⁺(xₙ₊₁, Δ)  and  R⁻(xₙ, Δ) ⊆ R⁻(xₙ₊₁, Δ),

and consequently

    S(xₙ₊₁, Δ) ⊆ S(xₙ, Δ).

Next, we need to be certain that the set Ω \ {R⁺(xₙ, Δ) ∪ R⁻(xₙ, Δ)} of parameters corresponding to hypotheses which have not been rejected against their alternatives while observing xₙ is indeed an interval, and that the coverage probability of this interval is at least 1 − α. This is assured by the following theorem of Franzén (2003).

Theorem 4.4.2 Let f(x; θ) have monotone likelihood ratio, let (∂²/∂θ²) ln f(x; θ) < 0, and let both error rates be α/2. Then the set Ω \ {R⁺(xₙ, Δ) ∪ R⁻(xₙ, Δ)} is an interval equal to S(xₙ, Δ) with coverage probability of at least 1 − α. That is, P_θ{θ ∈ S(xₙ, Δ)} ≥ 1 − α.
Proof. First let us show that if θ′ ∈ R⁻(xₙ, Δ) then θ″ ∈ R⁻(xₙ, Δ) for every θ″ < θ′. Now, if θ′ ∈ R⁻(xₙ, Δ), then for some sample size m ≤ n the hypothesis H⁻_{θ′} was rejected against the alternative H⁺_{θ′+Δ}. This means that for that sample size we have

    a < ln{ λ(xₘ; θ′ + Δ, θ′) } = ln f(xₘ; θ′ + Δ) − ln f(xₘ; θ′) < ln f(xₘ; θ″ + Δ) − ln f(xₘ; θ″),

since (∂²/∂θ²) ln f(x; θ) < 0 implies that the first derivative of ln f(x; θ) is decreasing. Hence the hypothesis H⁻_{θ″} must have been rejected against the alternative H⁺_{θ″+Δ} at or before the sample size m. Because the error rates are equal, acceptance of the null hypothesis in the SPRT is equivalent to rejecting the hypothesis used as the alternative. Consequently, no hypothesis corresponding to a parameter smaller than L(xₙ, Δ) or larger than U(xₙ, Δ) has ever been accepted, since that would require U(xₙ, Δ) − L(xₙ, Δ) < Δ, which is not permissible in the construction. This completes the proof of the assertion that Ω \ {R⁺(xₙ, Δ) ∪ R⁻(xₙ, Δ)} is an interval.

Now, the coverage probability of the confidence interval can be decomposed as

    P_θ{ θ ∈ S(xₙ, Δ) } = 1 − P_θ{ θ < L(xₙ, Δ) } − P_θ{ θ > U(xₙ, Δ) }.

Assume that the event θ < L(xₙ, Δ) has happened. This implies that every hypothesis H⁻_{θ′}, where θ′ ≤ L(xₙ, Δ), has been rejected against its alternative H⁺_{θ′+Δ} for some sample size less than or equal to n. In particular, at stage k, H⁻_θ was falsely rejected against H⁺_{θ+Δ} with probability at most equal to α/2, since each test has level α/2. Thus

    P_θ{ θ ≤ L(xₙ, Δ) } = P_θ( {reject all H⁻_{θ″} where θ″ < θ} ∩ {reject H⁻_θ} ) ≤ P_θ( reject H⁻_θ against H⁺_{θ+Δ} ) ≤ α/2.

We can apply an analogous argument for asserting that

    P_θ{ θ ≥ U(xₙ, Δ) } ≤ α/2. ∎

It remains to be shown that the length of S(xₙ, Δ) does depend on Δ; Franzén (2003) was able to show this for the Bernoulli case via a simulation study.

Applications

Consider the exponential family given by
    f(x; θ) = c(θ) h(x) exp{θ T(x)},    (4.4.42)

where θ is a natural parameter. One can easily show that

    E[T(X)] = −(d/dθ) ln c(θ),

since

    ∫ c(θ) h(x) e^{θT(x)} dx = 1,

that is, differentiating with respect to θ,

    c′(θ)/c(θ) + E[T(X)] = 0.

By differentiating once more we can easily show that

    var[T(X)] = −(d²/dθ²) ln c(θ).

Since ln f(x; θ) = ln c(θ) + ln h(x) + θT(x), it follows that (∂²/∂θ²) ln f(x; θ) = (d²/dθ²) ln c(θ) = −var[T(X)] < 0, so the concavity condition of Theorem 4.4.2 holds throughout the natural exponential family.
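These identities can be checked numerically for a concrete member of the family; a sketch, with names of our own making, using the Bernoulli distribution in natural form, where c(θ) = 1/(1 + e^θ) and T(x) = x, so that E[T] = p and var[T] = p(1 − p):

```python
import math

def log_c(t):
    """ln c(theta) for the Bernoulli family written as
    f(x; theta) = c(theta) exp(theta * x), x in {0, 1}."""
    return -math.log(1.0 + math.exp(t))

def mean_var_from_c(t, h=1e-5):
    """E[T] = -(d/dtheta) ln c and var[T] = -(d^2/dtheta^2) ln c,
    approximated by central finite differences."""
    d1 = (log_c(t + h) - log_c(t - h)) / (2.0 * h)
    d2 = (log_c(t + h) - 2.0 * log_c(t) + log_c(t - h)) / (h * h)
    return -d1, -d2

t = math.log(0.3 / 0.7)         # natural parameter corresponding to p = 0.3
m, v = mean_var_from_c(t)
# m should be near p = 0.3 and v near p(1 - p) = 0.21; also v > 0,
# confirming (d^2/dtheta^2) ln f = -var[T] < 0
```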
Note that the Bernoulli, Poisson and normal distributions belong to the exponential family. Also note that if

    f(x; θ) = 1 / { π[1 + (x − θ)²] },

then

    (∂²/∂θ²) ln f(x; θ) = −2[1 − (x − θ)²] / [1 + (x − θ)²]²,    (4.4.43)

which fails to be negative when |x − θ| > 1, and thus Theorem 4.4.2 does not hold in the case of the Cauchy density with a translation parameter.

The fixed-length SPRT(Δ) confidence interval of length at most D can be constructed by simply stopping at the smallest n for which U(xₙ, Δ) − L(xₙ, Δ) ≤ D. This will always work for all D > Δ. According to Franzén (2003), there seems to exist an optimal value of Δ that yields the smallest average number of observations. The optimal value of Δ may depend on both the true value of the parameter and D. In the Bernoulli case, the optimum choice for Δ seems to be in the interval [D/2, D); however, the simulation carried out by Franzén indicates that the exact choice of Δ is not critical.

Example 4.4.8 Let X have the Bernoulli mass function given by
f(x; θ) = θˣ (1 − θ)^{1−x},  x = 0, 1.
Assume that we have n observations x₁, x₂, ..., xₙ on X, and assume that the error probabilities are equal to α/2. In order to determine the lower limit of the confidence interval we find the largest value of θ₀ such that the hypothesis H₀ : θ ≤ θ₀ can be rejected against H₁ : θ ≥ θ₀ + Δ using a SPRT. If (B, A) are Wald's bounds, then set a = ln A and b = ln B. The SPRT rejects H₀ when the log likelihood ratio reaches a; that is, when

s(n) ln[(θ₀ + Δ)/θ₀] + [n − s(n)] ln[(1 − θ₀ − Δ)/(1 − θ₀)] ≥ a,   (4.4.45)

where s(n) = x₁ + x₂ + ... + xₙ. Using Wald's approximations to the boundary values in terms of the error probabilities, we have a = ln[(2 − α)/α] and b = ln[α/(2 − α)], and hence to find the largest value of θ₀ that satisfies (4.4.45) we solve

s(n) ln[(θ₀ + Δ)/θ₀] + [n − s(n)] ln[(1 − θ₀ − Δ)/(1 − θ₀)] = ln[(2 − α)/α].   (4.4.46)
Using a similar argument, a candidate for the upper confidence limit is given by the solution (the smallest value of θ₀ such that H₀ : θ ≥ θ₀ is rejected against H₁ : θ ≤ θ₀ − Δ) of

s(n) ln[(θ₀ − Δ)/θ₀] + [n − s(n)] ln[(1 − θ₀ + Δ)/(1 − θ₀)] = ln[(2 − α)/α].   (4.4.47)
Note that replacing the strict inequality with an equality in (4.4.46) and (4.4.47) will be of little consequence since the parameter space is continuous. Equations (4.4.46) and (4.4.47), being nonlinear, need to be solved numerically for θ₀ for given n, s(n), Δ and α. Note that until the first response (namely, unity for x) is observed, Equation (4.4.46) does not have a solution, and the lower confidence limit is set equal to zero. Similarly, Equation (4.4.47) has no solution until the first nonresponse, namely zero, is observed, and hence until then the upper confidence limit is set equal to unity. The candidate for the lower confidence limit we obtain at the nth step is compared with the confidence limit we have from the previous step, and the larger of these two values is used as the current lower limit of the confidence interval. The upper confidence limit is adjusted in a similar fashion. The process continues until the length of the temporary confidence interval is less than D.
Special Case Let Δ = 0.25 and D = 0.5, and suppose that we want to construct a 90% SPRT fixed-width confidence interval for the binomial θ. Let the first 17 observations be 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0.
Setting α = 0.1, we obtained the following results.

Table 4.4.1 90% Confidence Interval for binomial θ

 n   Lower root   Lower CL   Upper root   Upper CL
 1   0            0          0.9861       0.9861
 2   0            0          0.9256       0.9256
 3   0            0          0.8502       0.8502
 4   0.005644     0.0056     0.874088     0.8502
 5   0.0426368    0.0426     0.892344     0.8502
 6   0.035993     0.0426     0.8328545    0.8328
 7   0.035355     0.0426     0.7740144    0.7740
 8   0.026006     0.0426     0.717304     0.7173
 9   0.022218     0.0426     0.6636085    0.6636
10   0.055929     0.0559     0.702070     0.6636
11   0.0497702    0.0559     0.657106     0.6571
12   0.0846981    0.0847     0.690889     0.6571
13   0.076936     0.0847     0.6524315    0.6524
14   0.0700812    0.0847     0.616872     0.6169
15   0.0639904    0.0847     0.584188     0.5842  <- stop
16   0.058550     0.0847     0.55429599   0.5542
17   0.05366896   0.0847     0.5270667    0.5271

The source program² used for this result is given in the Appendix of this section.
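Equations (4.4.46) and (4.4.47) are monotone in θ₀, so any bracketing root-finder will do. The following Python sketch (a stand-in for the Maple program in the Appendix; the function and variable names are ours) solves them by bisection and reproduces the sequential updating of the limits for the data of the Special Case above:

```python
import math

def _bisect(g, lo, hi, iters=100):
    """Bisection for a monotone g with g(lo) and g(hi) of opposite signs."""
    glo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if glo * g(mid) > 0.0:
            lo, glo = mid, g(mid)
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lower_root(n, s, delta, alpha):
    """Largest theta0 solving (4.4.46); no solution until the first success."""
    if s == 0:
        return None
    c = math.log((2.0 - alpha) / alpha)
    g = lambda th: (s * math.log(1.0 + delta / th)
                    + (n - s) * math.log(1.0 - delta / (1.0 - th)) - c)
    return _bisect(g, 1e-12, 1.0 - delta - 1e-12)

def upper_root(n, s, delta, alpha):
    """Smallest theta0 solving (4.4.47); no solution until the first failure."""
    if s == n:
        return None
    c = math.log((2.0 - alpha) / alpha)
    g = lambda th: (s * math.log(1.0 - delta / th)
                    + (n - s) * math.log(1.0 + delta / (1.0 - th)) - c)
    lo = delta + 1e-12
    if g(lo) >= 0.0:     # boundary case: root at or below delta
        return lo
    return _bisect(g, lo, 1.0 - 1e-12)

def sprt_interval(obs, delta, alpha, width):
    """Run the sequential procedure; return (stopping n, lower CL, upper CL)."""
    lo, hi, s = 0.0, 1.0, 0
    for n, x in enumerate(obs, start=1):
        s += x
        r = lower_root(n, s, delta, alpha)
        if r is not None:
            lo = max(lo, r)          # keep the largest lower candidate
        r = upper_root(n, s, delta, alpha)
        if r is not None:
            hi = min(hi, r)          # keep the smallest upper candidate
        if hi - lo < width:
            return n, lo, hi
    return None, lo, hi
```

For the 17 observations above with Δ = 0.25, α = 0.1 and D = 0.5 the procedure stops at n = 15 with interval (0.0847, 0.5842), matching Table 4.4.1.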
Appendix: Computer Program for Evaluating the Confidence Interval for Binomial Parameter
> read sol;

sol := proc(xctr, nctr, deltactr, alphactr)
local j, k, trigger, crit1, crit;
global eq;
trigger := deltactr;
if 0 < trigger then
  for j to xctr do
    for k to nctr do
      if j <= k then
        eq := x*ln(1 + delta/theta) + (n - x)*ln(1 - delta/(1 - theta))
              = ln((2 - alphactr)/alphactr);
        crit1 := subs(x = j, n = k, delta = deltactr, eq);
        crit := fsolve(crit1, theta, 0 .. 3/2);
        lprint(j, k, crit)
      end if
    end do;
    print(` `)
  end do
end if;
if trigger < 0 then
  for j to xctr do
    for k to nctr do
      if j = k then lprint(j, k, `no pos sol for this case`) end if;
      if j < k then
        eq := x*ln(1 + delta/theta) + (n - x)*ln(1 - delta/(1 - theta))
              = ln((2 - alphactr)/alphactr);
        crit1 := subs(x = j, n = k, delta = deltactr, eq);
        crit := fsolve(crit1, theta, 0 .. 1);
        lprint(j, k, crit)
      end if
    end do;
    print(` `)
  end do
end if
end proc;

²I thank Professor Henry Howard of the University of Kentucky for helping me to prepare this computer program.
4.5 Large-Sample Theory for Estimators
Anscombe (1949) provided a large-sample theory for sequential estimators when there is only one unknown parameter. He showed, using a heuristic argument, that an estimation formula valid for a fixed sample size remains valid when the sample size is determined by a sequential stopping rule. An alternative proof was given by Cox (1952a), which suggests that fixed-sample-size formulas might be valid quite generally for sequential sampling, provided the sample size is large. Anscombe (1952) simplified his previous work by introducing the concept of "uniform continuity in probability" of the statistic employed. Towards this, assume that there exist a real number θ, a sequence of positive numbers {wₙ}, and a distribution function G(x), such that the following conditions are satisfied:
(C1) Convergence of {Yₙ}: for any x such that G(x) is continuous (a continuity point of G(x)),

P{Yₙ − θ ≤ x wₙ} → G(x) as n → ∞.
(C2) Uniform continuity in probability of {Yₙ}: given any small ε and η there exist a large ν and a small positive c such that, for any n > ν,

P{|Y_{n'} − Yₙ| < ε wₙ simultaneously for all integers n' such that |n' − n| < cn} > 1 − η.   (4.5.1)

Note that, as n → ∞, Yₙ → θ in probability if wₙ → 0.
In most applications, G(x) is continuous, and usually is the normal distribution function; wₙ is a linear measure of dispersion of Yₙ, for example the standard deviation or the quartile range. The term "uniform continuity" is used to describe condition (C2), since a property analogous to ordinary uniform continuity is implied. Given any realization of the sequence {Yₙ}, let the functions Yₙ and wₙ be defined for non-integer values of n by linear interpolation between the adjacent integer values. Then, if ln wₙ is uniformly continuous with respect to ln n for large n, and if (C1) is satisfied, it is easy to see that (C2) implies, in a probabilistic sense, the uniform continuity of (Yₙ − θ)/wₙ with respect to ln n.

Theorem 4.5.1 (Anscombe, 1952). Let {n_r} be an increasing sequence of positive integers tending to infinity, and let {N_r} be a sequence of random variables taking positive integer values such that N_r/n_r → 1 in probability as r → ∞. Then, for a sequence of random variables {Yₙ} satisfying conditions (C1) and (C2), we have

P{Y_{N_r} − θ ≤ x w_{n_r}} → G(x) as r → ∞.   (4.5.2)
Remark 4.5.1 Notice that we have not assumed that the distributions of N_r and Yₙ are independent.
Application 4.5.1 We will apply this result to the sequential estimation of an unknown parameter θ. Let X₁, X₂, ... denote a sequence of observations, Yₙ be an estimate of θ calculated from the first n observations, and Zₙ an estimate of the dispersion wₙ of Yₙ. In order to estimate θ with given small dispersion a, we use the sequential stopping rule: sample until for the first time Zₙ ≤ a, and then calculate Yₙ. To show that Yₙ is an estimate of θ with dispersion asymptotically equal to a if a is small, we consider not a single stopping rule, but a sequence of possible stopping rules such that the values of a tend to zero. The above situation can be described in probabilistic terms as follows. Let {Xₙ} (n = 1, 2, ...) denote a sequence of random variables, not necessarily independent. For each n, let Yₙ and Zₙ be functions of X₁, X₂, ..., Xₙ. Assume that {Yₙ} satisfies (C1) and (C2) above. Let {a_r} (r = 1, 2, ...) be a decreasing sequence of positive numbers tending to zero. Let {N_r} be a sequence of stopping times defined by the condition: N_r is the least integer n such that Zₙ ≤ a_r; and let {n_r} be a sequence of integers defined by the condition: n_r is the least n such that wₙ ≤ a_r. We assume that the following further conditions are satisfied.
(C3) Convergence of {wₙ}: {wₙ} is decreasing, tends to zero, and as n → ∞,

w_{n+1}/wₙ → 1.   (4.5.3)

(C4) Convergence of {N_r}: N_r is a well-defined random variable for all r, and as r → ∞,

N_r/n_r → 1 in probability.   (4.5.4)

Then we have the following theorem.
Theorem 4.5.2 (Anscombe, 1952). If conditions (C1)-(C4) are satisfied, then

P(Y_{N_r} − θ ≤ x a_r) → G(x) as r → ∞,   (4.5.5)

at all continuity points x of G(x).

Proof. (C3) implies that w_{n_r}/a_r → 1 as r → ∞. Now (4.5.5) readily follows from Theorem 4.5.1. ∎
Remark 4.5.2 Rényi (1957) gives a direct proof of Theorem 4.5.2 when

Yₙ − θ = n⁻¹ Σ_{i=1}^n Z_i,
where the Z_i are i.i.d. having mean 0 and finite variance. In applying this theorem, it will usually be obvious that (C1) and (C3) are satisfied. The following theorems show that (C2) is satisfied for a wide class of statistics Yₙ. (C4) is a weak condition and is usually easy to verify. Although these conditions are sufficient for the conclusion of Theorem 4.5.2, it will be shown that they are not necessary.
Particular Forms of Yₙ: Let us assume the following form for Yₙ, which holds in most applications:

Yₙ − θ = n⁻¹ Σ_{i=1}^n Z_i + Rₙ,   (4.5.6)

where the Z_i are independent with E(Z_i) = 0 and var(Z_i) ≤ b < ∞, and n^{1/2} Rₙ = o(1) almost surely (a.s.). Then we will show that Yₙ satisfies (C2). Thus, we are led to the following result.
Theorem 4.5.3³ Let a statistic Yₙ have the representation

Yₙ − θ = n⁻¹ Σ_{i=1}^n Z_i + Rₙ,   (4.5.7)

where the Z_i are independent with E(Z_i) = 0 and var(Z_i) ≤ b < ∞ (i = 1, 2, ...), and the remainder term Rₙ = o(n^{−1/2}) a.s. Then Yₙ − θ satisfies (C2).
Special Cases
(1) Sample Quantiles. Let F be a distribution function and ξ be a fixed point such that F(ξ) = p (0 < p < 1). Assume that F has at least two derivatives in a neighborhood of ξ, that F'' is bounded in this neighborhood, and that F'(ξ) = f(ξ) > 0. Let X₁, X₂, ..., Xₙ be a random sample from F, let Yₙ be the pth sample quantile (i.e., Yₙ = X_{[np]+1,n}, where the X_{i,n} are the ordered X's), and let Sₙ denote the number of X's exceeding ξ. Then Bahadur's (1966) representation gives

Yₙ − ξ = [Sₙ/n − (1 − p)]/f(ξ) + Rₙ,

where Rₙ = O(n^{−3/4} ln n) as n → ∞ with probability one. Applying Theorem 4.5.3 we surmise that a sample quantile, or a linear combination of sample quantiles, satisfies (C2).

³I thank Professor David Mason of the University of Delaware for a useful discussion regarding this result.
(2) Maximum likelihood estimate (mle). When the probability function or the density function satisfies certain regularity conditions, we shall show that the mle has an asymptotic normal distribution when based on a random sample size. First, we shall give the strong consistency property of the mle, which was established by Wald (1949).
Theorem 4.5.4 (Wald, 1949). Let f(x; θ) denote the probability function or the probability density function of a random variable X, where θ could be a vector. Let f(x; θ, ρ) = sup_{|θ' − θ| ≤ ρ} f(x; θ') and φ(x; r) = sup_{|θ| > r} f(x; θ), and define

f*(x; θ, ρ) = f(x; θ, ρ) when f(x; θ, ρ) > 1 and = 1 otherwise;

φ*(x; r) is analogously defined. Assume the following regularity conditions:

(i) For sufficiently small ρ and sufficiently large r,

E[ln f*(X; θ, ρ)] < ∞ and E[ln φ*(X; r)] < ∞,

where the expectation is taken with respect to the distribution associated with θ₀, and θ₀ denotes the true value of the parameter,

(ii) If θ_i is a sequence converging to θ, then lim_{i→∞} f(x; θ_i) = f(x; θ) for all x except perhaps on a set which may depend on the limit point θ (but not on the sequence θ_i) and whose probability measure is zero with respect to the distribution associated with θ₀,

(iii) Distinct parameters index distinct distribution functions,

(iv) If lim_{i→∞} |θ_i| = ∞, then lim_{i→∞} f(x; θ_i) = 0 for any x except perhaps on a fixed set (not depending on the sequence θ_i) whose probability is zero with respect to the distribution associated with θ₀,

(v) E|ln f(X; θ₀)| < ∞,

(vi) The parameter space is a closed subset of a finite-dimensional Euclidean space, and

(vii) f(x; θ, ρ) is a measurable function of x for all θ and ρ.

Let θ̂ₙ denote the maximum likelihood estimate of θ based on a random sample of size n from f(x; θ₀). If the regularity assumptions (i)-(vii) hold, then θ̂ₙ converges to θ₀ as n → ∞ with probability one.
Proof. See Wald (1949, pp. 599-600). ∎

Remark 4.5.3 In the discrete case assumption (vii) is unnecessary. We may replace f*(x; θ, ρ) by f**(x; θ, ρ), where f**(x; θ, ρ) = f(x; θ, ρ) when f(x; θ₀) > 0, and = 1 when f(x; θ₀) = 0. Since f(x; θ₀) is positive at at most countably many values of x, f** is obviously a measurable function of x. Huber (1967) gives an alternative set of conditions for the strong consistency of the mle. Next we shall consider the asymptotic normality of the mle based on a random sample size. Let X₁, X₂, ... denote an i.i.d. sequence having the probability function or the density function of X₁. Let

Bₙ(θ) = n⁻¹ Σ_{i=1}^n (d/dθ) ln f(X_i; θ).
We assume the following regularity conditions:

(a) (dʲ/dθʲ)[ln f(X; θ)] exists for j = 1, 2, 3,

(b) E_θ[(d/dθ) ln f(X; θ)] = 0 and 0 < I(θ) = E_θ[{(d/dθ) ln f(X; θ)}²] < ∞,

(c) |(d³/dθ³)[ln f(X; θ)]| ≤ H(x), where E_θ[H(X)] < M, with M free of θ.

Under the assumptions (a)-(c), the mle converges to θ₀ (the true value of the parameter) in probability (see, for instance, Rao (1965, p. 300)). We shall state without proof the following result of the author (1987, pp. 426-427).

Theorem 4.5.5 Let N be the random sample size. Then, under the assumptions (a)-(c), N^{1/2}(θ̂_N − θ₀) tends in distribution to normal (0, I⁻¹(θ₀)).
Remark 4.5.4 As a by-product, we can establish the law of the iterated logarithm for the mle. That is,

limsup_{n→∞} n^{1/2}(θ̂ₙ − θ₀)[2 ln ln n]^{−1/2} = I^{−1/2}(θ₀) a.s.,   (4.5.8)

since limsup_{n→∞} n^{1/2} Bₙ(θ₀)[2 ln ln n]^{−1/2} = I^{1/2}(θ₀) a.s. Furthermore, if for some δ > 0

E_{θ₀} |(d/dθ) ln f(X; θ)|^{2+δ} < ∞,

then using a result of Loève (1977, Vol. 1, pp. 254-255) we can establish that

θ̂ₙ − θ₀ = Bₙ(θ₀)/I(θ₀) + o(n^{−1/2}) a.s. (see Problem 4.5.7),
which coincides with the strong representation in (4.5.7). However, it seems that such a strong representation (although sufficient for the statistic to have property (C2)) is not necessary, as has been demonstrated in the case of the mle. A type of statistic to which Theorem 4.5.3 does not apply, but which still satisfies (C2), is given by (i) Yₙ = X_{n,n}; (ii) Yₙ = X_{1,n}; (iii) Yₙ = X_{n,n} − X_{1,n}, because, for n < n' < (1 + c)n,

P(Y_{n'} = Yₙ) = n/n' ≥ (1 + c)⁻¹

in case (i). Thus, P(Y_{n'} > Yₙ) ≤ 1 − (1 + c)⁻¹ ≤ c. That is, the probability that Y_{n'} differs from Yₙ for any n', where n < n' < (1 + c)n, is less than c in cases (i) and (ii), and less than 2c in case (iii), and is therefore small if c is small. Statistics of this type often consist of one of the three expressions listed above multiplied by a factor uₙ depending on n. In that case, if c is small, the probability is close to 1 that Y_{n'} − Yₙ = (u_{n'}/uₙ − 1)Yₙ, and this must with high probability be small compared with wₙ for large n and |n' − n| < cn. Thus a condition is imposed on the "continuity" of uₙ relative to wₙ.
Theorem 4.5.6 (Anscombe, 1952). (C2) is satisfied if X₁, X₂, ... are independent and identically distributed and Yₙ is an extremum or the range of X₁, X₂, ..., Xₙ multiplied by a factor uₙ (provided that uₙ, if not a constant, satisfies the above condition).
Example 4.5.1 Suppose that we wish to estimate, with given small standard error a, the mean θ of a normal distribution whose variance σ² is unknown. If X₁, X₂, ... denote independent observations, we consider the statistic Yₙ = n⁻¹ Σ_{i=1}^n X_i as an estimate of θ; for fixed n this has standard error wₙ = σ/n^{1/2}, estimated (for n ≥ 2) by Zₙ = {[n(n − 1)]⁻¹ Σ_{i=1}^n (X_i − Yₙ)²}^{1/2}. Conditions (C1) and (C3) are satisfied, and so is (C2). Provided (C4) also holds, Theorem 4.5.2 implies that (Y_N − θ)/a is asymptotically normal with mean 0 and unit variance if N is the least n for which

Zₙ ≤ a.   (4.5.9)
Now, (4.5.9) is equivalent to (n − 1)⁻¹ Σ_{i=1}^{n−1} ξ_i² ≤ na², where the ξ_i are independent normal (0, σ²) variables derived from the X_i by a Helmert transformation: ξ_i = (X_{i+1} − i⁻¹ Σ_{j=1}^{i} X_j)[i/(i + 1)]^{1/2}. By the strong law of large numbers, given ε, η > 0, there is a ν such that

P{|(n − 1)⁻¹ Σ_{i=1}^{n−1} ξ_i² − σ²| < ε for all n > ν} > 1 − η.   (4.5.10)

If a is small enough, the probability exceeds 1 − η that (4.5.9) is not satisfied for any n in the range 2 ≤ n ≤ ν; and then, given that N > ν, (4.5.10) implies that the probability exceeds 1 − η that |Na²/σ² − 1| < ε/σ². Hence (C4) holds as a → 0. To obtain a better approximation to the asymptotic situation when a is not infinitesimally small, it is advisable to impose a lower limit on the value of N and not to consider whether (4.5.9) is satisfied until that lower limit has been passed. One might, for example, specify that N ≥ a^{−δ} where 0 < δ < 2; with such a δ, (C4) and Theorem 4.5.2 would apply as before.
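The Helmert transformation used in this argument can be checked numerically: the n − 1 contrasts ξ_i carry exactly the centered sum of squares of the original observations. A minimal sketch (the function names are ours):

```python
import math

def helmert(xs):
    """Helmert contrasts xi_i = (x_{i+1} - mean(x_1..x_i)) * sqrt(i/(i+1)),
    i = 1, ..., n - 1, computed with a running sum."""
    out = []
    running_sum = xs[0]
    for i in range(1, len(xs)):
        xi = (xs[i] - running_sum / i) * math.sqrt(i / (i + 1.0))
        out.append(xi)
        running_sum += xs[i]
    return out

def centered_ss(xs):
    """Sum of (x_i - mean)^2 over the sample."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)
```

For any data, sum of the squared contrasts equals centered_ss(xs), which is why (4.5.9) can be rewritten in terms of the independent ξ_i.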
Example 4.5.2 Suppose that X₁, X₂, ... have independent uniform distributions on the interval (0, θ), and that we desire to estimate θ with given small standard error a. We may take

Yₙ = (n + 1) X_{n,n}/n,  wₙ = θ/n,

where X_{n,n} = max(X₁, ..., Xₙ). As n → ∞, P(Yₙ − θ ≤ x wₙ) → G(x) = exp(x − 1) (for x ≤ 1), and wₙ is asymptotically the standard deviation of Yₙ. (C2) is satisfied by Theorem 4.5.6. Hence by Theorem 4.5.2 the required sample size N is the least n for which

Zₙ = Yₙ/n ≤ a.

Similar considerations apply regarding (C4) as in the previous example. If in the definition of Yₙ we omit the factor (n + 1)/n, this gives G(x) = exp(x) (for x ≤ 0) and a stopping rule equivalent to the above for large n.
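The limit G(x) = exp(x − 1) can be verified from the exact distribution of Yₙ, which is free of θ. A small check (the function name is ours):

```python
import math

def cdf_Yn(n, x):
    """Exact P(Y_n - theta <= x * w_n) for Y_n = (n + 1) X_{n,n} / n and
    w_n = theta / n, with X_1, ..., X_n i.i.d. uniform(0, theta).  The event is
    {X_{n,n} <= theta (n + x) / (n + 1)}, so the probability is
    ((n + x)/(n + 1))**n, which does not involve theta.  Valid for x <= 1."""
    return ((n + x) / (n + 1.0)) ** n
```

For large n this is numerically indistinguishable from exp(x − 1), and at x = 1 it equals 1 exactly, consistent with X_{n,n} ≤ θ.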
Example 4.5.3 To see what may happen if (C2) is not satisfied, consider independent observations X₁, X₂, ... from a normal distribution with unknown mean θ and variance σ², and let us take

Yₙ = n^{−1/2} Xₙ + (1 − n^{−1/2}) X̄ₙ, where X̄ₙ = n⁻¹ Σ_{i=1}^n X_i.   (4.5.11)

Yₙ is normally distributed with mean θ and asymptotic variance 2σ²/n for large n. The correlation between Yₙ and Y_{n+1} tends to 1/2, and (C2) is not satisfied. Suppose that we wish to estimate θ with given small standard error a. Then we take N = 2σ²/a² (or the next integer above), a fixed value. The conclusion of Theorem 4.5.2 is valid. Now suppose that we wish to estimate θ (assumed to be positive) with given small coefficient of variation a. The coefficient of variation of Yₙ, and a suitable estimate of it, are wₙ = (σ/θ)(2/n)^{1/2} and Zₙ = (σ/Yₙ)(2/n)^{1/2}. When Theorem 4.5.2 is applied, it yields the stopping rule: stop at N where

N = inf{n : nYₙ² ≥ 2σ²/a²}.   (4.5.12)

When n = N the second member of the right-hand side of (4.5.11) is asymptotically normal with mean θ(1 − N^{−1/2}); however, for the first member, as a → 0 the probability tends to 1 that X_N > θ + kσ, where k is any positive number. Hence (Y_N − θ)/a → ∞ in probability, and so does not have the limit distribution G(x) = Φ(x/θ) which Theorem 4.5.2 gives, where Φ denotes the standard normal distribution function. It is easy to verify that (C4) is satisfied by the rule in (4.5.12). Thus, when (C2) is not satisfied, the conclusion of Theorem 4.5.2 may hold if N satisfies a stronger condition than (C4), such as being constant. However, (C2) seems to be necessary if no condition other than (C4) is imposed on N, and in particular if the distributions of N and Y_N are not assumed to be independent.
Remark 4.5.5 If Yₙ is the average of independent random variables, one can easily show (see, for instance, Laha and Rohatgi (1979, Lemma 5.4.1, pp. 322-323)) that if Yₙ satisfies (C1) then it also satisfies (C3). Since in most practical applications we will be concerned with random sums of i.i.d. random variables, the following result supplements the main result of Anscombe (1952).

Theorem 4.5.7 (Rényi, 1957, and Wittenberg, 1964). Let X₁, X₂, ... be independent and identically distributed random variables having mean 0 and variance 1, and define Sₙ = X₁ + X₂ + ... + Xₙ. If N₁, N₂, ... is a sequence of positive integer-valued random variables (defined on the same probability space) such that Nₙ/n converges in probability to a positive constant ε, then S_{Nₙ}/(nε)^{1/2} converges in law to a standard normal variable as n → ∞.

Proof. This is a special case of Theorem 4.1 of Wittenberg (1964, p. 15). ∎
Bhattacharya and Mallik (1973) employ Theorem 4.5.7 in order to establish the asymptotic normality of the stopping time of Robbins' (1959) procedure for estimating the normal mean μ when the variance σ² is unknown, with (μ̂ − μ)² + cn as the loss function, where c is proportional to the cost per observation, μ̂ = n⁻¹ Σ_{i=1}^n X_i, and the X_i are i.i.d. normal (μ, σ²). In other words, they show that

(N − c^{−1/2}σ) / {(1/2) c^{−1/2} σ}^{1/2} → normal (0, 1) in distribution as c → 0.   (4.5.13)

They use Lemma 1 of Chow and Robbins (1965) (see Lemma 4.7.1) in order to assert that c^{1/2}N → σ almost surely as c → 0. Notice here that the stopping time N should be indexed by c, which we are suppressing for the sake of simplicity. Next, we shall consider a result of Siegmund (1968). Let X₁, X₂, ... be i.i.d. random variables having mean μ > 0 and finite variance σ², and let Tₙ = X₁ + X₂ + ... + Xₙ. Let N (= N_c) denote the smallest n for which Tₙ ≥ c⁻¹n^δ, where 0 ≤ δ < 1. Such stopping rules commonly arise in sequential estimation (for instance, see Chow and Robbins (1965) and Darling and Robbins (1967b)). Then we have
Theorem 4.5.8 (Siegmund, 1968). Let X₁, X₂, ... be i.i.d. random variables with E(X_i) = μ > 0 and var(X_i) = σ², and let Tₙ = X₁ + X₂ + ... + Xₙ. If N is the smallest n for which Tₙ ≥ c⁻¹n^δ, 0 ≤ δ < 1, then as c → 0

(N − λ_c) / {σ μ⁻¹ (1 − δ)⁻¹ λ_c^{1/2}} → normal (0, 1) in distribution,

where λ_c = (cμ)^{−1/(1−δ)}.
Proof. Bhattacharya and Mallik (1973, Theorem 4) provide a simpler proof of Theorem 4.5.8 that is based on Theorem 4.5.7. They also conjecture that Theorem 4.5.8 holds for all nonnegative δ. ∎

Woodroofe (1977) considered the stopping time of a sequential estimation procedure given by

N = inf{n ≥ n₀ : Tₙ < c n^δ L(n)},

where Tₙ is as defined earlier, δ > 1, c is a positive parameter, and L(n) is a convergent sequence such that

L(x) = 1 + L₀ x⁻¹ + o(x⁻¹)  (x → ∞)

with |L₀| < ∞. Woodroofe (1977) then establishes the asymptotic normality of N (suitably normalized) and obtains an asymptotic expansion for E(N).
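The centering constant λ_c of Theorem 4.5.8 can be sanity-checked in the degenerate case X_i ≡ μ, where Tₙ = nμ crosses the boundary c⁻¹n^δ for the first time essentially at n = λ_c. A small sketch (the function names are ours):

```python
def lambda_c(c, mu, delta):
    """Centering constant (c * mu)**(-1/(1 - delta)) of Theorem 4.5.8."""
    return (c * mu) ** (-1.0 / (1.0 - delta))

def first_crossing(c, mu, delta):
    """Smallest n with n * mu >= n**delta / c when every X_i equals mu
    (brute-force search; fine for moderate lambda_c)."""
    n = 1
    while n * mu < (n ** delta) / c:
        n += 1
    return n
```

With c = 0.01, μ = 2 and δ = 1/2 the boundary n^{1−δ} = (cμ)⁻¹ gives λ_c = 0.02⁻² = 2500, and the brute-force crossing time agrees.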
4.6 Determination of Fixed-width Intervals
Khan (1969) has given a method for determining stopping rules in order to obtain fixed-width confidence intervals of prescribed coverage probability for an unknown parameter of a distribution possibly involving some unknown nuisance parameters. The results are only asymptotic, and rely on the asymptotics of Chow and Robbins (1965). Below we present Khan's (1969) results. Let p(x; θ₁, θ₂) denote the probability density function of a random variable X (for convenience, with respect to Lebesgue measure) with real-valued parameters θ₁ and θ₂, where θ₂ is considered to be a nuisance parameter. For the sake of simplicity we assume that there is a single nuisance parameter, since the case of several nuisance parameters would be analogous. We wish to determine a confidence interval of fixed width 2d (d > 0) for θ₁ when both θ₁ and θ₂ are unknown, with preassigned coverage probability 1 − α (0 < α < 1).
Assumption We assume that all the regularity conditions of maximum likelihood estimation are satisfied [see, for instance, LeCam (1970)]. Also assume the regularity conditions of Theorem 4.5.4.

Let N denote a bona fide stopping variable (that is, N is a positive integer-valued random variable such that the stopping set {N = n} is a member of the σ-algebra of subsets generated by X⁽ⁿ⁾ = (X₁, X₂, ..., Xₙ)′, and P(N < ∞) = 1). Let n denote a fixed value assumed by N. Also, let Fisher's information matrix [see, for instance, Rao (1965, p. 270)] be denoted by I(θ) = ((I_{ij})), i, j = 1, 2, where θ = (θ₁, θ₂)′ and

I_{ij} = E[{∂ ln p(X; θ₁, θ₂)/∂θ_i}{∂ ln p(X; θ₁, θ₂)/∂θ_j}].

We assume that ((I_{ij})) is positive definite and let ((I_{ij}))⁻¹ = ((λ_{ij})) = Λ, that is, I⁻¹(θ) = Λ. θ̂₁(n) and θ̂₂(n) will denote the maximum likelihood estimators (mle's) of θ₁, θ₂ respectively based on a random sample of size n. It should be noted that θ̂₁(n) is asymptotically normal with mean θ₁ and variance λ₁₁/n, where λ₁₁ = λ₁₁(θ₁, θ₂), since in general the I_{ij} are functions of θ₁ and θ₂. Let {uₙ, n ≥ 1} be a sequence of positive constants converging to a constant u, where Φ(u) = 1 − (α/2). Let

Jₙ = [θ̂₁(n) − d, θ̂₁(n) + d]

and

n₀ = smallest integer n ≥ u² λ₁₁(θ₁, θ₂)/d².   (4.6.1)

From (4.6.1) it follows that lim_{d→0} n₀ = ∞ and

lim_{d→0} [d² n₀ / {u² λ₁₁(θ₁, θ₂)}] = 1.
Hence,

lim_{d→0} P(θ₁ ∈ J_{n₀}) = lim_{d→0} P( n₀^{1/2} |θ̂₁(n₀) − θ₁| λ₁₁^{−1/2} ≤ n₀^{1/2} d λ₁₁^{−1/2} ) = 2Φ(u') − 1 ≥ 1 − α,

since u' = n₀^{1/2} d λ₁₁^{−1/2} ≥ u.
We will treat n₀ as the optimum sample size if θ₁ and θ₂ were known, which will serve as a standard for comparison with the stopping time of the sequential procedure to be adopted. In some cases n₀ will turn out to be optimum if only θ₂ were known and λ₁₁(θ₁, θ₂) depends only on θ₂, for example, in the case of a normal distribution with θ₁ as the mean and θ₂ as the variance. When θ₁ and θ₂ are unknown, no fixed n will be available to guarantee fixed width 2d and coverage probability 1 − α. So we adopt the following sequential rule. For a fixed positive integer m, let

N = smallest integer n ≥ m such that n ≥ uₙ² λ̂₁₁(n)/d²,   (4.6.2)

where λ̂₁₁(n) = λ₁₁(θ̂₁(n), θ̂₂(n)).
Lemma 4.6.1 (Khan, 1969). If λ₁₁(θ₁, θ₂) < ∞, then the sequential procedure terminates finitely with probability 1.

Proof. Under the regularity assumptions, λ̂₁₁(n) → λ₁₁(θ₁, θ₂) with probability 1 [see Theorem 4.5.4]. Thus the right-hand member of (4.6.2) tends to n₀ with probability one. Hence

P(N = ∞) = lim_{n→∞} P(N > n) ≤ lim_{n→∞} P(n < uₙ² λ̂₁₁(n)/d²) = 0. ∎

Then we have the following first-order asymptotic result.
Theorem 4.6.1 (Khan, 1969). If the assumptions of Theorem 4.5.4 are satisfied and if

E[sup_n λ̂₁₁(n)] < ∞,   (4.6.3)

then we have

(i) lim_{d→0} N/n₀ = 1 almost surely (a.s.) (asymptotic optimality),
(ii) lim_{d→0} P(θ₁ ∈ J_N) = 1 − α (asymptotic consistency),

(iii) lim_{d→0} E(N)/n₀ = 1 (asymptotic efficiency).

Proof. To prove (i), let Yₙ = λ̂₁₁(n)/λ₁₁, f(n) = nu²/uₙ², and t = u² λ₁₁(θ₁, θ₂)/d² = n₀. Then the conditions of Lemma 4.7.1 [Chow and Robbins (1965, Lemma 1)] are satisfied, and hence

lim_{t→∞} N/t = lim_{d→0} N/n₀ = 1 a.s.

To prove (ii), we observe that N(t)/t → 1 a.s. as t → ∞, and hence N(t)/n_t → 1 a.s. as t → ∞, where n_t = [t] = largest integer ≤ t. It follows from Theorem 4.5.5 that [N(t)/λ₁₁(θ₁, θ₂)]^{1/2} {θ̂₁(N(t)) − θ₁} tends to the standard normal variable in distribution as t → ∞. Also, from (i) it follows that d(N/λ₁₁)^{1/2} → u a.s. as d → 0. Hence

lim_{d→0} P(θ₁ ∈ J_N) = Φ(u) − Φ(−u) = 1 − α.

Finally, (iii) follows from Lemma 4.7.2 [Chow and Robbins (1965, Lemma 2)]. ∎
Remark 4.6.1 It should be noted that Assumption (4.6.3) is required only for the validity of (iii). However, in some cases it might be possible to establish (iii) without (4.6.3), for instance by using Lemma 4.7.3 [Chow and Robbins (1965, Lemma 3)].
Example 4.6.1 (a) Consider the normal population having mean μ and variance σ² (0 < σ² < ∞). Take θ₁ = μ, θ₂ = σ². Then

I(θ) = diag(1/σ², 1/(2σ⁴)) and λ₁₁ = σ².

The mle's of θ₁ and θ₂ are θ̂₁(n) = X̄ₙ = n⁻¹ Σ_{i=1}^n X_i and θ̂₂(n) = n⁻¹ Σ_{i=1}^n (X_i − X̄ₙ)². Instead of θ̂₂(n), we can use sₙ² = n θ̂₂(n)/(n − 1), which is an unbiased and consistent estimator for θ₂. Hence, the following stopping rule is obtained:

N = smallest n ≥ 2 such that n ≥ uₙ² sₙ²/d², and n₀ = u²σ²/d².
(b) For the normal population of example (a), let θ₁ = σ², θ₂ = μ. Then λ₁₁ = 2σ⁴. Thus

N = smallest n ≥ 2 such that n ≥ 2uₙ² sₙ⁴/d², and n₀ = 2u²σ⁴/d².
Graybill and Connell (1964a) have given a two-stage procedure for estimating σ² (see Problem 4.4.5).
(c) Let p(x; θ) = θ exp(−θx), x ≥ 0, 0 < θ < ∞. Then i(θ) = 1/θ², so that λ₁₁ = θ², and

N = smallest n ≥ 2 such that n ≥ uₙ² θ̂²(n)/d², where θ̂(n) = X̄ₙ⁻¹.
The validity of Assumption (4.6.3) in (a) and (b) follows from the following lemma, which is proved from Wiener's (1939) theorem. However, it is not true in (c), and hence (iii) cannot be concluded from Lemma 4.7.2. We first state Wiener's (1939) theorem without proof.

Theorem 4.6.2 (Wiener, 1939). Let {Xₙ, n ≥ 1} be a sequence of i.i.d. random variables with E|Xₙ|^r < ∞ or E[|Xₙ| ln⁺|Xₙ|] < ∞ according as r > 1 or r = 1, where ln⁺ u = max(0, ln u). Then

E[ sup_{n≥1} |n⁻¹ Σ_{i=1}^n X_i|^r ] < ∞,
and conversely.

Lemma 4.6.2 (Khan, 1969). Under the assumption 0 < σ² < ∞, we have

E[ sup_{n≥2} sₙ^q ] < ∞ for q ≥ 2,

where sₙ² is the sample variance (unbiased version).
Proof. For q = 2, since

sₙ² ≤ (n − 1)⁻¹ Σ_{i=1}^n X_i² ≤ (2/n) Σ_{i=1}^n X_i² for n ≥ 2,

we have

sup_{n≥2} sₙ² ≤ 2 sup_{n≥2} n⁻¹ Σ_{i=1}^n X_i².

Consequently, using Theorem 4.6.2 (applied with r = 1 to the variables X_i²),

E[ sup_{n≥2} sₙ² ] < ∞ if E[X² ln⁺ X²] < ∞ and E(X²) < ∞.

However, E[X² ln⁺ X²] ≤ E(X⁴) < ∞ and E(X²) < ∞ are true for a normal distribution with finite variance. Now let q > 2. Then

sup_{n≥2} sₙ^q = sup_{n≥2} (sₙ²)^{q/2} ≤ 2^{q/2} sup_{n≥2} ( n⁻¹ Σ_{i=1}^n X_i² )^{q/2}.

Therefore, applying Theorem 4.6.2 with r = q/2 to the variables X_i²,

E[ sup_{n≥2} sₙ^q ] < ∞ if E|X|^q < ∞,

which is true for a normal distribution with finite variance. This completes the proof of the lemma. ∎
Remark 4.6.2 In the case of a single-parameter family of distributions, the stopping variable N takes the form

N = smallest integer n ≥ m such that n ≥ uₙ²/{d² i(θ̂ₙ)},

where i(θ) = −E{∂² ln p(X; θ)/∂θ²} and θ̂ₙ is the mle of θ. However, if i(θ) is independent of θ, no sequential procedure is required, since bounded-length confidence intervals of given coverage probability can be based on normal theory. More generally, no sequential procedure is required when λ₁₁(θ₁, θ₂) depends only on θ₂ and θ₂ is known. As an example, consider a normal distribution with unknown mean and known variance.
4.7 Interval and Point Estimates for the Mean

4.7.1 Interval Estimation for the Mean

In Section 4.6 we discussed fixed-width confidence intervals for a general parameter; in this section we shall present the large-sample fixed-width sequential confidence intervals for the population mean. The main results can be followed by assuming certain convergence theorems, the understanding of which requires some knowledge of measure theory. The basic asymptotics of Chow and Robbins (1965) will be given as lemmas, and their main results and Nádas' (1969) results will be stated as theorems.

Lemma 4.7.1 (Chow and Robbins, 1965). Let Yₙ (n = 1, 2, ...) be any sequence of random variables such that Yₙ > 0 a.s. (almost surely) and limₙ→∞ Yₙ = 1 a.s., and let f(n) be any sequence of constants such that

f(n) > 0,  lim_{n→∞} f(n) = ∞,  lim_{n→∞} f(n)/f(n − 1) = 1,

and for each t > 0 define
N = N(t) = smallest k ≥ 1 such that Yₖ ≤ f(k)/t.   (4.7.1)
Then N is well-defined and non-decreasing as a function of t,

lim_{t→∞} N = ∞ a.s.,  lim_{t→∞} E(N) = ∞,   (4.7.2)

and

lim_{t→∞} f(N)/t = 1 a.s.   (4.7.3)
Proof. (4.7.2) can easily be verified. In order to prove (4.7.3) we observe that for N > 1, Y_N ≤ f(N)/t < [f(N)/f(N − 1)] Y_{N−1}, from which (4.7.3) follows as t → ∞. ∎
Lemma 4.7.2 (Chow and Robbins, 1965). If the assumptions of Lemma 4.7.1 are satisfied and if E(sup_n Yₙ) < ∞, then

lim_{t→∞} E[f(N)/t] = 1.   (4.7.4)
Proof. Let Z = sup_n Yₙ; then E(Z) < ∞. Choose m such that f(n)/f(n − 1) ≤ 2 for n > m. Then for N > m,

f(N)/t < [f(N)/f(N − 1)] Y_{N−1} ≤ 2Z.

Hence, for all t ≥ 1,

f(N)/t ≤ max_{k≤m} f(k) + 2Z.   (4.7.5)

Now, (4.7.4) follows from (4.7.3), (4.7.5) and the dominated convergence theorem. ∎

Let X₁, X₂, ... be a sequence of independent observations from some population. We wish to set up a confidence interval having specified width 2d and coverage probability 1 − α for the unknown mean μ of the population. If the variance σ² of the population is known, and if d is small compared to σ², this can be constructed as follows. For any n ≥ 1 define the interval Iₙ = [X̄ₙ − d, X̄ₙ + d], where X̄ₙ = n⁻¹ Σ_{i=1}^n X_i,
and let u_α (to be written as u hereafter) denote the (1 − α/2) fractile of the standard normal distribution. Then, for a sample of size n determined by

n = smallest integer ≥ u²σ²/d²,   (4.7.6)
the interval Iₙ has coverage probability

P(μ ∈ Iₙ) = P( n^{1/2} |X̄ₙ − μ|/σ ≤ n^{1/2} d/σ ).

Since (4.7.6) implies that lim_{d→0} (d²n)/(u²σ²) = 1, it follows from the central limit theorem that lim_{d→0} P(μ ∈ Iₙ) = 1 − α.
Quite often we will be concerned with the situation where the nature of the population, and hence σ², is unknown, so that no fixed-sample-size method is available. Let

Vₙ² = n⁻¹ [ Σ_{i=1}^n (X_i − X̄ₙ)² + 1 ],   (4.7.7)

let u₁, u₂, ... be any sequence of positive constants such that lim_{n→∞} uₙ = u, and define

N = smallest integer k ≥ 1 such that Vₖ² ≤ d²k/uₖ².   (4.7.8)
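The rule (4.7.8) is easy to simulate. The sketch below (the function names are ours, and we take uₖ = u for every k, a simplification of the general rule) draws normal observations, stops according to (4.7.8), and estimates the coverage of the resulting fixed-width interval:

```python
import random

def chow_robbins_n(draw, d, u=1.96, k_min=2):
    """Sample X_1, X_2, ... from draw() and stop at the first k >= k_min with
    V_k^2 <= d^2 k / u^2, where V_k^2 = (sum_i (X_i - mean)^2 + 1) / k as in
    (4.7.7).  Returns (stopping time k, running mean)."""
    xs = []
    while True:
        xs.append(draw())
        k = len(xs)
        if k < k_min:
            continue
        m = sum(xs) / k
        v2 = (sum((x - m) ** 2 for x in xs) + 1.0) / k
        if v2 <= d * d * k / (u * u):
            return k, m

def coverage(mu=0.0, sigma=1.0, d=0.5, u=1.96, reps=500, seed=1):
    """Monte Carlo estimate of P(mu in [mean - d, mean + d]) and of E(N)."""
    rng = random.Random(seed)
    hits, total_n = 0, 0
    for _ in range(reps):
        k, m = chow_robbins_n(lambda: rng.gauss(mu, sigma), d, u)
        total_n += k
        if abs(m - mu) <= d:
            hits += 1
    return hits / reps, total_n / reps
```

With σ = 1, d = 0.5 and u = 1.96 the benchmark is n₀ = u²σ²/d² ≈ 15.4; the simulated average sample size comes out close to this, and the simulated coverage is close to (typically slightly below) the nominal 95%, in line with Theorem 4.7.1 being an asymptotic statement.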
Then, we have the following theorem. Theorem 4.7.1 (Chow and Robbins, 1965). If 0 < u2 < 00, then we have
(i) 1imd-o ( d 2 N )/ (u2u2)= 1 a.s., (asymptotic (‘optimality”), (4.7.9) (ii) limd,o P ( p E I N ) = 1- Q (asymptotic ‘(consistency))), (4.7.10) (iii) limd-+o ( d 2 E ( N ) )/ (u2u2) = 1 (asymptotic “eficiency’)). (4.7.11) In Lemma 4.7.1, set
and (4.7.12) then (4.7.8) can be written as N = N ( t ) = smallest k Applying Lemma 4.7.1, we have
2 1 such that Yk 5 f ( k ) / t . (4.7.13)
which proves (4.7.9). Next, consider $P(\mu \in I_N) = P(|\bar{X}_N - \mu| \le d)$. By (4.7.13), $d N^{1/2}/\sigma \to u$ and $N/t \to 1$ in probability as $t \to \infty$; it follows from Theorem 4.5.1 that as $t \to \infty$, $(X_1 + X_2 + \cdots + X_N - N\mu)/(\sigma N^{1/2})$ behaves like a standard normal variable. Hence $\lim_{t \to \infty} P(\mu \in I_N) = 1 - \alpha$, which
proves (4.7.10). Now (4.7.11) immediately follows from Lemma 4.7.2 whenever the distribution of the $X_i$ is such that
$$E\left[ \sup_n \left\{ n^{-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2 \right\} \right] < \infty. \tag{4.7.14}$$
The justification being that
$$\sup_n Y_n \le \sigma^{-2} \sup_n \left\{ n^{-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2 \right\} + \sigma^{-2}, \tag{4.7.15}$$
and since the function $f(n)$ defined by (4.7.12) is $n + o(n)$, it follows from (4.7.15) that $E(\sup_n Y_n) < \infty$. For (4.7.14) to hold, finiteness of the fourth moment of the $X_i$ would suffice; however, the following lemma shows that (4.7.11) is valid without such a restriction.
Lemma 4.7.3 (Chow and Robbins, 1965). If the conditions of Lemma 4.7.2 are satisfied, if $\lim_{n \to \infty} [f(n)/n] = 1$, if for $N$ defined by (4.7.1) we have $E(N) < \infty$ (all $t > 0$) and
$$\limsup_{t \to \infty} \frac{E(N Y_N)}{E(N)} \le 1, \tag{4.7.16}$$
and if there exists a sequence of constants $g(n)$ satisfying the conditions of Chow and Robbins (1965, Lemma 3), then
$$\lim_{t \to \infty} E(N)/t = 1. \tag{4.7.17}$$
Proof. The proof is somewhat technical and hence is not given here. For details see Chow and Robbins (1965, Lemma 3, p. 459).
Remark 4.7.1. If the random variables $X_i$ are continuous, the definition of $V_n^2$ in (4.7.7) can be modified to $V_n^2 = n^{-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2$. The term $n^{-1}$ is added in order to ensure that $Y_n = V_n^2/\sigma^2 > 0$ a.s., and this fact has been used in the proof of Lemma 4.7.1 so as to guarantee that $N \to \infty$ a.s. as $t \to \infty$. Also, as is evident from the proofs, $N$ in (4.7.8) could be defined as the smallest (or the smallest odd, etc.) integer $\ge n_0$ such that the indicated inequality holds, where $n_0$ is any fixed positive integer.
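The stopping rule (4.7.8) is easy to simulate. The sketch below is illustrative only: the function name and the Monte Carlo setup are ours, $u_n \equiv u$ is held fixed at the normal fractile, and $V_n^2$ is taken as in (4.7.7).

```python
import numpy as np

rng = np.random.default_rng(0)

def chow_robbins(draw, d, u=1.96, n0=2):
    """Sequential fixed-width interval for the mean, rule (4.7.8),
    with u_n held constant at u and V_n^2 as in (4.7.7)."""
    xs = []
    while True:
        xs.append(draw())
        n = len(xs)
        if n < n0:
            continue
        x = np.asarray(xs)
        v2 = x.var() + 1.0 / n          # V_n^2 = n^{-1} sum (X_i - mean)^2 + n^{-1}
        if v2 <= d * d * n / (u * u):   # stop when V_n^2 <= d^2 n / u^2
            xbar = float(x.mean())
            return n, (xbar - d, xbar + d)

# Monte Carlo check for normal(5, 2) data, so sigma^2 = 4:
d, u = 0.5, 1.96
runs = [chow_robbins(lambda: rng.normal(5.0, 2.0), d, u) for _ in range(1000)]
coverage = np.mean([lo <= 5.0 <= hi for _, (lo, hi) in runs])
mean_n = np.mean([n for n, _ in runs])
print(coverage, mean_n, u * u * 4.0 / d ** 2)   # last value: u^2 sigma^2 / d^2 = 61.47
```

In accordance with Theorem 4.7.1, the empirical mean of $N$ should be close to $u^2 \sigma^2 / d^2$ and the coverage close to (typically slightly below) $0.95$.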
Remark 4.7.2. Theorem 4.7.1 has been established when the $X_i$ are normal with mean $\mu$ and variance $\sigma^2$ by Stein (1949), Anscombe (1952, 1953), and Gleser, Robbins and Starr (1964). Extensive numerical computations of Ray (1957b) and Starr (1966a) for the normal case indicate that, for example, when $1 - \alpha = 0.95$ the lower bound of $P(\bar{X}_N - d \le \mu \le \bar{X}_N + d)$ for all $d > 0$, where $N$ is the smallest odd integer $k \ge 3$ such that $(k-1)^{-1} \sum_{i=1}^k (X_i - \bar{X}_k)^2 \le (d^2 k)/u_k^2$, is about 0.929 if the values $u_k$ are taken from the t-distribution with $k - 1$ degrees of freedom (see Table 5.1.1 in Govindarajulu (1987) or Ray (1957b)).

Nadas (1969) has extended Theorem 4.7.1 so as to take care of other specified "accuracies". We speak of "absolute accuracy" when estimating $\mu$ by $\bar{X}_n$ using the interval
$$I_n : |\bar{X}_n - \mu| \le d \quad (d > 0); \tag{4.7.18}$$
and if $\mu \ne 0$, we speak of "proportional accuracy" when estimating $\mu$ by $\bar{X}_n$ using the interval
$$J_n : |\bar{X}_n - \mu| \le p|\mu| \quad (0 < p < 1). \tag{4.7.19}$$
Denote by $\rho$ the coefficient of variation $\sigma/|\mu|$ and define
$$n(d) = \text{smallest integer} \ge u^2 \sigma^2 / d^2, \tag{4.7.20}$$
$$m(p) = \text{smallest integer} \ge u^2 \rho^2 / p^2, \tag{4.7.21}$$
where as before $u$ denotes the $(1 - \alpha/2)$ fractile of the standard normal distribution. Then $n(d)$ and $m(p)$ increase without bound as the arguments tend to zero. Hence, for small arguments one can (at least approximately) achieve the required probability of coverage $1 - \alpha$ by taking the sample size $n$ no smaller than $n(d)$ (for absolute accuracy) or $m(p)$ (for proportional accuracy). If, however, $\sigma^2$ (or $\rho^2$) is unknown, then $n(d)$ or $m(p)$ is not available. On the other hand, if we let $V_n^2$ be given by (4.7.7), then the stopping rules
$$N = \text{smallest } n \ge 1 \text{ such that } V_n^2 \le d^2 n / u_n^2 \tag{4.7.22}$$
and
$$N = \text{smallest } n \ge 1 \text{ such that } V_n^2 / \bar{X}_n^2 \le p^2 n / u_n^2 \tag{4.7.23}$$
are well-defined. In the event that $\rho^2$ is known (but not $\sigma^2$) and one insists on absolute accuracy, or if $\sigma^2$ is known (but not $\rho^2$) in the proportional case, then one has
$$N^* = \min_{n \ge 1} \left\{ n : \bar{X}_n^2 + n^{-1} \le \frac{d^2 n}{u_n^2 \rho^2} \right\} \tag{4.7.24}$$
and
$$N^{**} = \min_{n \ge 1} \left\{ n : \frac{\sigma^2}{\bar{X}_n^2 + n^{-1}} \le \frac{p^2 n}{u_n^2} \right\} \tag{4.7.25}$$
as the sequential analogues of (4.7.22) and (4.7.23). Denote by $K$ any one of the stopping times given by (4.7.22)-(4.7.25), let $k$ be the corresponding "sample size" (4.7.20) or (4.7.21) and let $H_K$ be the corresponding interval estimate (4.7.18) or (4.7.19). Then we have the following theorem.

Theorem 4.7.2 (Nadas, 1969). With the above notation, we have
(i) $\lim K/k = 1$ a.s. (asymptotic optimality),
(ii) $\lim P(\mu \in H_K) = 1 - \alpha$ (asymptotic consistency),
(iii) $\lim E(K)/k = 1$ (asymptotic efficiency).

4.7.2 Risk-efficient Estimation of the Mean
Let $X_1, X_2, \ldots$ be i.i.d. random variables having mean $\mu$ and variance $\sigma^2$. We wish to estimate the unknown mean $\mu$ by $\bar{X}_n$, with the loss function
$$L_n = \sigma^{2\delta - 2} (\bar{X}_n - \mu)^2 + \lambda n,$$
where $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$, $\delta > 0$, $\lambda > 0$ and $n$ denotes the sample size. The risk
$$R_n = E(L_n) = \frac{\sigma^{2\delta}}{n} + \lambda n$$
is minimized by taking a sample of size $n_0$ where
$$n_0 = \lambda^{-1/2} \sigma^\delta,$$
incurring the risk
$$R_{n_0} = 2 \lambda^{1/2} \sigma^\delta.$$
Since $\sigma$ is unknown, there is no fixed-sample size procedure that will achieve the minimum risk. With $\delta = 1$, Robbins (1959) proposed to replace $\sigma^2$ by its estimator
$$\hat{\sigma}_n^2 = n^{-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2$$
and define the stopping time by
$$N' = \text{smallest } n \ge m \text{ such that } n \ge \lambda^{-1/2} \hat{\sigma}_n,$$
where $m \ge 2$, and then the point estimate of $\mu$ is $\bar{X}_{N'}$. Let $R_{N'} = E(L_{N'})$. The performance of $N'$ is usually measured by
(i) the risk-efficiency $R_{n_0}/R_{N'}$, and (ii) the regret $R_{N'} - R_{n_0}$. When the observations are drawn from a normal population, with $\delta = 1$ and the above loss function, Robbins (1959) obtained some numerical and Monte Carlo results which suggested the boundedness of the regret. Starr (1966b) has shown that
$$R_{n_0}/R_{N'} \to 1 \quad \text{as } \lambda \to 0$$
(i.e., $N'$ is asymptotically risk-efficient) if and only if $m \ge 3$. Starr and Woodroofe (1969) have shown that the regret is bounded if and only if $m \ge 3$. Woodroofe (1977) has obtained the second order asymptotics: when $m \ge 4$,
$$E(N') = \lambda^{-1/2} \sigma + \tfrac{1}{2} \sigma^{-2} \nu - \tfrac{3}{4} + o(1),$$
$$R_{N'} = 2 \lambda^{1/2} \sigma + \tfrac{1}{2} \lambda + o(\lambda),$$
where $\nu$ is a computable constant. The preceding results indicate that the sequential procedure performs well for normal samples. Chow and Yu (1981) ask how good the procedure is in general. Consider the following counterexample. Let $P(X_i = 1) = p = 1 - P(X_i = 0)$, $0 < p < 1$, and $\delta = 1$. Then for $m \ge 2$,
$$R_{N'} = E\left[ (\bar{X}_{N'} - p)^2 + \lambda N' \right] \ge \int_{\{X_1 = 1, \ldots, X_m = 1\}} (\bar{X}_m - p)^2 \, dP = (1 - p)^2 p^m > 0,$$
while
$$R_{n_0} = 2 \left[ \lambda p (1 - p) \right]^{1/2} \to 0 \quad \text{as } \lambda \to 0.$$
Hence $\lim_{\lambda \to 0} R_{n_0}/R_{N'} = 0$ and thus $N'$ is not asymptotically risk-efficient. To remedy the situation, Chow and Robbins (1965) proposed the stopping time which is a special case of the following (with $\delta = \beta = 1$):
$$T = \text{smallest } n \ge n_\lambda \text{ such that } a_n \ge \lambda^{-1/\delta} (V_n^2 + n^{-\beta}),$$
where $\beta > 0$ and the term $n^{-\beta}$ is added to $V_n^2$.
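Robbins' rule above is equally simple to simulate. The sketch below is illustrative only: the function name, the choice of variance estimator, and the starting size $m$ are our choices, and the comparison is against the optimal fixed-sample risk $2\lambda^{1/2}\sigma$ for standard normal data.

```python
import numpy as np

rng = np.random.default_rng(0)

def robbins_stop(draw, lam, m=3):
    """Robbins' (1959) rule for delta = 1: stop at the first n >= m
    with n >= lam^{-1/2} * sigma_hat_n, then estimate mu by the mean."""
    xs = [draw() for _ in range(m)]
    while True:
        x = np.asarray(xs)
        n = len(xs)
        if n >= np.sqrt(x.var(ddof=1) / lam):   # n >= lam^{-1/2} sigma_hat_n
            return n, float(x.mean())
        xs.append(draw())

# Achieved risk E[(Xbar_N' - mu)^2 + lam N'] vs the optimal 2 lam^{1/2} sigma
# for standard normal data (mu = 0, sigma = 1):
lam = 1e-3
losses = [(xbar ** 2 + lam * n)
          for n, xbar in (robbins_stop(lambda: rng.normal(), lam) for _ in range(1000))]
print(np.mean(losses), 2.0 * np.sqrt(lam))   # optimal risk = 0.0632...
```

Re-running with $m = 2$ and heavier-tailed or two-point data illustrates the failures of risk-efficiency discussed above.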
Chow and Yu (1981) obtain the following theorem, which we will state without proof.

Theorem 4.7.3 (Chow and Yu, 1981). Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E(X) = \mu$ and $\mathrm{var}(X) = \sigma^2 \in (0, \infty)$. For $\delta > 0$, let $a_n$ and $b_n$ be sequences of constants such that
$$a_n n^{-2/\delta} \to 1 \quad \text{and} \quad 0 < b_n = o(1) \text{ as } n \to \infty.$$
For $\lambda > 0$ and $n_\lambda \ge 1$, define
$$T = \text{smallest } n \ge n_\lambda \text{ such that } a_n \ge \lambda^{-1/\delta} (V_n^2 + b_n),$$
where $V_n^2 = n^{-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2$. Then, we have
(i) If $n_\lambda = o(\lambda^{-1/2})$ as $\lambda \to 0$, then $\lambda^{1/2} T \to \sigma^\delta$ a.s.;
(ii) $\lim_{\lambda \to 0} E(\lambda^{1/2} T) = \sigma^\delta$; and
(iii) If $E|X|^{2p} < \infty$ for some $p > 1$ and $-k \log \lambda \le n_\lambda = o(\lambda^{-1/2})$ for some $k > k_{\delta, p}$, then as $\lambda \to 0$,
$$\frac{R_T}{R_{n_0}} = \frac{E\left[ \sigma^{2\delta - 2} (\bar{X}_T - \mu)^2 + \lambda T \right]}{2 \lambda^{1/2} \sigma^\delta} \to 1.$$
Note that $p > 1$. In our particular case, we can set $b_n = n^{-1}$ and $a_n = n^{2/\delta}$.
4.8 Estimation of Regression Coefficients

4.8.1 Fixed-Size Confidence Bounds

Gleser (1965) has extended Chow and Robbins' (1965) results to the linear regression problem. Let $y_1, y_2, \ldots$ be a sequence of independent observations with
$$y_i = \beta' x^{(i)} + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{4.8.1}$$
where $\beta'$ is an unknown $1 \times p$ vector, $x^{(i)}$ a known $p \times 1$ column vector, and $\varepsilon_i$ a random error having an unknown distribution function $F$ with mean 0 and finite, but unknown, variance $\sigma^2$. We wish to find a region $W$ in $p$-dimensional Euclidean space such that $P(\beta \in W) = 1 - \alpha$ and such that the interval cut off on the $\beta_i$-axis by $W$ has width $\le 2d$, $i = 1, 2, \ldots, p$. As has already been noted for $p = 1$, no fixed-sample size procedure meeting the requirements exists. Hence, we are led to consider sequential procedures.
When $\sigma$ is Known

Since the least-squares (Gauss-Markov) estimate of $\beta$ has, component-wise, uniformly minimum variance among all linear unbiased estimates of $\beta$, has good asymptotic properties (such as consistency) and performs reasonably well against non-linear unbiased estimates, the least squares estimate of $\beta$ would be a natural candidate to use in the construction of our confidence region. It is well known that the least squares estimate of $\beta$ is given by
$$\hat{\beta}(n) = (X_n X_n')^{-1} X_n Y_n,$$
where $Y_n' = (y_1, y_2, \ldots, y_n)$, $X_n = (x^{(1)}, x^{(2)}, \ldots, x^{(n)})$ is a $p \times n$ ($p \le n$) matrix, and where we assume that $X_n$ is of full rank. [This can usually be achieved in practice: if not, sample until $p$ linearly independent $x^{(i)}$ are found, start with the $p$ corresponding $y_i$'s and save the remainder for future use in the sequential procedure. Such a procedure does not bias the results and is equivalent to starting after a fixed number of observations $n_0$.] Since the covariance matrix of $\hat{\beta}(n)$ is $\sigma^2 (X_n X_n')^{-1}$, construct the confidence region
$$\left\{ z : [z - \hat{\beta}(n)]' (X_n X_n') [z - \hat{\beta}(n)] \le d^2 \right\}, \tag{4.8.2}$$
which would have probability of coverage equal to $P\{\sigma^2 \chi_p^2 \le d^2\}$ if $F$ is normal (and asymptotically for any $F$). To find a confidence interval of width $2d$ for any one of the $\beta_i$, we could [as in Chow and Robbins (1965)] use the interval $\hat{\beta}_i(n) \pm d$. Also, for any linear combination $a'\beta$, $a' : 1 \times p$, of the $\beta_i$, $i = 1, 2, \ldots, p$, we could use the confidence interval $a'\hat{\beta}(n) \pm d$. Now, a region $W_n$ that would be contained in all of these confidence intervals is
$$W_n = \left\{ z : [z - \hat{\beta}(n)]' [z - \hat{\beta}(n)] \le d^2 \right\}, \tag{4.8.3}$$
since for any $a$ such that $a'a = 1$ and any $z \in W_n$,
$$\{a'[z - \hat{\beta}(n)]\}^2 \le \max_{a'a = 1} \{a'[z - \hat{\beta}(n)]\}^2 = [z - \hat{\beta}(n)]'[z - \hat{\beta}(n)].$$
This region can be adapted for the confidence procedure.

When $\sigma$ is Unknown

The least squares estimate of $\sigma^2$ is
$$\hat{\sigma}^2(n) = (n - p)^{-1} [Y_n - X_n' \hat{\beta}(n)]' [Y_n - X_n' \hat{\beta}(n)]. \tag{4.8.4}$$
Before presenting the class of sequential procedures $C$, we shall consider some asymptotic properties of $\hat{\beta}(n)$ and $\hat{\sigma}^2(n)$, which will be relevant to the discussion of the asymptotic properties of the class $C$.

Lemma 4.8.1. Let $U_n = (X_n X_n')^{-1/2} X_n = ((U_{n,i,j}))$. Then, if (A) $\max_{i,j} |U_{n,i,j}| \to 0$ as $n \to \infty$, then
(i) $[\hat{\beta}(n) - \beta]' (X_n X_n')^{1/2} \to \text{normal}(0, \sigma^2 I_p)$ in distribution, and
(ii) $\hat{\sigma}^2(n) \to \sigma^2$ a.s. (as $n \to \infty$).

A sufficient condition for Condition (A) to hold is the following set of assumptions:

A1: There exists a $p \times p$ positive definite matrix $C$ such that $n^{-1}(X_n X_n') \to C$ as $n \to \infty$.

A2: $n^{-1} \max_{1 \le i \le n} x^{(i)\prime} x^{(i)} \to 0$ as $n \to \infty$.

Under these assumptions, we can find the asymptotic probability of coverage of the region $W_n$.

Lemma 4.8.2. If assumptions A1 and A2 hold, then
$$n [\hat{\beta}(n) - \beta]' [\hat{\beta}(n) - \beta] / \sigma^2 \to T(\lambda_1, \lambda_2, \ldots, \lambda_p) \quad \text{in distribution},$$
where $\lambda_1, \lambda_2, \ldots, \lambda_p$ are the characteristic roots of $C^{-1}$ and $T(\lambda_1, \lambda_2, \ldots, \lambda_p)$ has the distribution of a weighted sum of $p$ independent chi-square variables with one degree of freedom, the $\lambda_i$'s being the weights.

Asymptotic Properties of the Class $C$

Given $d$ and $\alpha$ and for a fixed sequence of $x$-vectors $x^{(1)}, x^{(2)}, \ldots$ arranged so that $X_n$ is non-singular and so that assumptions A1 and A2 are satisfied, let $\{u_n\}$ be any sequence of constants converging to the number $u$ satisfying
$$P\{T(\lambda_1, \lambda_2, \ldots, \lambda_p) \le u\} = 1 - \alpha. \tag{4.8.5}$$
Then, this sequence $\{u_n\}$ determines a member of the class $C$ of sequential procedures as follows:
Start by taking $n_0 \ge p$ observations $y_1, y_2, \ldots, y_{n_0}$. Then, sample term by term, stopping at $N$ where
$$N = \text{smallest } k \ge n_0 \text{ such that } \hat{\sigma}^2(k) + k^{-1} \le \frac{d^2 k}{u_k}; \tag{4.8.6}$$
when sampling is stopped at $N = n$, construct the region $W_n$ described in (4.8.3). Then the procedures in the class $C$ are asymptotically "consistent" and "efficient" as $d \to 0$, as given by Theorem 4.8.1.

Theorem 4.8.1 (Gleser, 1965). Under the assumption that $0 < \sigma^2 < \infty$,
(i) $\lim_{d \to 0} \left[ d^2 N / (u \sigma^2) \right] = 1$ a.s., (4.8.7)
(ii) $\lim_{d \to 0} P(\beta \in W_N) = 1 - \alpha$, (4.8.8)
and
(iii) $\lim_{d \to 0} \left[ d^2 E(N) / (u \sigma^2) \right] = 1$. (4.8.9)
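The rule (4.8.6) can be sketched for a simple linear regression as follows. This is a hedged illustration: the function names are ours, $u_k$ is held constant at a user-supplied quantile $u$ (here an illustrative chi-square fractile rather than the exact weighted-chi-square fractile of (4.8.5)), and the design is random rather than fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

def gleser_region(new_x, y_given_x, d, u, n0=3):
    """Sketch of rule (4.8.6): stop at the first k >= n0 with
    sigma_hat^2(k) + 1/k <= d^2 k / u (u_k held constant at u),
    returning the stopping time and the centre of the sphere W_N."""
    X, Y = [], []
    while True:
        x = new_x()
        X.append(x)
        Y.append(y_given_x(x))
        k, p = len(Y), len(x)
        if k < max(n0, p + 1):
            continue
        A, yv = np.asarray(X), np.asarray(Y)
        beta_hat, *_ = np.linalg.lstsq(A, yv, rcond=None)
        resid = yv - A @ beta_hat
        s2 = float(resid @ resid) / (k - p)       # sigma_hat^2(k), cf. (4.8.4)
        if s2 + 1.0 / k <= d * d * k / u:
            return k, beta_hat                    # region: ||z - beta_hat|| <= d

# Illustration: y = 1 + 2 x + noise, p = 2 (intercept and slope).
beta = np.array([1.0, 2.0])
N, bhat = gleser_region(lambda: np.array([1.0, rng.uniform(-1.0, 1.0)]),
                        lambda x: float(x @ beta) + rng.normal(0.0, 0.5),
                        d=0.3, u=5.99)
print(N, bhat)
```

The centre $\hat{\beta}(N)$ should land within distance $d$ of the true $\beta$ in most repetitions, in line with (4.8.8).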
Remark 4.8.1. The addition of $n^{-1}$ to $\hat{\sigma}^2(n)$ in (4.8.6) is unnecessary if $F$ is continuous. $N$ could be defined as the smallest odd, even, etc., integer $\ge n_0$ such that (4.8.6) holds and Theorem 4.8.1 will still be valid. Very little is known about the properties of any member of the class $C$ for moderate values of $\sigma^2/d^2$. Gleser's assumptions (A1) and (A2) have been found to be strong and they have been weakened by Albert (1966) and Srivastava (1967, 1971). Also, the latter authors obtain spherical confidence regions for the regression parameters. Consider the following example.

Example 4.8.1. Let $y_i = \alpha + \beta i + \varepsilon_i$, $i = 1, 2, \ldots, n$, where the $\varepsilon_i$ are i.i.d. having mean 0 and variance $\sigma^2$; so $p = 2$, and let
$$X_n = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & n \end{pmatrix}.$$
Hence
$$X_n X_n' = \begin{pmatrix} n & n(n+1)/2 \\ n(n+1)/2 & n(n+1)(2n+1)/6 \end{pmatrix}.$$
It is clear that $n^{-1}(X_n X_n')$ does not tend to a positive definite matrix as $n \to \infty$. Thus Assumption (A1) is not satisfied. The characteristic roots of $X_n X_n'$ are given by
the roots of a quadratic in $n$; one finds that one of the assumptions of Albert (1966) is not satisfied either (the relevant limit should go to unity). Srivastava (1971) has given alternative sufficient conditions that are weaker than those of Albert (1966) and Gleser (1965). It is known (see, for example, Roy, 1957) that $X_n$ can be written as $X_n = T_n L_n$, where $T_n$ is a $p \times p$ triangular matrix with positive diagonal elements (hence unique), and $L_n$ is a $p \times n$ semi-orthogonal matrix: $L_n L_n' = I_p$, where $I_p$ is the $p \times p$ identity matrix. Hence $T_n^{-1} X_n = L_n = [l_n^{(1)}, l_n^{(2)}, \ldots, l_n^{(n)}] = ((l_{ij}(n)))$. Let $\Lambda_n = \lambda_{\min}(X_n X_n')$. Srivastava (1971) has shown that the basic result of Gleser (namely Theorem 4.8.1) holds under weaker conditions (i)-(iii) on $\Lambda_n$ and $L_n$, stated in terms of $n^* = [n(1+c)] + 1$, where $[\cdot]$ denotes here the largest integer in $(\cdot)$ and $\|B\| = [\lambda_{\max}(BB')]^{1/2}$; for the precise statements see Srivastava (1971).

Example 4.8.2. The design matrix $X_n$ of Example 4.8.1 satisfies the above conditions (i)-(iii). Srivastava (1971) gives the following sequential rule in order to construct confidence regions whose limiting coverage probability is $1 - \alpha$ and whose maximum diameter is almost $2d$.
Procedure. Start by taking $n_0 \ge p$ observations $y_1, y_2, \ldots, y_{n_0}$. Then sample one observation at a time and stop when
$$N = \text{smallest } k \ge n_0 \text{ such that } [S^2(k) + k^{-1}] \le d^2 \Lambda_k / c_k,$$
where $\Lambda_n$ is the smallest characteristic root of $(X_n X_n')$, $P(\chi_p^2 \le u) = 1 - \alpha$, and $c_k \to u$. When sampling is stopped at $N = n$, construct the region $W_n$ defined in (4.8.3).
4.9 Confidence Intervals for P(X < Y)

In reliability problems the parameter $p$, which is the probability that one random variable is stochastically larger than the other, is of much interest. A sequential confidence interval for $p = P(X < Y)$ has been considered by Govindarajulu (1974). Let $(X, Y)$ have a bivariate normal distribution with an unknown mean vector and an unknown covariance matrix. Assume that we observe pairs of observations $(X_i, Y_i)$, $i = 1, 2, \ldots$. Let $D_i = Y_i - X_i$, $\bar{D}_n = \bar{Y}_n - \bar{X}_n$, and $s_{D,n}^2 = (n-1)^{-1} \sum_{i=1}^n (D_i - \bar{D}_n)^2$, where $\bar{Y}_n$ and $\bar{X}_n$ denote means of samples of size $n$. Then, it is well known that $p = \Phi(\mu_D/\sigma_D)$, where $\mu_D = E(Y - X)$ and $\sigma_D^2$ denotes the variance of $(Y - X)$. Whatever be the covariance structure of $X$ and $Y$, a reasonable estimate of $p$ is $\hat{p}_n = \Phi(\bar{D}_n / s_{D,n})$. Then, we have the following result. For the sake of simplicity, we shall suppress all the subscripts in $\mu_D$, $\sigma_D$, $s_{D,n}$ and $\hat{p}_n$.

Theorem 4.9.1. Let $a^2 = 1 + \mu^2/(2\sigma^2)$ and $\hat{a}^2 = 1 + \bar{D}_n^2/(2 s_{D,n}^2)$. Then
$$\frac{\sqrt{n} (\hat{p} - p)}{\hat{a}\, \phi(\bar{D}/s)} \to \text{normal}(0, 1) \quad \text{in distribution as } n \to \infty,$$
where the subscript $n$ in $\bar{D}_n$ is suppressed for the sake of simplicity.
Proof. Let $\Phi^* = \Phi(\bar{D}/\sigma)$. First we will show that
$$\sqrt{n}(\Phi^* - p) \to \text{normal}\left(0, \phi^2(\mu/\sigma)\right) \quad \text{in distribution as } n \to \infty.$$
Toward this, write $\Phi^* - p = \sigma^{-1}(\bar{D} - \mu)\, \phi(t/\sigma)$, where $t$ lies between $\bar{D}$ and $\mu$. However, since $\bar{D}$ converges to $\mu$ in probability (in fact a.s.), $\phi(t/\sigma)$ converges to $\phi(\mu/\sigma)$ in probability. Thus $\sqrt{n}(\Phi^* - p)/\phi(\mu/\sigma)$ is asymptotically equivalent in distribution to $\sqrt{n}(\bar{D} - \mu)/\sigma$. Now write
$$\hat{p} - \Phi^* = \Phi(\bar{D}/s) - \Phi(\bar{D}/\sigma) = \bar{D}\,(s^{-1} - \sigma^{-1})\, \phi(w),$$
where $w$ lies between $\bar{D}/s$ and $\bar{D}/\sigma$. Since $\bar{D}$ converges to $\mu$ in probability, $\phi(w)$ converges to $\phi(\mu/\sigma)$ in probability.
Now, using the independence of $\bar{D}$ and $s$, we have
$$\frac{\sqrt{n}(\hat{p} - p)}{a\, \phi(\mu/\sigma)} \to \text{normal}\left(0,\; a^{-2} + \mu^2 (2\sigma^2)^{-1} a^{-2}\right) = \text{normal}(0, 1) \quad \text{in distribution}.$$
The proof of the theorem is complete upon noting that $\hat{a}$ converges to $a$ and $\phi(\bar{D}/s)$ converges to $\phi(\mu/\sigma)$ in probability.

Suppose we wish to set up a confidence interval of prescribed coverage probability $1 - \alpha$ for the unknown parameter $p$. Define for $n \ge 2$, $I_n = (\hat{p}_n - d, \hat{p}_n + d)$ and assume that $d$ is small compared to $\phi(\mu/\sigma)$. Then, for
$$n = \text{smallest integer} \ge \frac{u^2 a^2 \phi^2(\mu/\sigma)}{d^2}, \tag{4.9.1}$$
the interval $I_n$ has coverage probability
$$P(p \in I_n) = P(|\hat{p}_n - p| \le d) \to 1 - \alpha$$
as $d \to 0$, after applying Theorem 4.9.1. Notice that $n \to \infty$ as $d \to 0$ and (4.9.1) implies that $\lim_{d \to 0} \left\{ d^2 n / [u^2 a^2 \phi^2(\mu/\sigma)] \right\} = 1$. However, since $\mu$ and $\sigma$, and consequently $\phi(\mu/\sigma)$, are unknown, no optimal fixed-sample size procedure exists. An inefficient fixed-sample size procedure can be obtained by replacing $\phi(\mu/\sigma)$ by $\phi(0) = (2\pi)^{-1/2}$ in (4.9.1), which was proposed by Govindarajulu (1967).
Sequential Procedure

Let $\{u_k\}$ be a sequence of constants tending to $u_\alpha$. Then stop at
$$N = \text{smallest } n \ge 2 \text{ for which } n \ge \frac{u_n^2\, \hat{a}_n^2\, \phi^2(\bar{D}_n / s_n)}{d^2}, \tag{4.9.2}$$
and then give the confidence interval $I_N = (\hat{p}_N - d, \hat{p}_N + d)$, where $\hat{p}_N = \Phi(\bar{D}_N / s_N)$. It is worthwhile to note that $N$ is a genuine stopping variable; that is, $P(N < \infty) = 1$, since
$$P(N = \infty) = \lim_{n \to \infty} P(N > n) = \lim_{n \to \infty} P\left( d^2 n < u_n^2\, \hat{a}_n^2\, \phi^2(\bar{D}_n / s_n) \right) = 0.$$
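The rule (4.9.2) can be sketched as follows. This is illustrative only: $u_n \equiv u$ is held constant at the normal fractile and the function names are ours.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def seq_p_less(draw_pair, d, u=1.96, n0=2):
    """Sketch of rule (4.9.2): stop at the first n >= n0 with
    n >= u^2 a_hat_n^2 phi^2(Dbar_n / s_n) / d^2, where
    a_hat_n^2 = 1 + Dbar_n^2 / (2 s_n^2), and report
    (p_hat_N - d, p_hat_N + d) with p_hat_N = Phi(Dbar_N / s_N)."""
    D = []
    while True:
        x, y = draw_pair()
        D.append(y - x)
        n = len(D)
        if n < n0:
            continue
        dbar = float(np.mean(D))
        s = float(np.std(D, ddof=1))
        if s == 0.0:
            continue
        a2 = 1.0 + dbar ** 2 / (2.0 * s ** 2)
        if n >= u * u * a2 * phi(dbar / s) ** 2 / (d * d):
            p_hat = Phi(dbar / s)
            return n, (p_hat - d, p_hat + d)

# X ~ normal(0,1), Y ~ normal(1,1) independent: p = Phi(1/sqrt(2)) = 0.760
N, (lo, hi) = seq_p_less(lambda: (rng.normal(0.0, 1.0), rng.normal(1.0, 1.0)), d=0.05)
print(N, lo, hi)
```

The stopping time should be close to the value $u^2 a^2 \phi^2(\mu/\sigma)/d^2$ from (4.9.1).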
Towards the properties of the above sequential procedure, we have the following theorem.

Theorem 4.9.2. Under the assumption that $\sigma^2 > 0$ (which implies that $\phi(\mu/\sigma) > 0$), we have
(i) $\lim_{d \to 0} d^2 N / [u^2 a^2 \phi^2(\mu/\sigma)] = 1$ a.s.,
(ii) $\lim_{d \to 0} P(p \in I_N) = 1 - \alpha$,
(iii) $E(N) < \infty$ for all $d > 0$, and
(iv) $\lim_{d \to 0} d^2 E(N) / [u^2 a^2 \phi^2(\mu/\sigma)] = 1$.

Proof. In Lemma 4.7.1 set $Y_n = \left[ \hat{a}_n \phi(\bar{D}_n/s_n) / \{a \phi(\mu/\sigma)\} \right]^2$, $f(n) = n u^2 / u_n^2$, and $t = u^2 a^2 \phi^2(\mu/\sigma) / d^2$. Since $\lim_{t \to \infty} f(N)/t = 1$ a.s. and $\sqrt{N} (\hat{p}_N - p) / \{a \phi(\mu/\sigma)\}$ behaves like a standard normal variable as $t \to \infty$, (i) and (ii) follow. If we define $N^* = $ smallest $n$ such that $n \ge u_n^2\, \hat{a}_n^2\, \phi^2(0) / d^2$, then $N^* \ge N$. Now, using Nadas' theorem (Theorem 4.7.2, for the stopping time $K$) for $N^*$, (iii) and (iv) will follow.
4.10 Nonparametric Confidence Intervals

4.10.1 Confidence Intervals for the p-point of a Distribution Function

Farrell (1966a, 1966b) has given two sequential procedures for setting up bounded width confidence intervals for the p-point of a distribution function that are based on the i.i.d. sequence of random variables $\{X_n, n \ge 1\}$. For further details, the reader is referred to Govindarajulu (1987, Section 5.11.1).

4.10.2 Confidence Intervals for Other Population Parameters

Geertsema (1970a) has applied the methods of Section 4.7 for constructing a sequential nonparametric confidence interval procedure for certain population parameters. Notice that the methods of Section 4.6 will not apply here since the functional form of the density is unknown.

A General Method for Constructing Bounded Length Confidence Intervals

Let $X_1, X_2, \ldots, X_n$ be a fixed random sample of size $n$ from a population having $F$ for its cumulative distribution function (cdf) and let $\theta$ be a parameter of the population. We are interested in constructing for $\theta$ a confidence interval of length
not larger than $2d$. For each positive integer $n$, consider two statistics $L_n$ and $U_n$ (not depending on $d$) based on the first $n$ observations, such that $L_n < U_n$ a.s. and $P(L_n \le \theta \le U_n) \to 1 - \alpha$ (so that, for $n$ large, $(L_n, U_n)$ is a confidence interval for $\theta$ with coverage probability approximately $1 - \alpha$). Define a stopping variable $N$ to be the first integer $n \ge n_0$ such that $U_n - L_n \le 2d$, where $n_0$ is a positive integer. Take as confidence interval $(L_N, U_N)$. Then one could ask: (i) What is the coverage probability of the procedure? (ii) What is the expected sample size? These questions can, under the following regularity assumptions, be answered asymptotically as $d \to 0$.

A1: $L_n < U_n$ a.s. ($L_n$ and $U_n$ are independent of $d$).

A2: $\sqrt{n}(U_n - L_n) \to 2 K_\alpha / A$ a.s. as $n \to \infty$, where $A > 0$ and $\Phi(K_\alpha) = 1 - \alpha/2$, $\Phi$ denoting the standard normal cdf.

A3: $\sqrt{n}(L_n - \theta) = Z_n / A - K_\alpha / A + o(1)$ a.s. as $n \to \infty$, where $Z_n$ is a standardized average of i.i.d. random variables having finite second moment.

A4: The set $\{N d^2\}_{d > 0}$ is uniformly integrable.

Then we have the following result.

Theorem 4.10.1 (Geertsema, 1970a). Under the assumptions (A1)-(A4),
(i) $N$ is well-defined, $E(N) < \infty$ for all $d > 0$, $N$ ($= N(d)$) is a function of $d$ which is nondecreasing as $d$ decreases, $\lim_{d \to 0} N = \infty$ a.s. and $\lim_{d \to 0} E(N) = \infty$;
(ii) $\lim_{d \to 0} P(L_N \le \theta \le U_N) = 1 - \alpha$; and
(iii) $\lim_{d \to 0} E(N d^2) = K_\alpha^2 / A^2$.
A Procedure Based on the Sign Test

Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables having a unique median $\gamma$. For testing the hypothesis $\gamma = 0$, one uses the sign test based on the statistic
$$S_n = \sum_{i=1}^n I(X_i > 0),$$
where $I(B)$ denotes the indicator function of the set $B$. In the case of a fixed sample of size $n$, a confidence interval for $\gamma$ can be derived from the sign test in
a standard way. This confidence interval is of the form $(X_{n, b(n)}, X_{n, a(n)})$, where $X_{n,1} \le X_{n,2} \le \cdots \le X_{n,n}$ are the ordered $X$'s and where $a(n)$ and $b(n)$ are integers depending on $n$. The limiting coverage probability as $n \to \infty$ of such a confidence interval is $1 - \alpha$ if
$$a(n) = \frac{n}{2} + \frac{K_\alpha \sqrt{n}}{2} + o(\sqrt{n}) \quad \text{and} \quad b(n) = \frac{n}{2} - \frac{K_\alpha \sqrt{n}}{2} + o(\sqrt{n}). \tag{4.10.1}$$
From this confidence interval one can thus obtain a sequential procedure as follows: let $N$ be the first integer $n > n_0$ for which $X_{n, a(n)} - X_{n, b(n)} \le 2d$ and choose as resulting confidence interval $(X_{N, b(N)}, X_{N, a(N)})$, where $\{a(n)\}$ and $\{b(n)\}$ are sequences of positive integers satisfying Assumption (A5), which is given below, and $n_0$ is some integer. This procedure is similar to Farrell's (1966b) procedure discussed in Govindarajulu (1987, Section 5.11). The following assumption will be needed.

A5: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with common cdf $F(x - \gamma)$ where $F(x)$ is symmetric about zero. $F$ has two derivatives in a neighborhood of zero and the second derivative is bounded in the neighborhood, so that $\gamma$ is the unique median of the $X$'s. The sequences $\{a_n\}$ and $\{b_n\}$ are defined by $b_n = \max\left[1, \left\{ (n/2) - K_\alpha n^{1/2}/2 \right\}\right]$, $a_n = n - b_n + 1$, where $[x]$ denotes the largest integer contained in $x$.

We shall now show that the above procedure satisfies (A1)-(A4). Without loss of generality we can assume that $\gamma = 0$. Strong use is made of the following representation of sample quantiles by Bahadur (1966). Under Assumption (A5),
$$X_{n, k(n)} = \xi + \frac{k(n)/n - F_n(\xi)}{f(\xi)} + O(n^{-3/4} \log n) \quad \text{a.s.}, \tag{4.10.2}$$
where $\{k(n)\}$ is a sequence of positive integers satisfying $k(n) = np + O(\sqrt{n} \ln n)$, $0 < p < 1$, $F(\xi) = p$, $F'(\xi) = f(\xi)$, and $F_n$ is the empirical cdf of the $X$'s. Then, we have the following lemma.
Lemma 4.10.1 (Geertsema, 1970a). We have
(i) $\sqrt{n}\left( X_{n, a(n)} - X_{n, b(n)} \right) \to K_\alpha / f(0)$ a.s. as $n \to \infty$;
(ii) $\sqrt{n}\left( X_{n, b(n)} - \gamma \right) = Z_n / 2f(0) - K_\alpha / 2f(0) + o(1)$ a.s., where $Z_n = 2\sqrt{n}\left[ \tfrac{1}{2} - F_n(0) \right]$ is a standardized average of i.i.d. random variables;
(iii) The set $\{N d^2\}_{d > 0}$ is uniformly integrable.
Proof. Both (i) and (ii) follow readily from the Bahadur representation (4.10.2). For the proof of (iii) see Geertsema (1970a); it consists of using a result of Yahav and Bickel (1968) and Hoeffding (1963, Theorem 1). The following theorem is a direct consequence of Theorem 4.10.1 and Lemma 4.10.1.
Theorem 4.10.2 (Geertsema, 1970a). The confidence interval procedure based on the sign test has asymptotic coverage probability $1 - \alpha$ as $d \to 0$. The stopping variable $N$ satisfies $\lim_{d \to 0} E(N d^2) = K_\alpha^2 / 4 f^2(0)$.

Remark 4.10.1. Unlike in Theorem 4.7.1, no assumption of the finiteness of the second moment of $F$ is made in Theorems 4.10.1 and 4.10.2.
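Geertsema's sign-test procedure is straightforward to implement from the order statistics. In this sketch (function names ours), $b(n)$ and $a(n)$ are taken as in Assumption (A5):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def sign_test_ci(draw, d, K=1.96, n0=20):
    """Sketch of Geertsema's sign-test procedure: with (A5)'s
    b(n) = max(1, floor(n/2 - K sqrt(n)/2)) and a(n) = n - b(n) + 1,
    stop at the first n >= n0 with X_(a(n)) - X_(b(n)) <= 2d."""
    xs = [draw() for _ in range(n0 - 1)]
    while True:
        xs.append(draw())
        n = len(xs)
        b = max(1, math.floor(n / 2.0 - K * math.sqrt(n) / 2.0))
        a = n - b + 1
        srt = np.sort(xs)
        lo, hi = float(srt[b - 1]), float(srt[a - 1])  # 1-based order statistics
        if hi - lo <= 2.0 * d:
            return n, (lo, hi)

# Double exponential data: f(0) = 1/2, so E(N d^2) -> K^2/(4 f(0)^2) = K^2.
N, (lo, hi) = sign_test_ci(lambda: rng.laplace(0.0, 1.0), d=0.2)
print(N, lo, hi)   # N is typically of the order K^2/d^2, about 96 here
```

No moment assumptions on the data are needed, in line with Remark 4.10.1.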
A Procedure Based on the Wilcoxon One-Sample Test

The Wilcoxon one-sample test procedure is based on the statistic
$$T_n = \sum_{1 \le i \le j \le n} I\left( (X_i + X_j)/2 > 0 \right),$$
where $X_1, X_2, \ldots, X_n$ is a random sample from a distribution symmetric about $\gamma$. The test procedure is used to test the hypothesis $\gamma = 0$ against shift alternatives. A confidence interval for $\gamma$ based on a fixed sample of size $n$ is of the form $(Z_{n, b(n)}, Z_{n, a(n)})$, where $Z_{n,1} \le Z_{n,2} \le \cdots \le Z_{n, n(n+1)/2}$ are the ordered averages $(X_i + X_j)/2$, for $i, j = 1, 2, \ldots, n$ and $i \le j$. The limiting coverage probability of such an interval is $1 - \alpha$ if
$$a(n) = \frac{n(n+1)}{4} + K_\alpha \left[ \frac{n(n+1)(2n+1)}{24} \right]^{1/2} + o(n^{3/2}) \quad \text{and} \quad b(n) = \frac{n(n+1)}{4} - K_\alpha \left[ \frac{n(n+1)(2n+1)}{24} \right]^{1/2} + o(n^{3/2}). \tag{4.10.3}$$

The Sequential Procedure

Let $N$ be the first integer $n \ge n_0$ for which $Z_{n, a(n)} - Z_{n, b(n)} \le 2d$ and choose as resulting confidence interval $(Z_{N, b(N)}, Z_{N, a(N)})$, where $\{a(n)\}$ and $\{b(n)\}$ are sequences of positive integers satisfying (4.10.3) and $n_0$ is some positive integer.
The asymptotic analysis of this procedure is somewhat complicated because it is based on ordered dependent random variables, namely the $Z_{n,k}$, $k = 1, 2, \ldots, n(n+1)/2$. Fortunately the theory of U-statistics [see Hoeffding, 1948, 1963] can be applied. The statistic
$$U_n = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I\left( (X_i + X_j)/2 > 0 \right) \tag{4.10.4}$$
is a one-sample U-statistic and the test based on it is asymptotically equivalent to the Wilcoxon one-sample test.

The Modified Sequential Procedure

Let $N$ be the first integer $n \ge n_0$ for which $W_{n, a(n)} - W_{n, b(n)} \le 2d$ and choose as resulting confidence interval $(W_{N, b(N)}, W_{N, a(N)})$. Here $\{a(n)\}$ and $\{b(n)\}$ are sequences of positive integers satisfying (4.10.3), $n_0$ is some positive integer, and $W_{n,1} \le W_{n,2} \le \cdots \le W_{n, n(n-1)/2}$ are the ordered averages $(X_i + X_j)/2$ for $i < j$ and $i, j = 1, 2, \ldots, n$. So, let us confine ourselves to the modified sequential procedure. We need the following assumption.
A6: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables having common cdf $F(x - \gamma)$ where $F$ is symmetric about zero. $F$ has density $f$ which satisfies $\int f^2(x)\, dx < \infty$. $G(x - \gamma)$ denotes the cdf of $(X_1 + X_2)/2$ and $G$ has a second derivative in some neighborhood of zero with $G''$ bounded in the neighborhood. $G'$ is denoted by $g$ when it exists. $\{a(n)\}$ and $\{b(n)\}$ are sequences of positive integers satisfying (4.10.3), truncated so that $1 \le b(n) < a(n) \le n(n-1)/2$.

The following facts can easily be established:
(i) The assumptions on $F$ guarantee the existence of a derivative for $G$.
(ii) If $f$ has a Radon-Nikodym derivative $f'$ satisfying $\int |f'| < \infty$ and $\int (f')^2 < \infty$, then the assumptions on $G$ are satisfied.
(iii) Assumption (A6) implies that $G'(0) > 0$, since $G'(0) = 2 \int f^2(x)\, dx$.
Without loss of generality we can set $\gamma = 0$. We also have the following result.
Theorem 4.10.3 (Geertsema, 1970a). Both the confidence interval procedures based on the Wilcoxon one-sample test procedure have asymptotic coverage probability $1 - \alpha$ as $d \to 0$. The stopping variable $N$ satisfies
$$\lim_{d \to 0} E(N d^2) = \frac{K_\alpha^2}{12} \left[ \int f^2(x)\, dx \right]^{-2}.$$
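The modified procedure can be sketched with the Walsh averages computed directly, an $O(n^2)$ approach that is fine for moderate $n$. The truncation of $a(n)$ and $b(n)$ to the admissible range and the function names are our illustrative choices.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def walsh_ci(draw, d, K=1.96, n0=15):
    """Sketch of the modified procedure: order the n(n-1)/2 Walsh
    averages (X_i + X_j)/2, i < j, take a(n), b(n) as in (4.10.3)
    (normal approximation), and stop when the interval is <= 2d wide."""
    xs = [draw() for _ in range(n0 - 1)]
    while True:
        xs.append(draw())
        n = len(xs)
        x = np.asarray(xs)
        w = np.sort(((x[:, None] + x[None, :]) / 2.0)[np.triu_indices(n, k=1)])
        mu = n * (n + 1) / 4.0
        sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
        b = max(1, int(mu - K * sd))
        a = min(len(w), int(mu + K * sd))
        lo, hi = float(w[b - 1]), float(w[a - 1])
        if hi - lo <= 2.0 * d:
            return n, (lo, hi)

# Standard normal data: int f^2 = 1/(2 sqrt(pi)), so by Theorem 4.10.3
# E(N d^2) -> K^2/(12 (int f^2)^2) = K^2 pi / 3.
N, (lo, hi) = walsh_ci(lambda: rng.normal(0.0, 1.0), d=0.25)
print(N, lo, hi)
```

The returned interval is the U-statistic (Hodges-Lehmann-type) interval for the centre of symmetry.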
Asymptotic Efficiencies of the Procedures

Consider two bounded length confidence interval procedures $T$ and $S$ for estimating the mean of a symmetric population by means of an interval of prescribed length $2d$. Denote by $N_T$ and $N_S$ the stopping variables and by $p_T$ and $p_S$ the coverage probabilities associated with the procedures $T$ and $S$ respectively.

Definition 4.10.1. The asymptotic efficiency as $d \to 0$ of procedure $T$ relative to $S$ is $e(T, S) = \lim_{d \to 0} E(N_S) / E(N_T)$, provided $\lim_{d \to 0} p_T = \lim_{d \to 0} p_S$ and that all the limits exist.

Denote by $M$ the procedure of Chow and Robbins (1965) [see Equation (4.7.8)] and by $S$ and $W$ the procedures based on the sign test and the Wilcoxon test respectively. Then it follows from Theorems 4.7.1, 4.10.2 and 4.10.3 (under Assumptions (A5) and (A6), and $\sigma^2 < \infty$) that
$$e(S, M) = 4 \sigma^2 f^2(0), \qquad e(W, M) = 12 \sigma^2 \left[ \int f^2(x)\, dx \right]^2. \tag{4.10.5}$$
If one regards the procedures $M$, $S$ and $W$ as based on the t-test, sign test and the Wilcoxon one-sample test respectively, one sees that the above efficiencies are the same as the Pitman efficiencies of the respective (fixed-sample size) tests relative to each other.

Monte Carlo Studies

Geertsema (1970b) has performed a series of Monte Carlo studies for different values of $d$ and for a few symmetric populations (normal, contaminated normal, uniform, and double exponential) to compare the behavior of the procedures with the asymptotic results. He surmises that the actual coverage probability is quite close to the asymptotic coverage probability, and that the coverage of the two procedures based on rank tests is higher than that of the procedure based on the t-test. The results also suggest that $E(N) \le K_\alpha^2 / \left\{ 3 [G(d) - 1/2]^2 \right\} + C$ in the case of the $W$ procedure, where $C$ is a constant. The results illustrate the upper bound $E(N) \le K_\alpha^2 \sigma^2 / d^2 + n_0$ for the procedure based on the t-test [see Simons, 1968].
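The efficiencies in (4.10.5) are easy to tabulate for standard symmetric laws. The sketch below uses the classical Pitman-efficiency expressions $e(S, M) = 4\sigma^2 f^2(0)$ and $e(W, M) = 12\sigma^2 (\int f^2)^2$; the dictionary of distributional constants is our own tabulation.

```python
import math

# (sigma^2, f at the centre, integral of f^2) for some symmetric laws:
laws = {
    "normal(0,1)":   (1.0, 1.0 / math.sqrt(2.0 * math.pi), 1.0 / (2.0 * math.sqrt(math.pi))),
    "laplace(0,1)":  (2.0, 0.5, 0.25),
    "uniform(-1,1)": (1.0 / 3.0, 0.5, 0.5),
}
for name, (var, f0, intf2) in laws.items():
    e_sign = 4.0 * var * f0 ** 2            # e(S, M) = 4 sigma^2 f(0)^2
    e_wilcoxon = 12.0 * var * intf2 ** 2    # e(W, M) = 12 sigma^2 (int f^2)^2
    print(f"{name}: e(S,M) = {e_sign:.3f}, e(W,M) = {e_wilcoxon:.3f}")
```

For the normal this reproduces the familiar values $2/\pi \approx 0.637$ and $3/\pi \approx 0.955$, while for the double exponential the sign-test procedure is twice as efficient as the means procedure.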
4.10.3 Fixed-width Confidence Intervals for P(X < Y)

Case 1. Let $X$ and $Y$ be independent random variables with continuous cdfs $F$ and $G$ respectively, let $p = P(X < Y)$, and let $\hat{p}_n = \int F_n\, dG_n$ denote the estimate of $p$ obtained from the empirical cdfs $F_n$ and $G_n$ of samples of size $n$ on $X$ and $Y$; let $\sigma^2$ denote the asymptotic variance of $\sqrt{n}(\hat{p}_n - p)$, given by (4.10.6). If $\sigma^2$ is known, and if $d$ is small compared to $\sigma^2$, the fixed-sample size procedure is given as follows. For any $n \ge 1$ define $I_n = (\hat{p}_n - d, \hat{p}_n + d)$ and let $K_\alpha$ denote the $(1 - \alpha/2)$ fractile of the standard normal distribution. Then, for a sample of size $n$ determined by
$$n = \text{smallest integer} \ge \frac{K_\alpha^2 \sigma^2}{d^2}, \tag{4.10.7}$$
the interval $I_n$ has coverage probability approaching $1 - \alpha$, and (4.10.7) implies that $\lim_{d \to 0} (d^2 n)/(K_\alpha^2 \sigma^2) = 1$. It follows from the asymptotic normality of $\hat{p}_n$ [see Theorem 2.1 of Govindarajulu (1968b)], which could easily be extended to the situation where $F$ and $G$ are purely discrete since $F$ and $G$ can be made continuous by the continuization process described by Govindarajulu, LeCam and Raghavachari (1967), that $\lim_{d \to 0} P(p \in I_n) = 1 - \alpha$. However, in real situations $F$ and $G$, and consequently $\sigma^2$, are unknown, so that no fixed-sample size method is available.
Sequential Fixed-width Procedure

Let $V_n$ denote the estimate of $\sigma^2$ obtained by substituting the empirical cdfs $F_n$ and $G_n$ for $F$ and $G$ in (4.10.6), (4.10.8)
and let $\{K_n\}$ be any sequence of positive constants tending to $K_\alpha$ as $n \to \infty$. Then define
$$N = \text{smallest } n \ge 1 \text{ such that } V_n \le \frac{d^2 n}{K_n^2}. \tag{4.10.9}$$
Thus, we have the following theorem establishing the asymptotic properties of the above procedure.

Theorem 4.10.4. If $0 < \sigma^2 < \infty$, we have
(i) $\lim_{d \to 0} (d^2 N)/(K_\alpha^2 \sigma^2) = 1$ a.s. (asymptotic optimality), (4.10.10)
(ii) $\lim_{d \to 0} P(p \in I_N) = 1 - \alpha$ (asymptotic consistency), (4.10.11)
(iii) $\lim_{d \to 0} (d^2 E(N))/(K_\alpha^2 \sigma^2) = 1$ (asymptotic efficiency). (4.10.12)

Proof. In Lemma 4.7.1 set $Y_n = V_n/\sigma^2$, $f(n) = n K_\alpha^2 / K_n^2$ and $t = K_\alpha^2 \sigma^2 / d^2$. Then (4.10.9) can be rewritten as $N = N(t) = $ smallest $k \ge 1$ such that $Y_k \le f(k)/t$. By Lemma 4.7.1,
$$\lim_{d \to 0} \frac{d^2 N}{K_\alpha^2 \sigma^2} = \lim_{t \to \infty} \frac{f(N)}{t} = 1 \quad \text{a.s.}, \tag{4.10.13}$$
which proves (4.10.10). Now consider $P(p \in I_N) = P(\sqrt{N} |\hat{p}_N - p| \le d \sqrt{N})$. By (4.10.13), $d\sqrt{N}/\sigma \to K_\alpha$ and $N/t \to 1$ in probability as $t \to \infty$. Writing $(\hat{p}_n - p) = \int (F_n - F)\, dG - \int (G_n - G)\, dF + R_n$, where $R_n = \int (F_n - F)\, d(G_n - G)$, and using Kolmogorov's inequality for reverse martingales, it can be shown that $\sqrt{N} R_N = o_p(1)$.⁴ Hence, from Theorem 4.5.4 we infer that as $t \to \infty$, $\sqrt{N}(\hat{p}_N - p)/\sigma$ behaves like a standard normal variable. Hence $\lim_{t \to \infty} P(p \in I_N) = 1 - \alpha$,

⁴I thank Dr. Paul Janssen of the Limburgs University Centre for pointing this out to me.
which establishes (4.10.11). Now the property of asymptotic efficiency (4.10.12) follows immediately from Lemma 4.7.2 whenever the distributions of the $X_i$ and $Y_i$ are such that $E(\sup_n Y_n) < \infty$. However, this is trivially true since $V_n \le 2$ and $\sigma^2 > 0$. This completes the proof of Theorem 4.10.4.

It also follows from Theorem 4.10.2 that $E(N) - K_\alpha^2 \sigma^2 / d^2 = O(1)$.

Case 2. Let $X$ and $Y$ have an unknown bivariate distribution. Let $Z = X - Y$ and let $H(z)$ denote the distribution of $Z$. Then $Z_i = X_i - Y_i$ ($i = 1, 2, \ldots, n$) constitute a random sample of size $n$ from $H$. Also, let $H_n(z)$ denote the empirical distribution function based on $Z_1, Z_2, \ldots, Z_n$. Let $p = H(0)$ and $\hat{p}_n = H_n(0)$. Then, it is well known from the asymptotic normality of the binomial variable and Cramer's (1946, p. 254) theorem that
$$\sqrt{n}(\hat{p}_n - p) \to \text{normal}(0, p(1-p)) \quad \text{in distribution}.$$
Then, a fixed-width confidence interval procedure is as follows. The stopping variable $N$ is given by
$$N = \text{smallest integer } n \ge 2 \text{ such that } H_n(0)\left[ 1 - H_n(0) \right] \le \frac{d^2 n}{K_n^2}, \tag{4.10.14}$$
and we give the confidence interval $I_N = (H_N(0) - d, H_N(0) + d)$. Then the asymptotic properties of optimality, consistency and efficiency also hold for the above sequential procedure.
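Rule (4.10.14) reduces to a binomial-type computation on the differences $Z_i$. A minimal sketch follows; $K_n \equiv K$ is held constant, and the start-up size $n_0$ (guarding against degenerate early stopping when all early $Z$'s share a sign) is our illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def seq_p_case2(draw_pair, d, K=1.96, n0=30):
    """Sketch of rule (4.10.14): with Z_i = X_i - Y_i and H_n the
    empirical cdf of the Z's, stop at the first n >= n0 with
    H_n(0)(1 - H_n(0)) <= d^2 n / K^2, and report
    (H_N(0) - d, H_N(0) + d) for p = H(0)."""
    z = []
    while True:
        x, y = draw_pair()
        z.append(x - y)
        n = len(z)
        if n < n0:
            continue
        h0 = float(np.mean(np.asarray(z) <= 0.0))       # H_n(0)
        if h0 * (1.0 - h0) <= d * d * n / (K * K):
            return n, (h0 - d, h0 + d)

# Dependent pair: Y = 0.5 X + normal(0.5, 1), so X and Y are correlated.
def pair():
    x = rng.normal(0.0, 1.0)
    return x, 0.5 * x + rng.normal(0.5, 1.0)

N, (lo, hi) = seq_p_case2(pair, d=0.05)
print(N, lo, hi)   # E(N) is near K^2 p(1-p)/d^2
```

Unlike Case 1, no independence of $X$ and $Y$ is required here; only the differences enter the rule.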
Asymptotic Relative Efficiencies (ARE)

We would like to compare the nonparametric procedures for estimating $P(X < Y)$ with the parametric competitors discussed in Section 4.9, assuming that they have the same prescribed width and coverage probability. Thus the asymptotic efficiency of the procedure given by (4.10.9) relative to (4.9.2) (given by the ratio of the reciprocals of expected sample sizes) is
$$e_1 = \frac{a^2 \phi^2(\mu/\sigma)}{\sigma^{*2}}, \tag{4.10.15}$$
where $a^2 = 1 + (\mu^2 / 2\sigma^2)$, $\mu = E(Y - X)$, $\sigma^2 = \mathrm{var}(Y - X) = \mathrm{var}(X) + \mathrm{var}(Y)$, and $\sigma^{*2}$ is given by (4.10.6). The asymptotic efficiency of the procedure in (4.10.14) relative to (4.9.2) is
$$e_2 = \frac{a^2 \phi^2(\mu/\sigma)}{p(1-p)}.$$
Weiss and Wolfowitz (1972) give an optimal fixed-length confidence interval for the location parameter of an unknown distribution. They also compare its performance with the means procedure of Chow and Robbins (1965) and the scores procedure of Sen and Ghosh (1971). Sproule (1969) has extended the results of Chow and Robbins (1965) to the class of U-statistics.
4.11 Problems
4.2-1 In the binomial problem, assume that at least two observations are taken. Write $O^{**}$ for the point $(2, 1)$. Estimate $\theta(1 - \theta)$ unbiasedly. [Hint: we have $P(O^{**}) = 2\theta(1 - \theta)$; let $Y = \frac{1}{2} P(O^{**} \mid (N, T')) = T^{**}(n, t') / 2T(n, t')$, where $T^{**}(n, t)$ denotes the number of paths from $O^{**}$ to $(n, t)$. Then $Y$ is an unbiased estimate of $\theta(1 - \theta)$.]

4.2-2 In the binomial case, check whether removal of (i) $O^{**}$, (ii) $(2, 0)$ destroys closure. Find all unbiased estimates of $\theta$ which depend on $(N, T')$ only. Among them, there is only one bounded estimate.

4.2-3 Verify that the sequence of sufficient statistics for the exponential family of densities is transitive.

4.2-4 If $X$ is normal$(\theta, \sigma^2)$, show that $\{\bar{X}_n, S_n\}$ is transitive.
4.3-1 Show that the exponential family of densities satisfy the regularity cone). ditions of Theorem 4.3.2 on f (z; 4.4-1 Is it possible to have a two-stage procedure for estimating the parameter
of the exponential density where
0 ) } , z 2 8, (ii) f (z; 8) = 8-1 exp ( - x p ) , z > o ? (i) f (z;8) = exp {-
(Z -
(Two-stage procedure for the binomial parameter). Let X I ,X 2 , ... be i.i.d. Bernoulli variables with P ( X I = 1) = p and P ( X I = 0) = 1 - p for some 0 < p < 1. Let T be an unbiased estimator for p such that war ( T )5 B for all p and specified B. For each integer m, let T, = Xi/m
4.4-2
cE1
220
CHAPTER 4. SEQUENTIAL ESTIMATION
+
and c = ( 4 m ) - l . Consider the estimator 5!?m where Pm = (B/C)Tm ( 1 - I3C-l) Tmlfiwhere f i m = ( 1 - BC-l) Nm and Nm satisfies
B*
Show that
?m
B B*+C-C’
and --- B*
is an unbiased estimator of p with war Tm 5 B for all 0.
)
[Hint: See Samuel (1966, pp. 222-223) and note that the expected saving in observations is (BIG‘)E (Nm) when compared with the estimator T m l ~ which is based on the second sample above, which is also unbiased and whose variance is bounded by B.] 4.4-3 (Two-stage procedure for the Poisson parameter). Let
P ( X = x) = exp (-0) 0 ” / x ! , x = O , I , 2 , ... (0 > 0 ) . Let T be an unbiased estimator for 0 such that war ( T )5 B for all 0 where B is a specific bound. Let Sm = X I + X 2 - - - X m , Nm = ( 1 Sm)/mB. Define g ( S m ) = 1 if Sm = 0, and = Sm otherwise. Take additional Nm - m observations and consider
+ +
(i) Show that Ee(Z) = 0 and war (Zm) = B where h = me, m = (pB)-1/2. ( p 2 1 ) .
+ B [h(A) - (p
+
-
1 ) e-’]
/p
(ii) Assuming that supx h (A) = 0.1249, make the necessary modifications in the choices of m, Nm and Zm so that the estimator has the desired properties. Also compare the expected sample sizes of this estimator and the estimator of Birnbaum and Healy (1960). [Hint: See Samuel (1966, pp. 225-226) .] 4.4-4 (Two-stage procedure for estimating the variance of a normal distribu-
tion). Let X I ,X 2 , ... denote a sequence of i.i.d. normal variables distributed normal ( p , a 2 ) ,where p and u2 are unknown. Set up a two-stage procedure for given d and a where 2d denotes the fixed width and 1 - a denotes the confidence coefficient of the interval estimate for 02. [Hint: Let s;,, denote the unbiased version of the sample variance based on a preliminary sample of size no taken from the normal density. It is desired to determine n, on the basis of the preliminary sample such that
P(lSi+, -a21 < d ) > 1 - a
221
4.11. PROBLEMS
where
and X I ,X2, ...,Xn+l is an additional random sample of size n above probability statement is equivalent to
+ 1.
The
where E is expectation with respect to n, a = d / a 2 , V = s i + , / a 2 and f1 (-In)is the density of a chi-square variable divided by n, the degrees of freedom. Connell and Graybill (1964) have shown that
+
Hence, if a were known we set n = 1 7r (In a ) 2/ a 2 . Since a is unknown, let + 7r (In a)’ k2s;,/d2 where k is some constant independent of a such
n=1
> 1- a which determines the value of k, given by
that
(no - 1) [ ( ~ / a ) ~ / ( n o- ~11 )
k=
2 In (1/a)
For further details, see Graybill and Connell (1964a) or Govindarajulu (1987, pp. 375-377).] 4.4-5 (Two-stage procedure for estimating the parameter of a uniform density). Let f (u)= l/8 for 0 < u < 8. Then determine the sample size n, based on a preliminary sample of size no (specified) from f (u) such that
where d and a are specified and
8,
is an estimator of 8.
[Hint: The maximum likelihood estimator of 8 is Y(n),the largest observation in the sample. Let d/B = a. Then P((Yin)-BI
= P
(
l-a<-
e
= E [l- (1 - u ) ~ ] = l - E ( l - a ) n > 1 - a.
222
CHAPTER 4. SEQUENTIAL ESTIMATION If 8 were known, we let n' = Ilnal /In (1- a), 0 < a < 1. Since B is unknown, we replace a with the estimator, namely u = bd/Y(,), and determine b by the inequality
E
[(
1--
; : )n ]
> - a-
For further details, see Graybill and Connell (196413) or Govindarajulu (1987, p. 378).] Using double sampling scheme, estimate the Poisson mean with given fractional standard error u112 and also set up a (1 - 20)100% confidence interval for 0.
4.4-6
4.4-7 Estimate the exponential mean 0 by double sampling method with given fractional standard error u112 and also set up a (1 - 2a)100% confidence
interval for 8. 4.4-8 Estimate the binomial proportion 8 with given fractional standard error u1/2 and also set up a (1- 2a)100% confidence interval for 8. 4.4-9 Let 6' denote the difference between the means of two normal populations
and o2 be twice the individual population variances, which are assumed to be equal. Devise a double sampling procedure for estimating 0 with specified standard error all2. [Hint: Proceed as in Example 4.4.4.1 4.5-1 Let X I ,X2, ... be a sequence of independent Poisson random variables
having mean 8. Estimate 8 (via large-sample theory) with given small standard error a. 4.5-2 Let X I ,Xp, ... be a sequence of independent random variables having the density exp {- (X - 8 ) ) for x 2: 8. Estimate 8 with given small standard
error a. 4.5-3 Proceed as in 4.5-2 if 8 is the parameter in the Bernoulli distribution. 4.5-4 Let { X n } be a sequence of random variables such that X n -+ B a.s. and
let N denote a stopping variable such that N n / n --+ 1 in probability. Then Write P ( ~ X N - 01 > E ) = P(~XN - 81 > E , N = m ) , and partition the summation into the sets: (i) Im/n - 11 > S and (ii) Im/n - 11 5 S and use Lemma 4.11.1 in Govindarajulu (1987, p. 421).
XN + 8 in probability.
4.5-5 Let s i = n-l
c,"=1
Cy=l( X i - X n ) 2 , namely the sample variance.
result in Problem 4.5-4, show that s i --+ o2 in probability.
Using the
4.11. PROBLEMS
223
4.5-6 Let X n be uniformly continuous in probability and let X n -+ 9 a s . Let g be a function such that g' is continuous in the neighborhood of 9. Then
g ( X n ) is uniformly continuous.
[Hint: g ( X n )- g ( 8 ) = ( X n - 8) g' (X;), note that g' ( X n ) --+ g' (9) a s . and use Problem 4.5.4.1 4.5-7
Let
8,
denote the mle of 8 based on random sample of size n when
f (2;0) denotes probability (or density) function of the random variable X . Assuming the regularity assumptions (a)-(c) of Theorem 4.5.5 and
show that
[Hint: Starting from Equation (4.11.9) in Govindarajulu (1987, p. 427) one can obtain
and for some e > 0
Now, using Lohe's (1977 Vol. 1, pp. 254-255) lemma we have 1
n2
-'
Bn (9) = o ( I ) .
Consequently
n+
from which it follows that lemma, we have n' IBn (9)
(an
- 6 ) B; (9) = o (1)
8, - 6 = o
0.
+ I (0)l + 0 a.s. with
Again, using L o h e ' s (1977)
E
+ 6).
= S/ (1
Now, using these in Equation (i) we obtain the desired result.]
CHAPTER 4. SEQUENTIAL ESTIMATION
224
4.6-1 Set up a large-sample fixed-width confidence interval for 8 2 when 81 is [exp {- (X - 01) /82}], z 2 81. unknown where f (x;81,OZ) =8';
[Hint: Note that n1I2 [ X p )- 811 tends to zero in probability as n where X(1) = min ( X i ,X2, .,.,Xn).]
-+ oo
Assume that the underlying population has the distribution function F (z; el, e2) = 1-exp {(z - el) /e2}2] for x 2 el. Set up a large-sample fixed-width confidence interval for 82 when 81 is unknown.
4.6-2
[-
4.6-3 Let X be distributed as Poisson with parameter 8. Set up a large-sample
fixed-width confidence interval for 8. 4.6-4 Let ( X I ,X2) have the probability function:
+
where 81,& 2 0, 81 82 5 1 and x , y = 0, 1,...,n. Set up a large-sample fixed-width confidence interval for (i) 81 and (ii) 8 2 assuming that both 81 and 82 are unknown. 4.6-5 Let
Set up a fixed-width confidence interval for (i) 8 when cr is unknown and for (ii) u2 when 8 is unknown. [Hint: For alternative procedures, see Zacks (1966) who shows that the procedure for 8 which is based on the sample mean is inefficient when compared with the procedure based on the maximum likelihood estimator of 0.1 4.6-6 Let X be distributed as normal (p,cr2) where p and u2 as unknown. Set
up a large-sample fixed-width confidence interval for cr2. 4.6-7 Let X be distributed uniformly on (0,B).
Set up a large-sample fixed-
width confidence interval for 8. [Hint: Let Y(n)denote the maximum in a random sample of size n. Use 8, = (n 1)Y(n,/n as the unbiased estimate of 8 based on the maximum likelihood estimate.]
+
4.7-1 Let 89, 102, 108, 92, 98, 110, 88, 96, 94, 105, 107, 87, 112, 95, 99, ... constitute a sequence of independent observations from a normal population having an unknown mean p and variance u2. Estimate p by a confidence interval of given width 6 and confidence coefficient 0.90. Also find the expected sample size assuming that u2 = 10. [Hint: Use no = 2.1
4.11. PROBLEMS
225
4.7-2 For the above data estimate p with prescribed standard error a = 1.
4.9-1 Let X and Y have a bivariate normal distribution with unknown mean vector and variance-covariance matrix. Also, let ( X ; , Y , ) i = 1,2, ... denote a sequence of independent observations from this population. Set up a fixed-width confidence interval for p = P ( X < Y)and study its asymptotic properties when the width is small. [Hint: See Section 4.9.1
This page intentionally left blank
Chapter 5
Applications to Biostatistics In this chapter we will study some sequential procedures that are germane t o biostatistics
5.1
The Robbins-Monro Procedure
Let Y (x)denote the response t o a stimulus or dose level x and assume that Y (x) takes the value 0 or 1 with E [Y(x)]= P {Y(2) = 1) = M (x) where M (x) is unknown. We wish t o estimate 0 such M (0) = a where a is specified (0 < a < 1). Next the Robbins-Monro (1951) procedure is as follows: Guess an initial value x1 and let y, (x,) denote the response at x,. Then choose xn+l by the recursion formula:
where an, n = 1 , 2 , ... is a decreasing sequence of positive constants and an tends to 0 as n tends to infinity. If we stop after n iterations, xn+l will be the estimate of 8. Without loss of generality we can set a = 0. Then (5.1.1) becomes
A suitable choice for an is c/n where c is chosen optimally in some sense. Further it is not unreasonable to assume that M ( z ) > 0 for all x > 0. With an = c/n, Sacks (1958) has shown that (xn - 8) fiis approximately normal with mean 0 and variance 02c2/(2cal - 1) where a1 = M' (0) and o2 = war (Y (x)Ix) provided cal > 1/2. Robbins and Monro (1951) proved that xn converges to 8 in probability under general assumptions on the sequence {an} and on the distribution function H (ylx) = P {Y (x)5 y)x}. When an = c/n this result becomes Theorem 5.1.1 If (i) Y ( x )is a bounded random variable and (ii) for some - S for x < 8 and M ( x ) 2 a S for x > 8, then xn
+
6 > 0, M ( x ) 5 a
227
CHAPTER 5. APPLICATIONS T O BIOSTATISTICS
228
converges in quadratic mean and hence in probability to 0, i.e. limn+mbn = limn+w E (xn - 8)2 = 0 and 2, + 0 in probability. Blum (1954) established the strong convergence of xn to 8. Recall that the (xn - 0) is u2c2/ (2ca1 - 1) and this is minimized at asymptotic variance of c = l/cq, the minimum value being a2/ay. The choice of c was recommended by Hodges and Lehmann (1956) and it suffices to require that c > / (2a1)-'.
Jn
+
Remark 5.1.1 If we can assume that M(x) = a a1 (x - e), then one can derive explicit expressions for E ( X n + l ) and E (x:+~). See for instance Govindarajulu (2001, Section 7.2.3).
5.2
Parametric Estimation
Although the Robbins-Monro (R-M) procedure is nonparametric since it does not assume any form for M ( x ) or for H(ylx) = P ( Y 5 yJx), in several cases, especially in quantal response situations H (ylz) is known to be Bernoulli except for the value of a real parameter y. We can reparameterize it so that y will become the parameter to be estimated. Let E [Y(x)]= M, (x)and var [Y (x)] = V, (x). Since y determines the model, 8 is a function of a and y. Further, assume that there is a one-to-one correspondence between 8 and y so that there exists a function h such that y = ha (0). Then we may use xn as the R-M estimate of 8 and obtain the estimate of y as h, (xn). Now the problem is choosing a n in order to minimize the asymptotic variance of the estimate of y. Using the delta method one can easily show that f i [ha(xn) - y]is asymptotically normal with mean 0 and variance [h',(e)]2 u2 c2 / (2cal - 1). In our quantal response problem
Y (x)= 1 with probability M, (x) and
v,(4= M, (4 Then the R-M estimate
2,
- M,
(41 *
of 8 is such that
J n ( x n - e)
=
( rile!;)
normal 0,
-
Let
For given a the best value of c = [M; (0)I-l. With this c the asymptotic variance of Jnh, (xn) is [h',p)12a2(e) - a (1 - a )
@)I2
[M.
[q(0)12
5.2. PARAMETRIC ESTIMATION
229
since o2(8) = V, (8) = a (1- a ) and by differentiating the identity (8) = a with respect to 8 we obtain h; (0) = - M . (8) / M , (0). Now, the value of a which minimizes a (1- a ) / [M; (O)] will be independent of y provided that M; (0) factors into a function of 8 and a function of y [like M, (8) = r { s (7)) t ( O ) ] . For example, we can take P {Y (x)= 1) = M, (2) = F [x - y F-' (p)] for some 0 < ,O < 1, where F is a distribution function. Now, with that representation, y can be interpreted as the dose of x for which the probability of response is ,O; i.e., y = LDloop (lethal dose loop). Then the formula for the asymptotic variance takes the form of a (1- a)
+
+
since M, (8) = a implies F [8 - y F-' (p)] = a which in turns implies F-' ( a )= 8 - y F-l and M; (8) = -F' [O - y F-l (p)]. The asymptotic variance is independent of p since the problem is invariant under translation shifts. Now the value of a that minimizes the asymptotic variance is a = 1/2 when F is normal or logistic. [Note that the derivative is fe4{(l- 2a) f 2 (F-' ( a ) )- 2a (1- a ) x f' ( F - l ( Q . ) ) Lf = f (F-l(Q.>)l. If we want to estimate y = LDloop, then we do not need the parametric model since we can set a = p and y = 0 and thus estimate y directly from x, via the R-M method. The advantage of this method is that it assumes very little about the form of F , the disadvantage may be a significant loss of efficiency, especially when ,O is not close to 1/2.
+
(a)
+
Remark 5.2.1 Stopping rules for the R-M procedure are available in the literature. For survey of these, see Govindarajulu (1995).
Example 5.2.1 Suppose we wish to estimate the mean bacterial density y of a liquid by the dilution method. For a specified volume x of the liquid, let 1, if the number of bacteria is 0, otherwise.
21
Then
P { Y (x)= 1) = M, (x)= 1 - e-Y2 under the Poisson model for the number of bacteria in a volume x. Hence - (1 - a ) In (1 - a) M; (e) = ee-@ = Y since 1 - e--Ye = a. Consequently, the asymptotic variance becomes
CHAPTER 5. APPLICATIONS T O BIOSTATISTICS
230
and whatever be y, this is minimized by minimizing the first factor with respect to a. Hence the best a is the solution of the equation 2 a = -In (1 - a ) or a = 0.797. Thus, the recommend procedure is to carry out the R-M procedure with a = 0.797 and a, = 4.93/n.$, [since 1/c = a1 = M.4 (0) = y (1- a)]where .$, is our prior estimate of y. Our estimate of y after n steps is a -2-
-
%+1
1.594 zn+1
1 since y = ha (0) = -- In (1- a )
0
and the asymptotic variance is y2/4a (1 - a ) = 1 . 5 4 4 ~since ~ 2a = - In (1 - a ) .
5.3
Up and Down Rule
Dixon and Mood (1948) proposed an up and down method for estimating LD50 which is simpler than the R-M procedure. In the latter the dose levels are random and hence cannot be specified ahead of time. The Dixon-Mood method chooses a series of equally spaced dose levels xi. Let h denote the dose span. Typically the dose levels are logs of the true dose levels. The response Y (x)is observed only at these levels. The first observation is taken at the best initial guess of LD50.If the response is positive the next observation is made at the immediately preceding lower dose level. If the response is zero, the next trial is made at the immediately higher dose level. If positive response is coded as 1 and no response is coded as 0, then the data may look like the one in Figure 5.3.1
I
1
I
1
-3
-2
I
I
-1
I
I
0
1
I
I
I
I
I
2
3
Figure 5.3.1 Dose Level The data in Figure 5.3.1 can be explained as follows: We start at dose level 0 at which the response is 1; then we go to dose -1 and the response is 0; then we go to dose 0, and the response is 1; then go
5.4. SPEARMAN-KARBER (S-K) ESTIMATOR
231
to dose -1 at which the response 0; then go to dose 0, and the response is 1, then go to dose -1 at which the response is 0; then go to zero dose at which the response is 0, then go to dose 1 and suppose the response is 1; then go to dose 0, the response is 0; then take dose 1, the response is 0; then go to dose 2, the response is 1. The main advantage of this up and down method is that testing hovers around the mean. Also there is an increase in the accuracy of the estimate. The saving in the number of observations may be 30-40%. Further, the statistical analysis is fairly simple. One disadvantage may be that it requires each specimen be tested separately which may not be economical, especially in tests of insecticides. This method is not appropriate for estimating dose levels other than LD50. Also, the sample size is assumed to be large and one must have a rough idea of the standard deviation in advance. We set the dose level to be equal to the standard deviation. The up and down procedure stops when the nominal sample size is reached. The nominal sample size N* is a count of the number of trials, beginning with the first pair of responses that are unlike. For example, in the sequence of trial-result 000101, N* = 4. Dixon (1970, p. 253) provides some tables to facilitate the estimate of LD50 for 1 < N* 5 6 and is given by
where xt denotes the final dose level in an up and down sequence and k is read from Table 2 of Dixon (1970). For instance, if the response is 011010 and xt = 0.6 and h = 0.3, then N* = 6 and the estimate of LD50 is 0.6+0.831(0.3) = 0.85. For nominal sample sizes greater than 6, the estimate of LD50 is (Cxi hA*)/N* where the xi values are the dose levels among the N* nominal sample trials and A* is obtained from Table 3 of Dixon (1970, p. 254). Also note that A* depends on the number of initial-like responses and on the difference in the cumulative number of ones and zero values in the nominal sample of size N*.
+
5.4 Spearman-Karber (S-K) Estimator The Spearman-Karber estimator has several desirable merits (especially from the theoretical point of view). For extensive literature on the estimator the reader is referred to Govindarajulu (2001). Nanthakumar and Govindarajulu (N-G) (1994, 1999) derive the fixed-width and risk-efficient sequential rules for estimating the mean of the response function. Govindarajulu and Nanthakumar (2000) (G-N (2000)) have shown that the MLE of LD50 and the scale parameter in the logistic case are equivalent to the Spearmen-Karber type of estimators. They also derive simple expressions for the bias and the variance of the S-K estimator of LD50. Using these they obtain sequential rules that are simple to carry out. These will be presented below.
CHAPTER 5. APPLICATIONS T O BIOSTATISTICS
232
+
...,20,X I , ...)X k - 1 ,
x k denote the 2k 1 dose levels with X i = when xo is chosen at random between 0 and d. We subject n experimental units at each dose level and record the responses as 1or 0 according as the experimental unit responds to the dose or not. Let P j = P ( z j ) denote the probability of a positive response at xj = zo jh. By definition, p, the mean of the tolerance distribution is given by
Let
20
X - k , x-k+l,
+ ih, i = - k ,
..., 0, ..., k
+
(5.4.1) Then the S-K estimator is given by 0
k
(5.4.2) where p j = rj/n denotes the sample proportion of positive responses at x j . In particular, if P ( x ) = [1+ exp {- ( x - 0 ) /o}]-',then the S-K estimator of 8 is
Also the S-K type of estimator for
CJ
is given by
G-N (2000) have shown that the mles of 8 and CT coincide with (5.4.3) and (5.4.4). First let us give simple expressions for B = the bias in 8, and the variance of 8 k . G-N (2000) obtain (5.4.5) and
[
o? = ( h o ) 'I2 1 - exp 81,
n
{
- (kh
+h/2)}]
0
(5.4.6)
5.4. SPEARMAN-KARBER (S-K) ESTIMATOR
233
Fixed-Width Sequential Rule Let 2 0 be the specified width and y be the specified confidence coefficient. Then we wish to determine k such that
P
(IAe k - eI S D ) 2 7 -
(5.4.7)
it can be shown that (5.4.7) is implied
Using the asymptotic normality of by
(5.4.8) where z = [(1+ y) /2]. So using (5.4.5) and (5.4.6) in (5.4.8) when 8 and u are known, the optimal number of dose levels is 2k* 1 where
+
{
exp k*h
+
s>
2
128 - hl D
-
(z2hu/n)' I 2
- (z2hu/n)lI2
(5.4.9)
-
Since 8 and 0 are unknown, we obtain the following adaptive rule: Stop at dose level 2K 1 where
+
128 - hl - (z2h&/n)lI2
in the log is 5 1) where 2ko
(5.4.10)
+ 1 denotes the initial number of dose levels.
Example 5.4.1 Let h = 0.2, D = 0.62, y = 0.90, n = 3, 8 = 1.25 and u = 1. If we choose 20 = 0.05, the rule (5.4.9) yields k* = 11. If the data is
...
2-1
20
0000010000000020120
0
1
then, K = 14, with
814 =
z1 1
... 031111521102133333333
1.35.
Asymptotic Properties of the Sequential Rules
G-N (2000) obtained the following properties of the sequential rule (5.4.10). (i) The sequential procedure terminates finitely with probability one. (ii) E [K ( h )/ k * (h)]-+ 1 when h = ho/m where k* ( h ) is given by (5.4.9).
--+
0 as m
+ 00
for some ho
> 0,
CHAPTER 5. APPLICATIONS T O BIOSTATISTICS
234 (iii)
(IsK(h) -
el 5 D ) / P
(Ii&*(h) -
e
+
1 as h
--+ 0 ,
when h is
proportional to D 2 .
In the following table we provide some simulated values based on 100 simulations, with y = 0.95 and n = 3. Table 5.4.1’ Simulated Results for Fixed-Width Estimation
d 0.1 0.2 0.1 0.2 0.1 0.2 0.1 0.2
D e 1 0.37 1 0.53 0.37 -1 0.53 -1 2 0.37 2 0.53 -2 0.37 0.53 -2
Average K 52.4 18.8 57.4 24.1 57.6 27.6 54.5 24.8
k* Coverage Probability 48 1.oo 1.oo 20 1.oo 50 21 1.oo 0.91 57 0.94 25 1.oo 57 0.92 25
Point Estimation Let c denote the cost of each experimental unit. Then we want to select the stopping stage k that minimizes
R
+ Cost var (8k) + ~2 + (2k + 1)cn,
= Risk =
(5.4.11)
where B is given by (5.4.5). Note that
Using the approximations given earlier, one can obtain
ha n
R A - (I
- 2e-lhIU)
+ (20 - h)2 e-lhlo + lcn
(5.4.12)
+
where 1 = (2k 1). If 0 and a are known, the optimum 1 (to be denoted by 2*) is given by
2n2cae1*h/20 = [4n3cah (20 - h)2 L
+ h4021‘ I 2 - h2a
(5.4.13)
J
‘Reproduced with the permission of Taylor and Francis Ltd. The website for Statistics is http://www.tandf.co,uk/journals/titles/0233/888.html
5.4. SPEARMAN-KARBER (S-K) ESTIMATOR
235
Since 8 and CT are unknown, we have the following adaptive rule: Stop when the number of dose levels is L where
+ h4] 1’2
- h 2 };
(5.4.14)
or approximately we can take
2e - h l } provided c = 0 (hl+q) for some
r]
> 0 where
(5.4.15)
and 6 are based on 1 dose levels.
Example 5.4.2 Let 8 = 0.625, o = 0.5, h = 0.2, n = 3 and c = 0.00055 and ko = 5. Computations yield 1*
= 13 and hence k* = 6 for (5.4.14)
1*
= 15 and hence k* = 7 for (5.4.15)
For the following data (generated for the above parameter configuration)
we stop at L = 15 with rule (5.4.15). 100 simulations were carried out with n = 3 and Ic0 = 5 . Table 5.4.22 Simulated Values of Stopping Time Using (5.4.15)
h 0.1 0.2 0.1 0.2 0.1 0.2 0.1 0.2
C 8 0.002 1 0.008 1 0.002 -1 0.008 -1 2 0.002 0.008 2 0.002 -2 0.008 -2
0
1 1 1 1 1 1 1 1
Average K 18.1 6.3 18.4
7.7 22.6 0.9 23.7 10.4
k* R K / R ~= * risk ratio 24 0.976 10 0.968 25 0.904 11 0.834 1.112 31 1.115 14 32 1.104 14 1.026
Asymptotic Properties of the Point Estimate (i) E [ L( h )/ E * (h)] 1 when h = ho/m + 0 as m --+ --+
00
for some ho > 0.
(ii) R L / R + ~ 1 as h --+ 0. (Risk-efficiency) 2Reproduced with the permission of Taylor and Francis Ltd. T h e website for Statistics is http://www. tandf.co.uk/journals/titles/0233/888.html
236
5.5
CHAPTER 5. APPLICATIONS TO BIOSTATISTICS
Repeated Significance Tests
In some situations, data is accumulating. Several tests may be conducted on the accumulating data until either one of the tests yields a significant result or the nonsignificance is accepted at the end. This is called repeated significance testing. In this procedure a significance at level 0.05 according to the final test will not be the level of significance relative to the trial as a whole. Invariably, the true significance level will be larger than that from the final test. Because of this unstructured and unplanned behavior, it is preferable to adopt a sequential procedure from the beginning. This formal method of repeated significance testing is called a partially sequential procedure constructed on the basis of repeatedly applying fixed-sample size procedures in a systematic manner. Let 8 index the density function f (2;0). Suppose we wish to test HO : 8 = 0 against the alternative HI : 8 # 0. Given a sample of fixed size, say X I ,X2, ...Xn, the likelihood ratio test with critical level A = ea rejects Ho if and only if 1, > a where 1, denotes the logarithm of the likelihood function. Then the probability of a type I error is a = Po (1, > u ) (5.5.1) which may be estimated by using the chi-squared approximation to the null distribution of 1,. One may test Ho repeated over time. If m and N are integers such that 1 5 m < N and 0 < b 5 a , then the repeated significance test rejects Ho if and only if In > a for some n ( m 5 n < N ) or 1~ > b. Thus letting t = t , = inf{n 2 m : 1, > a } the stopping time is T = min ( N ,t,) and the test rejects HO if and only if either t, < N or I N > b. The probability of a type I error is given by (5.5.2) which is typically much larger than a. Similarly, for testing HO : 8 5 0 and H I : 8 > 0 we have the following procedure. Let m 2 1 be the initial sample size, N 2 m be the maximum sample size and the logarithmic critical bounds be a and b with 0 < b 5 a. 
Then we reject Ho if either 1, > a and 8, > 0 for some n E [ m , N ]or 1~ > b and ,8 > 0. Hence, if 1: = 1nI (8, > 0) and
t+ = t,+= inf{n 2 m : 1: > u }
(5.5.3)
the stopping time of the test T+ = min (taf,N ) . Note that 8, is the mle of 8. Woodroofe (1982, Section 7.2) gives asymptotic expressions (as a -+ 00) for the error probabilities and expected sample sizes of the two repeated significance
5.5. REPEATED SIGNIFICANCE TESTS
237
likelihood ratio test procedures when the underlying distribution is the exponential family of densities given by
f (x;0 ) = exp [ex- @ ( 43 with respect to some sigma-finite measure over (-00, 00) . Then 8, becomes X , = ( X i X2 X n ) /n.
+ + + * * *
Mantel-Haenszel Test For the reference, see Miller (1981). Suppose there are two populations, where n1 and n2 are categorized as dead or alive. Dead Alive Sample size Population I U b 721 Population I1 c d n2 Total ml m2 n Let p l = P(the patient dieslhe belongs to population I) and p2 = P(the patient dieslhe belongs to population 11). Suppose we wish to test HO : p l = p2 versus HI : p l # p2 and use the statistic
-
n (ad - bc)2 nln2mlm2
where $1 = u/n1, $2 = c/n2, $ = ml/n. With the correction for continuity,
x: =
n(lud- bcl -n/2)2 2 - x1. n1722m1m2
Now for given nl, n2, m1 and m2, the cell(1,l) frequency A has the hypergeometric distribution
varo(A) = n1n2m1m2 n2 (n - I) Hence
238
CHAPTER 5. APPLICATIONS T O BIOSTATISTICS
If we have a sequence of 2 x 2 tables (for instance, various hospitals) and we wish to test
HO
: P l l = P12, .-.,Pkl = p k 2
where pi1 = P(death1treatment 1 at hospital i) and pi2 = P(death1treatment 2 at hospital i). Then Mantel-Haenszel (MH) statistic is given by
and with the continuity correction, it becomes
Then we reject HO for large volume of MH,.
5.6
Test Statistics Useful in Survival Analysis
The log rank statistic plays an important role in survival analysis3, which is a special case of general statistics called 2 and V which will be described below. Consider two treatments, the standard (S) and the experimental ( N ) . Let Gs ( t ) [GN(t)]denote the probability that the survival time of a patient on treatment S [ N ]will exceed t. Suppose we are interested in testing
HO : there is no difference in the two treatments versus that is, the experimental treatment provides a longer survival. If the treatments are related such that
GN ( t )= [Gs(t)lA, A = e-'. Then, we can rewrite HO and HI as
HO : 8 = 0 (i.e., GN ( t )= Gs (t) for all t ) and
HI : 8 > 0 (ie., GN ( t )> Gs ( t )) 3Whitehead (1983, Sections 3.7 and 3.8, served as source for this subsection).
(5.6.1)
5.6. TEST STATISTICS USEFUL IN SURVIVAL ANALYSIS
239
If X [Y]denotes the survival time of a patient having the new [standard] treatment ; then under (5.6.1),
= iluAdu
Thus p=-
ee
+ ee)
I&[
or ~ = l n
(5.6.2)
(1
and 8 = 0 corresponds to p = 1/2. A reference improvement value for 8, namely OR can be chosen by selecting the corresponding value for p, namely pR. Alternatively if time is measured in units of years and Gs(1) = 0.65 and G N ( ~=)0.75, then OR is the solution of 0.75 = (0.65)exp(-eR);namely 8~ = 0.404, which corresponds to pR = 0.6.
Test Statistics for the Sequential Case Suppose rn patients taking the new treatment have died previous to a particular point in time and n patients of the standard treatment have died. Assume that the times of deaths of these patients are known. If the progression of the disease is the response of interest rather than death of a patient and detection might be possible only at one of a series of monthly examinations, all recorded progression times would be multiples of months (i.e., integers). Hence ties could occur with positive probability. If dl < d2 < - - - < dk denote the distinct (uncensored) survival times and Oi the frequency of di (i = 1 , 2 , ..., k ) . Let ri = number of survival times
2 di.
Of these ri, let T i N be those on the new treatment and ris be those on the ~ ris = ri). Let A ~ N = r i N / T i = proportion of patients standard treatment ( r i + with new treatment surviving di or longer. Similarly Ais = riS/ri. Then let
k
(5.6.3) i= 1
CHAPTER 5. APPLICATIONS TO BIOSTATISTICS
240 and
(5.6.4)
Z is called the log
rank
statistic (It is related to Mantel-Haenszel test; see
Miller (1981, pp. 94-98).) An equivalent form of 2 is
i=l If m and n are large
. v =
mn
I *
me-812
+ neeI2‘
If t denotes the total number of deaths, then V = t / 4 when m = n
t/2.
General Forms for the Statistics Z and V Starting from the likelihood function one can derive general forms for Z and V. Denote the observed data by X = ( X I ,X2, ...,X n ) , the parameter of interest by 8 and the nuisance parameter by Q which could be a vector. For the sake of simplicity, let us take \I, to be scalar. Further, let L(8,XP;X) denote the likelihood, 1 (8, Q; X) the log likelihood and ( 8 ) be the maximum likelihood estimate (mle) of X€’ for a given value of 8. From the consistency property of the mle, we infer that in large samples, 1 8 , s (8) will be close to 1 (8, Q) with the additional advantage that it depends only on 8. This enables us to obtain an expansion of 1 8, ( 8 ) in powers of 8 and then identify the statistics Z and V from the expansion:
0
0 i0 e, 5(e)
= constant
1 + ez - -e2v + o (e3) 2
(5.6.5)
where Z is called the ‘eficient score’ for 8 and V is called the Fisher information about 8 contained in 2.
Example 5.5.1 Let X = ( X I ,X2, ...,Xn)’ be a random sample from normal Sn = Cy==l X i , then
( p , 1) population. Then if
1 ( p ) = constant
Note that Sn
d N
+ pSn - -21p 2n.
normal (np,n). If there is no nuisance parameter
z = 10 (0) , V = -lee
(0)
(5.6.6)
5.6. TEST STATISTICS USEFUL IN SURVIVAL ANALYSIS

where

l_θ(θ) = (d/dθ) l(θ) and l_θθ(θ) = (d²/dθ²) l(θ).    (5.6.7)

In the presence of a single nuisance parameter Ψ, let ψ̂(0) = Ψ* and expand ψ̂(θ) about θ = 0 as follows:

ψ̂(θ) = Ψ* + θ ψ̂′(0) + O(θ²),    (5.6.8)

where ψ̂′(θ) = (d/dθ) ψ̂(θ). Further, since ψ̂(θ) is an mle,

l_Ψ(θ, ψ̂(θ)) = 0 for all θ.    (5.6.9)

Hence, differentiating (5.6.9) with respect to θ and setting θ = 0,

ψ̂′(0) = −l_θΨ(0, Ψ*)/l_ΨΨ(0, Ψ*).    (5.6.10)

Using (5.6.10) in (5.6.8) we have (after ignoring O(θ²))

ψ̂(θ) − Ψ* = −θ l_θΨ(0, Ψ*)/l_ΨΨ(0, Ψ*).    (5.6.11)

Now expanding l(θ, Ψ) about (0, Ψ*) we have

l(θ, Ψ) = l(0, Ψ*) + θ l_θ(0, Ψ*) + (1/2)θ² l_θθ(0, Ψ*) + θ(Ψ − Ψ*) l_θΨ(0, Ψ*) + (1/2)(Ψ − Ψ*)² l_ΨΨ(0, Ψ*),    (5.6.12)

where the term (Ψ − Ψ*) l_Ψ(0, Ψ*) is omitted because its value is zero. Now substitute ψ̂(θ) for Ψ, use ψ̂(θ) − Ψ* from (5.6.11), and compare with (5.6.5). Hence

Z = l_θ(0, Ψ*)

and

V = −l_θθ(0, Ψ*) + l_θΨ²(0, Ψ*)/l_ΨΨ(0, Ψ*) = −1/l^{θθ}(0, Ψ*),

where l^{θθ}(0, Ψ*) is the leading element of the inverse of the matrix

[ l_θθ  l_θΨ ]
[ l_θΨ  l_ΨΨ ],

the arguments of the second derivatives being 0 and Ψ*.
Example 5.5.2 Let X_1, X_2, ..., X_n be a random sample from normal(μ, σ²). Let θ = μ and Ψ = σ^{−2}. If S_n = X_1 + X_2 + ··· + X_n, then

l(θ, Ψ) = −(n/2) ln(2π) + (n/2) ln Ψ − (Ψ/2) Σ_{i=1}^n (X_i − θ)².

Then one can easily obtain

l_θ(0, Ψ*) = Ψ*S_n,  l_θΨ(0, Ψ*) = S_n,  l_θθ(0, Ψ*) = −nΨ*,  l_ΨΨ(0, Ψ*) = −n/(2Ψ*²).

Hence

1/l^{θθ}(0, Ψ*) = −nΨ* + 2Ψ*²S_n²/n.

Thus

Z = Ψ*S_n and V = nΨ* − 2Z²/n.
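The calculations of Example 5.5.2 can be checked numerically. The sketch below evaluates Ψ* (the mle of σ^{−2} when μ = 0, namely Ψ* = n/ΣX_i², which follows from maximizing the log likelihood at θ = 0), then Z and V; the function name is illustrative.

```python
def zv_normal(x):
    """Efficient score Z and information V for theta = mu in a normal
    sample with nuisance Psi = 1/sigma^2 evaluated at its mle under
    H0: mu = 0 (Example 5.5.2)."""
    n = len(x)
    s = sum(x)                                   # S_n
    psi_star = n / sum(xi * xi for xi in x)      # mle of 1/sigma^2 at mu = 0
    Z = psi_star * s
    V = n * psi_star - 2 * Z * Z / n
    return Z, V
```

For a sample symmetric about zero, S_n = 0 and hence Z = 0 and V = nΨ*.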
Example 5.5.3 If p_1 and p_2 denote the proportions of cures by two different treatments, we will be interested in testing H_0: p_1 = p_2 versus H_1: p_1 > p_2. We can reparameterize and set

θ = ln[ p_1(1 − p_2) / (p_2(1 − p_1)) ]

and Ψ = ln[ p_1p_2 / ((1 − p_1)(1 − p_2)) ]. Then H_0 corresponds to θ = 0 and H_1 corresponds to θ > 0.
Asymptotic Distribution of Z

When θ is small, Whitehead (1983, p. 56) asserts that the approximate distribution of Z is normal with mean θV and variance V. This result is extensively used in constructing triangular and double triangular sequential tests of hypotheses about θ. Suppose we wish to test H_0: θ = 0 versus H_1: θ > 0. Then we plot Z on the y-axis and V along the x-axis, and the triangular test can be depicted as in Figure 5.6.1. The continuation region is Z ∈ (−c + λ_1V, c + λ_2V), where λ_2 = −λ_1 yields the symmetrical case; then the two boundary lines meet on the V-axis. Crossing the upper line leads to rejection of H_0, and crossing the lower line to acceptance.

Figure 5.6.1 A Plot of Z against V for the Triangular Test
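A minimal monitoring rule for the symmetric triangular test (λ_2 = −λ_1 = −λ, so that the boundaries meet on the V-axis at V = c/λ) might look as follows; the function and its argument names are illustrative, not from the source.

```python
def triangular_decision(Z, V, c, lam):
    """Symmetric triangular test of H0: theta = 0 vs H1: theta > 0.

    Continue while -c + lam*V < Z < c - lam*V (lam > 0).  The two
    boundary lines meet on the V-axis at V = c/lam, so the test must
    terminate by that point.  Returns 'reject', 'accept' or 'continue'.
    """
    upper = c - lam * V     # crossing here rejects H0
    lower = -c + lam * V    # crossing here accepts H0
    if Z >= upper:
        return 'reject'
    if Z <= lower:
        return 'accept'
    return 'continue'
```

In practice (Z, V) would be recomputed at each inspection and fed to this rule until it stops.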
Double Triangular Case

Suppose we are interested in testing H_0: θ = 0 versus H_1: θ ≠ 0. Then we run two triangular tests:

R+: H_0 vs. H_1+: θ > 0,
R−: H_0 vs. H_1−: θ < 0.

Figure 5.6.2 The Double Triangular Test

As we see in Figure 5.6.2, we accept H_0 when the outcome stays within the region bounded by the lines z = c + λ_2V and z = −c + λ_1V. Note that we will have P(reject H_0 | θ = 0) = 2α. If I denotes the inspection interval, then V is usually an integral multiple of I. The choice of the length of the inspection interval is specified by the organizers of the clinical trial. For the triangular test,
according to Whitehead (1983, p. 72),

V_max = 4c/θ_R,

where θ_R needs to be specified by the problem on hand, and P(reject H_0 | θ = 0) = α and P(accept H_0 | θ = θ_R) = α. For further details on the theory of the triangular test the reader is referred to Whitehead (1983, Section 4.9). The quantity 0.583√I is called the correction for overshoot. The rejection rule for the composite test R in the double triangular test is:

R+ rejects H_0, R− accepts H_0: R rejects H_0 in favor of H_1+;
R+ accepts H_0, R− accepts H_0: R accepts H_0;
R+ accepts H_0, R− rejects H_0: R rejects H_0 in favor of H_1−.

The duration of R = max(duration of R+, duration of R−), since both component tests must stop before R does. The constants are computed along the lines of the triangular test, where c and V_max are as given before. Whitehead (1983, Equation 4.9.7) gives an explicit expression for E(V*|θ), where V* is the stopping value for V.
Repeated Significance Tests Using Statistics Z and V

Suppose we wish to test H_0: θ = 0 vs. the alternative H_1: θ ≠ 0. Let the inspection interval be I. A maximum number N of inspections is specified. Carry out a fixed-sample-size test at each inspection. The form of the test is to reject H_0 when Z falls outside the interval (−k√V, k√V) and to accept H_0 otherwise. The value of k is determined from normal distribution tables and corresponds to a 'normal significance level' 2α'; that is, each of these tests is conducted at level 2α'. If any one of these tests rejects H_0, the overall procedure terminates with the rejection of H_0. If all N of them accept H_0, the overall procedure proceeds to the Nth inspection and then accepts H_0. The sequential trial is planned so that

P(accept H_0 | θ = 0) = 1 − 2α and P(accept H_0 | θ = θ_R) = P(accept H_0 | θ = −θ_R) = β.

Thus 2α is the overall significance level of the test. For given values of I, θ_R, α, β and N one can compute 2α' and k. Originally, Armitage (1975, Tables 5.5 and 5.6) prepared tables which enable one to carry out the procedure; Whitehead (1983, Table XI) reproduces these tables. If we use the approximate normality of Z with mean θV and variance V, we have, for α = 0.025 and β = 0.05,

P{Z ∈ (−c, c) | θ = 0} = 2Φ(c/√V) − 1 = 0.95,

which implies that c = 1.96√V. Further, P{Z ≤ c | θ = θ_R} = 0.05. Hence

c − θ_R V = −1.645√V,

which implies that √V = (1.96 + 1.645)/θ_R. For instance, θ_R = 0.5 yields V ≈ 52. Whitehead (1983, Equation 6.2.6) gives an explicit expression for E(V*|θ), where V* denotes the stopping value for V.
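The design equations above can be solved directly: c = z_α√V and c − θ_R V = −z_β√V together give √V = (z_α + z_β)/θ_R. A sketch, with the α = 0.025, β = 0.05 values used above as defaults (function name illustrative):

```python
import math

def rst_design(theta_R, z_alpha=1.96, z_beta=1.645):
    """Design values (c, V) for the repeated significance test:
    c = z_alpha*sqrt(V) and c - theta_R*V = -z_beta*sqrt(V)
    imply sqrt(V) = (z_alpha + z_beta)/theta_R."""
    V = ((z_alpha + z_beta) / theta_R) ** 2
    c = z_alpha * math.sqrt(V)
    return c, V
```

For θ_R = 0.5 this reproduces V ≈ 52, as computed in the text.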
Sequential Probability Ratio Test Based on Z and V

Suppose we wish to test H_0: θ = 0 versus H_1: θ = θ_R with α = β. Then, using the bounds A = (1 − α)/α = 1/B, one can easily compute the continuation region to be Z ∈ (−c + sV, c + sV), where

s = θ_R/2 and c = (1/θ_R) log((1 − α)/α).

In order to correct for the overshoot one can take

c = (1/θ_R) log((1 − α)/α) − 0.583√I,

where I denotes the inspection interval (see Whitehead (1983, p. 84)).

Estimation of θ
If we reject H_0 via the SPRT, triangular, double triangular or the repeated significance test procedure, it is of interest to estimate θ (pointwise or via a fixed-width confidence interval). If for given d and γ we want to find V such that P(|θ̂ − θ| ≤ d) = γ, then using the normality of Z with mean θV and variance V, one can easily determine the required V to be

V = (z_{(1+γ)/2}/d)².

On the other hand, if we wish to minimize the risk function

R(θ) = A E(θ̂ − θ)² + V,

then the optimal V = A^{1/2}. If we wish to estimate θ after we stop in a sequential trial (say at V*), then θ̂ = Z*/V* is no longer unbiased, since the stopping rule depends on the evidence accumulated. Armitage (1957), Siegmund (1978) and Whitehead and Jones (1979) have provided estimates of θ after a sequential test procedure has been terminated; Whitehead (1983, Table V) has provided tables for computing a confidence interval for θ.
5.7 Sample Size Re-estimation Procedures

5.7.1 Normal Responses

Estimation of the required sample size is an important issue in most clinical trials. Fixed-sample designs use previous data or guessed values of the parameters, which can be unreliable. The classical sequential designs are limited to situations where outcome assessment can be made only after patients are enrolled in the trial. Group sequential designs have also been used in clinical trials; however, the type I error rate at each analysis stage needs to be adjusted so as to control the overall type I error probability at a specified level. In several clinical trials, especially those dealing with nonfatal ailments, investigators would like to come up with a procedure at an interim stage in order to obtain updated information on the adequacy of the planned initial sample size. This often takes place when the natural history of the ailment is not well known or the treatment under study is new. In those cases, investigators are often unsure of the assumed values of the parameters that were initially used for calculating the sample size at the planning stage. Note that the initial parameter values are obtained, invariably, from various studies conducted on different populations of patients, with different diagnostic criteria, etc. Thus, the initial sample size guarantees neither the width of the confidence interval in estimation nor the desired power in the hypothesis-testing setup. Hence, it is desirable to monitor the clinical trial so as to ensure that the basic assumptions of the design are reasonably satisfied, and to construct procedures for estimating the sample size using the data available at the interim stage. Shih (1992) makes a compelling case for not unblinding the treatment codes at the interim stage, so that the integrity of the trial is maintained and no conscious or unconscious bias is introduced. If the goal of the trial is to re-estimate the required sample size, the only decision that would be taken is the determination of how many additional observations, if any, are needed beyond those planned earlier. If no further observations are needed, the planned sample size is sufficient and the trial will be carried out.

In a two-treatment double-blind clinical experiment, one is interested in testing the null hypothesis of equality of the means against a one-sided alternative when the common variance σ² is unknown. We wish to determine the required total sample size when the error probabilities α and β are specified at a predetermined alternative. Assuming normal responses, Shih (1992) provided a two-stage procedure which is an extension of Stein's (1945) one-sample procedure. He estimates σ² by the method of maximum likelihood via the E-M algorithm and carries out a simulation study in order to evaluate the effective level of significance and the power.

For further references on normal responses, the reader is referred to Shih (1992). Govindarajulu (2002) proposed a closed-form estimator for σ² and showed analytically that the difference between the effective and nominal levels of significance is negligible and that the power exceeds 1 − β when the initial sample size is large. Govindarajulu (2003) extended the above results to responses from arbitrary distributions. In the following we present these results, which are valid for responses from an arbitrary distribution.
5.7.2 Formulation of the Problem

Suppose the two treatment responses X and Y have unknown means μ_1 and μ_2 and unknown common variance σ². We further assume that σ² is not functionally related to μ_1 and μ_2. We wish to test H_0: μ_1 = μ_2 against the alternative H_1: μ_1 < μ_2 with specified error probabilities α and β at μ_2 = μ_1 + δ*, where δ* is specified. Since the clinical trial is double-blind, we do not know to which
treatment the response belongs. If U denotes the response,
U = X, if the observation is on a patient assigned to treatment 1; Y, if the observation is on a patient assigned to treatment 2.
If n_i denotes the number of patients assigned to treatment i (i = 1, 2), we take n_1 = n_2 = n/2, where n denotes the total number of patients. Since an equal number of patients is allocated to each treatment,
P(U=X)=P(U=Y)=1/2.
(5.7.1)
Consequently one can easily obtain

E(U) = (μ_1 + μ_2)/2 and var U = σ² + (μ_2 − μ_1)²/4,    (5.7.2)

where E(X) = μ_1, E(Y) = μ_2 and var(X) = var(Y) = σ². If Ū_n = Σ_{i=1}^n U_i/n and X̄_{n_1}, Ȳ_{n_2} denote the sample means, we have the identity (5.7.3) relating the blinded and unblinded sums of squares. If α and β denote the error probabilities at the alternative μ_2 = μ_1 + δ*, then Shih (1992) obtains

n_1 = n_2 = 2(z_α + z_β)²(σ*/δ*)²,    (5.7.4)
where σ* is an initially guessed value of σ and z_γ = Φ^{−1}(1 − γ), Φ(·) denoting the standard normal distribution function. Let σ̂_n denote an estimate of σ and let

M = n_1 (σ̂_n/σ*)².    (5.7.5)

Then N, the total number of observations on each treatment, is defined as

N = n_1 if σ̂_n ≤ σ*, and N = M if σ̂_n > σ*.    (5.7.6)

Draw N − n_1 additional observations from each treatment. Then the decision rule is: reject H_0 when

Ȳ_N − X̄_N > z_α σ̂_n (2/N)^{1/2},    (5.7.7)

and accept H_0 otherwise. We take σ̂_n as in Govindarajulu (2002), namely the estimate defined by (5.7.8). Now one can ask (assuming that the responses follow arbitrary distributions):
(i) What is the effective level of significance of this procedure? (ii) What is the effective power at the specified alternative? In the following we will provide answers to these questions.
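Before turning to these questions, the two-stage rule (5.7.4)–(5.7.7) can be sketched numerically. The closed-form blinded variance estimator used below (the blinded sample variance minus δ*²/4, motivated by var(U) = σ² + (μ_2 − μ_1)²/4) is an assumption in the spirit of Govindarajulu (2002), not a quotation of (5.7.8); names are illustrative.

```python
import math

def reestimate_n(u, delta_star, sigma_star, z_alpha, z_beta):
    """Blinded two-stage sample-size re-estimation (sketch).

    u: pooled (blinded) first-stage responses U_1, ..., U_n.
    Returns N, the total number of observations per treatment.
    """
    n = len(u)
    ubar = sum(u) / n
    s2 = sum((ui - ubar) ** 2 for ui in u) / (n - 1)   # blinded sample variance
    sigma2_hat = max(s2 - delta_star ** 2 / 4.0, 0.0)  # assumed closed form
    n1 = 2 * (z_alpha + z_beta) ** 2 * (sigma_star / delta_star) ** 2
    M = 2 * (z_alpha + z_beta) ** 2 * sigma2_hat / delta_star ** 2
    return max(math.ceil(n1), math.ceil(M))            # N per treatment
```

If the interim variance estimate does not exceed the guessed σ*², the planned n_1 is kept; otherwise the sample size is inflated to M.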
The Effective Level of Significance

The following lemmas are needed.

Lemma 5.7.1 Let

Z = (Ȳ_{n_1} − X̄_{n_1})/(2σ/√n).    (5.7.9)

Then, as n becomes large,

Z²/(n − 1) → 0 when H_0 is true, and Z²/(n − 1) → δ*²/4σ² when μ_2 − μ_1 = δ*,

in probability.

Proof. When H_0 is true, Z has a standard normal distribution as n becomes large. When μ_2 − μ_1 = δ* one can write Z = ξ + δ*√n/(2σ), where ξ has (asymptotically) a standard normal distribution. Thus Z/(n − 1)^{1/2} tends to zero (in probability) when H_0 holds and to δ*/2σ when μ_2 − μ_1 = δ*. ∎
Lemma 5.7.2 As n becomes large,

σ̂_n²/σ² → 1 when H_0 holds, and σ̂_n²/σ² → 1 + δ*²/4σ² when μ_2 − μ_1 = δ*,    (5.7.10)

in probability.

Proof. Let x_i = (X_i − μ_1)/σ and y_j = (Y_j − μ_2)/σ, i, j = 1, 2, ..., n_1. By the weak law of large numbers, n_1^{−1}Σ_{i=1}^{n_1} x_i² and n_1^{−1}Σ_{j=1}^{n_1} y_j² tend to 1 in probability. Using Lemma 5.7.1 completes the proof of Lemma 5.7.2. ∎

Lemma 5.7.3 Let θ = (σ*/σ)². Then, as n gets large,

Mθ/n_1 → 1 in probability when H_0 is true, and Mθ/n_1 → 1 + δ*²/4σ² when μ_2 − μ_1 = δ*.    (5.7.11)
Proof. Readily follows upon noting that Mθ/n_1 = σ̂_n²/σ². ∎

Now let us consider the effective level of significance, which will be denoted by α*. Also, for the sake of simplicity, let z = z_α. Then

α* = P_0{Ȳ_N − X̄_N > z σ̂_n (2/N)^{1/2}},    (5.7.12)

where P_0 denotes the probability computed when H_0 is true. We can write

α* = P_0{Ȳ_N − X̄_N > z σ̂_n (2/N)^{1/2}, σ̂_n ≤ σ*} + P_0{Ȳ_N − X̄_N > z σ̂_n (2/N)^{1/2}, σ̂_n > σ*}
   = T_1 + T_2, respectively.

For sufficiently large n, when H_0 is true, the event σ̂_n ≤ σ* behaves like the event θ ≥ 1, where θ = (σ*/σ)². Thus, using Lemma 5.7.2, we have

T_1 ≈ α if θ ≥ 1, and T_1 ≈ 0 otherwise.    (5.7.13)

Next, using Lemma 5.7.3 and Anscombe's theorem (1952) on the asymptotic normality of Ȳ_N − X̄_N at the random sample size N, we can write

T_2 ≈ α if θ < 1, and T_2 ≈ 0 if θ ≥ 1.

Thus α* = T_1 + T_2 ≈ α for all values of θ. ∎
Effective Power at the Specified Alternative

Let β* = P*{Ȳ_N − X̄_N ≤ z_α σ̂_n (2/N)^{1/2}} denote the effective type II error probability at the alternative μ_2 = μ_1 + δ*. Then we have the following result.

Result 5.7.1 As n_1 becomes large,

β* ≈ β, whether θ ≥ 1 + δ*²/4σ² or θ < 1 + δ*²/4σ²,    (5.7.14)

where z_β satisfies Φ(−z_β) = β; the exact limiting expressions, which differ in the two cases, show that β* ≤ β.

Proof. We can express β* as T_1* + T_2*, respectively, according as σ̂_n ≤ σ* or σ̂_n > σ*. Note that since (σ̂_n/σ)² → 1 + δ*²/4σ² in probability at the alternative,

T_1* ≈ 0 when θ < 1 + δ*²/4σ², and T_2* ≈ 0 when θ ≥ 1 + δ*²/4σ².

Thus, when θ ≥ 1 + δ*²/4σ², we can write T_1* in terms of

Z̃ = (Ȳ_{n_1} − X̄_{n_1} − δ*)(n_1/2σ²)^{1/2}, which is normal(0, 1) for large n_1.

Also, since n_1 = 2(z_α + z_β)²(σ*/δ*)², we obtain T_1* ≈ β when θ ≥ 1 + δ*²/4σ².

Next, when θ < 1 + δ*²/4σ², using Lemma 5.7.3 and Anscombe's theorem (1952) on the asymptotic standard normality of

(Ȳ_M − X̄_M − δ*)(M/2σ²)^{1/2}(1 + δ*²/4σ²)^{−1/2},

we obtain

T_2* ≈ β when θ < 1 + δ*²/4σ², and T_2* ≈ 0 when θ ≥ 1 + δ*²/4σ².

This completes the proof of Result 5.7.1. ∎

Govindarajulu (2002, 2003) has tabulated the values of (1 − β*)/(1 − β), as a percentage, for selected values of the parameters when θ ≥ 1 + δ*²/4σ². The following conclusions can be drawn from those tables.

(i) For fixed δ* and σ*, both δ*/σ and the gain in power increase as σ decreases. When σ*² − σ² ≥ δ*²/4, the gain in power is higher than when σ*² − σ² < δ*²/4. The values 0.2, 0.35 and 0.50 for δ*/σ are considered to be of interest in clinical trials.

(ii) The percentage gain in power is non-negative and is less than 3 percent for all practical values of δ*/σ.
Fixed-width Confidence Interval Estimation

Suppose we wish to estimate η = μ_2 − μ_1 with a confidence interval having width 2d and confidence coefficient γ. Let z = z_{(1+γ)/2} be such that 2Φ(z) − 1 = γ and, as before, let σ* be a preliminary estimate of σ. Then the number of patients to be assigned to each treatment is given by

n_1 = 2(zσ*/d)².    (5.7.15)

Let σ̂_n (with n = 2n_1) denote an estimate of σ based on the blinded responses U_1, U_2, ..., U_n. Then, according to the two-stage procedure, we stop at n_1 if

σ̂_n ≤ σ*,    (5.7.16)

and otherwise allocate M − n_1 additional patients to each treatment, where M = n_1(σ̂_n/σ*)². Note that M/n_1 = (σ̂_n/σ*)². In other words, the total number of patients on each treatment is

N = n_1 if σ̂_n ≤ σ*, and N = M if σ̂_n > σ*.    (5.7.17)

Assume that n_1 is sufficiently large, say > 30. After we stop, the confidence interval for η = μ_2 − μ_1 is (Ȳ_N − X̄_N) ± d. Further, we assume that after the total experimentation the clinical trial is unblinded, so that we know which are the X and the Y observations. Of much interest is the effective coverage probability γ* of the resultant confidence interval. Towards this we have the following result.
Result 5.7.2 For sufficiently large n_1, we have

γ* ≈ 2Φ(z(1 + η²/4σ²)^{1/2}) − 1 when θ < 1 + η²/4σ², and
γ* ≈ 2Φ(z) − 1 when θ ≥ 1 + η²/4σ².    (5.7.18)

Proof. We can write γ* = P{|Ȳ_N − X̄_N − η| ≤ d} and split it as T_1 + T_2 according as σ̂_n ≤ σ* or σ̂_n > σ*. Proceeding as in the proof of Result 5.7.1, one can show that

T_1 ≈ 2Φ(z) − 1 when θ ≥ 1 + η²/4σ², and T_1 ≈ 0 otherwise,

and

T_2 ≈ 2Φ(z(1 + η²/4σ²)^{1/2}) − 1 when θ < 1 + η²/4σ², and T_2 ≈ 0 otherwise,

where θ = (σ*/σ)². ∎

From (5.7.18) one notes that γ* ≥ γ for all θ when n_1 is large.
5.7.3 Binary Response

Shih and Zhao (1997) propose a design for sample size re-estimation with interim binary data for double-blind clinical trials. Based on a simulation study, they infer that the effect on the type I error and the nominal power is only slight. Govindarajulu (2004) derives closed-form expressions for the effective type I error probability and the power at the specified alternative. In the following we give the results of Govindarajulu (2004).

The Design and the Preliminaries

The following is the randomized design proposed by Shih and Zhao (1997). Assume that an interim analysis is conducted when the clinical trial is halfway through what was originally planned (i.e., when outcome data are available from a total of n* patients). Let n* be a positive integer which is a multiple of 2/π(1 − π), where 0 < π < 1 and π ≠ 1/2. Allocate at random n*/2 patients to stratum A and the rest to stratum B. In stratum A allocate πn*/2 patients to treatment 1 and the rest to treatment 2. In stratum B allocate (1 − π)n*/2 patients to treatment 1 and the rest to treatment 2. Note that the allocation to treatments is double-blind. Let

n_{A,1} = number who respond to treatment 1 in stratum A,
n_{A,2} = number who respond to treatment 2 in stratum A,
n_{B,1} = number who respond to treatment 1 in stratum B,
n_{B,2} = number who respond to treatment 2 in stratum B.

Let p_i be the probability that a patient responds to treatment i (i = 1, 2). Let θ_1 denote the probability that a patient responds in stratum A and θ_2 the probability that a patient responds in stratum B. Due to the double-blindness of the experiment, only n_{A,1} + n_{A,2} and n_{B,1} + n_{B,2} are observable. Also,

θ_1 = πp_1 + (1 − π)p_2, θ_2 = (1 − π)p_1 + πp_2,
(5.7.19)

Now, solving for p_1 and p_2 and estimating, we obtain

p̂_1 = [πθ̂_1 − (1 − π)θ̂_2]/(2π − 1) and p̂_2 = [πθ̂_2 − (1 − π)θ̂_1]/(2π − 1),    (5.7.20)

where π ≠ 1/2 and θ̂_1 and θ̂_2 are independent.
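Inverting the blinded stratum response rates back to the treatment response rates, as in (5.7.20), is a small linear solve; a sketch (function name illustrative):

```python
def recover_p(theta1_hat, theta2_hat, pi):
    """Recover treatment response rates from blinded stratum rates:
    theta1 = pi*p1 + (1-pi)*p2 and theta2 = (1-pi)*p1 + pi*p2,
    solved for (p1, p2); requires pi != 1/2 (eq. (5.7.20))."""
    if pi == 0.5:
        raise ValueError("pi = 1/2 makes the system singular")
    d = 2 * pi - 1
    p1 = (pi * theta1_hat - (1 - pi) * theta2_hat) / d
    p2 = (pi * theta2_hat - (1 - pi) * theta1_hat) / d
    return p1, p2
```

For example, with π = 0.2, p_1 = 0.5 and p_2 = 0.3 give θ_1 = 0.34 and θ_2 = 0.46, and the inversion recovers (0.5, 0.3).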
One can easily see that p̂_1 and p̂_2 are unbiased for p_1 and p_2 respectively, since θ̂_1 and θ̂_2 are unbiased for θ_1 and θ_2 respectively. Further, one can compute the variances and covariance of p̂_1 and p̂_2 from var(θ̂_1) and var(θ̂_2). In particular, since p̂_1 + p̂_2 = θ̂_1 + θ̂_2, the blinded estimate p̂ = (p̂_1 + p̂_2)/2 satisfies

var(p̂) = [var(θ̂_1) + var(θ̂_2)]/4.

We are interested in testing

H_0: p_1 = p_2 versus H_1: p_1 ≠ p_2.

Let α denote the type I error probability and 1 − β the power at p_1 = p_1* and p_2 = p_2*, where p_1* and p_2* are specified. Let n* denote the required number of patients on each treatment, which is assumed to be reasonably large. We are given that

α = P_0{|p̂_1 − p̂_2| > z [4p̂(1 − p̂)/n*]^{1/2}},    (5.7.21)

where p̄ = (p_1 + p_2)/2, p̂ = (p̂_1 + p̂_2)/2, z = z_{α/2}, and P_0 denotes the probability computed when H_0 is true. Also given is

1 − β = power at (p_1*, p_2*),    (5.7.22)

where P* denotes the probability computed when (p_1, p_2) = (p_1*, p_2*). Note that when n*/2 is large, p̂_1 and p̂_2 being consistent estimators of p_1 and p_2 respectively,
we can replace p̂(1 − p̂) by p̄(1 − p̄) in (5.7.21) and by p*(1 − p*) in (5.7.22), where p* = (p_1* + p_2*)/2. Also, let η = p_1 − p_2. Then one can easily establish that

n* = 2(z_{α/2} + z_β)² p*(1 − p*)/η*².    (5.7.23)

Note that (5.7.23) is known as Lachin's (1977) formula, an elementary proof of which is in Govindarajulu (2004, Result 2.1). Now use p̂_1 and p̂_2 in order to update the sample size. Let

n̂ = 2(z_{α/2} + z_β)² p̂(1 − p̂)/η̂².    (5.7.24)

Then we have the following rule: if n̂ > n*, increase the sample size at each treatment to ω_1n* (typically 1 < ω_1 ≤ 4.1); if n̂ < n*, decrease the sample size at each treatment to ω_2n* (0.6 ≤ ω_2 < 1). After the sample size re-estimation, the trial is conducted according to the newly estimated sample size (without stratification). The treatment groups are unblinded and compared at the final stage using all the patients' data. Typically, π is set to be 0.2 or 0.8 (and not near 0.5). Next we study the effect of the sample size re-estimation on the level of significance and on the power at the specified alternative. Let N denote the selected sample size per treatment.
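Lachin's formula (5.7.23) is easy to evaluate; the helper below computes the per-treatment sample size (function name illustrative). For Example 5.7.3's values (p_1* = 0.5, p_2* = 0.3, α = 0.05 two-sided, 1 − β = 0.90) it reproduces n* = 127.

```python
def lachin_n(p1, p2, z_alpha2, z_beta):
    """Lachin's (1977) per-treatment sample size, eq. (5.7.23):
    n* = 2 (z_{alpha/2} + z_beta)^2 p*(1 - p*) / eta*^2,
    with p* = (p1 + p2)/2 and eta* = p1 - p2."""
    pstar = (p1 + p2) / 2.0
    eta = p1 - p2
    return 2 * (z_alpha2 + z_beta) ** 2 * pstar * (1 - pstar) / eta ** 2
```

The re-estimated n̂ of (5.7.24) is obtained by calling the same function with the blinded estimates p̂_1 and p̂_2 in place of p_1* and p_2*.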
The Effective Level of Significance

Let

γ(p_1, p_2) = (p_1 + p_2)(2 − p_1 − p_2)/(p_1 − p_2)².    (5.7.25)

When H_0 is true, i.e., when p_1 = p_2, γ(p̂_1, p̂_2) tends to infinity as n* becomes large. Hence the probability measure of the set D = {n̂ > n*} tends to one. Let ᾱ be the effective level of significance of the test procedure. Then

ᾱ = P_0{reject H_0, D} + P_0{reject H_0, D^c} = T_1 + T_2, respectively,
5.7. SAMPLE SIZE REESTIMATION PROCEDURES
257
where D^c denotes the complement of the event D, and the estimates p̂_{1N}, p̂_{2N} and p̂_N of p_1, p_2 and p̄ = (p_1 + p_2)/2, respectively, are based on the unblinded data (i.e., after the treatment codes are broken). Also, due to the binomial and independent nature of the random variables n_{A,1}, n_{A,2}, n_{B,1} and n_{B,2}, and the fact that n*/2 is large, we have the following representations (in distribution):

n_{A,1} ≈ (n*π/2)p_1 + Z_1[(n*π/2)p_1q_1]^{1/2},
n_{A,2} ≈ (n*(1 − π)/2)p_2 + Z_2[(n*(1 − π)/2)p_2q_2]^{1/2},
n_{B,1} ≈ (n*(1 − π)/2)p_1 + Z_1*[(n*(1 − π)/2)p_1q_1]^{1/2},
n_{B,2} ≈ (n*π/2)p_2 + Z_2*[(n*π/2)p_2q_2]^{1/2},    (5.7.26)

where q_i = 1 − p_i and Z_1, Z_2, Z_1* and Z_2* are mutually independent standard normal variables. Now, using (5.7.19) and (5.7.26) in (5.7.20) and simplifying, Govindarajulu (2004, Eq. (3.5)) obtains a corresponding normal representation for (p̂_1, p̂_2).    (5.7.27)

Letting

U_1 = √π Z_1 + √(1 − π) Z_2 and U_2 = √(1 − π) Z_1* − √π Z_2*,    (5.7.28)

Govindarajulu (2004, Eq. (3.7)) obtains, when p_1 = p_2, a representation of p̂_1 − p̂_2 in terms of U_1 and U_2.    (5.7.29)

Also, since p̂_1 + p̂_2 = θ̂_1 + θ̂_2, one can readily get the corresponding representation for p̂.    (5.7.30)

Thus, when p_1 = p_2, we can write the set D in terms of U_1 and U_2.    (5.7.31)
Recall that in the second stage, ω_1n* − n*/2 patients are given treatment 1 and the same number of patients are assigned to treatment 2. Let X_1 be the number of patients responding to treatment 1 out of the (ω_1 − 1/2)n* second-stage patients. Then the final (unblinded) estimate of p_1 admits the representation

p̂_{1N} ≈ p_1 + (1/ω_1)[U_1/√2 + (ω_1 − 1/2)^{1/2}Z_3](p_1q_1/n*)^{1/2},    (5.7.32)

after using the representations (5.7.26) and (5.7.28). Similarly, letting X_2 denote the number of patients responding to treatment 2 out of the (ω_1 − 1/2)n* second-stage patients, we obtain

p̂_{2N} ≈ p_2 + (1/ω_1)[U_2/√2 + (ω_1 − 1/2)^{1/2}Z_4](p_2q_2/n*)^{1/2},    (5.7.33)

where Z_3 and Z_4 are independent standard normal variables, independent of U_1 and U_2. Thus, from (5.7.32) and (5.7.33), when H_0 is true (i.e., p_1 = p_2 = p), we can write p̂_{1N} − p̂_{2N} in terms of these variables. Now letting

L_1 = (U_1 − U_2)/√2 and L_2 = (Z_3 − Z_4)/√2

and simplifying, we obtain the event {reject H_0, D} in terms of L_1 and L_2, where L_1 and L_2 are independent and approximately standard normal variables. Proceeding in an analogous manner, with ω_2 in place of ω_1 for the event D^c, we obtain the corresponding event (5.7.36). Hence, for sufficiently large n* we have (5.7.37), where

Y_i = {L_1 + (2ω_i − 1)^{1/2}L_2}/(2ω_i)^{1/2}, i = 1, 2, and Y_3 = L_1.

Note that (Y_i, Y_3) is standard bivariate normal (for i = 1, 2) with

corr(Y_i, Y_3) = ρ_i = E(Y_iY_3) = E(L_1²)/(2ω_i)^{1/2} = (2ω_i)^{−1/2}.    (5.7.38)–(5.7.39)

From (5.7.37) we obtain (5.7.40) and (5.7.41) (see Govindarajulu (2004, Lemma A.1)).
Example 5.7.1 Let π = 0.2 or 0.8, ω_1 = 4.1, ω_2 = 0.6, α = 0.05 and β = 0.10. Then ρ_1 = (2ω_1)^{−1/2} = 0.3492 and ρ_2 = (2ω_2)^{−1/2} = 0.9129.
Computations yield
ᾱ − α = −2∫_{1.3746}^∞ φ(x)Φ(−2.0917 + 0.3727x) dx + 2∫_{1.3746}^∞ φ(x)Φ(−4.8010 + 2.236x) dx
 = 2(−0.006932 + 0.023525) = 0.0332.
Example 5.7.2 Let π = 0.2 or 0.8, ω_1 = 2.54, ω_2 = 0.6, α = 0.05 and β = 0.10. Then ρ_1 = 0.4436, ρ_2 = 0.9129, and

ᾱ − α = −2∫_{1.3746}^∞ φ(x)Φ(−2.187 + 0.495x) dx + 2∫_{1.3746}^∞ φ(x)Φ(−4.801 + 2.236x) dx
 = 2(−0.008869 + 0.023525) = 0.0293.
Remark 5.7.1 Examples 5.7.1 and 5.7.2 indicate that there is about a 60% increase in the nominal level of significance, whereas Shih and Zhao (1997) claim that the increase is only slight. It is recommended that the nominal α be small, say 0.01.

Effective Power at the Specified Alternative

We wish to obtain explicit expressions for the effective power ξ = ξ(p_1*, p_2*) at the alternative p_i = p_i* (i = 1, 2). Note that the nominal power at the specified alternative is 1 − β. By definition,

ξ = P*{reject H_0},

where P* denotes the probability evaluated when (p_1, p_2) = (p_1*, p_2*). Instead of torturing the reader with all the technical details, we simply give the final result
as given in Govindarajulu (2004, Eq. (4.18)), in which the constants are

B = [p*(1 − p*)]^{1/2}, p* = (p_1* + p_2*)/2,
Δ_i = (p_i*q_i*)^{1/2}, q_i* = 1 − p_i*, i = 1, 2,
σ_2² = Δ_1² + Δ_2²,
σ_1² = [πΔ_1² + (1 − π)Δ_2²][πC_1 − (1 − π)C_2]² + [(1 − π)Δ_1² + πΔ_2²][πC_2 − (1 − π)C_1]²,
ρ_i = (σ_1σ_2)^{−1}(2π − 1)(C_1Δ_1² − C_2Δ_2²)(2ω_i)^{−1/2}, i = 1, 2,
C_1 = (1/2)η*^{−3}[η*(1 − 2p*) − 4(p* − p*²)],
C_2 = (1/2)η*^{−3}[η*(1 − 2p*) + 4(p* − p*²)], η* = p_1* − p_2*.

Let us consider some numerical examples.
Let us consider some numerical examples Example 5.7.3 Let 7r = 0.2, pr = 0.5, pa = 0.3 (yielding q-* = 0.2, p* = 0.4), w1 = 4.1, w2 = 0.6, n* = 127, a = 0.05 and 1 - ,6 = 0.90. Then computations yield A? = 0.25, A$ = 0.21, B = 0.4899 C1 = -57.5, C2 = 62.5 01 = 40.6536, 0 2 = (O.46)lI2 = 0.6782 pi = 0.5984 , i = 1,2 i.e., p1 = 0.2090, p2 = 0.5463. Hence
/a
1
oo
x 1-
roo
-J, = 1-
cp (z) [@ (-4.834 - 0.2137~)- @ (-8.929 - 0.21372)] dx
+ 0.65222) - @ (-5.4635 + 0.65222)]dx cp (x)[@ (-0.6832 + 0.65222) + @ (-5.4635 + 0.65222)] dx [@ (-0.6832
~p (2)
1
00
= 1- 0.2183 = 0.782
+ 0.2364
which is much lower than the nominal power, whereas Shih and Zhao (1997) obtain 0.9430 based on 500 simulations.
Example 5.7.4 Let π = 0.2, p_1* = 0.4, p_2* = 0.25 (yielding η* = 0.15, p* = 0.325), ω_1 = 2.54, ω_2 = 0.6, n* = 205, α = 0.05 and 1 − β = 0.90. Then

Δ_1² = 0.24, Δ_2² = 0.1875, B = 0.4684, C_1 = −122.222, C_2 = 137.778,
σ_1 = 84.8283, σ_2 = 0.6538,
ρ_i = 0.5968/(2ω_i)^{1/2}, i = 1, 2.

Thus ρ_1 = 0.2648, ρ_2 = 0.5448, and

ξ = 1 − ∫_{1.3746}^∞ φ(x)[Φ(−3.3698 − 0.2746x) − Φ(−7.4884 − 0.2746x)] dx
 − ∫_{1.3746}^∞ φ(x)[Φ(−0.6662 + 0.6496x) − Φ(−5.4022 + 0.6496x)] dx
 = 1 − 0.221 = 0.779.
Shih and Zhao (1997) obtain 0.9400 for the power based on 500 simulation runs. Thus the effective power is much lower than the specified power at the alternative. In the totally unblinded case, Govindarajulu (2004) shows that the type I error probability is under control and the power increases slightly. Thus it seems that the randomized response model adopted in the blinded case is not robust, and one should abandon the creation of strata A and B while still retaining the blindedness of the trial.
5.8
Problems
5.1-1 In the Robbins–Monro process set a_n = 1/n and α = 0. Further assume that Y_n(x_n) = x_n² − 2. Start with x_1 = 1, iterate using the recurrence x_{n+1} = x_n + (1/n)(2 − x_n²), and stop the first time two successive values of x_n coincide to the accuracy carried. [Hint: you should stop when x_n is close to √2 = 1.414.]
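The iteration of Problem 5.1-1 is easy to try numerically; a sketch:

```python
def robbins_monro(x1=1.0, steps=10000):
    """Robbins-Monro iteration of Problem 5.1-1:
    x_{n+1} = x_n + (1/n)(2 - x_n^2), which drifts toward sqrt(2)."""
    x = x1
    for n in range(1, steps + 1):
        x = x + (2 - x * x) / n
    return x
```

The first few iterates are x_2 = 2, x_3 = 1, x_4 = 4/3, x_5 = 25/18 ≈ 1.389, after which the sequence settles down near √2.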
5.1-2 Assume that M(x) = α + α_1(x − θ) where, without loss of generality, we can set α = 0 and θ = 0. That is, M(x) = α_1x and var{Y(x)} = σ². Now, since x_{n+1} = x_n − a_nY_n(x_n), taking the conditional expectation on both sides for given x_n and iterating the resultant expression,

(a) show that

E(x_{n+1}) = x_1 Π_{i=1}^n (1 − a_iα_1),

where x_1 denotes the initial value.

(b) Also, squaring x_{n+1}, taking conditional expectations for given x_n and iterating, obtain an expression for E(x_{n+1}²), and hence for var(x_{n+1}).

Further, if a_n = c/n, E(x_{n+1}) and E(x_{n+1}²) take on much simpler expressions. For details, see Govindarajulu (2001, p. 133).
5.3-1 For the following sequences of trial results, obtain estimates of LD50 (using the tables of Dixon (1970)):

(a) 0001010
(b) 01101011
(c) 001101010
5.4-1 Let h = 0.2, D = 0.62, γ = 0.90, n = 3, δ = 1.25 and σ = 1. If x̄_0 = 0.05, the rule (5.4.9) yields k* = 11. Suppose the data (numbers of responses at the successive dose levels x_{−1}, x_0, x_1, ...) are

0 0 0 0 0 0 0 0 1 1 2
1 2 3 3 3 3 3 3 3 3 3
Carry out the sequential Spearman–Karber estimation procedure and see whether you stop with K = 12. If you stop, provide an estimate of θ.

5.4-2 Carry out the sequential risk-efficient procedure for the following data (assume that θ = 0.625, σ = 0.5, h = 0.2, n = 3 and c = 0.0006), and if you stop obtain the estimate of θ.
5.5-1 Let X have the stated probability density function. Carry out a repeated significance test for H_0: θ ≤ 0 versus H_1: θ > 0 using the following data

0.8, −1.2, 1.7, 2.1, −0.6, 1.4, 1.9, −0.4, 1.5, 2.7

with N = 20, exp(a) = 10, exp(b) = 8.

5.5-2 Suppose for Lucky Hospital we have the following data for a certain disease:

               Dead   Alive   Sample size
Population I    20     80        100
Population II   10     70         80
Total           30    150        180

Suppose we wish to test H_0: p_1 = p_2 versus H_1: p_1 ≠ p_2; carry out the Mantel–Haenszel test for the above data.

5.5-3 In Example 5.5.3, after reparameterization, obtain explicit expressions for the statistics Z and V.
Chapter 6

Matlab Programs in Sequential Analysis

6.1 Introduction

The primary purpose of this supplement¹ is to help the user carry out sequential procedures with real data using a minimum of long-hand calculation, and to obtain decent numerical and graphical summaries of the sequential procedures. The manual contains a series of programs in Matlab (one of the most frequently used programming languages on university campuses), implementing the most well-known and widely utilized procedures of sequential analysis. Each program is essentially a sample that can be (and therefore should be) changed to fit the user's needs. The programs are accompanied by a short description of the procedure and a list of arguments, such as the values of the parameter under H_0 and H_1, the significance level of the test, the coverage probability of the confidence interval, etc. The following is a list of the procedures and the names of the corresponding Matlab functions (sorted in the order of their appearance in the textbook):
• Sequential probability ratio test (SPRT), sprt
• Restricted SPRT (Anderson's triangular test), restsprt
• Rushton's sequential t-test, rttest
• Sequential t-test, ttest

¹Dr. Alex Dmitrienko, while he was a graduate student in the Department of Statistics, University of Kentucky, helped me in preparing these computer programs in Matlab, for which I am very thankful to him.
• Sequential t²-test, tsqtest
• Hall's sequential test, hall
• Rank order SPRT, rankordersprt
• Stein's two-stage procedure (confidence interval), steinci
• Stein's two-stage test, steint
• Robbins' power one test, robbins
• Cox's sequential estimation procedure, cox
Each of these functions is saved in a file named functionname.m, where functionname is simply the name of the function (for example, sprt is saved in the file sprt.m). The source code of the functions, with descriptions and comments, is given in Section 2. In case you do not wish to type them in manually, you are welcome to download the functions from http://www.ms.uky.edu/~alexei/matlab. Furthermore, you can either prepare by yourself (see Section 3) or download a library of frequently used probability density functions (p.d.f.'s) or probability functions (p.f.'s). This is a list of the p.d.f.'s and p.f.'s available in Matlab and their Matlab names:
• Bernoulli, bernd
• Beta, betad
• Binomial, bind
• Cauchy, cauchyd
• Double exponential, doubled
• Exponential, expd
• Gamma, gammad
• Normal, normald
• Poisson, poissond
• t-distribution, td
• Uniform, uniformd
• Weibull, weibulld
6.1. INTRODUCTION
What follows is the list of Matlab files used by the functions described in this manual. You need to place them in your directory.

- decision.m
- output.m
- ttestbound.m
The source code (in case you decide to change these files): decision.m: function p=decision(cont);
% Produces caption for the graph if cont==-1 p=’Accept the hypothesis at stage %2.0f\n’; elseif cont==l p=’Reject the hypothesis at stage %2.0f\n’ ; else p=’The procedure didnj’t stop’; end; output.m: function p=output (c,str ,title,a,b,filename) ; %Saves the results (matrix c) in the file “filename” [l,m]=size(c);
fid=fopen(filename, ’w’);
titleI=[title ’\n\n’l ; fprintf (f id,titlel) ; fprintf (titlel) ; if 1==2 fprintf(fid, ’k=%2.0f s=%6.3f\n’,c); fprintf(’k=%2.0f s=%6.3f\n’,c); elseif 1==3 fprintf(fid, ’k=%2.0f q=%6.3f r=%6.3f\n’,c); fprintf (’k=%2.0f q=%6.3f r=%6.3f\nJ ,c); else fprintf(fid, ’k=%2.0f s=%6.3f cl=%6.Clf c2=%6.3f\n’,c); fprintf(’k=%a.Of s=%6.3f cl=%6.3f c2=%6.3f\n’,c); end ;
if (a”=O)&(b”=O) fprintf (fid,’\n\na=%6.3f fprintf(’\n\na=%6.3f b=%6.3f\n’,a,b); end; fclose (f id) ;
b=%6.3f\nJ ,a,b);
ttestbound.m:

function p=ttestbound(delta,a,n);
% Bounds for sequential t- and t-square-tests
mu=sqrt(2*n-3);
t0_table=[.002497 .00995 .02225 .03922 .060625 ...
          .08618 .1156 .1484 .1844 .2232]; % t0 for delta=.1,.2,...,1
if delta>1
  fprintf('The value of delta will be rounded down to 1');
  delta=1;
end;
i=floor(delta*10);
t0=t0_table(i);
c=1+t0^2;
t1=(3*delta^2/4+log(a)+0.25*log(c))/(t0-sqrt(c));
u=-(t0^3-6*t0)/24;
t2=(t1^2*(sqrt(c)-t0)-t0*t1+2*u/c)/(2*sqrt(c)*(t0-sqrt(c)));
p=-sqrt(2)*(t0*mu+t1/mu+t2/mu^3)/delta;
6.2 Sequential Procedures
6.2.1 Sequential Probability Ratio Test (SPRT)

Problem. Given a sequence of independent identically distributed observations X1, X2, ..., having p.d.f. (p.f.) f(x), we wish to test H0 : f(x) = f0(x) vs. H1 : f(x) = f1(x).

Procedure. Let

S_n = Σ_{i=1}^{n} [ln f1(X_i) − ln f0(X_i)], n ≥ 1.

At the nth stage, accept H0 if S_n ≤ b, reject H0 if S_n ≥ a, and continue sampling if b < S_n < a, where b = ln(β/(1 − α)), a = ln((1 − β)/α), and α and β are the error probabilities.

Arguments. The data set x, error probabilities alpha, beta, the name of the output file filename.

Example. Assume that we have Bernoulli data with parameter p and we wish to test H0 : p = 1/3 vs. H1 : p = 1/2. The following Matlab function carries out the SPRT for the data set x.
% SPRT
% Arguments
x=[1 1 1 0 0 1 0 1 1 1 0 1 1 1 1]; % Observations
alpha=.1; % Error probabilities
beta=.1;
filename='sprt.txt';
n=length(x); % Number of observations
a=log((1-beta)/alpha); % Upper and lower bounds
b=log(beta/(1-alpha));
s(1)=log(bernd(x(1),1/2))-log(bernd(x(1),1/3)); % Log likelihood ratio
i=2; cont=0;

% SPRT
while (i<=n)&(~cont),
  s(i)=s(i-1)+log(bernd(x(i),1/2))-log(bernd(x(i),1/3));
  if s(i)<b
    cont=-1;
  elseif s(i)>a
    cont=1;
  end;
  i=i+1;
end;
d=decision(cont); % Decision

% Plotting the path and the bounds
k=1:(i-1);
c1=linspace(b,b,i-1);
c2=linspace(a,a,i-1);
str=sprintf(d,i-1);
plot(k,s,k,c1,'--',k,c2,'--');
title('SPRT');
xlabel(str);
axis([1 n -5 5]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;s];
output(results,str,'SPRT',a,b,filename);
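For readers who prefer to check the arithmetic outside Matlab, the same decision rule can be sketched in a few lines of Python (the function names sprt and bern_pf below are ours, not part of the manual; the hypothesized values p0 = 1/3 and p1 = 1/2 match the Bernoulli example above):

```python
from math import log

def bern_pf(x, theta):
    # Bernoulli probability function: P(X = x) for x in {0, 1}
    return theta if x == 1 else 1.0 - theta

def sprt(data, alpha, beta, p0=1/3, p1=1/2):
    """Wald SPRT for H0: p = p0 vs. H1: p = p1 on 0/1 data.
    Returns (decision, stage): decision is -1 (accept H0),
    1 (reject H0), or 0 (no decision before the data ran out)."""
    a = log((1 - beta) / alpha)   # upper (rejection) bound
    b = log(beta / (1 - alpha))   # lower (acceptance) bound
    s = 0.0
    for stage, x in enumerate(data, start=1):
        s += log(bern_pf(x, p1)) - log(bern_pf(x, p0))
        if s <= b:
            return -1, stage
        if s >= a:
            return 1, stage
    return 0, len(data)
```

With α = β = 0.1 the bounds are ±ln 9 ≈ ±2.197, and each success moves the path up by ln 1.5 ≈ 0.406, so a run of six successes is already enough to reject H0.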
6.2.2 Restricted SPRT (Anderson's Test)
Problem. Same as in the SPRT.

Procedure. The procedure is similar to that of the SPRT; the only difference is that Anderson's (1960) test uses convergent bounds. Let S_n be defined as in Procedure 1. Then at the nth stage, accept H0 if S_n ≤ −c − dn, reject H0 if S_n ≥ c + dn, and continue sampling if −c − dn < S_n < c + dn, where c and d are respectively the intercept and the slope of the convergent bounds.

Arguments. The data set x, intercept and slope of the bounds c, d, the name of the output file filename.

Example. Assume that we have normal data with mean θ and variance 1. We wish to test H0 : θ = −1 vs. H1 : θ = 1. The following function carries out the restricted SPRT for the data set x.
% Restricted SPRT (Normal(-1,1) vs. Normal(1,1))
% Arguments
x=[-0.5 1.2 0.7 -1.4 0.7 0.4 -0.9 1.1 1.5]; % Observations
c=4; % Intercept
d=-0.3; % Slope
filename='restsprt.txt';
n=length(x); % Number of observations
s(1)=log(normald(x(1),1,1))-log(normald(x(1),-1,1)); % Log likelihood ratio
i=2; cont=0;

% SPRT
while (i<=n)&(~cont),
  s(i)=s(i-1)+log(normald(x(i),1,1))-log(normald(x(i),-1,1));
  if s(i)<-c-d*i
    cont=-1;
  elseif s(i)>c+d*i
    cont=1;
  end;
  i=i+1;
end;

% Plotting the path and the bounds
k=1:(i-1);
c1=-c-d*k; % Lower bound
c2=c+d*k; % Upper bound
d=decision(cont);
str=sprintf(d,i-1);
plot(k,s,k,c1,'--',k,c2,'--');
title('Normal(-1,1) vs. Normal(1,1): Restricted SPRT');
xlabel(str);
axis([1 n -5 5]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;s;c1;c2];
output(results,str,'Restricted SPRT',0,0,filename);
6.2.3 Rushton's Sequential t-test

Problem. Assume that X1, X2, ..., are normal (θ, σ²), where both θ and σ² are unknown, and we want to test H0 : θ = θ0 vs. H1 : θ − θ0 ≥ δσ, where δ is a specified number.

Procedure. Rushton (1950) proposed to use the following algorithm. Let

T_n = Σ_{i=1}^{n} (X_i − θ0) / [Σ_{i=1}^{n} (X_i − θ0)²]^{1/2},

q_n = (δT_n)²/4 + δT_n (n − 1)^{1/2},

r_n = (δT_n)²/4 + δT_n (n − 1)^{1/2} [1 − 1/(4(n − 1)) + (δT_n)²/(24(n − 1))].

Then at stage n, stop if q_n ≤ b or q_n ≥ a. If r_n satisfies the same inequality, make the appropriate decision (i.e. accept H0 if q_n ≤ b and r_n ≤ b, or reject H0 if q_n ≥ a and r_n ≥ a), and take one more observation otherwise. Here b = ln(β/(1 − α)), a = ln((1 − β)/α), and α and β are error probabilities.

Arguments. The data set x, error probabilities alpha, beta, delta=δ, theta=θ0, the name of the output file filename.

Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to test H0 : θ = 5 versus H1 : θ − 5 ≥ 0.2σ. The following function carries out Rushton's t-test for the data set x.
% Rushton's sequential t-test
% Arguments
x=[5.4 5.3 5.2 4.5 5.0 5.4 3.8 5.9 5.4 5.1 5.4 4.1 ...
   5.2 4.8 4.6 5.7 5.9 5.8]; % Observations
alpha=.05; % Error probabilities
beta=.05;
theta=5;
delta=0.2;
filename='rttest.txt';
n=length(x); % Number of observations
a=log((1-beta)/alpha); % Upper and lower bounds
b=log(beta/(1-alpha));
i=2; cont=0; q(1)=0; r(1)=0;
sum=x(1);
sumsq=(x(1)-theta)^2;

% Sequential t-test
while (i<=n)&(~cont),
  sum=sum+x(i);
  sumsq=sumsq+(x(i)-theta)^2;
  T=(sum-i*theta)/sqrt(sumsq); % T-statistic
  % Rushton's statistics
  q(i)=0.25*(delta*T)^2+delta*T*sqrt(i-1);
  r(i)=0.25*(delta*T)^2+delta*T*sqrt(i-1)*(1-1/(4*(i-1)) ...
       +(delta*T)^2/(24*(i-1)));
  if (q(i)<b)&(r(i)<b)
    cont=-1;
  elseif (q(i)>a)&(r(i)>a)
    cont=1;
  end;
  i=i+1;
end;
d=decision(cont); % Decision

% Plotting the path and the bounds
k=1:(i-1);
c1=linspace(b,b,i-1);
c2=linspace(a,a,i-1);
str=sprintf(d,(i-1));
plot(k,q,k,r,k,c1,'--',k,c2,'--');
title('Rushton''s sequential t-test');
xlabel(str);
axis([1 n -5 5]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;q;r];
output(results,str,'Rushton''s sequential t-test',a,b,filename);
6.2.4 Sequential t-test
Problem. Same as in Rushton's sequential t-test (Subsection 2.3).

Procedure. Govindarajulu and Howard (1989) obtained the following modification of the sequential t-test. Let T_n be defined as in Subsection 2.3. At stage n, accept H0 if T_n ≤ B_n or reject H0 if T_n ≥ A_n. Here the bound B_n is

B_n = −√2 (t0 μ + t1/μ + t2/μ³)/δ,

where t0 is the unique solution of an equation in δ (its values for δ = 0.1, 0.2, ..., 1.0 are tabulated in ttestbound.m), and

t1 = (3δ²/4 + ln B + (ln c)/4) / (t0 − √c),

t2 = (t1²(√c − t0) − t0 t1 + 2u(t0)/c) / (2√c (t0 − √c)),

μ = √(2n − 3), c = 1 + t0², u(t) = −(t³ − 6t)/24.

The formula for A_n is the same, except A replaces B. Further, B = β/(1 − α), A = (1 − β)/α, and α and β are error probabilities.

Arguments. The data set x, error probabilities alpha, beta, delta=δ, theta=θ0, the name of the output file filename.

Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to test H0 : θ = 5 versus H1 : θ − 5 ≥ 0.2σ. The following Matlab function carries out the sequential t-test for the data set x.
% Sequential t-test
% Arguments
x=[5.4 5.3 5.2 4.5 5.0 5.4 3.8 5.9 5.4 5.1 5.4 4.1 ...
   5.2 4.8 4.6 5.7 5.9 5.8]; % Observations
alpha=.05; % Error probabilities
beta=.05;
theta=5;
delta=0.2;
filename='ttest.txt';
n=length(x); % Number of observations
a=(1-beta)/alpha; % Upper and lower bounds
b=beta/(1-alpha);
i=2; t(1)=0;
cont=0;
sum=x(1);
s=(x(1)-theta)^2;

% Sequential t-test
while (i<=n)&(~cont),
  sum=sum+x(i);
  s=s+(x(i)-theta)^2;
  t(i)=(sum-i*theta)/sqrt(s); % T-statistic
  c1(i)=ttestbound(delta,b,i); % Lower bound
  c2(i)=ttestbound(delta,a,i); % Upper bound
  if t(i)<c1(i)
    cont=-1;
  elseif t(i)>c2(i)
    cont=1;
  end;
  i=i+1;
end;
d=decision(cont); % Decision

% Plotting the path and the bounds
k=1:(i-1);
str=sprintf(d,(i-1));
plot(k,t,k,c1,'--',k,c2,'--');
title('Sequential t-test');
xlabel(str);
axis([1 n -15 15]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;t;c1;c2];
output(results,str,'Sequential t-test',0,0,filename);
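The statistic driving both this test and Rushton's version is T_n = (Σ X_i − nθ0)/[Σ(X_i − θ0)²]^{1/2}, which the loop above accumulates one observation at a time. A minimal Python sketch (the helper name t_statistic is ours):

```python
from math import sqrt

def t_statistic(x, theta0):
    # T_n = (sum of x_i - n*theta0) / sqrt(sum of (x_i - theta0)^2)
    n = len(x)
    num = sum(x) - n * theta0
    den = sqrt(sum((xi - theta0) ** 2 for xi in x))
    return num / den
```

By the Cauchy-Schwarz inequality |T_n| ≤ √n, which is why the sample path stays inside a slowly widening band and the plot window [−15, 15] is adequate for moderate n.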
6.2.5 Sequential t²-test
Problem. Assume, as in Rushton's t-test (Subsection 2.3), that X1, X2, ..., are normal (θ, σ²) and both θ and σ² are unknown. Now we wish to test H0 : θ = θ0 versus the two-sided alternative H1 : |θ − θ0| ≥ δσ, where δ is a specified number.

Procedure. Govindarajulu and Howard (1989) also proposed a modification of the sequential t²-test. If T_n is defined as before in Subsection 2.3, then at the nth stage, accept H0 if |T_n| ≤ B*_n or reject H0 if |T_n| ≥ A*_n, where the bounds B*_n and A*_n are defined exactly as B_n and A_n in Subsection 2.4, except that 2B replaces B and 2A replaces A. Again, B = β/(1 − α), A = (1 − β)/α, and α and β are error probabilities.

Arguments. The data set x, error probabilities alpha, beta, delta=δ, theta=θ0, the name of the output file filename.

Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to test H0 : θ = 5 versus H1 : |θ − 5| ≥ 0.2σ. The following function carries out the sequential t²-test for the data set x.

% Sequential tsquare-test
% Arguments
x=[5.4 5.3 5.2 4.5 5.0 5.4 3.8 5.9 5.4 5.1 5.4 4.1 ...
   5.2 4.8 4.6 5.7 5.9 5.8]; % Observations
alpha=.05; % Error probabilities
beta=.05;
theta=5;
delta=0.2;
filename='tsqtest.txt';
n=length(x); % Number of observations
a=(1-beta)/alpha; % Upper and lower bounds
b=beta/(1-alpha);
i=2; t(1)=0;
cont=0;
sum=x(1);
s=(x(1)-theta)^2;

% Sequential tsquare-test
while (i<=n)&(~cont),
  sum=sum+x(i);
  s=s+(x(i)-theta)^2;
  t(i)=abs((sum-i*theta)/sqrt(s)); % T-statistic
  c1(i)=ttestbound(delta,2*b,i); % Lower bound
  c2(i)=ttestbound(delta,2*a,i); % Upper bound
  if t(i)<c1(i)
    cont=-1;
  elseif t(i)>c2(i)
    cont=1;
  end;
  i=i+1;
end;
d=decision(cont); % Decision

% Plotting the path and the bounds
k=1:(i-1);
str=sprintf(d,(i-1));
plot(k,t,k,c1,'--',k,c2,'--');
title('Sequential tsquare-test');
xlabel(str);
axis([1 n -15 15]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;t;c1;c2];
output(results,str,'Sequential tsquare-test',0,0,filename);
6.2.6 Hall's Sequential Test

Problem. Assume again that X1, X2, ..., are normal (θ, σ²) and both θ and σ² are unknown. We want to test H0 : θ = 0 against H1 : θ = δ (δ is specified).

Procedure. Hall (1962) suggested the following two-stage procedure. Take a preliminary sample of size m (m ≥ 2) and compute the sample mean X̄_m and the sample variance s²_m from this sample. Then define

a_m = −ln α + (ln α)²/(m − 1),  b_m = ln β − (ln β)²/(m − 1).

Now for all n ≥ m + 1, let r_n = nδ(X̄_n − δ/2)/s²_m, and at stage n ≥ m + 1, accept H0 if r_n ≤ b_m or reject H0 if r_n ≥ a_m. Here α and β are error probabilities.

Arguments. The data set x, error probabilities alpha, beta, delta=δ, the size of the preliminary sample m0, the name of the output file filename.

Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to test H0 : θ = 0 versus H1 : θ = 0.2. The following function carries out Hall's sequential test for the data set x.
% Hall's sequential test
% Arguments
x=[0.4 0.3 0.2 0.0 0.4 0.9 0.4 0.1 0.4 0.2 0.7 0.9 0.0 0.0]; % Observations
alpha=.05; % Error probabilities
beta=.05;
delta=0.2;
m0=9; % Size of the pilot sample
filename='hall.txt';
n=length(x); % Number of observations
s=std(x(1:m0))^2; % Sample variance
r(m0)=sum(x(1:m0))-m0*delta/2;
i=m0+1; cont=0;
a=s*(-log(alpha)+(log(alpha))^2/(m0-1))/delta; % Upper and lower bounds
b=s*(log(beta)-(log(beta))^2/(m0-1))/delta;

% Second sample
while (i<=n)&(~cont),
  r(i)=r(i-1)+x(i)-delta/2;
  if r(i)<b
    cont=-1;
  elseif r(i)>a
    cont=1;
  end;
  i=i+1;
end;
r(m0)=0;
d=decision(cont);

% Plotting the path and the bounds
k=1:(i-1);
c1=linspace(b,b,i-1);
c2=linspace(a,a,i-1);
str=sprintf(d,i-1);
plot(k,r,k,c1,'--',k,c2,'--');
title('Hall''s sequential test');
xlabel(str);
axis([1 n -3 3]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;r];
output(results,str,'Hall''s sequential test',a,b,filename);
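The first-stage computation of Hall's bounds is easy to verify by hand. The Python sketch below (hall_bounds is our name) reproduces the a and b lines of the program, i.e. the bounds for the cumulative sums Σx_i − nδ/2 after scaling by s²_m/δ:

```python
from math import log

def hall_bounds(alpha, beta, delta, s2, m):
    # Bounds for the running sum S_n - n*delta/2, scaled by s2/delta,
    # where s2 is the first-stage sample variance and m the pilot size.
    a = s2 * (-log(alpha) + log(alpha) ** 2 / (m - 1)) / delta  # rejection
    b = s2 * (log(beta) - log(beta) ** 2 / (m - 1)) / delta     # acceptance
    return b, a
```

When α = β the two bounds are symmetric about zero, as in the example above (α = β = 0.05, m = 9).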
6.2.7 Stein's Two-Stage Procedure (Confidence Interval)
Problem. Assume that X1, X2, ..., are normal (θ, σ²) with both θ and σ² unknown. The goal is to estimate θ by a fixed-width confidence interval having the specified coverage probability 1 − α.

Procedure. Stein (1945) proposed the following two-stage procedure for constructing the fixed-width confidence interval for θ. Take a preliminary sample of size n0 (n0 ≥ 2) and calculate the sample variance s² from this sample. Let t(n0 − 1, 1 − α/2) be the two-sided 100α% critical point of the t-distribution with n0 − 1 degrees of freedom. Then set z = [d/t(n0 − 1, 1 − α/2)]², where d is the half-width of the confidence interval, and n = max([s²/z] + 1, n0). If n > n0, draw an additional sample of n − n0 observations and estimate θ by (X̄_n − d, X̄_n + d).

Arguments. The data set x, the coverage probability alpha, the half-width d, the size of the preliminary sample n0, the 100α% quantile of the t-distribution with n0 − 1 degrees of freedom quan, the name of the output file filename.

Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to construct a 90% confidence interval of width 6. The following Matlab function carries out Stein's procedure for the data set x.

% Stein's two-stage procedure
% Arguments
x=[10.5 19.5 20.5 23.1 24.3 24.3 15.6 24.6 22.2 21.9 21.3];
alpha=.1; % 1-Coverage probability
d=3; % Half-width of the confidence interval
n0=10; % Size of the pilot sample
quan=1.8125;
filename='steinci.txt';
l=length(x); % Number of observations
z=(d/quan)^2; % Parameter
y=x(1:n0);
s=std(y)^2; % Sample variance
n=max(floor(s/z)+1,n0); % Total sample size
if n>l
  disp('Stein''s procedure did not stop.');
else
  xbar=mean(x(1:n));
  % Saving the result in a file
  fid=fopen(filename,'w');
  fprintf(fid,'Stein''s two-stage procedure\n\n');
  fprintf(fid,'Confidence interval=(%6.3f, %6.3f)\n',xbar-d,xbar+d);
  fprintf(fid,'Variance=%6.3f\n',s);
  fclose(fid);
  fprintf('Stein''s two-stage procedure\n\n');
  fprintf('Confidence interval=(%6.3f, %6.3f)\n',xbar-d,xbar+d);
  fprintf('Variance=%6.3f\n',s);
end;
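The sample-size rule n = max([s²/z] + 1, n0) with z = [d/t(n0 − 1, 1 − α/2)]² is the heart of the procedure. A Python sketch (stein_n is our name; the constants in the usage below mirror the example above):

```python
from math import floor

def stein_n(s2, d, t_quantile, n0):
    # Stein's two-stage total sample size:
    # z = (d / t)^2, n = max(floor(s2 / z) + 1, n0)
    z = (d / t_quantile) ** 2
    return max(floor(s2 / z) + 1, n0)
```

The larger the first-stage variance s², the larger the required total sample; the pilot size n0 acts as a floor, so stein_n(10.0, 3, 1.8125, 10) stays at 10 while stein_n(60.0, 3, 1.8125, 10) grows to 22.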
6.2.8 Stein's Two-Stage Test
Problem. Under the assumptions of Subsection 2.7, we want to test H0 : θ < θ0 vs. H1 : θ ≥ θ0.

Procedure. The sequential test proposed by Stein (1945) for testing H0 is similar to the procedure described in Subsection 2.7. First, take a preliminary sample of size n0 (n0 ≥ 2) and calculate the sample variance s². Then, let t(n0 − 1, 1 − α) be the 100α% critical point of the t-distribution with n0 − 1 degrees of freedom. Finally, for any positive z, define n = max([s²/z] + 1, n0). If n > n0, draw an additional sample of n − n0 observations, and accept H0 if T < t(n0 − 1, 1 − α) or reject H0 if T ≥ t(n0 − 1, 1 − α), where T = √n(X̄_n − θ0)/s.

Arguments. The data set x, the probability of type I error alpha, theta=θ0, the size of the preliminary sample n0, the parameter z, the 100α% quantile of the t-distribution with n0 − 1 degrees of freedom quan, the name of the output file filename.

Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to test H0 : θ < 20 vs. H1 : θ ≥ 20. The following Matlab function carries out Stein's sequential test for the data set x.

% Stein's sequential test
% Arguments
x=[10.5 19.5 20.5 23.1 24.3 24.3 15.6 24.6 22.2 21.9 21.3];
alpha=.1;
theta=20;
n0=10; % Size of the pilot sample
z=1;
quan=1.8125;
filename='steint.txt';
l=length(x); % Number of observations
y=x(1:n0);
s=std(y)^2; % Sample variance
n=max(floor(s/z)+1,n0); % Total sample size
if n>l
  disp('Stein''s test did not stop.');
else
  xbar=mean(x(1:n));
  T=sqrt(n)*(xbar-theta)/sqrt(s);
  if T<quan
    str='Accept the hypothesis\n';
  else
    str='Reject the hypothesis\n';
  end;
  % Saving the result in a file
  fid=fopen(filename,'w');
  fprintf(fid,'Stein''s sequential test\n\n');
  fprintf(fid,str);
  fprintf(fid,'Variance=%6.3f\n',s);
  fclose(fid);
  fprintf('Stein''s sequential test\n\n');
  fprintf(str);
  fprintf('Variance=%6.3f\n',s);
end;
6.2.9 Robbins' Power One Test

Problem. Suppose that X1, X2, ..., are normal (θ, 1). We wish to test H0 : θ < 0 against H1 : θ > 0.

Procedure. Darling and Robbins (1967a) suggested the following sequential test. Let

S_n = Σ_{i=1}^{n} X_i,  c_n = {(n + m)[a² + ln(1 + n/m)]}^{1/2},

where a² and m are positive constants. At the nth stage, accept H0 if S_n ≤ −c_n, reject H0 if S_n ≥ c_n, and continue sampling otherwise.

Arguments. The data set x, constants asq=a², m=m, the name of the output file filename.

Example. Assume that we have normal data with mean θ and variance 1. The following Matlab function carries out Robbins' sequential test for the data set x with a² = 9 and m = 1.
% Robbins' power one sequential test
% Arguments
x=[5.4 5.3 5.2 4.5 5.0 5.4 3.8 5.9 5.4 5.1]; % Observations
y=(x-4.5)/0.5; % Standardized observations
asq=9;
m=1;
filename='robbins.txt';
n=length(x); % Number of observations
s(1)=y(1);
c1(1)=-sqrt((1+m)*(asq+log(1+1/m)));
c2(1)=-c1(1);
i=2; cont=0;

% Robbins' test
while (i<=n)&(~cont),
  c1(i)=-sqrt((i+m)*(asq+log(1+i/m)));
  c2(i)=-c1(i);
  s(i)=s(i-1)+y(i);
  if s(i)<c1(i)
    cont=-1;
  elseif s(i)>c2(i)
    cont=1;
  end;
  i=i+1;
end;
d=decision(cont);

% Plotting the path and the bounds
k=1:(i-1);
str=sprintf(d,i-1);
plot(k,s,k,c1,'--',k,c2,'--');
title('Robbins'' sequential test');
xlabel(str);
axis([1 n -15 15]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;s;c1;c2];
output(results,str,'Robbins'' sequential test',0,0,filename);
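The boundary c_n = {(n + m)[a² + ln(1 + n/m)]}^{1/2} can be sketched directly (robbins_bound is our name, not from the manual). Note that c_n grows roughly like (n ln n)^{1/2}, so c_n/n → 0 and any nonzero drift in the sample sums eventually forces a crossing:

```python
from math import log, sqrt

def robbins_bound(n, asq, m):
    # c_n = sqrt((n + m) * (asq + log(1 + n/m)))
    return sqrt((n + m) * (asq + log(1 + n / m)))
```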
6.2.10 Rank Order SPRT

Problem. Let (X1, Y1), (X2, Y2), ..., be independent identically distributed bivariate variables with marginal distributions F and G. We wish to test H0 : F(z) = G(z) for all z vs. H1 : F^δ(z) = G(z) for all z and some δ ≠ 1.

Procedure. Savage and Sethuraman (1966) constructed a rank order SPRT for this problem. At stage n, let W1, ..., W_2n denote the combined sample of X1, ..., X_n and Y1, ..., Y_n. Also, let F_n(z) and G_n(z) be respectively the empirical distribution functions of X1, ..., X_n and Y1, ..., Y_n. Then

S_n = δ^n (2n)! / Π_{j=1}^{2n} [n F_n(W_j) + δ n G_n(W_j)]

is the likelihood ratio for H1 and H0. Therefore, we can now carry out the SPRT and at stage n accept H0 if S_n ≤ B, reject H0 if S_n ≥ A, and take one more observation if neither of these inequalities is satisfied. Here B = β/(1 − α), A = (1 − β)/α, and α and β are error probabilities.

Arguments. The two data sets x, y, error probabilities alpha, beta, the name of the output file filename.

Example. The following Matlab function carries out the rank order SPRT (with δ = 2) for the data sets x, y with α = .05 and β = .05.
% Rank order SPRT
% Arguments
x=[5.4 5.3 5.2 4.5 5.0 5.4 3.8 5.9 5.1 5.4 4.1 5.7 5.9 5.8 5.0];
y=[5.2 4.7 4.8 4.9 5.9 5.2 4.8 4.9 5.0 5.1 4.5 5.4 4.9 4.7 4.8];
alpha=.05;
beta=.05;
filename='ranksprt.txt';
n=length(x); % Number of observations
a=log((1-beta)/alpha); % Upper and lower bounds
b=log(beta/(1-alpha));
i=1; cont=0;

% SPRT
while (i<=n)&(~cont),
  w=[x(1:i) y(1:i)]; % Combined sample
  for j=1:(2*i),
    f(j)=0; g(j)=0; % Empirical distribution functions
    for k=1:i,
      if x(k)<=w(j)
        f(j)=f(j)+1;
      end;
      if y(k)<=w(j)
        g(j)=g(j)+1;
      end;
    end;
  end;
  s(i)=log((2^i)*prod(1:(2*i))/prod(f+2*g)); % Log likelihood ratio (delta=2)
  if s(i)<b
    cont=-1;
  elseif s(i)>a
    cont=1;
  end;
  i=i+1;
end;
d=decision(cont);

% Plotting the path and the bounds
k=1:(i-1);
c1=linspace(b,b,i-1);
c2=linspace(a,a,i-1);
str=sprintf(d,i-1);
plot(k,s,k,c1,'--',k,c2,'--');
title('Rank order SPRT');
xlabel(str);
axis([1 n -5 5]); % Setting the scaling for the axes

% Saving the result in a file
results=[k;s];
output(results,str,'Rank order SPRT',a,b,filename);
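The likelihood ratio is easy to compute directly from the two samples. The Python sketch below (rank_order_log_lr is our name) returns ln S_n for a general δ, using the counts n F_n(W_j) and n G_n(W_j) exactly as the double loop in the program does:

```python
from math import log, lgamma

def rank_order_log_lr(x, y, delta=2.0):
    # ln S_n with S_n = delta^n (2n)! / prod_j [nF_n(W_j) + delta*nG_n(W_j)]
    n = len(x)
    s = n * log(delta) + lgamma(2 * n + 1)  # ln(delta^n) + ln((2n)!)
    for wj in list(x) + list(y):            # combined sample W_1,...,W_2n
        f = sum(1 for xi in x if xi <= wj)  # n * F_n(w_j)
        g = sum(1 for yi in y if yi <= wj)  # n * G_n(w_j)
        s -= log(f + delta * g)
    return s
```

As a sanity check: for a single pair with X1 < Y1 and δ = 2, ln S_1 = ln(4/3) > 0, leaning toward H1; reversing the pair gives ln(2/3) < 0.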
6.2.11 Cox's Sequential Estimation Procedure

Problem. Suppose that X_k, k ≥ 1, are normal with mean θ and variance σ². We wish to estimate θ with a given standard error a.

Procedure. The two-stage estimation procedure outlined below was proposed by Cox (1952b). Draw a preliminary sample of size n0 (n0 ≥ 2) and calculate the sample variance s². Then, let n = [s²(1 + 2/n0)/a²]. If n > n0, take an additional sample of n − n0 observations and estimate θ by X̄_n. If n is smaller than n0, estimate θ by X̄_{n0}.

Arguments. The data set x, the standard error a, the size of the preliminary sample n0, the name of the output file filename.

Example. Assume that we have normal data with mean θ and unknown variance σ². We wish to estimate θ with standard error a = 0.5. The following Matlab function carries out Cox's sequential procedure for the data set x.
% Cox's sequential procedure
% Arguments
x=[10.5 19.5 20.5 23.1 24.3 24.3 15.6 24.6 22.2 21.9 21.3];
a=0.5;
n0=10; % Size of the pilot sample
filename='cox.txt';
l=length(x); % Number of observations
y=x(1:n0);
s=std(y)^2; % Sample variance
n=floor(s*(1+2/n0)/a^2); % Total sample size
if n>l
  disp('Cox''s procedure did not stop.');
else
  if n>n0
    xbar=mean(x(1:n));
  else
    xbar=mean(x(1:n0));
  end;
  % Saving the result in a file
  fid=fopen(filename,'w');
  fprintf(fid,'Cox''s sequential procedure\n\n');
  fprintf(fid,'Estimate= %6.3f\n',xbar);
  fprintf(fid,'Variance=%6.3f\n',s);
  fclose(fid);
  fprintf('Cox''s sequential procedure\n\n');
  fprintf('Estimate= %6.3f\n',xbar);
  fprintf('Variance=%6.3f\n',s);
end;
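The sample-size rule n = [s²(1 + 2/n0)/a²] is a one-liner; the sketch below (cox_n is our name) mirrors the n= line of the program, with the factor (1 + 2/n0) inflating the naive s²/a² to compensate for estimating σ² from the pilot sample:

```python
from math import floor

def cox_n(s2, a, n0):
    # n = floor(s2 * (1 + 2/n0) / a^2)
    return floor(s2 * (1 + 2.0 / n0) / a ** 2)
```

Halving the target standard error a quadruples the required sample size, since a enters only through a².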
6.3 Distribution

In this section you can find the source code of the Matlab functions generating the p.d.f.'s or p.f.'s listed in the introduction. Notice that each of these functions must be saved in an m-extension file whose name coincides with the name of the function, e.g. the function bernd must be placed in bernd.m.
function a=bernd(x,theta);
% Bernoulli p.f.
if x==1
  a=theta;
else
  a=1-theta;
end;

function a=betad(x,r,s);
% Beta p.d.f.
a=gamma(s+r)*exp((r-1)*log(x)+(s-1)*log(1-x))/(gamma(r)*gamma(s));

function a=bind(x,p,n);
% Binomial p.f.
a=gamma(n+1)*exp(x*log(p)+(n-x)*log(1-p))/(gamma(x+1)*gamma(n-x+1));

function a=cauchyd(x,theta);
% Cauchy p.d.f.
a=theta/(pi*(theta^2+x^2));

function a=doubled(x);
% Double exponential p.d.f.
a=exp(-abs(x))/2;

function a=expd(x,theta);
% Exponential p.d.f.
a=theta*exp(-theta*x);

function a=gammad(x,p);
% Gamma p.d.f.
a=exp(-x+(p-1)*log(x))/gamma(p);

function a=normald(x,theta,sigmasq);
% Normal p.d.f.
a=exp(-(x-theta)^2/(2*sigmasq))/sqrt(2*pi*sigmasq);

function a=poissond(x,lambda);
% Poisson p.f.
a=exp(-lambda+x*log(lambda))/gamma(x+1);

function a=td(x,n);
% t-distribution p.d.f.
a=gamma((n+1)/2)/(sqrt(pi*n)*gamma(n/2)*exp(((n+1)/2)*log(1+x^2/n)));

function a=uniformd(x,l,u);
% Uniform p.d.f.
if (l<=x)&(x<=u)
  a=1/(u-l);
else
  a=0;
end;

function a=weibulld(x,m);
% Weibull p.d.f.
a=m*exp((m-1)*log(x)-exp(m*log(x)));
Referenced Journals

The following is a list of the references that are cited in the text. The relevant page numbers on which the references are cited are given at the end of each reference in brackets. The references are arranged alphabetically according to the authors' names. Multiple-authored articles are listed according to the names of the primary authors only. The following abbreviations of journal titles are used.
List of the Abbreviated Journal Titles

Acta Math. Acad. Sci. Hungar. - Acta Mathematica Academiae Scientiarum Hungaricae
Ann. Inst. Statist. Math. - Annals of the Institute of Statistical Mathematics (Tokyo)
Ann. Math. Statist. - The Annals of Mathematical Statistics
Ann. Statist. - The Annals of Statistics
Austral. J. Statist. - Australian Journal of Statistics
Bull. Amer. Math. Soc. - Bulletin of the American Mathematical Society
Bull. Intern. Statist. Inst. - Bulletin of the International Statistical Institute
Bell Syst. Tech. J. - Bell System Technical Journal
Calcutta Statist. Assoc. Bull. - Calcutta Statistical Association Bulletin (Calcutta)
Commun. Statist. Theor. Method - Communications in Statistics - Theory and Methods
Duke Math. J. - Duke Mathematical Journal
J. Amer. Statist. Assoc. - Journal of the American Statistical Association
J. Appl. Prob. - Journal of Applied Probability
J. Austral. Math. Soc. - The Journal of the Australian Mathematical Society
J. Ind. Statist. Assoc. - Journal of the Indian Statistical Association
J. Roy. Statist. Soc. Ser. A (or B) - Journal of the Royal Statistical Society, Series A (or B)
Philos. Trans. Roy. Soc. Ser. A - Philosophical Transactions of the Royal Society, Series A (London)
Proc. Cambridge Philos. Soc. - Proceedings of the Cambridge Philosophical Society (Cambridge, England)
Proc. Berkeley Symp. Math. Stat. Prob. nth (n = 1, 2, ..., 6) - Proceedings of the nth (n = 1, 2, ..., 6) Berkeley Symposium on Mathematical Statistics and Probability (Berkeley, California)
Proc. Nat. Acad. Sci. U.S.A. - Proceedings of the National Academy of Sciences of the United States of America
Rep. Statist. Appl. Res. Un. Japan Sci. Engrs. - Reports of Statistical Application Research, Union of Japanese Scientists and Engineers (Tokyo)
Rev. Inst. Intern. Statist. - Review of the International Statistical Institute (Hague)
Sankhya Ser. A (or B) - Sankhya, The Indian Journal of Statistics, Series A (or B)
Soc. Indust. and Appl. Math. - Society for Industrial and Applied Mathematics
South African Statist. J. - South African Statistical Journal
Statist. Med. - Statistics in Medicine
Theor. Prob. Appl. - Theory of Probability and its Applications
Z. Wahrscheinlichkeitstheorie und Verw. Gebiete - Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete (Berlin)
Bibliography

[1] Aivazyan, S. A. (1959). A comparison of the optimal properties of the Neyman-Pearson and the Wald sequential probability ratio test. Theor. Probability Appl. 4 86-93. [105]

[2] Albert, A. (1966). Fixed size confidence ellipsoids for linear regression parameters. Ann. Math. Statist. 37 1602-1630. [206, 207]

[3] Anderson, T. W. (1960). A modification of the sequential probability ratio test to reduce the sample size. Ann. Math. Statist. 31 165-197. [47, 48, 49, 75, 271]

[4] Anderson, T. W. and Friedman, M. (1960). A limitation of the optimum property of the sequential probability ratio test. Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling (Ed. Olkin et al.). Stanford University Press No. 6 57-59. [46, 136]

[5] Anscombe, F. J. (1949). Large-sample theory of sequential estimation. Biometrika 36 455-458. [182]

[6] Anscombe, F. J. (1952). Large-sample theory of sequential estimation. Proc. Cambridge Philos. Soc. 48 600-607. [182, 183, 187, 189, 200, 250, 252]

[7] Anscombe, F. J. (1953). Sequential estimation. J. Roy. Statist. Soc. Ser. B. 15 1-29. [200]

[8] Armitage, P. (1947). Some sequential tests of Student's hypothesis. J. Roy. Statist. Soc. Suppl. 9 250-263. [90]

[9] Armitage, P. (1950). Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis. J. Roy. Statist. Soc. Ser. B. 12 137-144. [90, 95, 96]

[10] Armitage, P. (1957). Restricted sequential procedures. Biometrika 44 9-26. [47, 49, 58, 246]
[11] Armitage, P. (1975). Sequential Medical Trials, 2nd Ed., Oxford, Blackwell Scientific Publications. [245]

[12] Aroian, L. A. (1968). Sequential analysis - direct method. Technometrics 10 125-132. [44]

[13] Bahadur, R. R. (1954). Sufficiency and statistical decision functions. Ann. Math. Statist. 25 423-462. [145]

[14] Bahadur, R. R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37 557-580. [184, 212]

[15] Baker, A. G. (1950). Properties of some tests in sequential analysis. Biometrika 37 334-346. [73]

[16] Barnard, G. A. (1952). The frequency justification of certain sequential tests. Biometrika 39 144-150. [65, 76, 85]

[17] Barnard, G. A. (1954). Sampling inspection and statistical decisions. J. Roy. Statist. Soc. Ser. B. 16 151-174. [106, 107]

[18] Barnard, G. A. (1969). Practical applications of tests with power one. Bull. Intern. Statist. Inst. 43, parts 1 and 2, 389-393. [121]

[19] Bartlett, M. S. (1937). Some examples of statistical methods of research in agriculture and applied biology. J. Roy. Statist. Soc. Suppl. 4 137-170. [171]

[20] Bartlett, M. S. (1946). The large sample theory of sequential tests. Proc. Cambridge Philos. Soc. 42 239-244. [49, 81, 84, 85]

[21] Bechhofer, R. (1960). A note on the limiting relative efficiency of the Wald sequential probability test. J. Amer. Statist. Assoc. 55 660-663. [105]

[22] Berk, R. H. (1973). Some asymptotic aspects of sequential analysis. Ann. Statist. 1 1126-1138. [51, 53]

[23] Bhattacharyya, A. (1946). On some analogues of the amount of information and their use in statistical estimation. Sankhya 8 1-14. [157]

[24] Bhattacharya, P. K. and Mallik, A. (1973). Asymptotic normality of the stopping times of some sequential procedures. Ann. Statist. 1 1203-1211. [189]

[25] Billard, L. and Vagholkar, M. D. (1969). A sequential procedure for testing a null hypothesis against a two-sided alternative hypothesis. J. Roy. Statist. Soc. Ser. B. 31 285-294. [97, 98]
[26] Birnbaum, Z. W. (1956). On a use of the Mann-Whitney statistics. Proc. Third Berkeley Symp. Math. Stat. Prob. Univ. of California Press 13-17. [216]

[27] Birnbaum, A. and Healy, W. C. Jr. (1960). Estimates with prescribed variance based on two-stage sampling. Ann. Math. Statist. 31 662-676. [220]

[28] Blackwell, D. (1947). Conditional expectation and unbiased sequential estimation. Ann. Math. Statist. 18 105-110. [144]

[29] Blackwell, D. and Girshick, M. A. (1947). A lower bound for the variance of some unbiased sequential estimates. Ann. Math. Statist. 18 277-280. [157]

[30] Blum, J. (1954). Multidimensional stochastic approximation methods. Ann. Math. Statist. 25 737-744. [228]

[31] Bradley, R. A. (1967). Topics in rank order statistics. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1 593-607, University of California Press. [127]

[32] Bradley, R. A., Merchant, S. D. and Wilcoxon, F. (1966). Sequential rank test II; modified two-sample procedure. Technometrics 8 615-623. [127]

[33] Breslow, N. (1969). On large sample sequential analysis with applications to survivorship data. J. Appl. Prob. 6 261-274. [85, 86]

[34] Burman, J. P. (1946). Sequential sampling formulae for a binomial population. J. Roy. Statist. Soc. Suppl. 8 98-103. [109]

[35] Chernoff, H. (1956). Large sample theory: parametric case. Ann. Math. Statist. 27 1-22. [105]

[36] Chernoff, H. (1960). Sequential test for the mean of a normal distribution. Proc. Fourth Berkeley Symp. Math. Statist. Prob. 1 79-92, University of California Press. [114]

[37] Chernoff, H. (1968). Optimal stochastic control. Sankhya Ser. A 30 221-252. [114, 115, 116]

[38] Chow, Y. S. and Robbins, H. (1965). On the asymptotic theory of fixed width sequential confidence intervals for the mean. Ann. Math. Statist. 36 457-462. [190, 193, 196, 197, 198, 199, 202, 204, 215, 219]

[39] Chow, Y. S. and Yu, K. F. (1981). The performance of a sequential procedure for the estimation of the mean. Ann. Statist. 9 184-189. [202, 203]

[40] Chow, Y. S., Robbins, H., and Teicher, H. (1965). Moments of randomly stopped sums. Ann. Math. Statist. 36 789-799. [25, 26]
[41] Connel, T. L. and Graybill, F. A. (1964). A Tchebycheff type inequality for chi-square. Unpublished manuscript. [221]
[42] Cox, D. R. (1952a). A note on the sequential estimation of means. Proc. Cambridge Philos. Soc. 48 447-450. [76, 182]
[43] Cox, D. R. (1952b). Estimation by double sampling. Biometrika 39 217-227. [167, 168, 171, 173, 289]
[44] Cox, D. R. (1963). Large sample sequential tests for composite hypotheses. Sankhya Ser. A. 25 5-12. [81, 82, 83, 85]
[45] Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton. [69, 79, 81, 126, 218]
[46] Dantzig, G. B. (1940). On the non-existence of tests of "Student's" hypothesis having power functions independent of σ. Ann. Math. Statist. 11 186-192. [7]
[47] Darling, D. A. and Robbins, H. (1967a). Inequalities for the sequence of sample means. Proc. Nat. Acad. Sci. U.S.A. 57 1157-1180. [121]
[48] Darling, D. A. and Robbins, H. (1967b). Iterated logarithm inequalities. Proc. Nat. Acad. Sci. U.S.A. 57 1188-1192. [121]
[49] Darling, D. A. and Robbins, H. (1968a). Some further remarks on inequalities for sample sums. Proc. Nat. Acad. Sci. U.S.A. 60 1175-1182. [121]
[50] Darling, D. A. and Robbins, H. (1968b). Some nonparametric sequential tests with power one. Proc. Nat. Acad. Sci. U.S.A. 61 804-809. [123, 124]
[51] David, H. T. and Kruskal, W. H. (1956). The WAGR sequential t-test reaches a decision with probability one. Ann. Math. Statist. 27 797-805. [62, 69]
[52] DeGroot, M. H. (1959). Unbiased binomial sequential estimation. Ann. Math. Statist. 30 80-101. [144, 148, 150, 151]
[53] Dixon, W. J. (1970). Quantal response variable experimentation: the up and down method. In: McArthur and Colton (Eds.), Statistics in Endocrinology 251-267 (MIT Press, Cambridge). [231]
[54] Dixon, W. J. and Mood, A. M. (1948). A method for obtaining and analysing sensitivity data. J. Amer. Statist. Assoc. 43 109-126. [230]
[55] Dodge, H. F. and Romig, H. G. (1929). A method of sampling inspection. Bell Syst. Tech. J. 8 613-631. [3]
[56] Donnelly, T. G. (1957). A Family of Truncated Sequential Tests. Doctoral Dissertation, Univ. of North Carolina. [47]
[57] Ericson, R. (1966). On moments of cumulative sums. Ann. Math. Statist. 37 1803-1805. [57]
[58] Fabian, V. (1974). Note on Anderson's sequential procedures. Ann. Statist. 2 170-176. [50, 51]
[59] Farrell, R. H. (1966a). Bounded length confidence intervals for the p-point of a distribution function, II. Ann. Math. Statist. 37 581-585. [210]
[60] Farrell, R. H. (1966b). Bounded length confidence intervals for the p-point of a distribution function, III. Ann. Math. Statist. 37 586-592. [210, 212]
[61] Ferguson, T. S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York. 378-383. [32]
[62] Fisz, M. (1963). Probability Theory and Mathematical Statistics. Wiley, New York. [126]
[63] Franzh, S. (2003). SPRT fixed-length confidence intervals. Submitted to Comm. Statist. Theor. Meth. [173, 176]
[64] Geertsema, J. C. (1970a). Sequential confidence intervals based on rank tests. Ann. Math. Statist. 41 1016-1026. [210, 211, 212, 213, 215]
[65] Geertsema, J. C. (1970b). A Monte Carlo study of sequential confidence intervals based on rank tests. South African Statist. J. 4 25-31. [215]
[66] Ghosh, B. K. (1970). Sequential Tests of Statistical Hypotheses. Addison-Wesley, Reading (Mass.). [v]
[67] Ghosh, J. K. (1960). On some properties of sequential t-test. Calcutta Statist. Assoc. Bull. 9 77-86. [70, 71]
[68] Ghosh, M. N. (1960). Bounds for the expected sample size in a sequential probability ratio test. J. Roy. Statist. Soc. Ser. B 22 360-367. [33]
[69] Ghosh, M., Mukhopadhyay, N. and Sen, P. K. (1997). Sequential Estimation. Wiley, New York. [v]
[70] Ghurye, S. G. and Robbins, H. (1954). Two-stage procedures for estimating the difference between means. Biometrika 41 146-152. [162, 163]
[71] Girshick, M. A., Mosteller, F., and Savage, L. J. (1946). Unbiased estimates for certain binomial sampling problems with applications. Ann. Math. Statist. 17 13-23. [147, 148]
[72] Gleser, L. J. (1965). On the asymptotic theory of fixed-size sequential confidence bounds for linear regression parameters. Ann. Math. Statist. 36 463-467. (Correction note (1966) Ann. Math. Statist. 37 1053-1055). [203, 206, 207]
[73] Gleser, L. J., Robbins, H., Starr, N. (1964). Some asymptotic properties of fixed-width sequential confidence intervals for the mean of a normal population with unknown variance. Tech. Report. Dept. of Math. Statist. Columbia University. [200]
[74] Govindarajulu, Z. (1967). Two-sided confidence limits for P(X < Y) based on normal samples of X and Y. Sankhyā Ser. B. 29 35-40. [209]
[75] Govindarajulu, Z. (1968a). Asymptotic normality of two-sample rank order sequential probability ratio test based on Lehmann alternatives. Unpublished manuscript. [51, 129]
[76] Govindarajulu, Z. (1968b). Distribution-free confidence bounds for P(X < Y). Ann. Inst. Statist. Math. 20 229-238. [216]
[77] Govindarajulu, Z. (1974). Fixed-width confidence intervals for P(X < Y).
[80] Govindarajulu, Z. (1995). Certain sequential adaptive design problems. Adaptive Designs (IMS Lecture Notes - Monograph Series No. 25) (Eds. N. Flournoy and W. F. Rosenberger) 197-212. [229]
[81] Govindarajulu, Z. (2001). Statistical Techniques in Bioassay, 2nd, revised and enlarged Ed., Karger Publishers, New York. [228, 231, 263]
[82] Govindarajulu, Z. (2002). Robustness of a sample size re-estimation procedure in clinical trials. Advances in Methodological and Applied Aspects of Prob. and Statist. (Ed. N. Balakrishnan) Taylor and Francis, New York. 383-398. [247, 248, 252]
[83] Govindarajulu, Z. (2003). Robustness of a sample size re-estimation procedure in clinical trials (Arbitrary populations). Statist. Med. 22 1819-1828. Correction, Vol. 23 to appear. [247, 252]
[84] Govindarajulu, Z. (2004). Robustness of a sample size re-estimation with interim binary data for double-blind clinical trials. J. Statist. Planning and Inference (to appear). [254, 256, 257, 259, 261, 262]
[85] Govindarajulu, Z. and Howard, H. C. (1989). Uniform asymptotic expansions applied to sequential t and t2 tests. Austr. J. Statist. 31 No. 1 95-104. [63, 65, 66, 275, 277]
[86] Govindarajulu, Z. and Howard, H. C. (1994). Asymptotic expansions applied to sequential F-test criteria. Austr. J. Statist. 36 No. 1 101-113. [77, 78, 79]
[87] Govindarajulu, Z. and Nanthakumar, A. (2000). Sequential estimation of the mean logistic response function. Statistics. 33 309-33. [231, 232, 233]
[88] Govindarajulu, Z., LeCam, L., and Raghavachari, M. (1967). Generalizations of the theorems of Chernoff and Savage on the asymptotic normality of test statistics. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1 608-638, University of California Press. [216]
[89] Graybill, F. A. and Connell, T. L. (1964a). Sample size estimating the variance within d units of the true value. Ann. Math. Statist. 35 438-440. [194, 221]
[90] Graybill, F. A. and Connell, T. L. (1964b). Sample size required to estimate the parameter in the uniform density within d units. J. Amer. Statist. Assoc. 59 550-556. [222]
[91] Groeneveld, R. A. (1971). A note on the sequential sign test. American Statistician 25 (2) 15-16. [125, 126]
[92] Hajek, J. and Sidak, Z. (1967). Theory of Rank Tests. Academic Press, New York. [126]
[93] Haldane, J. B. S. (1945). On a method of estimating frequencies. Biometrika 33 222-225. [151]
[94] Hall, W. J. (1962). Some sequential analogs of Stein's two-stage test. Biometrika 49 367-378. [73, 74, 279]
[95] Hall, W. J. (1965). Methods of sequentially testing composite hypotheses with special reference to the two-sample problem. Univ. of North Carolina, Inst. of Statistics Mimeo Series No. 441, 40-41. [124]
[96] Hall, W. J., Wijsman, R. A., and Ghosh, J. K. (1965). The relationship between sufficiency and invariance with applications in sequential analysis. Ann. Math. Statist. 36 575-614. [124, 127]
[97] Hodges, J. L. and Lehmann, E. L. (1956). Two approximations to the Robbins-Monro process. Proc. Third Berkeley Symp. Math. Statist. Prob. Vol. 1 95-104. Univ. of California Press, Berkeley. [228]
[98] Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19 293-325. [214]
[99] Hoeffding, W. (1953). A lower bound for the average sample number of a sequential test. Ann. Math. Statist. 24 127-130. [34]
[100] Hoeffding, W. (1960). Lower bounds for the expected sample size and the average risk of a sequential procedure. Ann. Math. Statist. 31 352-368. [34]
[101] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13-20. [213, 214]
[102] Hoel, P. G. (1955). On a sequential test for the general linear hypothesis. Ann. Math. Statist. 26 136-139. [75, 76]
[103] Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1 221-233, University of California Press at Berkeley and Los Angeles. [186]
[104] Joanes, D. N. (1972). Sequential tests of composite hypothesis. Biometrika 59 633-637. Correction: Biometrika 62 (1975) 221. [85]
[105] Johnson, N. L. (1953). Some notes on the application of sequential test for the general linear hypothesis. Ann. Math. Statist. 24 614-623. [76]
[106] Johnson, N. L. (1959). A proof of Wald's theorem on cumulative sums. Ann. Math. Statist. 30 1245-1247. [23]
[107] Kemp, K. W. (1958). Formulae for calculating the operating characteristic and the average sample number of some sequential tests. J. Roy. Statist. Soc. Ser. B. 20 379-386. [36, 37, 39]
[108] Kemperman, J. H. B. (1961). The Passage Problem for a Stationary Markov Chain. Univ. of Chicago Press, Chicago, Illinois. [32]
[109] Khan, R. A. (1969). A general method of determining fixed-width confidence intervals. Ann. Math. Statist. 40 704-709. [191, 192, 194]
[110] Lachin, J. M. (1977). Sample size determination for r x c comparative trials. Biometrics 33 315-324. [256]
[111] Laha, R. G. and Rohatgi, V. K. (1979). Probability Theory. Wiley, New York. [189]
[112] Lawing, W. D. and David, H. T. (1966). Likelihood ratio computation of operating characteristics. Ann. Math. Statist. 37 1704-1716. [50]
[113] LeCam, L. (1970). On the assumptions used to prove asymptotic normality of maximum likelihood estimates. Ann. Math. Statist. 41 802-828. [79, 191]
[114] Lehmann, E. L. (1950). Notes on the Theory of Estimation. Lecture notes recorded by Colin Blyth. Associated Students' Store, Univ. of California, Berkeley. [144, 153, 156]
[115] Lehmann, E. L. (1953). The power of rank tests. Ann. Math. Statist. 24 23-43. [127]
[116] Lehmann, E. L. (1955). Ordered families of distributions. Ann. Math. Statist. 26 399-419. [173]
[117] Lehmann, E. L. (1959). Testing Statistical Hypotheses. 104-110, Wiley, New York. [vii, 45, 46, 174]
[118] Lehmann, E. L. and Stein, C. (1950). Completeness in the sequential case. Ann. Math. Statist. 21 376-385. [145, 146]
[119] Lindley, D. V. and Barnett, B. N. (1965). Sequential sampling, two decision problems with linear losses for binomial and normal random variables. Biometrika. 52 507-532. [111, 113, 114, 115]
[120] Loève, M. (1977). Probability Theory Vol. 1. Van Nostrand, Princeton. [187, 223]
[121] Lorden, G. (1970). On excess over the boundary. Ann. Math. Statist. 41 520-527. [86]
[122] Lorden, G. (1973). Open-ended tests for Koopman-Darmois families. Ann. Statist. 1 633-643. [88, 89]
[123] Matthes, T. K. (1963). On the optimality of sequential probability ratio tests. Ann. Math. Statist. 34 18-21. [45]
[124] Mikhalevich, V. S. (1956). Sequential Bayes solutions and optimal methods of statistical acceptance control. Theor. Prob. Appl. 1 395-421. [114]
[125] Miller, R. G. (1970). Sequential signed rank test. J. Amer. Statist. Assoc. 65 1554-1561. [132, 133, 134, 136]
[126] Miller, R. G. (1981). Survival Analysis. John Wiley and Sons, New York. [237, 240]
[127] Moriguti, S. and Robbins, H. (1962). A Bayes test of p ≤ 1/2 against p > 1/2. Rep. Statist. Appl. Res. Un. Japan Sci. Engrs. 9 39-60. [114]
[128] Moshman, J. (1958). A method for selecting the size of the initial sample in Stein's two-sample procedure. Ann. Math. Statist. 29 1271-1275. [10]
[129] Nadas, A. (1969). An extension of a theorem of Chow and Robbins on sequential confidence intervals for the mean. Ann. Math. Statist. 40 667-671. [196, 200, 201]
[130] Nanthakumar, A. and Govindarajulu, Z. (1994). Risk-efficient estimation of the mean of logistic response function using Spearman-Karber estimator. Statistica Sinica 4 305-324. [231]
[131] Nanthakumar, A. and Govindarajulu, Z. (1999). Fixed-width estimation of the mean of logistic response function using Spearman-Karber estimator. Biometrical Journal 41 No. 4, 445-456. [231]
[132] Neyman, J. and Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. Roy. Soc. Ser. A. 231 289-337. [11]
[133] Page, E. S. (1954). An improvement to Wald's approximation for some properties of sequential test. J. Roy. Statist. Soc. Ser. B. 16 136-139. [36]
[134] Paulson, E. (1947). A note on the efficiency of the Wald sequential test. Ann. Math. Statist. 19 447-450. [100]
[135] Paulson, E. (1964). A sequential procedure for selecting the population with the largest mean from k normal populations. Ann. Math. Statist. 35 174-180. [51]
[136] Putter, J. (1951). Sur une méthode de double échantillonnage pour estimer la moyenne d'une population laplacienne stratifiée. Rev. Inst. Intern. Statist. 19 231-238. [162]
[137] Raiffa, H. and Schlaifer, R. (1961). Applied Statistical Decision Theory. Harvard University Press, Cambridge, Massachusetts. [114]
[138] Rao, C. R. (1965). Linear Statistical Inference and its Applications. Wiley, New York. [186, 191]
[139] Ray, W. D. A. (1957a). A proof that the sequential probability ratio test (SPRT) of the general linear hypothesis terminates with probability unity. Ann. Math. Statist. 28 521-522. [77]
[140] Ray, W. D. A. (1957b). Sequential confidence intervals for the mean of a normal population with unknown variance. J. Roy. Statist. Soc. Ser. B. 19 133-143. [200]
[141] Rényi, A. (1957). On the asymptotic distribution of the sum of random number of random variables. Acta. Math. Acad. Sci. Hungar. 8 193-199. [89]
[142] Reynolds, M. R. Jr. (1975). A sequential signed-rank test for symmetry. Ann. Statist. 3 382-400. [134, 135, 136]
[143] Richter, D. (1960). Two stage experiments for estimating a common mean. Ann. Math. Statist. 31 1164-1173. [164, 165, 166, 167]
[144] Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58 527-535. [162]
[145] Robbins, H. (1959). Sequential estimation of the mean of a normal population. Probability and Statistics - The Harald Cramér Volume 235-245. Almquist and Wiksell, Uppsala, Sweden. [189, 201, 202]
[146] Robbins, H. (1970). Statistical methods related to the law of the iterated logarithm. Ann. Math. Statist. 41 1397-1409. [118, 123]
[147] Robbins, H. and Monro, S. (1951). Stochastic approximation method. Ann. Math. Statist. 22 400-407. [227]
[148] Roy, S. N. (1957). Some Aspects of Multivariate Analysis. Wiley, New York. [207]
[149] Rushton, S. (1950). On a sequential t-test. Biometrika 37 326-333. [63, 65, 66, 83, 273]
[150] Rushton, S. (1952). On a two-sided sequential t-test. Biometrika 39 302-308. Correction: 41 286. [65, 66, 76]
[151] Sacks, J. (1958). Asymptotic distribution of stochastic approximation procedures. Ann. Math. Statist. 29 373-405. [227]
[152] Sacks, J. (1965). A note on the sequential t-test. Ann. Math. Statist. 36 1867-1869. [71, 72]
[153] Samuel, E. (1966). Estimators with prescribed bound on the variance for the parameter in the binomial and Poisson distributions based on two-stage sampling. J. Amer. Statist. Assoc. 61 220-227. [220]
[154] Savage, I. R. (1959). Contributions to the theory of rank order statistics: the one-sample case. Ann. Math. Statist. 30 1018-1023. [131]
[155] Savage, I. R. and Sethuraman, J. (1966). Stopping time of a rank order sequential probability ratio test based on Lehmann alternatives. Ann. Math. Statist. 37 1154-1160. Correction: 38 (1967) 1309. [127, 129, 287]
[156] Schwarz, G. (1962). Asymptotic shapes of Bayes sequential testing regions. Ann. Math. Statist. 33 224-236. [88, 89]
[157] Sen, P. K. and Ghosh, M. (1971). On bounded length sequential confidence intervals based on one-sample rank-order statistics. Ann. Math. Statist. 42 189-203. [219]
[158] Seth, G. R. (1949). On the variance of estimates. Ann. Math. Statist. 20 1-27. [157, 158]
[159] Sethuraman, J. (1970). Stopping time of a rank order sequential probability ratio test based on Lehmann alternatives II. Ann. Math. Statist. 41 1322-1333. [129]
[160] Shih, W. J. (1992). Sample size re-estimation in clinical trials. Biopharmaceutical Sequential Applications (Ed. K.E. Peace). pp. 285-301. Marcel Dekker Inc., New York. [247, 248]
[161] Shih, W. J. and Zhao, P. L. (1997). Design for sample size re-estimation with interim data for double-blind clinical trials with binary outcomes. Statistics in Medicine. 16 1913-1923. [254, 260, 262]
[162] Siegmund, D. O. (1967). Some one-sided stopping rules. Ann. Math. Statist. 38 1641-1646. [52]
[163] Siegmund, D. O. (1968). On the asymptotic normality of one-sided stopping rules. Ann. Math. Statist. 39 1493-1497. [190]
[164] Siegmund, D. O. (1978). Estimation following sequential tests. Biometrika. 65 341-349. [246]
[165] Siegmund, D. O. (1985). Sequential Analysis, Tests and Confidence Intervals. Springer-Verlag, New York. [v, vii, 87]
[166] Simons, G. (1967). Lower bounds for average sample number of sequential multi-hypothesis tests. Ann. Math. Statist. 38 1343-1364. [95]
[167] Simons, G. (1968). On the cost of not knowing the variance when making a fixed-width confidence interval for the mean. Ann. Math. Statist. 39 1946-1952. [215]
[168] Sobel, M. and Wald, A. (1949). A sequential decision procedure for choosing one of three hypotheses concerning the unknown mean of a normal distribution. Ann. Math. Statist. 20 502-522. [90, 91, 95]
[169] Sproule, R. N. (1969). A Sequential Fixed-Width Confidence Interval for the Mean of a U-statistic. Ph.D. dissertation, University of North Carolina. [219]
[170] Srivastava, M. S. (1967). On fixed-width confidence bounds for regression parameters and mean vector. J. Roy. Statist. Soc. Ser. B. 29 132-140. [206]
[171] Srivastava, M. S. (1971). On fixed-width confidence bounds for regression parameters. Ann. Math. Statist. 42 1403-1411. [206, 207]
[172] Starr, N. (1966a). The performance of a sequential procedure for fixed-width interval estimate. Ann. Math. Statist. 36 36-50. [200]
[173] Starr, N. (1966b). On the asymptotic efficiency of a sequential procedure for estimating the mean. Ann. Math. Statist. 37 1173-1185. [202]
[174] Starr, N. and Woodroofe, M. (1969). Remarks on sequential point estimation. Proceedings National Academy Sciences, USA. 63 285-288. [202]
[175] Stein, C. (1945). A two-sample test for a linear hypothesis whose power is independent of the variance. Ann. Math. Statist. 16 243-258. [7, 158, 160, 167, 247, 281, 283]
[176] Stein, C. (1946). A note on cumulative sums. Ann. Math. Statist. 17 498-499. [13, 33]
[177] Stein, C. (1949). Some problems in sequential estimation. Econometrica 17 77-78. [200]
[178] Tallis, G. M. and Vagholkar, M. K. (1965). Formulas to improve Wald's approximation for some properties of sequential tests. J. Roy. Statist. Soc. Ser. B. 27 74-81. [40]
[179] Vagholkar, M. K. (1955). Application of Statistical Decision Theory to Sampling Inspection Schemes. Ph.D. thesis, University of London. [110]
[180] Vagholkar, M. K. and Wetherill, G. B. (1960). The most economical binomial sequential probability ratio test. Biometrika 47 103-109. [106, 110, 111]
[181] Ville, J. (1939). Étude critique de la notion de collectif. Gauthier-Villars, Paris. [117]
[182] Wald, A. (1947). Sequential Analysis. Wiley, New York. Reprinted by Dover Publications Inc. (1973). [v, 13, 14, 17, 19, 26, 29, 32, 34, 38, 39, 40, 43, 52, 59, 60, 79, 83, 85, 86, 117]
[183] Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20 595-601. [185]
[184] Wald, A. and Wolfowitz, J. (1948). Optimum character of the sequential probability ratio test. Ann. Math. Statist. 19 326-339. [48, 111]
[185] Wasan, M. T. (1964). Sequential optimum procedures for unbiased estimation of a binomial parameter. Technometrics 6 259-272. [144, 151, 152]
[186] Weed, H. D., Jr. (1968). Sequential one-sample grouped signed rank tests for symmetry. Ph.D. dissertation, Florida State University. [130]
[187] Weed, H. D., Jr. and Bradley, R. A. (1971). Sequential one-sample grouped signed rank tests for symmetry. J. Amer. Statist. Assoc. 66 321-326. [130]
[188] Weed, H. D., Jr., Bradley, R. A. and Govindarajulu, Z. (1974). Stopping time of two rank order sequential probability ratio tests for symmetry based on Lehmann alternatives. Ann. Statist. 2 1314-1322. [131, 132]
[189] Weiss, L. (1956). On the uniqueness of Wald's sequential test. Ann. Math. Statist. 27 1178-1181. [46]
[190] Weiss, L. and Wolfowitz, J. (1972). Optimal, fixed-length non-parametric sequential confidence limits for a translation parameter. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 25 203-209. [219]
[191] Wetherill, G. B. (1957). Application of the Theory of Decision Functions to Sampling Inspection with Special Reference to Cost of Inspection. Ph.D. Thesis, Univ. of London. [iii]
[192] Wetherill, G. B. (1975). Sequential Methods in Statistics. Chapman and Hall, London. [vii]
[193] Wetherill, G. B. and Glazebrook, K. D. (1986). Sequential Methods in Statistics, 3rd ed. Chapman and Hall, New York. [v]
[194] Whitehead, J. (1983). The Design and Analysis of Sequential Clinical Trials. Ellis Horwood Ltd. Publishers, Chichester. [viii, 238, 242, 244, 245, 246]
[195] Whitehead, J. and Jones, D. R. (1979). The analysis of sequential clinical trials. Biometrika 66 443-452. [246]
[196] Wiener, N. (1939). The ergodic theorem. Duke Math. J. 5 1-18. [194]
[197] Wijsman, R. A. (1960). A monotonicity property of the sequential probability ratio test. Ann. Math. Statist. 31 677-684. Correction: ibid. 3 (1975) 796. [46]
[198] Wilcoxon, F., Rhodes, L. J. and Bradley, R. A. (1963). Two sequential two-sample grouped rank tests with applications to screening experiments. Biometrics 19 58-84. [127]
[199] Wirjosudirdjo, S. (1961). Limiting Behavior of a Sequence of Density Ratios. Ph.D. Thesis, Univ. of Illinois, Urbana. Abstract: Ann. Math. Statist. 33 (1962) 296-297. [127]
[200] Wittenberg, H. (1964). Limiting distributions of random sums of independent random variables. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 3 7-18. [189]
[201] Wolfowitz, J. (1947). Efficiency of sequential estimates and Wald's equation for sequential processes. Ann. Math. Statist. 18 215-230. [23, 152, 154, 155, 157]
[202] Wolfowitz, J. (1966). Remarks on the optimum character of the sequential probability ratio test. Ann. Math. Statist. 37 726-727. [45]
[203] Wong, S. P. (1968). Asymptotically optimal properties of certain sequential tests. Ann. Math. Statist. 39 1244-1263. [88]
[204] Woodroofe, M. (1977). Second order approximations for sequential point and interval estimation. Ann. Statist. 5 984-995. [190, 202]
[205] Woodroofe, M. (1982). Nonlinear Renewal Theory in Sequential Analysis. Soc. for Indust. and Appl. Math., Philadelphia, Pennsylvania. [236]
[206] Yahav, J. A. and Bickel, P. J. (1968). Asymptotically optimal Bayes and minimax procedures in sequential estimation. Ann. Math. Statist. 39 442-456. [213]
[207] Zacks, S. (1966). Sequential estimation of the mean of a lognormal distribution having a prescribed proportional closeness. Ann. Math. Statist. 37 1688-1696. [224]
Subject Index

Absolute accuracy, 200
Absorbing barriers, 36
Acceptance rejection, 44
Acceptance sampling, 106
Adaptive rule, 233
Admissible, 2, 152
Anderson's test, 75
Approximate bounds, 16
Approximations to OC and ASN functions, 36
Arbitrary continuation region, 44
Arbitrary distributions, 247
Arbitrary sequential test, 33
Arithmetic distribution, 87
Armitage's procedure, 96
ASN curve, 56
ASN function, 2, 32, 36, 44, 83, 85, 102
Asymptotic behavior of the stopping time, 51
Asymptotic consistency, 193, 198, 201, 217
Asymptotic coverage probability, 213
Asymptotic efficiency, 193, 198, 201, 215, 217, 219
  sign test, 126
Asymptotic normality, 189
Asymptotic normality of the statistic T, 66
Asymptotic optimality, 152, 192, 198, 201, 217
Asymptotic probability of coverage, 205
Asymptotic properties, 206, 235
Asymptotic variance, 228, 229
Asymptotically risk-efficient, 202
Auxiliary problem, 19, 45
Auxiliary SPRT, 53
Average risk, 45
Average sample number (ASN), 2, 21, 24, 38, 47
B-V procedure, 98
Backward induction, 111, 115, 116
Bacterial density, 229
BAN (best asymptotically normal) estimates, 79
Barnard's criteria, 65, 73
Barnard's sequential t and t2-tests, 65
Barnard's versions, 66
Bartlett's procedure, 81
Bayes binomial SPRT, 106
Bayes risk, 45
Bayes sequential procedure, 106
  normal mean, 115
Bayes Theorem, 107
Bayesian criterion of optimality, 144
Behrens-Fisher problem, 84
Bernoulli case, 177
Bernoulli data, 269
Bernoulli population, 20
Bernoulli variable, 22
Best sequential procedure, 171
Best unbiased estimate, 147
Beta family, 112
Binary response, 254
Binomial distribution, 13
Binomial proportions, 85
Bivariate normal distribution, 81, 208
Boundary values, 83
Bounded length confidence intervals, 196
Bounds for ASN, 33
Brownian motion, 50
Brownian motion process, 135
Cauchy-Schwarz inequality, 154
Central chi-square, 8
Central limit theorem, 96
Characteristic roots, 205, 206
Chebyshev's inequality, 126
Class of weight functions, 72
Closed boundaries, 47
Coefficient of variation, 171, 200
Completeness, 145, 148, 149
  bounded, 145, 148
Composite hypothesis, 59, 75
Composite test, 244
Computer program, 180
Concave function, 52
Confidence coefficient, 158
Confidence interval, 173
Confidence region, 204
Confluent hypergeometric function, 65, 76
Conjugate prior, 112
Consistency property of MLE, 240
Continuation point, 146
Continuation region, 44, 60, 113, 242, 245
Continue sampling indefinitely, 120
Continue sampling region, 17
Continue-sampling inequality, 17, 83, 89
Convergent bounds, 271
Cornish-Fisher inversion, 173
Correction for overshoot, 244
Cost of each observation, 45
Cost of experimentation, 143
Covariance matrix, 204
Covariance structure, 208
Coverage probability, 176, 191, 196, 211
Cox's procedure, 81
Cox's sequential estimation procedure, 289
Cramér-Rao inequality, 144, 148, 152
  lower bound, 144
Cumulative sums, 86
Curtailed inspection, 3
Curtailed sampling
  double, 148
  simple, 147
  symmetric, 151
Delta method, 228
Discrete time problem, 116
Dominated convergence theorem, 197
Double minimax property, 71
Double sampling, 173
  plan, 3
  procedure, 167
Double triangular tests, 242, 243
Double-blind clinical trial, 248, 254
Dynamic programming, 115
  equation, 112
  method, 111
E-M algorithm, 247
Edgeworth expansion, 172
Effective coverage probability, 253
Effective level of significance, 247, 249, 250, 256
Effective power, 249, 251, 260
Efficiency of the SPRT, 99, 100
Efficient score, 240
Empirical distribution function, 123, 128, 131
Empirical distribution functions, 216, 287
Error probabilities, 2, 9, 16, 24, 40, 45, 46, 48, 49, 58, 71, 73, 83, 88, 90, 97, 99, 124, 125, 174, 247
Error rates (see error probabilities), 51
Estimable, 149
Euler-Maclaurin sum formula, 78
Excess over the boundary, 86, 94
Expected sample number, 17
Expected sample size, 27, 99, 211 (see also Average sample number, 2)
Expected stopping time, 35
Expected time, 49
Exponential distribution, 44
Exponential family, 11, 144, 145, 177
Exponential family of densities, 237
Exponential survival curves, 85
Extremum, 187
Family of densities, 14
Finite sure termination, 70
Fisher information, 240
Fixed-width confidence intervals, 191
  large-sample, 196
Fixed-sample procedure, 5
Fixed-sample procedures, 17, 18
Fixed-sample size procedure, 90, 99, 152, 158, 201
Fixed-sample size test (FSST), 74
Fixed-size confidence bounds, 203
Fixed-width confidence interval, 246, 252
Fixed-width estimation, 234
Fixed-width interval, 158
Fundamental identity, 30
Generalized likelihood ratio tests, 86, 88
Generalized probability ratio test (GPRT), 173
Generating distributions, 290
Glivenko-Cantelli theorem, 123
Govindarajulu-Howard approximation, 73
Group sequential sampling, 111
Grouped rank tests, 127
Hall's sequential test, 279
Helmert's transformation, 61, 188
Highest reachable point, 113
Hypergeometric distribution, 237
Hypothesis of symmetry, 126
Hypothesis-testing problem, 11
Impossible point, 146
Indifference zone, 27
Inequalities for error probabilities, 16
Infinitesimal surface element, 59
Initial sample size, 1
Inner acceptance boundary, 134, 135
Intercept, 271
Intercepts, 13
Interchange of integration and summation, 155
Interchange of summation and expectation, 23
Interim stage, 246
Interval of convergence, 149
Invariance considerations, 79
Invariant sufficiency, 127
Inverse binomial sampling, 151
Inverse function, 123
Inversion formula for an integral, 78
Iterations, 227
Jump discontinuity, 130
Khintchine's theorem, 80
Koopman-Darmois families, 86, 88
Kronecker's delta, 50
L'Hospital's rule, 20
Lachin's formula, 256
Laplace distribution, 125
Laplace's method, 78
Large-sample properties, 79
Large-sample sequential test, 85
Law of the iterated logarithm, 122
Least squares estimate, 204
Lehmann alternatives, 127
Lethal dose, 229
Level of significance, 11
Likelihood ratio, 127
Likelihood ratio open-ended test, 89
Linear hypothesis-testing problem, 75
Log likelihood, 240
Log rank statistic, 240
Logistic distribution function, 58
Lower boundary, 93
Lower bounds for the ASN, 33
Lower confidence limit, 179
Lower stopping bound, 46
m-dimensional Euclidean space, 2
Mann-Whitney statistic, 216
Mantel-Haenszel statistic, 238
Mantel-Haenszel test, 240, 264
Matlab, 265
Maximum ASN, 102
Maximum likelihood estimate (MLE), 185, 240
Maximum likelihood estimation, 191
Maximum likelihood SPRT, 81
Mean, 6
Means procedure, 219
Measure theory, 196
Median, 126
Method of maximum likelihood, 247
Method of weight functions, 59, 79
Mill's ratio, 49
Minimax, 152
Minimax solution (MMS), 166
Minimum probability ratio test (MPRT), 74
Minimum variance unbiased estimate, 216
MLE, 232
Modified likelihood ratio, 62
Modified likelihoods, 60
Modified rank order SPRT, 132
Modified SPRT, 49
Moment-generating function, 22, 149
Moments of N, 21
Monotone likelihood ratio (MLR), 46, 173, 176
Monotonicity of the power function, 46
Monotonicity property, 46
Monte Carlo results, 202
Monte Carlo studies, 121, 127, 215
Most powerful fixed-sample procedure, 100
Most powerful test, 11, 126
Natural parameter space, 89
Newton-Raphson algorithm, 78
Nominal error probability, 16
Nominal level of significance, 260
Nominal power, 260
Non-linear unbiased estimates, 204
Nonparametric confidence intervals
  p-point of a distribution function, 210
Normal response, 247
Nuisance parameter, 59, 75, 79, 81, 170, 191, 240
OC and ASN functions of T, 74 OC function, 32, 36, 44, 55, 56, 83, 85, 93, 98 monotonicity, 103 symmetry, 103 One-sided alternative, 8 Open-ended tests, 86, 88 Operating characteristic (OC), 2 curve, 20 function, 19, 47 Optimal properties of SPRT, 44 Optimal sequential procedure, 2
SUBJECT INDEX Optimum fixed-sample size test, 99, 102, 105 Optimum property of the SPRT, 102, 106 Optimum property of Wald’s SPRT, 46 Optimum SPRT, 59 Optimum test procedure, 110 Overshoot, 246 Percentage error, 173 Perfect information, 114 Pitman-efficiencies, 215 Positive drift, 86 Posterior probability, 107 Power curve of the SPRT, 53 Power function, 58 Power of the SPRT, 53 Power of the test, 6 Predetermined alternative, 247 Prior distribution, 59 Probability of absorption, 49 Probability of type I error, 53 Properties of sequential t-test, 70 Properties of the SPRT, 61 Property M, 46 Proportional accuracy, 200 Radon-Nikodym derivative, 2 14 Random sample size, 25 Random walk, 36, 44, 82 Randomized design, 254 Range, 187 Rank order, 127, 130 Rank order SPRT, 287 Rank sum test, 127 Rank test procedure, 123 Rao-Blackwell Theorem, 146 Regression coefficients, 203 Regularity assumptions, 211 Rejection boundary, 133 Rejection rule, 244 Reparameterize, 228
Repeated significance tests, 236, 244
Response, 230
Restricted SPRT, 47, 271
Risk, 45
Risk function, 143
Risk-efficient estimation, 201
Robbins' power one test, 285
Robbins-Monro procedure, 227
Robbins-Monro process, 262
Rushton's approximation, 63
Rushton's t-test, 273
Sack's criterion, 73
Sack's version of the sequential t, 73
Sample information, 114
Sample paths, 46
Sample quantiles, 184
Sample size functions, 173
Sampling continuation region, 62, 77, 79
Sampling inspection plan, 3
  most economical, 106
Sampling plan, 148
  curtailed, 4
  double, 3
  inverse, 148
  single, 4, 148
Sampling process, 42
Scoring procedure, 109
Scoring scheme, 109
SCR-test
  modified (MSCR), 127
Semi-orthogonal matrix, 207
Sequential binomial test, 60
Sequential chi-square test, 60
Sequential configural rank test (SCR-test), 127
Sequential design problem, 1
Sequential estimation, 143, 190
Sequential estimation procedures, 156
Sequential F-test, 75, 77
Sequential likelihood ratio procedure, 84, 85
Sequential model, 144
Sequential probability ratio test
  see also SPRT, 11
  truncated, 40
Sequential procedure, 13, 144
Sequential procedures, 1, 11
Sequential rank test
  Kolmogorov-Smirnov test, 123
Sequential ranks, 137
Sequential rule, 5, 192
Sequential sampling
  group, 111
  unit-step, 111
Sequential sign test, 124
Sequential sign test (SST), 125
  efficiency, 126
Sequential signed rank test for symmetry, 132
Sequential Spearman-Karber procedure, 263
Sequential stopping rule, 183
Sequential t and t²-test, 61, 65
Sequential t-test, 62, 71, 275
Sequential t²-test, 71, 277
Sequential test, 37
Sequential test T, 73
Sign test, 211, 215
Simple alternative, 45
Simple exponential distribution, 12
Simple hypothesis, 45
Simulation studies, 65
Simulation study, 177
Single parameter family of distributions, 196
Single sampling plan, 4
Slope, 271
Slowly varying function, 52
Spearman-Karber estimator, 231
Specified alternative, 260
SPRT, 13, 36, 55, 59, 109, 125, 127, 128, 131, 246, 269
  difficulty, 47
  limiting relative efficiency, 105
  restricted, 47
  truncated, 40
SPRT confidence interval, 175
  fixed length, 178
SPRT fixed-width confidence interval, 179
SPRTs
  one-sided, 88
  simultaneous, 88
SSR statistic, 137
SST, 127
Standard deviation, 6
Standard normal distribution, 249
Standardized average, 211
State of nature, 45
Stein's test, 8, 74
Stein's two stage procedure, 10
Stein's two-stage procedure, 6, 73, 281
Stein's two-stage test, 283
Stochastically larger, 208
Stopping bounds, 91
Stopping point, 146, 147
Stopping points, 144, 155
Stopping rule, 143, 145
Stopping rules, 190
Stopping time, 51, 87, 119, 189
Stopping time of the SPRT, 99
Stopping times, 129
Stopping variable, 26, 128, 196, 209
Straight line boundaries, 47
Strength of test T, 74
Strong law of large numbers, 50, 188
Strong representation, 187
Student's hypothesis, 6
Student's t, 159
Successive approximation method, 171
Sufficient sequence, 144
Sufficient statistics, 98
Survival analysis, 238
Survival time, 238
Symmetric procedures, 48
t-distribution, 6, 9
t-test, 215
Taylor's series, 149
Temporary confidence interval, 179
Temporary confidence intervals, 175
Terminal decision rule, 2, 143
Termination of the experiment, 1
Test with power one, 120
Tests for three hypotheses, 90
Tests
  of small error probability, 119
  of power one, 117
Three point binomial, 111
Tolerance distribution, 232
Total sample size, 10
Transcendental equation, 64
Transitive sequence, 145
Transitivity, 145
Translation shifts, 229
Triangular matrix, 207
Triangular test, 242
Truncated process, 41
Truncated sequential test procedure, 44
Truncated SPRT, 40, 58
Truncation of the procedure, 29
Truncation point, 48
Two point binomial, 111
Two-sample (or two-stage) test, 7
Two-sample case, 9
Two-sided alternative, 8, 60, 96
Two-sided test, 135
Two-stage procedure, 162, 167
  for common mean, 164
  Stein's, 158
Two-stage sampling scheme, 164
Type I error probability, 254
U-statistics, 214, 219
  one-sample, 214
Uniform asymptotic expansion, 63
Uniform continuity in probability, 182
Uniformly best test, 2
Uniformly consistent solution (UCS), 166
Uniformly integrable, 212
Uniformly minimum variance unbiased estimator (UMVU), 165, 167
Uniformly most powerful unbiased test, 6
Unimodal distributions, 126
Uniqueness, 149
Uniqueness of a SPRT, 46
Unit-step sequential sampling, 111
Up and down method, 230
Upper boundary, 93
Upper confidence limit, 179
Upper percentage point, 10
Upper stopping bound, 46
Wald approximation to the boundary values, 33, 178
Wald's approximation
  to OC, 36
  to the ASN, 52
Wald's bounds, 178
Wald's criteria, 65, 73
Wald's equation, 154
Wald's fundamental identity, 29
Wald's inequalities, 53
Wald's method of weight functions, 75
Wald's second equation, 28
Wald's SPRT, 2, 13, 15, 24, 44, 45, 54, 55, 61, 74, 84, 121, 173
  asymptotic behavior of error rate and ASN, 51
  limiting relative efficiency, 105
Wald's stopping bounds, 74
Wald's Theorem, 26
Weak law of large numbers, 250
Wiener process, 47, 48, 115, 133
Wilcoxon test
  one-sample, 213, 214
  signed rank sequential procedure, 136
  signed rank statistic, 134