This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
p ( 0 ) ,/n= l , 2 , - - - , n ! PCn n(0)N}) F ( P> >P n! Note that at least one of the n\ permutations (i.e., p(1), p(2)', • • -,p("!)) is exactly equal to p(0). See Appendix 8.1 for the source code which obtains all the permutations. Thus, the above three probabilities can be computed. The null hypothesis H0 : p = 0 is rejected by the one-sided test if P(p < p(0)) or P(p > p(0)) is small enough. This test is denoted by / in Monte Carlo experiments of Section 8.3.1.
n/- ^ -(OK
P(f> < P )
448
CHAPTERS. INDEPENDENCE BETWEEN TWO SAMPLES
Remark 1: The sample covariance between X and Y, SXv, is rewritten as:
The sample means X and Y take the same values without depending on the order of X and Y. Similarly, S\ and S\ are also independent of the order of X and Y. Therefore, p depends only on £"=] XiYt. That is, for the empirical distribution based on the n\ correlation coefficients, p is a monotone function of £"=1 XtYi, which implies that we have one-to-one correspondence between p and £"=1 X,F;. To test no correlation between X and Y, we need to compute £"=i ^7; but we do not have to compute p. Thus, by utilizing £"_, XjF, rather than p, computational burden can be reduced. Remark 2: As for a special case, suppose that (X,-, Yt), i = 1 , 2, • • • , n, are normally distributed, i.e.,
Define T = p V« - 2/ i/l -p2, which is based on the sample correlation coefficient p, Under the null hypothesis % : p = 0, the statistic T is distributed as the following t distribution: 7 = ^-^-2), which derivation is discussed in Appendix 8.2. This test is denoted by t in Monte Carlo experiments of Section 8.3.1. Note that we cannot use a / distribution in the case of testing the null hypothesis HO : p = po- For example, see Lehmann (1986), Stuart and Ord (1991, 1994) and Hogg and Craig (1995). Generally, it is natural to consider that (X, Y) is non-Gaussian and that the distribution of (X, Y) is not known. If the underlying distribution is not Gaussian and the / distribution is applied to the null hypothesis H0 : p = 0, the appropriate testing results cannot be obtained. However, the nonparametric permutation test can be applied even in the non-Gaussian cases, because it is distribution-free. Under normal population and the assumption of p = 0, T has the t distribution with n-2 degrees of freedom. However, as a generalization, it is shown that the distribution of T under -1 < p < 1 is quite complicated, i.e., the conditional density of T given if = (p/ V ~P 2 ) VZfcite ~*)2/o> follows a noncentral t distribution with n - 2 degrees of freedom and noncentrality parameter i// = (p/ ^\ - p2) VEILiC*; ~'x)2/(rx, where jc, denotes a realization of Xj. The conditional density of T given i//, i.e., the noncentral t distribution, is also derived in Appendix 8.2. Remark 3: Let /fa, be the ranked data of Xt, i.e., Rxt takes one of 1, 2, • • - , « . Similarly, Ryj denotes the ranked data of Yj, i.e., Ryt also takes one of 1, 2, • • -, n.
8.2. NONPARAMETRIC TESTS ON INDEPENDENCE
449
Define the test statistic SQ as follows: J - a(Rx))(a(Ryi) - a(Ry))
(o.i)
where a(-) represents a function to be specified. a(R x) and a(Ky) are given by a(Rx) = (\/ri) 2"=, a(^) and afty) = (1/w) £|L, a(fly;). The above statistic (8.1) is equivalent to the product of a(Rx,) and a(/?y,-), i.e., £"_, o(Rxi)a(Ryi), in the sense of the test statistic. When a(z) = z is specified, s0 is called the rank correlation coefficient, where z takes Rxt or Ryt. Representatively, a(-) is specified as follows: a(RXi) = c
^
s
c(pXi),
where pxi and pyi are defined as pxi = (Rxi — 0.5)/« and py,- = (/?y, — Q.5)/n, and c(-) is an inverse of a distribution function. Thus, we can consider the correlation coefficient based on the score function c(-), called the score correlation coefficient. c(-) may be specified as the inverse of the standard normal distribution function, the logistic distribution function, the uniform distribution function and so on. Especially, when c(-) is specified as the inverse of the uniform distribution function between zero and one (i.e., when c(x) = x for 0 < x < 1, c(x) = 0 for x < 0 and c(x) = 1 for x > 1 ), the .?o in the case where a(Rxi) and a(Ryi) are replaced by c(pxi) and c(py/) is equivalent to the rank correlation coefficient. In Section 8.3.1, the inverse of the uniform distribution between zero and one (i.e., w), that of the standard normal distribution function (i.e., ns), that of the logistic distribution function (i.e., Is) and that of the Cauchy distribution function (i.e., cs) are utilized for <:(•). The t in the case where X, and Yt are replaced by Rxt and Ry\ is asymptotically distributed as the standard normal distribution. This test is denoted by aw in Monte Carlo experiments of Section 8.3.1, where the empirical sizes and sample powers are obtained using the t distribution with n - 1 degrees of freedom, even though an asymptotic distribution of aw is the standard normal distribution. Remark 4: In the case where we perform the permutation test on the correlation coefficient, we need to compute the n\ correlation coefficients (for example, n\ is equal to about 3.6 million when n = 10). The case of sample size n is n times more computer-intensive than that of sample size n- 1 . Thus, the permutation test discussed in this chapter is very computer-intensive. Therefore, we need to consider less computational procedure. In order to reduce computational burden when n\ is large, it might be practical to choose some of the n\ permutations randomly and perform the same testing procedure. We can consider that all the permutations occur with equal probability, i.e., 1 /«!. Therefore, we pick up M out of the n\ permutations randomly and compute the probabilities P(p < p(0)), P(p =
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
450
p(0)) and P(p > p(0)). For example, if either of P(p > p(0)) or P(p < p(0)) is smaller than a = 0.1, the null hypothesis H0 : p = 0 is rejected with 10% significance level. We examine whether the empirical size depends on M in Section 8.3. Thus, we can choose some out of the «! permutations and compute the corresponding probabilities.
8.2.2
On Testing the Regression Coefficient
Using the exactly same approach as the nonparametric test on the correlation coefficient, discussed in Section 8.2.1, we consider the nonparametric significance test on the regression coefficients. The regression model is given by: •, n,
where the OLS estimator of j3, i.e.,j8, is represented as: P = (X'XT1X'Y =
(8.2)
The notations are as follows: Y=
x=
X2
/M
\Xn where F, denotes the rth element of a n x 1 vector Y and Xt indicates the rth row vector of a n x k matrix X. From the structure of equation (8.2), When Xt in Section 8.2.1 is replaced by (X'X)'lX't, we can see that the same discussion as in Section 8.2.1 holds. As for only one difference, Xt is a scalar in Section 8.2.1 while (X'X)"1^ is a jfc x 1 vector in this section. Therefore, we have n\ regression coefficients by changing the order of Y. This implies that the correlation between any two explanatory variables is preserved. Let/?(m), m = 1,2, • • •,n!, be the n\ regression coefficients and/^*' be the jth element offim\ i.e., frm) = ($m), ff-™\ • • •, jSJj."0)'- Suppose that/^Q) represents the jth element of the regression coefficient vector obtained from the original data series. Under the null hypothesis HO : fa = 0, the empirical distribution of fa, which is the jth element of the OLS estimator of/?, is given by: •'
„Q J
the number of /?<m) which satisfies/?*"0 < f/f, m~ 1,2, • • - , « ! n\ the number of fif which satisfies ftf = ftf, m = \, 2, • • •, n I the number of/^ra) which satisfies^
8.2. NONPARAMETRIC TESTS ON INDEPENDENCE
451
For all j = 1 , 2, • • • , k, we can implement the same computational procedure as above and compute each probability. We can perform the significance test by examining where j^ is located among the n\ regression coefficients. The null hypothesis H0 : f!j = 0 is rejected by the one-sided test if P($j < /J™) or P(fij > fi°}) is small enough. This nonparametric permutation test is denoted by / in Monte Carlo studies of Section 8.3.2. Remark 5: Generally, as for the testing procedure of the null hypothesis H0 : fi = /T, we may consider the nonparametric permutation test between (X'X)~lX't and Yt Xi/3", because fi -/? is transformed into:
J3-/3 = ( X ' X T l X ' Y - f i = (X'X)-1X'(Y - X/3) = JiX'X)'^'^ - Xfi), i=i
which implies that/?-/? is equivalent to the correlation between (X'XJ^X^ and Yj-X^. Remark 6: As for the conventional parametric significance test, the error terms e,, / = 1 , 2, • • • , n, are assumed to be mutually independently and normally distributed with mean zero and variance
*,-«. where a,-,- denotes the jth diagonal element of (X'X)"1. 1? and S2 represent/?* = (]3\, ft, • • -, ft)' and 5 2 = (Y - Xp)'(Y - Xp)l(n - k), respectively. Thus, only when et is assumed to be normal, we can use the t distribution. However, unless et is normal, the conventional t test gives us the incorrect inference in the small sample. As well known, in the large sample, using the central limit theorem ^Jn(J3j - fij) is asymptotically normal when the variance of et is finite. Thus, the case of the large sample is different from that of the small sample. In this chapter, under the non-Gaussian assumption, we examine the powers of the nonparametric tests on the correlation coefficient through Monte Carlo experiments. Moreover, in the regression analysis we examine how robust the conventional t test is, when the underlying population is not Gaussian. The above test is denoted by t in Monte Carlo studies of Section 8.3. 1 . Remark 7: As in Section 8.2. 1 , consider replacing Xtj and Yt by c(pxtj) and c(pyt), where Xiti represents the y'th element of Xt, i.e., Xt = { X i t , X&, • • •, Xiik}. pxij and pyt are defined as pjc,j = (Rxij-0.5)/n and pv; = (flyj-0.5)/n, where RxtJ and /fy, denote the ranked data of Xu and 7,, respectively. That is, X}J, X2j, • • -, XnJ are ranked by size and accordingly Rx^ takes one of the integers from 1 to n for i = 1 , 2, • - • , « .
452
CHAPTERS. INDEPENDENCE BETWEEN TWO SAMPLES
Based on the score function c(-), we can examine if the regression coefficient is zero. In this case, the regression model is written as follows: c(pyi) = P\c(pxi,\) +P2c(pxi,2) + ••• + /3kc(pxitk) + e,. Under the above regression model, we can consider the significance test of /?,. Various score functions can be taken for c(-). In this chapter, the score regression coefficient is denoted by w if the inverse of the uniform distribution between zero and one is taken for the score function <:(•), ns if the inverse of the standard normal distribution is adopted, Is if the inverse of the logistic distribution is taken and cs if the inverse of the Cauchy distribution is chosen. The t in the case where Xt and 7, are replaced by Rxt and Ryt is asymptotically distributed as the standard normal distribution. This test is denoted by aw in Monte Carlo studies of Section 8.3.2. The t distribution with n - 2 degrees of freedom is compared with the test statistic aw. In the case where there is a constant term in the regression model, the constant term may be excluded from the regression equation, which is equivalent to the following procedure: each variable is subtracted from its arithmetic average and transformed into c(-). All the elements of the constant term are ones, say jc,-,i = 1 for i = 1,2, • • •, n. The ranked data Rxt<]), i = 1,2, • • - , « , take the same value. Usually, /?jc,-,i) = n/2 is taken, which implies that we have c(pxi,\) = 0 for all i. Then, it is impossible to estimate p\. Therefore, the constant term should be excluded from the regression equation. In simulation studies of Section 8.3.2, this procedure is utilized, i.e.,/?] is a constant term in Section 8.3.2, each variable of 7, and Xtj for j = 2,3,•• • ,fc is subtracted from its arithmetic average and the following regression equation is estimated:
7=2
where 7 and Xt denote the arithmetic averages of F, and Xtj, i = 1 , 2, • • • , n, respectively. In order to perform the score tests, in Section 8.3.2 we consider the following regression model:
Note that the ranked data of 7,, X^, • • -, Xtji are equivalent to those of 7; - 7, X^ — Xi, • • •> ^u ~ %k- Therefore, pyt and pxtj remain same. Remark 8: In Remark 5, we have considered testing the null hypothesis HQ : ft = /?*. Now we discuss derivation of the confidence interval of /?. The OLS estimator ft is rewritten as: n
P = (X'X)-1X'Y =fi+ (X'X)-lX'e
=ft
8.3. MONTE CARLO EXPERIMENTS
453
By permuting the order of {e,-}"=] randomly, fl is distributed as:
where e*,i = 1 , 2, • • • , « , denote the permutation of ;, 2 = 1 , 2, • • • , n. The last term is interpreted as the correlation between (X'X)~}X'. and e\. Given (X'X^^X1., we permute e; and add/3. Then, we can obtainj8(m), m = 1 , 2, • • • , n\. Based on the n\ regression coefficients, the confidence interval of f)j is constructed by sorting /^m), m = 1 , 2, • • • , n!, by size for j = 1 , 2, • • • , k. For example, taking the 2.5% and 97.5% values after sorting j§'m), m = 1 , 2, • • • , M!, we can obtain the 95% confidence interval of/?/. Thus, the confidence interval of fy is easily obtained for all j = 1 , 2, • • • , k. However, we should keep in mind that the amount of storage is extremely large because a n! x 1 vector is required to sort/^m), m = l , 2 , - - - , n ! . Remember that we need 10! « 3.6 x 106 storages even in the case of n = 10. Remark 9: As discussed in Remark 4, we can reduce computational burden by choosing M out of the «! permutations randomly, where we consider that each realization offfj^, m = 1,2, • • • ,n!, occurs with probability !/«!. Thus, choosing the M permutations and sorting them by size, we can obtain the testing procedure and the confidence interval. By taking some out of all the permutations randomly, it is more practical and realistic to obtain the confidence interval discussed in Remark 8.
8.3 Monte Carlo Experiments 8.3.1
On Testing the Correlation Coefficient
Each value in Table 8. 1 represents the rejection rates of the null hypothesis HO : p = 0 against the alternative hypothesis H\ : p > 0 (i.e., the one-sided test is chosen) by the significance level a = 0.01, 0.05, 0.10, where the experiment is repeated G times, where G = 1 04 is taken in this section. That is, in Table 8. 1 the number of the cases where the correlation coefficient obtained from the original observed data is greater than 1 - a is divided by G = 104. In other words, we compute the probability of P(p < p<0)), repeat the experiment G times for G = 104, and obtain the number of the cases which satisfy P(p < p(0)) > 1 - a out of the G experiments, where p(0) denotes the correlation coefficient computed from the original data. The number of the cases which satisfy P(p < p(0)) > 1 - a are divided by G = 104, and the ratio corresponds to the empirical size for p = 0 or the sample power for p ^ 0. Thus, the ratio of P(p < p(0)) > 1 - a relative to G simulation runs is shown in Table 8.1, where a = 0.10, 0.05, 0.01 is examined. (B), (C), (X), (G), (D), (L), (N), (T), (U) and (Q) indicate the population distributions of (X, Y), which denote the bimodal distribution which consists of two normal distributions 0.57V(1, 1) + 0.5N(- 1 , 0.52),
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
454
Table 8.1: Empirical Sizes and Sample Powers (Ho : p = 0 and H\ : p > 0) n
P 0.0
8 0.5 0.9 0.0 (B) 10 0.5 0.9 0.0 12 0.5 0.9 0.0 8 0.5 0.9 0.0 (C) 10 0.5 0.9 0.0 12 0.5 0.9
« .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01
II / .0970 .0484 .0120" .5105 .3471°° .1227°° .9885 .9702 .8400°" .0993 .0501 .0099 .5990 .4329° .1658°°° .9979 .9926 .9522°° .1012 .0500 .0087 .6643 .5086°° .2221°°° .9991 .9971 .9860 .0975 .0483 .0097 .742 1°°° .6293° .3468°°° .9933 .9867 .9262°°° .0987 .0506 .0097 00 .8001 6979000°
.4457°°° .9953 .9927 .9701°°° .1001 .0506 .0106 .8812 .7457°°° .5383°°° .9971 .9947 .9872°°°
w .1113— .0593— .0114 .5302" .3801" .1261 .9699°°° .9296°°° .7155°°" .1024 .0548" .0107 .6110 .4535 .1755° .9898°°° .9734°°° .8674°°° .1018 .0546" .0116 .6734 .5339 .2327° 9967ooo .9913°°° .944! ooo .1094— .0582— .0114 .7514° .6269°° .3239°°° .9887°°° .9743°°° .8773™° .1050' .0543" .0115 .8207° .7161 .437900° .9958 .9909°° .9540°°° .1046 .0555" .0109 .8753 .7952 .5392°°° .9993— .9975' .9812™°
ns .1023 .0510 .0097 .5239 .3636 .1257° .9726°°° .9302°°° .7244°°° .0987 .0519 .0102 .6268— .4671 — .1904 .9922=00 .9803°°° .8908°°° .1039 .0518 .0109 .7019— .5565— .2583" .9973°°° 9945000
.9615°°° .1015 .0507 .0094 .7496°° .623 1°°° .3360°°° .9879°°° 9 ooo 732
.8828°°° .1035 .0505 .0103 .8314 .7338' .4706°°° .9961 .9913°° .9650°°° .1029 .0532 .0107 .8846 .8078— .5840— .9984' .9969 .9857°°°
Is .1027 .0518 .0100 .5264 .3642 .1255° .9726°°° .9283°°° .7137°°° .0988 .0525 .0105 .6309— .4668— .1909 .9928°°° .9790°°° .8840°°° .1042 .0522 .0101 .7051"' .5567— .2584" .9975°°° 9945000 .9596°°° .1022 .0512 .0095 .7502°° .6215°°° .3361°°° .9877°™ .9719°™ .8743°™ .1026 .0506 .0112 .8342 .7343* .4693™° .9961 .9913™ .9632°°° .1025 .0522 .0105 .8842 .8078— .5839— .9984' .9965 .9846°°°
cs .1018 .0495 .0100 .4815°°° .3319°°° .1007°°° .9248°°° .8639°°° .4775°°° .0984 .0503 .0093 .5324™° .3982™° .1091°°° .9553™° .9080™° .4647°°° .0991 .0494 .0098 .5612™° .4392°°° .1706°°° .9735°™ .9330°°° .7745°°° .1006 .0507 .0097 .6812°°° .5642™° .2351™° .9663°°° .9309°°° .6241°°° .0984 .0501 .0100 .7465°°° .6382°°° .2469°°° .9845°°° .9606°°° .6234™° .0980 .0499 .0101 .7781°°° .6765°°° .4030°°° .9930™° .9745°°° .8973°°°
aw .1013 .0487 .0114 .5082 .3384°°° .1261 .9666°°° .9110°°° .7155°°° .0962 .0548" .0123" .5957 .4535 .1949' .9884°°° .9734°°° .8810°°° .1018 .0509 .0125" .6734 .5236 .2435 .9967°°° .9906°°° .9481°°° .1005 .0499 .0114 .7332°™ .5917°°° .3239°°° .9871™° .9677™° .8773™° .0983 .0543" .0127— .8085°°° .7161 .4617°™ .9953 990900 .9586°°° .1046 .0535 .0116 .8753 .7887 .5517 .9993'" .9972 .9822™°
/ .1021 .0531 .0137— .5169 .3640 .1338 .9889 .9733 .8696 .1023 .0542' .0122" .6043 .4461 .1848 .9979 .9928 .9590 .1046 .0526 .0115 .6682 .5225 .2431 .9991 .9971 .9868 .0899°°° .0553" .0226— .7628 .6418 .4157 .9936 .9890 .9577 .0886°°° .0556" .0265"* .8311 .7233 .4936 .9961 .9940 .9851 .0853™° .0560— .0255— .8795 .7900 .5631 .9973 .9961 .9916
455
8.3. MONTE CARLO EXPERIMENTS Table 8.1: Empirical Sizes and Sample Powers —< Continued >— n
P 0.0
8 0.5 0.9 0.0 (X) 10 0.5 0.9 0.0 12 0.5 0.9
« II / .10 .0980 .05 .0481 .01 .0089 .10 .5527°°° .05 .4132°°° .01 .1663°°° .10 .9985 .05 .9956 .01 .9147°°° .10 .1002 .05 .0496 .01 .0088 .10 .6070°°° .05 .46 16°°° .01 .2000°°° .10 1.000 .05 .9995 .01 .9860°°° .10 .1019 .05 .0541' .01 .0098 .10 .7070°°° .05 .5086°°° .01 .2349°°° .10 1.000 .05 .9999 .01 .9988 .10
0.0 .05 8 0.5 0.9 0.0 (G) 10 0.5 0.9 0.0 12 0.5 0.9
.01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01
.0987 .0494 .0097 .5350 .3725°° .1309°°° .9912 .9771 .8716°" .1005 .0490 .0114 .6188° .4579°° .1841°°° .9988 .9960 .9626°°° .1017 .0510 .0104 .6843 .5270°°° .2362°°° .9997 .9992 .9907
w .1076" .0571*" .0116 .7913— .6480— .2764— .9958°°° .9876°°° .9062°°° .1073" .0557"* .0094 .8688"' .7608— .4146— .9996°° .9979°°° .9779°°° .1039 .0560— .0110 .9233"* .8485— .5587"' .9998 .9995°° .9956°°° .1094— .0582— .0114 .5325° .3764 .1240°°° .9798°°° .9493°°° .7534°°° .1050' .0543" .0115 .6055°°° .4519°°° .1745°°° .9942°°° .9826°°° .8980°°° .1046 .0555" .0109 .67 13°°° .5252°°° .2326°°° .9986°°° .9954°°° .9632°°°
ns
Is
cs
aw
/
.1020 .0507 .0091 .7872— .6214— .2732— .9960°°° .9869°°° .9002°°° .1057* .0505 .0100 .8818"* .7712— .4383— .9997° .9983°°° .9826°°° .1047 .0524 .0110 .9426— .8688— .5824"* .9999 .9997° .9970°°°
.1006 .0507 .0092 .7860'" .6189— .2686— .9957°°° .9860°°° .8868°°° .1059" .0506 .0101 .8832— .7691 — .4258'" 99970 .9982°°° .9792°°° .1048 .0543" .0104 .9449— .8689— .5737— .9999 .9997° .9965°°°
.1003 .0518 .0104 .6924"* .5492— .1872°°° .970 1°°° .9354°°° .5890°°° .1026 .0516 .0112 .7450— .6333— .1748°°° .9876°°° .9611°°° .5675°°° .0977 .0534 .0111 .7701" .6719— .30 15°°° .9938°°° .9740°°° .8901°°°
.0979 .0481 .0116 .7748— .5975— .2764"' .9953°°° .9828°°° .9062°°° .1009 .0557"' .0118' .8622'" .7608— .4433— .999400 9979000 .9814°°° .1039 .0535 .0116 .9233*" .8419"* .5753"' .9998 .9995°° .9959000
.1273— .0808— .0307— .5992 .4607 .2463 .9986 .9963 .9530 .1245'" .0802'" .0311 — .6765 .5285 .2888 1.000 .9996 .9936 .1208— .0789— .0321'" .7551 .5959 .3370 1.000 1.000 .9993
.1015 .0507 .0094 .5143°°° .3549°°° .1162°°° .9788°°° .9474°°° .7494°°° .1035 .0505 .0103 .6147°° .4541°° .1785°°° 9959000 .9857°°° .9059°°° .1029 .0532 .0107 .6832 .5302°° .2373°°° .9991° .9970°°° .9726°°°
.1022 .0512 .0095 .5132°°° .3512°°° .1145°°° .9780°°° .9440°°° .7310°°° .1026 .0506 .0112 .6126°°° .4509°°° .1747°°° .9958°°° .985 1 000 .8983°°° .1025 .0522 .0105 .6807 .5267°°° .2329°°° .9992 .9973°°° .9685°°°
.1006 .0507 .0097 .4470°°° .3106°°° .0895°°° .9188°°° .8565°°° .4499000
.1005 .0499 .0114 .5090°°° .3395°°° .1240°°° .9767°°° .9353°°° .7534°°° .0983 .0543" .0127— .5897°°° .4519°°° .1921°°° .9935°°° .9826°°° 9099000
.1033 .0549" .0125" .5447 .3875 .1498 .9923 .9791 .8980 .1062" .0553" .0151 — .6304 .4721 .2099 .9988 .9956 .9718 .1074" .0558— .0132— .6906 .5461 .2657 .9997 .9992 .9926
.0984 .0501 .0100 .4878°°° .3674°°° .09 13°°° .9463°°° .8939°°° .4273°°° .0980 .0499 .0101 .5051°°° .3871°"° . 1 304°°° .9624°°° .9110°°° 7499000
.1046 .0535 .0116 .6713°°° .5138°°° .2434°°° .9986°°° .9951°°° .9655°°°
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
456
Table 8.1: Empirical Sizes and Sample Powers —< Continued >P
a
/ .0987 0.0 .0499 .0093 .5700 8 0.5 .4156 .1509 .9896 0.9 .9748 .8772°°° .1022 0.0 .0513 .0106 .6472 .5025 (D) 10 0.5 .2196 .9969 .9935 0.9 .9622°°° .1002 0.0 .0506 .0113 .7127 12 0.5 .5672 .2789 .9993 0.9 .9978 .9870° .10 .0989 0.0 .05 .0492 .01 .0097 .10 .5493 8 0.5 .05 .3895 .01 .1335 .10 .9881 0.9 .05 .9713 .01 .8614°°° .10 .1016 0.0 .05 .0499 .01 .0112 .10 .6266 (L) 10 0.5 .05 .4732 .01 .2010 .10 .9971 0.9 .05 .9932 .01 .9577°° .10 .1020 0.0 .05 .0516 .01 .0118* .10 .6937 12 0.5 .05 .5477 .01 .2618 .10 .9991 0.9 .05 .9978 .01 .9872 »
.10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01
w .1094"' .0582— .0114 .5587° .4104 .1503 .9790°°° .9476°°° .7675°°° .1050* .0543" .0115 .6309°°° .4875° .2126 .9930°°° .9825°°° .9024°°° .1046 .0555" .0109 .6975°° .5594 .2692° .9982°° .9943°°° .96 18°°° .1094— .0582"' .0114 .5150°°° .3642°°° .1194°°° .9731°°° .9402°°° .7296°°° .1050* .0543" .0115 .5880°°° .4308°°° .1660°°° .99 13°°° .9788°°° .8782°°° .1046 .0555" .0109 .6486°°° .5059°°° .2203°°° .9979°° .9925°°° .9499°°°
ns
Is
cs
.1015 .0507 .0094 .5399°°° .3834°°° .1386°° .9783°°° .9423°°° .7617°°° .1035 .0505 .0103 .6267° ° .4814° .2048° ° .9936° ° .9839° ° .9094° ° .1029 .0532 .0107 .6974°° .5541° .2687°° .9983°° 9954000 .9658°°°
.1022 .0512 .0095 .5365°°° .3775°°° .1347°°° .9780°°° .9402°°° .7432°°° .1026 .0506 .0112 .6246°°° .4739°°° .1983°°° .9934°°° .9834°°° .8998°°° .1025 .0522 .0105 .6881 °°° .5470°°° .2566°°° .9985°° .9948°°° .9620°°°
.1006 .0507 .0097 .4494°°° .3250°°° .0921°°° .9184°°° .8593°°° .45 1 3°°° .0984 .0501 .0100 .4788°°° .3669°°° .0864°°° .9454°°° .89 15°°° 4,99000
.1015 .0507 .0094 .4989°°° .3414°°° .1122°°° .9736°°° .9334°°° .7262°°° .1035 .0505 .0103 .5904°°° .4283°°° .1690°°° .9932°°° .9823°°° .8894°°° .1029 .0532 .0107 .6524°°° .5036°°° .2254°°° .9986 .9947°°° 959400=
.1022 .0512 .0095 .4962°°° .3369°°° .1101°°° .9729°°° .9292°°° .7095°°° .1026 .0506 .0112 .586 1000 .4252°°° .1646°°° .9933°°° .9801°°° .8786°°° .1025 .0522 .0105 .6489°°° .4985°°° .2198°°° .9984 .9942°°° .954 1°°°
.1006 .0507 .0097 .4265°°° .2963°°° .0856°°° .9082°°° .8446°°° .4359°°° .0984 .0501 .0100 .4537000 .3447°°° .0848°°° .9372°°° .8798°°° .4102°°° .0980 .0499
.0980 .0499 .0101 .4787°°° .3756°°° .1277°°° 9624000 .9048°°° 7409000
.0101 .4733°°° .3578°°° .1199°°° .9578°°° .8984°°° 7 ,94000
aw .1005 .0499 .0114 .5386°°° .3737°°° .1503 .9762°°° .9336°°° .7675°°° .0983 .0543" .0127'" .6164°°° .4875° .2316' .9923°°° .9825°°° .9139°°° .1046 .0535 .0116 .6975°° .5489°°° .2798 .9982°° 9940ooo .9642°°° .1005 .0499 .0114 .4917°°° .3294°°° .1194°°° .9699°°° .9239°°° .7296°°° .0983 .0543" .0127'" .5748°°° .4308°°° .1830°°° 9903000 .9788°°° .8918°°° .1046 .0535 .0116 .6486°°° .4939°°° .2294°°° 997900 9924ooo .9534°°°
t .0975 .0507 .0114 .5721 .4139 .1494 .9903 .9770 .9022 .1007 .0516 .0128'" .6492 .4992 .2211 .9969 .9940 .9691 .0991 .0519 .0129"* .7135 .5678 .2815 .9994 .9981 .9897 .0984 .0486 .0103 .5505 .3896 .1377 .9888 .9738 .8883 .1019 .0515 .0117' .6269 .4748 .2083 .9971 .9936 .9647 .1016 .0526 .0119' .6942 .5483 .2647 .9991 .9982 .9895
8.3. MONTE CARLO EXPERIMENTS
457
Table 8.1: Empirical Sizes and Sample Powers —< Continued >— n P 0.0 8 0.5 0.9 0.0 (N) 10 0.5 0.9 0.0 12 0.5 0.9 0.0 8 0.5 0.9 0.0 (T) 10 0.5 0.9 0.0 12 0.5 0.9
a .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01
/ .0970 .0487 .0101 .5303 .3675 .1216°° .9896 .9687 .8402°°° .1019 .0504 .0100 .6151 .4540 .1848 .9970 .9917 .95 17"°° .1033 .0493 .0096 .6820 .5379 .2542 .9992 .9977 .9844 .0997 .0486 .0110 .5855 .4278 .1640 .9910 .9778° .8796°°° .0994 .0487 .0105 .6665 .5200 .2298 .9973 .9929 .9628°° .1003 .0499 .0092 .7359 .5917 .3001 .9989 .9977 .9894
w .1021 .0561'" .0108 .4875°°° .3420°°° .1097°°° .9688°°° .9235°°° .6972°°° .1009 .0495 .0100 .5534°°° .4022°°° .1554°°° .9910°°° .9730°°° .8560°°° .1014 .0514 .0097 .6269°°° .4812°°° .2081°°° .9965°°° .9913°°° .9398°°° .1053* .0550" .0108 .5585°°° .4121°° .1481°° .9764°° .9476°° .7680°° .0974 .0501 .0108 .6356°°° .4854°°° .2029°°° .9926°°° .9818°°° .9003°°° .0966 .0509 .0084 .7048°°° .5655°°° .2701°°° .9984 .9948°°° .9653°°°
ns .0960 .0492 .0096 .4709°°° .3206°°° .1046°°° .9678°°° .9193°°° .7064°°° .0973 .0497 .0093 .5602°°° .40 15°°° .1549°°° .9911°°° .9758°°° .8746°°° .1019 .0503 .0101 .63 19°°° .4819°°° .2169°°° .9975°°° .9932°°° .9527°°° .0986 .0468 .0120" .5452° ° .3924° ° .1426° ° .9772° ° .9441° ° .7653° .0942° .0470 .0100 .6398°°° .4846°°° .205 1°°° .9928°°° .9837°°° .9142°°° .0961 .0478 .0094 .7145°°° .5640°°° .2774°°° .9988 .9959°°° .9728°°°
Is .0972 .0489 .0097 .4709°°° .3174°°° .1026°°° .9672°°° .9149°°° .6908°°° .0963 .0497 .0089 .5595°°° .3979°°° .1528°°° .9910°°° .975 1°°° .8634°°° .1019 .0508 .0100 .6309°°° .4795°°° .2133°°° .9978°° .9928°°° .9476°°° .0976 .0473 .0121" .5397°°° .3907°°° .1405°°° .9772°°° .94 16°°° .7520°°° .0953 .0466 .0096 .6366°°° .4774°°° .2020°°° .9928°°° .9828°°° .9050°°° .0956 .0481 .0102 .7126°°° .5636°°° .2695°°° .9988 .9955°°° .9684°°°
cs
.0989 .0490 .0089 40990=0 .2832°°° .0786°°° .9077°°° .8387°°° .4349°°° .1029 .0512 .0100 .4498°°° .3322°°° .08 15°°° .9332°°° .8767°°° .4160°°° .1015 .0530 .0103 .4727°°° .3596°°° .1218°°° .9530°°° .8953°°° .7232°°° .1008 .0524 .01 17' .4697°°° .34 17°°° .1003°°° .9281°°° .8698°°° .4815°°° .0974 .0499 .0098 .5090°°° .3840°°° .0979°°° .9516°°° .9025°°° .4544°°° .0983 .0473 .0090 .5236°°° .4134°°° .1558°°° .9690°°° .9249°°° .7739°°°
aw .0933°° .0487 .0108 .4646°°° .3066°°° .1097°°° .9643°°° .9073°°° .6972°°° .0942° .0495 .0114 .5390°°° .4022°°° .1692°°° .9900°°° .9730°°° .8709°°° .1014 .0485 .0104 .6269°°° .4709°°° .2175°°° .9965°°° .9908°°° .9453°°° .0962 .0452°° .0108 .5353°°° .3756°°° .1481°°° .9732°°° .9348°°° .7680°°° .0900°°° .0501 .0121" .61 86°°° .4854°°° .2199°° .9920°°° .9818°°° .9119°°° .0966 .0481 .0093 .7048°°° .5540°°° .2820°°° .9984 .9944°°° .9674°°°
/ .0977 .0494 .0092 .5305 .3722 .1322 .9892 .9712 .8727 .1023 .0509 .0106 .6156 .4580 .1908 .9972 .9920 .9599 .1040 .0493 .0096 .6806 .5383 .2583 .9992 .9977 .9862 .0998 .0496 .0112 .5884 .4301 .1716 .9918 .9814 .9107 .0985 .0505 .0115 .6717 .5221 .2350 .9973 .9933 .9679 .0995 .0502 .0119' .7349 .5956 .3040 .9989 .9980 .9911
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
458
Table 8.1: Empirical Sizes and Sample Powers —< Continued >— n
P
0.0 8 0.5 0.9 0.0 (U) 10 0.5 0.9 0.0 12 0.5 0.9 0.0 8 0.5 0.9 0.0
(Q) 10 0.5 0.9 0.0 12 0.5 0.9
<* .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01
II / .1015 .0524 .0092 .5145 .3528 .1198 .9829 .9598° .8234°°° .1054' .0509 .0104 .6044 .4394 .1751°° .9961 .9894 .9392°°° .1046 .0536' .0110 .6702 .5155 .2285° .9984 .9967 .9804 .0976 .0504 .0103 .5210 .3534 .1205° .9853 .9659 .8310°°° .1022 .0516 .0091 .6077 .4431 .1790 .9964 .9906 .9432°° .0970 .0494 .0105 .6810 .5325 .2371 .9993 .9975 .9818
w
.1094'" .0582*" .0114 .4944°°° .3433°° .1086°°° .9592°°° .9134°°° .6801 °°° .1050' .0543" .0115 .5621°°° .4053°°° .1506°°° .9858°°° .9625°°° .8381°°° .1046 .0555" .0109 .6226°°" .4773°°° .1957°°° .9942°°° .9861°°° .9220°°° .1066" .0534 .0127"' .4779°°° .3308°°° .1015°°° .9632°°° .9168°°° .6780°°° .1055' .0528 .0091 .5452°°° .3907°°° .1417°°° .9864°°° .9687°°° .8492°°° .0970 .0514 .0095 .61 17°°° .4655°°° .1867°°° .9958°°° .9886°°° .9309°°°
ns .1015 .0507 .0094 .4990°° .3426°° .1185° .9627°°° .9196°°° .7121°°° .1035 .0505 .0103 .5910° .4399 .1788° .9898°°° .9726°°° .8755°°° .1029 .0532 .0107 .6704 .5243 .2423 .9972°° .9921°°° .9523°°° .0984 .0486 .0103 .4735°°° .3195°°° .1017°°° .9665°°° .9178°°° .6954°°° .1039 .0511 .0099 .5628°°° .4031°°° .1603°°° .9894°°° .9762°°° .8726°°° .0965 .0488 .0102 .6389°°° .4880°°° .2097°°° .9973°°° .9921°°° .9516°°°
Is .1022 .0512 .0095 .5047 .3458° .1223 .9640°°° .9196°°° .7078°°° .1026 .0506 .0112 .5990 .4460 .1853 .9907°°° .9735°°° .8754°°° .1025 .0522 .0105 .6785 .5314' .2507' .9974° .9925°°° .9526°°° .0990 .0499 .0102 .4766°°° .3186°°° .1019°°° .9655°°° .9146°°° .6832°°° .1022 .0530 .0092 .5635°°° .4035°°° .1615°°° .9898°°° .9747°°° .8689°°° .0959 .0477 .0110 .6418°°° .4909°°° .2108°°° .9973°°° .9927°°° .9492°°°
cs .1006 .0507 .0097 .4827°°° .3392°°° .1156°° .9254°°° .8671°°° .5120°°° .0984 .0501 .0100 .5544°°° .4207°°° .1384°°° .9594-0 .9209°°° .5253°°° .0980 .0499 .0101 .5980°°° .4775°°° .2033°°° .9753°°° .944 1°°° .8107°°° .0982 .0496 .0113 .4349°°° .3000°°° .0889°°° .9083°°° .8473°°° .4608°°° .1034 .0510 .0122" .4895°°° .3584°°° .1036°°° .9449°°° .8972°°° .464 1000 .0973 .0472 .0092 .5216°°° .4014°°° .1485°°° .9626°°° .9232°°° .7666°°°
aw .1005 .0499 .0114 .4702°°° .3053°°° .1086°°° .9529°°° .8946°°° .6801000 .0983 .0543" .0127"' .5448°°° .4053°°° .1655°°° .9838°°° .9625°°° .8495°°° .1046 .0535 .0116 .6226°°° .4673°°° .2059°°° .9942°°° .9856°°° .9270°°° .0956 .0446°° .0127"' .4576°°° .2954°°° .1015°°° .9595°°° .8990°°° .6780°°° .0978 .0528 .0108 .5289°°° .3907°°° .1580°°° .9853°°° .9687°°° .8633°°° .0970 .0485 .0104 .6117°°° .4536°°° .1969°°° .9958°°° .9878°°° .9349°°°
t .1008 .0520 .0100 .5153 .3583 .1268 .9838 .9646 .8584 .1048 .0516 .0117' .6040 .4420 .1889 .9962 .9900 .9505 .1044 .0537' .0121" .6693 .5181 .2399 .9985 .9968 .9833 .0988 .0503 .0116 .5218 .3597 .1294 .9855 .9697 .8685 .1029 .0519 .0091 .6078 .4469 .1879 .9964 .9911 .9512 .0968 .0495 .0112 .6815 .5316 .2434 .9993 .9977 .9847
8.3. MONTE CARLO EXPERIMENTS
459
the Cauchy distribution (?r(l + Jt2))"1, the chi-square distribution with one degree of freedom ;^2(1) - 1, the Gumbel (extreme- value) distribution exp(-;t + a) exp(-e~*+a) for a - -.577216, the double exponential (LaPlace) distribution -0.5 exp(-|jt|), the logistic distribution e~*(\ + e~*)~2, the standard normal distribution A^O, 1), the t distribution with three degrees of freedom f(3), the uniform distribution U(-2, 2), and the quadratic distribution 3 V5(5 - *2)/100, respectively. We consider the same distribution functions taken in Section 7.4. For all the distributions, however, mean is assumed to be zero in this section. The standard error of the empirical power, denoted by p, is obtained by -y/p(l - p)/G. For example, when p = 0.5, the standard error takes the maximum value, which is 0.005. In this chapter, the random draws of (Xf, 7,) are obtained as follows. Let «,- and V; be the random variables which are mutually independently distributed. Both of u; and v, are generated as (B), (C), (X), (G), (D), (L), (N), (T), (U) or (Q). Denote the correlation coefficient between X and Y by p. Given the random draws of (ut, v,) and p, (Xt, Yi) is transformed into:
i This transformation gives us the following features: (i) variance of X is equal to that of Y and (ii) the correlation coefficient between X and Y is p. In the case of the Cauchy distribution the correlation coefficient does not exist, because the Cauchy random variable has neither mean nor variance. Even in the case of the Cauchy distribution, however, we can obtain the random draws of (Xt, Y,) given («,, v,) andp, utilizing the above formula. Using the artificially generated data given the true correlation coefficient p = 0.0, 0.5, 0.9, we test the null hypothesis HQ : p = 0 against the alternative hypothesis H] : p > 0. Taking the significance level a = 0.10, 0.05, 0.01 and the one-sided test, the rejection rates out of G = 104 experiments are shown in Table 8.1. In addition, taking the sample size n = 8, 10, 12, both the nonparametric tests and the parametric t test are reported in the table. As discussed in Table 7.2, each value in the case of p = 0 represents the empirical size, which should be theoretically equal to the significance level a. That is, in the case ofp = Oof Table 8.1, *, "", """, °, °° and 000 represent comparison with the significance level a. Let fa be the sample power of k, where k takes /, w, ns, Is, cs, aw or t. We put the superscript " when (fa - a) I -\JV(fa) is greater than 1 .6449, the superscript ** when it is greater than 1.9600, and the superscript *"* when it is greater than 2.5758, where V(fa) is given by V(fa) = a(] - a)/G under the null hypothesis H0 : p = a, where G = 104. 1.6449, 1.9600 and 2.5758 correspond to 95%, 97.5% and 99.5% points of the standard normal distribution, respectively. We put the superscript ° if (fa - a) I -JV(fa) is less than - 1 .6449, the superscript °° if it is less than - 1 .9600, and the superscript °00 if it is less than -2.5758. Therefore, the values with the superscript " indicate over-estimation of the empirical power and the values with the superscript
460
CHAPTERS. INDEPENDENCE BETWEEN TWO SAMPLES
° represent under-estimation of the empirical power. Moreover, in the case of p = 0.5, 0.9 of Table 8.1, *, ", "', °, °° and °°° indicate comparison with the t test, k takes /, w, ns, Is, cs or aw in this case. We put the superscript " when (pk - p,)/ VV(At) + V(p() is greater than 1.6449, the superscript ** when it is greater than 1.9600, and the superscript *"" when it is greater than 2.5758. The two variances are approximated as: V(pk) ~ pk(\ - pk)/G and V(p,) ~ P/(l - P,)/G, where G = 104. We put the superscript ° if (pk - p,)/ ^/V(pk) + V(pt) is less than -1.6449, the superscript °° if it is less than -1.9600, and the superscript 000 if it is less than -2.5758. Note that in large sample we have the following: (pk - Pi)/ VV(AO + VO<) ~ N(°*1) under the null hypothesis H0 : p = 0 and the alternative one H\ : p 2 0. Therefore, the values with the superscript" indicate more powerful test than the t test. In addition, the number of the superscript * shows degree of the sample power. Contrarily, the values with the superscript ° represent less powerful test than the t test. As it is easily expected, for the normal sample (N), the t test performs better than the nonparametric tests, but for the other samples such as (B), (C), (X), (G), (D), (L), (T), (U) and (Q), the nonparametric tests are superior to the t test. We discuss the cases of p = 0 and those of p ± 0 separately. Empirical Size (p = 0): First, we focus only on the empirical sizes, which correspond to the cases ofp = 0 in Table 8.1. The results of the t test are in the last column of Table 8.1. For the cases ofp = 0 in (C), (X), (G), some cases of (B), and those of (D), we can see that the t test does not work, because the empirical sizes are statistically different from the significance levels. Especially, for t in (C), as a is large, the empirical size increases more than proportionally. We can see that the t statistic of (C) is distributed with much fatter tails than the t distribution with n - 2 degrees of freedom. In the case where the population distribution is (N), t shows a good performance, because t has the t(n — 2) distribution even in small sample. We discuss aw in the case ofp = 0. aw performs better than t with respect to the empirical sizes, because the empirical sizes of aw are close to the significance levels a, compared with the sizes of t. However, it is easily shown from Table 8.1 that /, ns, Is and cs are much better than w, aw and t in the sense of the empirical size. The cases of p - 0 which have neither " nor ° in the superscript are 85 out of 90 (i.e., 3 sample sizes x 3 significance levels X 10 population distributions = 90 cases) for /, 70 for w, 87 for ns, 87 for Is, 89 for cs (only in the case of n = 10, a = 0.01 and (Q), the empirical size of cs is significantly over-estimated), 68 for aw, and 53 for t, respectively. Thus, in the size criterion, cs shows the best performance, but it is not too different from /, ns and Is. Sample Power (p ^ 0): Now, we discuss the cases ofp = 0.5, 0.9, which represent the sample powers. As it is well known, under normality assumption, t gives us the uniformly most powerful test. Therefore, for (N), / should be the best test. As a result,
8.3. MONTE CARLO EXPERIMENTS
461
Figure 8.1: Sample Powers: n — 10 and a = 0.10 (C)
1.0 P
1.0 P
.0 P
0.2
0.4
0.6
f w ns Is cs aw t 0.2
0.4
0.6
0.8
1.0 P
0.8
1.0 P
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
462
Figure 8.1: Sample Powers: n = 10 and a = 0.10—< Continued >— (N)
(L)
1.0 P
1.0 P
1.0 P
0.0
0.2
0.4
0.6
f w
ns Is
cs aw t
0.0
0.2
0.4
0.6
0.8
1.0 P
1.0 P
8.3. MONTE CARLO EXPERIMENTS
463
in the case of (N), t is more powerful than any other tests for all « = 8, 10, 12, p = 0.5, 0.9 and a = 0.10, 0.05, 0.01, i.e., all the values in t are larger than any other tests. Moreover, we obtain the result that / is the best in the sample power criterion when t works well in the empirical size criterion even if the population distribution is not normal. That is, as mentioned above, for p ± 0, " or ° in /, w, ns, Is, cs and aw indicates comparison with the t test. In the cases where t works well forp = 0 and a = 0.1, 0.05, 0.01 (i.e., n = 12 of (B), n = 8 of (D), n = 8 of (L), n = 8,10,12 of (N), n = 8,10 of (T), n = 8 of (U) and n = 8,10,12 of (Q)), t shows the most more powerful test from Table 8.1, i.e., in this case, there are a lot of values which have ° in the superscript. When t works well in the empirical size criterion, t is more powerful than / but t is not too different from /, i.e., / is also quite powerful. Even in the case where t does not have an appropriate empirical size, we see the result that /, ns, Is and cs show quite good performances. When we compare /, ns, Is and cs in this situation, we can find that cs is much less powerful than /, ns and Is, although cs shows the best performance in the size criterion. /, ns and Is are more powerful than the other tests, and / is slightly less powerful than ns and Is. Thus, /, ns and Is are relatively better than the other tests in both empirical size and sample power criteria. Even when / works well in the empirical size criterion, / is better than w, ns, Is, cs and aw (see n = 12 andp = 0.9 of (B), n = 8 of (D), n = 8 of (L), (N), n = 8,10 of (T), « = 8 of (U) and (Q) in TableS. 1). Taking the case of n = 10 and a — 0.10, Figure 8.1 represents the sample powers for p - 0.0,0.1, • • • , 0.9. In the case of n = 10, a - 0.10 and p = 0.0, 0.5, 0.9, the values in Table 8.1 are utilized to draw Figure 8.1. For each population distribution, the features of the nonparametric and parametric tests are summarized as follows: (B) It is difficult to distinguish all the lines except for cs. cs is not as powerful as the other tests. Forp = 0.7, 0.8, t is the most powerful test and w and aw are less powerful than /, ns, Is and t, but /, w, ns, Is, aw and ; are very similar to each other. (C) We obtain the same results as (B). That is, we cannot distinguish all the lines except for cs. cs is less powerful than any other tests. (X) / is parallel to t, but both are not as powerful as w, ns, Is and aw. cs intersects / and ;. cs is more powerful than / and t for large p, but it is less powerful for small p. In any case, cs is not powerful, either. (G) The sample power of cs is less than any other tests for all p = 0.0,0.1, • • • , 0.9. / and t is slightly better than w, ns, Is and aw. (D) (D) has the exactly same features as (G). That is, cs is inferior to other tests for all p = 0.0,0.1, • • • , 0.9. / and / is slightly better than w, ns, Is and aw. (L) There are three lines to be distinguished. / and / are the upper lines, cs is the lower line, and w, ns, Is and aw are in the middle.
464
CHAPTERS. INDEPENDENCE BETWEEN TWO SAMPLES
(N) We have the same results as (L). cs shows a poor performance, while / and t are powerful. (T) This case also has the same features as (L) and (N). (U) aw is lower than cs for small p, but not for large p. For p - 0.6, 0.7, 0.8, / and t are more powerful than the other tests. (Q) cs is the lowest, while / and; are the highest. Thus, cs shows a poor performance for almost all the population distributions. / is as powerful as t for all the distributions, and it is the most powerful. Summarizing above, it might be concluded that the permutation-based nonparametric test, /, is useful, because it gives us the correct empirical sizes and is quite powerful even though it does not need to assume the distribution function.
8.3.2
On Testing the Regression Coefficient
In Table 8.2, the testing procedure taken in Section 8.3.1 is applied to the regression analysis. The error term et is assumed to have the bimodal distribution which consists of two normal distributions 0.57V(1,1) + 0.5N(-l,0.52), the Cauchy distribution (;r(l + Jt2))"1, the chi-square distribution with one degree of freedom ^2(1) - 1, the Gumbel (extreme-value) distribution exp(-x + a) exp(-e~x+a) for a = -.577216, the double exponential (LaPlace) distribution -0.5 exp(- x\), the logistic distribution e~x(\ + e~*)~2, the standard normal distribution N(0,1), the ; distribution with three degrees of freedom r(3), the uniform distribution f/(-2,2), and the quadratic distribution 3 V5(5 - *2)/100, which are denoted by (B), (C), (X), (G), (D), (L), (N), (T), (U) and (Q), respectively. Let Xt = ( X u , X 2 j , - - • ,XkJ), where Xu = 1 and (01,02,03,04) = (0.0, 0.0, 0.5, 0.9) are set. Xjti for j = 2,3,4 are generated from be the same distribution as the error term e,, where X^, XT,,, and X4J are assumed to be mutually independent. Under the setup above, we obtain a series of data (F,) from Yj = XJ3 + 61. The regression coefficient estimate is given by ft = Z"=1(X';fr1X,K;, which indicates the sample covariance between (X'X)']Xt and 7,. Therefore, the significance test on the regression coefficient is equivalent to testing whether the correlation coefficient between (X'X)~}Xj and Yt is zero. The sample size is n = 9, 10, 11 and the number of the regression coefficient to be estimated is k = 4. The nonparametric tests are compared with the / test in both empirical size and sample power criteria. Each value in Table 8.2 represents the rejection rate out of G simulation runs, where G = 104. As in Table 8.1, we generate data (F,, Xt) for; = 1,2, • • •, n for each distribution, compute the probability of P(fij < ^) given the generated data, repeat the experiment G times for G = 104, and obtain the number of the cases which satisfy P(0j < ^) > 1 - a out of the G experiments, where jS(;0) denotes the j'th regression coefficient computed from the original data. The number of the cases divided by 104 correspond to the empirical sizes or the sample powers. Thus, the ratio of P($j < j8(;0)) > 1 - a is shown in Table
465
8.3. MONTE CARLO EXPERIMENTS Table 8.2: Empirical Sizes and Sample Powers (H0 : fr = 0 for / = 2,3,4) n /? ft 9
fr
fa
ft
(B) 10
fa
ft
ft
11
ft
&
ft
9
A
ft ft (C) 10
ft
ft ft 11
ft ft
a .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01
/ .092 1°°° .0457°° .0095 .3578°°° .2221°°° .0612°°° .7568— .5995— .2773— .0974 .0489 .0087 .3868°°° .2407°°° .07 12°°° .8007"' .6643— .3372— .0954 .0459° .0094 .4061°°° .2686°°° .0883°°° .8380— .7086" .3899"' .0988 .0486 .0098 .4055°°° .2914°°° .1135°°° .5835°°° .48 12°°° .2656°°° .0990 .0491 .0104 .4212°°° .3092°°° .1314°°° .6109°°° .5048°°° .2962°°° .0973 .0467 .0092 .4442°°° .3222"°° .1449°°° .6394°°° .5295°°° .3265°°°
w .1041 .0509 .0098 .3460°°° .2183°°° .0579°°° .6912°°° .5365°° .2329 .0957 .0500 .0095 .3670°°° .2298°°° .0601°°° .7323°°° .5921°°° .27 19°°° .0992 .0494 .0097 .3770°°°
ns
Is
cs
aw
/
.0944° .0469 .0083° .3348°°° .2040°°° .0589°°° .6819°°° .5193°°° .2235°°° .0937°° .0468 .0088 .3681°°° .2272°°° .0629°°° .7350°°° .5845°°° .2725°°° .0972 .0504 .0096 .3799°°° .2453°°° .0783°°° .7771°°° .6338°°° .3240°°°
.0950° .0467 .0087 .3328°°° .2032°°° .0575°°° .6785°°° .5112°°° .2166°°° .0942° .0469 .0087 .3663°°° .2253°°° .0634°°° .7323°°° .5774°°° .2632°°° .0987 .0503 .0095 .3788°°° .2428°°° .0751°°° .7743°°° .6273°°° .3129°°°
.0988 .0477 .0087 .2938°°° .1846°°° .0454°°° .5540°°° .4214°°° .1210°°° .0971 .0475 .0092 .3034°°° .1981°°° .0450°°° .5686°°° .4474°°° .1194°°° .1038 .0504 .0108 .3117°°° .2053°°° .0481°°° .5742°°° .4648°°° .1476°°°
.1018 .0525 .0112 .3578°°° .2241°°° .0662°°° .6226°°° .4608°°° .1880°°° .0980 .0487 .0100 .3916°°° .2490°°° .0729°°° .6892°°° .5353°°° .2352°°° .1043 .0523 .0100 .4183°°° .2774°°° .0926°°° .7412°°° .5975°°° .2954°°°
.0973 .0511 .0125" .4120 .2675 .0842 .7135 .5521 .2423 .1010 .0513 .0101 .4513 .2966 .0988 .7711 .6304 .3042 .1033 .0556" .0110 .4900 .3344 .1205 .8240 .6939 .3703
.0984 .0478 .0087 .3958°°° .2668°°° .1018°°° .5983°°° .47 17°°° .2372°°° .1032 .0522 .0090 .4171°°° .2930°°° . 1 1 34°°° .6367°°° .5113°°° .2705°°° .1005 .0472 .0096 .4443°°° .3173°°° .1339°°° .6586°°° .6705°° .521 r°° .5464°°° .2744°°° .3080°°°
.0989 .0483 .0091 .3956°°° .2674°°° .0998°°° .6007°°° .4740°°° .2360°°° .1025 .0512 .0090 .4214°°° .2954°°° .1158°°° .6401°°° .5140°°° .2703°°° .1012 .0470 .0086 .4473°°° .3172°°° .1363°°° .6718°° .5517°°° .3097°°°
.0986 .0485 .0094 .3733°°° .2544°°° .0778°°° .5719°°° .4411°°° .1735°°° .0985 .0488 .0116 .3945°°° .2759°°° .0801°°° .5951°°° .4709°°° .1725°°° .0976 .0479 .0088 .4033°°° .2924°°° .0965°°° .6188°°° .5031°°° .2120°°°
.0981 .0473 .0094 .4034°°° .2705°° .0896°° .5636°° .4234°° .1898°° .1001 .0478 .0098 .4309°°° .2964°°° .1090°°° .6055°°° .4732°°° .2221°°° .0979 .0494 .0086 .4753°°° .3368°°° .1332°°° .6468°°° .5129°°° .2653°°°
.0965 .0567— .0208— .5008 .4115 .2675 .6414 .5604 .4076 .0952 .0589— .0216— .5195 .4399 .3066 .6682 .5887 .4483 .0946° .0564— .0213— .5433 .4614 .3261 .6853 .6135 .4809
.241 r°°
.0757°°° .7702°°° .6305°°° .3222°°° .1059" .0524 .0098 .3940°°° .2643°°° .0967°°° .5995°°° .4611°°° .2203°°° .1066" .0541* .0090 .4035°°° .2815°°° .1012°°° .6286°°° .4953°°° .2397°°° .1024 .0463° .0091 .4323°°° .2986°°° .1145°°°
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
466
Table 8.2: Empirical Sizes and Sample Powers —< Continued >— n j8 ft 9
A A
ft (X) 10
A A
ft 11
ft A
ft 9 ft
ft ft (G) 10
ft A
ft 11
ft ft
« || / .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01
.0981 .0504 .0102 .3415°°° .2220°°° .0677°°° .6428°°° .5114°°° .2489°°° .1025 .0512 .0111 .3599°°° .2431°°° .0730°°° .6770°°° .5532°°° .2846°°° .0990 .0477 .0104 .3797°°° .2502°°° .0791°°° .7053°°° .587 1°°° .3237°°° .0974 .0479 .0086 .3645°°° .2302°°° .07 14°°° .7364— .5900— .2924— .1005 .0516 .0097 .3910°°° .2551°°° .0823°°° .7758"* .6375" .3472 .1033 .0501 .0090 .4135°°° .2751°°° .0939°°° .8063 .6865 .4016
w .1026 .0521 .0109 .3652°°° .2275°°° .0689°°° .6296°°° .4804°°° .21 18°°° .1035 .0528 .0093 .3754°°° .2424°°° .0755°°° .6562°°° .5 1 77°°° .2406°°° .0998 .0477 .0089 .3980°°° .2540°°° .0873°°° .6876°°° .5463°°° .2772°°° .1061" .0545" .0104 .3518°°° .2278°°° .0668°°° .6829°°° .5275°°° .2345°°° .1052' .0552" .0098 .3668°°° .2401 °°° .0744°°° .7169°°° .5746°°° .2760°°° .1015 .0488 .0092 .39 11°°° .2556°°° .0880°°° .7569°°° .6219°°° .3239°°°
ns .1017 .0488 .0093 .3558°°° .2239°°° .0650°°° .6239°°° .4748°°° .21 17°°° .1031 .0499 .0088 .3745°°° .2391°°° .0766°°° .6605°°° .5173°°° .2523°°° .0972 .0468 .0097 .4013°°° .2647°°° .0880°°° .6935°°° .5590°°° .2828°°° .1000 .0514 .0106 .3458°°° .2197°°° .0639°°° .6741°°° .5248°°° .2334°°° .1022 .0508 .0093 .3698°°° .2400°°° .0765°°° .7222°°° .5795°°° .2870°°° .0989 .0497 .0081° .3920°°° .2603°°° .090 1 000 .7676°°° .6375°°° .3353°°°
Is .1013 .0485 .0093 .3545°°° .2214°°° .0636°°° .6215°°° .4714°°° .2101°°° .1038 .0493 .0087 .376 1°°° .2364°°° .0752°°° .6611°°° .5148°°° .2459°°° .0976 .0469 .0093 39?0ooo
.2631°°° .0877°°° .6925°°° .5569°°° .2782°°° .1000 .0514 .0098 .3462°°" .2174°°° .062 1°°° .6747°°° .5231°°° .2314°°° .1028 .0519 .0090 .3691°°° .2381°°° .0738°°° .7209°°° .5763°°° .2823°°° .0991 .0490 .0080°° .3952°°° .2591°°° .0883°°° .7653°°° .6316°°° .3293°°°
cs .1009 .0526 .0089 .3072°°°
.1968°°° 0479000
.5329°°° .4036°°° . 1 244°°° .1031 .0478 .0095 .3119°°° .2007°°° .0510°°° .5487°°° .4193°°° . 1 1 52°°° .0974 .0480 .0111 .3173°°° .2153°°° .0566°°° .5604°°° .442 1000 .1424°°° .1035 .0521 .0096 .3005°°° .1909°°° .0484°°° .5780°°° .4426°°° .1379°°° .0963 .0497 .0105 .3140°°° 2073000 .0486°°° .5933°°° .4721°°° .1306°°° .1003 .0475 .0076°° .3231°°° .2155°°° .0549°°° .6109°°° .4989°°° .1724°°°
aw .0984 .0487 .0113 .3770°°° .2401°°° .0716°°° .5755°°° .4286°°° .1849°°° .1000 .0505 .0097 .4026°°° .2680°°° .0917°°° .6218°°° .4809°°° .2298°°° .0984 .0492 .0091 .4351°°° .2968°°° .1090°°° .6665°°° .5278°°° .2653°°° .1026 .0520 .0112 .3635°°° .2328°°° .0707°°° .6203°°° .4653°°° .1923°°° .1000 .0482 .0114 .3952°°° .2591°°° .0802°°° .6727°°° .5284°°° .2358°°° .0996 .0498 .0119* .4396°°° .2970°°° .0996°°° .7286°°° .5855°°° .2956°°°
/ .1198— .0744— .0267— .4679 .3514 Mil .6678 .5601 .3493 .1191 — .0730— .0263— .4950 .3747 .1964 .6984 .5962 .3941 .1171 — .0714"* .0264— .5174 .3987 .2247 .7342 .6374 .4406 .1009 .0509 .0105 .4320 .2921 .0966 .7036 .5629 .2724 .1050* .0551" .0126— .4639 .3267 .1215 .7588 .6259 .3388 .1039 .0522 .0120" .5091 .3611 .1425 .8006 .6821 .4055
8.3. MONTE CARLO
467
EXPERIMENTS
Table 8.2: Empirical Sizes and Sample Powers —< Continued >— n J8
a
ft
9 A
fa
ft (10
10 ft
&
ft 11 ft
A
/
w .1050' .0548" .0105 .3590°°° .2297°°° .0695°°° .6709°°° .5187°°° .2394°°° .1031 .0537' .0097 .3768°°° .2453°°° .0753°°° .6997°"° .5642°°° .2733°°° .1002 .0493 .0085 .3972°°° .2634°°° .0879°°° .7417°°° .6030°°° .3174°°°
ns .1018 .0503 .0097 .3509°°° .2219°°° .0680°°° .6666°°° .5232°°° .2473°°° .1028 .0514 .0096 .3816°°° .2447°°° .0777°°° .7109°°° .5709°°° .2943°°° .0995 .0490 .0080°° .4022°°° .2699°°° .0951°°° .7543°°° .6246°°° .3402°°°
Is
cs
aw
/
.1006 .0502 .0095 .3525°°° .2217°°° .0657°°° .6681°°° .5183°°° .2486°°° .1023 .0515 .0091 .3794°°° .2448°°° .0797°°° .7120°°° .5711°°° .2887°°° .1007 .0480 .0082° .4017°°° .2678°°° .0947°°° .7552°°° .6270000 .3374°°°
.1018 .0504 .0095 .3059°°° .1967°°° .0511°°° .5929°°° .4589°°° .1569°°° .0980 .0510 .0105 .3260°™ .2171°°° .0526°°° .6140°°° .4891°°° .1541°°° .0994 .0481 .0083° .3290°°° .2255°°° .0593°°° .6379°°° .5220°°° .1996°°°
.0996 .0521 .0099 .3731°°° .2429°°° .0735°°° .6121°°° .4609°°° .2043°°° .1004 .0482 .0092 .4037°°° .2665° .0870° ° .6615° .5194° ° .2448° ° .1006 .0484 .0093 .4475°°° .3024°°° .1058°°° .7132°°° .5779°°° .2975°°°
.0972 .0498 .0086 .4450 .3020 .1043 .7001 .5615 .2855 .1019 .0521 .0103 .4791 .3397 .1340 .7511 .6306 .3504 .1023 .0480 .0097 .5157 .3740 .1549 .7901 .6758 .4138
.1051' .1017 .10 .0988 .1009 .0522 .0483 .05 .0503 .0486 .01 .0086 .0097 .0092 .0093 .10 .3805°°° .3581°°° .3470°°° .3482°°° 000 .05 .24 14°°° .2232 .2191"°° .2187°°° .01 .072 1°°° .0692°°° .066 1°°° .0635°°° .10 .7467'" .6819°°° .6785°°° .6778°°° .05 .6067— .5321°°° .5299°°° .5256°°° .01 .3017— .2352°°° .2404°°° .2380°°° .10 .1017 .1049 .1025 .1016 .05 .0518 .0549" .0512 .0514 .01 .0084 .0096 .0090 .0088 .10 .4035°°° .3731°°° .3753°°° .3747°°° .05 .2665°°° .2436°°° .2429°°° .2423°°° .01 .0865°°° .0735°°° .0776°°° .0772°°° .10 .7869— .7168°°° .7235°°° .7255°°° .05 .6553— .5797°°° .5827°°° .5822°°° .01 .3600— .2774°°° .2919°°° .2861°°° .1020 .0999 .10 .1046 .1008 .0484 .05 .0505 .0485 .0481 .01 .0100 .0080°° .0083° .0083° .10 .4296°°° .3966°°° .3994°°° .3997°°° .05 .2861°°° .2593°°° .2689°°° .2650°°° .01 . 1 004°°° .0850°°° .0905°°° .0906°°° .10 .8204" .7607°°° .7711°°° .7725°°° .05 .7029— .6211"°° .6415°°° .6364°°° .01 .4162— .3206°°° .3405°°° .3360°°°
.1009 .0507 .0098 .3020°°° .1932°°° .0488°°° .5860°°° .4518°°° .1454°°° .1002 .0515 .0101 .3241°°° .2125°°° .0484°°° .6062°°° .4826°°° .1436°°° .1009 .0490 .0083° .3229°°° .223 1°°° .0558°°° .6262°°° .5124°°° .1802°°°
.0998 .0513 .0093 .3696°°° .2405°°° .0697°°° .6211°°° .4622°°° .1996°°° .1003 .0499 .0089 .3979°°° .26 16°°° .0835°°° .6756°°° .5286°°° .2429°°° .1025 .0493 .0104 .4460°°° .2984°°° .1011°°° .7311°°° .5910°°° .2985°°°
.0962 .0490 .0083° .4325 .2857 .0893 .7077 .5583 .2636 .1004 .0511 .0099 .4690 .3242 .1139 .7610 .6326 .3290 .1010 .0480 .0092 .5064 .3623 .1347 .8070 .6843 .3961
.10 ft .05 .01 .10 9 & .05 .01 .10 fa .05 .01 .10 02 .05 .01 .10 (D) 10 ft .05 .01 .10 .05 fa .01 .10 h .05 .01 .10 11 ft .05 .01 .10 fa .05 .01
.0996 .0484 .0092 .377 1°°° .2485°°° .0752°°° .7314— .5957— .3096— .1012 .0508 .0094 .4070°°° .27 13°°° .0907°°° .7684— .6437' .3632' .1026 .0506 .0096 .4298°°° .29 17°°° .1042°°° .8020" .6878' .4164
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
468
Table 8.2: Empirical Sizes and Sample Powers —< Continued >— n P
Or
1
/
.10 .0983 A .05 .0507 .01 .0105
w
.10
.3808°°°
.01 .10 A .05 .01 .10 A .05 .01 .10 (N) 10 A .05 .01 .10 A .05 .01 .10 A .05 .01 .10 11 A .05 .01 .10 R, 05
.01
.0756° .7513— .6088— .2974'" .0968 .0502 .0106 .407 1°°° .267 1000 .0828°°° .7863— .6609— .3547— .0995 .0488 .0104 .4286°°° .2840°°° .0906°°° .8262— 7097— .4101 —
.1049 .0549" .0099 .3554°°° .226 1°°° .0637°°° .6933° .5344°° .2329°° .0984 .0524 .0112 .3670°°° .2416°°° .0698°°° .7260°°° .5851°°° .2738°°° .1009 .0496 .0098 .3921°°° .2503°°° .0805°°° .76 13°°° 6197°°° .3252°°°
.10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01
.0980 .0507 .0099 .3808°°° .2518°°° .0803°°° .7102— .5885— .3087 .0946° .0452°° .0098 .4105°°° .2707°°° .0948°°° .7470 .6259 .3561 .0935°° .0474 .0096 .4360°°° .2958°°° .1074°°° .7731 .6613 .402 1°°°
.1056' .0532 .0106 .3586°°° .23 10°°° .070 1000 .669 1°°° .5250°°° .2356"°° .0966 .0499 .0091 .3787°°° .2478°°° .0794°°° .6986°°° .5640°°° .2735°°° .1013 .0492 .0089 .4043°°° .2652°°° .0825°°° .7346°™ .5950°°° .3188°°°
9 A .05 .2430°°°
A
9 A
A A (T) 10 A
A A 11 A
A
ns .1009 .0509 .0093 .3482°°° .21 84°°° .0638°°° .6873°°° .5287°°° .2358°° .0980 .0495 .0106 .3708°°° .2397°°° .0724°°° .7329°°° .5888°°° .2840°°° .1008 .0507 .0094 .3915°°° .2548°°° .0839°°° .77 11°°° .6355°°° .3339°°° .1014 .0482 .0105 .3530°°° .2229°°° .0667°°° .6697°°° .5237°°° .2500°°° .0931°° .0471 .0085 .3830°°° .2479°°° .0831°°° .7097°°° .5738°°° .2913°°° .0973 .0474 .0084 .4061°°° .268 1 000 .0890°°° .7409°°° .6154°°° .3436°°°
Is
cs
aw
/
.1005 .0516 .0102 .3466°°° .2157°°° .0635°°° .6865°°° .5255°°° .2307°°° .0977 .0499 .0111 .3722°°° .2394°°° .0721°°° .7298°°° .5867°°° .2832°°° .1006 .0518 .0102 .3885°° .2553°° .0816°° .7688°° 6328°° .3264°°
.0973 .0510 .0112 .3037°°° .1917°°° .0501°°° .5785°°° .4443°°° .1423°°° .0990 .0477 .0101 .3131°°° .2076°°° .0476°°° .5952°°° .4724°°° .1365°°° .0994 .0504 .0113 .3195°°° .2087°°° .0509°°° .6053°°° 4901°°° .1694°°°
.1014 .0476 .0095 .3583°°° .2211°°° .0636°°° .6178°°° .4611°°° .1848°°° .1002 .0520 .0118' .3924°°° .2521°°° .0787°°° .6815°°° .5293°°° .2371°°° .1011 .0503 .0100 .4389°°° .2850°°° .0936°°° .7279°°° 5926°°° .2954°°°
.0964 .0499 .0101 .4253 .2780 .0829 .7044 .5503 .2478 .0985 .0507 .0107 .4604 .3081 .0999 .7661 .6238 .3117 .0991 .0469 .0100 .5055 .3522 .1204 .8119 6876 .3800
.1017 .0484 .0100 .3528°°° .22 18°°° .0671°°° .6686°°° .5229°°° .2494°°° .0927°° .0466 .0087 .3822°°° .2487°°° .0822°°° .7085°°° .5721°°° .2913°°° .0964 .0467 .0086 .4043°°° .2684°°° .0908°°° .7409°°° .6156°°° .3406°°°
.1019 .0490 .0095 .3173°°° .2032°°° .0523°°° .5893°°° .4596°°° .1588°°° .0972 .0472 .0090 .3381°°° .2258°°° .0561°°° .6068°°° .4870°°° .1571°°° .0958 .0464° .0080°° .3407°°° .2311°°° .0643°°° .62 10°°° .5122°°° .1959°°°
.1031 .0499 .0103 .3720°°° .2352°°° .0708°°° .6142°°° .4589°°° .1964°°° .0977 .0495 .0097 .4053°°° .2700°°° .0894°°° .6659°°° .5189°°° .2436°°° .0977 .0488 .0096 .4366°°° .3005°°° .1063°°° .7102°°° .5755°°° .2952°°°
.0991 .0483 .0103 .4480 .3119 .1174 .6922 .5662 .3028 .0975 .0491 .0094 .4849 .3504 .1456 .7376 .6220 .3614 .0957 .0488 .0110 .5212 .3845 .1708 .7771 .6708 .4216
469
8.3. MONTE CARLO EXPERIMENTS Table 8.2: Empirical Sizes and Sample Powers —< Continued >— n P ft 9
ft ft ft
(U) 10
ft ft A
11
ft ft ft
9 ft
ft ft (Q) 10 ft
ft ft 11
ft ft
or
/
w
ns
Is
cs
aw
/
.10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01
.0998 .0495 .0081° .3701°°° .2319°°° .0660 .7704— .6184— .2749— .1027 .0520 .0082° .3974°°° .2586°°° .0755°°° .8148— .6737— .3352— .0994 .0504 .0096 .4234°°° .2794°°° .0895°°° .8532— .7353— .3930—
.1026 .0527 .0093 .3477°°° .2202°°° .0634°° .7127 .5462 .2232 .1043 .0560"' .0092 .3686°°° .2386°°° .068 1°°° .7468°°° .6000°°° .2631°°° .1009 .0510 .0084 .3925°°° .2518°°° .0789°°° .7925°°° .6490°°° .3171°°°
.0964 .0492 .0099 .3380°°° .2067°°° .059 1°°° .7004°°° .5277°°° .2090 .1039 .0529 .0098 .3663°°° .2321 °°° .0667°°° .749 1°°° .5929°°° .2575°°° .1018 .0491 .0081° .3867°°° .2506°°° .0805°°° .7986°°° .6495°°° .3115°°°
.0957 .0488 .0103 .3363°°° .2050°°° .0574°°° .6952°°° .5202°°° .2034°° .1042 .0527 .0100 .3646°°° .2276°°° .0647°°° .7422°°° .5859°°° .2483°°° .1012 .0480 .0078°° .3834°°° .2467°°° .0789°°° .7939°°° .6412°°° .2997°°°
.0997 .0501 .0100 .2857°°° .1782°°° .0416°°° .5525°°° .4177°°° .1146°°° .1015 .0532 .0102 .2948°°° .1918°°° .0424°°° .5624°°° .4434000 .1010°°° .1021 .0487 .0083° .3023°°° .2005°°° .0454°°° .5636°°° .4567°°° .1332°°°
.0970 .0482 .0102 .3504°°° .2210°°° .0629°° .6328°°° .4604°°° .1740°°° .1000 .0508 .0111 .3837°°° .2431°°° .0760°°° .6943°°° .5348°°° .2218°°° .1021 .0488 .0121" .4287°°° .2830°°° .0886°°° .7562°°° .6057°°° .2874°°°
.0953 .0477 .0095 .4092 .2614 .0717 .7199 .5490 .2153 .1025 .0513 .0107 .4502 .2891 .0867 .7827 .6310 .2839 .0988 .0500 .0102 .4906 .3282 .1090 .8341 .7002 .3592
.10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01 .10 .05 .01
.0993 .0486 .0076°° .3715°°° .2348°°° .0660° .7668"* .6180— .2808— .1022 .0513 .0089 .3978°°° .2610°°° .0774°°° .8088— .6741 — .3457— .1012 .0499 .0102 .4230°°° .28 16°°° .0908°°° .8485— .7286— .4016—
.1030 .0522 .0106 .3501°°° .2201°°° .0659° .7090° .5440 .2309 .1034 .0553" .0097 .3683°°° .2384°°° .0714°°° .7423°°° .5996°°° .2698°°° .1022 .0487 .0076°° .3942°°° .2543°°° .084 1°°° .7881°°° .6470°°° .3204°°°
.0983 .0498 .0104 .3428°°° 2099000 .0611°°° .7010°°° .5310°°° .2233 .1022 .0533 .0089 .3723°° .2361°° .0711°° .7482°° .5950°° .2715°°° .0984 .0485 .0092 .39 12°°° .2593°°° .0832°°° .7951°°° .6573°°° .3271°°°
.0980 .0496 .0104 .3435°°° .2088°°° .0598°°° .6962°°° .5275°°° .2196 .1027 .0537' .0093 .3692°°° .2335°°° .0683°°° .7449°°° .5890°°" .2632°°° .0992 .0477 .0086 .3889°°° .2562°°° .0804°°° .7946°°° .6509°°° .3167°°°
.0998 .0506 .0097 .2959°°° .1829°°° .0432°°° .5683°°° .4324°°° .1232°°° .1006 .0505 .0103 .3049°°° .1987°°° 0447000
.0991 .0484 .0102 .3546°°° .2274°°° .0655°° .6326°°° .4668°°° .1864°°° .1002 .0501 .0100 .3877°°° .2485°°° .0778°°° .6941°°° .5391°°° .2302°°° .1015 .0481 .0107 .4347°°° .2896°°° .0919°°° .7521°°° .6086°°° .2923°°°
.0954 .0476 .0086 .4123 .2632 .0727 .7199 .5509 .2252 .1019 .0515 .0088 .4513 .2979 .0931 .7758 .6316 .2918 .1000 .0486 .0100 .4956 .3369 .1122 .8281 .6953 .3698
.5836°°° .4603°°° .1142°°° .1021 .0491 .0087 .3111°°° .2093°°° .0504°°° .5887°°° .4783°°° .1502°°°
470
CHAPTERS. INDEPENDENCE BETWEEN TWO SAMPLES
8.2, where a = 0.10, 0.05, 0.01 is examined. Theoretically each value in the table should be equivalent to the probability which rejects the null hypothesis H0 : fij = 0 against the alternative hypothesis H\ : fa > 0 for j = 2,3,4. In Table 8.2, the probability which rejects H0 : fa = 0 corresponds to the significance level a and the probabilities which reject H0 : /J, = 0 for j = 3,4 indicates the sample powers, because the true parameter values are given by (f>z,fa,fa) = (0-0, 0.5, 0.9) In Table 8.2, the superscripts *, " and **" in fa imply that the empirical size is statistically larger than a by the one-sided test of the significance levels 10%, 5% and 1% (i.e., a = 0.10, 0.05, 0.01), respectively. The superscripts °, °° and °°° in fa indicate that the empirical size is statistically smaller than a by the one-sided test of the significance levels 10%, 5% and 1%, respectively. That is, each value without the superscript" or ° in fa gives us the correct size. * and ° in fa and fa represent a comparison with t. The value with " implies that the sample power of the corresponding test is significantly greater than that of t (i.e., the corresponding test is more powerful than ;), and the value with ° indicates that the sample power of the corresponding test is significantly less than that of t (i.e., the corresponding test is less powerful than t). As in Section 8.3.1, in the case of (N), the / test should be better than any other nonparametric tests. In other words, t in fa should be closer to the significance level a than any other nonparametric tests, and t in fa and fa has to be larger than any other tests, because the OLS estimator of fij follows the t distribution with n - k degrees of freedom when the error terms et, i = 1,2, • • •, n, are mutually independently and normally distributed with mean zero. The results are as follows, fit in Table 8.2 indicates the empirical size, because the true value of fa is given by zero. For the t tests of (C) and (X), the empirical sizes in fa are over-estimated for all a = 0.10, 0.05, 0.01. However, for the nonparametric tests /, w, ns, Is, cs and aw, almost all the values in Table 8.2 are very close to the significance level a. Therefore, we can conclude that the nonparametric tests is superior to the t test in the sense of the size criterion, fa andj84 in Table 8.2 represent the sample powers, because/?3 = 0.5 and fa = 0.9 are the true values. For fa, in all the cases except for (U), n = 9, a — 0.01 and /, t is more powerful than the nonparametric tests, because except for only one case all the values have ° in the superscript. In the cases of (B), (L), (N), (U) and (Q) in fa, clearly / is superior to t, although t is better than / in the cases of (C) and (X) in fa. In the other cases such as (G), (D) and (T), / is better than ; for small n and large a. Thus, we can observe through Table 8.2 that except for a few cases the permutation test / is more powerful than the t test. In addition, each difference between / and t becomes small as n is large. Therefore, the permutation test / is close to the t test as the sample size is large. Next, graphically we compare the sample powers. Taking the case of n = 10, k = 4 and a - 0.10, Figure 8.2 represents the sample powers for$ = 0,;' = 1,2,3, and fa = 0.0,0.1, • • • , 0.9. For each population distribution, the features of the nonparametric and parametric tests are summarized as follows: (B) There are five lines to be distinguished. The first line from the top is /, the
8.3. MONTE CARLO EXPERIMENTS
471
Figure 8.2: Sample Powers: n = 10, k = 4, a = 0.10 and# - 0 for i - 1,2,3
0.0
0.2
0.4
ns Is cs aw t 0.0
0.2
0.4
0.6
0.8
1.0 At
472
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
Figure 8.2: Sample Powers: n = 10, k = 4, a = 0.10 and pi = 0 for i = 1,2,3 —< Continued >— (N)
0.2
0.0
0.4
0.6
0.2
ns Is cs aw t 0.0
0.2
0.4
0.6
0.8
1.0 At
8.3. MONTE CARLO EXPERIMENTS
(C) (X) (G)
(D) (L) (N) (T) (U) (Q)
473
second line consists of w, ns and Is, the third line is given by t, the next line corresponds to aw, and the bottom line is cs. We have six lines around /34 = 0.6,0.7,0.8, which are (i) ns and Is, (ii) w, (iii) /, (iv) cs, (v) t, and (vi) aw from the top, The first line from the top is given by w, ns and Is, which are overlapped. / gives us the worst test for small /?4, but cs is the worst for/J4 greater than 0.4. We have the same results as (B). That is, we can see five lines. The first line from the top is /, the second line is given by w, ns and Is, the third line is t, the next line is aw, and the bottom line is cs. / indicates the best test, cs is the worst test, aw is slightly better than cs. This case is also similar to (B) and (G). This is also similar to (B), (G) and (L). aw is very close to cs. Both are inferior to the other tests. / is the best test. For small /?4, aw is the worst and t is the second worst, cs is not so bad for small j64, but it is the worst for large/34. Except for small /J4, we obtain the same results as (N), (B), (G) and (L).
Thus, in all the cases except for (C) and (X), / shows the best performance, cs is the worst test at least when /J4 is large, w, ns and Is are between / and t in a lot of cases. aw is as poor as cs, because aw is approximated as a normal distribution, which is a large sample property and does not hold in small sample. CPU Time: As mentioned above, in the case where we perform the significance test of the regression coefficient, we need to compute the n\ regression coefficients (for example, n\ is equal to about 3.6 million when n = 10). In Table 8.3, CPU time per simulation run is shown, where the arithmetic average from 103 simulation runs is computed. Table 8.3 indicates the CPU time required to obtain all the tests (i.e., /, w, ns, Is, cs, aw and?) in each case of n = 10, 11,12 and A: = 2,3,4, where/?,- = 0 is taken for all i = 1, 2, • • -, k. Pentium III 1 GHz Dual CPU personal computer, Windows 2000 SP2 operating system and Open Watcom C/C++32 Optimizing Compiler (Version 1.0) and are utilized. The order of computation is about «! x (k - 1), because the constant term is included in the regression model which we consider in this chapter. Note that the order of computation is n\ x k if the constant term is not included in the regression model (see Remark 7). The case of sample size n is n times more computer-intensive than that of sample size n — 1. For example, it might be expected that the case of n = 15 and k = 4 takes about 25.7 days (i.e., 15 X 14 x 13 x 13.549 minutes) to obtain the result, which is not feasible in practice. Thus, the permutation test discussed in this chapter is very computer-intensive. Therefore, we need to consider less computational procedure. In order to reduce computational burden when n! is large, it might be practical to choose some of the n\ permutations randomly and perform the same testing procedure discussed in this
474
CHAPTERS. INDEPENDENCE BETWEEN TWO SAMPLES Table 8.3: CPU Time (minutes) n\k 11 12 13
2 0.047 0.552 6.970
3 0.070 0.824 10.262
4 0.092 1.096 13.549
chapter. That is, as shown in Remark 9, taking M permutations out of the n\ permutations randomly, we compute the probabilities P(fSj < ^0)) and P(/3; > /^0)), where/3'0) denotes the estimate of the _/th regression coefficient obtained from the original data. If either of them is smaller than a = 0.1, the null hypothesis //0 : Pj = 0 is rejected. In Table 7.3 on p.424, we have examined whether the empirical sizes depend on M, taking an example of testing difference between two-sample means, where the case of M = 106 is very close to the case of all the possible combinations. Since we consider the same nonparametric tests, clearly it can be expected to obtain the same results.
8.4
Empirical Example
In Remark 8 of Section 8.2.2, we have shown how to construct the confidence interval for each parameter. In Remark 9, we have discussed how to reduce computational burden when the sample size is large. In this section, we show an empirical example as an application, taking the same example as in Chapter 7 (Section 7.5). Annual data from Annual Report on National Accounts (Economic and Social Research Institute, Cabinet Office, Government of Japan) is used. Let GDP, be Gross Domestic Product (1990 price, billions of Japanese yen), M, be Imports of Goods and Services (1990 price, billions of Japanese yen), and P, be Terms of Trade Index, which is given by Gross Domestic Product Implicit Price Deflator divided by Imports of Goods and Services Implicit Price Deflator. In Chapter 7, we have estimated two import functions (7.19) and (7.20), which are as follows: log M, = ft, + fo log GDP, + ft log P,,
(7.19)
log M, = A + ft log GDP, + #, log P, + fa log M,_,,
(7.20)
where/3\, fa, fa and/34 are the unknown parameters to be estimated. Because we have n = 46 for equation (7.19) and n = 45 for equation (7.20), it is impossible to obtain n\ permutations for both of the cases. Therefore, we take M = 106 permutations, instead of M = n] permutations. As shown in Remark 8, by considering the permutation between (X'X)~'XJ and et, i = 1,2, • • • , n, we can obtain M regression coefficients, compute the arithmetic average (AVE), standard error (SER), skewness (Skewness) and kurtosis (Kurtosis) based on the M regression coefficients and obtain the percent points (0.010, 0.025, 0.050, 0.100,0.250,0.500,0.750, 0.900, 0.950,0.975 and 0.990) to construct the confidence
8.4. EMPIRICAL EXAMPLE
475
Table 8.4: t(n — k) versus Permutation
OLSE AVE SER Skewness Kurtosis 0.010 0.025 0.050 0.100 0.250 0.500 0.750 0.900 0.950 0.975 0.990
OLSE AVE SER Skewness Kurtosis 0.010 0.025 0.050 0.100 0.250 0.500 0.750 0.900 0.950 0.975 0.990
— Equation (7.19) — t(n - k) for n = 46 and k = 3 ft ft ft -6.0123 1.2841 0.1970 0.5137 0.0000 3.1622 -7.2559 -7.0498 -6.8768 -6.6814 -6.3619 -6.0123 -5.6626 -5.3431 -5.1477 -4.9747 -4.7686
0.0403 0.0000 3.1622 1.1864 1.2026 1.2162 1.2315 1.2566 1.2841 1.3115 1 .3366 1 .3520 1 .3656 1.3817
0.0779 0.0000 3.1622 0.0083 0.0396 0.0658 0.0955 0.1439 0.1970 0.2500 0.2985 0.3281 0.3544 0.3857
Permutation (N = 106) ft ft ft 1.2841 -6.0127 0.5020 0.0394 0.0141 -0.0136 2.8773 2.8771 1.1931 -7.1627 -6.9897 1.2069 1.2189 -6.8369 1.2331 -6.6605 1.2572 -6.3565 1.2842 -6.0141 1.3111 -5.6701 1.3350 -5.3631 -5.1830 1.3489 1.3609 -5.0306 1.3745 -4.8536
— Equation (7.20) — /(« - k) for n =45 and k = 4 ft A ft ft -1.2387 0.4111 0.1785 0.6175 0.8169 0.0000 3.1538 -3.2126 -2.8862 -2.6120 -2.3020 -1.7944 -1.2387 -0.6829 -0.1754 0.1347 0.4089 0.7353
0.1376 0.0000 3.1538 0.0786 0.1336 0.1798 0.2320 0.3175 0.41 1 1 0.5048 0.5903 0.6425 0.6887 0.7437
0.0555 0.0000 3.1538 0.0444 0.0665 0.0852 0.1062 0.1407 0.1785 0.2163 0.2507 0.2718 0.2904 0.3126
0.0958 0.0000 3.1538 0.3859 0.4242 0.4564 0.4928 0.5523 0.6175 0.6827 0.7423 0.7786 0.8108 0.8491
ft
0.1969 0.0762 0.0138 2.8754 0.0228 0.0487 0.0716 0.0987 0.1447 0.1967 0.2489 0.2955 0.3228 0.3461 0.3729
Permutation (N = 106) ft ft
-1.2390 0.4112 0.7892 0.1330 0.0495 -0.0164 2.8620 2.8614 -3.0157 0.1051 -2.7588 0.1510 -2.5288 0.1914 -2.2538 0.2389 -1.7818 0.3206 -1.2465 0.4116 -0.7034 0.5024 -0.2127 0.5827 0.0754 0.6297 0.3179 0.6693 0.6008 0.7143
ft
0.1785 0.6175 0.0536 0.0926 0.1190 -0.0076 2.8675 2.8632 0.0605 0.4047 0.0773 0.4367 0.0923 0.4647 0.1102 0.4977 0.1412 0.5541 0.1772 0.6177 0.2146 0.6809 0.2487 0.7370 0.2691 0.7699 0.2865 0.7974 0.3060 0.8288
476
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES Figure 8.3: Empirical Distribution for/33 in Equation (7.20)
Dist. of 0.1785 + 0.0555X, where X ~ t(4\)
0.17850.2
0.4
intervals. The results are in Table 8.4. In Table 8.4, the right hand side corresponds to the results of the permutation test, denoted by / in the previous sections, while the left hand side represents the conventional OLS results, denoted by t in the previous sections. Therefore, the left hand side is based on the normality assumption on the error term. Thus, in the left hand side, Skewness, Kurtosis and the percent points are obtained from the t(n - k) distribution. Remember that in the case of the t(n - k) distribution Skewness and Kurtosis are given by 0.0 and 3 + 6/(n - k - 4). In the left hand side, OLSE indicates the OLS estimate of each parameter, which is independent of the underlying distribution on the error term. Moreover, note that OLSE and SER in the left hand side are the exactly same results as equations (7.19) and (7.20) on p.426. OLSE in the left hand size is almost equal to AVE in the right hand side, because the permutation test is based on OLSE and the permutation between (X'X)~}Xl and «;, i = 1,2, • • • , n. As for Skewness, Kurtosis and the percent points, the t(n - k) test is different from the permutation test. Comparing both tests with respect to SER, the permutation test is smaller than t(n - k) test for all the parameters, i.e., /?,- for / = 1,2,3 in equation (7.19) and# for i - 1,2,3,4 in equation (7.20). Furthermore, the confidence interval of the permutation test has smaller range than that of the t(n-k) test, because for example we take 0.025 and 0.975 in ^2 of equation (7.19), and obtain 1.3656 - 1.2026 = 0.163 for the t(n - k) test and 1.3609 - 1.2069 = 0.154 for the permutation test. In addition, we can know the true distribution on the error term by using the permutation test, i.e., Skewness is almost zero for both tests and all the/3,, / = 1,2,3, in equation (7.19), but the permutation test has smaller Kurtosis than the t test for all the parameters. That is, the empirical distribution based on the permutation test is symmetric around the OLS estimate, but it has thinner tails than the t(n - k) distribution. In Figure 8.3, the empirical distribution based on the permutation is displayed for /?3 in equation (7.20), which is compared with the t(n-k) distribution (note that n-k = 4 5 - 4 = 41). The solid line indicates the distribution of 0.1785 + 0.0555X, where 0.1785 and 0.0555 represent the OLSE of /?3 and its standard error, and X denotes X ~ t(n-k) for n-k = 41. The bar graph represents the empirical distribution based on the permutation. Skewness of /33 in equation (7.20) is 0.1190, which implies that the empirical distribution is slightly skewed to the right. Thus, using the permutation test,
8.5. SUMMARY
477
we can obtain a non-symmetric empirical distribution, which might be more plausible in practice. Remember that the t distribution is always symmetric.
8.5
Summary
Only when the error term is normally distributed, we can utilize the t test for testing the regression coefficient. Since the distribution of the error term is not known, we need to check whether the normality assumption is plausible before testing the hypothesis. As a result of testing, in the case where the normality assumption is rejected, we cannot test the hypothesis on the regression coefficient using the t test. In order to improve this problem, in this chapter we have shown the significance test on the regression coefficient, which can be applicable to any distribution. In Section 8.3.1, We have tested whether the correlation coefficient between two samples is zero and examined the sample powers of the two tests. For each of the cases where the underlying samples are normal, chi-squared, uniform, logistic and Cauchy, 104 simulation runs are performed and the nonparametric tests are compared with the parametric t test with respect to the empirical sizes and the sample powers. As it is easily expected, the ; test is sometimes a biased test under the non-Gaussian assumption. That is, we have the cases where the empirical sizes are over-estimated. However, the nonparametric permutation test, denoted by /, gives us the correct empirical sizes without depending on the underlying distribution. Specifically, even when the sample is normal, the nonparametric permutation test is very close to t test (theoretically, the t test should be better than any other test when the sample is normal). In Section 8.3.2, we have performed the Monte Carlo experiments on the significance test of the regression coefficients. It might be concluded that the nonparametric permutation test is closer to the true size than the t test for almost all the cases. Moreover, the sample powers are compared for both tests. As a result, we find that the permutation test is more powerful than the t test. See Figure 8.2. Thus, we can find through the Monte Carlo experiments that the nonparametric test discussed in this chapter is applied to any distribution of the underlying sample. However, the problem is that the nonparametric test is too computer-intensive. We have shown that when «! is too large it is practical to choose some of the «! permutations randomly and perform the testing procedure. An empirical example of (n, k) = (46,3), (45,4) is shown in Section 8.4, where the permutation test / is compared with the conventional t test. There, we have performed the testing procedure taking the 106 permutations out of all the possible permutations ((n - k)l in this case) randomly. Thus, it has been shown in this chapter that we can reduce the computational burden. From the empirical example, we have obtained the results that the permutation test / has smaller length in the confidence intervals than the t test.
478
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
Appendix 8.1: Permutation In this appendix, the source code on permutation is shown. The recursion is required for permutation. Therefore, the source code is written in C language, which is shown as follows.
|permutation(n) | i:
i //* 2:
int
idx[181];
*/
3- void permutation(int n) 4: {
5: 6:
int i ; void perm(int i , int n) ;
7:
8: 9:
10:
for(i=l;i<=n; i++) idx[i]=i; i=l;
perm(i,n);
11: } n //* 12:
n- void perm(int i.int n) 14: {
15:
int j ,ra;
16:
17:
18: 19: 20: 21:
if( i
for(j=i; j<=n; j++){ m=idx[i]; idx[i]=idx[j] ; idx[j]=m; perm(i+l,n) ; m=idx[i]; idx[i]=idx[j] ; idx[j]=m;
}
22: 23:
}
24: 25: 26:
else{ for(i=l;i<=n;i++) printf("%3d ",idx[i]); printf ("\n") ;
27: 28: }
}
idx[181] is defined as an external variable in Line 1. In Line 8, initially idx[i]=i is input for 1 = 1 , 2 , . . . ,n. In Line 25, idx[i], 1=1,2 n, are permuted and printed out on screen. An example of n=4 is shown in Table 8.5, where all the possible permutations are printed out in order. As n increases, the number of all the possible permutations, n!, becomes extremely large. Therefore, we sometimes need the source code which obtains some of all the permutations randomly to reduce computational burden. The following Fortran 77 source code obtains one permutation randomly out of n! permutations.
random_permutation(ix, iy ,n, index) subroutine random_permutation(ix,iy,n,index) dimension index(lQS)
APPENDIX 8.2: DISTRIBUTION
OFp
479
Table 8.5: Output by permutation(n): Case of n=4 Output
1234 1 243 1 324 1342 1432 1423 2134
2143
3: 4:
5: 6: 7:
8: 9:
10: ii:
12:
3214 3241
3 1 24 3 1 42
3412 3421 423 1 42 1 3
2314 234 1
4321
243 1 24 1 3
4132 4123
4312
do 1 i=l,n 1 index(i)=i
do 2 i=l,n-l call urnd(ix,iy,rrO j=int(rn*(n-i+l))+i
idx=index(j) index(j)=index(i)
2 index(i)=idx return
end
In Line 10, one of the integers from 1 ton is randomly input into index (i). Repeating random_permutation(ix,iy,n,index), we can obtain as many permutations as we want. As another source code which obtains one of the permutations randomly, the case of n2 = 0 in random_combination(nl,n2,num) on p.439 corresponds to the random permutation, where the number of permutation is given by num. Note that nl in the function random_corabination(nl ,n2 ,num) on p.439 is equivalent to n in the subroutine random_permutation(ix, iy, n, index) in the case of n2 = 0.
Appendix 8.2: Distribution of p Distribution of p under p = 0: We derive the distribution function of the sample correlation under p = 0. Assume that (X,, V,), i = 1,2, • • • , n, are normally distributed, i.e.,
u
O~Y Define Et = (7,- - juy) - p — ( X j - fix) for i = 1,2, • • - , « . Then, it is easily shown that E\, EI, • ••, E,, are mutually independently and normally distributed with mean zero and variance (1 - p2)a-2Y and that £,• are independent of Xt for all / = 1,2, • • • , n.
480
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
Moreover, we can rewrite the above equation as Yt = a+f>Xt + £,, where/? = and a = /JY ~ Pl*x- Because Xt is independent of £/, the conditional distribution of F, given X] = x\, X2 = x2, ••;Xn = xn is normally distributed with mean a + flxi and variance (1 - p2)cr^. Now, define/? as:
which corresponds to the OLS estimator of ft given Xt = jt,-, i = 1 , 2, • • • , n. Furthermore, we can easily show that the conditional distribution of /? given Xt = xt, i= 1,2, • • - , n , is normal with mean jS and variance (1 - (?~)cr\l X"=ite ~ *)2- Define 5 2 as:
where a = 7 -jfe is substituted in the second equality.
—- is distributed as a (1 -p2)crlY chi-square random variable with n - 2 degrees of freedom. In addition, given Xt = xt, i= 1,2, • • • ,n,/3 is independent of S2. Define T as:
T = —— /
— = " V5 2 /Z7=,te-S) 2 (n - 2)
(8.3)
The conditional distribution of T given X,• = xt, i = 1,2, • • •, n, is given by the / distribution with n - 2 degrees of freedom. From this fact, we can obtain the distribution of the sample correlation coefficient p under p = 0. p = 0 implies that/? = 0. Therefore, under p = 0, equation (8.3) is rewritten as:
T=
i-2 =====
(8.4)
APPENDIX 8.2: DISTRIBUTION
OF p
481
i/n-2
1-
Elite-
which has a f distribution with n - 2 degrees of freedom. Although T depends on E'i=1 (-*; ~ *)(F; ~ F) — ' T is independent of xt, i = 1,2, • • • ,n, except for it. Therefore, replacing *,- by Xh we have the following:
p VH- 2
t(n - 2),
\
where p is given by: P =
Thus, the conditional distribution of T given Xt = Xj, i = 1,2, • • •, n, depends on p and does not depend on X, = *,-, z' = 1,2, • • • ,n. Therefore, the conditional distribution of 7 reduces to the unconditional distribution of T. That is, we can obtain the result that the function of p under p = 0 is distributed as the t distribution with n - 2 degrees of freedom. Based on the distribution of 7, transforming the variables, the density function of p under p = 0 is derived as follows: (8.5)
f(f>} =
Distribution of p under p * 0: Take equation (8.4) as the test statistic in this case. Now, we define R as:
The mean and variance of R given x\, x^, • • •, A:,, are as follows: ?ll(Xi-x)(Yi-Y)\_
~"\
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
482
which correspond to the conditional expectations of R given xt, i = 1,2, • • • , n. Thus, we obtain R ~ N(J3 VE/life ~*) 2 > °~y(' ~ P 2 ))< which implies: R 0~Y V1 ~
where i/f is given by:
, Moreover, as mentioned above, we have:
Then, we can verify that T in equation (8.4) has the noncentral / distribution with n-2 degrees of freedom and noncentrality parameter if/, i.e., R
-P2
(/»- 2)S2 T2(l-p2)
«/«-2
«-2)
which is a conditional distribution of T given X; = jc,-, i = 1 , 2 , • • • , « . See 2.2.14 for the noncentral t distribution. Note that noncentrality parameter (// is a function of Jti, JC2, • • •, Jc,,, i.e., 41 - (pl V ~P 2 ) V^JLite ~ 'xf'lo'x- Because i^ is taken as a function of w = E/Lite ~ ^) 2 /°"x> tne above noncentral / distribution is regarded as the conditional distribution of T given W = w, where W = £"=i(T; - X)2/cr2x and
As for VF, we have
- 1), which density is denoted by /(w), i.e.,
APPENDIX 8.2: DISTRIBUTION OF p
483
The unconditional distribution of T is derived by integrating w from zero to infinity, i.e.,
r
e iif?)14' V(n - 2)nT(n-^
r siti-i
I
W
2
e
2(i-<> 2 )
'(^) Jo
: f- Tn -L_ v ^-Viv Jo 2-*-n2±£l)
(8.6)
In the fourth equality, the transformation v = w/(l -p 2 ) is used. Thus, the unconditional distribution of T is derived as (8.6). Using the density function (8.6), we con construct the confidence interval of p and test the null hypothesis such as H0 : p =p 0 . Furthermore, utilizing the relationship between T andp, i.e., T = p V/J - 2/ -^1 -p2, we have the probability density function of p under -1 < p < 1 as follows: (
}
which density was derived by R.A. Fisher for the first time (see Takeuchi (1963)). Finally, note that the density function (8.5) is equivalent to the density function (8.7) in the case of p = 0 and therefore (8.7) is consistent with (8.5).
484
CHAPTER 8. INDEPENDENCE BETWEEN TWO SAMPLES
References Conover, W.J., 1980, Practical Nonparametric Statistics (Second Edition), John Wiley & Sons. Fisher, R.A., 1966, The Design of Experiments (eighth edition), New York: Hafner. Gibbons, J.D. and Chakraborti, S., 1992, Nonparametric Statistical Inference (Third Edition, revised and Expanded), Marcel Dekker. Hogg, R.V. and Craig, A.T., 1995, Introduction to Mathematical Statistics (Fifth Edition), Prentice Hall. Hollander, M. and Wolfe, D.A., 1973, Nonparametric Statistical Methods, John Wiley & Sons. Lehmann, E.L., 1986, Testing Statistical Hypotheses (Second Edition), John Wiley & Sons. Randies, R.H. and Wolfe, D.A., 1979, Introduction to the Theory of Nonparametric Statistics, John Wiley & Sons. Sprent, P., 1989, Applied Nonparametric Statistical Methods, Chapman and Hall. Stuart, A. and Ord, J.K., 1991, Kendall's Advanced Theory of Statistics, Vol.2 (Fifth Edition), Edward Arnold. Stuart, A. and Ord, J.K., 1994, Kendall's Advanced Theory of Statistics, Vol.1 (Sixth Edition), Edward Arnold. Takeuchi, K., 1963, Mathematical Statistics (in Japanese), Toyo-Keizai.
Source Code Index betarnd(ix,iy,alpha,beta,rn) betarnd2 betarnd3(ix,iy,alpha,beta,rn) bimodal(ix,iy,pl,al,vl,a2,v2,rn) birnd(ix,iy,n,p,rn) birnd2(ix,iy,n,p,rn) birnd3(ix,iy,n,p,rn) brnd(ix,iy,p,rn) chi2prob(x,k,p) chi2perpt(p,k,x) chi2rnd(ix,iy,k,rn) chi2rnd2(ix,iy,k,rn) chi2rnd3(ix,iy,k,rrO chi2rnd4(ix,iy,k,rn) chi2rnd5(ix,iy,k,rn) chi2rnd6(ix,iy,k,rn) comb1(nl,n2) comb2(nl,n2) crnd(ix,iy,alpha,beta,rn) dexprnd(ix,iy,alpha,beta,rn) dexprnd2(ix,iy,alpha,beta,rn) dirichletrnd(ix,iy,alpha,k,rn) eigen(x,k,p,d) exprnd(ix,iy,beta,rn) frnd(ix,iy,m,n,rn) gammarnd(ix,iy,alpha,beta,rn) gammarnd2(ix,iy,alpha,rn) gammarndS(ix,iy,alpha,rn) gammarnd4 gammarndS(ix,iy,alpha,rn) gammarnd6(ix,iy,alpha,rn) gammarnd7(ix,iy,alpha,rn) gaimnarnd8(ix,iy,alpha,beta,rn) 485
100 195 207 177 144 145 146 138 105 106 102 103 104 214 215 215 434 435 129 117 127 164 155 94 110 96 184 186 193 205 211 212 213
486 geornd(ix,iy,p,rrO geornd2(ix,iy,p,rn) gumbelrnd(ix,iy,alpha,beta,rn) hgeornd(ix,iy,n,m,k,rn) hgeornd2(ix,iy,n,m,k,riO igammarndCix,iy,alpha,beta,rn)} logisticrnd(ix,iy,alpha,beta,rn) lognrnd(ix,iy,ave,var,rn) Main Program for snrnd5_2 Main Program for snrndS mnrnd(ix,iy,ave,var,k,rn) mnrnd2(ix,iy,ave,var,k,rn) mtrnd(ix,iy,n,ave,var,k,m,rn) multirnd(ix,iy,n,k,p,rn) mvar(k,var,p) nbirnd(ix,iy,n,p,rn) nbirnd2(ix,iy,n,p,rn) nchi2rnd(ix,iy,k,alpha,rn) nfrnd(ix,iy,m,n,alpha,rn) nrnd(ix,iy,ave,var,rn) ntrnd(ix,iy,k,alpha,rn) paretornd(ix,iy,alpha,beta,rn) permutation(n) pornd(ix,iy,alpha,rn) pornd2(ix,iy,alpha,rn) random_combination(nl,n2,num) random_permutation(ix,iy,n,index) recrnd(ix,iy,n,rn) resample(ix,iy,x,prob,m,rn) snperpt(p,x) snprob(x,p) snrnd(ix,iy,rn) snrnd2(ix,iy,rn) snrndB(ix,iy,rn) snrnd4(ix,iy,rn) snrndS(ix,iy,x_L,x_U,m,rn) snrnd5_2(ix,iy,x,prob,m,rn) snrnd6(ix,iy,rn) snrnd? snrndS(ix,iy,rn) snrnd9(ix,iy,rn) Source Code for Section 3.7.5 tperpt(p,k,x)
SOURCE CODE INDEX 139 140 132 150 151 97 130 93 176 203 153 154 159 165 153 148 149 119 120 91 121 133 478 142 142 439 478 137 190 90 89 85 87 88 126 173 175 183 191 203 210 235 114
SOURCE CODE INDEX
487
tprob(x,k,p) trnd(ix,iy,k,rrO unbiased(z,x,l,n,k,lag,olse,beta,se.idist) urnd(ix,iy,rn) urndl6(ix,iy,iz,rn) urnd_ab(ix,iy,a,b,rn) weight(x_L,x_U,m,x,prob) weight2(ix,iy,m,x,prob) weights(ix,iy,alpha,m,x,prob) weight4(ix,iy,alpha,beta,m.x.prob) wishartrnd(ix,iy,n,k,var,rn) wrnd(ix,iy,alpha,beta,rn)
114 112 308 83 81 124 175 189 192 194 161 135
Index absolute error loss function, 250 acceptance probability, 179 acceptance region, 50 addition rule, 3 alternative hypothesis, 49 aperiodic, 198 ARCH model, 323 ARCH(l) error, 357 asymptotic efficiency, 45 asymptotic normality, 45, 46 asymptotic properties, 45 asymptotic relative efficiency, 394, 399 Pitman's asymptotic relative efficiency, 399 asymptotic unbiasedness, 45 asymptotic Wilcoxon test, 396 Wilcoxon rank sum test, 396 autocorrelation model, 269 Bayesian estimator, 272 maximum likelihood estimator, 271 autoregressive conditional heteroscedasticity model (ARCH), 323 autoregressive model, 285 autoregressive moving average process (ARMA), 317
binomial distribution, 5, 12, 26, 143, 176 binomial random number generator, 221 binomial theorem, 13 bit operation, 432 BLUE, 63 bootstrap method, 285, 292 Box-Muller transformation, 84 bubble economy, 351 burn-in period, 196, 216 Cauchy distribution, 111, 128 Cauchy score test, 397 central limit theorem, 33, 45,47 characteristic root, 154 characteristic vector, 154 Chebyshev's inequality, 29, 31, 32 chi-square distribution, 94, 101,214 chi-square percent point, 106 chi-square probability, 105 chi-square random number generator, 219 combination, 432 complementary event, 1 composition method, 171 compound event, 1 concentrated likelihood function, 271 conditional density function, 10 conditional distribution, 10 conditional probability, 3 conditional probability density function, 10 conditional probability function, 10 confidence interval, 47 consistency, 38, 41
Bayesian estimation, 249 Bayesian estimator, 257, 272 Bernoulli distribution, 137 best linear unbiased estimator, 63 beta distribution, 98, 194, 206 beta function, 99 bias, 38 bimodal distribution, 177, 227 489
490 consistent estimator, 41 constrained maximum likelihood estimator, 55 continuous random variable, 4, 5, 9, 10 convergence in probability, 32 correlation coefficient, 19,446,453 covariance, 17 Cramer-Rao Inequality, 70 Cramer-Rao inequality, 39, 70 Cramer-Rao lower bound, 39 critical region, 49 cumulative distribution function, 7 density function, 5 density-based fixed-interval smoothing algorithm, 373 density-based recursive filtering algorithm, 373 dependent variable, 58 diffuse prior, 251 Dirichlet distribution, 162 discrete random variable, 4, 8, 10 discrete uniform distribution, 136 distribution, 4 Bernoulli distribution, 137 beta distribution, 98, 194, 206 bimodal distribution, 177, 227 binomial distribution, 5, 12, 26, 143, 176 Cauchy distribution, 111, 128 chi-square distribution, 94, 101, 214 Dirichlet distribution, 162 discrete uniform distribution, 136 double exponential distribution, 116, 127 exponential distribution, 93, 126 extreme-value distribution, 131 F distribution, 108 gamma distribution, 95, 183, 191, 204,210 geometric distribution, 138 Gumbel distribution, 131, 238
INDEX half-normal distribution, 182, 188, 202 hypergeometric distribution, 149 inverse gamma distribution, 97 LaPlace distribution, 116, 127, 238 log-logistic distribution, 185 log-normal distribution, 92 logistic distribution, 130, 237 multinomial distribution, 165 multivariate normal distribution, 152 multivariate t distribution, 157 negative binomial distribution, 147 noncentral chi-square distribution, 118 noncentral F distribution, 120 noncentral t distribution, 121 normal distribution, 7, 91, 125, 182, 188,202 Pareto distribution, 132 Pascal distribution, 138 Poisson distribution, 141 Rayleigh distribution, 135 rectangular distribution, 136 standard normal distribution, 7, 14,84, 172,209 t distribution, 111, 237 uniform distribution, 6, 13,79, 123, 172 Weibull distribution, 134 Wishart distribution, 159 distribution function, 7 distribution of sample correlation coefficient, 479 distribution-free test, 393 double exponential distribution, 116, 127 e, 12 efficiency, 38, 39 efficient estimator, 39 eigenvector, 154
JNDEX EM algorithm, 337, 339 empirical size, 412, 460 empty event, 1 estimate, 36 estimated regression line, 59 estimator, 36 event, 1 exclusive, 1 experiment, 1 explanatory variable, 58 exponential density, 135 exponential distribution, 93, 126 extended Kalman filter, 342, 343 extreme-value distribution, 131 F distribution, 108 Filtering, 376 filtering, 315, 326, 336, 373, 375 filtering algorithm, 373 filtering density, 325, 373 final data, 319 Fisher test, 394 Fisher's randomization test, 394, 398 Fisher's permutation test, 445 Fisher's randomization test, 394, 398 fixed-interval smoothing, 315 two-filter formula, 332 fixed-lag smoothing, 315 fixed-lag smoothing density, 325 fixed-parameter model, 316 fixed-point smoothing, 315 flat prior, 251 gamma distribution, 95, 183, 191, 204, 210 gamma function, 95 Gauss-Markov theorem, 62 geometric distribution, 138 Gibbs sampler, 324 Gibbs sampling, 215, 257, 274 grid search, 271 Gumbel distribution, 131, 238
491 half-normal distribution, 182, 188, 202 heteroscedasticity, 253 heteroscedasticity model Bayesian estimator, 257 maximum likelihood estimator (MLE), 256 modified two-step estimator (M2SE), 255 holiday effect, 365 hypergeometric distribution, 149 identity matrix, 74 importance resampling, 187, 222, 324 improper prior, 251 increment, 80 independence, 3, 19-21, 25, 26, 28, 29,446 independence chain, 200, 227 independence of random variables, 11 independence test, 445 independent variable, 58 information matrix, 271 integration by parts, 14, 69 integration by substitution, 6, 68 interval estimation, 47 inverse, 74 inverse gamma distribution, 97, 273 inverse transform method, 122, 135 irreducible, 198 Jacobi method, 155 Jacobian, 7, 24 joint density function, 9 joint probability density function, 9 joint probability function, 9 Kalman filter, 342 Kendall's rank correlation, 446 kth order Taylor series expansion, 70 L'Hospital's theorem, 123 lagged dependent variable, 285 Lagrange function, 41 Lagrange multiplier, 41
492
LaPlace distribution, 116, 127, 238 law of large numbers, 32, 33, 46 least squares estimate, 60 least squares estimator, 60 likelihood function, 43, 249 concentrated likelihood function, 271 likelihood ratio, 55 likelihood ratio test, 54 linear congruential generator, 80 linear estimator, 40 linear unbiased estimator, 40, 62 linear unbiased minimum variance estimator, 40 location parameter, 91 log-likelihood function, 44 log-logistic distribution, 185 log-normal distribution, 92 logistic distribution, 130,237 logistic score test, 397 loss function, 250 marginal density function, 9 marginal probability density function, 9 marginal probability function, 9 Markov chain, 197 Markov chain Monte Carlo, 197 Markov chain Monte Carlo (MCMC), 324 Markov property, 197 Markov switching model, 322 mathematical expectation, 11 maximum likelihood estimate, 43 maximum likelihood estimator, 43, 271 maximum likelihood estimator (MLE), 256 MCMC, 197 mean, 11, 15-17,35,37 mean square error, 32 measurement equation, 315
INDEX Metropolis-Hastings algorithm, 195, 222, 227, 258, 273, 324 modified two-step estimator (M2SE), 255 modulus, 80 moment-generating function, 12, 17, 24 MSB, 32 multinomial distribution, 165 multiple regression model, 66 multiplication rule, 3 multiplicative heteroscedasticity model, 253, 254 Bayesian estimator (BE), 253 maximum likelihood estimator (MLE), 253 modified two-step estimator (M2SE), 253 two-step estimator (2SE), 253 multiplicative linear congruential generator, 80 multiplier, 80 multivariate normal distribution, 152 multivariate t distribution, 157 negative binomial distribution, 147 negative definite matrix, 75 negative semidefinite matrix, 75 Newton-Raphson optimization procedure, 292 Nikkei stock average, 351 non-recursive algorithm, 335, 374 non-symmetry effect, 364 noncentral chi-square distribution, 118 noncentral F distribution, 120 noncentral t distribution, 121, 448, 482 noncentrality parameter, 118, 120, 121 noninformative prior, 251 nonparametric test, 393 normal distribution, 7, 91, 125, 182, 188,202 normal score test, 393, 397 normalization, 16
INDEX wth moment, 25 null hypothesis, 49
OLS, 60 OLSE bias, 286 one-sided test, 52 one-step ahead prediction, 335, 374 one-step ahead prediction density, 325 ordinary least squares estimate, 60 ordinary least squares estimation, 60 ordinary least squares estimator, 60, 67 outlier, 327 parametric test, 393 Pareto distribution, 132 Pascal distribution, 138 permanent consumption, 321 permutation, 478 random permutation, 479 permutation test, 446,447 Pitman's asymptotic relative efficiency, 394, 399 point estimate, 35 point estimation, 38 Poisson distribution, 141 positive definite matrix, 74, 152 positive semidefinite matrix, 74 posterior density function, 250 posterior probability density function, 250 power, 49 power function, 49 predicted value, 59 prediction, 315, 335, 373, 374 prediction density, 325, 377 prediction equation, 324, 325, 373 preliminary data, 319 prior probability density function, 249 probability, 2 probability density function, 5 probability function, 4 product event, 1 quadratic loss function, 250
493
random experiment, 1 random number, 79 random permutation, 479 random variable, 4 random walk chain, 200, 227 rank correlation coefficient, 449 rank correlation test, 446 ratio-of-uniforms method, 208 Rayleigh distribution, 135 rectangular distribution, 136 recursion, 432 recursive algorithm, 374 recursive residual, 425 regression coefficient, 58, 450, 464 regression line, 58 rejection region, 49 rejection sampling, 178, 222, 324 relative efficiency, 394 remainder, 80 residual, 58 reversibility condition, 198 revised data, 319 sample point, 1 sample power, 418, 460 sample space, 1 sampling density, 178, 227, 325 scale parameter, 91 score correlation coefficient, 449 score function, 395, 449, 452 score test, 393, 395, 446 seasonal adjustment model, 317 seed, 80 shape parameter, 95 significance level, 49 significance test, 445 simple event, 1 Smoothing, 381 smoothing, 315, 328, 336, 337, 373, 375 smoothing algorithm, 373 smoothing density, 325 Spearman's rank correlation, 446
494 standard deviation, 12 standard normal distribution, 7, 14, 84, 172,209 standard normal percent point, 90 standard normal probability, 88 standard normal random number generator, 217 standardization, 16 state space model, 315 state variable, 315 State-space model, 376 statistic, 36 stochastic variance model, 323 stochastic volatility error, 354 stochastic volatility model, 323, 324 structural change, 327 sum event, 1 t distribution, 111,237 t percent point, 114 ; probability, 113 / test, 393 target density, 178, 325 target density function, 171 Taylor series expansion, 34, 46, 70 Taylored chain, 200, 227 test statistic, 49 time varying parameter model, 316 transformation of variables, 22, 23, 84 transition equation, 315 transitory consumption, 322 transpose, 74 true regression line, 58 two-filter formula, 332, 350 two-sided test, 52 type I error, 49 type II error, 49 unbiased estimator, 38 unbiasedness, 38 unconstrained maximum likelihood estimator, 55 unexplanatory variable, 58
INDEX uniform distribution, 6, 13, 79, 123, 172 uniform score test, 397 updating equation, 324, 325, 373 variance, 11, 15, 17,35,37 Wald test, 52, 53 Weibull distribution, 134 whole event, 1 Wilcoxon rank sum test, 393, 396 Wilcoxon test Wilcoxon rank sum test, 393 Wishart distribution, 159