MATHEMATICS: E. B. WILSON
PROC. N. A. S., VOL. 13, 1927, p. 151
ON THE PROOF OF SHEPPARD'S CORRECTIONS
By EDWIN B. WILSON
HARVARD SCHOOL OF PUBLIC HEALTH, BOSTON
Communicated February 12, 1927
In Whittaker and Robinson's masterly "Calculus of Observations," Art. 99, is a proof of Sheppard's corrections. The crude moments are, by definition,
$$m'_p = \sum_{s=-\infty}^{\infty} x_s^p\, w \int_{-1/2}^{1/2} f(x_s + nw)\, dn, \tag{1}$$
where $x_s$ are equidistant abscissas spaced $w$ apart. The frequency function $f(x)$ is treated as extending from $-\infty$ to $+\infty$, as may always be arranged by filling out the undefined range, if any, of $f(x)$ with the definition $f(x) = 0$. The true moment is
$$m_p = \int_{-\infty}^{\infty} x^p f(x)\, dx = w \sum_{s=-\infty}^{\infty} x_s^p f(x_s). \tag{2}$$
The only assumption stated is that the frequency curve has close contact with the axis at both ends "so that" (2) holds. The left-hand side of (2) is a definition; the right-hand side of (2) is untrue, so far as I can see, for any and every frequency function except for special positions of $x_s$, which cannot be determined in advance.

The proof given by the authors depends on finite differences, and their symbolic manipulation, much in the way in which Sheppard's original proof did (Proc. London Math. Soc., 29, 1898, 353-380). There are, however, slight differences in the proof and even in the statement of the theorem; for Sheppard assumed a finite range from $x_0$ to $x_p$ for the frequency function, divided that range into equal intervals and gave his formulas, as usually quoted, merely as approximations of others from which certain terms had been dropped because usually negligible. The terms which are dropped involve, either through summation or differencing, the behavior of the frequency function throughout its whole range. As a practical matter we do not know in advance exactly where the ends of the fitted frequency distribution are. It seems, therefore, worthwhile to reexamine the whole matter.

If we may assume (2), a derivation of the Sheppard corrections may be given by the simplest of elementary calculus. Interchange the summation and integration in (1); write $x_s^p$ as $(x_s + nw - nw)^p$; expand by the binomial theorem; apply (2); and perform the integration with respect to $n$.
$$\begin{aligned}
m'_p &= \sum_{s=-\infty}^{\infty} x_s^p\, w \int_{-1/2}^{1/2} f(x_s + nw)\, dn = \int_{-1/2}^{1/2} w \sum_{s=-\infty}^{\infty} x_s^p f(x_s + nw)\, dn \\
&= \int_{-1/2}^{1/2} w \sum_{s=-\infty}^{\infty} (x_s + nw - nw)^p f(x_s + nw)\, dn \\
&= \int_{-1/2}^{1/2} \sum_{i=0}^{p} \frac{p!}{(p-i)!\, i!}\, (-nw)^i\; w \sum_{s=-\infty}^{\infty} (x_s + nw)^{p-i} f(x_s + nw)\, dn \\
&= \sum_{i=0}^{p} \frac{p!}{(p-i)!\, i!}\, m_{p-i} \int_{-1/2}^{1/2} (-nw)^i\, dn \\
&= \sum_{i = 0, 2, 4, \ldots}^{p \text{ or } p-1} \frac{p!\, w^i\, m_{p-i}}{(p-i)!\,(i+1)!\,2^i}.
\end{aligned}$$
When these equations are solved for $m$ in terms of $m'$ the standard form of the Sheppard corrections is found. As $f(x)$ is necessarily positive or zero, all that seems to be necessary is that the moments exist, i.e., that $x^p f(x)$ be integrable from $-\infty$ to $+\infty$.

As a matter of fact $w \sum_{s} x_s^p f(x_s)$ is clearly not independent of the size of $w$ nor of the positions of the abscissas $x_s$, but is a function of $x_s$ with period $w$. Let
$$\phi_p(x_s) = w \sum_{s=-\infty}^{\infty} x_s^p f(x_s), \qquad A_{k,p} = \frac{2}{w} \int_{-w/2}^{w/2} \phi_p(x_s) \cos\frac{2\pi k x_s}{w}\, dx_s,$$
with a similar expression $B_{k,p}$ in the sine. Each integration extends over an interval $w$ centered at $x_s$, and the summation adds these results from $-\infty$ to $+\infty$. Hence
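$$A_{k,p} = 2\int_{-\infty}^{\infty} x_s^p f(x_s) \cos\frac{2\pi k x_s}{w}\, dx_s, \qquad B_{k,p} = 2\int_{-\infty}^{\infty} x_s^p f(x_s) \sin\frac{2\pi k x_s}{w}\, dx_s,$$
$$\phi_p(x_s) = \frac{A_{0,p}}{2} + \sum_{k=1}^{\infty} A_{k,p} \cos\frac{2\pi k x_s}{w} + \sum_{k=1}^{\infty} B_{k,p} \sin\frac{2\pi k x_s}{w},$$
or
$$w \sum_{s=-\infty}^{\infty} x_s^p f(x_s) = m_p + \sum_{k=1}^{\infty} A_{k,p} \cos\frac{2\pi k x_s}{w} + \sum_{k=1}^{\infty} B_{k,p} \sin\frac{2\pi k x_s}{w}. \tag{3}$$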
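This equation takes the place of (2) and is exact. The function $x^p f(x)$ does not need to be continuous; it does need to satisfy some slight condition of developability into Fourier's series, such as being of limited variation, which would always be fulfilled by any function we should use as a frequency function.

To illustrate, take the normal frequency function. Here
$$A_{k,0} = \frac{2}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2\sigma^2} \cos\frac{2\pi k x}{w}\, dx = 2 e^{-2\pi^2 k^2 \sigma^2 / w^2}, \qquad B_{k,0} = 0,$$
$$B_{k,1} = \frac{2}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x\, e^{-x^2/2\sigma^2} \sin\frac{2\pi k x}{w}\, dx = 2\, \frac{2\pi k \sigma^2}{w}\, e^{-2\pi^2 k^2 \sigma^2 / w^2}, \qquad A_{k,1} = 0,$$
$$w \sum_{s} x_s f(x_s) = 0 + 2 \sum_{k=1}^{\infty} \frac{2\pi k \sigma^2}{w}\, e^{-2\pi^2 k^2 \sigma^2 / w^2} \sin\frac{2\pi k x_s}{w},$$
$$w \sum_{s} f(x_s) = 1 + 2 \sum_{k=1}^{\infty} e^{-2\pi^2 k^2 \sigma^2 / w^2} \cos\frac{2\pi k x_s}{w} = \Theta_1, \qquad q = e^{-2\pi^2 \sigma^2 / w^2},$$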
where $\Theta_1$ is the Jacobi function (see, e.g., Wilson, "Advanced Calculus," p. 469). If $q$ is small the $\Theta$-functions converge with extreme rapidity. Even if $w = \pi\sigma$, which is much larger than would ever be taken in practice, $q = e^{-2} = 0.13533$ and the series may be reduced to the term in $q$, neglecting $q^4, \ldots$ The even moments $m'$ may be obtained by applying repeatedly the operator
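$$\sigma^2 + \sigma^3 \frac{d}{d\sigma}: \qquad \phi_2(x_s) = \Bigl(\sigma^2 + \sigma^3 \frac{d}{d\sigma}\Bigr) \phi_0(x_s) = \sigma^2 + 2\sigma^2 \sum_{k=1}^{\infty} \Bigl(1 - \frac{4\pi^2 k^2 \sigma^2}{w^2}\Bigr) e^{-2\pi^2 k^2 \sigma^2 / w^2} \cos\frac{2\pi k x_s}{w}.$$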
It may be observed that the higher we go up in the series of moments, the more important does the correction become, the multipliers of the trigonometric series being of the order $(2\pi k \sigma / w)^p$ for the $p$th moment.
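As a numerical illustration of the normal case, the following minimal sketch (in Python, which is of course no part of the original paper) compares the sums $w \sum x_s^p f(x_s)$ for $p = 0, 1$ with the trigonometric series written above. The function names, the truncation limits, and the choices $\sigma = 1$, $w = \pi\sigma$ and the three trial positions of the abscissas are assumptions made here purely for illustration.

```python
import math

def normal_pdf(x, sigma):
    """Normal frequency function with mean 0 and standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def lattice_sum(p, x0, w, sigma, half_width=60):
    """w * sum over s of x_s**p * f(x_s), with abscissas x_s = x0 + s*w."""
    total = 0.0
    for s in range(-half_width, half_width + 1):
        x = x0 + s * w
        total += x ** p * normal_pdf(x, sigma)
    return w * total

def series_p0(x0, w, sigma, terms=6):
    """1 + 2 * sum_k q**(k*k) * cos(2*pi*k*x0/w), q = exp(-2*pi^2*sigma^2/w^2)."""
    q = math.exp(-2.0 * math.pi ** 2 * sigma ** 2 / w ** 2)
    return 1.0 + 2.0 * sum(q ** (k * k) * math.cos(2.0 * math.pi * k * x0 / w)
                           for k in range(1, terms + 1))

def series_p1(x0, w, sigma, terms=6):
    """2 * sum_k (2*pi*k*sigma^2/w) * q**(k*k) * sin(2*pi*k*x0/w)."""
    q = math.exp(-2.0 * math.pi ** 2 * sigma ** 2 / w ** 2)
    return 2.0 * sum((2.0 * math.pi * k * sigma ** 2 / w) * q ** (k * k)
                     * math.sin(2.0 * math.pi * k * x0 / w)
                     for k in range(1, terms + 1))

sigma = 1.0
w = math.pi * sigma  # assumed, deliberately coarse spacing: q = exp(-2)
for x0 in (0.0, 0.4, 1.1):  # assumed trial positions of one abscissa
    print(x0,
          lattice_sum(0, x0, w, sigma), series_p0(x0, w, sigma),
          lattice_sum(1, x0, w, sigma), series_p1(x0, w, sigma))
```

Under these assumptions the two columns for each moment should agree to rounding error, while the sum for $p = 0$ departs visibly from 1 as the abscissas are shifted, which is the failure of (2) in its simplest form.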
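To go back and correct the proof of Sheppard's corrections we have merely to correct the last steps by inserting the proper value (3) of the infinite sum instead of (2). Then
$$m'_p = \sum_{i=0}^{p} \frac{p!\,(-w)^i}{(p-i)!\, i!} \int_{-1/2}^{1/2} n^i \Bigl[ m_{p-i} + \sum_{k=1}^{\infty} A_{k,p-i} \cos\frac{2\pi k (x_s + nw)}{w} + \sum_{k=1}^{\infty} B_{k,p-i} \sin\frac{2\pi k (x_s + nw)}{w} \Bigr] dn.$$
The trigonometric terms may be expanded. Let
$$I_{k,i} = \int_{-1/2}^{1/2} n^i \cos 2\pi k n\, dn, \qquad J_{k,i} = \int_{-1/2}^{1/2} n^i \sin 2\pi k n\, dn,$$
where for $i$ odd we have $I_{k,i} = 0$ and for $i$ even we have $J_{k,i} = 0$. Then
$$\begin{aligned}
m'_p = {} & \sum_{i \text{ even}} \frac{p!\, w^i\, m_{p-i}}{(p-i)!\,(i+1)!\,2^i} \\
& + \sum_{i \text{ even}} \frac{p!\, w^i}{(p-i)!\, i!} \sum_{k=1}^{\infty} I_{k,i} \Bigl( A_{k,p-i} \cos\frac{2\pi k x_s}{w} + B_{k,p-i} \sin\frac{2\pi k x_s}{w} \Bigr) \\
& - \sum_{i \text{ odd}} \frac{p!\, w^i}{(p-i)!\, i!} \sum_{k=1}^{\infty} J_{k,i} \Bigl( B_{k,p-i} \cos\frac{2\pi k x_s}{w} - A_{k,p-i} \sin\frac{2\pi k x_s}{w} \Bigr),
\end{aligned}$$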
where, as indicated, $i$ runs on even integers from 0 to $p - 1$ or $p$, as the case may be, in the first line, and on even integers from 2 up in the second, but on odd integers from 1 up in the third.
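The first line of the expression for $m'_p$ is the ordinary Sheppard relation between crude and true moments. The following minimal sketch in Python (the function names are hypothetical, and the trigonometric terms are deliberately ignored) evaluates the coefficients $p!/[(p-i)!\,(i+1)!\,2^i]$ exactly and solves the triangular system for the true moments.

```python
from fractions import Fraction
from math import factorial

def sheppard_coefficient(p, i):
    """Coefficient of w**i * m_{p-i} in m'_p, for even i with 0 <= i <= p."""
    return Fraction(factorial(p), factorial(p - i) * factorial(i + 1) * 2 ** i)

def crude_from_true(true_moments, w):
    """m'_0..m'_P from true moments m_0..m_P (Sheppard part only)."""
    crude = []
    for p in range(len(true_moments)):
        crude.append(sum(sheppard_coefficient(p, i) * w ** i * true_moments[p - i]
                         for i in range(0, p + 1, 2)))
    return crude

def true_from_crude(crude_moments, w):
    """Solve the triangular system for m_0..m_P given m'_0..m'_P."""
    true = []
    for p, crude_p in enumerate(crude_moments):
        # Terms with i >= 2 involve only lower true moments, already solved.
        correction = sum(sheppard_coefficient(p, i) * w ** i * true[p - i]
                         for i in range(2, p + 1, 2))
        true.append(crude_p - correction)
    return true

# Assumed illustrative crude moments m'_0..m'_4 with unit class width.
crude = [Fraction(1), Fraction(0), Fraction(2), Fraction(0), Fraction(10)]
print(true_from_crude(crude, Fraction(1)))
```

With $m_0 = 1$ the inversion reproduces the familiar forms $m_2 = m'_2 - w^2/12$ and $m_4 = m'_4 - \tfrac12 w^2 m'_2 + \tfrac{7}{240} w^4$.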
The last two lines of the expression for $m'_p$ give the corrections for Sheppard's corrections due to the inexactness of (2) and take the place of Sheppard's own form of them in terms of differences. It is particularly to be observed that the mean value of the trigonometric correction terms is zero, i.e., the usual formulas for Sheppard's corrections are true on the average for any frequency function for which the highest moments considered exist, without any consideration of how the function behaves at the ends of its range or within it. The only reason, then, for considering the behavior at the ends must be of a practical rather than theoretical nature, connected with the possibility that, if the contacts at the ends are not close, the trigonometric correction terms are of the same order of magnitude as the Sheppard corrections, so that in the absence of any knowledge of the former it is useless to apply the latter in any particular example, despite the fact that on the average the former vanish.

To illustrate, consider the function $f(x) = 1$, $-1/2 < x < 1/2$; $f(x) = 0$, $x > 1/2$ or $x < -1/2$. This has no contact at either end of its true range, though if we include the artificial range the end contact is perfect but the intermediate behavior discontinuous. It need not be assumed that we know in advance the positions of the ends. We could have either of the following two histograms from grouped observations:

x    -2/3 to -1/3    -1/3 to 0    0 to 1/3    1/3 to 2/3
F    0.167           0.333        0.333       0.167

x    -1/2 to -1/6    -1/6 to 1/6    1/6 to 1/2
F    0.333           0.333          0.333

and similarly for other values of $w$ than 1/3. The coefficients are
$$A_{k,0} = 2\int_{-1/2}^{1/2} \cos\frac{2k\pi x}{w}\, dx = \frac{2w}{k\pi} \sin\frac{k\pi}{w}, \qquad B_{k,0} = 0,$$
$$B_{k,1} = 2\int_{-1/2}^{1/2} x \sin\frac{2k\pi x}{w}\, dx = \Bigl(\frac{w}{k\pi}\Bigr)^2 \sin\frac{k\pi}{w} - \frac{w}{k\pi} \cos\frac{k\pi}{w}, \qquad A_{k,1} = 0.$$
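The values of $I_{k,i}$ and $J_{k,i}$ are always as follows:
$$I_{k,i} = \frac{2i\,(-1)^k}{(2\pi k)^{i+1}} \bigl[ (\pi k)^{i-1} - (i-1)(i-2)(\pi k)^{i-3} + (i-1)(i-2)(i-3)(i-4)(\pi k)^{i-5} - \cdots \bigr], \quad i \text{ even},$$
$$J_{k,i} = \frac{-2\,(-1)^k}{(2\pi k)^{i+1}} \bigl[ (\pi k)^{i} - i(i-1)(\pi k)^{i-2} + i(i-1)(i-2)(i-3)(\pi k)^{i-4} - \cdots \bigr], \quad i \text{ odd}.$$
Then
$$m'_1 = 0 - w \sum_{k=1}^{\infty} \frac{(-1)^k}{2\pi k}\, \frac{2w}{\pi k} \sin\frac{k\pi}{w} \sin\frac{2\pi k x_s}{w}.$$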
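As a check on this expression for $m'_1$ in the rectangular example, the sketch below (Python; the function names, the class width $w = 0.4$ and the position $x_s = 0.1$ are assumptions chosen here so that $1/w$ is not an integer) forms the crude first moment directly from the class frequencies and compares it with the trigonometric series.

```python
import math

def crude_mean_rectangular(x0, w):
    """Crude first moment of f = 1 on (-1/2, 1/2), classes of width w
    centred at x0 + s*w, each frequency being placed at the class centre."""
    n = int(math.ceil(1.0 / w)) + 2  # enough classes to cover the range
    total = 0.0
    for s in range(-n, n + 1):
        centre = x0 + s * w
        lo = max(centre - w / 2.0, -0.5)
        hi = min(centre + w / 2.0, 0.5)
        if hi > lo:  # class overlaps the range of f
            total += centre * (hi - lo)
    return total

def series_mean_rectangular(x0, w, terms=20000):
    """-(w^2/pi^2) * sum_k (-1)^k / k^2 * sin(k*pi/w) * sin(2*pi*k*x0/w)."""
    total = 0.0
    for k in range(1, terms + 1):
        total += ((-1) ** k / k ** 2) * math.sin(k * math.pi / w) \
                 * math.sin(2.0 * math.pi * k * x0 / w)
    return -(w ** 2 / math.pi ** 2) * total

print(crude_mean_rectangular(0.1, 0.4), series_mean_rectangular(0.1, 0.4))
print(crude_mean_rectangular(0.05, 1.0 / 3.0))
```

For $w = 0.4$, $x_s = 0.1$ the classes carry frequencies 0.4, 0.4, 0.2 at $-0.3$, $0.1$, $0.5$, so that the crude mean is 0.02 although the true mean is 0; the series, which converges only as $1/k^2$, should agree to the accuracy of its truncation.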
The correction is quite 0 if $1/w$ is an integer. The series is
$$m'_1 = \frac{w^2}{\pi^2} \Bigl[ \sin\frac{\pi}{w} \sin\frac{2\pi x_s}{w} - \frac{1}{4} \sin\frac{2\pi}{w} \sin\frac{4\pi x_s}{w} + \frac{1}{9} \sin\frac{3\pi}{w} \sin\frac{6\pi x_s}{w} - \cdots \Bigr].$$
It is seen that the periodic error in $m'_1$ is of the order of magnitude of $w^2/\pi^2$ at most. As $w$ would always be small compared to 1, the correction would be very small. In respect to the second moment
$$m'_2 = m_2 + \frac{w^2}{12} + w^2 \sum_{k=1}^{\infty} \frac{2(-1)^k}{(2\pi k)^2}\, \frac{2w}{\pi k} \sin\frac{k\pi}{w} \cos\frac{2\pi k x_s}{w} + 2w \sum_{k=1}^{\infty} \frac{(-1)^k}{2\pi k} \Bigl[ \Bigl(\frac{w}{k\pi}\Bigr)^2 \sin\frac{k\pi}{w} - \frac{w}{k\pi} \cos\frac{k\pi}{w} \Bigr] \cos\frac{2\pi k x_s}{w},$$
or
$$m'_2 = m_2 + \frac{w^2}{12} - \frac{w^2}{\pi^2} \sum_{k=1}^{\infty} \frac{(-1)^k}{k^2} \cos\frac{k\pi}{w} \cos\frac{2\pi k x_s}{w} + \frac{2w^3}{\pi^3} \sum_{k=1}^{\infty} \frac{(-1)^k}{k^3} \sin\frac{k\pi}{w} \cos\frac{2\pi k x_s}{w}.$$
There is a trigonometric series of the second order in $w/\pi$ and one of the third order. The second order term, which will vanish on the average, is, at its greatest, of just about the magnitude of the Sheppard correction $w^2/12$, but may either cancel or double it. To go on to higher moments would show that a second order trigonometric correction occurs in them all.

As an example of a skew distribution take $f(x) = 0$, $x < 0$; $f(x) = e^{-x}$, $x \ge 0$. This type is approached by many extremely skew frequency functions which drop off sharply on the left hand. With $w = 1$ we might have for grouped data, if the starting point or origin were unknown in advance, either of these histograms.

I
x    0 to 1    1 to 2    2 to 3    3 to 4    4 to 5    5 to 6    6 to 7
F    0.632     0.233     0.086     0.031     0.012     0.004     0.001

II
x    -1/2 to 1/2    1/2 to 3/2    3/2 to 5/2    5/2 to 7/2    7/2 to 9/2    9/2 to 11/2    11/2 to 13/2
F    0.393          0.383         0.143         0.050         0.019         0.007          0.003

The series for $m'_1$ and $m'_2$ are readily summed exactly in these cases and the errors from $m_1 = 1.00$, $\mu_2 = 1.00$ may be determined. The general analytical calculation gives the following results:
$$A_{k,0} = 2\int_0^{\infty} e^{-x} \cos\frac{2k\pi x}{w}\, dx = \frac{2}{1 + (2k\pi/w)^2}, \qquad B_{k,0} = 2\int_0^{\infty} e^{-x} \sin\frac{2k\pi x}{w}\, dx = \frac{4k\pi/w}{1 + (2k\pi/w)^2},$$
$$A_{k,1} = \frac{w}{2\pi} \frac{d}{dk} B_{k,0} = \frac{2 - 2(2k\pi/w)^2}{\bigl(1 + (2k\pi/w)^2\bigr)^2}, \qquad B_{k,1} = -\frac{w}{2\pi} \frac{d}{dk} A_{k,0} = \frac{8k\pi/w}{\bigl(1 + (2k\pi/w)^2\bigr)^2}.$$
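Then
$$m'_1 = m_1 + w \sum_{k=1}^{\infty} \frac{(-1)^k}{2\pi k} \Bigl[ \frac{4k\pi/w}{1 + (2k\pi/w)^2} \cos\frac{2\pi k x_s}{w} - \frac{2}{1 + (2k\pi/w)^2} \sin\frac{2\pi k x_s}{w} \Bigr].$$
If $w$ is small compared with $2\pi$, this may be written
$$m'_1 = m_1 + \frac{w^2}{2\pi^2} \sum_{k=1}^{\infty} \frac{(-1)^k}{k^2} \cos\frac{2\pi k x_s}{w} - \frac{w^3}{4\pi^3} \sum_{k=1}^{\infty} \frac{(-1)^k}{k^3} \sin\frac{2\pi k x_s}{w}.$$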
There are trigonometric terms of the second and third orders in $w/\pi$. In the case of the histogram II with $x_s = 0$, $m'_1 = m_1 - 0.05$ approximately, and for I with $x_s = 0.5$, $m'_1 = m_1 + 0.08$, approximately. The values of $A_{k,1}$ and $B_{k,1}$ are needed for $m'_2$. Again there are errors of the second order in $w/\pi$. The values of the second moment about the mean in the two histograms are 0.84 for I and 1.09 for II with the Sheppard correction $w^2/12$, but 0.92 and 1.17 without. The value of the standard deviation thus varies from 8% too much to 4% too little without the correction, but from 4% too much to 8% too little if we apply it, according to the way we make up the histogram. It is doubtful if in practice $w$ would be taken so large as 1 in such a case as this, but the dilemma would remain the same except on a reduced scale.
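The dependence of the crude moments on the way the histogram is made up can be sketched numerically. The following Python fragment is an illustration only: the function name is hypothetical, and it sums 200 classes with unrounded frequencies, so its results need not agree to the last figure with values worked from the seven rounded classes tabulated above.

```python
import math

def grouped_exponential(first_edge, w=1.0, classes=200):
    """Crude mean and variance of f(x) = exp(-x), x >= 0, grouped into
    classes [first_edge + j*w, first_edge + (j+1)*w), with each class
    frequency placed at the class midpoint."""
    mean = 0.0
    second = 0.0
    for j in range(classes):
        lo = first_edge + j * w
        hi = lo + w
        if hi <= 0.0:
            continue  # class lies wholly to the left of the range
        freq = math.exp(-max(lo, 0.0)) - math.exp(-hi)
        mid = 0.5 * (lo + hi)
        mean += mid * freq
        second += mid * mid * freq
    return mean, second - mean * mean

# Histogram I starts its first class at 0, histogram II at -1/2.
for label, first_edge in (("I", 0.0), ("II", -0.5)):
    mean, var = grouped_exponential(first_edge)
    print(label, round(mean, 3), round(var, 3), round(var - 1.0 / 12.0, 3))
```

The three printed columns are the crude mean, the crude second moment about the mean, and the latter less the Sheppard correction $w^2/12$, to be compared with the true values 1 and 1.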
The illustrations have been extreme in that they possessed major discontinuities. It is probably safe to apply the Sheppard corrections in all cases of mere doubt about the contact at the ends. Whether it is worthwhile to apply the corrections at all when consideration is had for the sampling errors is another matter. For example, the relative error in $\sigma^2$ is $(2/n)^{1/2}$ and if this is to be less than $w^2/12\sigma^2$, the value of $n$ must exceed $288\sigma^4/w^4$. On the other hand, if $f(x)$ is practically normal, the expansion for the trigonometric corrections is very rapidly convergent, as is illustrated above, and reasonably large values of $w$, larger than are ordinarily used in practice, give such good values of the moments that, with the Sheppard corrections, the parameters of the frequency functions may be determined well within their sampling errors, even for $n$ large, with relatively few intervals. For example, Whittaker and Robinson, p. 189, calculate the mean and standard deviation for a set of 10,000 chest measurements given to the nearest inch. If the grouping had been in 3-inch intervals we should have mean = 39.842 in place of 39.835 ± 0.02 and $\sigma$ = 2.04 in place of 2.05 ± 0.015.
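The sample-size condition just stated is easily evaluated. A minimal sketch in Python follows; the function name is hypothetical, and the values $\sigma = 2.05$ with $w = 1$ and $w = 3$ are taken from the chest-measurement example merely as illustrative inputs.

```python
import math

def sample_size_threshold(sigma, w):
    """Value of n at which the relative sampling error sqrt(2/n) of sigma^2
    equals the relative Sheppard correction w^2 / (12 sigma^2)."""
    return 288.0 * sigma ** 4 / w ** 4

for w in (1.0, 3.0):
    n = sample_size_threshold(2.05, w)
    # At the threshold the two relative quantities coincide.
    print(w, round(n, 1), math.sqrt(2.0 / n), w ** 2 / (12.0 * 2.05 ** 2))
```

Only for $n$ above the threshold does the Sheppard correction exceed the sampling error of $\sigma^2$; since the threshold varies as $w^{-4}$, it falls off very quickly as the class width is increased.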
NOTE ON A GENERALIZATION OF TAYLOR'S SERIES
By D. V. WIDDER*
DEPARTMENT OF MATHEMATICS, UNIVERSITY OF CHICAGO
Communicated January 24, 1927
1. It is a familiar fact that if one determines the constants $c_0, c_1, \ldots, c_n$ of the polynomial
$$s_n(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n$$
in such a way that the curve $y = s_n(x)$ shall have closest contact with a curve $y = f(x)$ at a point $x = t$, one obtains the first $(n + 1)$ terms of the Taylor development of $f(x)$,
$$s_n(x) = f(t) + f'(t)(x - t) + \cdots + \frac{f^{(n)}(t)}{n!}(x - t)^n.$$
The series Go
E [Sn(x) -S. (x)](1 -
n =O