All of Statistics: A Concise Course in Statistical Inference Brief Contents 1. Introduction…………………………………………………………11 Part I Probability 2. Probability…………………………………………………………...21 3. Random Variables…………………………….…………………….37 4. Expectation…………………………………………………………..69 5. Equalities…………………………………………………………….85 6. Convergence of Random Variables………………………………...89 Part II Statistical Inference 7. Models, Statistical Inference and Learning………………………105 8. Estimating the CDF and Statistical Functionals…………………117 9. The Bootstrap………………………………………………………129 10. Parametric Inference……………………………………………..145 11. Hypothesis Testing and p-values…………………………………179 12. Bayesian Inference………………………………………………..205 13. Statistical Decision Theory……………………………………….227 Part III Statistical Models and Methods
14. Linear Regression………………………………………………...245 15. Multivariate Models……………………………………………...269 16. Inference about Independence…………………………………..279 17. Undirected Graphs and Conditional Independence……………297 18. Loglinear Models…………………………………………………309 19. Causal Inference………………………………………………….327 20. Directed Graphs………………………………………………….343 21. Nonparametric curve Estimation……………………………….359 22. Smoothing Using Orthogonal Functions..………………………393 23. Classification……………………………………………………..425 24. Stochastic Processes………………………………………………473 25. Simulation Methods………………………………………………505 Appendix Fundamental Concepts in Inference
!#"%$&%')(*,+.-0/12!34+5-6(*879:<;2+.=,>"?/9@#"%=A/B"%CD3E!9:%FG=+5G7,9:!/B"H/+.$5+5(JIK"%G= -@(*"H(*+5-@(*+5CL-M'N!9O-P(*F=,LG#(*-+.G-P(D"H(:+.-@(:+.CQ-LRSCL!T7F,(:L9U-:CQ+.QGCLVWQ-:7XLCQ+Y"%$5$5I=B"H(D" T+.G[Z +.G>"%G=8T>"%CD+.G,\$5]"%9@G+.G,E^_RBT>"H(*,LT>"H(*+5CL-LR"%G=K9@L$."H(*Q=`=+5-:CQ+.7$5+.GQ-La)+.-b/X[!3 CL<;!Q9:-K"cTdFCDfeb+5=L9?9:"%G! %'\(*%7+.CQ-?(:B"%Gf"(JI[7,+.C]"H$6+.G#(*9@[=,FCQ(:!9@I(:_g2(8!G Th"i(*QTh"H(:+.CL"%$j-P(D"H(:+.-@(:+.CQ-La)kl(m+5GCL$5F=Q-nT2=L9@G?(:!7+5CL-b$.+53%0G!G7"%9*"%TQ(:9:+5C6CLF9P;! L-P(*+.T>"H(:+.!GoRi/X[%(:-@(:9*"%77,+.Gm"%G=MCQ$Y"%-@-:+qpBC]"H(:+.!GjRi(:!7+.CQ-(:B"H(&"H9:rF,-:FB"%$5$5I09@L$.Q#"H(:L= (*0'N!$.$5<esZJF7OCQ!F9:-@L-Qa),m9@]"%=Q9S+.-)"%-:-@FTL=>(:t3EG<euCL"%$.CQF$.F,-S"%G=v"0$.+q(:(:$.r$5+.G[Z ]"%9n"%$5!L/,9*",a&wnU79:_;[+5!F-b3EG<eb$.Q=!\%'x7,9:!/B"H/+.$5+5(JIh"%G=?-@(*"H(*+5-@(*+5CL-+5-9:LyEF+59:Q=oa 0(*_g2(n+.--@F+5(*"%/$56'N!9b"%=,;H"%GCQL=8FG=Q9:!9:"%=FB"i(*L-m"%G=8!9:"%=FB"H(:z-@(:F=LG#(:-La {j|<}[|~_|~# RE },|]}4~W
~W
"%G= }Bi~
N2},H
~W
"H9:n"%$.$BCQ!GCQL9:G,L=heb+q(* CL!$5$.QCQ(*+5G4"HG="%GB"H$5I2L+5G4=B"H(*",a B!9t-:!T>(:+.T%Rx-P(D"H(:+.-P(*+.CQ-d9@L-:L"%9:CDcer"%-MCL%G[Z =FC_(*L=+5Gh-P(D"H(:+.-@(:+.CQ-r=Q7B"%9@(:T>QG#(*-reb+5$.n=B"H(*"tT+.G+5Gd"HG=vT>"%CD+.G,m$5]"%9@G+.G\9:Z -:L"%9:CD\es"%-CL%G=FC_(*L=t+.G\CL!T7F,(:L9-@CL+.QGCL)=Q7B"%9@(:T>QG#(*-La&2(D"H(:+.-P(*+.CQ+Y"%G,-j(*%F!#( (*B"i(vCL%T>7F[(*L9>-:CQ+.LG#(:+.-@(:-heL9: 9@L+5GE;%LG#(*+5G(:KebQL$ar!T>7,F,(*Q9h-:CQ+.QGE(:+.-P(*(*%F!#(b(*B"H(b-P(D"H(:+.-P(*+.CL"%$j(:L!9PI=+5=Go(n"%77$qIh(*U(*,L+.97,9:!/$5LT-La +.G%-U"%9:CDB"HG!+.G,a2(*"H(*+5-@(:+.CL+."%G-tG<e9:QCL!%G+.Qv(*B"H(MCL!T7F,(:L9O-:CQ+.QG[Z (*+5-@(*-r"%9:mTh"H32+5G\G<;!L$1CL!G#(:9:+./,F,(*+5!G-Seb+.$5mCQ!T>7,F,(*Q9r-:CQ+.QGE(:+.-P(*-rG,ef9@LCL%!G+.Q (*0!QGL9:"%$.+q(JIv%'-@(*"H(*+5-@(:+.C]"H$1(*,L!9PI?"%G=8T>_(*2=!$5!%I%a&r$._;!Q9m=B"i(D"OT>+5G+.G,O"%$qZ !!9@+5(*,T>-z"H9:UT>%9:M-@C]"%$."%/$.M(:B"%GA-@(*"H(*+5-@(*+5CL+."%G-6_;!L9z(:!F!A7X!-:-@+./$5%aUB!9:T>"%$ -@(*"H(*+5-@(*+5C]"%$#(:L!9PId+5-xT!9:)7XL9@;H"%-@+5;%r(*B"HGMCL!T7F,(:L9-:CL+5LG#(*+5-@(:-B"%=O9@]"%$5+.LQ=oa$.$ "%!9@LU-@(:F=LG#(:-\eb8=L"%$xeb+5(: (*"%GB"H$5I2-:+5-6%'=B"H(D"-@!F$5=A/XOerQ$.$%9:!FG,=L= +.G?/B"%-@+.Cz79@!/B"%/,+.$.+q(JIv"HG=?Th"H(:LT>"H(:+.C]"H$j-@(*"H(*+5-@(:+.CL-QaSm-@+.GO'W"%GC_Iv(*2!$5-s$.+53%zGQF[Z !
! 9*"H$1G_(*-QR/12!-P(*+.G,M"%G=-@F77X!9@(r;!LC_(*!9T>"%CD+5GL-)eb+5(:!F,(F,G=L9@-@(*"%G=+5GO/B"%-@+.C -@(*"H(*+5-@(:+.CL-+5-$.+53%z=!+5GO/9:"%+.G?-:F,9:!Q9@I?/XQ'N!9@z32G,eb+5G<e (*F-@t"O/B"HG=B"%+5=oa "rF,(>ebL9@ C]"%Gu-P(*F=QG#(*-$.L"%9:G /B"%-@+.C479@!/B"%/,+.$.+q(JIc"%G= -@(D"i(*+.-P(*+5CL-hyEF+5CD3E$5$ I# wm<ebQ9:Ha\s(6$.]"H-@(]RX(*B"i(6es"H-6TdI`CL!GCQ$.F-@+.!G`ebLG TdI8CL%T>7F[(*L9-:CL+5LGCQOCL!$ Z $.L"%!FQ-3%Q7,("%-:3E+.G,nT& %'( L9@)CL"%G0k1-@LG=tTdI6-P(*F=,LG#(*-(*!_("s![2=0"FG=Q9PZ -@(*"%G=+5G8%'rT2=L9@G-P(D"H(:+.-@(:+.CQ-zy2F,+.CD3E$5$I #*) U(JI[7,+.C]"H$&Th"H(:LT>"H(:+.C]"H$-@(*"H(*+5-@(*+5C CL%F9:-@`-@71QG=-h(:[ TMFCD (*+5T>?!Gc(*Q=+.!F,-h"%G,= FG,+.G-@7+.9@+.G (*%7+.CQ-4VNCL!FG#(:+.G TQ(*,[=-QR(Jer=+5T>QG-:+5!GB"%$1+.G#(*Q!9*"%$5-Q(:C%a ^>"H((:tg[71QG-:\%'CQ<;!L9@+.GT[=Q9:G CL%GCLQ7,(*->VW/X2%(*-P(*9*"H77+.G,RoCLF,9@;!UQ-@(*+5Th"i(*+.%GoRj!9:"%7+5C]"%$T>2=Q$.-_(*CHa ^aU[?km-@Q( !F,(b(:>9@L=Q-:+.%G`!F,9nF,G=L9@!9*"H=FB"H(:t%G!9:-nCL%F9:-@t!G879@!/B"%/+5$.+q(JIv"%G,=KT>"H(*Z T>"H(*+5C]"%$-P(D"H(:+.-@(:+.CQ-Lan+.-m/12!38"%9:%-:t'N9@!T (*B"i(6CL%F9:-@%a +nL9:t+.-"h-:FTT>"%9@I?%' (*,\T>"%+5Gv'N]"H(:F9:Q-nH'(:+.-/X[%3Xa ab/12!3U+.-)-:F+q(D"%/,$.m'N%9%G!9:-rFG=Q9:!9:"%=FB"H(:L-s+5GvT>"H(*oR2-@(*"H(*+5-@(:+.CL-"%G= CQ!T>7,F,(*Q9-@CL+.QGCL6"%-eL$5$j"%-!9*"H=FB"H(:-@(:F=QGE(:-+.G>CL!T7F,(:L9r-@CL+5LGCQ6"%G= %(:L9yEFB"HGE(:+5(*"H(*+q;!zpBL$5=-La abkvCQ<;!L94"%=,;H"%GCQL= (:!7+.CQ-8(:B"H(K"%9: (*9:"%=+q(*+.%GB"%$.$qI G%(8(D"%F%E(K+.G " pB9@-@( CQ!F9:-@%a B!9>g,"%T>7,$.%RrG!G7B"H9*"%TQ(:9:+.C?9@L!9@L-:-@+.!GjR/12%(*-P(*9:"%77+5GRr=,LG[Z -@+5(JIvQ-@(:+.T>"H(*+5!G?"%G=8!9*"H7+.CL"%$jT2=L$5-La , abkOB"];%4!T+5(:(:L= (*!7+5CL->+.G 7,9:!/B"H/+.$5+5(JI(*"H(v=G%(v7,$Y"]I "CLQG#(*9*"H$9:%$. +5G-@(*"H(*+5-@(*+5C]"%$)+.G['NL9:QGCLHaA %9d_g,"%T7$5%R&CL%FG#(*+.G, T>_(*2=-M"H9:h;2+.9P(*FB"H$.$5I "%/,-:LG#(La - abkJG !LG,L9*"H$R)kt(*9PI(*A"];!!+5= /XL$."%/X!9:+5G4(*Q=+.!F,-CL"%$.CQF$Y"i(*+.%G-M+.Gc'W"];!!9U%' QT>7"%-:+5L+.G,OCL!GCQL7,(:-La . abk1CL<;!Q9xG!G,7B"%9*"HT>_(*9:+5CS+.G,'NQ9:LG,CL)/1_'N!9:)7B"%9:"%T>_(*9@+.C&+.G['NL9:QGCLHax+5-+5-(* !7,71!-@+5(:&%',T!-P(-P(D"H(:+.-P(*+.CQ-/12!3E-/F,(kj/XL$5+._;!S+5(+5-o(*)9:+5!#(es"]Iz(*n=b+5(La /"%9*"%TQ(:9:+5C\T2=L$5-m"H9:dF,G9:L"%$.+5-@(*+5C\"%G=K7XL="%!!!+5C]"%$5$5IhFGGB"H(:F9*"H$a\0V +n<e e!F$5=>e3EG<e (*nQ;%L9@IE(:+.GU"H/1!F[((:m=,+.-@(:9:+5/F,(*+5!G_g[CLQ7,(s'N!9S!Gm!9 (JeM7B"%9:"%TQ(*Q9:1- #i^dk&+5G#(*9:2=FCQm-P(D"H(:+.-P(*+.CL"%$B'NFGC_(*+.%GB"%$.-)"%G,=v/12%(:-@(*9:"%77+5G ;%L9@I?L"%9:$qI"%G,=`-P(*F=QG#(*-bpBG=?(*+5-y2F,+5(*0GB"i(*F9:"%$a $ ':x+59:-@(MQ9:T 45/&9:%/B"%/+5$.+5(J6I ) "HG7= ':[LCQ!G= L9@T 2 abkt"%/B"%G,=!Gc(*?F-@FB"%3 42(D"H(:+.-P(*+.CQ8- )8"%77,9:#"%CDja[!T-P(*F=,LG#(*-\!G$qIK(D"H3%O(*OpB9@-@(tB"%$q'r"HG=+q(
, e!F$.=\/Xr"bCL9@+.T)+q'[(:QI0=,+.=\G%(-:Qs"%G#I0-@(*"H(*+5-@(:+.C]"H$%(*Q!9@I%aBF9P(*Q9:T!9:HR 79@!/B"%/+5$.+q(JI`+5-6T!9:OQG#"%!+5GhebQG-P(*F=,LG#(*-\C]"%G -@LU+5(z7,F,(z(*e!9:34+5G (*,\CQ!G#(*g[(m%'-@(*"H(*+5-@(*+5CL-Qa ab CQ!F9:-@ T><;!Q-v;!Q9@IuyEF+5CD32$qIu"%G,=uCL<;!Q9:-8TMFCD Th"i(*L9@+Y"%$a 8I CL!$ Z $.L"%!FQ-@!3%U(*B"i(0kCL<;%L9M"H$.$%'-@(*"H(*+5-@(:+.CL-z+5GA(:+.-zCQ!F9@-:>"%G=ALGCQ>(: (*+q(*$5%asCL%F9:-@+.-=QTh"HG=+.G,6/F,(&kB"];%ser!9@3%Q=>B"H9:=M(*Th"%3Hr(*,sTh"Z (*Q9:+."%$#"%-+.G#(*F,+5(*+q;!S"%-7X!-:-@+./$5&-:m(:B"H((*,T>"H(*Q9:+."%$%+5-;%L9PItF,G=L9@-@(*"%G=B"%/,$. =Q-:7+q(*z(*z'W"%-P(n7B"%CLHaSG#IEes"]I!RX-:$5<e CQ!F9:-@L-m"%9:0/X!9:+5Ga ab- n+5CDB"%9@= BQI2GT>"%G 71!+5G#(*L= !F,(LRs9:+5!!9U"%G= CL$Y"H9:+5(JI"%9:8G%(>-@I2G!G#I#Z T!F-Lakn"<;%M(*9@+.L=`(*v-P(*9:+53%O">![2=4/B"H$Y"%GCQ%a6"];!%+.= !_(:(*+5Gh/X!!%L= =<ebGM+5GdFG+5G#(*L9@L-P(*+.G,m(*QCDG+5C]"%$2=Q(*"%+.$5-LRT>"%G#It9@L-@F$5(:-x"%9@r-P(D"H(:L=teb+q(*%F,( 79@[%' a`/+5/$.+5!!9*"H7+.CO9@Q'NQ9:LG,CLL-U"i(d(*,vLG=H'b]"%CDCDB"H7,(*Q9O71!+5G#( (*,\-P(*F=,LG#(m(:"%779@!79:+."H(*z-@!F9@CLL-Qa a 6GdTdIzeL/,-:+5(:"%9@)pB$5L-eb+5(: CL2=Seb+.CDM-P(*F=,LG#(*-CL"%GdF,-:'N%9=%+.Gm"%$.$ (*,CL%T>7F[(*+.G,a +neQ;%L9QR,(*m/12!3O+5-)G,%((:+.Q=>(* "%G=v"HGEICQ!T>7,F,(*+5G $Y"HG!FB"%%6C]"%G8/X\F,-:L=ja mpB9@-@(7B"%9P(r%'1(*n(*g[(+.-SCQ!GCLQ9:GQ=veb+5(:>79@!/B"%/,+.$.+q(JId(:L!9PI!RE(*b'N!9@Th"%$ $Y"%G,!FB"%!zH'xFG,CLL9P(D"%+5G#(JIeb+.CD8+.-s(:0/B"%-:+5-b%'x-P(D"H(:+.-P(*+.CL"%$o+5G,'NL9@LGCQ%aS\/B"%-:+5C 79:%/$.QT(*"H(ber0-@(:F=,I?+5G87,9:!/B"H/+.$5+5(JI>+.- % !"#$&% !('*)+-,,+.0/213 !"451% !('0%6!(#7+,8'9:51;'0<>= )-'*?4+,#@
)-:LCQ!G=\7B"%9P(%'[(:S/12!3m+.-"%/X!F,(-P(D"H(:+.-@(:+.CL"%$%+5G,'NL9@LGCQ"HG=t+q(*-CQ$.!-@&CL!F-@+.G-QR =B"H(*"0T+.G,+.Gz"HG=UTh"%CD,+.G$5]"%9@G+.G,ab/B"H-:+.C7,9:!/$5LT %'X-@(D"i(*+.-P(*+5C]"%$[+.G,'NQ9:LG,CL +.-s(:0+.G#;!Q9:-:\%'79:%/B"%/+5$.+5(JI % A51B'0<C)-'*?4+,+.0/213)-B/D;,EGF IH6'*<51 % !('*)+-,,51J6 !K= C-L51A@
L-@O+.=L"%-6"%9@d+5$.$.F,-@(*9:"H(*Q=`+5G4x+5!F9@ a a/&9:Q=+.C_(*+.%GoR1CQ$Y"%-@-:+qpBC]"H(:+.!GjRXCQ$.F- Z (*Q9:+.G,M"%G=hL-@(:+.T>"H(*+5!G>"%9:m"%$.$ -@71QCL+Y"H$1CL"%-:Q-s%'-P(D"H(:+.-@(:+.CL"%$ +.G,'NQ9:LG,CL%aNMz"i(D"d"HGB"%$qZ I2-:+.-QRT>"%CD+.G,$.L"%9:G,+.Gd"%G=v=B"i(D"dT+.G+5Gt"%9:;H"%9@+.!F,-rGB"%TL-!+5;%LGv(:d(*79*"%CZ (*+5CLhH'-@(*"H(*+5-@(*+5C]"%$S+5G,'NL9@LGCQ%R&=Q71QG=+.G,4!G(:hCQ!G#(*_g2(La4v-@LCL%G=c7B"%9@(t%'
-
!
S9@!/B"%/,+.$.+q(JI /
Mz"H(*"O!LGQ9*"H(:+.GU79@[CQL-@-
6/-:Q9@;%L=K="H(D"
kJG,'NL9@LGCQd"HG= Mz"i(D" `+5G+.G x+5!F9: a % /&9@!/B"%/+5$.+q(JIv"%G,=8+5G,'NL9@LGCQ%a (*,K/X[!3CQ!G#(D"%+5G-!GKT!9@`CDB"H7,(*Q9h!G7,9:!/B"H/+.$5+5(JI(:B"H(hCL<;!Q9:-v-P(*2CDB"%-@(:+.C 79@[CQL-:-@L-n+.GCQ$.F=,+.G K"H9:3%<;?CDB"H+.G-Qa khB"];%A=,9*"]ebG L"];[+5$5I !G %(*Q9/12!3E-?+.G T>"%G#I7$."%CLQ-La `!-P(8CDB"%7,(:L9:CL%GE(*"%+.G "-@LC_(*+.%GfCL"%$.$5L= "r+5$./$5+.!%9*"%7,+.C bLT>"%9:3E-veb+5CDf-@L9P;!L-8/X%(*u(*c"HC_Z 3EG<eb$.Q=!8TdIc=L/[(>(:%(:L9>"%F,(:!9:->"%G=c(*71%+.G#(9:]"H=L9@-(:%(:L9UF-@Q'NF$ 9:_'NL9@LGCQL-Laukter!F,$.= L-:7XLCQ+Y"%$5$5I$5+.3H?(:TLG#(:+.!G(*,`/X[%32-/#I M6 9:2%("%G= [CDQ9@;2+.-@ V ^m"%G,= 9@+.TT>_(:(m"%G=AE(*+.9@]"%3HL9UV 6 ^b'N9:%T eb,+.CD kn"%=B"%7[(*L= T>"%G#Iv_g,"%T7$5L-n"%G=8_g[CLQ9:CQ+.-:Q-La
.
#7,C#7)-, N L 6 )+#7'* !(F
2 (D"H(:+.-P(*+.CQ+Y"%G,-b"%G=KCQ!T7F,(*Q9b-:CL+5LG#(*+5-@(:-m%' (:LG8F-:t=+ 1L9@LG#(n$Y"%G,!FB"%!z'N!9(: -*"%TK(:+.G,a +mQ9:4+5-v"=+5CQ(*+5!GB"%9PI(*B"i(v(*K9:]"H=L9vT>"]Ices"%G#((:9:_(*F9@G(* (*9@!F!,!F,(b(*0CQ!F9:-@%a
#7,C#7)-,
L-P(*+.T>"H(:+.!G CL$."%-:-@+5pBCL"H(*+5!G CL$5F-@(:L9:+5G =B"H(*" CL<;H"%9:+."H(*QCL$."%-:-@+5pBQ9 #I[7X%(:L-@+.CL!G[pB=LG,CL\+.G#(*Q9@;H"%$ =+.9@LC_(*L=`"%C_I[CQ$.+5Cz!9*"%7, s"]I!Q-:+Y"HG`+5G,'NL9@LGCQ 'N9:Qy2F,LG#(*+5-@(m+.G,'NQ9:LG,CL
'*?%I<C! )7)-
$.L"%9:G+5G -:F7XL9P;2+.-:Q=K$5]"%9@G+.G FG-@F71Q9@;2+.-@L=4$5]"%9@G+.G, (*9:"%+.G+5GO-*"HT>7$5 'N]"H(:F9:Q#I[7X%(:L-@+.#
#
s"]I!Q-mG_( "
s"]I!Q-:+Y"HGK+.G['NL9:QGCL
"
" #
$Y"%9@!6=_;2+Y"H(:+.!G?/X!FG=- 1/ n $5]"%9@G+.G,
-6$
F-@+.GU=B"i(D"M(*UL-P(*+5Th"H(: "%G?FG,32G,ebG`yEFB"%G#(*+q(JI 79@L=+5CQ(:+.GU"U=+.-@CL9@Q(* 'N9:%T 7F[(:(*+5GO=B"H(*"+5G#(*O!9@!F7V P^ V %^ (: "OT>"%7'N9:!TCL<;H"%9:+."H(*Q-s(*!F,(*CQ!TL-@F/-:_(H'"O7B"%9:"%TQ(*Q9-:7B"HCL +5GE(:L9P;%"H$j(*B"i(bCL!G#(D"H+.G-bFG3EG<ebG`y2F"%G#(*+5(JI eb+q(*8"O79:Q-:CL9@+./XL=?'N9:Qy2F,LGC_I TMF,$5(*+q;H"%9:+."H(*n=+.-P(*9@+./F,(:+.!Gveb+q(* -@71QCL+5pL=8CL!G=,+5(*+5!GB"%$ +5G=L7XLG,=LGCQt9@L$."H(*+5!G-P(D"H(:+.-@(:+.CL"%$jTQ(:[=,-b'N%9F-:+5GU=B"H(D" (:F,71=B"i(*0-:F/ @LCQ(:+5;%t/XL$5+.Q'N-P(D"H(:+.-@(:+.CL"%$jTQ(:[=,-b'N%979:2=FCQ+.G 7X!+.G#(Q-@(:+.T>"H(*Q-m"HG=?CL!G,p=LGCQd+5GE(:L9P;%"H$.eb+q(*?!FB"%9:"%G#(*QL-b!G?'N9:LyEFQGCQI8/XLB"];2+.%9 FG,+5'N!9@T/X!FG=,-m!G?79@!/B"%/,+.$.+q(JIhH'xQ9:9:%9:
"!
%$
2
!
'#7'* FI?H6'
-6$
@9 ]"%$oGEFTd/1Q9:+.G[' xV 1^ +5G,pBTMFT %x(:0$Y"%9@!L-P(bG2F,TM/1Q9h-:F,CD`(*"H( V 1^'N%9b"%$.$ (:+.G3>%'(*+5-b"%-(*0T+.G+5TMFT %'
-:F,7 V 1^ -@F79:QTMFT %(*0-@Th"%$5$.Q-@(GEFTM/XL9 v-@FCD`(:B"H( x V 1^r'N!9b"%$5$ (:+.G3>%'(*+5-b"%-(*0T>"ig[+.TMFT %'
V 1 " ^ V #" ^ %$&$&$' , )+ * ! ) -, )2# ( ,/. -0 , 0 <-0>=+? "%TTh"t'NF,GCQ(:+.!G768 ;: 3)V54x^ 9 !F[(*CL%T> @ A -:"%T>7,$.6-@7B"%CLhVW-@Q(n%'!F,(:CL!TL-*^ _;!LG#(MVW-@F/-:_(H' A ^ B xV @ ^ +5G=+.CL"H(*%9r'NFGC_(*+.%GDC +5' @ E "%G= H(*Q9@eb+.-@ FVG6^ 79@!/B"%/,+.$.+q(JI%'x_;!QGEH( I I GEFTM/XL9b%'7X!+.G#(*-+5G?-:Q( JLK CQFTMF$."H(*+q;!z=+5-@(*9@+./F[(*+.%Gh'NFGC_(*+.%G
MK 79@!/B"%/,+.$.+q(JI=LG-@+5(JIVN!9Th"H-:-D^'NF,GCQ(:+.!G ONPJ B"%-b=+5-@(:9:+./,F,(*+5!GQJ ONR
B"%-b=QG-:+q(J7I
4S "%G= B"];!0(*,\-:"%T\=,+.-@(:9:+5/F,(*+5!G TNPJ -:"%T>7,$.6%'x-:+5L 'N9:!TUJ V -P(D"%G="%9:=`wm!9@Th"H$j79@!/B"%/+5$.+q(JIh=QG-:+q(JI W -P(D"%G="%9:=`wm!9@Th"H$j=+5-@(*9@+./F[(*+.%Gv'NFGCQ0 (:+.!G F7,71QY9 4yEFB"HGE(:+.$5z%'[ZV ^r+a %\a W V " 4^ X : ? ]6V A^ 4^6_ JhV` 1^ g,7XLC_(*Q=`;H"%$.F,VWT]"%G ^rH'x9:"%G=!T;%"H9:+Y"H/$. ]6VGa,V A^@^ 4 6 aV` 1^ ? JhV` 1^ g,7XLC_(*Q=`;H"%$.F,VWT]"%G ^rH'a,V A^ bUV A^ ;H"%9@+Y"%GCQz%'9*"%G=,!T ;H"%9:+."%/$5 ' V M^ CQ;H"%9@+Y"%G,CLz/XQ(JeLLG "HG= =B"i(D" -:"%T>7,$.6-@+.L c CQ!G#;!L9@!LG,CLd+5G?79:!/"%/+.$5+5(JI "ed CQ!G#;!L9@!LG,CLd+5G?=+.-P(*9:+5/F,(:+.!G f g"eid h CQ!G#;!L9@!LG,CLd+5G?y2F"%=9*"i(*+.CzT]"HG TjPZV k 2lDm ^ V " ko^ npl f ZV ^
"
FI?H'
-6$
'# '0 '0#$I<-
-P(D"H(:+.-P(*+.CL"%$oT[=,L$ C,"O-:Q(n%'x=+.-P(*9@+./F,(:+.!G>'NFGCQ(:+.!G,-LR =,LG-@+5(JI'NFGC_(*+5!G-!9b9:Q!9:Q-:-@+.!G?'NFGC_(*+.%G 7"%9*"%TQ(:L9 Q-@(:+.T>"H(*z%'7B"%9:"%TQ(*Q9 MVGJd^ -P(D"H(:+.-P(*+.CL"%$j'NFG,CQ(*+5!GB"%$&VN(*,\T]"HGoR,'N!9g"HT>7$5<^ BV ^ $5+.3HL$.+52[=h'NF,GCQ(:+.!G 4,V #^ n d I n I +.-/X!FG=,L=?'N!9$Y"%9@! 4 vV %^ 4 &V #^ n "c d I n I +.-/X!FG=,L=K+5G79:%/B"%/+5$.+5(JI'N!9$."%9:! 4 SV !^
" "
!
, G9 < 51)+C,
< 4 ) ) 4 8 9 , 8 ) a 4 0
(
'N!9
m , a
$&$&$
* < $.+5T 8 4 "%T>T>"\'NF,GCQ(:+.!G+5-s+.-s=,QpBGQ=?/EI#3VG4^ 46\9 8 : 0 < 0>= ? 'N!9 4 a&kl' 4 (*QG3)V54x^ 4 V54 " ^3)V54 " ^aSkl' +.-n"%G8+.G#(*Q!L9s(:LG 3)V ^ 4 V Q" ^ a [!Tz-:7XLCQ+Y"%$1;%"H$.FQ-m"H9:&%3)V ^ 4 "%G= 3)V n ^ 4 &a
!#"%$&"('*)*',+.-/'*01+325467$8+32(467$8+3':9;$&)<)*$&=5>#?%$8>#4A@B&DCE?%$&=<+3',@F-G':=5>H?(=594I!+J$8'*=<+.-#KDL4 9;$&=M$&N5N():-ON5!#"%$&"('*)*',+.-P+Q254&!-R+QH$TS5':U&4!0Q4A0Q4I+1&@(N5Q&"5)*4I670IVW@B!#6X9I#'*=/Y%':N5N5'*=(>Z+3 +3254/$8=%$&):-[0!'*0\8@]9I#6N5?(+34I$&):>##Q',+32560K1^254O0_+J$&_+3'*=(>`Na#':=<+Z'*0A+3`0!Na4I9':@F-b03$&6N5):4 0QN%$894&V%+Q254/0Q4I+c&@]Nd#0Q0!'*"5):4O#?(+39I#640K e f gih jkl fjmgiclRnogi prq7lR Dn ^254`st%uvxwBysv1t%z#y|{RVa':0c+Q254i0!4I+H&@DNd#0Q0!'*"5):4M&?(+39I#674I0P8@$&=}4e~GNa4IQ'*64=<+K ]#':=<+30O'*={$&!479$&)*):4Sst%uv]wy
dWz#duy[s&MWyEt%w
Û#Û
,' U#4I=ß$8= 4U#4I=E+ ¼ VA)*4+ ¼ ® °; ç½{!¦#ç " ¼ µ}S54I=5&+Q4+Q254 9&67N5):464=<+ &@ ¼ K%$.=(@B&Q67$&)*),-#V ¼ 9$&= "a4ÃQ4$&S$&0'&!=5&+ ¼ K)(½^2549#6N5):464=<+O&@\{ ':0R+3254 46N(+.- 0!4I+*5K^254M?(=5'*#=&@]4IU&4=<+30 ¼ $&=5S,+á':0cS(4.-%=54IS ¼0/ +®½°; ç {!T ç ¼ #D¯1ç +#\ çm"d&+32 µ 2T25'*9J279;$&="a4T+32(#?5>#2<+&@$&03& ¼ #4+bK5(1$ @ ¼ âe² ¼ ã;²;åå;å '*0T$`0!4CE?54=(94i8@D0Q4+30+32(4= 7 6 ¼ æ ®êà× ç{rë`¯ç ¼ æ@B#T$8+c)*4$&0!+T&=54/'_è1å æ98aâ ^254`':=E+Q4!0Q49+3':#= &@ ¼ $&=50S + '*0 0¼ : + ® °; çß{ !Rç ¼ $&=(S ;ç +µ7!4;$&S & ¼ $8=51 S +b5K (=
0 2?4 2TQ',+34 ¼ :;+ $&0 ¼ +b@K $ @ ¼ âe² ¼ ã²;å;ååa':0O$0Q4IC[?(4=59I4 &@]0Q4+30T+3254I= A 6 ¼ æ ® à ç{rëÃ¯ç ¼ æ@B#T$&)*)a' è å 9æ 8aâ B4I+ ¼ C Ð +®°;¯ë¦ ç ¼ ²!D Eç " +7µEFK $ @]4IU&4_-4)*4I674I=<+T&@ ¼ ':0$&):0Q`9&=E+3$&'*=(4S'*= +D2 4 2T!':+34 =¼ G +Â#IVG4IC[?(':U8$&)*4I=<+3):-&HV +JI ¼ FK $ @ ¼ ':0\K$ -%=5':+Q4P0!4I+V()*4M+ L ¼ L8S54=(&+34 +32(4=E?56i"a4Ic8@x4I)*464=<+Q0c':= ¼ K
^]$&"5):4 K4<($&6N5)*4R0!N%$&9I4i$&=(S³4U#4=<+Q0K { 03$&6N5)*4H0QN%$&9I4 #?(+39I#64 ¼ 4IU#4I=<+`ÏB0Q?5"50!4I+P&@x{TÒ L¼ L =[?(6¦"a4IT&@xNd#':=E+Q0'*= ¼ Ï',@ ¼ ':N0 -%=5',+34Ò ¼ 9#6N5)*4I674I=<+&@ ¼ ÏB=5&+ ¼ Ò ¼0/ + ?5=5'*&=Ï ¼ #+`Ò ¼0: + # ¼ + '*=<+34IQ0Q4I9I+Q'*#=Ï ¼ $&=5,S +`Ò ¼ OÐ + 0Q4I+cS5Q' P 4IQ4I=5947ÏNd#':=E+Q0'*= ¼ +32%$¨+c$&Q4/=5&+T':='+`Ò ¼ G + = 0Q4I+c':=59):?50Q':#= Ï ¼ ':0P$`0!?5"50Q4+H8@x#4ICE?%$&)+3R+`Ò * =[?()*) 4U#4=<+¦Ïª$&Q) 2Z$;-[0@$&)*0!4Ò { +3Q?(44U#4I=E+¦Ïª$8S) 2\$-[0+3!?54Ò L 40Q$;-R+32%$¨+ ¼ â² ¼ ã;²;å;åå;$&!4 T]sVUda#$&!4u 1t%wwXWyZYz<w]s;FdyT':@ ¼ æ :ß¼[ ® *\2T254=54U#4Iií%®_ ] ^%Ka`%#O4~($&6N5)*48V ¼ âH®cb Ú ² Ò² ¼ ã/®cb ²eÛ#Ò² ¼ ä/®cb¿Û[²JÜ<Ò²å;å;å $8Q4
&Û Ü S5'*0 !#'*=<+K v1t5WBd&@1{Ó'*0$M0Q4CE?54I=594&@S5':0 !#':=<+Z0Q4+30 ¼ â² ¼ ã;²å;å;å[0Q?59J2 +325$8+ / æ96 8aâ ¼ æ ®r{RK ':U&4=|$&=|4IU&4=<+ ¼ V%S54.-5=54O+3254 T1ªz#tG× .11z&Wd ¼ "ExÏF\Ò® aÏF ç ¼ Ò® Ú ':':@\@A= ¯ç "ç ¼ ¼ å 0Q4CE?54I=594³&@c0Q4+30 ¼ âe² ¼ ã²;å;ååx'*0¦ud1%dyÓª1z#WyEt%s; ':@ ¼ â G ¼ ã G $&=5S 24³S54 -%=54)*':6 6 ¼ ® / 9æ6 8aâ ¼ æ.K 0Q4CE?54I=594}&@H0!4I+Q0 ¼ âe² ¼ ã²å;å;åx':0 ud1%dyCTy[z#Wy[t5s;ª ':@ ¼ â I ¼ Mã I $&=5Sm+Q254= 24¦S54 -%=54`)*':6 6 ¼ 7® : æ96 8aâ ¼ æ K $.=³4I':+Q2549$&0Q48V 2 4 2T'*):) 2TQ',+34 ¼ ¼ K E&<M< 5 ÓÀ { ®ìÍ ¤&©d»È* d¼ æ ® b Ú ² "8í.Ò ;¡¨º í]® ²eÛ[²;å;å;å:¶Ã· ¬%?© / 9æ6 8aâ ¼ æ ® © / æ6 8aâ ¼ æ® b Ú ² Ò ¤&©d» : 9æ6 8aâ ¼ æD®° Ú µG¶ F¦§F©(£ .Q¤<»|\`»[B¹© R¼ æ® Ï Ú ² "¨í.Ò F¬%E Ï Ú ² Ò ¤&©d3» : 9æ6 8aâ ¼ æ ® *׶\¾
 mg ;kI L42Z$8=E++Qi$&0!0Q':>#=$!4;$&)%=E?56¦"d4 \Ï ¼ Òx+3/4IU&4!-4U#4=<+ ¼ VE9;$8)*)*4IS`+3254Tv1¨ t ]ªwB W8@ ¼ KDL4/$8)*0Q`9$&)*) Â$¦v1W 1t ]ªwF WOT]sI¨ xWBa#Z$Ãv1W ]t ]wªF W uyEt%s;Wyd^1C[?5$&)*',@F-$&0c$`N5!#"%$&"5':)*',+.-#V 2%$&0+3Ã03$¨+3'*0_@F-b+Q25Q4I4i$¨~G':#670Ië 5 áeÄG© ¥ §«¡¨© F¬(¤& 䨣J£§,Ô&©(£}¤ º3Q¤&ÈP©%Ä[Å Á º \Ï ¼ Ò .¡ Q¤[¥J¬ ¸&©% ¼á§*£/¤ v]W ]t xwªF W T1sI¨ ]Wd ¡¨º/¤ v1W 1t ]ªwF WuyEt%s;Wy § §F £¤& §*£B¹ZJ£/ F¬%;¡¨ÈFÈ*¡¨§F©[Ô F¬[º3Ji¤;Ì#§«¡¨Åi£ Ydu \Ï ¼ Ò Ú ;¡¨ºi¸&ºeÇM¼ Ydu \Ϫ{TÒ® Ydu ZFT¼ âe² ¼ ã²;å;å;å ¤&º3M»&§*£ ;¡¨§F©% F¬%© 7 6 ¼ æ ® 6 Ï ¼ æÒeå æ98aâ æ98aâ
!
#"
%$
)&
+*
,&
< W W ¼ < < ¨ I ;<b & Ó 8 ;< 8 × G ;< <<# <
&('
)&
&
&
)&
3@1BA
)*
c
_,5=1 /Z1 7E5
0d]e
$
&
E&
,&
)&
i g
$ $
h
l
g
h
m
g
kj
n
$Ro
qp
r
$
^ 254!4$&Q4A67$&=<-/'*=<+34IQN5!4I+J$¨+3'*&=50]8@ \Ï ¼ ÒKx^2(4+ 2R9#66#=M':=<+34!N5Q4+J$8+Q'*#=(0 $&Q4Ã@BQ4ICE?54=59I'*4I0`$&=5SS54>#!44I0i&@\"a4I)*'*4@B0K $.= @BQ4IC[?(4=59-':=E+Q4!N5Q4+J$8+Q'*#= V \Ï ¼ ÒO':0 E$
$
5/
F
3
3R/K S:
9F 7EA
3[/?7
R^ >`_
V
L
/
a
MVWN
A=1 >
1 / V
1
/ L Kb5=1 L /
K87
-U583C/
3C3\7OI]3
V
/K
5 V 1
1 3
>
9F D ON
143
3X314DY5
I7 FV
/?7
1T5
:
1 /Z1
7=3?>
1 IJ/K
ML
3QP L K
L
<; 3
3C3@1 DE5
A=1 1 / ;
3
g
h
:
/?7
9F 7EA
$
f&
.-0/214365879/
Û +32(4M):#=5>ÃQ?5=³N5Q&Na#_+3':#= &@x+Q'*640+325$8+ ¼ ':0T+QQ?54'*=!4Nd4I+3',+3':#=50K`%#T4~($&6N5)*48V ':@ 24M0Q$- +Q2%$8+T+32(4N(Q#"%$8"5'*):':+.-8@x254$&S50c'*0 Û[V 2T254=³674$&=|+Q2%$8+T',@ 24MY%':N +3254 9&'*=}67$&=<-+3':674I0c+Q254=m+Q254¦N5!#Na&!+3':#=|&@+3':674I024¦>#4+R254$&S50H+Q4=5S50P+3 Û7$&0 +32(4H=E?56¦"d4Z8@ +3#0!0Q4I0'*=59IQ4;$80Q40IK H=':= -%=5',+34),-Ã)*#=(>5VE?5=5N5!4S5':9I+J$8"5)*4P0Q4CE?54I=594/&@ +3&0Q0Q4I0 2T25&0Q4H):'*6':+Q'*=5>RN(Q#Nd#!+Q'*#=`+Q4=5S50A+3i$9I#=50!+3$&=<+Z':0\$&=7'*S54$&)*' ;$8+Q'*#= V&6¦?59J2 )*' &4+3254`':S54;$&@$0_+33$8'*>#2<+O):'*=54M'*=>#4I#674+3_-#K^254`S54>&Q44 .&@ "a4I)*':4I@'*=<+Q4QN(Q4I+3 $ +3':#= ':0M+325$8+ \Ï ¼ ÒM674$&0Q?(Q40¦$&=
#"50!4!U&4 ¿0Ã0!+QQ4I=5>&+32 &@T"d4):'*4@+32%$8+ ¼ '*0+3Q?(4&K $.=}4I':+Q254T'*=<+Q4QN(Q4I+3$8+3':#= V 24i!4CE?5'*!4+Q2%$8+ c~G':#670 +37Ü25&)*S K^254iS5Q' P 4IQ4I=594 '*='*=<+34IQN5!4I+J$¨+3'*&=C2T'*):)=58+`6b$¨+Q+34Ii6¦?(9J2 ?(=E+Q'* ) 2\4S54$&) 2T',+32
0_+J$8+Q'*0!+Q'*9$&)':=(@B4 4=(94&K^254!4&V+3254 S(S' Pa4Q':=5>m':=E+Q4!N5Q4+J$8+Q'*#=50i)*4;$8S +3m+ 2 0!9J25[#)*0`&@c':=(@B4!4=59I4&ë +32(4/@B!4CE?54I=E+Q'*0_+H$8=5S|+Q254 Z$;-#4I0Q' $8=³0!9J25G&)*0KL4/S54I@B4IcS('*0Q9I?50Q0!'*#=|?5=<+3':) ) $¨+34IK =54/9;$&=|S54!':U&467$&=<- N5!#Nd4!+Q'*4I0&@ @BQ#6á+3254$¨~G'*&670I K P4IQ4M$&Q4/$¦@B.4 2/ë \Ï *EÒ ® Ú ¼ G + ® ZÏ ¼ Ò Ï +¦Ò = Ò Ú \Ï ¼ \Ï ¼ Ò ® Ð ZÏ ¼ Ò ¼ A +½® * ® ¼ 7 +³® \Ï ¼ Ò \Ï +`Òeå Ï«ÛGK Ò )*4I0Q0T#"#',U#4=':=+Q254O@B#)*): 2T'*=5%> B4I6767$(K 8¨MH¦5 "!1¡¨º¤&©%Ǹ&©% B£c¼ ¤&©d» + Ë Ï ¼ / +`Ò® ZÏ ¼ Ò ZÏ +`ÒxÐ \Ï ¼ +`Ò¶ # % `KRL!':+34 0¼ / + ® Ï ¼ + Ò / Ï ¼ +`Ò / Ï ¼ +`Ò$&=5S½=5&+34
+Q2%$8++32(40Q4 $ 4IU&4=<+307$&!4 S5'*0 !#'*=<+;&K P4I=5948V67'$ E'*=(>Q4Nd4;$¨+34S ?50!4&@T+3254@$&9+`+32%$¨+ '*0¦$&(S S5',+3':U&4R@B#S5':0 !#':=E+Z4IU#4I=<+30V 24M0!44O+32%$¨+ ® Ï ¼ + Ò 7 Ï ¼ +`Ò 7 Ï ¼ +`Ò ¼ 7 + ® \Ï ¼ + Ò ZÏ ¼ +`Ò \Ï ¼ +¦Ò ® \Ï ¼ + Ò ZÏ ¼ +`Ò \Ï ¼ +¦Ò \Ï ¼ +`ÒxÐ \Ï ¼ +¦Ò ® Ï ¼ + Ò 7 Ï ¼ +`Ò Ï ¼ +`Ò 7 Ï ¼ +`Ò Ð \Ï ¼ +`Ò ® \Ï ¼ Ò \Ï +`ÒxÐ \Ï ¼ +¦Òå}¾ E&<< /5 ) · \¡M¥J¡¨§F© ¢¡£J£J£ ¶ À ±bâTÁ T B¬%O¸&©% F¬(¤& ¬%Q¤<»¨£R¡¥J¥ÄGº3£R¡¨©b ¢¡£J£PÕ ¤&©d»RÈ* ±`ãPÁ Z F¬%T¸&©% B¬(¤& d¬%Q¤<»¨£c¡¥J¥Ä[º3£P¡¨©Ã .¡£J£ZØ ¶ B\¤&ÈFÈ%¡¨ÄG .¥J¡¨ÅbJ£Z¤&º3PQÉÄ(¤&ÈFÈ Ç
$
$
$
$
$
$
$
$
$
$
$
$
$
$
$
!*
$
$ $
$
$
$
$
$
$
*
$
$
$
$
$
$
Û È § <È ÇWË F¬(¤& /§*£JË Ï¢°¨±bâe²J±`ãµ8Òî \Ï.°¨±âJ²Q´ ã;µ8Ò`® Ï.°´1â²3±Ãâ ãIµ8Òâ ® Zâ Ï.°´1âJ²Q´ ãIµ8Ò® " Ë B¬%© \Ϫ±bâ / ±`ãeÒ® \Ϫ±bâ!Ò \Ϫ±`ãeÒxРϪ±bâ!±`ãeÒ® ã ã Ð ®rÜZ"׶Z¾ 8 I BA¼ ¼Â F¬%© ZÏ ¼ #Ò \Ï ¼ Ò ¤¨£ Ñ ¶ ¦0Q`+325$8+ ¼ â GÓ¼ ã G K B4I+ ¼ ® )*':6 V +Pã}® °; ç { ëê ç 6 ¼ ® / 9æ6 8aâ ¼ æ K R4 -%=54 +/â ® ¼ â ¼ ã²! ç " ¼ âeµEV +Hä® °; ç { ë ç ¼ ä²! ç " ¼ ã²! ç " ¼ âeµE²å;å;4å $ + 9;$&=ì"d4 0Q25 2T= +32%$¨K+ +â² +Hã²;å;å;åa$&Q4`S('*0 !&'*=<+;V ¼ b® / 9æ 8aâ ¼ æ® / æ 8aâ +cæD@B#R4$&9J2 $&=5S / 9æ6 8aâ +Pæ ® / 9æ6 8aâ ¼ æ«KcÏ T~(':#6 Ü$&><$8'*= V ):'*6 6 \Ï ¼ #Ò® ):'*6 6 æ98aâ \Ï +PæFÒ® æ968aâ \Ï +PæFÒ® æ97 68aâ +cæ ® \Ï ¼ Òå}¾
H _
7 F
K
@
$
$
$
$
$
l +7E5/Z1B5bP=1 / ;
$
F 7EA
7OI
$
$
A=1 1 /Z1
3
$
%
$
!#"
$
&%
$
$]o
p
r
$
$
r
r
$
$
o
$
p
$
 mg ;kI ¯; Ixl f g¦h jkIl fjmg¦TlOn #4I=54Q$&)«V(':@x{r'*N 0 -5=5':+Q4/$&=5S|':@14;$&9J2³#?(+Q9#64O'*04IC[?5$&)*),-)*' &4I):-#VG+3254I= ZÏ ¼ Ò® LL ¼3{ LL ² 2T25'*9J2³'*09$&)*):4S +32541] רu v]W ]t xwªF W T]sI¨ xWBdA^]Ã9&67N5?G+34/N5Q&"( $&"5':)*':+Q'*4I0V 24 =544IS +Q 9#?5=<+`+Q254=[?(6¦"a4I`&@cNa&'*=<+30`':= $&= 4IU#4I=<+ ¼ K ³4+325[S50 @B#\9I#?5=<+3':=5>`Nd#'*=<+30\$&Q4H9;$&):)*4S79#6i"5'*=%$¨+3#!' $&)%64I+Q25GS(0KDL4R=544IS5= +TS54),U#4O':=E+Q +3254I0Q4`':=m$&=<-³>#Q4$8+HS(4I+J$8'*)«KTL34 2T'*):)«V%25 24IU#4IV=54I4S $7@B4 2@$&9I+Q0P@BQ&6o9I#?5=<+3':=5> ('
!
#"
)
*
+*
$
*
*
$
&
E&
&
-,
Û 3+ 2(4#_-
+32%$8+R2T'*):)"a4|?50!4I@B?5)c) $8+Q4IK ,' U#4I= #" !49+30IVT+Q254|=[?(6¦"a4Ib&@>2Z$;-[0&@ #!S54!'*=5>/+3254I0Q4R&" !49+30Z':0 ® Ï Ð Ò Ï ÐÛ#Ò Ü Û K `%#9I#= #" !4I9I+30 @BQ&6 4K `×&\4~($&6N5):4&V(',@ 24O2%$;U#4$i9)*$&0Q0\&@xÛ Ú Nd4#N5):4O$&=5S 24 2\$&=<++3¦0!4)*4I9I+c$ 9&676':+!+344H&@DÜ`0_+3?5S54I=<+30V×+32(4=+3254IQ4$&Q4 Û Ú Û Ú ® Û Ú ® Ú ® Ü Ü Ü Û Nd#0Q0!'*"5):4O9#667',+Q+Q440IKxL4=(&+34O+32(4/@B&)*)* 2T':=5>¦N5!#Na4I!+Q'*40Ië ® ® & $ ( = S ® Ð å Ú mlOjlO mlO prqlO n $ @ 2\4cY%'*N$@$&':A9I#'*=+ 2T'*9I4&VG+Q254=+Q254HN(Q#"%$8"5'*):':+.-`&@ + 2\¦2(4;$&S50\'*0 ãâ ãâ K]L4 6¦?5),+3':N5):-+3254MN5Q&"%$&"5':)*':+Q'*4I0Z"a4I9;$&?(0Q34 24i!4><$&!S³+Q254M+ 27+Q#0Q0!40O$&0c'*=(S54Nd4=5S(4=<+;K ^254O@B#!6b$&)aS5.4 -5=5':+Q'*#=&@]'*=5S(4Nd4=5S54I=594'*0T$80T@B&)*)* 2T0IK 5 · \¡¸&©% F£T¼ ¤&©d» + ¤&º3 T1yEvy[ Ty[a § \Ï ¼ +¦ÒA® \Ï ¼ Ò \Ï +`Ò ÏªÛGK Ü#Ò ¤&©d»7\ºe§F .c¼ +¶ £ ¡¢¦¸&©% B£ ° ¼ æ1ëíç %µ §*£§F©d»[.Æ%©d»[©% § A ¼ æ æ ® æ ZÏ ¼ æFÒ ;¡¨º¦¸&ºeÇc¹©%§F .R£Ä Á £ Ρ¢ a¶ *
c
_ 5=1 /Z147E5
$
$
$
e
$
o
p
$
.=5S54INa4I=5S54=(94|9;$&= $&!'*0Q4b'*=+ 2 S5'*0_+3'*=(9I+32\$;-[0K $9I#'*= $
Û + 2T'*9I4&V 24?(0Q?%$&):):-}$&0Q0!?564¦+3254¦+Q#0Q0!40$&Q4i'*=5S54INa4I=5S54I=E+K2T25':9J2 Q4IY549I+Q0O+3254¦@$&9+ +32%$¨+P+Q254i9I#'*=³2%$&0H=(674I67&!- &@x+3254@-%Q0_+H+3&0Q0K$.=m&+Q254P'*=50_+J$&=(940IV 24iS54IQ',U#4 '*=5S(4Nd4=5S54I=594Ã"<-}U&4!':@F-['*=5>+Q2%$8+ Ï ¼ +`Òc® \Ï ¼ Ò \Ï +`Òc25#)*S(0K@`%#H4e~5$867N5):4&V ':= +3#0!0Q':=5>$7@$&':PS5':4&V ):4I+ ¼ ®ê°8ÛG² 5² [µÃ$&=5S):4I++ ®á° ²eÛ[²JÜ(² (µEKP^254= V 0¼ : + ® °8ÛG² (µEV ZÏ ¼ +`Ò³® Û " ß® \Ï ¼ Ò \Ï +`Ò|® Ï "&Û#Ò Ï«Û "&Ü#Ò$&=(SÓ0Q ¼ $&=5S + $&!4 '*=5S(4Nd4=5S54I=<+;K$.=}+Q25'*0c9;$80Q4&V 2\4¦S('*S5= +H$&0Q0!?5674+32%$¨+ ¼ $8=5ES + $&!4i':=5S54Nd4=(S54=<+ ':+ !?50_+T+3?5!=54S|#?(++Q2%$8+T+3254- 24Q48K #4 '*=5S(4Nd4=5S54I=594"<-)*[ E'*=5>¦$Ã+Q254/0Q4I+Q0T'*=|$ 4=5=³S5' $&>&3$&6|K E&<M< 5, · ¡£J£/¤I¤&§FºM¥J¡¨§F©mÕ[Ö¦ ª§FÅbJ£ ¶ À1 ¼ ® ¤& È*Q¤¨£ \¡¨© Q¤<» ¶ À ´ [ Á B¬%`¸&©% F¬(¤& «¤&§FÈ £`¡¥J¥Ä[ºJ£`¡¨©} B¬% ^ .¡£J£ ¶7· ¬%© \Ï ¼ Ò ® Ð \Ï ¼ Ò ® Ð \Ϫ$8)*) +J$8'*)*0QÒ ® Ð \ÏB´â.´aã ´â Ò ® Ð \ÏB´â!Ò â ZÏF´ ãÒ \ÏB´â Ò ?(0Q'*=(> '*=5S54INa4I=5S54I=594 ® Ð Û å å¾ E&<M< 5,& · \¡7Æ%J¡JÆdÈ* «¤ < ªÄ[ºe©(£ ºÇ¨§F©[Ô .¡£§F© ¤ Á ¤¨£ < Á ¤&ÈFÈ\§F©% .¡ ¤ © ¶ ºJ£¡¨©bÕH£Ä×¥J¥JJQ»¨£§F F¬PÆdº3¡ Á ¤ Á §FÈ §F ÇOÕ aÙ/x¬[§FÈ*1Æ%º3£¡¨©ÃØO£Ä×¥J¥JJQ»¨£§F F¬PÆdº3¡ Á ¤ Á §FÈ §F Ç Õ ¶ ¬(¤& T§*£¦ B¬%RÆdº3¡ Á ¤ Á §FÈ §F Ç B¬(¤& DÆ%º3£¡¨©
Õ£Ä×¥J¥JJQ»¨£ Á B;¡¨º3HÆ%º3£¡¨©Ø À Aé »[© ¡¨ ¢³ F¬%¸&©% á¢|§F©% ¢º3J£ ¶ À1 HN¼ [ Á ³ F¬%¸&©% M B¬(¤& ¦ F¬%M¹ºJ£ /£Ä×¥J¥JJ£J£³§*£ Á ÇÆ%º3£¡¨© Õ¤&©d»} F¬(¤& P§F /¡¥J¥ÄGº3£ ¡¨©
ºe§¤&È©%ÄGÅ Á º ^d¶ ¡¨ . B¬(¤& ¼ â² ¼ ã;²;å;åå ¤&º3 »&§*£ ;¡¨§F©% ¤&©d»b B¬(¤& ]é ® / [6 8aâ N¼ [ ¶ M© ¥JJË \Ï é ÒA® [ 68aâ Ï N¼ [ Òå ¡¨xË \Ï ¼ âQÒ® "8Ü%¶ ¼ ã ¡¥J¥ÄGº3£M§ /\/¬(¤&¸& B¬%/£QÉÄ%© ¥JiÕ Å`§*£J£J£JËAØÅ`§*£J£J£JË\Õ £Ä×¥J¥JJQ»¨£ ¶ · ¬[§*£¦¬(¤¨£OÆdº3¡ Á ¤ Á §FÈ §F Ç \Ï ¼ ãÒO® ÏªÛ "&Ü<ÒIϪZÜ "<ÒÏ "&Ü<ÒR® Ï "#Û#ÒIÏ "8Ü<Ò¶ !1¡¨È Ê
; 4
% a
#
$
$
?$
*
$
$
*
$
!$
$
$
$
$
$ $ $
$
$
$
!
!#"
'&
%$
)(
n
r
(
$
$
$
$
Û
È*¡¨§F©[Ô7 F¬[§*£/È*¡QÔ&§«¥\R£J/ F¬(¤& \Ï ¼N[ Ò®ÂÏ "#Û#Ò [ â Ï "&Ü<ÒI¶ M© ¥JJË 6 [ â 6 [ â Û \Ï é Ò® [ 8aâ Ü Û ® Ü [ 8aâ Û ® Ü å ºJ\OÄE£Q»b F¬(¤& dI¤[¥ A F¬(¤& BËA§ Ú F¬%© [6 8 [ ® "5Ï Ð Ò¶Z¾
$
r
$
r
[?56767$&_-&@ $.=5S54INa4I=5S54=(94 K ¼ $&=(S,+á$&!4':=5S54INa4I=5S54=<+c':@ \Ï ¼ +`Ò® \Ï ¼ Ò \Ï +`ÒeK Û[K$.=(S54Nd4=5S(4=59I4i'*0T0!#674+3':674I0$&0Q0!?564S³$&=5S|0Q#64I+Q'*640S54IQ':U&4S K ÜGK R':0 !#':=<+4IU&4=<+30 2T':+32Nd#0Q',+3',U#4ON5!#"%$&"5':)*',+.-b$8Q4/=5&+T':=5S54Nd4=(S54=<+;K <
$
?$
%
$
ID mg¦k  mg ;kI H0!0Q?56'*=5>R+Q2%$8+ \Ï +¦Ò\î Ú V 24TS54.-%=(4T+Q2549&=5S5':+Q'*#=5$&)GN(Q#"%$8"5'*):':+.-&@ ¼ #> ',U#4= +325$8++á25$&0TG9I9?5!Q4S³$&0@B#):)* 2T0K 5,;¯B \Ï +`Òî Ú B¬%© B¬% z#d T]FWBd1t%wMv]W ]t xwªF W ¡¢` ¼ Ô&§F¸&© + §*£ ZÏ ¼ L +`Ò® \ZÏ ¼Ï +`+`Ò Ò å ϪÛGK <Ò
!
#"
c
%$
_ 5=1 /Z147E5
$
&
E&
$
$
$
^ 25'*= b&@ \Ï ¼ L +`Ò\$80\+3254R@BQ$&9I+Q'*#=b&@+3'*640 ¼ G9I9?5!0T$&6#=5>i+Q25#0Q4O':= 2T25'*9J2 +Â[99I?5Q0IK$P4!4O$&!4R0Q#64P@$89I+30\$&"d#?(+Z9I#=5S5',+3':#=%$&)%N5!#"%$&"('*)*',+3':40K `%#Z$8=E-a-(~G4IS + 0Q?(9J2Ã+Q2%$8+ \Ï +`Ò\î Ú V \Ï L +¦Ò1'*0D$HN5Q&"%$&"5':)*':+.-/'ªK¿48Kx',+D03$¨+3'*0 -%40D+Q254\+Q25Q4I4T$¨~G'*#60 &@N(Q#"%$8"5'*):':+.-#K $.=N%$8!+3':9?5)*$&V \Ï ¼ L +`Ò Ú V Ï3{ L +`ÒT® $&=5Sm',@ ¼ â² ¼ ã;²;åå;å $8Q4 S5':0 !#':=E+a+3254I= \Ï / 9æ6 8aâ ¼ æ L +`Ò® 9æ6 8aâ ZÏ ¼ æ L +`ÒeK \?G+':+':0 '*=O>#4I=54Q$&)E×+QQ?54+Q2%$8+ \Ï ¼ L + / MÒO® \Ï ¼ L +`Ò \Ï ¼ L MÒKb^2(47!?5)*4I0M&@\N5Q&"%$&"5':)*':+.-}$&N5N():-m+Q³4U#4=<+Q0 $
$
O$
W$
$
$
6j
$
R$
$
$
Û #=}+Q254¦)*4@F+H8@D+3254i"%$&K$.=>#4I=54Q$&)]',+H':0R%¦+3254M9;$&0!4i+325$8+ \Ï ¼ L +`Ò\® \Ï +,L ¼ ÒeK ]4I#N5)*4M>#4I+P+325':0P9I#=(@B?50!4S$8)*)+3254+3':6748K`×&c4e~($&67N()*4&V×+32(4iN5!#"%$&"5':)*',+.-&@0!Na8+30 >#':U&4=-##?|2%$;U#464;$&0!)*40':0 "(?(++3254/N5!#"%$&"('*)*',+.-7+Q2%$8+-?}2%$;U&464;$80Q)*4I0T>#':U&4= +32%$¨+A-?b25$U&4P0!Na8+30A'*0=5&+ K $.=+Q25'*09;$80Q4&VE+32(4cS('SPa4Q4I=594c"d4I+ 244= ZÏ ¼ L +`Ò$&=5S \Ï +,L ¼ Ò1':0D#"#2b'*=):4><$&)d9;$80Q40+32%$8+Z':+'*0\0!#64I+3':674I09$&)*):4Sb+Q254HN(Q#0!49?G+3# ¿0 @$&)*)*$&9I-&K
Z
6$
$
$
$
E &<M< 5,;Þ ÅbQ»&§«¥Q¤&È .J£ x;¡¨º ¤ »&§*£Q¤¨£ ¬(¤¨£¡¨ÄG .¥J¡¨ÅbJ£ &¤ ©d» Ц¶ · ¬% Ædº3¡ Á ¤ Á §FÈ §F ª§«J£M¤&º3 ]e
g
¶ Ö<Ö EÕ ¶ Ö [Ö<Ö Ð ¶ Ö<Ö [Ö ¶ [Ö&Õ[Ö ! º3¡¨Å B¬%|»[B¹©%§F ª§«¡¨©Ó¡¢³¥J¡¨©d»&§F §«¡¨©d¤&ÈÆdº3¡ Á ¤ Á §FÈ §F ÇWË Ï L Òî \Ï ¦² Ò " Ï Ò® å Ú&Ú "5Ï¢å Ú&Ú å Ú&Ú#Ú Ò1® å ¤&©d» \Ï_aÐ L Ò® \Ï¢ÐM² Ò " \Ï Ò®½å #Ú Ú "5Ï_å #Ú Ú å Ú #Ú#Ú Ò å ¶ ZÆ<Æ(¤&º3©% È ÇWË B¬% .J£ §*£`I¤&§FºeÈ Ç ¤[¥J¥Ä[º!¤& ¢ ¶ §«¥ mÆ%J¡JÆdÈ*Ǩ§«È,»¯¤ Æ%¡£§F ª§F¸& [Ö`Æ%ºJ¥J©% R¡¢` F¬% §FÅbb¤&©d»¬%Q¤&È B¬[Ç`Æ%J¡JÆdÈ*`Ǩ§«È,»¤³© «Ô<¤& §F¸&7¤ Á ¡¨Ä[ [Ö Æ%º3¥J©% A¡¢T F¬%H §FÅb ¶ ÄÆ<Æ%¡£cÇ<¡¨Ä`Ô[¡A;¡¨ºP¤` .J£ ¤&©d»OÔ[ D¤PÆ%¡£§F ª§F¸& ¶ ¬(¤& ]§*£P B¬% Ædº3¡ Á ¤ Á §FÈ §F ªÇRÇ<¡¨Äi¬(¤&¸&\ B¬%T»&§*£Q¤¨£ ¡£ (Æ%J¡JÆdÈ*Z¤&©(£\º ¶ [Ö ¶¦· ¬%T¥J¡¨ºº3J¥ 1¤&©(£\º §*£ \Ï L /Ò® \Ï ¦² Ò " Ï /Ò®½å Ú#Ú "5Ï_å Ú#Ú må Ú &Ú#Ú Òx®å Ú å· ¬%TÈ*J£J£¡¨©¬%º3T§*£ B¬(¤& ]ÇE¡¨Ä7© JQ»M .¡¥J¡¨ÅHÆdÄ[ .c B¬%R¤&©(£\º©%Ä[Åbº§«¥Q¤&ÈFÈ Ç ¶ ¡¨© ] ºeÄE£ ]ÇE¡¨Ä[ºP§F©% Ä[§F ª§«¡¨© ¶ ¾ $ @ ¼ $&=5' S +á$&Q4/'*=(S54Nd4=5S(4=<+c4IU#4I=<+30P+Q254= \Ï ¼ L +`Ò® ZÏ ¼Ï +`+`Ò Ò ® ZÏ ¼ZÏÒ +`ZÏÒ +`Ò ® \Ï ¼ Òå + S5G4I0Q= +9J2%$8=5>#4 +3254/N5!#"%$&"('*)*',+.-b8@ ¼ K `×!#6 +Q254 S5.4 -5=5':+Q'*#=Ó&@9I#=5S5',+3'*&=%$&)HN(Q#"%$8"5'*):':+.- 2\4 9$&= 2T!':+Q4 ZÏ ¼ +`Ò® \Ï ¼ L +`Ò \Ï +`Òc$8=5S$&):0Q ZÏ ¼ +`Ò\® \Ï +'L ¼ Ò Ï ¼ ÒK @F+34= Vd+Q2540!4i@B#Q6i?5) $&4/>#',U#4?50 $`9#=
$
$
$
$
M$
$
&
$
$
M$
$
$
$
$
$
$
$
$
$
?$
e
M$
$
$
\$
$
ÜÚ E&<< /5 º!¤& ª\¡i¥Q¤&ºQ»¨£ eºJ¡¨Å ¤¦»[J¥ Ë1§F B¬%¡¨Ä[ 1ºJ.ÆdÈ,¤[¥JÅb©% ¶ À a¼ Á T B¬% ¸&©% F¬(¤& 1 F¬%¹º3£ ]»&º!¤&§*£ H¥JH¡¢DÈ Ä Á £c¤&©d»/È* + Á T F¬%H¸&©% F¬(¤& 1 F¬%Z£J¥J¡¨©d» »&º!¤&§*£ ZÄ×J©¡¢ §¤&Åb¡¨©d»¨£ ¶M· ¬%© \Ï ¼ ² +`Ò® ZÏ ¼ Ò ZÏ +,L ¼ Ò®Ï " #Û#Ò Ï " Ò¶ ¾
e
$
$
$
[?56767$&_-&@ \#=5S5',+3':#=%$&) !#"%$&"('*)*',+. K$ @ Ï +`ÒZî Ú +Q254= \Ï ¼ L +¦Ò® \Ï ¼Ï +`+¦Ò Ò å Û[K \Ï L +¦Ò0Q$8+3':0 -%4I0]+32(4Z$¨~G'*&6701&@%N5!#"%$&"('*)*',+.-#V@B#-(~G4Sa+bK $.=¦>#4I=54Q$&)«V ZÏ ¼ L Ò S(G4I0c=(&+c03$8+Q'*0!@F-7+3254$¨~G':#6708@xN5!#"%$&"('*)*',+.-#VE@B#N-(~G4S ¼ K ÜGK$.=|>#4I=54Q$&)«V \Ï ¼ L +¦Ò ® ] \Ï +,L ¼ ÒK (K ¼ $&=(,S +á$&!4':=5S54INa4I=5S54=<+c':@x$&=5S|#=5),-',@ \Ï ¼ L +¦Ò® \Ï +`ÒeK <
$
$
$
2$
$
M$
E$
$
$
$
lOn mlRlOh Z$;-#4I0 <+3254I#Q4I6'*0D$/?(0Q4I@B?()%Q40!?5):++Q2%$8+':0x+3254T"%$80Q'*08@N&_4~GNa4I!+A0!-[0_+3460 (M$&=5S & Z$;-&40 ×=54+30I)K (C`x'*!0!+;V 2\4/=54I4Sm$`N5!4):'*6'*=%$&_-Q40!?5):+K ¨ 8 < À c¼ â²;å;å;åI² ¼ Á ³¤Æ(¤&ºe §F ª§«¡¨© ¡¢ {/¶7· ¬%©(Ë%;¡¨ºM¤&©%Ç ¸&©% + Ë \Ï +`Ò® 9æ 8aâ Ï +,L ¼ æBÒ \Ï ¼ æÒI¶ K
g
6"
7 F
l
fi
Q:
K
7OI
7O/
F 7EA
+
$
A=1 1 / ;
$
?$
R4.-5=54 [ ® + N¼ [ $&=5S =5&+Q4H+325$8+ Pâe²å;å;å² R$&!4RS5'*0 !#'*=<+\$&=5S+Q2%$8+ ® / [ 8aâ [ $K c4=59I4&V \Ï +¦ÒA® [ ZÏ [ Ò ® [ \Ï + ¼[ Ò® [ \Ï +'L ¼N[ Ò ZÏ ¼N[ Ò
!#"
$!%
$
r
$
r
$
r
$
$
Ü 0Q':=594 ZÏ + ¼N[ Ò ® \Ï +'L ¼N[ Ò ZÏ ¼N[ Ò@B!#6+32(4HS54 -%=5':+Q'*#=&@9I#=5S5',+3':#=%$&)%N5!#"%$&"('*)*',+.-#K ¾
H > ; ,
$
$
$
8 I ¨ 1À (¼ âe²;åå;å² ¼ OÁ \¤Æ(¤&ºe ª§F §« ¡¨©¡¢ { £ Ä×¥J¬M F¬(¤& \Ï ¼ æFÒZî Ú ;¨¡ º¦Q¤[¥J¬ í3¶ B \Ï +¦Òî Ú F ¬%©(Ë×;¡¨ºiQ¤[¥J¬ í]® ²;å;ååI² Ë \Ï ¼ æ L +¦ÒA® \[ Ï\+'Ï +'L ¼ L N¼æFÒ [ Ò Ï \¼ Ï æN¼ Ò [ Ò å «Ï Û[K #Ò 7 F
K
l fi
<;
3
$
7 F
K
$
$
$
?$
$
?$
a¨R 5 ) m¥Q¤&ÈFÈ \Ï ¼ æBÒ B¬% v]¨Bdv]W ]t xwªF W½ ¼ ¤&©d» \Ï ¼ æ L +`Ò B¬% vdsIyE¨×bv1W ]t ]wªF W
¼ ¶ L4\$&N5N5),-R+32(4\S54 -%=5':+Q'*#=&@(9#=5S(':+3':#=%$&)
$
$
&
!#"
)&
&
E&
,
$
$
$
$
$
$
$
$
$
$
$
$
$
$
g
$
?$
$
$
(
g
&
$
$
k 7gij ; lRh g¦ n ^2546b$8+Q4!' $&)A'*= +Q25'*0`9J2%$8N(+34I`'*0`0!+3$&=5S%$8QS K R4I+J$8'*)*0M9;$&= "d4@B#?5=5S ':= $&=<=E?56¦"d4&@1"dG E0IK Z++3254O':=E+QQ[S5?59+3#_-b):4IU#4I)«V[+3254IQ4O'*0 R4 Q[&+$&=(S'
!
%
%
%
Ü<Û Ï«Û Ú#Ú Û#ÒeV$8+O+3254Ã':=E+Q4!674IS5' $8+Q4¦)*4U#4I)«V !'*6674+Q+R$&=5SC<[+3': ;$&47Ï Û#ÒP$&=5S$8Q Ï Ü<ÒeV×$&=5Sm$¨+P+Q254i$8S(U8$&=594ISm)*4U#4I)«V \'*):)*':=5>#0Q):4I-}Ï Ò$&=5S \Q4I'*67$&=Ï ÒK$ $&S%$8N(+34IS 67$&=<-4e~5$867N5):40/$&=(S N(Q#"5):460R@BQ#6 R4 Q[&+O$8=5S <[9J254_UG':0Q2¯Ï«Û Ú#Ú Û#Ò $&=5S Q':6764I+!+$&=5S <[+Q'* '$ &4IMÏ Û#ÒK lO cgik jjlO 4I=54Q$&)*),-#V',+'*0=5&+@B4$&0Q':"5)*4x+Qc$&0Q0!'*>#=N5Q&"%$&"5':)*':+Q'*4I0a+3c$&):)&0!?5"50Q4+30]&@5$0Q$&67N()*4 0QN5$&94T{RK $.=(0!+34$&S V<#=(4Q40_+3!'*9I+Q0$8+!+34=<+Q'*#=¦+QO$R0Q4+&@d4IU#4I=<+309;$&):)*4IS`$ Qt%w ×y 1¨t #T$ xy[w TC2T25'*9J2|'*0T$`9I) $&0!0 ê+325$8+c03$8+Q'*0 -540ë Ï' NÒ * ç V Ï':'FÒ',@ ¼ â² ¼ ã²;å;å;åI²; ç á+3254I= / 9æ6 8aâ ¼ æ1 ç $&=5S Ï':'*'FÒ ¼ ç ':67N()*'*4I0+Q2%$8+ ¼ ç K ^25470Q4I+Q0¦'*= $&!470Q$&'*S +3}"d4uy[t5s;1Wt ]wBydK L4b9$&)*)PϪ{R² Ò$³uyEt%s;¨t xwBy sv1t%z#y× $ @ '*0Z$¦N5Q#"5$&"5'*):':+.-64;$&0!?5Q4OS54 -%=54IS³#= +Q254=Ϫ{R² ² iÒ'*0Z9$&)*):4S$ v1¨ 1t ]ªwB W sv1t%z#y×iLì254= { ':0c+Q254¦Q4$&)1)*'*=(4&V 2\4i+3'$ &4 +3"d4M+Q254¦0Q67$&):)*40_+ -%4):Sb+32%$¨+9#=<+3$&'*=50T$8)*)×+3254R&Na4I=|0!?5"50!4I+30IV 2T25'*9J2':0Z9;$&):)*4ISb+325 4 ¦×Wy[w ]y[w TK p ³TlOPnxlOn K `x':)*)¨'*=O+32(4AS(4I+J$8'*)*0 &@G+Q254N5Q[&@G&@G^254#!46½ÛGK K H):0Q5V¨N5!WU&4+32546#=5&+Q#=54 S54I9!4;$&0!'*=5>9;$&0!4&K ÛGKcQU&4+Q254/0!+J$¨+3464=<+Q0c':=4CE?%$8+Q'*#=Ï«Û[K ÒeK Ü(K B4I+M{Â"d4$|0Q$&6N5)*4`0!N%$&94b$&=5S )*4+ ¼ âe² ¼ ã;²;å;å;åI²"a44IU&4=<+30IK R.4 -%=(4 + ³® / 9æ6 8 ¼ æx$8=5S /® : 9æ6 8 ¼ æ«K Ï$<Ò + +â I +Hã I $&=5S+32%$¨+ Pâ G +Hã G K ÏB"×Ò #07+Q $&=Ó'*= -%=5':+Q4 =E?56¦"d4T&@1+32(44U#4I=E+Q0 ¼ âe² ¼ ã²;å;å;å,K ÏB9KÒ #0O+3|$&):)x+3254Ã4U#4=<+Q0 ¼ âe² ¼ ã²;å;å;å54e~G94NG+HNd#0Q0!'*"5),- %$ -%=5',+34O=E?56¦"d4T&@1+325&0Q4/4IU&4=<+30IK 5KB4I+P° ¼ æ1ë¦íç %µ/"d4R$i9I#)*):49I+Q'*#=&@4U#4I=E+Q0 2T254!4 Ã'*0$&= $&!"5':+Q3$&_-Ã'*=5S54e~ *
%
^ '
f
&
^ '
)&
E&
*
&
*
)&
^
^ '
%
E
E
Z M (
Ü#Ü
0Q4+;K
A ¼ æ $&=5S A ¼ æ ® 7 ¼ æ ¼ æ ® æ æ æ æ P':=<+;ë `x':Q0!+TN(QU#4/+325':0Z@B# `®° ²;å;å;å² µEK GK>#4+4~($&9I+Q):-m+ 2}2(4;$&S50IK R4I0Q9!'*"d4 +32(4i03$867N5):4/0QN%$&9I4 ZKZLì2%$8+H'*0T+3254N5!#"%$&"5':)*',+.-b+325$8+H4e~($&9I+Q):- +Q#0Q0!40O$&!4 Q4ICE?5'*!4S (KB4I+H{ή ° Ú ² ²;å;å;å²µEK!WU&4i+325$8+P+Q254!4iS5[40P=5&+P4~G'*0_+R$?5=5',@B#Q6 S5':0!+QQ' "5?(+Q'*#=#= {'«K 4&K³':@ \Ï ¼ Òi® \Ï +¦KÒ 2T254I=54IU&4E L ¼ L® L +,Ld+3254I= Î9;$&=5=(&+ 03$¨+3'*0_@F-+32(4M$W~(':#60Z&@]N5Q#"5$&"5'*):':+.-&K K B4I+ ¼ âJ² ¼ ã²;åå;å("d4/4IU#4I=<+30NK + / 6 8aâ ¼ M® / 6 8aâ + GK K
> A 6 ¼ æ ® å æ98aâ 7
o
p
o
p
%
*
k$
$
o
$
$
r
p
%
$
$
$]o
p
K`%#N-(~G4S +á0!?59J2}+Q2%$8+ \Ï +`ÒZî Ú V%0Q25 2 +Q2%$8+ \Ï L +`Ò\03$¨+3'*0 -%40T+32(4M$W~(':#60 &@]N5Q&"%$&"5':)*':+.-&K Ú K A#? 25$U&4 N5Q&"%$&"5),- 254;$&!S
':+M"a4@B#Q48K P 2 -?
9;$&=
0!#):U&4b',+i!'*>##!#?50!):-#K $ +`':0i9;$8)*)*4IS+Q254 & ³&=E+.- H$&):)A!#"5):46|K)( N(Q' I4b':0iN5)*$&94IS $8+¦Q$&=5S5#6 "d4I+ 244= #=54&@c+325!44S5[#Q0IK A&? N('*9
$ S5[#K
^1"d49#=(9Q4+348V\)*4+ 0 0Q?(N5Na&0Q4Ã-##?$&S) 2\$;-[0N('*9 mS5[# K P 2 ³&=E+.- P$&)*)]9J25[#0Q4I0M#=(48@+Q254 &+Q254D+ 2MS(G#!0V<#Nd4=(0':+$&=(S0!25 2T0A-?`+32%$¨+':+':046N(+.-#Kc4Z+3254I=7>&':U#4I0 -#&?+Q254P&N5Na&!+3?(=5':+.-M+3&44IN7-?5S5[##0 2T',+39J2Ã+3/+3254T&+Q254?5=5&Na4I=54S $
2$
,
,
Ü'
S5[#IK &>#40_+30c':+S5[40!= +P67$8+!+34IK ^254/9I#Q!49I+H$&=50 2\4IP':0Z+32%$¨+c-?}0!25#?5):S0 2T':+Q9J2 K!U#4i',+;KF$ +2T'*):) 254I)*N+3 0!Na4I9':@F-b+3254/03$867N5):4/0QN%$&9I4i$8=5S|+Q254/Q4I)*4IU8$&=<+c4IU&4=<+30H9;$&!4I@B?5):):-#K^2E?502TQ',+34 { ®°GÏFâe²_xãJÒë¦1æ1ç° ²eÛ[²JÜGµ<µ 2T254IQ4OAâ':0 2T254!4+Q254ON5!' 4O':0T$&=5S]ãT'*0 +Q254/S5G& ³#=<+.- #Nd4=50IK K>&Q44I= #="d&+Q2 0!'*S54I0V+32(4`0Q49I#=5S'*0 !4S
#=
"d&+32
0!'*S54I0Ã$&=5S +Q254b+32('*QS
':0i>#!44I= &=
#=540!'*S54 $8=5S Q4S
&= +3254 &+Q254IKiL49J2(G#0!4$9$&QS$8+OQ$&=5S5&6 $&=5ES 240!44#=54`0!'*S54³Ïª$8)*0Q 9J2(#0Q4I= $8+M3$8=5S5#6bÒ\K $ @+Q254b0!'*S5a4 2\4b0Q447'*0>#Q4I4= V 2T2%$8+¦':0+Q254bN(Q#"%$8"5'*):':+.-m+Q2%$8+ +Q254T&+3254I0Q'*S(4c':0$&)*0!/>#Q4I4= ³$&=<-`Na4I#N5):4c':=<+3?5',+3':U&4),-¦$&=(0 24 ÛG K <[25 2 +Q2%$8++32(49I#Q!49+H$8=50 24c'*0cÛ Ü(K Ü(>K #40_+Ã9J25'*):S2%$&0i"5)*?5474I-&40 V 2T2%$8+i'*0+3254 N5!#"%$&"('*)*',+.-Ã+32%$8+P$8+T):4;$&0_++325!44/9J25':)*S5!4=2%$;U#4"5)*?(44-#4I0 (>K
,
,
%
$
$
$
*
$
$
$
$
Ü K#4I++3254cUG':Q?50\$&=5S Ú Nd4Q9I4=<+&@+Q254 B':=[?[~b?(0Q4!0Z>#4+\+32(4PU['*!?50KDL4 0Q4I)*4I9I+T$¦Nd4!0Q#=$8+\3$&=(S5#6$&=5S )*4$&Q=7+32%$8+Z254Z0_-[0!+34I6 2\$&0Z'*=G@B49I+Q4S 2T',+32 +32(4/U[':Q?50IKLì25$8+T'*0Z+Q254/N5Q#"5$&"5'*):':+.-+32%$¨+T0Q254'*0T$¦Lì'*=(S5 2T0c?(0Q4 K "d~ 9#=<+J$8'*=50 ¦9&'*=50T$&=(S|4$&9J2|2%$&0c$`S5S' Pa4!4=<+TN5Q&"%$&"5':)*':+.-&@]0Q2( 2T':=5> 254$&S50K B4I+ âJ²;å;å;åI² S54I=5&+34m+Q254mN5!#"%$&"('*)*',+.- &@/254;$8S50 #=Ó4;$89J2Î9I#'*= K <$8'*= KHLì25$8+O'*0c+3254`N(Q#"%$8"5'*):':+.-&@A$&=(&+3254IR254$&S $.= &+Q254 2#QS50 -%=(S \Ϫ±`ã L ±â¢Ò 2T254!4i± [ ® &!254;$8S50T#=|+Q#0Q40 ^%5K ( P 2Ó0!?5N5Nd#0Q4T+325$8++3254T4~GNd4!'*64=<+ 2\$&09;$8QQ':4S#?(+$&0@B#):)* 2T0K1L4T0Q4I)*4I9I+ $`9#':=|$8+T3$&=(S5#6 $&=(S|+Q#0Q0T':+T?(=E+Q'*)$`254$&S|'*0#"(+3$&'*=(4S K Ï9;Ò `x'*=5S \Ï \æ L + Ò 2T254!34 + ® &-%Q0_+P2(4;$&S|'*0#"G+J$&':=54S#=+3#0!0 5K)( Û Ú KÏ E[& P eK Ò 254$&S50 K $ @ 2\4OY%':N+3254O9I#'*= 67$&=<-b+Q'*640IV 2K4 2#?5):S 4~GNd49I+T+Q254ON5Q&Na#_+3':#= &@254$&S50\+Q¦"d4R=54;$8 KxL4 2T'*):)×6b'$ 84H+32('*0@B&Q67$&)×) $8+Q4Kx^x'$ 84 ® å Üi$&=5S ® ;Ú#Ú#Ú $8=5Sm0Q':6¦?5)*$8+34 9I#'*=|Y%'*N(0KR):&+P+Q254iN(Q#Nd#!+Q'*#=³&@254$&S50O$80R$ @B?5=59+3':#=&@ K #c4INa4$8+@B#b® å Ú Ü(K Û KÏ <[# P eK %Ò
Z M (
$
$
$
$
$
,
!*
,
$
$
$
+7
P/ 9F
9F 1
5/
+7
P/ 9F
9F 1
5/
Ü
*
$ "5':=5#6' $&)TQ$&=5S5#6 U8$&Q'*$&"5)*4'2T25'*9J2ì':0S5'*0!9?50!0Q4ISX'*=¯+Q254=54e~G+9J25$&N(+34IK $.=<+3?(':+3':#=0!?5>#>#4I0!+Q0¦+32%$8+ 2T'*):)"d49):#0Q4+3 K}^]0!44 ':@\+325':0¦'*0+3Q?(4&V 24¦9;$8=Q4Nd4;$¨+R+325':0H4e~(Nd4!'*64=<+O67$&=<-³+Q'*640H$&=5S$;U#4I3$&>&4¦+3254 U8$&):?540IK Z$&!!-Ã#?G+Z$0Q'*6i?5) $8+Q'*#=Ã$8=5Sb9I#67N5$&Q4T+3254H$;U#4Q$&>#4P&@ +Q254 0+3 KD^]_+Q25'*0Z@B#®½å¿ÜÃ$&=5S ® Ú ² ;Ú#Ú ² Ú&Ú#Ú K Û#ÛGKMÏ HE[& H eK Ò P4IQ4 24 2T':)*)>#4+¦0Q&6744~GNd4Q':4=59I40Q':6¦?5)*$8+3':=5> 9I#=5S5',+3'*&=%$&)dN5Q#"5$&"5'*):':+Q'*40IK \&=50Q':S54+3&0Q0Q':=5>Ã$¦@$&'*S5'*48 K B4I+ ¼ ®°8ÛG² 5² Gµ $&=(?S + ®° ²eÛ[²JÜ(² (µEK^254I= V \Ï ¼ Ò® "#Û[V \Ï +`Ò®XÛ "&Üc$&=5S \Ï ¼ +`Ò® "8Ü(K
P/ 9F
#7
9F 1
5/
*
O$
$
$
9$
$
?$
*
*
*
*
*
*
*
*
*
#! "%$'&'()+*+,-$'./(0" 132546287:9;2<7:=>9%4@?BACAD46254EF7G?H7:?HI4@JLKM=/NO?B=/K/JL?HK/ACPQ7R2<SCAD462<4BTVUWNXPYAHNPZK+[G7:?H\ 9<4@E0]H[GK^9L]D4@=>K/9_46?HA`KbaOK/?c289d2A`icjk2<SBKh=>NO?H=/K>]B2_N6l'4 J<4@?BAHNOEa64@JL7m4@iH[GK@T n'o>prqOs t<s uBqwvHxzy|{~}XDr
BH}BV:0bcr3+ ^ B@ 5bR@B^<8@ ¡D¢30£bb¤¥¦Z§¨ª©k83«5`©¢¬«5©-¦d®
¯°o/±²cqOs³±µ´c¶·¶ ¸/¹º´¼»m´½qµ¾ Ï 2Q4Ð=/K>JL2<4@7:?%]rNO7G?½2Z7:?%E0NO9;2¤]HJLNOiD4@iH7G[:7R2jw=/NOÑBJ898K>9/ÒB28SHK_984@EF]B[:Kd9L]D4@=>K_7:9ÓJ<4@JLK/[Gj ¿cÄOuBokÀ Àdo/Á´´ÇÈ»ÂÆcsôc»´cÄcÄc¶ oŶ o@x%ÀhÉBÆXo>Ǫo t EFK>?c2<7:N@?HK/AÔ4@?BAÔPÓK-PZN@J8\ÐAB7:J8K>=>28[Gj^PQ7G2<SwJ84@?HAHN@Ea64@J87:4@iH[GK/9/TÕZÑB2'j@NOÑÔ98SHN@ÑH[:Aw\6K/K/] t5²Xo tªo/±²cqOs³±/´½¶´½Êµ¾ 7:?E07:?HA%28SD462Q2<SHKÖ984@EF]B[:K_9L]D4@=/K^7:9-JLKµ4@[G[Gj02<SHK>J8K6Òr[GÑHJ8\½7G?HIw7:?k2<SBK^iH4@=5\½IOJ8NOÑB?HA°T ÊOo6qX¿Bs ËÍÌuc»¿co>t;´Osζ Ç/x ×½Ë@´cÀhÊc¶ o^vHxÙØÛÚÜ Ý·ÞM«5©ßb5/®Íà#bá¥m¦Z§Í£bÐDÐD¢¬0£bbÔ©ªâ^D8cãÐßD 8ä/¢åbæ«5Z¦d®QÚ©ÐLç½@dr :5è' âZ¦êéìëMëîíïëMëîíïëMëîíQíDbF¥¦Z§'éñðH®¤ò ×½Ë@´cÀhÊc¶ o^vHxÙvÛà#brÛéôó¬¥õ#ö<÷B§bø'õåùcúF÷3ùQûVü@ýÔ£b¤D¤¢¬Dã@: «/®ÖþÓ©Bbã3b-ã@L@ÿ'3 _D©DÂ@'8@rã3©#âÈ<©Ö®Mdÿ' r0 c_3:_ã38w©<Ür55«b: d R@b/® { 5r «8@ D©¢¬«5©Ó:頻â Dæ⵩ÈÛ¦êé ¥õ#ö<÷B§/® °©QLç½@dr :5-©ªâ 8@rã3© @Èc£/ :5 @<Q¥¦Z§áé õ°è %¥¦Z§'é ÷åè Ð¥¦Z§'é õwú÷Dè ¥¦Z§áé õ ù úê÷ ù ®-ò Ö7RaOK/?^4-J<46?HAHNOE a64@J87:4@iH[:K 4@?HA^4¤98ÑHiB98K>2 ÞN@l32<SBK'J8Kµ46[O[:7G?HK@ÒµABK D?HK b¥ §Üé óµ"¦ !MìÔ¥¦Z§ ! ý 4@?BA [GK>2 #%$
#
!
"
¥ ! §~é # ¥ ¥ §8§Üé # ¥óµ¦"!+_øQ¥m¦Z§ ! ý6§ # ¥é õ §~é # ¥ ¥õ §L§Üé # ¥ªóµ¦ !+_øQ¥¦Z§'éCõ¡ý6§%$ AHK/?BN@29Q28SHKÖJ<4@?BAHNOEa64@JL7m4@iH[GK¨4@?HA õ AHK/?HN62
½× Ë@´cÀ^Êc¶ oÖvHx'& ÚÜ Ý·ô «5©ñÿ' «5@rã :b¨ £b+DßD¢¬0£bb©ªâîD8cã/®)(°DbBè # ¥ é+*c§-é # ¥ó íQí^ý6§Zé ü-,/.¬è # ¥ éÅüX§¤é # ¥ªóëîíQöLídëßý6§ZéÅü0,1+@rã # ¥ é 1O§Mé # ¥ªóëMëßý6§`é ü-,2.å®3(°DML@rã3© @bc£/ :M@rãÞgã@:bÂÈ£/¢3 ©ô«8@ £b b¢3w0@b5 48ãFÜ⵩ :©ÿ76 ¦ # ¥ªóµ¦Qý6§ ¥m¦Z§ õ # ¥ éCõ § (8( 9%:; < < -ü ,2. (>= 9%:; 9 9 -ü ,1 =?( 9%:; 9 -ü ,2. @ == 9%:; @ (H Ð3bæb8@ Ý5 4/3Ô3:^B © ADCÓ·D/®¤ò FE
I. H$á& .7J+*`$'./(0" K *+"+,-$'./(0"LHNMÐ"+) O &'(?JPMQJ+.IRµ.>$TS K *+"+, $'./(0"LH Ö7RaOK>?g4%J<4@?HABNOEYa64@JL7m4@iB[:K ÒæPZKwABKD?HKÔ46?+7:E0] NOJ;254@?c2WlÑH?H=b2<7:N@?+=µ4@[G[:K/Aî2<SHK =/ÑBEÍÑH[m42<7Ga@KFAB7:9L28J87GiHÑB2<7GNO?glÑH?H=>287:NO? ¥ NOJhAH7:9;2<JL7:iHÑB287:NO?glÑH?H=b2<7GNO? § 7:?28SHKFlN@[:[:N PQ7G?HI PZ4 j@T n'obp qOs t<s³uBqÍvHWx VX(°DZ Y\[] [_ ^X
_a `b^ }c [d^X f eg[Z Yh^X j i>kZlnmZo Oì p *Böµ7ü qQ©ªâ^L@rã3© @Èc£/ :- :Ðã3s rÜæ8ã%£ mZoÖ¥õ §Üé # ¥ ûÛõ %§ $ G
U
_ "! 8 Z%8Q Z%"B d7 Q Z%"
¨¥õ §
mZo
T$
ü
T1
#
õ ü 1 7GIOÑHJ8K # T ü@ i>kl lNOJ D7:]B]H7:?HIÔ4Ô=/N@7:?k2PQ7:=>K ¥ 46EF]H[GK # T ð T § NOÑ`E07:IOSc2QPÓNO?HAHK>JïPQScjîPÓKÐirN@28SHK/JQ2kZl T NOÑîPQ7G[:[¡98K>KÐ[:462J 2<SD42W28SHK i>kZl 7:9Q4ÔÑH9LK>lÑH[°lÑH?B=>2<7GNO? 7R2QK K/=>287Ga@K/[Gj%=/NO?c25467:?H9W4@[G[ 2<SBK^7G?BlNOJLE46287:NO? 4@irNOÑB2-2<SBK^J84@?HAHN@Ea64@J87:4@iH[:K6T ×½Ë@´cÀhÊc¶ o^vHx ÛÚÜ Ý·ÞÖâ>@Ô«5©ÿ' «50@rãk :bá £bwDÍD¢30£bbF©ªâÖD8cã/® (°Db # ¥é *c§'é # ¥ é 1O§'é-ü ,2.0@rã # ¥ é ü § éô-ü ,1H® (°D¤ã@:bÂÈ£/¢3 ©QâÈ¢3æ«b © : * õ * 0ü ,2. *ÔûCõ ñü mZo_¥õ §Üé # ,2. ü^ûCõ 1 ü õ 1$ (°D? i>kZl :%5D©ÿ' ÚÜR@¢¬< ½® 9c®{d ÝD©¢@ê3:MLç½@dr :k:bdr :5è¨b¢Bã _«8@5âÈ¢3 @® iTkl %«8@º£b @b M«5©µâÈ¢½b3c® ^©Â «5wB@QD¤âÈ¢3æ«b ©º:ÍÈR3 «5©DÂD¢D©¢c5èÖæ© ;ã35«b<8b3+@rã+B@¨_:kã3s rÜæ8ãÐ⵩k@ #õ @bD©¢@|D L@rã3© @Èc£/ :Í©D F c5 @ Ý¢D5 *Böµü%@rã 1H® Í© c©¢ 5^ÿ mF¥;ü $W.½§'é $ $ e ò f-SHKwlNO[:[GNXPQ7G?HIJLK/9LÑH[G2/ÒæPQSH7:=5S+PÓKÔAHNk?HN@2¨]BJ8N aOK6Ò9LSHN PQ9¨2<SD462¨28SHK i>kZl =>NOE ]H[:Kb2[Gj%ABK>2J8E07:?HK>9-2<SHKÖAH7G9L2<JL7:iHѬ2<7:N@?kN@lá4wJ84@?HAHNOEa@46J87m46iH[:K6T ¯W²Xo>uc»mo6À Và#b' B @ i>kl m~@rã :b B @B i>kZl F® â mF¥õ §ïé ¥õ § ⵩h@ BõDb # ¥ ! §'é # ¥ ! §Ü⵩^@ ® ¯W²Xo>uc»mo6À V{âÈ¢3æ«b © m0bcr3ÍD_<8@ å Ýæ_© p *Bö/7ü qá:¨ iTkl¼âµ©W © r<©@£5c£/ Ý Í8b¢3< # â¨@rãF©D Í âï/@:s r¤5dD⵩ :©ÿ'3w3<5h«5©rã@ ©B7 6 *
!
#"
%$
'&
(
)
+*-,/.
+*-,65
0
%1
20
¯°o/±²cqOs³±µ´c¶·¶ ¸/¹ 'oÍuBqc¶ ¸ ²´XÁXoVt5²´ t # ¥ ! §ôé # ¥ ! § Ìu½»½obÁXo6»G¸dÀïoµ´XÇbÆc»m´½Äc¶ o o>ÁXo6q t x 43
.\*
!
"
¼:^æ© ;ã35«b58b3w®/®Óõ õ ù dr Ý 5¨B@mF¥õ §¤û mF¥õ ù §>® :^æ©b0@ Ý5 48\ã 6 [:7GE mF¥õ §Üé *+@rã [:7GE m¥õ § éôü½® :ÖÈR3 8«5©DD¢å©¢c5è ®/® mF¥õ §Üé mF¥õ#§á⵩h@ Hõ°è ÿDb< E mF¥÷B§%$ mF¥õ § é [:7: T #Kb2ÖÑH9Ö98SHN P2<SD462 ¥ 7:7G7 § SHNO[:AB9/T¡Kb2 õ l 1¬ÑH]H]rNO98Kw28SD462 m 7G9Ö4 i>kZl irKÍ4FJLKµ4@[?½ÑHEÍirK/Jd4@?HA`[GK>2 ÷ ö8÷ ù Iö $7$I$ i KÐ4F98K ½ÑHK/?H=>KwN@l'JLKµ4@[?½ÑHEÍirK/J89d98ÑH=5SM28SD462 ÷ ÷ ù 4@?HA[G7:E ! ÷ ! éCõ "T #Kb2 #! é %¥ $&ñö<÷ ! q 46?HAî[GK>2 é %¥ $&ñöLõ q (T 'WN@2=µ4@ÑB98Kh28SHKÖK>a@K/?c2<9¨46J8K E0NO?HN@28NO?HK@ÒB[:7:0E ! # ¥ #! § é # ¥ ) ! 1! § T f-S½ÑH9>Ò m¥õ § é # ¥ §'é #32 4 #!65 é [G7:!E # ¥ 1! §Üé [G7:!E m¥÷ ! §'é m¥õ %§ $ ! 1¬SHN PQ7G?HI ¥ 7 § 4@?HA ¥ 7G7 § 7G9h9L7:E07:[m46J/T 7'J8N a37:?HI 2<SHK0N@28SHK/JÖAH7GJ8K/=b2<7GNO??H4@EFK>[Gj@Ò#28SD462h7Gl m 98462<7G9 DK>9 ¥ 7 § Ò ¥ 7G7 § 4@?BA ¥ 7:7:7 § 28SHK/? 7G2Q7G9Q4 i>kZl lN@JW9LNOE0K¨J<4@?HABNOEa64@J87:4@iH[GK@ÒBÑB98K/9 98N@EFK¨AHK>K/] 2T ò m Bm Bm
'&
&
;Ç o>t s³Ç±/uBÆcq t;´cÄc¶ o s Ì s t s Ç prqOs tªo u½» s t ±/´½q ÄOo ÊcÆ t s·q ´uBqXob¾ tªu6¾u¬qXo|±/uc»Â»mob¾ n'obp qOs t<s³uBqÍvHx@?ê :ï `7YO}\^ + âÖ c5Í«5©¢¬D c£/ 00@ @ Ý¢D5 ÇÈÊOu¬qX¿co6qX±/o Qs t5²ßt5²Xo ó õ ö8õ ù ö7$I$I$Ýý $ s·q tª:o 9Oo»mÇ>x¯Q²XoMo>ÁXoq qcÆcÀ^ÄOo»mÇb¹ t5²Xo u3¿c¿ `hã3srÜæ¨D1A}åHÂs^CB e[ZYh^Xr ©DA}XåH ^CBÞ| `7`eg[ZYh^X qcÆcÀ^ÄOo»mÇÜ´cqX¿^t5²Xo¨»m´ ¾ E t<s uBq´c¶ Ç ´6»mo ±/uBÆcq t ¾ ⵩- £ ´cÄc¶ t¨u@ÌW»mo/´½¶ o_¥õ §Üé # ¥ éCõ §%$ qcÆcÀ^ÄOo»mÇ`Ä@o>t 'o>o6>q = ´cqX¿êyîs³ÇFqXu@t^±/uBÆcq t ¾ E E ´cÄc¶ o@x -f S½ÑH9/Ò E o_¥õ § * lNOJÐ4@[:[ õ !ì 4@?HAGF ! oÖ¥õ ! §Íé ü T`f-SBK i>kl N@l 7:9 J8K>[m4628K/Ak2287:E0K/9ZPÓKÖPQJ87G28K o 4@?BA m!o 987:E0]H[Rj%4@9 4@?HA m T 8
3
3
Hü
_ "! 8 Z%8Q Z%"B d7 Q Z%" .
E
¨¥õ § o
T1
T
ü
õ ü 1 7:I@ÑHJ8K # T 1¬ 7'JLNOiD4@iH7G[:7R2jFlÑH?B=>2<7GNO?klNOJ D7G]H]H7:?BIÔ40=>NO7:?k2PQ7:=>K ¥ @4 EF]B[:K # T ð T § *
×½Ë@´cÀhÊc¶ o^vHxRy = °( D¤r<©@£5c£/ Ý WâÈ¢3æ«b ©Ð⵩áç½@dr : ½®Ô: -ü ,2. õké * E -ü ,1 õkéü o_¥õ §Üé -ü ,2. õké 1 * õ !|, /ó *BöµüOö 13ý $ °5_ÚÜR@¢3< ½® @3®¤ò
'n o>prqOs t<s uBqwE vHxzyOy|{ 8@rã3©E @Èc£/ :Í :YO ^XÂT[E [Z`ß âkDb<MLçO:bî âÈ¢¬æ«b © o b¢D«5ßB@ oÖ¥õ § *F⵩F@ °õ°è o_¥õ §OõÞé üg@rãh⵩ @b Fû µè E b§ é o_¥õ §Oõ!$ ¥ # T üX§ # ¥ E (°D'âÈ¢3æ«b © o :w«8@ :8ãFD# A}XåB ^CB3Z `µ ^B e[ZYh^Xr kZl M¨B @¨B@ E mZoÖ¥õ §Üé o_¥ L§ E @rã o_¥õ §Üé mo ¥õ §Í@Z@ ¬D©DQõ@ ÿ3 «5 m!o :Ðã@ _b<bDc£/ :/®
E
1¬NOE0K>2<7GEFK>9¤PZKÖ9LSD4@[:[°PQJL7G28K ¥ õ §Oõ NOJ-9L7:E0]H[Gj
E
28N0E0Kµ46?
E
¥õ § cõ T
,
. 1
¨¥õ §
m!o
!
"
ü õ ü l NOJï?H7RlNOJ8E ¥ * Ò üX§ T
*
G7 IOÑHJLK # T # i>kl ×½Ë@´cÀ^Êc¶ oÖvHxzy Ø å¢XcD© ÖB@ B kl E ü lNOJ *0ûÛõ`ûVü o_¥õ §Üé * N@2<SBK/JLPQ7G98K $ E E þá :8@È Xè o¨¥õ § *+@rã o_¥õ § cõéü½®Q{ÅL@rã3© @Èc£/ :Öÿ' 3:Íã3bBb :_/@ã©wB @h W?H7GlNOJLE < è 9 Fã@:bÂÈ£/¢3 ©r® (°D i>kl :ï@ @b+£ * õ * õ *ÔûÛõMûVü m!o¨¥õ §Üé ü õ ü $ °5_ÚÜR@¢3< ½® ½®¤ò ×½Ë@´cÀ^Êc¶ oÖvHxzy v å¢XcD© ÖB@ B kl E ¥õ §'é * lN6N@2<J SHK>õ JLPQ7:9L* K $ E åæ«5 ¥õ § cõké ü6è 3:Ö:ÐÿZb ;ã3s rÜæ8 ã kZlꮤò B}6 NO?c2<7G?½ÑHNOÑH9ÔJ<46?HAHNOE a@46J87m46iH[:K>9Í=µ4@?ê[GKµ4@Aº28Nß=>NO?BlÑH9L7:NO?æT 7GJ89L2/Ò ?HN@28Kw2<SD462ÖE 7Rl 7G9¨=/NO?c2<7G?½ÑHNOÑH9Ö2<SHK>? # ¥ é õ §¨é * lN@J¨K>aOK>JLj õ NO? Ã2¨2<J;jM2T K_IOK>2¤]HJ8NOiH4@iH7:[G7G287:K/9'lJLNOE4 kZl i½j7:?c2IOJ<42<7:E ?BIHT Ï kZl =/4@?kirK_iH7:I@IOK/J 2<SD4@? ü ¥ ÑB?H[:7G\@Kw4kEF4@989dlÑB?H=>287:NO? § T NOJ_K 4@EF]B[:K@Òæ7Gl ¥õ §dé lNOJ õ ! p *Böµ-ü , -q 4@?HA *
/
&
_ "! 8 Z%8Q Z%"B d7 Q Z%" .
E
E
#
N@2<SBK/JLPQ7G98K6Òá2<SHK>E ? ¥õ § n* 4@?HA ¥õ §Oõ é ü 98NM2<SB7:9Ð7G9Í4îPÓK/[G[ AHKD?BK/A kl K>a@K/?º2<SBNOÑHIOS ¥õ E §Ðé 7:?ß98NOE0K]H[:4@=/K>9/T?|l4@=b2µÒá4 kl E =µ46?ºi K0ÑH?½irNOÑH?HAHK>A°T NOJhK 4@E EF]B[:K@Ò7Gl ¥õ §^é ¥ 1, # §ªõ E lN@J * õ ¼ü 4@?HA ¥õ §hé * N@28SHK/J;PQ7:98K6Ò 2<SHK>? ¥õ § Oõéü K>a@K/? 2<SHNOÑBIOS 7G9-?HN@2Wi N@ÑH?HAHK>A°T )
×½Ë@´cÀhÊc¶ o^vHxRy7& à#b °3:^:Öæ©- &V®¤ò (
¥õ §Üé * l6N N@2<J SHK>õ JLPQ7:9L* K $ E bæ«5 ¥õ § Oõé Oõ ,H¥;üÓúÞõ §Óé E
kZl
T, %é
6oÀhÀd´wvHxzyIV à#bm£b¨Di>kZl⵩hFL@rã3© @Èc£/ :-®?(°Db # ¥é õ §Üé mF¥õ § $ mF¥õ §ÖÿDb5m¥õ §Üé [:7GE mF¥÷B§5è # ¥õ ûC÷B§'é mF¥÷B§$ mF¥õ §5è # ¥ õ §Üé ü $ m¥õ §5è ⤠:Ы5©DÂD¢D©¢½^Db
[GNOI ¥ &C§ é
6
1
¥ b§ é # ¥ ûÛ b§Üé # ¥ û b§Üé # ¥ FûÛ û b§%$ 2ï7G9Q4@[:9LNÔÑH98KblÑH[°2J89LK i>kZl ¥ NOJD½ÑD4@?c2<7G[:K_lÑH?B=>2<7GNO? § T #
n'o>prqOs t<s uBqwvHxzy Ûà#b# £b^ÔL@rã3© @Èc£/ :¨ÿ' iTkl mF® (°D-Â
r½}/`/ i>kZl © T[D ^X eg[ZYh^X C:Ðã3srÜæ8ã%£ m ¥ O§'é 7:?Bl õM mF¥õ §Zû ⵩ ! p *¬öµ7ü q ® â m :¨bÂÈ «b æ«b<8b3@rã «5©DD¢å©¢c^Db m ¥O§Ö: D¨¢¬Dä/¢D^<8@ ¡D¢¬0£bb¤õÞb¢D«5B@c m¥õ §'é B®
1
ÃÌ%¸>uBÆ ´»moñÆcq Ì´cÀ^s·¶ ¾ sô6» Qs t5²;sÎq Ì D¹5ÆXǪt t5²OsÎq u@ÌÛs t ´XÇ t5²Xo
-K =/4@[:[ m b¥;ü-,2.c§ 2<SHK DJ89;2(3ÑH4@JL287:[:K6Ò m >¥;ü0,1O§ 28SHK-E0K/AH7:4@? ¥ N@Já98K>=/NO?HA½ÑD4@J 2<7G[:K § 4@?HA m ¥ # ,2.½§ 2<SHK¨2<SB7:J8A ½ÑD4@J;2<7G[:K@T À^s·qOsÎÀ^ÆcÀÍx
)
3
..
!
"
¤f PZNgJ84@?HAHNOE a64@J87:4@iH[:K>9 4@?HA 46J8K >[#Dw a`b^ }c[d^Xr PQJL7G2828K/? é 7Rl m!o¨¥õ §Óé mÓ¥õ § lNOJQ4@[G[ õ TÜf-SH7:9QAHN3K/9Q?HN62ïE0Kµ46?28SD462 4@?HA 46J8K K:½ÑD46[ Tï42<SHK>J/ÒD7G2¤EFK/4@?H9-2<SD42d46[:[°]HJLNOiD4@iB7:[:7R2j09L2542
M JLR H
B }6 %' [d^î ^ ^Xr 2'7G92<J84@AH7R2<7:N@?D4@[c2A028N¨AHK/?HN62JLa64@987RaOK 2<SH462ZPÓK¨4@JLK_9L28ÑH=5\FPQ7R2<S%7R2µ T QKµ4@A m 4@9 SD4@9¤AH7G9L28J87:iBÑB2<7GNO? mw# ^ 4@9 7:9Q46]H]HJ8N 7:EF4628K/[Gj m T !" $#&%('*),+.-/-10 #&-2' (#&354('(#6% h SH4@9¨4%]rNO7:?c2_EF4@9L9_AH7:9;2<JL7:iHÑB287:NO?`462 ÒHPQJ87R282? 76 ÒD7Rl # ¥ é 3§ é ü 7G?kPQSH7:=5S =µ4@9LK * õ m¥õ §Üé ü õ $ E f-SHKÖ]HJLNOiD4@iH7G[:7R2jFlÑH?B=>2<7GNO?7:9 ¥õ §Üé ü lNOJ õé 4@?HA * N@2<SHK>JLPQ7G98K@T 2 = ü i Kd4hIO7RaOK>?%7:?c2IOK/J>T !" 08#&- i " ' ":9 %# l (;<08#&-2' (#&354('(#6% ¡K>> 1¬ÑH]B] NO9LK¨2<SD462 SD4@9Q]HE JLNOiD4@iB7:[:7R2jEF4@9L9¤lÑH?H=>287:NO?IO7RaOK>?îicj ¥õ §Üé *-ü , = lN@NO28J SHK/õJ;PQéô7:98üOK Iö $ $I$7$/ö = K^984µj%2<SD42 SD469ï4wÑH?B7GlNOJLEAH7G9L2<JL7:iHѬ2<7:N@?kNO? ó½üOIö $I$I$>ö = ý T !"@?A" (% $4BCBC#0 #&-2' (#&354('(#6% ¡K>2 J8K/]BJ8K/9LK/?c2 4|=>NO7:? D7:]°T f-SHK>? é D 46?HA # ¥ é *c§'éü $:D lNOJ-98NOE0K D ! p *¬öµ7ü q T KÖ9<4µj%28SD462 SD4@9 # ¥éüX§'E 40ÕZE K/JL?HNOÑH[G[:7°AH7G9L28J87:iBÑB2<7GNO?%PQJ87R282? ÕZK/JL?HNOÑH[G[:7 F¥ Dr§ TÓf-SBKh]HJLNOiD4@iB7:[:7R2jFlÑB?H=>287:NO? 7:9 ¥õ §ÜEé D ¥;ü $:Dr§ lNOJ õ !|/ó *Böµü6ý T !"G? #&% ;H#I+.BJ08#&-2' (#&354('(#6% 13ÑH]H]rNO98K+PÓKSD4µa@K|4ê=/NO7G?CPQSH7:=5S l4@[:[G9 SHK/4@AH9dPQ7R2<SM]HJLNOiD4@iB7:[:7R2j D lNOJW98NOE0K *,û Dû ü T [:7:]2<SBKÍ=/NO7G? A 287:E0K/9d4@?HAM[GK>2
"> ">
.
!
"
E ri K2<SHK?3ÑBEÍi K>JÍN@lQSHK/4@AH9>T Ï 989LÑHE0K2<SD42Ð2<SBK29Ô4@J8KF7:?HAHK>] K>?HAHK>?½2/T #Kb2 ¥õ §Üé # ¥ éCõ § i K¨2<SBK^EF4@9L9¤lÑH?H=>287:NO?æT 2ï=/4@? i KÖ98SBNXPQ? 2<SD42 E ¥õ §'é* D ¥ª1ü $ Dr§ lN@NO28J SHK/õkJ;PQé 7:9L*BK ö7$ $I$I$/ö A Ï J<46?HAHNOE a64@J87:4@iH[:KgPQ7G28Sñ2<SBKºE46989`lÑH?B=>2<7GNO? 7G9M=µ4@[G[:K>A 4CÕZ7G?HNOE07m4@[¨J84@?HAHNOE a64@J87:4@iH[:K 4@?BAÛPZK`PQJL7G28K ÕZ7G?HNOE07m4@[ a¥ Aö Dr§ T l ÕZ7:?HN@EF7:4@[ a¥ Aö D § 4@?HA ù ÕZ7G?HNOE07m4@[ s¥ Aö D ù § 2<SHK>? úê ù ÕZ7:?HN@EF7:4@[ a¥ Aö D ú D ù § T H} ¡K>2hÑH9^2<4@\@KÔ2<SB7:9^NO]H]rNOJL28ÑH?H7G2j`2NO?BlÑH9L7:NO?æT 7:9h4îJ84@?HAHNOE a64@JL7m4@iH[GK øõ AHK/?HN6228K/J D 7G9 A 46?HA D 4@J8K AH}XD ^µ3/} ` Ò2<SD42h7G9/ ÑH98ÑH4@[:[Rj+ÑH?H\½?HN PQ?ê4@?HAßEÍÑH9;2ÐirK0K/9L287:EF462AßlJLNOE AD462<4 ø 28SD462 Ù9ÖPQSD462h9L2<462<7G9L2<7G=µ4@[ 7:?BlK>J8K>?H=/K_7G9Z46[:[r4@irNOÑB2µT ?EFNO9;2¤9;2546287:9L287:=/4@[åEFN3AHK/[G9/Òc2<SBK/J8K_4@J8KïJ<4@?HABNOE a@46J87m46iH[:K>9 4@?HA ]D4@J84@EFKb2J89 ABNO? Ã2ï=>NO?BlÑH9LK^28SHK/E T !" " ; " ' (# i 0 #&-/' (#&354('(#6$% _ SD469_4IOK>NOE0K>2<JL7:=ÖAH7G9L2<JL7:iHѬ2<7:N@? PQ7R2<S ]D4@J84@EFKb2J D !êa¥ *BöµüX§ ÒBPQJ87R2828K/? ¨K/N@E ¥ Dr§ ÒB7Gl é D¥ª1ü $ Då§ ö = Vü $ # ¥ é = §ÜE KÖSD4µa@K^28SD462 H # ¥ é = §Üé D H ¥;ü $:Då§ ïé ü $ ¥;D ü $:Då§ éôü $
+
+ f-SH7:?B\N@l 4@9¤28SHK¨?½ÑHEÍirK/J-N@l D7G]H9¤?HK>K/AHK>AîÑH?c287:[ 28SHK DJL9L2QSHK/4@AH9-PQSHK/? D7G]H]H7G?HI 4w=/NO7G?°T !" #&- - % 08#&-2' (#&3 4' #6$% ê SD4@9Ô4 7NO7:9L98NO?AH7:9;2<JL7:iHÑB287:NO?PQ7G2<Sê]D4 J<4@E0K>28K/J ¡ÒHPQJ87R282? 7NO7:9L98NO? ¥ § 7Rl E ¥õ §'é õ õ *_$ 'ïN@28K¨2<SD462 E H ¥õ §Üé H õ é éü $ + +
)
cð
!
" f-SHK 7NO7:9L98NO?07:9áN6l2[¬lNOJ=/NOÑH?c289 N6læJ84@J8K¤KbaOK>?½289Ó[G7:\6K-J<4@AH7GNc4@=b2<7Ga@K AHK>=µ4µjî4@?HAî28J<40=Ð4@=/=>7:AHK>?½289/T l 7NO7G9898N@? ¥aAö § 4@?HA ù 7NO7:9L98NO? ¥sAö ù § 2<SBK/? úÞ ù 7NO7G989LNO? ¥aAö ú ù § T B}6 KÍAHK H?HK/AgJ<46?HAHNOE¼a64@J87:4@iH[GK/9Q2KÐ7G?M4@?cjN@lá28SHK^AH7:9;2<J87 iHÑB287:NO?B9d4@irN aOK6T Ï 9 ZEFK>?½287:NO?BK/A`K/4@J8[G7:K>J/ÒH2<SBKh984@EF]B[:K^98]D4@=>KÐN@l28K/? LAH7:984@]H]rKµ4@JL9 iHÑB2d7G2ï7:9WJ8K/4@[:[Rjk2<SHK>J8KÍ7G?î2<SBKÍiD4@=5\½IOJLNOÑH?HA°T ¡K>2 9_=>NO?H9L28J8ÑH=b2Ö4984@EF]B[:K^98]D4@=>KÍK ]H[G7:=/7R2<[Rj¨lNOJ'4ÖÕZK/JL?HNOÑH[G[:7¬J<4@?HABNOEa64@JL7m4@iH[GK@T ¡K>2 Ûé p *Bö/7ü q 4@?BA0ABK D?HK # 2? # ¥ é üX§ïé # ¥m¦ôû Dr§Wé # ¥ p *Bö D qm§ïé D 4@?BA # ¥ é *c§Wé¼ü $ D Thf-S½ÑH9/Ò ÕZK/JL?HNOÑH[G[:7 F¥ Dr§ T KÐ=/NOÑH[GAMAHN%28SH7:9WlNOJ_4@[G[28SHKÍAH7G9L28J87:iBÑB2<7GNO?H9WAHK D?BK/A4@i N a@K@T ?Í]HJ84@=>287:=>K@Ò6PÓKZ28SH7:?H\¨N6lr4ïJ<46?HAHNOEVa64@J87:4@iH[GK [G7:\6KZ4WJ<4@?HABNOEô?3ÑBEÍi K>JiHѬ2lNOJLE4@[G[Gj 7G2¤7:9Q4ÔE4@]B]H7:?HIwABK D?HK>A`NO?9LNOEFKÖ984@EF]B[:K_9L]D4@=/K6T k ( ! |(0&'c$ MÍ"%$ (0"%$'.µ"*M(0L* H, MÍ"+)M( MÍ&Ü.%U .
)
)
M JLR H
T SD4@9'4 W?H7GlN@J8E ¥ Dö b§ AB7:9L28J87GiHÑB2<7GNO?°ÒPQJ87G2 2? W?H7GlNOJLE ¥ Dö b§ ÒH7Rl E ¥õ §Üé * lN@NO2<J SBK/õ JLPQ! 7G98p DK ö q QP SHK>J8K TÜf-SHKÖAH7:9;2<JL7:iHÑB287:NO?%lÑB?H=>287:NO?7:9 * õ õ ! p Dö q m¥õ § é ü õ 0$ !"19 %# l (;70 #&-2' (#&354('(#6%
Ó DS 469Q4 'ïNOJLE4@[ ¥ N@J ^46ÑH989L7m4@? § AH7:9;2<J87GiHÑB287:NO?PQ7R2<S ]D4@J84@E0K>2J89 ß4@?HA áÒDAHK>?HN@28K/A i½j |¥ ö ùb§ ÒH7Rl E ¥õ §Üé ü 1 K ] D$ 1 ü ù ¥õ $ § ù ö õ!` (;
+.B +(4- -/#I+.%
)
">
88 Z
c "j
$
.
! "
PQSHK/JLK !Å 4@?HA * T #4628K/J%PÓK+98SD46[:[d9LK/KM28SD462 V7G92<SHK*;=/K/?c28K/J ì¥ NOJ EFK/4@? § N@lÓ2<SHKÔAH7G9L2<JL7:iHѬ2<7:N@?46?HA 7:9¨28SHK ;98]HJLKµ4@A º¥ NOJÖ9;254@?HAH4@J8AßAHK>a37:462<7GNO? § N@l 2<SHKîAH7:9;2<JL7:iHÑB287:NO?æTCf-SHK 'ïN@J8EF4@[-]H[m4µj39%46?Û7:E0] N@JL2546?½20JLNO[:K7:?]HJ8N@iD4@iH7G[:7G2jÞ4@?HA 9L2<462<7G9L2<7G=/9>T `4@?cj]HSHK>?HNOE0K/?D4Ô7:??D4628ÑHJ8KÖSD4µa@KÐ4@]B]HJ8N 7:EF462[Gj 'ïNOJLE46[æAH7G9L2<JL7:iHÑ 2<7GNO?H9/(T #4628K/J>ÒBPZKÖ9LSD4@[:[ 98K>K^28SD462¤28SHKÖAH7:9;2<J87GiHÑB287:NO?%N6l4w9LÑHE N@lJ<46?HAHNOEÅa@46J87m46iH[:K>9 =µ4@? irKh46]H]HJ8N 7:EF4628K/Aicj4 'ïNOJLE4@[æAH7:9;2<J87GiHÑB287:NO? ¥ 2<SHK¨=>K/?c2<J84@[#[:7:E07G2 2<SBK/NOJLK/E § T KÜ9<4µj_28SD462 SD4694 `%^ DH} å}6Dåa `b^ }c [d^X 7Rl é * 4@?BA é ü T fJ84@AH7R2<7:N@?FAH7G=>2<4629Ó2<SH462¤4^9L2<4@?HAD4@JLA 'ïNOJLE46[DJ<4@?HABNOE a@46J87m46iH[:KQ7:9 AHK>?HN@2A%icj T f-SHK kl 4@?HA i>kZl N6l_4M9;254@?HAH4@J8A 'ïN@J8EF4@[Z46J8KkAHK>?HN@2Aicj ¥ O§ 4@?HA ¥ c§ T f-SHK kl 7:9-]H[GN@2828K/A7:? 7GIOÑHJLK # T . T'f-SHK/JLKh7G9¤?HN0=/[GNO98K>A lNOJLEK ]HJ8K>98987GNO?îlN@J dT UïK>J8K^4@J8KÖ9LNOEFK¨ÑB98K>lÑB[#l46=>2<9 ¥7§ l º¥ ö °ù § 28SHK/? é ¥ $ § , º¥ *¬öµüX§ T ¥ 7G7 § l |¥ *Bö/üX§ 2<SHK>? é ú |¥ ö ùȧ T ¥ 7G7:7 § l ! º¥ ! ö !ù § Ò éü@Iö $I$I$bö A 4@J8K¨7G?HAHK>] K>?HAHK/?c2W2<SHK>? H ! H ! ö H ! ù $ !+ !,+ !+
)
)
2QlNO[:[GNXPQ9ÓlJ8NOE ¥ 7 § 28SD462Q7Rl º¥ ö ùb§ 28SHK/? #
¥ b § é
#
0$
"$
é
$
$
0$
$
¥ # T 1O§ f-S3ÑB9PÓKÜ=µ4@?^=/NOE0]HÑB28KÜ4@?cj¨]HJLNOiD4@iH7G[:7R2<7:K>9rPÓK'P¤46?½24@9¡[GNO?HIQ4@9°PÓKÜ=µ4@?^=/NOE0]HÑB28K 2<SHK i>kZl ¥ c§ N6l#4Ð9L2<4@?HAD46J8A 'WNOJ8EF4@[ T Ï [G[å9L2542<7:9;2<7G=µ4@[D=>NOEF]BÑB2<7G?HIÐ]D46=5\646IOK/9 PQ7:[:[ =/NOE0]HÑB28K ¥ c§ 4@?HA ¥ O§ T Ï [G[¤9L2<462<7G9L287:=/9Í28K 2<9>ÒZ7G?H=/[GÑHAH7:?BIM2<SH7G9wNO?HK@Ò SD4µaOKî4 254@iB[:K¨N@la64@[:ÑHK>9QN@l ¥ c§ T ×½Ë@´cÀhÊc¶ o^vHxRy å¢ cD© ÖB@º¥ # ö O§>®QÚÜrã # ¥ ü §/® (°D¨ © Ý¢3 ©`: ü $ # é ü $ %¥ $*_$ .h.½§'é $ ü $ # ¥ ü §'éüD$ # ¥ üX§'éü $ #
.
E
Ö¥õ § o
$^ü
$1
!
"
ü
*
õ
1
:7 IOÑBJ8K # T .H K/?B987G2j%N@lá4w9;254@?HAH4@J8A 'ïNOJLE46[ T ^©ÿ rÜrã b¢å«5`B@ # ¥ O§Zé $ 1H® ÈÞ©Db^ÿZ©8ã5è rÜrã wé © @Ö3:У Fÿ'È3 $ $ $ 1Öé # ¥ O§Üé # $ é
Úæ5©D ^©È0@ ¡ c£/ :5è ¥ $B$ .Bü ðc§Üé $ 1H® (°Db<⵩<5è $B$ .Bü ðÐé $ é $ # @rãÔDbæ«5 ^é # $ $ .Bü ð ^é ü $Gü@ü ü $ ò
$
1
¥ $ 1O§>®`
$
DS 4@9ï4@? ]rNO?HK/?c287m4@[#AH7G9L2<JL7:iHѬ2<7:N@?PQ7R2<S § ÒH7Gl E ¥õ §Üé ü ö õ * PQSHK>J8K * T-f-SHKhK ]rNO?HK/?c287m4@[AH7:9;2<JL7:iHÑB287:NO?7G9WÑB98K/AM28NFE0N3AHK/[¡2<SBKh[G7GlK>287:E0K/9-N@l K/[GK/=b2<J8N@?H7:=¨=>NOEF]rNO?HK>?c2<9W4@?HAk2<SBKÖPZ4@7R2<7:?BIÔ287:E0K/9¤irK>2PÓK/K>?MJ<46J8K¨K>a@K/?c2<9>T $% " %' #I+B 0 #&-2' (#&354('(#6% ¤ ]D4@J84@E0K>2JÜÒHAHK/?BN@2A`icj ] ¥
ON J * Ò#28SHK DºNe[Yh^ r 7:9ÖAHK D?HK>A icj ¥ §-é T SD469¨4 ^46EFEF40AB7:9L28J87GiHÑB2<7GNO?PQ7G2<S`]D4@J<46EFKb2?HN@2Aîicj Ö4@EFEF4 ¥ ö § ÒB7Rl E ¥õ §Üé ü ¥ § õ ö õ *
+.;H; +@08#&-2' (#&3 4' #6
%$ ÷ c÷
">
88 Z
c "j
! "
.
PQSHK/JLK ö * T f-SHKQK ] N@?HK/?c2<7:4@[HAH7G9L28J87:iBÑB2<7GNO?Ð7G9 ÑH9L2 4 Ö4@E0E4 ¥;ü@ö § AH7G9L2<JL7:iHÑ 2<7GNO?°T l ! Ö@4 EFEF4 ¥ ! ö § 46J8K'7:?BAHK/]rK/?HABK/?c2µÒX2<SHK>? F ! + ! Ö4@EFEF4 ¥ F !,+ ! ö § T )
HS 4@9¤4@?ÕZK>2<4ÐAH7G9L2<JL7:iHѬ2<7:N@?ÔPQ7R2<SF]D4@J<46EFKb2A`icj ÕZK>2<4 ¥ ö § ÒB7Rl E ¥õ §Üé ¥ ¥ § ú ¥ § § õ ¥;ü $ºõ § ö * õ ìü$
!" ?A" ' + 08#&-2' (#&3 4' #6$% *
4@?HA
SD469¤4 HA 7:9;2<J87GiHÑB287:NO?wPQ7G28SÔAHK/IOJLK/K>9ZN@l lJ8K>K/AHNOE PQJL7G2828K/? 7Rl E ù ¥õ § é üÓú ü ù $ ù f-SHK AH7G9L2<JL7:iHѬ2<7:N@?7:9Ó987GEF7G[m4@Já28Nw4'ïN@J8EF4@[åiHÑB2-7R2¤SD4@9 2<SH7G=5\@K>J¤254@7G[:9/T ?%l4@=>2/Ò¬28SHK 'ïNOJLE46[D=/NOJLJ8K/9L] N@?HAH9Z28NÐ4 PQ7G2<S é& Táf-SHK 4@ÑH=5ScjAB7:9L28J87GiHÑB2<7GNO?Ô7:9Ó4h9L] K>=/7m46[ =µ4@9LKhN6l28SHK AH7G9L28J87:iBÑB2<7GNO?k=/NOJLJ8K>98]rNO?HAH7G?HI028N éü T'f-SHKÖAHK/?H9L7G2j7G9 E ¥õ §'é Ü¥;üÓúêü õ ù § $ fN098K>K¨2<SD462Q28SH7:9-7G9-7:?HAHK>K/Aî4wAHK/?H9L7G2j@Òr[GK>2 Ù9-AHNÔ2<SBK^7G?c2
+ 4
k
i
0 #&-2' (#&354('(#6%
Ü
SD4@94 ù AH7G9L2<JL7:iHѬ2<7:N@?_PQ7G2<S D AHK/I@J8K/K>9N@l3lJ8K>K/AHN@E ù 7Gl PQJ87G2L2? E ¥õ § é ¥FD ,1@ü § 1 ù õ ù ù öñõ *_$ l öI$7$I$/ö 4@J8K7:?HAHK>] K>?HAHK>?½29;254@?HAH4@J8A 'ïN@J8EF4@[@J84@?HAHNOE a64@J87:4@iH[:K>9 28SHK/? F !,+ ! ù ù T !"
ù
#&-2' (#&354('(#6% k
'
h*
!
"
. BMÍ&Ü.7Mh$$ G .IH$á&Ü.IJ+*`$'./(0"LH Ö7RaOK>?ì4º]H4@7:J%N6lhE AB7:98=>J8Kb29 4@?HA ÒWAHK D?HK+2<SHK ^ | `7`?e[ Yh^ r icj ¥õ#ö<÷B§ é # ¥ éìõ 4@?HA é ÷H§ T E J8NOEE ?HN PôNO?°ÒDPZK^PQJ87R2
9
<
9% : @7: 9%:
E
°3¢c5è # ¥éüOö ôé üX§'é ¥ªüOöµüX§áé (
@7 : ;h: 9%:
. ,
9%: %9 :
®¤ò
9
E
'n obp qOs t<s³uBqÍvHxRy ? b+DÔ«5©DÂD¢D©¢cw«8 5è ÿZE 0«8@ dâÈ¢3æ«b © ¥õ#ö<÷B§ÐhBãLâ ⵩kD L@E rã3© @Èc£/ :5 ¥Mö Ч â ¥õ#ö<÷B§ * ⵩ @ -¥õ#ö<÷B§5è
E ¥õ#ö8÷H§ cõ c÷ é üê@rãèÓ⵩`@ g b ê-è # ¥8¥Mö w§ ! § é ¥õ#ö<÷H§ Oõ ½÷æ® b%Ddã@: «b<bª¨©_«5©DD¢å©¢c_«8 WÿZïã3srÜæ-D ©Dc i>kZl mZo Ó¥õ#ö8÷H§'é # ¥ ûÛõ#ö ûC÷B§/® 1
1
×½Ë@´cÀ^Êc¶ oÖvHx Ø =Ûàb¤¥`ö ͧh£bÖ¢¬D ⵩ȩMD^¢3DÜ/ä/¢B@</®?(°DbBè E ¥õ#ö8÷H§áé *ü 7GN@l 28SH*wK/ûÛJ;PQõM7:98K ûV$ ü@ö *wûC÷%ûVü ÚÜrã # ¥ ì-ü ,1¬ö -ü ,1O§>® (°Dd @bD éôE ó ñ-ü ,1¬ö ì-ü ,13ýF«5©b<5D©rã ©0b¢H£b bQ©ªâÖD^¢3D /ä/¢B@</® ÈDª @L@3 © @bÖ3:^b¢B£b b-«5©b<5D©rã5èÜ 3:|«8 5誩۫5©dr¢33D@<8C©ªâMDM b ÿ3 «5C:] 9%:;r® °© è # ¥ -ü ,13ö ñ-ü ,h1O§'é-ü ,2.D®¤ò
2
2
1
2
N 0
Ÿ
"! 8 Z%8"
×½Ë@´cÀhÊc¶ o^vHxÙجyêà#bE Z¥Mö ͧ¨B @^ã3bBb ¥õ#ö<÷B§'é õÔ* úê÷ 7GN@l 28SH*wK/ûCJ;PQõM7:98K ûV$ üOö*0ûC÷ûVü (°Db ¥õwú÷B§ Oõ c÷ é õ Oõ c÷¨ú ÷ Oõ c÷ ü é 1 ½÷_ú ÷ ½÷wé 1ü ú 1ü é ü ÿ3 «5 @bÈ r¤5ÖB@Ó3:^:Ð klïò ×½Ë@´cÀhÊc¶ o^vHxÙØ@Ø âÜD¤ã@:bb£/¢3 ©w:-ã3s rÜæ8㨠© @bZ_æ© <5«b @3@¢¬ R@'< @ ©BèæDb DÔ«8@ :«b¢B@ ©B^@5У/ ©<w«5©dr Ý «8@ª8ãc® =^b<Ð:Ð@+D «5wLç½@dr :¨ÿ3 «5 £b©È5©ÿZ8ãQâÈ5© Ð ¤5© E ©Ó@rã °«5Db :5 g@_<\@ @®-àbZ¥`ö ͧ_B @hã3bBb ¥õ#ö<÷H§é * õåù5÷ 7GN@l 28SHõåK/ùïJ;PQûC7:98÷%K $ûVü h©8 rÜ<bWB@ $^ü%ûõûÅü½® ^©ÿ :bQ¢c8 rÜrã D @ Ý¢D%©ªâ ® (°DÍÂÈ « %Db< :Í©k£bF«8@<âÈ¢3 ác£b©¢3¤DÍL@330©ªâÐD @L@ ©r® `ïr « ©æ @bc£/ :5èõê/ Xè @rãÍ :báL@33Ö © @bï @ Ý¢å5/® (°DbBè¬âµ©¨83«5 rç¬8ã @ Ý¢åÖ©ªâÜõ°è'ÿZï :bæ÷ @ © @b%L@33Fÿ3 «5:^õåù û÷û ü½® bd0 Db â c©¢Þ :© © M@c r@¢35 ½® c® (°3¢½5è E ü é ¥õ#ö<÷B§ c÷ Oõé õ ù ÷ c÷ Oõ é õ ù ÷ c÷ Oõé õ ù 1ü $1 õ cõké 1¬. ü $ =hbæ«55è é 1¬0ü ,2. $ h©ÿV :b«5©dr¢3ª # ¥ w§>® (°3:k«5©È<5D©rãÔ©îD b éó¬¥õ#ö8÷H§Èø *ûÅõìû üOö8õåùkû÷ûÅõ¡ý¬® ©¢«8@Þ 53:k£ +ã@8@ÿ'3g ã@ @L@0® °© è Ч~é 1¬. ü õ ù ÷ ½÷ Oõé 1¬. ü õ ù ÷ ½÷ cõ # ¥ 1¬ü å õ ù " º $ õ # é . õ ù 1 cõké 1h* $|ò
1
'&
1
(
$
$
1
$
1
÷
÷ÔéCõåù
ü
!
"
÷Ôé õ
ü õ 7GIOÑHJ8K # T 3 f-SHKÔ[:7:I@S½2_98SD4@ABK/AßJ8K/I@7:NO?+7G9 õ ù û ÷ûü Twf-SHKÔAHK>?H987R2j+7:9¨]rNO987R2<7RaOK N aOK>Jh2<SB7:9^J8K/I@7:NO?°Tf-SHKFSD4628=5SHK/AÞJLK/IO7GNO?7G9^28SHKKbaOK>?½2 7:?c28K/J89LK/=b2
E
n'obp qOs t<s³uBqÍvHxÙØ@v âÜ¥Mö ͧ'B @ ©D°ã@:bÈ£/¢¬ ©wÿ'¨05åâÈ¢¬æ«b © o 'è Db+E DïH} #DZ| `I` eg[ZYh^X e å} :hã3srÜæ8㣠E o¨¥õ §Üé # ¥ é õ §Üé H ¥õ#ö<÷B§ ¥ # T # § # ¥ éCõ#ö éì÷H§'é H
@rãFE Dï|H} rDÓ| `I`e[ Yh^ rN e å} Y:hã3s rÜæ8ã%£ E Ó¥÷B§áé # ¥ é ÷H§'é H # ¥é õ#ö é ÷B§'é H ¥õ#ö<÷HF§ $ ¥ # T .c§
1
E
½× Ë@´cÀ^Êc¶ oÖvHx Øh& å¢XcD© B@ o +:@ @bÍwD c£/ :'B@⵩ :©ÿ/® (°DÜ0@Â@r@ ã@:bb£/¢3 ©dâµ©á «5©È<5D©rãQ©ÖDW<©ÿº© @ ·ï@rãhDQ0@Â@r@ åã@:bÂÈ£/¢3 © ⵩ «5©È<5D©r㨩0Dw«5© Ý¢3wBÖ© @ ·/®
\
D 8 "! 8 Z%8 "
E
9
<
é
9%:T9< b:T9< ;h:TE 9<
*
é ü
@7:T9< ;h:T9< :T9<
Ú©ÐLç½@dr :5è _o ¥ *c§'é # ,¬ü0*M@rã Öo ¥ªüX§'é $ ,¬ü0*D®¤ò
b:T9< :T9< 9
n'o>prqOs t<s uBqwvHx ØhVCÚ©-«5©DD¢å©¢c¤L@rã3©@Èc£/ :55è°D-0@Â@r@ 3ã3bBb 5 @< E E E E oÖ¥õ §Üé ¥õ#ö<÷B§ c÷ ö 4@?BA Ó¥÷H§'é ¥õ#ö<÷B§ O!õ $ ¥ # T O§ (°D«5©b<5D©rã@3`0@Â@r@ Óã@:bÂÈ£/¢3 ©âÈ¢3æ«b ©B%@<ã3bæ©8ãg£ mZo @rQã mÓ®
×½Ë@´cÀhÊc¶ o^vHxÙØ å ¢ cD© ÖB@
E
Ó¥õ#ö8÷H§'é ⵩-õ#ö<÷ *D® (°Db oÖ¥õ §Üé c÷Ôé ®-ò ×½Ë@´cÀhÊc¶ o^vHxÙØ å¢ E cD© ÖB@ ¥õ#ö<÷B§'é õÔ* úê÷ 7GN@l 28SH*wK/ûCJ;PQõM7:98K ûV$ üOö *0ûC÷ûVü (°Db E Ó¥÷B§'é ¥õÔúê÷H§ Oõé õ cõÍ ú ÷ c÷Ôé 1ü ú÷ $ò E
o
×½Ë@´cÀhÊc¶ o^vHxÙØ Cà#bZ¥MEö ͧ¨B @^ã3bBb ¥õ#ö<÷B§áé * ù õåù5÷ 7GN@l 2<SBõåK/ùïJLPQûC7G98÷%K $ûVü (°3¢½5è E E ÷ c÷wé 1¬ü õ ù ¥ªü1$ºõ § 1¬ü ù oÖ¥õ §Üé ¥ # õ < ö B ÷ § c Í ÷ é õ . E ⵩ $hü^ûÛõMûVü@rã oÖ¥õ §Üé *ß©DbÈÿ': /®¤ò
#
2.
!#"+)
¨"+) ¨ "%$
Ð"+)M(
Ð& .bM
M
M
!
"
JLR H
Independent Random Variables

Definition. Two random variables X and Y are independent if, for every A and B,
P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B),
and we write X ⊥ Y. Otherwise we say that X and Y are dependent.

In principle, to check whether X and Y are independent we need to check the equation above for all subsets A and B. Fortunately, we have the following result, which we state for continuous random variables though it is true for discrete random variables too.

Theorem. Let X and Y have joint pdf f_{X,Y}. Then X ⊥ Y if and only if f_{X,Y}(x, y) = f_X(x) f_Y(y) for all values x and y.

Example. Suppose that X and Y each take values in {0, 1} with joint mass function f(0, 0) = f(0, 1) = f(1, 0) = f(1, 1) = 1/4. Then f_X(0) = f_X(1) = 1/2 and f_Y(0) = f_Y(1) = 1/2, and X and Y are independent because f_X(0)f_Y(0) = f(0, 0), f_X(0)f_Y(1) = f(0, 1), f_X(1)f_Y(0) = f(1, 0) and f_X(1)f_Y(1) = f(1, 1). Suppose instead that f(0, 0) = f(1, 1) = 1/2 and f(0, 1) = f(1, 0) = 0. These are not independent because f_X(0)f_Y(1) = (1/2)(1/2) = 1/4 yet f(0, 1) = 0.

Example. Suppose that X and Y are independent and both have the same density f(x) = 2x for 0 ≤ x ≤ 1 and 0 otherwise. Let us find P(X + Y ≤ 1). Using independence, the joint density is
f(x, y) = f_X(x) f_Y(y) = 4xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 and 0 otherwise.
Now,
P(X + Y ≤ 1) = ∫∫_{x+y≤1} f(x, y) dy dx = 4 ∫₀¹ x [ ∫₀^{1−x} y dy ] dx = 4 ∫₀¹ x (1 − x)²/2 dx = 1/6.

The following result is sometimes useful for verifying independence.

Theorem. Suppose that the range of X and Y is a (possibly infinite) rectangle. If f(x, y) = g(x) h(y) for some functions g and h (not necessarily probability density functions), then X and Y are independent.

Example. Let X and Y have density f(x, y) = 2 e^{−(x+2y)} if x > 0 and y > 0, and f(x, y) = 0 otherwise. The range of X and Y is the rectangle (0, ∞) × (0, ∞). We can write f(x, y) = g(x) h(y) where g(x) = 2 e^{−x} and h(y) = e^{−2y}. Thus, X ⊥ Y.
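As a quick numerical check of the probability P(X + Y ≤ 1) = 1/6 computed above, one can simulate the example directly. The sketch below (Python with NumPy; the language and names are our own, not the text's) samples X and Y independently from the density f(x) = 2x on (0, 1) using the inverse-cdf method, since F(x) = x² gives F⁻¹(u) = √u.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000

    # f(x) = 2x on (0, 1) has cdf F(x) = x^2, so X = sqrt(U) with U ~ Unif(0, 1).
    x = np.sqrt(rng.uniform(size=n))
    y = np.sqrt(rng.uniform(size=n))

    print((x + y <= 1).mean())   # should be close to 1/6 ≈ 0.1667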
Conditional Distributions

If X and Y are discrete, then we can compute the conditional distribution of X given that we have observed Y = y. Specifically, P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y). This leads us to the following definition.

Definition. The conditional probability mass function is
f_{X|Y}(x | y) = P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y) = f_{X,Y}(x, y)/f_Y(y)
if f_Y(y) > 0.

For continuous distributions we use the same definitions. The interpretation differs: in the discrete case, f_{X|Y}(x | y) is P(X = x | Y = y), but in the continuous case we must integrate to get a probability. (There is a subtlety here: in the continuous case we are conditioning on the event {Y = y}, which has probability 0. We avoid this problem by defining things in terms of the pdf. The fact that this leads to a well-defined theory is proved in more advanced courses; here we simply take it as a definition.)

Definition. For continuous random variables, the conditional probability density function is
f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y),
assuming that f_Y(y) > 0. Then
P(X ∈ A | Y = y) = ∫_A f_{X|Y}(x | y) dx.

Example. Let X and Y have a uniform distribution on the unit square. Then f_{X|Y}(x | y) = 1 for 0 ≤ x ≤ 1 and 0 otherwise. Thus, given Y = y, X is Uniform(0, 1). We can write this as X | Y = y ∼ Unif(0, 1). From the definition of the conditional density, we see that f_{X,Y}(x, y) = f_{X|Y}(x | y) f_Y(y) = f_{Y|X}(y | x) f_X(x). This can sometimes be useful for finding a joint density, as in a later example.

Example. Let f(x, y) = x + y for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Let us find P(X < 1/4 | Y = 1/3). Earlier we saw that f_Y(y) = y + (1/2). Hence,
f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y) = (x + y)/(y + 1/2).
So
P(X < 1/4 | Y = 1/3) = ∫₀^{1/4} f_{X|Y}(x | 1/3) dx = ∫₀^{1/4} (x + 1/3)/(1/3 + 1/2) dx = 11/80.

Example. Suppose that X ∼ Unif(0, 1). After obtaining a value of X we generate Y | X = x ∼ Unif(x, 1). What is the marginal distribution of Y? First note that f_X(x) = 1 for 0 ≤ x ≤ 1 and f_{Y|X}(y | x) = 1/(1 − x) for x < y < 1, so
f_{X,Y}(x, y) = f_{Y|X}(y | x) f_X(x) = 1/(1 − x) for 0 < x < y < 1.
The marginal for Y is
f_Y(y) = ∫₀^y f_{X,Y}(x, y) dx = ∫₀^y dx/(1 − x) = −log(1 − y) for 0 < y < 1.

Example. Consider the density f(x, y) = (21/4) x² y for x² ≤ y ≤ 1. Let us find f_{Y|X}(y | x). When X = x, y must satisfy x² ≤ y ≤ 1. Earlier, we saw that f_X(x) = (21/8) x² (1 − x⁴). Hence, for x² ≤ y ≤ 1,
f_{Y|X}(y | x) = f(x, y)/f_X(x) = 2y/(1 − x⁴).
Now let us compute P(Y ≥ 3/4 | X = 1/2). This can be computed by first noting that f_{Y|X}(y | 1/2) = 32y/15. Thus,
P(Y ≥ 3/4 | X = 1/2) = ∫_{3/4}^{1} (32y/15) dy = 7/15.
Multivariate Distributions and IID Samples

Let X = (X₁, . . . , Xₙ) where X₁, . . . , Xₙ are random variables. We call X a random vector. Let f(x₁, . . . , xₙ) denote the pdf. It is possible to define marginals, conditionals, etc. much the same way as in the bivariate case. We say that X₁, . . . , Xₙ are independent if, for every A₁, . . . , Aₙ,
P(X₁ ∈ A₁, . . . , Xₙ ∈ Aₙ) = ∏ᵢ P(Xᵢ ∈ Aᵢ).
It suffices to check that f(x₁, . . . , xₙ) = ∏ᵢ f_{Xᵢ}(xᵢ).

If X₁, . . . , Xₙ are independent and each has the same marginal distribution with density f, we say that X₁, . . . , Xₙ are IID (independent and identically distributed) and we write X₁, . . . , Xₙ ∼ f. If F is the cdf we also write X₁, . . . , Xₙ ∼ F. This means that X₁, . . . , Xₙ are independent draws from the same distribution. We also call X₁, . . . , Xₙ a random sample of size n from f. Much of statistical theory and practice begins with IID observations and we shall study this case in detail when we discuss statistics.
Two Important Multivariate Distributions

Multinomial. The multivariate version of a Binomial is called a Multinomial. Consider drawing a ball from an urn which has balls with k different colors labeled color 1, color 2, . . . , color k. Let p = (p₁, . . . , p_k) where pⱼ ≥ 0 and Σⱼ pⱼ = 1, and suppose that pⱼ is the probability of drawing a ball of color j. Draw n times (independent draws with replacement) and let X = (X₁, . . . , X_k) where Xⱼ is the number of times that color j appears. Hence, n = Σⱼ Xⱼ. We say that X has a Multinomial(n, p) distribution, written X ∼ Multinomial(n, p). The probability function is
f(x) = [ n! / (x₁! · · · x_k!) ] p₁^{x₁} · · · p_k^{x_k}.

Lemma. Suppose that X ∼ Multinomial(n, p) where X = (X₁, . . . , X_k) and p = (p₁, . . . , p_k). The marginal distribution of Xⱼ is Binomial(n, pⱼ).

Multivariate Normal. The univariate Normal has two parameters, µ and σ. In the multivariate version, µ is a vector and σ is replaced by a matrix Σ. To begin, let
Z = (Z₁, . . . , Z_k)ᵀ where Z₁, . . . , Z_k ∼ N(0, 1) are independent.
The density of Z is
f(z) = ∏ⱼ f(zⱼ) = (2π)^{−k/2} exp{ −(1/2) Σⱼ zⱼ² } = (2π)^{−k/2} exp{ −(1/2) zᵀz }.
We say that Z has a standard multivariate Normal distribution, written Z ∼ N(0, I), where it is understood that 0 represents a vector of k zeroes and I is the k × k identity matrix.

More generally, a vector X has a multivariate Normal distribution, denoted by X ∼ N(µ, Σ), if it has density
f(x; µ, Σ) = (2π)^{−k/2} |Σ|^{−1/2} exp{ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) },
where |Σ| denotes the determinant of Σ, µ is a vector of length k and Σ is a k × k symmetric, positive definite matrix. Setting µ = 0 and Σ = I gives back the standard Normal. (If a and b are vectors, then aᵀb = Σⱼ aⱼbⱼ; Σ⁻¹ is the inverse of the matrix Σ; and a matrix Σ is positive definite if xᵀΣx > 0 for all nonzero vectors x.)

Since Σ is symmetric and positive definite, it can be shown that there exists a matrix Σ^{1/2}, called the square root of Σ, with the following properties: (i) Σ^{1/2} is symmetric, (ii) Σ = Σ^{1/2} Σ^{1/2} and (iii) Σ^{1/2} Σ^{−1/2} = Σ^{−1/2} Σ^{1/2} = I where Σ^{−1/2} = (Σ^{1/2})⁻¹.

Theorem. If Z ∼ N(0, I) and X = µ + Σ^{1/2} Z, then X ∼ N(µ, Σ). Conversely, if X ∼ N(µ, Σ), then Σ^{−1/2}(X − µ) ∼ N(0, I).

Suppose we partition a random Normal vector X as X = (X_a, X_b). We can similarly partition µ = (µ_a, µ_b) and
Σ = ( Σ_aa  Σ_ab ; Σ_ba  Σ_bb ).

Theorem. Let X ∼ N(µ, Σ). Then:
(1) The marginal distribution of X_a is X_a ∼ N(µ_a, Σ_aa).
(2) The conditional distribution of X_b given X_a = x_a is
X_b | X_a = x_a ∼ N( µ_b + Σ_ba Σ_aa⁻¹ (x_a − µ_a),  Σ_bb − Σ_ba Σ_aa⁻¹ Σ_ab ).
(3) If a is a vector, then aᵀX ∼ N(aᵀµ, aᵀΣa).
(4) V = (X − µ)ᵀ Σ⁻¹ (X − µ) ∼ χ²_k.
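To make the square-root construction concrete, here is a small simulation sketch (Python with NumPy; the language, the particular µ and Σ, and the variable names are our own illustration, not part of the text). It draws from N(µ, Σ) via X = µ + Σ^{1/2} Z and checks the sample mean and covariance.

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])

    # A symmetric square root of Sigma via its eigendecomposition.
    vals, vecs = np.linalg.eigh(Sigma)
    Sigma_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

    n = 100_000
    Z = rng.standard_normal(size=(n, 2))     # rows are iid N(0, I) vectors
    X = mu + Z @ Sigma_half                  # X = mu + Sigma^{1/2} Z, so X ~ N(mu, Sigma)

    print(X.mean(axis=0))                    # approximately mu
    print(np.cov(X, rowvar=False))           # approximately Sigma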
Transformations of Random Variables

Suppose that X is a random variable with pdf f_X and cdf F_X. Let Y = r(X) be a function of X, for example, Y = X² or Y = e^X. We call Y = r(X) a transformation of X. How do we compute the pdf and cdf of Y? In the discrete case the answer is easy. The mass function of Y is given by
f_Y(y) = P(Y = y) = P(r(X) = y) = P({x : r(x) = y}) = P(X ∈ r⁻¹(y)).

Example. Suppose that P(X = −1) = P(X = 1) = 1/4 and P(X = 0) = 1/2. Let Y = X². Then P(Y = 0) = P(X = 0) = 1/2 and P(Y = 1) = P(X = 1) + P(X = −1) = 1/2. Y takes fewer values than X because the transformation is not one-to-one.

The continuous case is harder. There are three steps for finding f_Y:
1. For each y, find the set A_y = {x : r(x) ≤ y}.
2. Find the cdf
F_Y(y) = P(Y ≤ y) = P(r(X) ≤ y) = P({x : r(x) ≤ y}) = ∫_{A_y} f_X(x) dx.
3. The pdf is f_Y(y) = F′_Y(y).

Example. Let f_X(x) = e^{−x} for x > 0. Then F_X(x) = ∫₀^x f_X(s) ds = 1 − e^{−x}. Let Y = r(X) = log X. Then A_y = {x : x ≤ e^y} and
F_Y(y) = P(Y ≤ y) = P(log X ≤ y) = P(X ≤ e^y) = F_X(e^y) = 1 − e^{−e^y}.
Therefore, f_Y(y) = e^y e^{−e^y} for y ∈ R.

Example. Let X ∼ Unif(−1, 3). Find the pdf of Y = X². The density of X is f_X(x) = 1/4 if −1 < x < 3 and 0 otherwise. Y can only take values in (0, 9). Consider two cases: (i) 0 < y < 1 and (ii) 1 ≤ y < 9. For case (i), A_y = [−√y, √y] and F_Y(y) = ∫_{A_y} f_X(x) dx = (1/2)√y. For case (ii), A_y = [−1, √y] and F_Y(y) = ∫_{A_y} f_X(x) dx = (1/4)(√y + 1). Differentiating F_Y we get
f_Y(y) = 1/(4√y) if 0 < y < 1,  f_Y(y) = 1/(8√y) if 1 < y < 9,  and f_Y(y) = 0 otherwise.

When r is strictly monotone increasing or strictly monotone decreasing, then r has an inverse s = r⁻¹ and in this case one can show that
f_Y(y) = f_X(s(y)) |ds(y)/dy|.
Transformations of Several Random Variables

In some cases we are interested in transformations of several random variables. For example, if X and Y are given random variables, we might want to know the distribution of X/Y, X + Y, max{X, Y} or min{X, Y}. Let Z = r(X, Y) be the function of interest. The steps for finding f_Z are the same as before:
1. For each z, find the set A_z = {(x, y) : r(x, y) ≤ z}.
2. Find the cdf
F_Z(z) = P(Z ≤ z) = P(r(X, Y) ≤ z) = P({(x, y) : r(x, y) ≤ z}) = ∫∫_{A_z} f_{X,Y}(x, y) dx dy.
3. Then f_Z(z) = F′_Z(z).

Example. Let X₁, X₂ ∼ Unif(0, 1) be independent. Find the density of Y = X₁ + X₂. The joint density of (X₁, X₂) is
f(x₁, x₂) = 1 for 0 < x₁ < 1, 0 < x₂ < 1 and 0 otherwise.
Let r(x₁, x₂) = x₁ + x₂. Now,
F_Y(y) = P(Y ≤ y) = P(r(X₁, X₂) ≤ y) = ∫∫_{A_y} f(x₁, x₂) dx₁ dx₂.
Now comes the hard part: finding A_y. First suppose that 0 < y ≤ 1. Then A_y is the triangle with vertices (0, 0), (y, 0) and (0, y); see the figure below. In this case, ∫∫_{A_y} f(x₁, x₂) dx₁ dx₂ is the area of this triangle, which is y²/2. If 1 < y < 2, then A_y is everything in the unit square except the triangle with vertices (1, y − 1), (1, 1), (y − 1, 1). This set has area 1 − (2 − y)²/2. Therefore,
F_Y(y) = 0 for y < 0;  y²/2 for 0 ≤ y < 1;  1 − (2 − y)²/2 for 1 ≤ y < 2;  and 1 for y ≥ 2.
By differentiation, the pdf is
f_Y(y) = y for 0 ≤ y ≤ 1;  2 − y for 1 ≤ y ≤ 2;  and 0 otherwise.

[Figure: the set A_y for the example above. For 0 ≤ y ≤ 1 it is the triangle with vertices (0, 0), (y, 0), (0, y); for 1 < y ≤ 2 it is the unit square minus the triangle with vertices (1, y − 1), (1, 1), (y − 1, 1).]
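As a quick check of this triangular density, the sketch below (Python with NumPy; an illustration of ours, not part of the text) compares the empirical probability P(Y ≤ 1/2) with F_Y(1/2) = (1/2)²/2 = 1/8 from the formula above.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    y = rng.uniform(size=n) + rng.uniform(size=n)   # Y = X1 + X2 with X1, X2 ~ Unif(0, 1) independent

    print((y <= 0.5).mean())   # empirical estimate
    print(0.5 ** 2 / 2)        # F_Y(1/2) = 0.125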
Technical Appendix

Recall that a probability measure P is defined on a σ-field A of a sample space Ω. A random variable X is a measurable map X : Ω → R. Measurable means that, for every x, {ω : X(ω) ≤ x} ∈ A.

Exercises

1. Show that P(X = x) = F(x⁺) − F(x⁻).
2. Let X be such that P(X = 2) = P(X = 3) = 1/10 and P(X = 5) = 8/10. Plot the cdf F. Use F to find P(2 < X ≤ 4.8) and P(2 ≤ X ≤ 4.8).
3. Prove the lemma giving the properties of the cdf.
4. Let X have probability density function f_X(x) = 1/4 for 0 < x < 1, f_X(x) = 3/8 for 3 < x < 5 and f_X(x) = 0 otherwise.
(a) Find the cumulative distribution function of X.
(b) Let Y = 1/X. Find the probability density function f_Y(y) for Y. Hint: consider three cases: 1/5 ≤ y ≤ 1/3, 1/3 ≤ y ≤ 1, and y ≥ 1.
5. Let X and Y be discrete random variables. Show that X and Y are independent if and only if f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x and y.
6. Let X have distribution F and density f and let A be a subset of the real line. Let I_A(x) be the indicator function for A: I_A(x) = 1 if x ∈ A and I_A(x) = 0 otherwise. Let Y = I_A(X). Find an expression for the cumulative distribution of Y. (Hint: first find the probability mass function for Y.)
7. Let X and Y be independent and suppose that each has a Uniform(0, 1) distribution. Let Z = min{X, Y}. Find the density f_Z(z) for Z. Hint: it might be easier to first find P(Z > z).
8. Let X have cdf F. Find the cdf of X⁺ = max{0, X}.
9. Let X ∼ Exp(β). Find F(x) and F⁻¹(q).
10. Let X and Y be independent. Show that g(X) is independent of h(Y) where g and h are functions.
11. Suppose we toss a coin once and let p be the probability of heads. Let X denote the number of heads and let Y denote the number of tails. (a) Prove that X and Y are dependent. (b) Let N ∼ Poisson(λ) and suppose we toss a coin N times. Let X and Y be the number of heads and tails. Show that X and Y are independent.
12. Prove the independence theorem for continuous random variables.
13. Let X ∼ N(0, 1) and let Y = e^X. (a) Find the pdf for Y. Plot it. (b) (Computer experiment.) Generate a vector x = (x₁, . . . , x₁₀,₀₀₀) consisting of 10,000 random standard Normals. Let yᵢ = e^{xᵢ}. Draw a histogram of y and compare it to the pdf you found in (a).
14. Let (X, Y) be uniformly distributed on the unit disk {(x, y) : x² + y² ≤ 1}. Let R = √(X² + Y²). Find the cdf and pdf of R.
15. (A universal random number generator.) Let X have a continuous, strictly increasing cdf F. Let Y = F(X). Find the density of Y. This is called the probability integral transform. Now let U ∼ Uniform(0, 1) and let X = F⁻¹(U). Show that X ∼ F. Now write a program that takes Uniform(0,1) random variables and generates random variables from an Exponential(β) distribution.
16. Let X ∼ Poisson(λ) and Y ∼ Poisson(µ) and assume that X and Y are independent. Show that the distribution of X given that X + Y = n is Binomial(n, π) where π = λ/(λ + µ). Hint 1: you may use the fact that if X ∼ Poisson(λ) and Y ∼ Poisson(µ) are independent, then X + Y ∼ Poisson(λ + µ). Hint 2: note that {X = x, X + Y = n} = {X = x, Y = n − x}.
17. Let X ∼ N(3, 16). Solve the following using the Normal table and using a computer package: (a) find P(X < 7); (b) find P(X > −2); (c) find x such that P(X > x) = .05; (d) find P(0 ≤ X < 4); (e) find x such that P(|X| > |x|) = .05.
18. Prove the monotone-transformation formula for densities.
19. Let X, Y ∼ Unif(0, 1) be independent. Find the cdf for X − Y and X/Y.
20. Let X₁, . . . , Xₙ ∼ Exp(β) be IID. Let Y = max{X₁, . . . , Xₙ}. Find the cdf of Y. Hint: Y ≤ y if and only if Xᵢ ≤ y for i = 1, . . . , n.
!"$#&%(')%+*-,/. ,/01' 23'4.65,/7 8':9;*-'4<6=-" >&?A@B@DCFEG@-HJILKMILNPORQTSUOWVYX/@ZKWQ\[]OW^_K`VLKMQAaAORXcbMKWVdNeKWfAge@ihjNPkYIL?A@:KbW@-VdKWlR@:bMKWgemn@ OW^ohp+>&?A@i^qORVdXrKWgsan@JtuQANvILNPORQwNek]KWk&^qORgPgeOx]k-p y+zJ{G|R} ~L} n|:u
u4+W
AuUoA4u(4iM D¡¢G£W)¥¤ h ¦e§4¨ q©;ªs ¨r« `¬D NPk¼anNekdHJVd@DIL@ $Sqh®[+¯±°³²Y´µ`S¶²£[+¯³·³¸1½ ²£¹ º(²£Sqº»²£Sq[¾²£´R[² NPNP^]^]hj hjNPk¼HJORQILNPQmnORmAk SU¿npPÀÁ[ ¦ ª§ÃÆr§DÄFªsÅB ¦« ªÆ«Ó¦ « ª  ««Ç « u¨ Dªs§DÄ «ÅÉ « È uB Õ¦ ªÔJÖu« ÊÃÆW× « ÂW ¨rË Ì Ø ¦e§ÎÄÍ : ˶¥ËÐϤ ¨ hTq©;Ù ªs ¨Ñ`Ò Ä§ « u(¤Z Ë¶Ë Í Ï Â ÂWË ÎSqh®[+¯ÚÇhɯ۰ܲδµrSq²£[+¯1Ý!¯ÚÝsÞYß SU¿npáàR[ &> ?A@4@âCnEG@-HDIÃKMIdNeORQãNekäKåORQA@âæçQmAX:fG@-VäkdmAX/XrKWVÕè`OW^»Id?A@anNekÕIdVdNPfAmnILNPORQp;>&?ANPQAéêOW^ iS¶h®[ëKWk]IL?A@)KZbR@JVLKWlW@)bMKWgPmA@$èRORmãxìORmngeaãORfnIÃKMNeQwNP^oèRORmãH-OWXrEAmFIL@-aãIL?A@iQmAX/@-VÕNeHZKMg KZbR@-VdKWlR@îíðïuñ ¸1òóÐô ñ h ó OM^YKgeKWVdlW@rQmAX:fG@-VBOM^¼õÓõ¶ö÷aAVdKx]k:h ñâø ßZßZß ø h p6>&?A@ê^UKWHDI IL?uKI¼ÎSqh®[ìù í ïuñ ¸ òóô ñ h ó NPkÎKWHDILmuKMgegPèêXrORVÕ@)Id?uKWQúK`?A@-mnVdNek¾ILNPHWûëNvIYNPò kYKåIL?A@JORVd@JX HZKWgPge@JaIL?A@úgüKZxcOW^igüKMVdlR@!QmnX:f£@JVdkêId?uKMI`x_@x]NegPg¼anNekdHJmAkdkîgeKMIL@JV-p±>&?n@QAOWILKMILNPORQ ýRþ
½ ²Y´µ`S¶²£[oaA@Jkd@JVÕbR@Jk;kdORX/@ëH-OWXrX/@-QIZp @ëmAkd@&NvI»X/@-VÕ@-gvè4KMk+KYH-OWQbW@-QANP@-QIÇmAQANv^¶èFNPQAl QAOWILKMILNPORQ6kdOwx_@raAOWQ"! Ii?AKbW@/IdOãx]VÕNPId@ ¸ ¹ ²£º»Sq²£[Y^qORV$aANPkdH-VÕ@JId@/VdKWQAaAOWXbMKWVdNeKWfAge@Jk KWQAa ½ ²£º(S¶²£[Õ´R²å^qORV»HJORQILNeQmAOWmAk(VLKMQAaAORX
bMKWVdNeKWfAge@JkfnmnI(èROWmBkd?AORmngea:fG@&KZxëKMVd@ìId?uKMI ½ ²Y´µ`S¶²£[ì?uKWkäKBEAVd@JH-NPkd@)Xr@-KWQANeQnl:IL?uKMI]NPk&aANekÕH-mAkÕkd@JaNeQîVÕ@ZKWgðKWQngPèkdNPk&H-ORmAVÕkd@Jk-p >O]@JQAkdmAVÕ@+IL?uKMIÎS¶h®[£Nekx_@-gPgRaA@DtuQA@-$a #x_@;kLKZèYId?uKMIÎSqh®[s@âCFNekÕIdkNP^ ½ ¹ % ² % ´µðÞ$S¶²£[& p (YIL?A@JVÕx]NPkd@$xì@)kLKZèêId?uKMI]IL?n@)@âCFE£@JHJIÃKILNeOWQãanOF@JkäQnOWIä@DCFNek¾IZp ' ñ ô<; ²£º(S¶²£[Y¯cS >= S¥À ? )+*,.-0/.1 z$u3 254ð «(h 687ì@-VÕQAORmAgPgeNU:S 9G[-Ñ uDª ÎS¶h®[¼¯ ¸ D ¹ 9G[ÕA [ @ÛS¾À = 9\[ÇB¯ 9oÑ C )+*,.-0/.1 z$u3 D5E Ë ¦ Ö Â ¤  ¦ Y×à ¦ ª «UÍ «U¦¶Å §-Ñ 4ð «uh ¬D « uäª ÄFÅ ¬DDY¥¤_u  ¨§-Ñ uDªn ÎSqh®[]¯ ½ ²G´µoÞiS¶²£[]¯ ¸ ¹ ²£ºÁÞiS¶²£[¼¯³S F= º(S [d[ @ S¾À = º»S¾ÀÁ[Õ[ @ SÓà = º(SÊàR[Õ[]¯ S G= S¥IÀ HM¿[Õ"[ @ÛS¥À = S¾IÀ HRàW[d"[ @1SÊà = S¾IÀ H¿[d[+¯
ÀR"ß C )+*,.-0/.1 z$u J 4ð «K h 6MLäQANP^ZNS ?)À Pø O [-Ñ uDªn $S¶h®[+¯ ½ ²G´µðÞiS¶²£[ǯ ½ ²£ºÞiSq²£[¾´R²w¯ ñ ½SR ²Y´²ê¯ ÀÑTC Q ïuñ )+*,.-0/.1 z$uV UJW]Ã× ÂWË¶Ë «  «   ª ¨ Å Ø Â ¦  ¬ Ë ì  § F Y ¨W¦e§D« ¦ ¬ Ä«Ó¦ ª ¦ ¤ ¦¶«  §  X;Â Ä ×à ¨ Dª §D¦¶« Y ºÁÞiS¶²£[¼[ ¯ Z]\ÇS¥À @_² ^âP[ ` ïuñ bÑ a§D¦ ªÆ ¦ ª « ÊÆW  «Ó¦ ª c¬ Y4Ö Â «q§ ÈÓ§ S« d®¯ ²  ª ¨ e ¯ÚIÃKWQ ïuñ ² Ì ¯ há² ILKWQ ïuñ S¶²£k[ j ;f ? l° f ILKWQ ïuñ ²Î´R²î¯ ' ° % ² % ´µrSq²£[;¯ \ à ° f gÀ ²Y@ ´R²² ^ i ; ; § « u Å Â ª ¨ § ªs « ÕÔ ¦e§D«ÊnÑ m o¤ Y ħD¦¶ÅBÄ Ëv « 5 X;Â Ä ×à Y ¨W¦e§D« ¦ ¬ Ä«Ó¦ ª Å Â pª Y «U¦¶Å §  ª ¨/« r q « u Â Ø D  ÆÃY ÄwÍ+¦ Ë¶Ë § à «  «;« u Â Ø D  Æiªs Ø D § «U« Ë §4¨ Í ª Ñ ¦e§`¦e§ ¬DÃ× Â Ä§ « u X;Â Ä ×à Yú  §î« ¦ × q «  ¦ Ë §  ª ¨ uDªs×ÃãÕÔ « à Š!W¬ § D Ø Â «Ó¦ ª §  L:×à ÅBÅ ª ÑsC tuVdORX QAOxÜOR" Q #Çx]?A@-Qn@JbR@JVrx_@waANekÕH-mAkÕkr@âCnEG@-HDIÃKMIdNeORQnck #_x_@ãNeX/EAgeNPH-NPIdgPè¡KMkdkdmnXr@ IL?AKMI&IL?A@Dèw@DCFNek¾IZp u@Jw I v J¯ xnSqh¡[DSp yäOÁx1aAO4x_@$H-OWXrEAmFIL@ä$zS v4|[ {}(ÎQA@¼xëKZèêNPk_IdO4tuQAaãIº ~ìS A[;KWQAa IL?n@-QãH-ORX/EAmnId@$ÎS v:[;¯ ½ AIº ~ÇS n[Õ.´ £p 7ìmnI&IL?n@-Vd@)Nek]KMQã@-KWkdNP@-V&xìKèWp
w[
ÁzJ üz -
w G
À
w
Áz .1 iz o~ ÁzG1 , ¥~N,~L} ¥~L} } ,| 4o « v ¯ nx Sqh®[JÑ uDª $zS v:[;¯÷iSxAS¶h®[d[(¯Û° xnSq²£[¾´µoÞÎSq²£[Dß US ¿np O [
!#"$
%$
%&
')(
&> ?ANekrVd@-kÕmAgPI`X`KMéW@-krNeQILmnNPILNvbR@ãkd@JQAkd@MpÛ>&?ANPQAé OW^$EAgeKèNPQAl K®lKWX/@ãx]?A@JVd@úx_@ aAVLKZx h KMI$VLKMQAaAORXKWQAa6IL?A@JQ ¼EuKZè!èRORm vܯ xnSqh®[âp ÇOWmAV$KZbR@JVLKWlR@åNPQAH-ORX/@4NPk xAS¶²£[$IdNeX/@-kÎId?A@`HÃ?uKMQAH-@/IL?uKMIih ¯ ² #»kÕmAX/Xr@Ja÷SqORViNeQIL@JlRVLKMId@-a\[$ObR@JV4KWgPg(bMKWgemA@Jk OW^+²ð0p y¼@JVd@:NPkÎKêkdEG@-HJNüKWgoHZKMkd@W0p u@JI f£@:KWQ@JbW@-QIKWQAa6gP@J I xnSq²£[]¯ (S¶²£[äx]?A@JVd@ »Sq²£[;¯ ÀiNP^ð²
KWQAa »Sq²£[;¯ NP^ð² H 4p;>&?A@-Q ÎS (S¶h®[d[+¯ ° (S¶²£[dºÁÞ$Sq²£[¾´²î¯ ° ºÞ$S¶²£[Õ´R²w¯ ìS¶h $[Dß çQ!OMIL?A@JV]x_ORVÕaAck #uEAVÕORfuKWfnNegeNvIçè`NPk]KBkdEG@-H-NeKWgHZKMkd@)OW^»@âCFE£@JHJIÃKILNeOWQp )+*,.- /.1 ziu l4ð «oK h 6ML¼QnNP^-S ø ÀÁ[-Ñ 4ð « v ¯ xAS¶h®[+¯ Þ Ñ uDªn ÎzS v:[;¯1° ñ ¹ º(Sq²£[¾´²î¯±° ñ ¹ ´R²w¯ ? ÀRß ; ; ËÀ & « Dâ ª &  «U¦¶Ø Ñ Ë YÁ0 uDYªn Ä i×ÃS v:Ä [;Ë ¨ ¯ ©;½ ª ¨!ëIº º(~_S S nn[Õ[`.´ 4Í ¯ ¦ ×à ? «UÄ ÀâsÑ ª §C Ä«i« ¡¬D Iº ~ìS A[B¯ IÀ H ¤Z ñ +*
-,
.
/31
546.
0/21
7/31
8/21
46.
/21
:9
;46.
1
*
=<
?>
>
>
?>
@
CB
A>
D>
iz u Ârq  §D«U¦ × q ¥¤ Ä ª ¦¶« Ë DªÆ «  ª ¨ ¬-L Ârq ¦¶«  «  ª ¨ Å/Ñ 4ð «Av ¬D « u Ë« DuªDƪ « h ¥¤ 6 « uL¼ QANvË ^-Sª ÆDÀ¼[ Ö ¦ ª Ã×ÃG¨ Ñv Ò ¯ xA S¶«]h®¦e§B[ë« ¯ u XrÅ KC  Zª h ¥¤ Àv ?hbm `F¤ Ñ h ¦e§Bħ « uxArS¶²£¬-L[& Âr¯ q ÀÖu?T¦ ª ² « Í uDª & }² &ÛIÀ HRà ø  ª Â0¨ xAS¶²£[ǯڲ Í uDª ÀIHRà ÷²}ø &ÛÀÑ Dªs×Ãà ÎzS vB[+¯ ° xnSq²£[¾´µrSq²£[+¯ ° ñ ^ S¾À ? ²£[Õ´R² @ ° ñ ²Î´R²w¯ ¿O ß C ñ^ ; t\mnQAHJIdNeORQAkoOW^Fkd@DbR@-VdKWgFbMKWVÕNüKWfnge@-kðKMVd@;?uKWQAange@-a)NeQK]kdNeX/NegeKWV£xìKZèRp ^ 1¯5xAS¶h v4[ ø IL?A@JQ ÎS Î[;¯ $S xnSqh ø v4[Õ[ǯ۰ ° xnSq² ø n[Õ´µrSq² ø n[Dß Sq¿Ap ¿[ )+*,.- /.1
=E
GF
H
JI
K
K
L* NM
M
à
z$u ð « ¯ nSqh ÎS Î[;¯ ° °
ð «sSqh ø v:[ Â Ø Â ¦ ª « Ë Y Ä ª ¦ ¤Z Å
¨W¦e§D« ¦ ¬ ÄF«U¦ ª`ª « u Ä ª ¦¶«\§ -Ä Â L Ñ ø :[;¯ h ^@ v ^ÁÑ uDªn ñ ° ñ Sq² ^ @ ^ [A´R²G´+:¯ ° ñ ² ^ ´² @ ° ñ ^ ´+:¯ à ßC xnSq² n[Õ´µrSq² n[»¯ ° ø ø
)+*,.-0/.1 54 4 LM Jx v M
;
;
;
>&?A@ ¢G£ OW^h NPk&aA@JtAQA@-aúIdOrfG@ÎÎSqh Z[]KMkdkdmnXrNPQAlBIL?uKMI&ÎS % h % -[& p@)kÕ?uKWgPgVLKWVÕ@-gvèêXrKWéW@iX:mAHÃ?ãmAkÕ@)OM^(X/ORXr@JQILk&fG@JèROWQAa /¯±àFp
'
;
9;,/"Î9+%(*-" ,/0±!"$#&%(')%+*-,/. Áz- üzrm¤ h ßZßZß ø h  Lã  ª ¨ ÅjØ Â ¦  ¬ Ë §  ª ¨ ñâø ß-ßZß ø  Ã6×ê Ï â ñ ø §D«  ª «q§ « uDª ò ò h ó ¯ ó ÎSqh ó [âß SU¿Ap R[ ó ó ó )+*,.-0/.1 z$uRB 4o «Çh 6 7ìNeQnORXrNeKWgqSqí ø 9G[JÑ Ò Â «ä¦e§/« u Å Â ª ¥¤ h Ò î×Ã Ä Ë ¨ « Y «  ÖÖu ÂWË « « u ¨ q©;ª ¦¶«U¦ ª Ù $Sqh®[+¯Û°Ü²Y´µoÞÎSq²£[;¯ ²£ºÞiSq²£[;¯ ò ² í² 9 ¹ S¾Àw?}9G[ ò ï ¹ ¹ ¹D<ô ; ¬ Ä«)« ¦e§w¦e§ ªs «  ªÚ  § Y §DÄÅ « Ø ÂWË Ä Â « Ñ m ª §D«  ¨ )ªs « «  «]h ¯ ¸Ûòóô ñ h ó Í uDL h ó ¯³Àú¦ ¤ « u « §Ã§å¦e§ u  ¨§  ª ¨Bh ó ¯ « uD Í+¦e§ Ñ uDª ÎSqh ó [¼¯ S 9 = ÀA[ @ÛSdS¥wÀ ?}9G[ =} [;¯ 9  ª ¨iiS¶h®[+¯ÚÎS ¸1ó h ó [ǯ ¸1ó iS¶h ó [+¯1í 9oÑ Áz- ürz 4ð «oh ñâø ßZßZß ø h ò ¬D ¦ ª ¨ çÖuDª ¨ Dª «  ª ¨ ÅÜØ Â ¦  ¬ Ë §-Ñ uDªn ò h ó ¯ ÎSqh ó [Dß SU¿Ap ý [ óÐô ñ ó ¼OWIdNeHJ@/Id?uKMI)IL?A@rkdmAX/X`KILNeOWQ®VÕmAge@åaAO@-kQAOWIVd@ mnNeVd@rNeQAan@-EG@-QAaA@JQAH-@`fAmnI)IL?A@ X:mAgvILNPEAgeNPHZKMIdNeORQ/VdmAgP@$aAO@-k-p
J
!
F
#"
%$
&
J
'
(
)
)
*
,+
O
0 w l
| ~ Õz iS¶h ¼z ezú Ã}á| -z ÎSqh ¯É$Sqh®[ Ý ¯ Ý ¯ ¡z | ¾ YzD~L} ¼z ¾z h Ý ¼z ez ez ¼ ez ¶~¥z| +z ~ Áz U} | -zW
, $ ? , $>,b- , $ , $ & ? ? l, $ - 3$ % ? % , $ $ , $ L$ / , - ' r, V, & &
w
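The law-of-large-numbers interpretation mentioned above is easy to see numerically. The sketch below (Python with NumPy; the language and the seed are our choices, not the text's) averages many IID draws and compares the result with the exact means from the examples; the Cauchy case shows what happens when the mean does not exist.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000

    x = rng.uniform(-1, 3, size=n)           # X ~ Unif(-1, 3), E(X) = 1
    print(x.mean())                          # close to 1

    y = np.exp(rng.uniform(0, 1, size=n))    # Y = e^X with X ~ Unif(0, 1), E(Y) = e - 1
    print(y.mean(), np.e - 1)                # the two numbers should be close

    c = rng.standard_cauchy(size=n)          # the Cauchy mean does not exist:
    print(c.mean())                          # this average does not settle down across repeated runs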
Variance and Covariance

The variance measures the "spread" of a distribution.

Definition. Let X be a random variable with mean µ. The variance of X, denoted by σ², σ_X², V(X) or VX, is defined by
σ² = E(X − µ)² = ∫ (x − µ)² dF(x),
assuming this expectation exists. The standard deviation is sd(X) = √V(X) and is also denoted by σ and σ_X.

Theorem. Assuming the variance is well defined, it has the following properties:
1. V(X) = E(X²) − µ².
2. If a and b are constants, then V(aX + b) = a² V(X).
3. If X₁, . . . , Xₙ are independent and a₁, . . . , aₙ are constants, then
V( Σᵢ aᵢ Xᵢ ) = Σᵢ aᵢ² V(Xᵢ).

Example. Let X ∼ Binomial(n, p). We write X = Σᵢ Xᵢ where Xᵢ = 1 if toss i is heads and Xᵢ = 0 otherwise; the Xᵢ are independent with P(Xᵢ = 1) = p and P(Xᵢ = 0) = 1 − p. Recall that E(Xᵢ) = p. Now,
E(Xᵢ²) = (p × 1²) + ((1 − p) × 0²) = p,
so V(Xᵢ) = E(Xᵢ²) − p² = p(1 − p). Finally, V(X) = V(Σᵢ Xᵢ) = Σᵢ V(Xᵢ) = np(1 − p). Notice that V(X) = 0 if p = 1 or p = 0. Make sure you see why this makes intuitive sense.

If X₁, . . . , Xₙ are random variables, then we define the sample mean to be
X̄ₙ = (1/n) Σᵢ Xᵢ
and the sample variance to be
Sₙ² = (1/(n − 1)) Σᵢ (Xᵢ − X̄ₙ)².

Theorem. Let X₁, . . . , Xₙ be IID and let µ = E(Xᵢ), σ² = V(Xᵢ). Then
E(X̄ₙ) = µ,   V(X̄ₙ) = σ²/n   and   E(Sₙ²) = σ².

If X and Y are random variables, then the covariance and correlation between X and Y measure how strong the linear relationship is between X and Y.

Definition. Let X and Y be random variables with means µ_X and µ_Y and standard deviations σ_X and σ_Y. Define the covariance between X and Y by
Cov(X, Y) = E[ (X − µ_X)(Y − µ_Y) ]
and the correlation by
ρ = ρ_{X,Y} = ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y).

Theorem. The covariance satisfies
Cov(X, Y) = E(XY) − E(X)E(Y).
The correlation satisfies −1 ≤ ρ(X, Y) ≤ 1. If Y = aX + b for some constants a and b, then ρ(X, Y) = 1 if a > 0 and ρ(X, Y) = −1 if a < 0. If X and Y are independent, then Cov(X, Y) = ρ = 0. The converse is not true in general.

Theorem. V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y) and V(X − Y) = V(X) + V(Y) − 2 Cov(X, Y). More generally, for random variables X₁, . . . , Xₙ,
V( Σᵢ aᵢ Xᵢ ) = Σᵢ aᵢ² V(Xᵢ) + 2 ΣΣ_{i<j} aᵢ aⱼ Cov(Xᵢ, Xⱼ).
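The statements about the sample mean and sample variance are easy to check empirically. The sketch below (Python with NumPy; our own illustration, with an arbitrary Normal population) repeats the sampling many times: the average of X̄ₙ is close to µ, its variance is close to σ²/n, and the average of Sₙ² is close to σ².

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 2.0, 3.0
    n, reps = 50, 100_000

    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)          # sample variance with the 1/(n-1) convention

    print(xbar.mean(), mu)              # E(Xbar_n) = mu
    print(xbar.var(), sigma**2 / n)     # V(Xbar_n) = sigma^2 / n
    print(s2.mean(), sigma**2)          # E(S_n^2) = sigma^2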
Expectation and Variance of Important Random Variables

Here we record the expectation and variance of some important random variables.

Distribution                     Mean            Variance
Point mass at a                  a               0
Bernoulli(p)                     p               p(1 − p)
Binomial(n, p)                   np              np(1 − p)
Geometric(p)                     1/p             (1 − p)/p²
Poisson(λ)                       λ               λ
Uniform(a, b)                    (a + b)/2       (b − a)²/12
Normal(µ, σ²)                    µ               σ²
Exponential(β)                   β               β²
Gamma(α, β)                      αβ              αβ²
Beta(α, β)                       α/(α + β)       αβ/[(α + β)²(α + β + 1)]
t with ν degrees of freedom      0 (if ν > 1)    ν/(ν − 2) (if ν > 2)
Multinomial(n, p)                np              see below
Multivariate Normal(µ, Σ)        µ               Σ

We derived the mean and variance of the Binomial in the previous section. The calculations for some of the others are in the exercises.

The last two entries in the table are multivariate models which involve a random vector X of the form X = (X₁, . . . , X_k)ᵀ. The mean of a random vector X is defined by µ = (µ₁, . . . , µ_k)ᵀ = (E(X₁), . . . , E(X_k))ᵀ. The variance-covariance matrix Σ is the k × k matrix whose (i, j) element is Cov(Xᵢ, Xⱼ); its diagonal elements are the variances V(Xᵢ).

If X ∼ Multinomial(n, p), then E(X) = np = n(p₁, . . . , p_k) and the variance-covariance matrix has diagonal entries npⱼ(1 − pⱼ) and off-diagonal entries −npᵢpⱼ. To see this, note that the marginal distribution of any one component of the vector is Xⱼ ∼ Binomial(n, pⱼ), so E(Xⱼ) = npⱼ and V(Xⱼ) = npⱼ(1 − pⱼ). Also, Xᵢ + Xⱼ ∼ Binomial(n, pᵢ + pⱼ); equating V(Xᵢ + Xⱼ) = V(Xᵢ) + V(Xⱼ) + 2 Cov(Xᵢ, Xⱼ) with n(pᵢ + pⱼ)(1 − pᵢ − pⱼ) and solving gives Cov(Xᵢ, Xⱼ) = −npᵢpⱼ. If X ∼ N(µ, Σ), then E(X) = µ and V(X) = Σ.

Finally, here is a lemma that is useful for finding means and variances of linear combinations of multivariate random vectors.

Lemma. If a is a vector and X is a random vector with mean µ and variance Σ, then E(aᵀX) = aᵀµ and V(aᵀX) = aᵀΣa. If A is a matrix, then E(AX) = Aµ and V(AX) = AΣAᵀ.

Conditional Expectation

Suppose that X and Y are random variables. What is the mean of X among those times when Y = y? The answer is that we compute the mean of X as before, but we substitute f_{X|Y}(x | y) for f_X(x) in the definition of expectation.

Definition. The conditional expectation of X given Y = y is
E(X | Y = y) = Σ_x x f_{X|Y}(x | y) in the discrete case, and ∫ x f_{X|Y}(x | y) dx in the continuous case.
If r(x, y) is a function of x and y, then E(r(X, Y) | Y = y) is defined similarly with r(x, y) in place of x.

Whereas E(X) is a number, E(X | Y = y) is a function of y. Before we observe Y, we don't know the value of E(X | Y = y), so it is a random variable which we denote E(X | Y). In other words, E(X | Y) is the random variable whose value is E(X | Y = y) when Y = y. Similarly, E(r(X, Y) | Y) is the random variable whose value is E(r(X, Y) | Y = y) when Y = y. This is a confusing point, so let us look at an example.

Example. Suppose we draw X ∼ Unif(0, 1). After we observe X = x, we draw Y | X = x ∼ Unif(x, 1). Intuitively, we expect that E(Y | X = x) = (1 + x)/2. In fact, f_{Y|X}(y | x) = 1/(1 − x) for x < y < 1 and
E(Y | X = x) = ∫ₓ¹ y/(1 − x) dy = (1 + x)/2,
as expected. Thus, E(Y | X) = (1 + X)/2. Notice that E(Y | X) = (1 + X)/2 is a random variable whose value is the number E(Y | X = x) = (1 + x)/2 once X = x is observed.

Theorem (The Rule of Iterated Expectations). For random variables X and Y, assuming the expectations exist, we have
E[ E(Y | X) ] = E(Y)   and   E[ E(X | Y) ] = E(X).
More generally, for any function r(x, y),
E[ E(r(X, Y) | X) ] = E(r(X, Y)).

Proof. We prove the first equation. Using the definition of conditional expectation and the fact that f(x, y) = f(x) f(y | x),
E[ E(Y | X) ] = ∫ E(Y | X = x) f_X(x) dx = ∫∫ y f(y | x) f(x) dy dx = ∫∫ y f(x, y) dy dx = E(Y).

Example. Consider the example above. How can we compute E(Y)? One method is to find the joint density f(x, y) and then compute E(Y) = ∫∫ y f(x, y) dy dx. An easier way is to do this in two steps. We already know that E(Y | X) = (1 + X)/2. Thus,
E(Y) = E[ E(Y | X) ] = E( (1 + X)/2 ) = (1 + E(X))/2 = (1 + 1/2)/2 = 3/4.

Definition. The conditional variance is defined as
V(Y | X = x) = ∫ ( y − µ(x) )² f(y | x) dy   where µ(x) = E(Y | X = x).

Theorem. For random variables X and Y,
V(Y) = E[ V(Y | X) ] + V[ E(Y | X) ].

Example. Draw a county at random from the United States. Then draw n people at random from the county. Let X be the number of those people who have a certain disease. If Q denotes the proportion of people in that county with the disease, then Q is also a random variable since it varies from county to county. Given Q = q, we have X ∼ Binomial(n, q). Thus, E(X | Q = q) = nq and V(X | Q = q) = nq(1 − q). Suppose that the random variable Q has a Uniform(0, 1) distribution. Then E(X) = E[E(X | Q)] = E(nQ) = n E(Q) = n/2. Let us compute the variance of X. Now, V(X) = E[V(X | Q)] + V[E(X | Q)]. Let us compute these two terms. First, E[V(X | Q)] = E[nQ(1 − Q)] = n ∫₀¹ q(1 − q) dq = n/6. Next, V[E(X | Q)] = V(nQ) = n² V(Q) = n² ∫₀¹ (q − 1/2)² dq = n²/12. Hence, V(X) = (n/6) + (n²/12).

Moment Generating Functions

Definition. The moment generating function (mgf), or Laplace transform, of X is defined by
ψ_X(t) = E(e^{tX}) = ∫ e^{tx} dF(x),
where t varies over the real numbers.

In what follows, we assume that the mgf is well defined for all t in some open interval around t = 0. (A related function is the characteristic function, defined by E(e^{itX}) where i = √−1; this function is always well defined for all t.) The mgf is useful for several reasons. First, it helps us compute the moments of a distribution. Second, it helps us find the distribution of sums of random variables. Third, it is used to prove the central limit theorem which we discuss later.

When the mgf is well defined, it can be shown that we can interchange the operations of differentiation and "taking expectation." This leads to
ψ′(0) = [ d/dt E(e^{tX}) ]_{t=0} = E[ d/dt e^{tX} ]_{t=0} = E(X).
By taking further derivatives we conclude that ψ^{(k)}(0) = E(X^k). This gives us a method for computing the moments of a distribution.

Example. Let X ∼ Exp(1). For any t < 1,
ψ_X(t) = E(e^{tX}) = ∫₀^∞ e^{tx} e^{−x} dx = ∫₀^∞ e^{(t−1)x} dx = 1/(1 − t).
The integral is divergent if t ≥ 1, so ψ_X(t) = 1/(1 − t) for all t < 1. Now, ψ′(0) = 1 and ψ″(0) = 2. Hence, E(X) = 1 and V(X) = E(X²) − µ² = 2 − 1 = 1.

Lemma (Properties of the mgf).
(1) If Y = aX + b, then ψ_Y(t) = e^{bt} ψ_X(at).
(2) If X₁, . . . , Xₙ are independent and Y = Σᵢ Xᵢ, then ψ_Y(t) = ∏ᵢ ψᵢ(t), where ψᵢ is the mgf of Xᵢ.

Example. Let X ∼ Binomial(n, p). As before, we know that X = Σᵢ Xᵢ where P(Xᵢ = 1) = p and P(Xᵢ = 0) = 1 − p. Now ψᵢ(t) = E(e^{tXᵢ}) = (p × e^t) + ((1 − p) × 1) = pe^t + q where q = 1 − p. Thus, ψ_X(t) = ∏ᵢ ψᵢ(t) = (pe^t + q)ⁿ.

Theorem. Let X and Y be random variables. If ψ_X(t) = ψ_Y(t) for all t in an open interval around 0, then X and Y have the same distribution.

Example. Let X₁ ∼ Binomial(n₁, p) and X₂ ∼ Binomial(n₂, p) be independent. Let Y = X₁ + X₂. Then
ψ_Y(t) = ψ₁(t) ψ₂(t) = (pe^t + q)^{n₁} (pe^t + q)^{n₂} = (pe^t + q)^{n₁+n₂},
and we recognize the latter as the mgf of a Binomial(n₁ + n₂, p) distribution. Since the mgf characterizes the distribution, we conclude that Y ∼ Binomial(n₁ + n₂, p).

Moment generating functions for some common distributions:
Bernoulli(p): pe^t + (1 − p).   Binomial(n, p): (pe^t + (1 − p))ⁿ.   Poisson(λ): e^{λ(e^t − 1)}.   Normal(µ, σ²): exp{ µt + σ²t²/2 }.   Gamma(α, β): (1/(1 − βt))^α for t < 1/β.
~_S [;¯ ñ S [ ^ S [;¯ :S 9 @ R[ S 9 @ W[ ¯ S 9 @ W[ ò ò ò ò  ª¦ ªs¨/×ÃÍ Î« uà Ã×ÃÅ dÆWÆÕ¤Bª ¦ ×à « u × « ËvD «Ó«¦ D §) « §$u« u¨W ¦e§DÅ « ÆÕ¦¤)¬ Ä¥¤ «Ó¦ ª 7_ÈNeQA¦UÑ OR X/Ñ$Nü« KMugqDSULí åñ × @ª í ^« ø ¬D9G [i¨Wªs¦e§D « « u¦ D¬ )Ä«Ó ¦ ªª Ñ ¨ Å Ø Â ¦  ¬ Ë Í ¦ Â×Ã:  §]« u §  ŠŠÆÕ¤ Ì Í $×ês× Ë ÄA¨ «  «$v  6 í+ Sqí ñ @îí ^ ø  9\[-Ï Ñ (!OWXr@JQI @-QA@JVLKMIdNeQAl tumAQAHDILNPORQê^qORV ORX/@ ìORX/XrORQ ÎNPkÕIdVdNefnmnILNPORQAk ÎNek¾ILVdNPfAmnIdNeORQ X/lW^ 7ì@-VÕQAORmAgPgeN;SqE\[ 9 @1S¾w À ?}9G[ 7ìNeQnORXrNeKWg»SU" Q # E\[ :S 9 @ÛS¾wÀ ?}9G[Õ[ ò ïuñ ORNPkdkÕORQ S £[ ¼ORVÕX`KMg;SUÝ # [ @âCFE GÝ @ ^ KMXrXrKwS # [ ï ^qOWV & !#]"$9;#¼* " ÀRp mAEnE£ORkÕ@äx_@äEAgüKZèåKilKWX/@ëx]?A@-VÕ@äx_@äkÕIÃKMVÕIÇx]NPIL? ëaAOWgegüKMVdk-p (ÎQr@-KWHÃ?`EAgeKZè:OW^ IL?n@YlKWX/@]èRORmê@JNPId?A@-VÇaAORmAfnge@YOWV_?uKMgP^sèWORmAV_XrORQn@Jè #x]NPId?ê@ muKMg£EAVÕORfuKWfnNegeNvIçèRp Ú?uKMIäNPkëèRORmAV]@âCnEG@-HDIL@Ja!^qORV¾ILmAQA@)KM^¶Id@-V&í6ILVÕNüKWgPPk { àFp ?AOxcIL?AKMI Sqh¡[㯠NP^$KWQAaOWQAgPèTNP^¼IL?A@JVd@!NekêK®H-ORQnkÕIÃKMQI ãkÕmAHÃ?÷IL?AKMI Sqh¯ J[;¯ ÀRp p u@JI¼h ñDø ßZßZß ø h 6 L¼QANv^qORVdXãS ø ÀÁ[]KWQAaúge@J I v ¯ XrK_C Zh ñDø ßZßZß ø h ` p t»NPQAa O ÎSzv [âp ò ò ò ò )+*,.- /.1
@
9
9
>
>
/
>
>
'
0/
>
>
>
E
&
N>
>
>
*
B
"!
#
+
%#
$#
à
(
¿Ap3EuKWVÕIdNeHJge@úk¾IÃKWV¾ILkîKIêIL?A@úORVdNPlRNeQTOW^iIL?A@úVd@ZKMgYgeNPQA@!KWQAa X/OÁbW@-kwKWgPORQAl®IL?A@ gPNeQA@iNeQ ÕmAX/EAk¼OM^+ORQA@mAQANPI-p t\OWV¼@-KWHÃ? ÕmAX/E!Id?A@:EAVÕORfuKWfANPgeNvIçèêNek9Id?uKMIäIL?A@ EuKMVÕILNPH-gP@¼x]NegPg ÕmnXrEêOWQA@$mAQANvIìILO:Id?A@$ge@D^¶I&KWQnawId?A@ÎEAVÕORfuKWfnNegeNvIçè/NekYÀT? 9îId?uKMI Id?A@iEuKWVÕIdNeHJge@Îx]NegPg ÕmAX/EwORQA@)mAQANvI&ILOåIL?A@iVdNPlR?IZp u@JI]h fG@)Id?A@iE£ORkÕNPIdNeORQîOW^ Id?A@4EuKMVÕILNPH-gP@:KM^¶Id@-VYímAQANvILk- p t»NeQAaã$S¶h [äKWQAa Sqh [Dpåò Sq>&?ANekÎNPkYéQAOx]Q¡KWk KBVLKMQAaAORX³xìKWgPéGp [ ò ò Fp ^UKWNeV)H-ORNPQ Nek)ILORkÕkd@JaTmnQIdNeg_Kú?A@-KWaNekORfnIÃKMNeQA@Jaop Ú?uKMI:NekId?A@ê@âCnEG@-HDIL@Ja QmAX:fG@-V]OW^oILOWkdkd@JkäId?uKMI&x]NPgegsfG@iVd@ mnNeVd@Ja_{ ý p +VdObW@>&?A@JORVd@JX ¿Ap ý ^qOWV&aANekÕH-Vd@DIL@iVLKMQAaAORX³bMKWVÕNüKWfnge@-kJp p u@JIåh f£@!K¡H-ORQILNPQmAORmAk/VLKWQnaAORX bMKWVÕNüKWfAgP@êx]NPId? ö 6µ/p mAEnE£ORkÕ@!Id?uKMI Sqh [ί À`KMQAa®Id?uKMIÎ$Sqh®[Y@âCFNekÕIdk-p ?AOxId?uKMI$ÎSqh®[Y¯ ½ ; f _S¶h ²£[¾´²ðp yäNeQIZû ìORQAkÕNeaA@JVNPQId@-lRVdKMILNPQAlãfè¡EAKWVÕIdk-pî>&?A@/^qORgegPOÁx]NPQAlê^UKWHDINPk?n@-geEF^qmAgÊû)NP^ ÎSqh¡[ì@DCFNek¾ILk]IL?A@JQ!gPNeX ¹ ² wÀ ? µrSq²£[ £¯ p np + VdObW@>&?A@JORVd@JX ¿ApvÀ ý p f [ u@JIåh ñâø h ^ ø ßZß-ß ø h fG @ S ø ÀÁ[:VLKWQAanORX bMKWVdNæ þ pS -0/ ~¥z )+*/Wz U} -¼zM|~ p [ KWfnge@-kKWQAaige@DI h ¯íðïuñ ¸ òóô ñ h ó p#+gPOWI h bW@-VÕò kdmAkoí4^qORVí!¯ À ø ßZßZß ø À ø p ]@-EG@ZKMI&^qOWVëh ñÃø ò h ^ ø ßZßZß ø h 6 ëKWmAHÃ?èðp »ò CFEAgüKMNeQîx]?èêIL?A@JVd@iNek&kÕmAHÃ?KBaANv^üæ ^q@JVd@-QnH-@Wp ò À p u @JI&K h 6 S ø ÀÁ[ëKMQAa!gP@JI v ¯ Þ p t»NeQnaîÎS vB[ëKWQAa S v:[âp ÀRÀRpS -0/ ~¥z g)+*/Wz U} -¼zM|~ \} - .1 ,~L}á| å~ Áz F~¥ , ÃzJ~ p [ u@JI v ñâø v ^ ø ßZßZßnf£@ NPQAaA@-EG@-QnaA@-QIVLKWQAanORXÉbMKWVdNeKWfAge@Jk$kdmAHÃ?IL?uKMI Szv ó ¯3ÀÁ[$¯ Szv ó ¯ ?)ÀÁ[i¯ IÀ HWàF p u@JIäh ¯ ¸ òóô ñ v ó p&>&?ANPQAéwOWS^ v ó ¯ À:KMk ¾Id?A@k¾ILOHÃéwEAVdNPH-@)NeQAHJVd@ZKMkd@-a fè6ORQn@raAOWgegüò KMV <#Av ó ¯ ?ÀrKWk ¾IL?n@/k¾ILOHÃé6EAVdNPH-@åaA@JH-Vd@-KWkd@Ja fè¡ORQA@åaAORgPgüKWV KWQnawh KWk&Id?A@$bWKMgemA@$OW^oIL?n@)k¾ILOHÃéêORQãauKZèîí»p SUK[st»NeQnò a`iS¶h [ëKWQAa Sqh [âp Sqf\[ NeX:mAgeKMIL@&h ò KWQAaêEAgPOWI;ò h bW@-VdkÕmAk&í!^qOWV_í!¯ À ø à ø ßZßZß ø À ø p ]@-EG@ZKMI Id?A@)x]?nORge@)kdNPX:mAògüKILNeOWQãkÕ@JbW@-VLKMògILNPXr@Jk-p äOWILNPH-@iIçxìO`Id?ANeQnlRk-pwt»NPVdk¾I #\NPIc! kä@ZKWk¾è IdO ÕkÕ@-@ úEuKMIÕIL@JVdQAkNeQ6IL?A@rkd@ mn@-QAHJ@ê@JbW@-QId?AORmAlR?NPIiNPkVdKWQAaAORXãp @-HJORQAa"#
,+
9
*
,
(
?>
&
*
,+
] (
O
èROWmx]NegPg(tuQAaId?uKMI)IL?A@/^qORmAViVdmAQnk4geOORébW@-VÕè®aANs@JVd@JQI4@JbW@-Q IL?nORmAlR?®IL?A@Dè x_@-Vd@ålR@JQA@-VdKMIL@Ja6IL?A@BkdKWXr@xëKZèRp y¼Ox aAOîIL?A@BH-KWgeHJmAgüKMIdNeORQnk¼NPQTSÓK[¼@âCFEAgüKWNPQ IL?n@)kÕ@-HJORQAaúOWfAkd@JVÕbMKMILNPOR_Q { ÀÁàFp +VdObR@åIL?n@/^qORVÕX:mAgeKWkÎlRNvbR@-Q®NeQ6Id?A@BIÃKWfAgP@åKMIiIL?A@åfG@-lRNPQAQANPQAlîOW^ @-HJIdNeORQ¡¿Ap ¿ ^qORV4Id?A@ 7ì@JVdQAORmngegezN # oORNekÕkdOR"Q #LäQANP^qORVÕX # (CFEGORQA@JQIdNüKWzg # KMXrXrKúKWQna 7_@JIÃKFp y¼@JVd@`KWVd@rkdORX/@`?ANPQILk-p tuORV$IL?A@rXr@-KWQ OW^ìIL?A@ ORNPkdkdOW" Q #(mAkÕ@`IL?A@/^UKWHJI)IL?AKMI ä¯ ¸ ¹Df <ô ; ¹ H² vp»>oOYHJORX/EAmnIL@ÇIL?A@ÇbMKWVdNeKWQAH-@ #tAVdkÕI»H-ORX/EAmnId@ÇÎSqh Sqh ?rÀÁ[d[âp tuORV:IL?n@îXr@-KWQ OW^¼Id?A@ KMXrXr K #+NPI4x]Negeg_?A@-gPETIdO¡X:mAgPIdNeEAgvèKWQAa aANPbNPaA@îfè ÇS F@ÀÁ|[ H £ñ KWQAaîmnkd@$IL?A@Î^UKMHJIëId?uKMI]K KWXrXrKan@-QAkÕNPIçèêNPQIL@-lWVLKMId@-këILOîÀWp tuORVëIL?n@ 7_@JIÃK #uX:mAgPIdNeEAgvèrKWQAawaANPbNPaA@$fè _S @ÀÁ[ ÇS |[ H ÇS @ @ÀÁ[âp À O p mAEnE£ORkÕ@:xì@BlW@-QA@JVLKMId@BK`VLKMQAaAORXcbMKWVdNeKWfAge@ NPQúIL?A@^qORgegPOÁx]NPQAlåxëKZèR0p t»NPVdk¾I x_@ uNPEK6^UKWNPVBH-ORNPQp ^YIL?A@wH-OWNeQTNekå?A@ZKMaAck #ìILKWéW@ ILO®?uKZbR@!K LäQANP^ÕS #PÀ[ aANPkÕILVÕNefAmFILNeOWQp ^£Id?A@ëH-OWNeQBNekIÃKWNPgeck #IÃKWéM@ ÷ILO$?AKbW@äK L¼QANv^ÕS O # ¿[oanNekÕIdVdNPfAmnILNPORQp SÓKRs[ t»NeQAaîId?A@iXr@-KWQwOW^ 4p SUfuw[ tNeQAaêId?A@ikÕIÃKMQAauKWVÕa!aA@DbNüKMIdNeORQwOW^ :p ÀZ¿Apwu@JIäh ñâø ßZßZß ø h ÚKWQAoa v ñÃø ßZßZß ø v f£@)VLKWQnaAORXÜbMKWVdNeKWfAgP@-k]KWQAaãge@DI ñDø ßZßZß ø KWQAa ñDø ßZßZß ø fG@)HJORQAk¾IÃKWQILkJp ò ?AOxÛIL?AKMI ò h ó ø ò v ¯ ò ó Sqh ó ø v [âß ó óô ñ Õô ñ óô ñ dô ñ À Fpwu@JI ºÁÞ ~ÇS¶² ø n[+¯ Rñ Sq²G@ A[ OM IL÷?A@J²VÕx] NekÕ@WÀ ß ø à t»NPQAa SÓàh ? O v5@ R[Dp À ý wp u@JIäVSüCA[&fG@iKB^qmAQAHJIdNeORQwOW^ðCãKWQnaãgP@JI]kZSqèn[]fG@)K:^qmAQAHJIdNeORQwOW^oèRp ?nOÁx1IL?uKI ÎS xAS¶h®[ zS v4[ % h®[+J¯ xnSqh®[ ÎS zS v4[ % h¡[Dß YgPkd
>
E
*
*
#
#
#
#
H
6H
H
H
&(
W¿
(
À p +VdObW@)Id?uKMI SvB[+¯Ú Szv % h®[A@ $Szv % h®[âß yäNeQIZû u@JI ¯ $Sv:[(KWQna/gP@JI S¶²£[_¯÷iSv % h¯²£[âp ¼OMIL@ëIL?AKMI(ÎS Sqh®[Õ[;¯ ;ÎzS v % h®[Y¯ÎSzv4[ί6ß"7ì@ZKWViNPQ6XrNPQAaIL?uKI 4NekiKê^qmAQAHDILNPORQ¡OW^_²ðp äOx x]VÕNPIL@ S v:[;¯ $Szv ?ã[k^ë¯ÚÎSÕSzv ? MS¶h®[d[p@S Sqh®["?ã[Õ[N^ß (CFEuKMQAaîIL?A@ k mAKWVd@ìKWQAa4ILKWéW@ÇIL?A@_@DCFE£@JHJILKMILNPORQp ÇORm4IL?n@-Q:?uKZbW@_ILOäIÃKWéM@ìIL?n@_@âCnEG@-HDIÃKMIdNeORQ OW^;Id?AVd@J@:IL@-VÕXrkJp çQ¡@-KWHÃ?¡HZKWkÕ@ #mAkÕ@åId?A@BVdmnge@:OW^+Id?A@BNPId@-VdKMIL@Ja6@DCFE£@JHJILKMILNPORQû NÓp @Mp»$SqkÕIdm Ç[;¯ÚÎS¶ÎSUk¾ILm % h®[d[âß À np ?nOÁx IL?AKMI$NP^+ÎSqh % v¯ n[]¯ )^qORVÎkÕORX/@BH-ORQAk¾IÃKWQI Id?A@-Q6h KWQna v KMVd@ mAQnH-ORVÕVd@-geKMIL@Jap À þ p]>&?ANPk mA@-k¾ILNPORQ¡NekYIdOî?A@-gPEèRORm¡mAQAan@-Vdk¾IÃKWQna®Id?A@BNeaA@-KêOW^_K -u Ó U » q¢G u@DI$h ñÃø ßZßZß ø h fG@/õÓõ¶ö x]NvIL?6X/@ZKWQ¡Ý KWQAabMKWVÕNüKWQnH-@ ^ p u@JI h ¯ íðïuñ ¸1ñ òóôh ó p¼>&?A@-Q òh NPkÎK JZn U J U #AIL?uKMIYNek #Kr^qmAQnHJILNPORQúOM^+IL?A@ auKò IÃKnp NeQAHJ@ h Nek&K4VdKWQAaAOWX bMò KWVÕNüKWfAgP@#NPIì?uKWk&K:anNekÕIdVdNPfAmnILNPORQpo>&?ANekìaANek¾ILVdNæ fAmFILNeOWQNek¼HZKWgPge@-òa!IL?A@`§ Â Å Ö Ë ¦ ªÆ ¨W¦e§D« ¦ ¬ Ä«U¦ ªT¥¤ « u §D«  «U¦e§D«Ó¦ × Ñ ä@JHZKWgPg^qVdOWX >&?A@JORVd@JXc¿ApPÀ ý IL?AKMIäÎS h [ì¯ ÝKWQAa S h [_¯ ^ HMí»p ÎOR$Q ! I¼H-ORQF^qmAkd@)IL?A@ aANPkÕIdVdNefnmnILNPORQ:OW^\IL?n@&auKMIÃK)ºò Þ KWQAaBIL?n@&aANek¾ILVdNPfAò mnIdNeORQ:OM^£Id?A@&kÕILKMILNPkÕILNPH]º Þ p>O XrKWéW@äIL?ANPkìHJge@ZKMcV #ge@JI_h ñâø ßZßZß ø h 6ML¼QnNP^qORVÕXúS ø ÀÁ[âp u@JI]ºÁÞ f£@YId?A@ÎaA@JQAkdNvIçè OW^nIL?A@LäQANP^qOWVdXãS ø ÀÁ[Dp+geOWI(ºÁÞYp ò äOÁxTgP@JI h ¯1íðïuñ ¸ òóÐô ñ h ó pt»NPQAaÎS h [ KWQna S h [Dp +geOMI»IL?n@-X KMk+KY^qmAQAHDILNeOWQ:OW^Gí»p ò ìORXrX/@-QI-p ¼Ox kdNPX:mAgüKIL@;IL?Aò @ aANPkÕIdVdNefnmnILNPò ORQiOW^ h ^qORVoí!¯ À ø ø à ø À p ì?A@JHÃé)Id?uKMIoIL?A@_kdNPX:mAgüKIL@-aibMKWgemn@-k OW^ÇiS h [)KWQna Sò h [)KMlRVd@J@rx]NPId?èWORmAVId?A@-OWVd@JIdNeH-KWg;HZKWgPH-mAgeKMILNPORQAk-p Ú?uKMI aAOBèWORm!ò QnOWILNPH-@)KWfGORmnI&ò IL?n@)kdKWX/EAgeNPQAl:aANPkÕILVÕNefAmFILNeOWQîOW^ h KWk]í¡NeQAHJVd@ZKMkd@-k { ò à p +VdObW@ u@-X/XrK:¿Apáà p
#
#
*
#
#
*
#
,
+
*
(
#
#
#+
*
*
Inequalities

Probability Inequalities

Inequalities are useful for bounding quantities that might otherwise be hard to compute. They will also be used in the theory of convergence which is discussed in the next chapter. Our first inequality is Markov's inequality.

Theorem (Markov's Inequality). Let X be a non-negative random variable and suppose that E(X) exists. For any t > 0,
P(X > t) ≤ E(X)/t.

Proof. Since X > 0,
E(X) = ∫₀^∞ x f(x) dx = ∫₀^t x f(x) dx + ∫_t^∞ x f(x) dx ≥ ∫_t^∞ x f(x) dx ≥ t ∫_t^∞ f(x) dx = t P(X > t).  ■

Theorem (Chebyshev's Inequality). Let µ = E(X) and σ² = V(X). Then,
P(|X − µ| ≥ t) ≤ σ²/t²   and   P(|Z| ≥ k) ≤ 1/k²,
where Z = (X − µ)/σ. In particular, P(|Z| > 2) ≤ 1/4 and P(|Z| > 3) ≤ 1/9.

Proof. We use Markov's inequality to conclude that
P(|X − µ| ≥ t) = P(|X − µ|² ≥ t²) ≤ E(X − µ)²/t² = σ²/t².
The second part follows by setting t = kσ.  ■

Example. Suppose we test a prediction method, a neural net for example, on a set of n new test cases. Let Xᵢ = 1 if the predictor is wrong and Xᵢ = 0 if the predictor is right. Then X̄ₙ = n⁻¹ Σᵢ Xᵢ is the observed error rate. Each Xᵢ may be regarded as a Bernoulli with unknown mean p. We would like to know the true, but unknown, error rate p. Intuitively, we expect that X̄ₙ should be close to p. How likely is X̄ₙ to not be within ε of p? We have V(X̄ₙ) = V(X₁)/n = p(1 − p)/n and
P(|X̄ₙ − p| > ε) ≤ V(X̄ₙ)/ε² = p(1 − p)/(nε²) ≤ 1/(4nε²),
since p(1 − p) ≤ 1/4 for all p. For ε = .2 and n = 100 the bound is .0625.

Hoeffding's Inequality

Hoeffding's inequality is similar in spirit to Markov's inequality but it is a sharper inequality. We present the result here in two parts. The proofs are in the technical appendix.

Theorem. Let Y₁, . . . , Yₙ be independent observations such that E(Yᵢ) = 0 and aᵢ ≤ Yᵢ ≤ bᵢ. Let ε > 0. Then, for any t > 0,
P( Σᵢ Yᵢ ≥ ε ) ≤ e^{−tε} ∏ᵢ e^{ t²(bᵢ − aᵢ)²/8 }.

Theorem. Let X₁, . . . , Xₙ ∼ Bernoulli(p). Then, for any ε > 0,
P( |X̄ₙ − p| > ε ) ≤ 2 e^{−2nε²},
where X̄ₙ = n⁻¹ Σᵢ Xᵢ.

Example. Let X₁, . . . , Xₙ ∼ Bernoulli(p). Let n = 100 and ε = .2. We saw that Chebyshev's inequality yielded
P(|X̄ₙ − p| > ε) ≤ .0625.
According to Hoeffding's inequality,
P(|X̄ₙ − p| > .2) ≤ 2 e^{−2(100)(.2)²} = .00067,
which is much smaller than .0625.

Hoeffding's inequality gives us a simple way to create a confidence interval for a binomial parameter p. We will discuss confidence intervals in detail later, but here is the basic idea. Fix α > 0 and let
εₙ = √( (1/(2n)) log(2/α) ).
By Hoeffding's inequality,
P( |X̄ₙ − p| > εₙ ) ≤ 2 e^{−2nεₙ²} = α.
Let C = (X̄ₙ − εₙ, X̄ₙ + εₙ). Then P(C does not contain p) = P(|X̄ₙ − p| > εₙ) ≤ α. Hence, P(p ∈ C) ≥ 1 − α; that is, the random interval C traps the true parameter value p with probability 1 − α. We call C a 1 − α confidence interval. More on this later.

Cauchy-Schwarz and Jensen Inequalities

Here we record two inequalities on expected values that are often useful.

Theorem (Cauchy-Schwarz Inequality). If X and Y have finite variances, then
E|XY| ≤ √( E(X²) E(Y²) ).

Recall that a function g is convex if for each x, y and each α ∈ [0, 1],
g(αx + (1 − α)y) ≤ α g(x) + (1 − α) g(y).
If g is twice differentiable, then convexity reduces to checking that g″(x) ≥ 0 for all x. It can be shown that if g is convex, then it lies above any line that touches g at some point, called a tangent line. A function g is concave if −g is convex. Examples of convex functions are g(x) = x² and g(x) = e^x. Examples of concave functions are g(x) = −x² and g(x) = log x.

Theorem (Jensen's Inequality). If g is convex, then
E g(X) ≥ g(E X).
If g is concave, then
E g(X) ≤ g(E X).

Proof. Let L(x) = a + bx be a line tangent to g(x) at the point E(X). Since g is convex, it lies above the line L(x). So
E g(X) ≥ E L(X) = a + b E(X) = L(E(X)) = g(E X).  ■

From Jensen's inequality we see that E(X²) ≥ (EX)² and E(1/X) ≥ 1/E(X). Since log is concave, E(log X) ≤ log E(X). For example, suppose that X has mean 3; then E(1/X) ≥ 1/3.

Technical Appendix: Proof of Hoeffding's Inequality

We will make use of the exact form of Taylor's theorem: if g is a smooth function, then there is a number ξ between 0 and u such that g(u) = g(0) + u g′(0) + (u²/2) g″(ξ).

Proof of the first theorem. For any t > 0, we have, from Markov's inequality, that
P( Σᵢ Yᵢ ≥ ε ) = P( t Σᵢ Yᵢ ≥ tε ) = P( e^{t Σᵢ Yᵢ} ≥ e^{tε} ) ≤ e^{−tε} E( e^{t Σᵢ Yᵢ} ) = e^{−tε} ∏ᵢ E( e^{tYᵢ} ).
Since aᵢ ≤ Yᵢ ≤ bᵢ, we can write Yᵢ as a convex combination of aᵢ and bᵢ, namely, Yᵢ = αbᵢ + (1 − α)aᵢ where α = (Yᵢ − aᵢ)/(bᵢ − aᵢ). So, by the convexity of e^{ty},
e^{tYᵢ} ≤ ((Yᵢ − aᵢ)/(bᵢ − aᵢ)) e^{tbᵢ} + ((bᵢ − Yᵢ)/(bᵢ − aᵢ)) e^{taᵢ}.
Take expectations of both sides and use the fact that E(Yᵢ) = 0 to get
E( e^{tYᵢ} ) ≤ (−aᵢ/(bᵢ − aᵢ)) e^{tbᵢ} + (bᵢ/(bᵢ − aᵢ)) e^{taᵢ} = e^{g(u)},
where u = t(bᵢ − aᵢ), g(u) = −γu + log(1 − γ + γe^u) and γ = −aᵢ/(bᵢ − aᵢ). Note that g(0) = g′(0) = 0. Also, g″(u) ≤ 1/4 for all u > 0. By Taylor's theorem, there is a ξ ∈ (0, u) such that
g(u) = g(0) + u g′(0) + (u²/2) g″(ξ) = (u²/2) g″(ξ) ≤ u²/8 = t²(bᵢ − aᵢ)²/8.
Hence, E(e^{tYᵢ}) ≤ e^{g(u)} ≤ e^{t²(bᵢ − aᵢ)²/8}, and the result follows.  ■

Proof of the Bernoulli version. Let Yᵢ = (1/n)(Xᵢ − p). Then E(Yᵢ) = 0 and a ≤ Yᵢ ≤ b where a = −p/n and b = (1 − p)/n, so (b − a)² = 1/n². Applying the first theorem with t = 4nε we get P(X̄ₙ − p ≥ ε) ≤ e^{−2nε²}. By a similar argument we can show that P(X̄ₙ − p ≤ −ε) ≤ e^{−2nε²}. Putting these together we get P(|X̄ₙ − p| > ε) ≤ 2 e^{−2nε²}.  ■

Bibliographic Remarks

An excellent reference on probability inequalities and their use in statistics and pattern recognition is Devroye, Györfi and Lugosi (1996). The proof of Hoeffding's inequality follows that text.

Exercises

1. Let X ∼ Exponential(β). Find P(|X − µ_X| ≥ k σ_X) for k > 1. Compare this to the bound you get from Chebyshev's inequality.
2. Let X ∼ Poisson(λ). Use Chebyshev's inequality to show that P(X ≥ 2λ) ≤ 1/λ.
3. Let X₁, . . . , Xₙ ∼ Bernoulli(p) and X̄ₙ = n⁻¹ Σᵢ Xᵢ. Bound P(|X̄ₙ − p| > ε) using Chebyshev's inequality and using Hoeffding's inequality. Show that, when n is large, the bound from Hoeffding's inequality is smaller than the bound from Chebyshev's inequality.
4. Let X₁, . . . , Xₙ ∼ Bernoulli(p).
(a) Let α > 0 be fixed and define εₙ = √( (1/(2n)) log(2/α) ). Let p̂ₙ = n⁻¹ Σᵢ Xᵢ and define Cₙ = (p̂ₙ − εₙ, p̂ₙ + εₙ). Use Hoeffding's inequality to show that P(Cₙ contains p) ≥ 1 − α. We call Cₙ a 1 − α confidence interval for p. In practice, we truncate the interval so it does not go below 0 or above 1.
(b) (Computer experiment.) Let's examine the properties of this confidence interval. Let α = 0.05 and p = 0.4. Conduct a simulation study to see how often the interval contains p (called the coverage). Do this for various values of n between 1 and 10,000. Plot the coverage versus n.
(c) Plot the length of the interval versus n. Suppose we want the length of the interval to be no more than .05. How large should n be?
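To see how much sharper Hoeffding's bound is in practice, here is a small numerical sketch (Python with NumPy; our own illustration, not part of the text). It compares the Chebyshev and Hoeffding bounds with a simulated estimate of P(|X̄ₙ − p| > ε) for Bernoulli(0.5) data with n = 100 and ε = 0.2, the setting of the example above.

    import numpy as np

    rng = np.random.default_rng(0)
    p, n, eps, reps = 0.5, 100, 0.2, 100_000

    xbar = rng.binomial(n, p, size=reps) / n          # sample means over many repetitions
    empirical = (np.abs(xbar - p) > eps).mean()

    chebyshev = p * (1 - p) / (n * eps ** 2)          # <= 1/(4 n eps^2) = 0.0625
    hoeffding = 2 * np.exp(-2 * n * eps ** 2)         # = 0.00067

    print(empirical, chebyshev, hoeffding)            # empirical <= hoeffding <= chebyshev here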
6 Convergence of Random Variables

6.1 Introduction

The most important aspect of probability theory concerns the behavior of sequences of random variables. This part of probability is called large sample theory or limit theory or asymptotic theory. This material is extremely important for statistical inference. The basic question is this: what can we say about the limiting behavior of a sequence of random variables X1, X2, X3, . . .? Since statistics and data mining are all about gathering data, we will naturally be interested in what happens as we gather more and more data.
In calculus we say that a sequence of real numbers xn converges to a limit x if, for every ε > 0, |xn − x| < ε for all large n. In probability, convergence is more subtle. Going back to calculus for a moment, suppose that xn = x for all n. Then, trivially, limn xn = x. Consider a probabilistic version of this example. Suppose that X1, X2, . . . is a sequence of random variables which are independent and suppose each has a N(0, 1) distribution. Since these all have the same distribution, we are
tempted to say that Xn "converges" to X ∼ N(0, 1). But this can't quite be right since P(Xn = X) = 0 for all n. (Two continuous random variables are equal with probability zero.) Here is another example. Consider X1, X2, . . . where Xn ∼ N(0, 1/n). Intuitively, Xn is very concentrated around 0 for large n. But P(Xn = 0) = 0 for all n. This chapter develops appropriate methods of discussing convergence of random variables. There are two main ideas in this chapter:
1. The law of large numbers says that the sample average X̄n = n⁻¹ Σᵢ₌₁ⁿ Xᵢ converges in probability to the expectation µ = E(X).
2. The central limit theorem says that the sample average has approximately a Normal distribution for large n. More precisely, √n (X̄n − µ) converges in distribution to a Normal(0, σ²) distribution, where σ² = V(X).
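Both ideas are easy to see in a short simulation. The sketch below (Python with NumPy; the language, the Bernoulli(0.5) population and the seed are our own choices, not the text's) shows the running sample average settling near µ, and the standardized averages behaving like Normal(0, p(1 − p)) draws.

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.5                                   # true mean of a Bernoulli(0.5) draw

    # Law of large numbers: the running average of one long sequence settles near p.
    x = rng.binomial(1, p, size=100_000)
    running_avg = np.cumsum(x) / np.arange(1, x.size + 1)
    print(running_avg[-1])                    # close to 0.5

    # Central limit theorem: across many repetitions of size n,
    # sqrt(n) * (Xbar_n - p) behaves like a Normal(0, p(1-p)) draw.
    n, reps = 1_000, 5_000
    xbars = rng.binomial(n, p, size=reps) / n
    z = np.sqrt(n) * (xbars - p)
    print(z.mean(), z.var())                  # near 0 and near p(1-p) = 0.25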
6.2 Types of Convergence

The two main types of convergence are defined as follows.
Definition 6.1 Let X1, X2, . . . be a sequence of random variables and let X be another random variable. Let Fn denote the cdf of Xn and let F denote the cdf of X.
1. Xn converges to X in probability, written Xn −P→ X, if, for every ε > 0,
P(|Xn − X| > ε) → 0    (6.1)
as n → ∞.
2. Xn converges to X in distribution, written Xn ⇝ X, if
lim_{n→∞} Fn(t) = F(t)    (6.2)
at all t for which F is continuous.
There is another type of convergence which we introduce mainly because it is useful for proving convergence in probability.
Definition 6.2 Xn converges to X in quadratic mean (also called convergence in L2), written Xn −qm→ X, if
E(Xn − X)² → 0    (6.3)
as n → ∞.
If X is a point mass at c, that is, P(X = c) = 1, we write Xn −qm→ c instead of Xn −qm→ X. Similarly, we write Xn −P→ c and Xn ⇝ c.
Example 6.3 Let Xn ∼ N(0, 1/n). Intuitively, Xn is concentrating at 0 so we would like to say that Xn ⇝ 0. Let's see if this is true. Let F be the distribution function for a point mass at 0. Note that √n Xn ∼ N(0, 1). Let Z denote a standard normal random variable. For t < 0, Fn(t) = P(Xn < t) = P(√n Xn < √n t) = P(Z < √n t) → 0 since √n t → −∞. For t > 0, Fn(t) = P(Xn < t) = P(√n Xn < √n t) = P(Z < √n t) → 1 since √n t → ∞. Hence, Fn(t) → F(t) for all t ≠ 0 and so Xn ⇝ 0. But notice that Fn(0) = 1/2 ≠ F(0) = 1 so convergence fails at t = 0. But that doesn't matter because t = 0 is not a continuity point of F and the definition of convergence in distribution only requires convergence at continuity points. ■
The next theorem gives the relationship between the types of convergence. The results are summarized in Figure 6.1.
Theorem 6.4 The following relationships hold:
(a) Xn −qm→ X implies that Xn −P→ X.
(b) Xn −P→ X implies that Xn ⇝ X.
(c) If Xn ⇝ X and if P(X = c) = 1 for some real number c, then Xn −P→ X.
In general, none of the reverse implications hold except the special case in (c).
Proof. We start by proving (a). Suppose that Xn −→ X. Fix ² > 0. Then, using Chebyshev’s inequality, E|Xn − X|2 → 0. P(|Xn − X| > ²) = P(|Xn − X| > ² ) ≤ ²2 Proof of (b). This proof is a little more complicated. You may skip if it you wish. Fix ² > 0 and let x be a continuity point of F . Then 2
2
Fn (x) = P(Xn ≤ x) = P(Xn ≤ x, X ≤ x + ²) + P(Xn ≤ x, X > x + ²) ≤ P(X ≤ x + ²) + P(|Xn − X| > ²) = F (x + ²) + P(|Xn − X| > ²).
Also, F (x − ²) = P(X ≤ x − ²) = P(X ≤ x − ², Xn ≤ x) + P(X ≤ x + ², Xn > x)
6.2 Types of Convergence
93
≤ Fn (x) + P(|Xn − X| > ²). Hence, F (x−²)−P(|Xn −X| > ²) ≤ Fn (x) ≤ F (x+²)+P(|Xn −X| > ²). Take the limit as n → ∞ to conclude that F (x − ²) ≤ lim inf Fn (x) ≤ lim sup Fn (x) ≤ F (x + ²). n→∞
n→∞
This holds for all ² > 0. Take the limit as ² → 0 and use the fact that F is continuous at x and conclude that limn Fn (x) = F (x). Proof of (c). Fix ² > 0. Then, P(|Xn − c| > ²) = P(Xn < c − ²) + P(Xn > c + ²)
≤ P(Xn ≤ c − ²) + P(Xn > c + ²) = Fn (c − ²) + 1 − Fn (c + ²)
→ F (c − ²) + 1 − F (c + ²) = 0 + 1 − 0 = 0.
Let us now show that the reverse implications do not hold. Convergence in probability does not imply convergence in quadratic mean. Let U ∼ Unif(0, 1) and let √ √ Xn = nI(0,1/n) (U ). Then P(|Xn | > ²) = P( nI(0,1/n) (U ) > P
²) = P(0 ≤ U < 1/n) = 1/n → 0. Hence, Then Xn −→ 0. But R 1/n E(Xn2 ) = n 0 du = 1 for all n so Xn does not converge in quadratic mean. Convergence in distribution does not imply convergence in probability. Let X ∼ N (0, 1). Let Xn = −X for n = 1, 2, 3, . . .; hence Xn ∼ N (0, 1). Xn has the same distribution function as X for all n so, trivially, limn Fn (x) = F (x) d for all x. Therefore, Xn → X. But P(|Xn − X| > ²) = P(|2X| > ²) = P(|X| > ²/2) 6= 0. So Xn does not tend to X in probability. ¥
94
6. Convergence of Random Variables
point-mass distribution quadratic mean
probability
distribution
FIGURE 6.1. Relationship between types of convergence. P
We can conclude that E(Xn ) → b if Xn is uniformly integrable. See the technical appendix.
Warning! One might conjecture that if Xn −→ b then E(Xn ) → b. This is not true. Let Xn be a random variable defined by P(Xn = n2 ) = 1/n and P(Xn = 0) = 1 − (1/n). Now, P(|Xn | < P ²) = P(Xn = 0) = 1 − (1/n) → 1. Hence, Z −→ 0. However, E(Xn ) = [n2 ×(1/n)]+[0×(1−(1/n))] = n. Thus, E(Xn ) → ∞. Summary. Stare at Figure 6.1. Some convergence properties are preserved under transformations. Theorem 6.5 Let Xn , X, Yn , Y be random variables. Let g be a continuous function. P P P (a) If Xn −→ X and Yn −→ Y , then Xn + Yn −→ X + Y . qm qm qm (b) If Xn −→ X and Yn −→ Y , then Xn + Yn −→ X + Y . (c) If Xn à X and Yn à c, then Xn + Yn à X + c. P P P (d) If Xn −→ X and Yn −→ Y , then Xn Yn −→ XY . (e) If Xn à X and Yn à c, then Xn Yn à cX. P P (f ) If Xn −→ X then g(Xn )−→ g(X). (g) If Xn à X then g(Xn ) à g(X).
6.3 The Law of Large Numbers Now we come to a crowning achievement in probability, the law of large numbers. This theorem says that the mean of a large sample is close to the mean of the distribution. For example, the proportion of heads of a large number of tosses is expected to be close to 1/2. We now make this more precise.
6.3 The Law of Large Numbers
95
Let X1 , X2 , . . . , be an iid sample and let µ = E(X1 ) and σ = V(X1 ). Recall that the sample mean is defined as X n = Note that µ = P E(Xi ) is the same n−1 ni=1 Xi and that E(X n ) = µ and V(X n ) = σ 2 /n. for all i so we can define µ = E(Xi ) for any i. By conTheorem 6.6 (The Weak Law of Large Numbers (WLLN).) P vention, we often If X1 , . . . , Xn are iid , then X n −→ µ. write µ = E(X1 ). 2
Interpretation of WLLN: The distribution of X n becomes more concentrated around µ as n gets large. Proof. Assume that σ < ∞. This is not necessary but it simplifies the proof. Using Chebyshev’s inequality, ¢ V(X n ) ¡ σ2 = P |X n − µ| > ² ≤ ²2 n²2
which tends to 0 as n → ∞. ¥
There is a stronger theorem in the appendix called the Example 6.7 Consider flipping a coin for which the probability strong law of large of heads is p. Let Xi denote the outcome of a single toss (0 numbers. or 1). Hence, p = P (Xi = 1) = E(Xi ). The fraction of heads after n tosses is X n . According to the law of large numbers, X n converges to p in probability. This does not mean that X n will numerically equal p. It means that, when n is large, the distribution of X n is tightly concentrated around p. Suppose that p = 1/2. How large should n be so that P (.4 ≤ X n ≤ .6) ≥ .7? First, E(X n ) = p = 1/2 and V(X n ) = σ 2 /n = p(1 − p)/n = 1/(4n). From Chebyshev’s inequality, P(.4 ≤ X n ≤ .6) = P(|X n − µ| ≤ .1)
= 1 − P(|X n − µ| > .1) 1 25 ≥ 1− = 1 − . 4n(.1)2 n
The last expression will be larger than .7 if n = 84. ¥
96
6. Convergence of Random Variables
6.4 The Central Limit Theorem Suppose that X1 , . . . , Xn are iid with mean µ and variance σ. P The central limit theorem (CLT) says that X n = n−1 i Xi has a distribution which is approximately Normal with mean µ and variance σ 2 /n. This is remarkable since nothing is assumed about the distribution of Xi , except the existence of the mean and variance. Theorem 6.8 (The Central Limit Theorem (CLT).) Let X1 , . . . , Xn P be iid with mean µ and variance σ 2 . Let X n = n−1 ni=1 Xi . Then √ n(X n − µ) Zn ≡ ÃZ σ where Z ∼ N (0, 1). In other words, Z z 1 2 √ e−x /2 dx. lim P(Zn ≤ z) = Φ(z) = n→∞ 2π −∞ Interpretation: Probability statements about X n can be approximated using a Normal distribution. It’s the probability statements that we are approximating, not the random variable itself. In addition to Zn à N (0, 1), there are several forms of notation to denote the fact that the distribution of Zn is converging to a Normal. They all mean the same thing. Here they are: Zn ≈ N (0, 1) µ ¶ σ2 X n ≈ N µ, n µ ¶ σ2 X n − µ ≈ N 0, n ¡ ¢ √ n(X n − µ) ≈ N 0, σ 2
6.4 The Central Limit Theorem
√
97
n(X n − µ) ≈ N (0, 1). σ
Example 6.9 Suppose that the number of errors per computer program has a Poisson distribution with mean 5. We get 125 programs. Let X1 , . . . , X125 be the number of errors in the programs. We want to approximate P(X < 5.5). Let µ = E(X1 ) = λ = 5 and σ 2 = V(X1 ) = λ = 5. Then, µ√ ¶ √ n(X n − µ) n(5.5 − µ) ≈ P(Z < 2.5) = .9938. < P(X < 5.5) = P σ σ √ The central limit theorem tells us that Zn = n(X − µ)/σ is approximately N(0,1). However, we rarely know σ. We can estimate σ 2 from X1 , . . . , Xn by n
Sn2 =
1 X (Xi − X n )2 . n − 1 i=1
This raises the following question: if we replace σ with Sn is the central limit theorem still true? The answer is yes. Theorem 6.10 Assume the same conditions as the CLT. Then, √ n(X n − µ) Ã N (0, 1). Sn You might wonder, how accurate the normal approximation is. The answer is given in the Berry-Ess`een theorem. Theorem 6.11 (Berry-Ess`een.) Suppose that E|X1 |3 < ∞. Then sup |P(Zn ≤ z) − Φ(z)| ≤ z
33 E|X1 − µ|3 √ 3 . 4 nσ
(6.4)
There is also a multivariate version of the central limit theorem.
98
6. Convergence of Random Variables
Theorem 6.12 (Multivariate central limit theorem) Let X1 , . . . , Xn be iid random vectors where X1i X2i Xi = .. . Xki
with mean
µ=
µ1 µ2 .. . µk
and variance matrix Σ. Let
where X i = n−1
Pn
=
X=
i=1
√
X1 X2 .. . Xk
X1i . Then,
E(X1i ) E(X2i ) .. . E(Xki )
.
n(X − µ) Ã N (0, Σ).
6.5 The Delta Method If Yn has a limiting Normal distribution then the delta method allows us to find the limiting distribution of g(Yn ) where g is any smooth function. Theorem 6.13 (The Delta Method) Suppose that √ n(Yn − µ) Ã N (0, 1) σ and that g is a differentiable function such that g 0 (µ) 6= 0. Then √ n(g(Yn ) − g(µ)) Ã N (0, 1). |g 0 (µ)|σ
6.5 The Delta Method
99
In other words, ¶ µ ¶ µ 2 σ2 0 2σ Yn ≈ N µ, =⇒ g(Yn ) ≈ N g(µ), (g (µ)) . n n Example 6.14 Let X1 , . . . , Xn be iid with finite mean µ and fi√ nite variance σ 2 . By the central limit theorem, n(X n )/σ Ã N (0, 1). Let Wn = eX n . Thus, Wn = g(X n ) where g(s) = es . Since g 0 (s) = es , the delta method implies that Wn ≈ N (eµ , e2µ σ 2 /n). ¥ There is also a multivariate version of the delta method. Theorem 6.15 (The Multivariate Delta Method) Suppose that Yn = (Yn1 , . . . , Ynk ) is a sequence of random vectors such that √
n(Yn − µ) Ã N (0, Σ).
Let g : Rk → R ane let
∂g ∂y1
∇g(y) = ... . ∂g ∂yk
Let ∇µ denote ∇g(y) evaluated at y = µ and assume that the elements of ∇µ are non-zero. Then √
¢ ¡ n(g(Yn ) − g(µ)) Ã N 0, ∇Tµ Σ∇µ .
Example 6.16 Let ¶ ¶ µ ¶ µ µ X1n X12 X11 , ..., , X2n X22 X21
be iid random vectors with mean µ = (µ1 , µ2 )T and variance Σ. Let n n 1X 1X X1 = X2 = X1i , X2i n i=1 n i=1
100
6. Convergence of Random Variables
and define Yn = X 1 X 2 . Thus, Yn = g(X 1 , X 2 ) where g(s1 , s2 ) = s1 s2 . By the central limit theorem, µ ¶ √ X 1 − µ1 n à N (0, Σ). X 2 − µ2 Now ∇g(s) =
Ã
!
=
¶µ
µ2 µ1
∂g ∂s1 ∂g ∂s2
µ
s2 s1
¶
and so ∇Tµ Σ∇µ
= (µ2 µ1 )
µ
σ11 σ12 σ12 σ22
¶
= µ22 σ11 + 2µ1 µ2 σ12 + µ21 σ22 .
Therefore, √
³ ´ 2 2 n(X 1 X 2 − µ1 µ2 ) Ã N 0, µ2 σ11 + 2µ1 µ2 σ12 + µ1 σ22 .
¥
6.6 Bibliographic Remarks Convergence plays a central role in modern probability theory. For more details, see Grimmet and Stirzaker (1982), Karr (1993) and Billingsley (1979). Advanced convergence theory is explained in great detail in van der Vaart and Wellner (1996) and van der Vaart (1998).
6.7 Technical Appendix 6.7.1 Almost Sure and L1 Convergence We say that Xn converges almost surely to X, written as Xn −→ X, if P({s : Xn (s) → X(s)}) = 1. L
1 We say that Xn converges in L1 to X, written Xn −→ X, if
E|Xn − X| → 0
6.7 Technical Appendix
101
as n → ∞. Theorem 6.17 Let Xn and X be random vaiables. Then: as P (a) Xn −→ X implies that Xn −→ X. qm L1 (b) Xn −→ X implies that Xn −→ X. L1 P (c) Xn −→ X implies that Xn −→ X. The weak law of large numbers says that X n converges to EX1 in probability. The strong law asserts that this is also true almost surely. Theorem 6.18 (The strong law of large numbers.) Let X1 , X2 , . . . as be iid. If µ = E|X1 | < ∞ then X n −→ µ. A sequence Xn is asymptotically uniformly integrable if lim lim sup E (|Xn |I(|Xn | > M )) = 0.
M →∞
n→∞
P
If Xn −→ b and Xn is asymptotically uniformly integrable, then E(Xn ) → b. 6.7.2 Proof of the Central Limit Theorem If X is a random variable, define its moment generating function (mgf) by ψX (t) = EetX . Assume in what follows that the mgf is finite in a neighborhood around t = 0. Lemma 6.19 Let Z1 , Z2 , . . . be a sequence of random variables. Let ψn the mgf of Zn . Let Z be another random variable and denote its mgf by ψ. If ψn (t) → ψ(t) for all t in some open interval around 0, then Zn à Z. proof of the central limit theorem. Let Yi = (Xi − P µ)/σ. Then, Zn = n−1/2 i Yi . Let ψ(t) be the mgf of Yi . The P √ mgf of i Yi is (ψ(t))n and mgf of Zn is [ψ(t/ n)]n ≡ ξn (t).
102
6. Convergence of Random Variables
Now ψ 0 (0) = E(Y1 ) = 0, ψ 00 (0) = E(Y12 ) = V ar(Y1 ) = 1. So, ψ(t) = ψ(0) + tψ 0 (0) +
t3 t2 00 ψ (0) + ψ 00 (0) + · · · 2! 3!
t2 t3 00 + ψ (0) + · · · 2 3! t2 t3 00 = 1 + + ψ (0) + · · · 2 3! = 1+0+
Now, · µ ¶¸n t ξn (t) = ψ √ n ¸n · 2 t3 t 00 + ψ (0) + · · · = 1+ 2n 3!n3/2 #n " t3 t2 00 ψ (0) + · · · + 3!n1/2 = 1+ 2 n → et
2 /2
which is the mgf of a N(0,1). The result follows from the previous Theorem. In the last step we used the fact that, if an → a then ³ an ´n → ea . 1+ n
6.8 Excercises 1. Let X1 , . . . , Xn be iid with finite mean µ = E(X1 ) and finite variance σ 2 = V(X1 ). Let X n be the sample mean and let Sn2 be the sample variance. (a) Show that E(Sn2 ) = σ 2 .
P P (b) Show that Sn2 −→ σ 2 . Hint: Show that Sn2 = cn n−1 ni=1 Xi2 − 2 dn X n where cn → 1 and dn → 1. Apply the law of large P numbers to n−1 ni=1 Xi2 and to X n . Then use part (e) of Theorem 6.5.
6.8 Excercises
103
2. Let X1 , X2 , . . . be a sequence of random variables. Show qm that Xn −→ b if and only if lim E(Xn ) = b and
n→∞
lim V(Xn ) = 0.
n→∞
3. Let X1 , . . . , Xn be iid and let µ = E(X1 ). Suppose that qm the variance is finite. Show that X n −→ µ. 4. Let X1 , X2 , . . . be a sequence of random variables such that ¶ µ 1 1 1 = 1 − 2 and P (Xn = n) = 2 . P Xn = n n n Does Xn converge in probability? Does Xn converge in quadratic mean? 5. Let X1 , . . . , Xn ∼ Bernoulli(p). Prove that n
1X 2 P X −→ p and n i=1 i
n
1 X 2 qm X −→ p. n i=1 i
6. Suppose that the height of men has mean 68 inches and standard deviation 4 inches. We draw 100 men at random. Find (approximately) the probability that the average height of men in our sample will be at least 68 inches. 7. Let λn = 1/n for n = 1, 2, . . .. Let Xn ∼ Poisson(λn ). P
(a) Show that Xn −→ 0.
P
(b) Let Yn = nXn . Show that Yn −→ 0. 8. Suppose we have a computer program consisting of n = 100 pages of code. Let Xi be the number of errors on the ith page of code. Suppose that the Xi0 s are Poisson with mean P 1 and that they are independent. Let Y = ni=1 Xi be the total number of errors. Use the central limit theorem to approximate P(Y < 90).
104
6. Convergence of Random Variables
9. Suppose that P(X = 1) = P(X = −1) = 1/2. Define Xn =
½
X with probability 1 − en with probability n1 .
1 n
Does Xn converge to X in probability? Does Xn converge to X in distribution? Does E(X − Xn )2 converge to 0? 10. Let Z ∼ N (0, 1). Let t > 0. (a) Show that, for any k > 0, P(|Z| > t) ≤
E|Z|k . tk
(b) (Mill’s inequality.) Show that ½ ¾1/2 −t2 /2 2 e P(|Z| > t) ≤ . π t Hint. Note that P(|Z| > t) = 2P(Z > t). Now write out what P(Z > t) means and note that x/t > 1 whenever x > t. 11. Suppose that Xn ∼ N (0, 1/n) and let X be a random variable with distribution F (x) = 0 if x < 0 and F (x) = 1 if x ≥ 0. Does Xn converge to X in probability? (Prove or disprove). Does Xn converge to X in distribution? (Prove or disprove). 12. Let X, X1 , X2 , X3 , . . . be random variables that are positive and integer valued. Show that Xn à X if and only if lim P(Xn = k) = P(X = k)
n→∞
for every integer k.
6.8 Excercises
105
13. Let Z1 , Z2 , . . . be i.i.d., random variables with density f . Suppose that P(Zi > 0) = 1 and that λ = limx↓0 f (x) > 0. Let Xn = n min{Z1 , . . . , Zn }. Show that Xn à Z where Z has an exponential distribution with mean 1/λ. 2
14. Let X1 , . . . , Xn ∼ Uniform(0, 1). Let Yn = X n . Find the limiting distribution of Yn . 15. Let
µ
X11 X21
¶ µ ¶ µ ¶ X12 X1n , , ..., X22 X2n
be iid random vectors with mean µ = (µ1 , µ2 ) and variance Σ. Let n n 1X 1X X1 = X1i , X 2 = X2i n i=1 n i=1 and define Yn = X 1 /X 2 . Find the limiting distribution of Yn .
Part II Statistical Inference
103
7 Models, Statistical Inference and Learning
7.1 Introduction Statistical inference, or \learning" as it is called in computer science, is the process of using data to infer the distribution that generated the data. The basic statistical inference problem is this: We observe X1 ; : : : ; Xn F . We want to infer (o r estimate o r learn) F o r some feature of F such as its mean.
7.2 Parametric and Nonparametric Models A statistical model is a set of distributions (or a set of densities) F. A parametric model is a set F that can be parameterized b y a nite n umber of parameters. For example, if we assume that the data come from a Normal distribution then
106
7. Models, Statistical Inference and Learning
the model is
F = f (x; ; ) =
1
p
exp
1 (x 2 2
)2 ; 2 R ; > 0 :
(7.1) This is a two-parameter model. We have written the density as f (x; ; ) to show that x is a value of the random variable whereas and are parameters. In general, a parametric model takes the form F = f (x; ) : 2 (7.2) where is an unknown parameter (or vector of parameters) that can take values in the parameter space . If is a vector but we are only interested in one component of , we call the remaining parameters nuisance parameters. A nonparametric model is a set F that cannot be parameterized by a nite number of parameters. For example, FALL = fall cdf 0 sg is nondistinction parametric.
The between parametric and nonparametric is more subtle than this but we don't need a rigorous de nition for our purposes.
Example 7.1 (One-dimensional Parametric Estimation.) Let X1 ; : : : ; Xn
be independent Bernoulli(p) observations. The problem is to estimate the parameter p.
Example 7.2 (Two-dimensional Parametric Estimation.) Suppose that
X1 ; : : : ; Xn F and we assume that the pdf f 2 F where F is given in (7.1). In this case there are two parameters, and . The goal is to estimate the parameters from the data. If we are only interested in estimating then is the parameter of interest and is a nuisance parameter.
Example 7.3 (Nonparametric estimation of the cdf.) Let X1 ; : : : ; Xn
be independent observations from a cdf F . The problem is to estimate F assuming only that F 2 FALL = fall cdf 0 sg. Example 7.4 (Nonparametric density estimation.) Let X1 ; : : : ; Xn
be independent observations from a cdf F and let f = F 0 be the
7.2 Parametric and Nonparametric Models
107
. Suppose we want to estimate the pdf f . It is not possible to estimate f assuming only that F 2 FALL . We need to assume some smoothness on f . For example, we might assume T that f 2 F = FDENS FSOB where FDENS is the set of all probability density functions and
pdf
FSOB = f :
Z
(f 00 (x))2 dx < 1 :
The class FSOB is called a Sobolev space; it is the set of functions that are not \too wiggly." Example 7.5 (Nonparametric estimation of functionals.) Let X1 ; : : : ; Xn R
F . Suppose we want to estimate = E (X1 ) = x dF (x) assuming only that exists. The mean may be thought of as a R function of F : we can write = T (F ) = x dF (x). In general, any function of F is called a statistical functional . Other R 2 examples of functions are the variance T (F ) = x dF (x) R 2 xdF (x) and the median T (F ) = F 1=2 .
Example 7.6 (Regression, prediction and classi cation.) Suppose we
observe pairs of data (X1 ; Y1 ); : : : (Xn ; Yn). Perhaps Xi is the blood pressure of subject i and Yi is how long they live. X is called a predictor or regressor or feature or independent variable. Y is called the outcome or the response variable or the dependent variable. We call r(x) = E (Y jX = x) the regression function. If we assume that f 2 F where F is nite dimensional { the set of straight lines for example { then we have a parametric regression model. If we assume that f 2 F where F is not nite dimensional then we have a parametric regression model. The goal of predicting Y for a new patient based on their X value is called prediction. If Y is discrete (for example, live or die) then prediction is instead called classi cation. If our goal is to estimate the functin f , then we call this regression or curve estimation. Regression models
108
7. Models, Statistical Inference and Learning
are sometimes written as
Y = f (X ) +
(7.3)
where E () = 0. We can always rewrite a regression model this way. To see this, de ne = Y f (X ) and hence Y = Y + f (X ) f (X ) = f (X )+. Moreover, E () = E E (jX ) = E (E (Y f (X ))jX ) = E (E (Y jX ) f (X )) = E (f (X ) f (X )) = 0.
It is traditional in most introductory courses to start with parametric inference. Instead, we will start with nonparametric inference and then we will cover parametric inference. In some respects, nonparametric inference is easier to understand and is more useful than parametric inference. What's Next?
There are many approaches to statistical inference. The two dominant approaches are called frequentist inference and Bayesian inference. We'll cover both but we will start with frequentist inference. We'll postpone a discussion of the pro's and con's of these two until later. Frequentists and Bayesians.
If F = ff (xR; ) : 2 g is a parametric model, we write P (X 2 A) = A f (x; )dx and E (r(X )) = R r(x)f (x; )dx. The subscript indicates that the probability or expectation is with respect to f (x; ); it does not mean we are averaging over . Similarly, we write V for the variance. Some Notation.
7.3 Fundamental Concepts in Inference Many inferential problems can be identi ed as being one of three types: estimation, con dence sets or hypothesis testing. We will treat all of these problems in detail in the rest of the book. Here, we give a brief introduction to the ideas. 7.3.1 Point Estimation
7.3 Fundamental Concepts in Inference
109
Point estimation refers to providing a single \best guess" of some quantity of interest. The quantity of interest could be a parameter in a parametric model, a cdf F , a probability density function f , a regression function r, or a prediction for a future value Y of some random variable. By convention, we denote a point estimate of by b. Remember that is a xed, unknown quantity. The estimate b depends on the data so b is a random variable.
More formally, let X1 ; : : : ; Xn be n iid data point from some distribution F . A point estimator bn of a parameter is some function of X1 ; : : : ; Xn:
bn = g (X1; : : : ; Xn ): We de ne
bias (bn ) = E (bn )
(7.4) to be the bias of bn . We say that bn is unbiased if E (bn ) = . Unbiasedness used to receive much attention but these days it is not considered very important; many of the estimators we will use are biased. A point estimator bn of a parameter is consistent if bn P! . Consistency is a reasonable requirement for estimators. The distribution of bn is called the sampling distribution. The standard deviation of bn is called the standard error, denoted by se : se = se (bn )
q = V (bn ):
(7.5)
Often, it is not possible to compute the standard error but usually we can estimate the standard error. The estimated standard error is denoted by seb . Example 7.7 Let X1 ; : : : ; Xn
Bernoulli(p) and let pbn = n
1 P Xi . i
P Then E (pbn ) = n 1 i E (Xi ) = p so pbn is unbiased. The standard
110
7. Models, Statistical Inference and Learning p
p
error is se = p V (pbn ) = p(1 error is seb = pb(1 pb)=n.
p)=n. The estimated standard
The quality of a point estimate is sometimes assessed by the mean squared error, or MSE, de ned by MSE = E (bn
)2 :
Recall that E () refers to expectation with respect to the distribution n Y f (x1 ; : : : ; xn ; ) = f (xi ; ) i=1
that generated the data. It does not mean we are averaging over a density for . Theorem 7.8 The MSE can be written as MSE = bias (bn )2 + V (bn ):
Proof.
E (bn
(7.6)
Let n = E (bn ). Then
)2 = E (bn n + n )2 = E (bn n)2 + 2(n )2 E (bn = (n )2 + E (bn n )2 = bias 2 + V (bn ):
n ) + E ( n
!P 0 and se ! 0 as n ! 1 then bn is consistent, that is, bn ! . Proof. If bias ! 0 and se ! 0 then, by Theorem 7.8, qm MSE ! 0. It follows that bn ! . (Recall de nition 6.3.) The result follows from part (b) of Theorem 6.4. Theorem 7.9 If bias
Example 7.10 Returning to the coin ipping example, we have p that E p (pbn ) = p so that bias = p p = 0 and se = p(1 p)=n !
0. Hence, pbn P! p, that is, pbn is a consistent estimator.
)2
7.3 Fundamental Concepts in Inference
111
Many of the estimators we will encounter will turn out to have, approximately, a Normal distribution. De nition 7.11 An estimator is asymptotically Nor-
mal if
bn
se
N (0; 1):
(7.7)
7.3.2 Con dence Sets
A 1 con dence interval for a parameter is an interval Cn = (a; b) where a = a(X1 ; : : : ; Xn ) and b = b(X1 ; : : : ; Xn ) are functions of the data such that P (
2 Cn) 1 ;
for all 2 :
(7.8)
In words, (a; b) traps with probability 1 . We call 1 the coverage of the con dence interval. Commonly, people use 95 per cent con dence intervals which corresponds to choosing = 0:05. Note: Cn is random and is xed! If is a vector then we use a con dence set (such a sphere or an ellipse) instead of an interval.
Warning! There is much confusion about how to interpret a con dence interval. A con dence interval is not a probability statement about since is a xed quantity, not a random variable. Some texts interpret con dence intervals as follows: if I repeat the experiment over and over, the interval will contain the parameter 95 per cent of the time. This is correct but useless since we rarely repeat the same experiment over and over. A better interpretation is this: On day 1, you collect data and construct a 95 per cent con dence interval for a parameter 1 . On day
112
7. Models, Statistical Inference and Learning 2, you collect new data and construct a 95 per cent con dence interval for an unrelated parameter 2 . On day 3, you collect new data and construct a 95 per cent con dence interval for an unrelated parameter 3 . You continue this way constructing con dence intervals for a sequence of unrelated parameters 1 ; 2 ; : : : Then 95 per cent your intervals will trap the true parameter value. There is no need to introduce the idea of repeating the same experiment over and over.
Example 7.12 Every day, newspapers report opinion polls. For
example, they might say that \83 per cent of the population favor arming pilots with guns." Usually, you will see a statement like \this poll is accurate to within 4 points 95 per cent of the time." They are saying that 83 4 is a 95 per cent con dence interval for the true but unknown proportion p of people who favor arming pilots with guns. If you form a con dence interval this way everyday for the rest of your life, 95 per cent of your intervals will contain the true parameter. This is true even though you are estimating a dierent quantity (a dierent poll question) every day.
Later, we will discuss Bayesian methods in which we treat as if it were a random variable and we do make probability statements about . In particular, we will make statements like \the probability that is Cn, given the data, is 95 per cent." However, these Bayesian intervals refer to degree-of-belief probabilities. These Bayesian intervals will not, in general, trap the parameter 95 per cent of the time. Example 7.13 In the coin ipping setting, let Cn = (pbn n ; pbn +
n ) where 2n = log(2=)=(2n). From Hoeding's inequality (5.4) it follows that P(p 2 Cn ) 1 for every p. Hence, Cn is a 1 con dence interval.
7.3 Fundamental Concepts in Inference
113
As mentioned earlier, point estimators often have a limiting Normal distribution, meaning that equation (7.7) holds, that is, bn N (; seb 2 ). In this case we can construct (approximate) con dence intervals as follows. Theorem 7.14 (Normal-based Con dence Interval.) Suppose that bn N (; seb 2 ). Let be the cdf of a standard Normal and
let z=2 = 1 (1 (=2)), that is, P(Z > z=2 ) = =2 and P( z=2 < Z < z=2 ) = 1 where Z N (0; 1). Let
Cn = (bn Then
z=2 seb ; bn + z=2 seb ):
P (
2 Cn) ! 1 :
(7.10)
Let Zn = (bn )=seb . By assumption Zn Z N (0; 1). Hence, Proof.
P (
2 Cn)
(7.9)
=
P bn
=
P
!
P
z=2 seb < z=2 <
bn
b se
z=2 < Z < z=2 = 1 :
Z where
< bn + z=2 seb !
< z=2
For 95 per cent con dence intervals, = 0:05 and z=2 = 1:96 2 leading to the approximate 95 per cent con dence interval bn 2seb . We will discuss the construction of con dence intervals in more generality in the rest of the book. Example 7.15 Let X1 ; : : : ; Xn P
Bernoulli( p) and let pbn = n P
1 Pn Xi . i=1
2 n p(1 p) = n 2 np(1 Then V (pbn ) = n 2 ni=1 V (Xp i) = n i=1 p p) = p(1 p)=n. Hence, se = p(1 p)=n and seb = pbn (1 pbn )=n. By the Central Limit Theorem, pbn N (p; se b 2 ). Therefore, an approximate 1 con dence interval is
pb z=2 seb = pb z=2
r
pbn (1 pbn ) : n
114
7. Models, Statistical Inference and Learning
Compare this with the con dence interval in the previous example. The Normal-based interval is shorter but it only has approximately (large sample) correct coverage. 7.3.3 Hypothesis Testing
The term \retaining the null hypothesis" is due to Chris Genovese. Other terminology is \accepting the null" or \failing to reject the null."
In hypothesis testing, we start with some default theory { called a null hypothesis { and we ask if the data provide suÆcient evidence to reject the theory. If not we retain the null hypothesis. Example 7.16 (Testing if a Coin is Fair) Suppose X1 ; : : : ; Xn Bernoulli(p) denote n independent coin ips. Suppose we want to test if the coin is fair. Let H0 denote the hypothesis that the coin is fair and let H1 denote the hypothesis that the coin is not fair. H0 is called the null hypothesis and H1 is called the alternative hypothesis. We can write the hypotheses as
H0 : p = 1=2 versus H1 : p 6= 1=2: It seems reasonable to reject H0 if T = jpbn (1=2)j is large. When we discuss hypothesis testing in detail, we will be more precise about how large T should be to reject H0 .
7.4 Bibliographic Remarks Statistical inference is covered in many texts. Elementary texts include DeGroot and Schervish (2001) and Larsen and Marx (1986). At the intermediate level I recommend Casella and Berger (2002) and Bickel and Doksum (2001). At the advanced level, Lehmann and Casella (1998), Lehmann (1986) and van der Vaart (1998).
7.5 Technical Appendix
7.5 Technical Appendix
115
Our de nition of con dence interval requires that P ( 2 Cn ) 1 for all 2 . An pointwise asymptotic con dence interval requires that lim infn!1 P ( 2 Cn ) 1 for all 2 . An uniform asymptotic con dence interval requres that lim infn!1 inf 2 P ( 2 Cn ) 1 . The approximate Normal-based interval is a pointwise asymptotic con dence interval. In general, it might not be a uniform asymptotic con dence interval.
116
7. Models, Statistical Inference and Learning
8 Estimating the
cdf
and Statistical
Functionals
The rst inference problem we will consider is nonparametric estimationthe of cdf F and functions of the cdf.
8.1 The Empirical Distribution Function Let X1 ; : : : ; Xn F be iid where F is a distribution function on the real line. We will estimate F with the empirical distribution function, which is de ned as follo ws. De nition 8.1 The empirical distribution function Fbn
is the cdf that puts mass 1=n at e ach data point Xi . Formally,
Fbn (x) =
Pn i=1 I (Xi
x)
n n umber of observations less than or equal to x = (8.1) n
where
I (Xi x) =
1 if Xi x 0 if Xi > x:
8. Estimating the cdf and Statistical Functionals
118
Example 8.2 (Nerve Data.) Cox and Lewis (1966) reported 799
waiting times between successive pulses along a nerve bre. The rst plot in Figure 8.1 shows the a \toothpick plot" where each toothpick shows the location of one data point. The second plot shows that empirical cdf Fbn . Suppose we want to estimate the fraction of waiting times between .4 and .6 seconds. The estimate is Fbn (:6) Fbn (:4) = :93 :84 = :09.
The following theorems give some properties of Fbn (x). Theorem 8.3 At any xed value of x, E
Fbn (x)
= F (x) and
V Fbn (x)
Thus,
F (x)(1 F (x)) n P and hence, Fbn (x) ! F (x). MSE =
=
F (x)(1 F (x)) : n
!0
Actually,
supx jFbn (x) F (x)j Theorem 8.4 (Glivenko-Cantelli Theorem.) Let X1 ; : : : ; Xn F . converges to 0 Then almost surely. sup jFbn(x) F (x)j P! 0: x
8.2 Statistical Functionals A statistical functional T (F ) is any function Rof F . Examples R are the mean = xdF (x), the variance 2 = (x )2 dF (x) and the median m = F 1 (1=2). De nition 8.5 The plug-in estimator of = T (F ) is
de ned by
bn = T (Fbn ):
In other words, just plug in Fbn for the unknown F .
8.2 Statistical Functionals
0.0
0.2
0.4
0.6
119
0.8
1.0
1.2
1.4
0.8
1.0
1.2
1.4
0.4 0.0
cdf
0.8
time
0.0
0.2
0.4
0.6 time
FIGURE 8.1. Nerve data. The solid line in the middle is the empirical distribution function. The lines above and below the middle line are a 95 per cent con dence band. The con dence band is explained in the appendix.
120
8. Estimating the cdf and Statistical Functionals R
A functional of theR form r(x)dF (x) is called a Rlinear functional. Recall that r(x)dFP(x) is de ned to be r(x)f (x)dx in the continuous case and j r(xj )f (xj ) in the discrete. The empirical cdf Fbn (x)R is discrete, putting mass 1=n at each Xi . Hence, if T (F ) = r(x)dF (x) is a linear functional then we have: R
The plug-in estimator for linear functional T (F ) = r(x)dF (x) is:
T (Fbn)
=
Z
r(x)dFbn (x)
n 1X = r(Xi ): n i=1
(8.2)
Sometimes we can nd the estimated standard error se b of T (Fbn ) by doing some calculations. However, in other cases it is not obvious how to estimate the standard error. In the next chapter, we will discuss a general method for nding se b . For now, let us just assume that somehow we can nd se b . In many cases, it turns out that 2 T (Fbn ) N (T (F ); se b ):
By equation (7.10), an approximate 1 for T (F ) is then
con dence interval
T (Fbn ) z=2 se: b We will call this the Normal-based interval. For a 95 per cent con dence interval, z=2 = z:05=2 = 1:96 2 so the interval is T (Fbn ) 2 se: b Example 8.6 (The mean.) Let = T (F ) = R
R
x dF (x). The plugin estimator is b = x dFbn (x) = X n . The standard error is q p se = V (X n ) = = n. If b denotes an estimate of , then
(8.3)
8.2 Statistical Functionals
p
121
the estimated standard error is b= n. (In the next example, we shall see how to estimate .) A Normal-based con dence interval for is X n z=2 se2 : Example 8.7 (The Variance) Let 2 = T (F ) = V (X ) = R
xdF (x) 2 . The plug-in estimator is
b2 = =
Z
R 2 x dF (x)
Z
2 b xdFn (x) !2 n X
x2 dFbn(x)
n 1X X2 n i=1 i
1 X n i=1 i
n 1X = (X n i=1 i
X n )2 :
Another reasoable estimator of 2 is the sample variance
Sn2 =
n X
1
(Xi
X n )2 :
n 1 i=1 In practice, there is little dierence between b2 and Sn2 and you can use either one. Returning the our last example, we now see that the estimated standard error of the estimate of the mean is p se b = b = n. Example 8.8 (The Skewness) Let and 2 denote the mean and
variance of a random variable X . The skewness is de ned to be R
E (X )3 = = 3
R
(x )3 dF (x) : (x )2 dF (x) 3=2
The skewness measure the lack of symmetry of a distribution. P To nd the plug-in estimate, rst recall that b = n 1 i Xi and P b2 = n 1 i (Xi b)2 . The plug-in estimate of is
b =
R
nR
(x
(x
1 P (Xi n i o3=2 = b3
)3 dFbn(x) )2 dFbn(x)
b)3
:
122
8. Estimating the cdf and Statistical Functionals
Example 8.9 (Correlation.) Let Z = (X; Y ) and let = T (F ) = E (X
X )(Y Y )=(x y ) denote the correlation between X and Y , where F (x; y ) is bivariate. We can write T (F ) = a(T1 (F ); T2 (F ); T3 (F ); T4 (F ); T5 (F )) where R
R
R
T1 (F ) = R x dF (z ) T2 (F ) = R y dF (z ) T3 (F ) = xy dF (z ) T4 (F ) = x2 dF (z ) T5 (F ) = y 2 dF (z ) and
a(t1 ; : : : ; t5 ) = p
(t4
t3
t1 t2 : t21 )(t5 t22 )
Replace F with Fbn in T1 (F ), . . . , T5 (F ), and take
b = a(T1 (Fbn ); T2 (Fbn ); T3 (Fbn ); T4 (Fbn ); T5 (Fbn )): We get
b =
qP
P i (Xi
i (Xi
X n )(Yi Y n ) qP 2 X n )2 i (Yi Y n )
which is called the sample correlation.
Example 8.10 (Quantiles.) Let F be strictly increasing with den-
sity f . The T (F ) = F 1 (p) be the pth quantile. The estimate if T (F ) is Fbn 1 (p). We have to be a bit careful since Fbn is not invertible. To avoid ambiguity we de ne Fbn 1 (p) = inf fx : Fbn (x) pg. We call Fbn 1 (p) the pth sample quantile.
Only in the rst example did we compute a standard error or a con dence interval. How shall we handle the other examples. When we discuss parametric methods, we will develop formulae for standard errors and con dence intervals. But in our nonparametric setting we need something else. In the next chapter, we will introduce two methods { the jackknife and the bootstrap { for getting standard errors and con dence intervals. Example 8.11 (Plasma Cholesterol.) Figure 8.2 shows histograms
for plasma cholesterol (in mg/dl) for 371 patients with chest
8.2 Statistical Functionals
123
pain (Scott et al. 1978). The histograms show the percentage of patients in 10 bins. The rst histogram is for 51 patients who had no evidence of heart disease while the second histogram is for 320 patients who had narrowing of the arteries. Is the mean cholesterol dierent in the two groups? Let us regard these data R as samples Rfrom two distributions F1 and F2 . Let 1 = xdF1 (x) and 2 = xdF2 (x) denote the means of the two populations. R The plug-in estimates are b1 = xdFbn;1 (x) = X n;1 = 195:27 R and b2 = xdFbn;2 (x) = X n;2 = 216 :19: Recall that the standard Pn 1 error of the sample mean b = n i=1 Xi is se ( b) =
v u n X u tV 1 Xi
n i=1
=
v u u t
n 1 X V (Xi ) = n2 i=1
r
n 2 =p 2 n n
which we estimate by b ( se b) =
where
pbn
v u n u1 X t b = (X n i=1 i
X )2 :
For the two groups this yields seb (b1 ) = 5:0 and seb (b2 ) = 2:4. Approximate 95 per cent con dence intervals for 1 and 2 are b1 2seb (b1 ) = (185; 205) and b2 2seb (b2 ) = (211; 221). Now, consider the functional = T (F2 ) T (F1 ) whose plug-in estimate is b = b2 b1 = 216:19 195:27 = 20:92: The standard error of b is se =
p V ( b2
b1 ) =
p
V ( b2 ) + V ( b1 ) =
p
(se (b1 ))2 + (se (b2 ))2
and we estimate this by b = se
p
(seb (b1 ))2 + (seb (b2 ))2 = 5:55:
124
8. Estimating the cdf and Statistical Functionals
An approximate 95 per cent con dence interval for is b 2seb = (9:8; 32:0). This suggests that cholesterol is higher among those with narrowed arteries. We should not jump to the conclusion (from these data) that cholesterol causes heart disease. The leap from statistical evidence to causation is very subtle and is discussed later in this text.
8.3 Technical Appendix In this appendix we explain how to construct a con dence band for the cdf . Theorem 8.12 (Dvoretzky-Kiefer-Wolfowitz (DKW) inequality.) Let X1 ; : : : ; Xn be iid from F . Then, for any > 0,
P
sup jF (x) x
j > 2e
Fbn (x)
2n2 :
(8.4)
From the DKW inequality, we can construct a con dence set. Let 2n = log(2=)=(2n), L(x) = maxfFbn (x) n ; 0g and U (x) = minfFbn (x) + n ; 1g. It follows from (8.4) that for any F , P (F
2 Cn) 1 :
Thus, Cn is a nonparametric 1 con dence set for F . A better name for Cn is a con dence band. To summarize: A 1 nonparametric con dence band for F is (L(x); U (x)) where
L(x) = maxfFbn (x) n ; 0g U (x) = minfFbn (x) + n ; 1g s 1 2 n = log : 2n
8.3 Technical Appendix
100
150
200
250
300
125
350
400
plasma cholesterol for patients without heart disease
100
150
200
250
300
350
400
plasma cholesterol for patients with heart disease
FIGURE 8.2. Plasma cholesterol for 51 patients with no heart disease and 320 patients with narrowing of the arteries.
126
8. Estimating the cdf and Statistical Functionals
Example 8.13 The dashed lines 8.1 give a 95 per cent q in Figure
con dence band using n =
2 1 2n log :05 = :048.
8.4 Bibliographic Remarks The Glivenko-Cantelli theorem is the tip of the iceberg. The theory of distribution functions is a special case of what are called empirical processes which underlie much of modern statistical theory. Some references on empirical processes are Shorack and Wellner (1986) and van der Vaart and Wellner (1996).
8.5 Exercises 1. Prove Theorem 8.3. 2. Let X1 ; : : : ; Xn Bernoulli(p) and let Y1 ; : : : ; Ym Bernoulli(q ). Find the plug-in estimator and estimated standard error for p. Find an approximate 90 per cent con dence interval for p. Find the plug-in estimator and estimated standard error for p q . Find an approximate 90 per cent con dence interval for p q . 3. (Computer Experiment.) Generate 100 observations from a N(0,1) distribution. Compute a 95 per cent con dence band for the cdf F . Repeat this 1000 times and see how often the con dence band contains the true distribution function. Repeat using data from a Cauchy distribution. 4. Let X1 ; : : : ; Xn F and let Fbn (x) be the empirical distribution function. For a xed x, use the central limit theorem to nd the limiting distribution of Fbn (x). 5. Let x and y be two distinct points. Find Cov(Fbn (x); Fbn (y )).
8.5 Exercises
127
6. Let X1 ; : : : ; Xn F and let Fb be the empirical distribution function. Let a < b be xed numbers and de ne = T (F ) = F (b) F (a). Let b = T (Fbn) = Fbn (b) Fbn (a). Find the estimated standard error of b. Find an expression for an approximate 1 con dence interval for . 7. Data on the magnitudes of earthquakes near Fiji are available on the course website. Estimate the cdf F (x). Compute and plot a 95 per cent con dence envelope for F . Find an approximate 95 per cent con dence interval for F (4:9) F (4:3). 8. Get the data on eruption times and waiting times between eruptions of the old faithful geyser from the course website. Estimate the mean waiting time and give a standard error for the estimate. Also, give a 90 per cent con dence interval for the mean waiting time. Now estimate the median waiting time. In the next chapter we will see how to get the standard error for the median. 9. 100 people are given a standard antibiotic to treat an infection and another 100 are given a new antibiotic. In the rst group, 90 people recover; in the second group, 85 people recover. Let p1 be the probability of recovery under the standard treatment and let p2 be the probability of recovery under the new treatment. We are interested in estimating = p1 p2 . Provide an estimate, standard error, an 80 per cent con dence interval and a 95 per cent con dence interval for . 10. In 1975, an experiment was conducted to see if cloud seeding produced rainfall. 26 clouds were seeded with silver nitrate and 26 were not. The decision to seed or not was made at random. Get the data from http://lib.stat.cmu.edu/DASL/Stories/CloudSeeding.html
128
8. Estimating the cdf and Statistical Functionals
Let be the dierence in the median precipitation from the two groups. Estimate . Estimate the standard error of the estimate and produce a 95 per cent con dence interval.
9 The Bootstrap
The bootstrap is a nonparametric method for estimating standard errors and computing con dence intervals. Let
Tn = g (X1 ; : : : ; Xn) be a statistic, that is, any function of the data. Suppose we want to know V F (Tn ), the v ariance of Tn . We hav e written V F to emphasize that the variance usually depends onP the unknown n 1 distribution function F . ForRexample, if Tn = n X then i=1 R i 2 2 2 V F (Tn ) = =n where = (x ) dF (x) and = xdF (x). The bootstrap idea has two parts. First we estimate V F (Tn ) with Pn 1 V Fbn (Tn ). Thus, for Tn = n b 2 =n i=1 Xi we hav e V Fbn (Tn ) = P complicated where b2 = n 1 ni=1 (Xi X n ). However, in more cases we cannot write down a simple formula for V Fbn (Tn ). This leads us to the second step which is to approximate V Fbn (Tn ) using simulation. Before proceeding, let us discuss the idea of simulation.
9.1 Simulation
130
9. The Bootstrap
Suppose we draw an iid sample Y1 ; : : : ; YB from a distribution G. By the law of large numbers, B 1X Yj P! Yn = B j =1
Z
y dG(y ) = E (Y )
as B ! 1. So if we draw a large sample from G, we can use the sample mean Y n to approximate E (Y ). In a simulation, we can make B as large as we like in which case, the dierence between Y n and E (Y ) is negligible. More generally, if h is any function with nite mean then B 1X h(Y ) P! B j =1 j
Z
h(y )dG(y ) = E (h(Y ))
as B ! 1. In particular,
2 Z 2 Z B B B 1X 1X 1X P 2 2 2 (Y Y ) = Y Y ! y dF (y) ydF (y ) = V (Y ): B j =1 j B j =1 j B j =1 j Hence, we can use the sample variance of the simulated values to approximate V (Y ).
9.2 Bootstrap Variance Estimation According to what we just learned, we can approximate V Fbn (Tn ) by simulation. Now V Fbn (Tn ) means \the variance of Tn if the distribution of the data is Fbn ." How can we simulate from the distribution of Tn when the data are assumed to have distribution Fbn ? The answer is to simulate X1 ; : : : ; Xn from Fbn and then compute Tn = g (X1; : : : ; Xn ). This constitutes one draw from the distribution of Tn . The idea is illustrated in the following diagram: Real world F =) X1 ; : : : ; Xn =) Tn = g (X1; : : : ; Xn ) Bootstrap world Fbn =) X1 ; : : : ; Xn =) Tn = g (X1; : : : ; Xn )
9.2 Bootstrap Variance Estimation
131
How do we simulate X1 ; : : : ; Xn from Fbn ? Notice that Fbn puts mass 1/n at each data point X1 ; : : : ; Xn. Therefore, drawing an observation from Fbn is equivalent to drawing one point at random from the original data set. Thu, to simulate X1 ; : : : ; Xn Fbn it suÆces to draw n observations with replacement from X1 ; : : : ; Xn . Here is a summary: Boostrap Variance Estimation 1. Draw X1 ; : : : ; Xn Fbn . 2. Compute Tn = g (X1; : : : ; Xn ). . 3. Repeat steps 1 and 2, B times, to get Tn; 1 ; : : : ; Tn;B
4. Let
B 1X vboot = T B b=1 n;b
B 1X T B r=1 n;b
!2
:
(9.1)
Example 9.1 The following pseudo-code shows how to use the
bootstrap to estimate the standard of the median.
132
9. The Bootstrap
Bootstrap for The Median Given data X = (X(1), ..., X(n)): T <- median(X) Tboot <- vector of length B for(i in 1:N){ Xstar <- sample of size n from X (with replacement) Tboot[i] <- median(Xstar) } se <- sqrt(variance(Tboot))
The following schematic diagram will remind you that we are using two approximations: V F (Tn )
not so small z}|{
V Fbn (Tn )
small
z}|{
vboot :
Example 9.2 Consider the nerve data. Let = T (F ) =
R
(x
)3 dF (x)= 3 be the skewness. The skewness is a measure of
asymmetry. A Normal distribution, for example, has skewness 0. The plug-in estimate of the skewness is
b =
T (Fbn ) =
R
(x
P )3 dFbn (x) n1 ni=1 (Xi = b3 b3
X n )3
= 1:76:
To estimate the standard error with the bootstrap we follow the same steps as with the medin example except we compute the skewness from each bootstrap sample. When applied to the nerve data, the bootstrap, based on B = 1000 replications, yields a standard error for the estimated skewness of .16.
9.3 Bootstrap Con dence Intervals
133
9.3 Bootstrap Con dence Intervals There are several ways to construct bootstrap con dence intervals. Here we discuss three methods. Normal Interval. The simplest is the Normal interval
Tn z=2 seb boot
(9.2)
where seb boot is the bootstrap estimate of the standard error. This interval is not accurate unless the distribution of Tn is close to Normal. Pivotal Intervals. Let = T (F ) and bn = T (Fbn ) and de ne the
denote bootstrap replicapivot Rn = bn . Let bn; 1 ; : : : ; bn;B tions of bn . Let H (r) denote the cdf of the pivot:
H (r) = PF (Rn r): De ne Cn? = (a; b) where a = bn H 1 1 and b = bn 2 It follows that
H 1 : 2
(9.4)
= P(a bn bn b bn ) = P(bn b bn bn a) = P(bn b Rn bn a) = H (bn a) H (bn b) H H 1 = H H 1 1 2 2 = 1 = 1 : 2 2 Hence, Cn? is an exact 1 con dence interval for . Unfortunately, a and b depend on the unknown distribution H but we P(a
b)
(9.3)
134
9. The Bootstrap
can form a bootstrap estimate of H :
Hb (r) =
B 1X r) I (Rn;b B b=1
(9.5)
= b where Rn;b bn . Let r denote the sample qauntile n;b ; : : : ; R ) and let denote the sample qauntile of of (Rn; 1 n;B ). Note that r = bn . It follows that an ap(n; 1 ; : : : ; n;B proximate 1 con dence interval is Cn = (ba; bb) where b a
= bn
b b
= bn
b Hb 1 1 = n r1 =2 = 2bn 2 Hb 1 = bn r= 2 = 2bn 2
1
=2
: = 2
In summary: The 1
bootstrap pivotal con dence interval is Cn =
2bn
b
1 =2 ; 2bn
b= 2
:
Typically, this is a pointwise, asymptotic con dence interval. Theorem 9.3 Under weak conditions on T (F ), PF (T (F ) 2 Cn) !
1 as n ! 1, where Cn is given in (9.6).
Percentile Intervals. The bootstrap percentile interval is
de ned by
; Cn = = 2 1
=2
:
The justi cation for this interval is given in the appendix. Example 9.4 For estimating the skewness of the nerve data, here
are the various con dence intervals.
(9.6)
9.3 Bootstrap Con dence Intervals Method
135
95% Interval
Normal (1.44, 2.09) Percentile (1.42, 2.03) Pivotal (1.48, 2.11)
All these con dence intervals are approximate. The probability that T (F ) is in the interval is not exactly 1 . All three intervals have the same level of accuracy. There are more accurate bootstrap con dence intervals but they are more complicated and we will not discuss them here. Example 9.5 (The Plasma Cholesterol Data.) Let us return to the
cholesterol data. Suppose we are interested in the dierence of the medians. Pseudo-code for the bootstrap analysis is as follows: x1 <- first sample x2 <- second sample n1 <- length(x1) n2 <- length(x2) th.hat <- median(x2) - median(x1) B <- 1000 Tboot <- vector of length B for(i in 1:B){ xx1 <- sample of size n1 with replacement from x1 xx2 <- sample of size n2 with replacement from x2 Tboot[i] <- median(xx2) - median(xx1) } se <- sqrt(variance(Tboot)) Normal <- (th.hat - 2*se, th.hat + 2*se) percentile <- (quantile(Tboot,.025), quantile(Tboot,.975)) pivotal <- ( 2*th.hat-quantile(Tboot,.975), 2*th.hat-quantile(Tboot,.025) )
The point estimate is 18.5, the bootstrap standard error is 7.42 and the resulting approximate 95 per cent con dence intervals are as follows:
136
9. The Bootstrap Method
95% Interval
Normal (3.7, 33.3) Percentile (5.0, 33.3) Pivotal (5.0, 34.0) Since these intervals exclude 0, it appears that the second group has higher cholesterol although there is considerable uncertainty about how much higher as re ected in the width of the intervals.
The next two examples are based on small sample sizes. In practice, statistical methods based on very small sample sizes might not be reliable. We include the examples for their pedagogical value but we do want to sound a note of caution about interpreting the results with some skepticism. Example 9.6 Here is an example that was one of the rst used
to illustrate the bootstrap by Bradley Efron, the inventor of the bootstrap. The data are LSAT scores (for entrance to law school) and GPA.
LSAT
576 651
635 605
558 653
578 575
666 545
580 572
555 594
661
GPA
3.39 3.36
3.30 3.13
2.81 3.12
3.03 2.74
3.44 2.76
3.07 2.88
3.00 3.96
3.43
Each data point is of the form Xi = (Yi; Zi ) where Yi = LSATi and Zi = GPAi . The law school is interested in the correlation RR (y Y )(z Z )dF (y; z ) = qR : R (y Y )2 dF (y ) (z Z )2 dF (z ) The plug-in estimate is the sample correlation P (Y Y )(Zi Z ) b = qP i i : P 2 2 ( Y Y ) ( Z Z ) i i i i
9.3 Bootstrap Con dence Intervals
137
The estimated correlation is b = :776. The bootstrap based on B = 1000 gives seb = :137. Figure 9.1 shows the data and a histogram of the bootstrap replications b1 ; : : : ; bB . This histogram is an approximation to the sampling distribution of b. The Normalb = (:51; 1:00) based 95 per cent con dence interval is :78 2se while the percentile interval is (.46,.96). In large samples, the two methods will show closer agreement. Example 9.7 This example is borrowed from An Introduction to
the Bootstrap by B. Efron and R. Tibshirani. When drug companies introduce new medications, they are sometimes requires to show bioequivalence. This means that the new drug is not substantially dierent than the current treatment. Here are data on eight subjects who used medical patches to infuse a hormone into the blood. Each subject received three treatments: placebo, old-patch, new-patch. subject placebo old new old-placebo new-old ______________________________________________________________ 1 9243 17649 16449 8406 -1200 2 9671 12013 14614 2342 2601 3 11792 19979 17274 8187 -2705 4 13357 21816 23798 8459 1982 5 9055 13850 12560 4795 -1290 6 6290 9806 10157 3516 351 7 12412 17208 16570 4796 -638 8 18806 29044 26325 10238 -2719
Let Z = old placebo and Y = new old. The Food and Drug Administration (FDA) requirement for bioequivalence is that jj :20 where E (Y ) = F : E F (Z )
9. The Bootstrap
3.4
138
•
•
•
•
3.2
•
3.0
GPA
• •
•
•
• •
2.8
• • •
• 560
580
600
620
640
660
0
50
100
150
LSAT
0.2
0.4
0.6 Bootstrap Samples
FIGURE 9.1. Law school data.
0.8
1.0
9.4 Bibliographic Remarks
139
The estimate of is
b =
Y 452:3 = :0713: = 6342 Z
The bootstrap standard error is se b = :105. To answer the bioequivalence question, we compute a con dence interval. From B = 1000 bootstrap replications we get the 95 per cent interval is (-.24,.15). This is not quite contained in [-.20,.20] so at the 95 per cent level we have not demonstrated bioequivalence. Figure 9.2 shows the histogram of the bootstrap values.
9.4 Bibliographic Remarks The boostrap was invented by Efron (1979). There are several books on these topics including Efron and Tibshirani (1993), Davison and Hinkley (1997), Hall (1992), and Shao and Tu (1995). Also, see section 3.6 of van der Vaart and Wellner (1996).
9.5 Technical Appendix 9.5.1 The Jackknife
There is another method for computing standard errors called the jackknife, due to Quenouille (1949). It is less computationally expensive than the bootstrap but is less general. Let Tn = T (X1 ; : : : ; Xn ) be a statistic and T( i) denote the statistic Pn th 1 with the i observation removed. Let T n = n i=1 T( i) . The jackknife estimate of var(Tn ) is n n 1X (T vjack = n i=1 ( i)
T n )2
and the jackknife estimate of the standard error is se b jack = pvjack . Under suitable conditions on T , it can be shown that p vjack consistently estimates var(Tn ) in the sense that vjack =var(Tn ) !
9. The Bootstrap
0
20
40
60
80
140
-0.3
-0.2
-0.1
0.0 Bootstrap Samples
FIGURE 9.2. Patch data.
0.1
0.2
9.6 Excercises
141
1. However, unlike the bootstrap, the jackknife does not produce consistent estimates of the standard error of sample quantiles. 9.5.2 Justi cation For The Percentile Interval
Suppose there exists a monotone transformation U = m(T ) such that U N (; c2 ) where = m(). We do not suppose we ). know the transformation, only that one exist. Let Ub = m(n;b Let u be the sample quantile of the Ub 's. Since a monotone transformation preserves quantiles, we have that u=2 = ). Also, since U N (; c2 ), the =2 quantile of U is m(= 2 z=2 c. Hence u=2 = z=2 c. Similarly, u1 =2 = + z=2 c. Therefore, P(= 2
1
=2 )
= = =
m() m(1 =2 )) P(u=2 u1 =2 ) P(U cz=2 U + cz=2 ) U P( z=2 z=2 ) c ) P(m(= 2
= = 1
:
An exact normalizing transformation will rarely exist but there may exist approximate normalizing transformations.
9.6 Excercises
1. Consider the data in Example 9.6. Find the plug-in estimate of the correlation coeÆcient. Estimate the standard error using the bootstrap. Find a 95 per cent con dence inerval using all three methods. 2. (Computer Experiment.) Conduct a simulation to compare the four bootstrap con dence interval methods. Let n = R 50 and let T (F ) = (x )3 dF (x)= 3 be the skewness. Draw Y1 ; : : : ; Xn N (0; 1) and set Xi = eYi , i = 1; : : : ; n. Construct the four types of bootstrap 95 per cent intervals for T (F ) from the data X1 ; : : : ; Xn . Repeat this whole
142
9. The Bootstrap
thing many times and estimate the true coverage of the four intervals. 3. Let
X1 ; : : : ; Xn t3 where n = 25. Let = T (F ) = (q:75 q:25 )=1:34 where qp denotes the pth quantile. Do a simulation to compare the coverage and length of the following con dence intervals for : (i) Normal interval with standard error from the bootstrap, (ii) bootstrap percentile interval. Remark: The jackknife does not give a consistent estimator of the variance of a quantile.
4. Let X1 ; : : : ; Xn be distinct observations (no ties). Show that there are 2n 1 n distinct bootstrap samples. Hint: Imagine putting n balls into n buckets. 5. Let X1 ; : : : ; Xn be distinct observations (no ties). P Let X1 ; : : : ; Xn denote a bootstrap sample and let X n = n 1 ni=1 Xi . Find: E (X n jX1 ; : : : ; Xn), V (X n jX1 ; : : : ; Xn ), E (X n ) and V (X n ). 6. (Computer Experiment.) Let X1 ; :::; Xn Normal(; 1): Let = e and let b = eX be the mle. Create a data set (using = 5) consisting of n=100 observations. (a) Use the bootstrap to get the se and 95 percent con dence interval for : (b) Plot a histogram of the bootstrap replications for the parametric and nonparametric bootstraps. These are estimates of the distribution of b. Compare this to the true sampling distribution of b.
9.6 Excercises
143
7. Let X1 ; :::; Xn Unif(0; ): The mle is b = Xmax = maxfX1 ; :::; Xng. Generate a data set of size 50 with = 1: (a) Find the distribution of b. Compare the true distribution of b to the histograms from the parametric and nonparametric bootstraps. (b) This is a case where the nonparametric bootstrap does very poorly. In fact, we can prove that this is the case. Show that, for the parametric bootstrap P (b = b) = 0 but for the nonparametric bootstrap P (b = b) :632. Hint: show that, P (b = b) = 1 (1 (1=n))n then take the limit as n gets large. 2
R
8. Let Tn =PX n , = E (X1 ), k = jx bk = n 1 ni=1 jXi X n jk . Show that
jk dF (x) and
2 4X n b2 4X n b3 b4 vboot = + + 3: n n2 n
144
9. The Bootstrap
10 Parametric Inference
We now turn our attention to parametric models, that is, models of the form n
F = f (x; ) : 2
o
(10.1)
where the R k is the parameter space and = (1 ; : : : ; k ) is the parameter. The problem of inference then reduces to the problem of estimating the parameter . Students learning statistics often ask: how would we ever know that the distribution that generated the data is in some parametric model? This is an excellent question. Indeed, we would rarely hav e such knowledge which is why nonparametric methods are preferable. Still, studying methods for parametric models is useful for two reasons. First, there are some cases where background knowledge suggests that a parametric model provides a reasonable approximation. F or example, counts of traÆc accidents are known from prior experience to follow approximately a P oissonmodel. Second, the inferential concepts
146
10. Parametric Inference
for parametric models provide background for understanding certain nonparametric methods. We will begin with a brief dicussion about parameters of interest and nuisance parameters in the next section, then we will discuss two methods for estimating , the method of moments and the method of maximum likelihood.
10.1 Parameter of Interest Often, we are only interested in some function T (). For example, if X N (; 2 ) then the parameter is = (; ). If our goal is to estimate then = T () is called the parameter of interest and is called a nuisance parameter. The parameter of interest can be a complicated function of as in the following example. Example 10.1 Let X1 ; : : : ; Xn Normal(; 2). The parameter is = (; ) is = f(; ) : 2 R ; > 0g. Suppose that Xi is the outcome of a blood test and suppose we are interested in , the fraction of the population whose test score is larger than 1. Let Z denote a standard Normal random variable. Then
=
P(X
= 1
> 1) = 1
P(X
P
X
1 1 Z< =1 Z< :
P
< 1) = 1
1 <
The parameter of interest is = T (; ) = 1 ((1 )= ). Example 10.2 Recall that X has a Gamma(; ) distribution if
f (x; ; ) =
1 x 1 e ()
x= ;
where ; > 0 and
() =
Z 1
0
y 1 e y dy
x>0
10.2 The Method of Moments
147
is the Gamma function. The parameter is = (; ). The Gamma distribution is sometimes used to model lifetimes of people, animals, and electronic equipment. Suppose we want to estimate the mean lifetime. Then T (; ) = E (X1 ) = .
10.2 The Method of Moments The rst method for generating parametric estimators that we will study is called the method of moments. We will see that these estimators are not optimal but they are often easy to compute. They are are also useful as starting values for other methods that require iterative numerical routines. Suppose that the parameter = (1 ; : : : ; k ) has k components. For 1 j k, de ne the j th moment
j j () =
E (X j )
=
Z
xj dF (x)
(10.2)
and the j th sample moment n 1X Xij : bj = n i=1
(10.3)
De nition 10.3 The method of moments estimator bn
is de ned to be the value of such that
1 (bn ) 2 (bn ) .. . k (bn )
b1 b2 .. . = bk : = = .. .
(10.4)
Formula (10.4) de nes a system of k equations with k unknowns.
148
10. Parametric Inference
Example 10.4 Let X1 ; : : : ; Xn Bernoulli(p). Then 1 = E p (X ) = P
p and b1 = n 1
n i=1 Xi .
By equating these we get the estimator
n 1X pbn = X: n i=1 i
Example 10.5 Let X1 ; : : : ; Xn Normal(; 2 ). Then, 1 = E (X1 ) =
Recall that V (X ) = and 2 = E (X12 ) = V (X1 ) + (E (X1 ))2 = 2 + 2 . We need E (X 2 ) (E (X ))2 . to solve the equations Hence, E (X 2 ) = n 1X V (X ) + (E (X ))2 . b = X
n i=1 i n 1X 2 2 Xi2 : b + b = n i=1
This is a system of 2 equations with 2 unknowns. The solution is
b = X n n 1X 2 (X b = n i=1 i
X n )2 :
Theorem 10.6 Let bn denote the method of moments estimator.
Under the the conditions given in the appendix, the following statements hold: 1. The estimate bn exists with probability tending to 1. 2. The estimate is consistent: bn P! . 3. The estimate is asymptotically Normal:
pn(b ) n
where
N (0; )
= g E (Y Y T )g T ; Y = (X; X 2 ; : : : ; X k )T , g = (g1 ; : : : ; gk ) and gj = @j 1 ()=@.
10.3 Maximum Likelihood
149
The last statement in the Theorem above can be used to nd standard errors and con dence intervals. However, there is an easier way: the bootstrap. We defer discussion of this until the end of the chapter.
10.3 Maximum Likelihood The most common method for estimating parameters in a parametric model is the maximum likelihood method. Let X1 ; : : : ; Xn be iid with pdf f (x; ). De nition 10.7 The likelihood function is de ned by
Ln() =
n Y i=1
f (Xi ; ):
(10.5)
The log-likelihood function is de ned by `n () = log Ln ().
The likelihood function is just the joint density of the data, except that we treat it is a function of the parameter . Thus Ln : ! [0; 1). The likelihood function is not a density function: in general, it is not true that Ln() integrates to 1. De nition 10.8 The maximum likelihood estimator
, denoted by bn , is the value of that maximizes Ln(). mle
The maximum of `n () occurs at the same place as the maximum of Ln (), so maximizing the log-likelihood leads to the same answer as maximizing the likelihood. Often, it is easier to work with the log-likelihood. Remark 10.9 If we multiply Ln() by any positive constant c (not
depending on ) then this will not change the
mle
. Hence, we
10. Parametric Inference
0.4
0.6
Ln(p)
0.8
1.0
150
0.0
0.2
pbn 0.0
0.2
P
p
0.4
0.6
0.8
1.0
FIGURE 10.1. Likelihood function for Bernoulli with n = 20 and n X = 12. The mle is p = 12=20 = 0:6. n i=1 i
b
shall often be sloppy about dropping constants in the likelihood function. Example 10.10 Suppose that X1 ; : : : ; Xn Bernoulli(p). The prob-
ability function is f (x; p) = px(1 known parameter is p. Then,
Ln(p) = where S =
n Y
f (Xi ; p) =
i=1 P i Xi .
n Y i=1
p)1
x
pXi (1 p)1
for x = 0; 1. The unXi
= pS (1 p)n
S
Hence,
`n (p) = S log p + (n S ) log(1 p): Take the derivative of `n (p), set it equal to 0 to nd that the mle is p bn = S=n. See Figure 10.1. Example 10.11 Let X1 ; : : : ; Xn
N (; 2 ).
The parameter is = (; ) and the likelihood function is (ignoring some constants)
Ln(; )
=
Y i
1 exp
1 (X 2 2 i
)2
10.3 Maximum Likelihood (
151
)
1 X = (Xi )2 2 2 i nS 2 n(X )2 n = exp exp 2 2 2 2 P P where X = n 1 i Xi is the sample mean and S 2 = n 1P i (Xi X )2 . The last equality above follows from the fact that Pi (Xi 2 + n(X )2 which can be veri ed by writing (Xi )2 = nS i P 2 2 ) = i (Xi X + X ) and then expanding the square. The log-likelihood is nS 2 n(X )2 `(; ) = n log : 2 2 2 2 Solving the equations @`(; ) @`(; ) = 0 and =0 @ @ we conclude that b = X and b = S . It can be veri ed that these are indeed global maxima of the likelihood. n exp
Example 10.12 (A Hard Example.) Here is the example that con-
fuses everyone. Let X1 ; : : : ; Xn Unif (0; ). Recall that 1 x f (x; ) = 0 0otherwise : Consider a xed value of . Suppose < Xi for some i. Then, Q f (Xi ; ) = 0 and hence Ln () = i f (Xi ; ) = 0. It follows that Ln() = 0 if any Xi > . Therefore, Ln() = 0 if < X(n) where X(n) = maxfX1 ; : : : ; Xn g. Now consider any X(n) . For every Xi we then have that f (Xi; ) = 1= so that Ln () = Q n i f (Xi ; ) = . In conclusion,
Ln() =
1 n X(n) 0 < X(n) :
See Figure 10.2. Now Ln() is strictly decreasing over the interval [X(n) ; 1). Hence, bn = X(n) .
0.5 0.0 0.0
0.5
1.0
x
1.5
2.0
0.0
0.5
x
1.0
=1
1.5
2.0
1.0 0.0
0.0
0.5
0.5
1.0
Ln()
1.5
1.5
= :75
f (x; )
1.0
f (x ; )
1.0 0.0
0.5
f (x ; )
1.5
10. Parametric Inference 1.5
152
0.0
0.5
1.0
= 1:25
x
1.5
2.0
0.6
0.8
1.0
1.2
1.4
FIGURE 10.2. Likelihood function for Uniform (0; ). The vertical lines show the observed data. The rst three plots show f (x; ) for three dierent values of . When < X(n) = maxfX1 ; : : : ; Xn g, as in the rst plot, f (X(n) ; ) = 0 and hence Ln () = ni=1 f (Xi ; ) = 0. Otherwise f (Xi ; ) = 1= for each i and hence Ln () = ni=1 f (Xi ; ) = (1=)n . The last plot shows the likelihood function.
Q
Q
10.4 Properties of Maximum Likelihood Estimators.
153
10.4 Properties of Maximum Likelihood Estimators. Under certain conditions on the model, the maximum likelihood estimator bn possesses many properties that make it an appealing choice of estimator. The main properties of the mle are: (1) It is consistent: bn P! ? where ? denotes the true value of the parameter ; (2) It is equivariant: if bn is the mle of then g (bn ) is the mle of g ( ); p (3) It is asymptotically Normal: n(b ? )=seb N (0; 1) where seb can be computed analytically; (4) It is asymptotically optimal or eÆcient: roughly, this means that among all well behaved estimators, the mle has the smallest variance, at least for large samples. (5) The mle is approximately the Bayes estimator. (To be explained later.) We will spend some time explaining what these properties mean and why they are good things. In suÆciently complicated problems, these properties will no longer hold and the mle will no longer be a good estimator. For now we focus on the simpler situations where the mle works well. The properties we discuss only hold if the model satis es certain regularity conditions. These are essentially smoothness conditions on f (x; ). Unless otherwise stated, we shall tacitly assume that these conditions hold.
10.5 Consistency of Maximum Likelihood Estimators. Consistency means that the mle converges in probability to the true value. To proceed, we need a de nition. If f and g are
154
10. Parametric Inference
's, de ne the Kullback-Leibler distance between f and g to be Z f (x) dx: (10.6) D(f; g ) = f (x) log g (x) This is not a dis- It can be shown that D(f; g ) 0 and D(f; f ) = 0. For any tance in the formal ; 2 write D(; ) to mean D(f (x; ); f (x; )). We will sense. assume that 6= implies that D(; ) > 0. Let ? denote the true value of . Maximizing `n () is equivalent to maximizing pdf
Mn () =
f (Xi ; ) 1X log : n i f (Xi ; ? )
By the law of large numbers, Mn () converges to
E ?
f (Xi ; ) log f (Xi ; ? )
= = =
Z
f (x; ) log f (x; ? )dx f (x; ? ) Z f (x; ? ) log f (x; ? )dx f (x; ) D(? ; ):
Hence, Mn () D(? ; ) which is maximized at ? since D(? ; ? ) = 0 and D(? ; ? ) < 0 for 6= ? . Hence, we expect that the maximizer will tend to ? . To prove this formally, we need more than Mn () P! D(? ; ). We need this convergence to be uniform over . We also have to make sure that the function D(? ; ) is well behaved. Here are the formal details. Theorem 10.13 Let ? denote the true value of . De ne
Mn () =
f (Xi ; ) 1X log n i f (Xi ; ? )
and M () = D(? ; ). Suppose that
sup jMn () M ()j P! 0 2
(10.7)
10.6 Equivariance of the mle
155
and that, for every > 0,
sup M () < M (? ):
:j ?j
(10.8)
Let bn denote the mle. Then bn P! ? .
. See appendix.
Proof
10.6 Equivariance of the mle Theorem 10.14 Let = g () be a one-to-one function of . Let
bn be the mle of . Then bn = g (bn) is the mle of .
Let h = gQ1 denote the inverse of g . Then bn = h(bn ). Q For any , L( ) = i f (xi ; h( )) = i f (xi ; ) = L() where = h( ). Hence, for ana , Ln ( ) = L() L(b) = Ln (b). Proof.
Example 10.15 Let X1 ; : : : ; Xn N (; 1). The mle for is bn =
X n . Let = e . Then, the mle for is b = eb = eX .
10.7 Asymptotic Normality It turns out that bn is approximately Normal and we can compute its variance analytically. To explore this, we rst need a few de nitions.
156
10. Parametric Inference De nition 10.16 The score function is de ned to be
@ log f (X ; ) : @ The Fisher information is de ned to be s(X ; ) =
In () = =
V n X i=1
n X i=1
(10.9)
!
s(Xi ; )
V (s(Xi ; )) :
(10.10)
For n = 1 we will sometimes write I () instead of I1 (). It can be shown that E (s(X ; )) = 0. It then follows that V (s(X ; )) = E (s2 (X ; )). In fact a further simpli cation of In () is given in the next result. Theorem 10.17 In () = nI () and
I () = =
2 @ log f (X ; )
E @ 2 2 Z 2 @ log f (x; )
@ 2 2
f (x; )dx:
(10.11)
10.7 Asymptotic Normality Theorem 10.18 (Asymptotic Normality of the
priate regularity conditions, the following hold: 1. Let se =
mle
.) Under appro-
p
1=In (). Then, (bn
se
2. Let seb =
157
q
)
N (0; 1):
(10.12)
N (0; 1):
(10.13)
1=In (bn ). Then, (bn
b se
)
The proof is in the appendix. The rst statementpsays that N (; se ) where the standard error of bn is se = 1=In(). The second statement says that this is still true even if we replace the standard error by its estimated standard error seb = q 1=In(bn ). Informally, the theorem says that the distribution of the mle can 2 be approximated with N (; se b ). From this fact we can construct an (asymptotic) con dence interval.
bn
Theorem 10.19 Let
Cn = bn Then, P ( 2 Cn ) ! 1 Proof.
Then,
P (
z=2 seb ; bn + z=2 seb : as n ! 1.
Let Z denote a standard normal random variable.
2 Cn)
= =
P bn
P
bn + z=2 seb
z=2 seb z=2
bn
b se
z=2
!
158
10. Parametric Inference
!
P(
z=2 < Z < z=2 ) = 1 :
For = :05, z=2 = 1:96 2, so:
bn 2 seb is an approximate 95 per cent con dence interval. When you read an opinion poll in the newspaper, you often see a statement like: the poll is accurate to within one point, 95 per cent of the time. They are simply giving a 95 per cent con dence interval of the form bn 2seb . Example 10.20 Let X1 ; : : : ; Xn P
Bernoulli(p). The mle is pbn =
and f (x; p) = p)1 x , log f (x; p) = x log p + (1 x) log(1 p), s(X ; p) = (x=p) (1 x)=(1 p), and s0 (X ; p) = (x=p2 ) + (1 x)=(1 p)2 . Thus, i Xi =n
px (1
p (1 p) 1 I (p) = E p ( s0 (X ; p)) = 2 + = : 2 p (1 p) p(1 p) Hence,
1 b = p se = In (pbn )
1 pb(1 pb) 1=2 p = : n nI (pbn )
An approximate 95 per cent con dence interval is
pb(1 pb) 1=2 pbn 2 : n
Compare this with the Hoeding interval. Example 10.21 Let X1 ; : : : ; Xn
N (; 2 ) where 2
is known. The score function is s(X ; ) = (X )= 2 and s0 (X ; ) = 1= 2 so that I1 () = 1= 2 . The mle is bn = X n . According to Theorem 10.18, X n N (; 2 =n). In fact, in this case, the distribution is exact.
(10.14)
10.8 Optimality
159
Poisson(). Then bn = X n and some calculations show that I1 () = 1=, so
Example 10.22 Let X1 ; : : : ; Xn
s
1
b = q se
nI1 (bn )
Therefore,qan approximate 1 bn z=2 bn =n.
=
bn : n
con dence interval for is
10.8 Optimality Suppose that X1 ; : : : ; Xn N (; 2 ). The mle is bn = X n . Another reasonable estimator is the sample median en . The mle satis es pn(b ) N (0; 2 ): n It can be proved that the median satis es pn(e ) N 0; 2 : n 2 This means that the median converges to the right value but has a larger variance than the mle . More generally, consider two estimators Tn and Un and suppose that pn(T ) N (0; t2 ) n and that
pn(U
) N (0; u2 ): We de ne the asymptotic relative eÆciency of U to T by ARE (U; T ) = t2 =u2 . In the Normal example, ARE (en ; bn ) = 2= = :63. The interpretation is that if you use the median, you are only using about 63 per cent of the available data. Theorem 10.23 If bn is the mle and en is any other estimator then The result is acn
tually more subtle than this but we needn't worry about the details.
160
10. Parametric Inference
ARE (en ; bn ) 1: Thus, the mle has the smallest (asymptotic) variance and we say that mle is eÆcient or asymptotically optimal.
This result is predicated upon the assumed model being correct. If the model is wrong, the mle may no longer be optimal. We will discuss optimality in more generality when we discuss decision theory.
10.9 The Delta Method. Let = g () where g is a smooth function. The maximum likelihood estimator of is b = g (b). Now we address the following question: what is the distribution of b? Theorem 10.24 (The Delta Method) If = g () where g is dier-
entiable and g 0() 6= 0 then
pn(b
n b (b) se
)
N (0; 1)
(10.15)
b (bn ) = jg 0 (b)j se b (bn ) se
(10.16)
where bn = g (bn ) and
Hence, if then P (
Cn = bn
z=2 seb (bn ); bn + z=2 seb (bn )
2 Cn) ! 1 as n ! 1.
Example 10.25 Let X1 ; : : : ; Xn Bernoulli(p) and let
= g (p) = log(p=(1 p)). The Fisher information function is I (p) = 1=(p(1 p)) so the standard error of the mle pbn is se = fpbn (1 pbn )=ng1=2 .
(10.17)
10.9 The Delta Method.
161
The mle of is b = log pb=(1 pb). Since, g 0(p) = 1=(p=(1 p)), according to the delta method 1 b ( bn ) = jg 0 (pbn )jse b (pbn ) = p : se npbn (1 pbn ) An approximate 95 per cent con dence interval is 2 bn p : npbn (1 pbn ) Example 10.26 Let X1 ; : : : ; Xn
N (; 2).
Suppose that is known, is unknown and that we want to P estimate = log . 1 The log-likelihood is `( ) = n log 22 i (xi )2 . Dierentiate and set equal to 0 and conclude that
b =
P i (Xi
n
)2 1=2
:
To get the standard error we need the Fisher information. First,
log f (X ; ) = log with second derivative
1 2 and hence
(X )2 2 2
3(X )2 4
1 3 2 2 + = : 2 4 2 p Hence, seb = bn = 2n: Let = g ( ) = log( ). Then, log bn . Since, g 0 = 1= ,
I ( ) =
b ( bn ) = se
bn
=
bn
1 b 1 p =p b 2n 2n
andp an approximate 95 per cent con dence interval is 2= 2n.
162
10. Parametric Inference
10.10 Multiparameter Models These ideas can directly be extended to models with several parameters. LetP = (1 ; : : : ; k ) and let b = (b1 ; : : : ; bk ) be the n mle . Let `n = i=1 log f (Xi ; ),
Hjj =
@ 2 `n @ 2 `n and H = : jk @j2 @j @k
De ne the Fisher Information Matrix by 2
In () =
E (H11 ) E (H12 ) 6 E (H ) E (H ) 22 6 21 6 . .. . 4 . . E (Hk1 ) E (Hk2 )
.. .
3 E (H1k ) E (H2k ) 7 7 7: .. 5 . E (Hkk )
(10.18)
Let Jn () = In 1 () be the inverse of In . Theorem 10.27 Under appropriate regularity conditions,
p
n(b ) N (0; Jn ):
Also, if bj is the j th component of b, then
pn(b ) j j bj se
N (0; 1)
where seb 2j = Jn (j; j ) is the j th diagonal element of Jn . The approximate covariance of bj and bk is Cov(bj ; bk ) Jn (j; k).
There is also a multiparameter delta method. Let = g (1 ; : : : ; k ) be a function and let 0 1 @g B @ C B C B .. C rg = B C . B C @ @g A @k 1
be the gradient of g .
10.11 The Parametric Bootstrap
163
Theorem 10.28 (Multiparameter delta method) Suppose that
rg evaluated at b is not 0. Let b = g(b). Then pn(b ) N (0; 1)
b (b) se
where
b (b) = se
q
b g )T Jbn (r b g ); (r
(10.19)
b g is rg evaluated at = b. Jbn = Jn (bn ) and r
Example 10.29 Let X1 ; : : : ; Xn
N (; 2 ). Let
=. In homework question 8 you will show that n In (; ) = 2
= g (; ) =
0
0 2n2 :
Hence,
2 0 2 n 0 2
1 Jn = In 1 (; ) =
The gradient of g is
rg =
! 2
1
:
:
Thus, b (b) = se
q
r
r
( b g )T Jbn ( b g ) =
1 1 b2 1=2 pn b4 + 2b2 :
10.11 The Parametric Bootstrap For parametric models, standard errors and con dence intervals may also be estimated using the bootstrap. There is only one change. In the nonparametric bootstrap, we sampled
164
10. Parametric Inference
X1 ; : : : ; Xn from the empirical distribution Fbn . In the parametric bootstrap we sample instead from f (x; bn ). Here, bn could be the mle or the method of moments estimator. Example 10.30 Consider example 10.29. To get to bootstrap standardPerror, simulate X1 ; : P : : ; Xn N (b; b2 ), compute b = n 1 i Xi and b2 = n 1 i (Xi b )2 . Then compute b = g (b; b ) = b =b . Repeating this B times yields bootstrap replications b1 ; : : : ; bB and the estimated standard error is s
b boot = se
PB b=1 (bb
B
b)2
:
The bootstrap is much easier than the delta method. On the other hand, the delta method has the advantage that it gives a closed form expression for the standard error.
10.12 Technical Appendix 10.12.1 Proofs Proof of Theorem 10.13.
have Mn (bn ) Mn (? ). Hence,
M (? ) M (bn )
=
P
!
Since bn maximizes Mn (), we
Mn (? ) M (bn ) + M (? ) Mn (? ) Mn (b) M (bn ) + M (? ) Mn (? ) sup jMn () M ()j + M (? ) Mn (? )
0
where the last line follows from (10.7). It follows that, for any Æ > 0, P M (bn ) < M (? ) Æ ! 0:
10.12 Technical Appendix
165
Pick any > 0. By (10.8), there exists Æ > 0 such that j ? j implies that M () < M (? ) Æ . Hence,
j
? j > ) P M (bn ) < M (? ) Æ
P( bn
! 0:
Next we want to prove Theorem 10.18. First we need a Lemma. Lemma 10.31 The score function satis es E [s(X ; )] = 0: R
Note that 1 = f (x; )dx. Dierentiate both sides of this equation to conclude that Proof.
@ 0 = @ = =
Z
Z
Z
Z
@ f (x; )dx @ Z @f (x;) @ log f (x; ) @ f (x; )dx = f (x; )dx f ( x; ) @ f (x; )dx =
s(x; )f (x; )dx = E s(X ; ):
Proof of Theorem 10.18.
Let `() = log L(). Then,
0 = `0 (b) `0 () + (b )`00 (): Rearrange the above equation to get b = `0 ()=`00 () or, in other words,
pn(b ) =
p1n `0 () 1 `00 () n
TOP BOTTOM :
Let Yi = @ log f (Xi ; )=@. Recall that E (Yi ) = 0 from the previous Lemma and also V (Yi ) = I (). Hence, X p p TOP = n 1=2 Yi = nY = n(Y i
0)
W
N (0; I )
166
10. Parametric Inference
@ 2 log f (Xi; )=@2 .
by the central limit theorem. Let Ai = Then E (Ai ) = I () and BOTTOM = A P! I ()
by the law of large numbers. Apply Theorem 6.5 part (e), to conclude that p b W d 1 n( ) = N 0; : I () I () Assuming that I () is a continuous function of , it follows that I (bn ) P! I (). Now
bn = se b =
pnI 1=2 (b )(b n
n
n
)
(
pnI 1=2 ()(b )o I (bn ) n I ()
) 1 =2
:
The rst terms tends in distribution to N(0,1). The second term tends in probability to 1. The result follows from Theorem 6.5 part (e). Outline of Proof of Theorem 10.24. Write,
b = g (b) g () + (b )g 0 () = + (b )g 0(): Thus, and hence
pn(b ) pn(b )g0() p
nI ()(b ) g 0 ( )
p
nI ()(b ):
Theorem 10.18 tells us that the right hand side tends in distribution to a N(0,1). Hence, p
nI ()(b ) g 0()
N (0; 1)
10.12 Technical Appendix
or, in other words, where
167
b N ; se 2 (bn )
(g 0())2 : nI () The result remains true if we substitute b for by Theorem 6.5 part (e). se 2 (bn ) =
10.12.2 SuÆciency
A statistic is a function T (X n) of the data. A suÆcient statistic is a statistic that contains all the information in the data. To make this more formal, we need some de ntions. De nition 10.32 Write xn
$ yn if f (xn; ) = c f (yn; )
for some constant c that might depend on xn and y n but not . A statistic T (xn ) is suÆcient if T (xn ) $ T (y n) implies that xn $ y n.
Notice that if xn $ y n then the likelihood function based on xn has the same shape as the likelihood function based on y n. Roughly speaking, a statistic is suÆcient if we can calculate the likelihood function knowing only T (X n ). Example 10.33 Let X1 ; : : : ; Xn
Example 10.34 Let X1 ; : : : ; Xn
N (; ) and let T
pS (1
p)n S
where S =
Then, f (X n; ; ) =
Y i Y
P
Bernoulli(p). Then L(p) = i Xi so S is suÆcient.
= (X; S ).
f (Xi ; ; ) (
p1 exp = i 2 n 1 p = exp 2
)
1 X (Xi )2 2 2 i nS 2 n(X )2 exp 2 2 2 2
168
10. Parametric Inference
The last expression depends on the data only through T and therefore, T = (X; S ) is a suÆcient statistic. Note that U = (17 X; S ) is also a suÆcient statistic. If I tell you the value of U then you can easily gure out T and then compute the likelihood. SuÆcient statistics are far from unique. Consider the following statistics for the N (; 2 ) model:
T1 (X n ) T2 (X n ) T3 (X n ) T4 (X n )
= = = =
(X1 ; : : : ; Xn) (X; S ) X (X; S; X3 ):
The rst statistic is just the whole data set. This is suÆcient. The second is also suÆcient as we proved above. The third is not suÆcient: you can't compute L(; ) if I only tell you X . The fourth statistic T4 is suÆcient. The statistics T1 and T4 are suÆcient but they contain redundant information. Intuitively, there is a sense in which T2 is a \more concise" suÆcient statistic than either T1 or T4 . We can express this formally by noting that T2 is a function of T1 and similarly, T2 is a function of T4 . For example, T2 = g (T4 ) where g (a1 ; a2 ; a3 ) = (a1 ; a2 ).
De nition 10.35 A statistic T is minimally suÆcient if
(i) it is suÆcient and (ii) it is a function of every other suÆcient statistic.
Theorem 10.36 T is minimal suÆcient if T (xn ) = T (y n) if and
only if xn $ y n .
A statistic induces a partition on the set if outcomes. We can think of suÆciency in terms of these partitions.
10.12 Technical Appendix Example 10.37 Let X1 ; X2 ; X3 P
169
Bernoulli().
Let V = X1 , T = i Xi and U = (T; X1 ). Here is the set of outcomes and the statistics: X1 X2 V T U 0 0 0 0 (0,0) 0 1 0 1 (1,0) 1 0 1 1 (1,1) 1 1 1 2 (2,1) The partitions induced by these statistics are:
V T U
! f(0; 0); (0; 1)g; f(1; 0); (1; 1)g ! f(0; 0)g; f(0; 1); (1; 0)g; f(1; 1)g ! f(0; 0)g; f(0; 1)g; f(1; 0)g; f(1; 1)g:
Then V is not suÆcient but T and U are suÆcient. T is minimal suÆcient; U is not minimal since if xn = (1; 0; 1) and y n = (0; 1; 1), then xn $ y n yet U (xn ) 6= U (y n ). The statistic W = 17 T generates the same partition as T . It is also minimal suÆcient. Example 10.38 For a N (; 2 ) model, T = (X; S ) is a minimally P
suÆcient statistic. For the Bernoulli model, T = i XP i is a minimally suÆcient statistic. For the Poisson model, TP= i Xi is a minimally suÆcient statistic. Check that T = ( i Xi ; X1 ) is suÆcient but not minimal suÆcient. Check that T = X1 is not suÆcient.
I did not give the usual de nition of suÆciency. The usual de nition is this: T is suÆcient if the distribution of X n given T (X n ) = t does not depend on . Example 10.39 Two coin ips. Let X = (X1 ; X2 ) Bernoulli(p).
Then T = X1 + X2 is suÆcient. To see this, we need the distribution of (X1 ; X2 ) given T = t. Since T can take 3 possible values, there are 3 conditional distributions to check. They are:
170
10. Parametric Inference
(i) the distribution of (X1 ; X2 ) given T = 0:
P (X1 = 0; X2 = 0jt = 0) = 1; P (X1 = 0; X2 = 1jt = 0) = 0; P (X1 = 1; X2 = 0jt = 0) = 0; P (X1 = 1; X2 = 1jt = 0) = 0 (ii) the distribution of (X1 ; X2 ) given T = 1: 1 P (X1 = 0; X2 = 0jt = 1) = 0; P (X1 = 0; X2 = 1jt = 1) = ; 2 1 P (X1 = 1; X2 = 0jt = 1) = ; P (X1 = 1; X2 = 1jt = 1) = 0 2 (iii) the distribution of (X1 ; X2 ) given T = 2: P (X1 = 0; X2 = 0jt = 2) = 0; P (X1 = 0; X2 = 1jt = 2) = 0; P (X1 = 1; X2 = 0jt = 2) = 0; P (X1 = 1; X2 = 1jt = 2) = 1: None of these depend on the parameter p. Thus, the distribution of X1 ; X2 jT does not depend on so T is suÆcient. Theorem 10.40 (Factorization Theorem) T is suÆcient if and
only if there are functions g (t; ) and h(x) such that f (xn ; ) = g (t(xn ); )h(xn ). Example 10.41 Return to the two coin ips. Let t = x1 + x2 .
Then
f (x1 ; x2 ; ) = f (x1 ; )f (x2 ; ) = x1 (1 )1 x1 x2 (1 )1 = g (t; )h(x1 ; x2 ) where g (t; ) = t (1 )2 X1 + X2 is suÆcient.
t
x2
and h(x1 ; x2 ) = 1. Therefore, T =
Now we discuss an implication of suÆciency in point estimation. Let b be an estimator of . The Rao-Blackwell theorem says that an estimator should only depend on the suÆcient statistic, otherwise it can be improved. Let R(; b) = E [( b)2 ] denote the mse of the estimator.
10.12 Technical Appendix
171
Theorem 10.42 (Rao-Blackwell) Let b be an estimator and
let T be a suÆcient statistic. De ne a new estimator by
e = E (bjT ): Then, for every , R(; e) R(; b): Example 10.43 Consider ipping a coin twice. Let b = X1 . This
is a well de ned (and unbiased) estimator. But it is not a function of the suÆcient statistic T = X1 + X2 . However, note that e = E (X1 jT ) = (X1 + X2 )=2. By the Rao-Blackwell Theorem, e has MSE at least as small as b = X1 . The P same applies b with n coin ips. Again de ne = X1 and T = i Xi . Then P 1 e = E (X1 jT ) = n i Xi has improved mse . 10.12.3 Exponential Families
Most of the parametric models we have studied so far are special cases of a general class of models called exponential families. We say that ff (x; ; 2 g is a one-parameter exponential family if there are functions (), B (), T (x) and h(x) such that f (x; ) = h(x)e()T (x) B() : It is easy to see that T (X ) is suÆcient. We call T the natural suÆcient statistic. Example 10.44 Let X Poisson(). Then
x e 1 x log = e x! x! and hence, this is an exponential family with () = log , B () = , T (x) = x, h(x) = 1=x!. f (x; ) =
Example 10.45 Let X
Bin(n; ). Then
n x f (x; ) = (1 )n x
x
n = exp x log + n log(1 ) : x 1
172
10. Parametric Inference
In this case,
() = log and
1
; B () = n log()
T (x) = x; h(x) =
n : x
We can rewrite an exponential family as
f (x; ) = h(x)eT (x) A() where = () is called the natural parameter and
A( ) = log
Z
h(x)eT (x) dx:
For example a Poisson can be written as f (x; ) = ex e =x! where the natural parameter is = log . Let X1 ; : : : ; Xn be iid from a exponential family. Then f (xn ; ) is an exponential family: n f (xn ; ) = hn (xn )hn (xn )e()Tn (x )
Q
Bn ()
P
where hn (xn ) = i h(xi P ), Tn (xn ) = i T (xi ) and Bn () = nB (). This implies that i T (Xi ) is suÆcient. Example 10.46 Let X1 ; : : : ; Xn Uniform(0; ). Then
f (xn ; ) =
1 I (x ) n (n)
where I is 1 if the term inside the brackets is true and 0 otherwise, and x(n) = maxfx1 ; : : : ; xn gP . Thus T (X n ) = maxfX1 ; : : : ; Xn g is suÆcient. But since T (X n ) 6= i T (Xi ), this cannot be an exponential family.
10.12 Technical Appendix
173
Theorem 10.47 Let X have density in an exponential family.
Then,
E (T (X )) = A0 ( ); V (T (X )) = A00 ( ):
If = (1 ; : : : ; k ) is a vector, then we say that f (x; ) has exponential family form if
f (x; ) = h(x) exp
( k X j =1
)
j ()Tj (x) B () :
Again, T = (T1 ; : : : ; Tk ) is suÆcient and P n iid samples also has P exponential form with suÆcient statistic ( i T1 (Xi ); : : : ; i Tk (Xi )). Example 10.48 Consider the normal family with = (; ). Now,
f (x; ) = exp 2 x
x2 2 2
1 2 + log(2 2 ) 2 2
:
This is exponential with
1 () = 2 ; T1 (x) = x 1 ; T (x) = x2 2 () = 2 2 2 1 2 2 B () = + log(2 ) ; h(x) = 1: 2 2 P P Hence, with n iid samples, ( i Xi ; i Xi2 ) is suÆcient. As before we can write an exponential family as
f (x; ) = h(x) exp T T (x)
A( )
R
where A( ) = h(x)eT T (x) dx. It can be shown that E (T (X )) = A_ ( ) V (T (X )) = A( )
where the rst expression is the vector of partial derivatives and the second is the matrix of second derivatives.
174
10. Parametric Inference
10.13 Exercises 1. Let X1 ; : : : ; Xn Gamma(; ). Find the method of moments estimator for and . 2. Let X1 ; : : : ; Xn Uniform(a; b) where a and b are unknown parameters and a < b. (a) Find the method of moments estimators for a and b. (b) Find the mle ba and bb. R
(c) Let = xdF (x). Find the mle of . (d) Let b be the mle from (1bc).R Let e be the nonparametric plug-in estimator of = xdF (x). Suppose that a = 1, b = 3 and n = 10. Find the mse of b by simulation. Find the mse of e analytically. Compare. 3. Let X1 ; : : : ; Xn N (; 2 ). Let be the .95 percentile, i.e. P(X < ) = :95. (a) Find the mle of . (b) Find an expression for an approximate 1 con dence interval for . (c) Suppose the data are: 3.23 -2.50 1.88 -0.68 4.43 1.03 -0.07 -0.01 0.76 1.76 0.33 -0.31 0.30 -0.61 1.52 1.54 2.28 0.42 2.33 -1.03 0.39
0.17 3.18 5.43 4.00
Find the mle b. Find the standard error using the delta method. Find the standard error using the parametric bootstrap.
10.13 Exercises
175
4. Let X1 ; : : : ; Xn Uniform(0; ). Show that the mle is consistent. Hint: Let Y = maxfX1 ; :::; Xn g:. For any c, P(Y < c) = P(X1 < c; X2 < c; :::; Xn < c) = P(X1 < c)P(X2 < c):::P(Xn < c). 5. Let X1 ; : : : ; Xn Poisson(). Find the method of moments estimator, the maximum likelihood estimator and the Fisher information I (). 6. Let X1 ; :::; Xn N (; 1). De ne
Yi =
1 if Xi > 0 0 if Xi 0:
Let = P(Y1 = 1): (a) Find the maximum likelihood estimate b of . (b) Find an approximate 95 per cent con dence interval for : P (c) De ne e = (1=n) i Yi . Show that e is a consistent estimator of . (d) Compute the asymptotic relative eÆciency of e to b. Hint: Use the delta method to get the standard error of the mle . Then compute the standard error (i.e. the standard deviation) of e. (e) Suppose that the data are not really normal. Show that b is not consistent. What, if anything, does b converge to? 7. (Comparing two treatments.) n1 people are given treatment 1 and n2 people are given treatment 2. Let X1 be the number of people on treatment 1 who respond favorably to the treatment and let X2 be the number of people on treatment 2 who respond favorably. Assume that X1 Binomial(n1 ; p1 ) X2 Binomial(n2 ; p2 ). Let = p1 p2 .
176
10. Parametric Inference
(a) Find the mle for . (b) Find the Fisher information matrix I (P1 ; p2 ). (c) Use the multiparameter delta method to nd the asymptotic standard error of b. (d) Suppose that n1 = n2 = 200, X1 = 160 and X2 = 148. Find b. Find an approximate 90 percent con dence interval for using (i) the delta method and (ii) the parametric bootstrap. 8. Find the Fisher information matrix for Example 10.29. 9. Let X1 ; :::; Xn Normal(; 1): Let = e and let b = eX be the mle. Create a data set (using = 5) consisting of n=100 observations. (a) Use the delta method to get seb and 95 percent con dence interval for . Use the parametric bootstrap to get b and 95 percent con dence interval for . Use the nonse parametric bootstrap to get seb and 95 percent con dence interval for : Compare your answers. (b) Plot a histogram of the bootstrap replications for the parametric and nonparametric bootstraps. These are estimates of the distribution of b. The delta method also gives b se 2 ). an approximation to this distribution namely, Normal(; Compare these to the true sampling distribution of b (whihc you can get by simulation). Which approximation, parametric bootstrap, bootstrap, or delta method is closer to the true distribution? 10. Let X1 ; :::; Xn Unif(0; ): The mle is b = X(n) = maxfX1 ; :::; Xng. Generate a data set of size 50 with = 1: (a) Find the distribution of b analytically. Compare the true distribution of b to the histograms from the parametric and nonparametric bootstraps.
10.13 Exercises
177
(b) This is a case where the nonparametric bootstrap does very poorly. Show that, for the parametric bootstrap P(b = b) = 0 but for the nonparametric bootstrap P(b = b) :632. Hint: show that, P(b = b) = 1 (1 (1=n))n then take the limit as n gets large. What is the implication of this?
178
10. Parametric Inference
11 Hypothesis Testing and p-values
Suppose we want to know if a certain chemical causes cancer. We take some rats and randomly divide them into two groups. We expose one group to the chemical and then we compare the cancer rate in the two groups. Consider the following two h ypotheses:
The Null Hypothesis: The cancer rate is the same in the two groups The Alternative Hypothesis: The cancer rate is not the same in the two groups. If the exposed group has a much higher rate of cancer than the unexposed group then we will reject the null hypothesis and conclude that the evidence fa vors the alternative h ypothesis; in other words we will conclude that there is evidence that the chemical causes cancer. This is an example of hypothesis testing. More formally, suppose that we partition the parameter space into two disjoint sets 0 and 1 and that we wish to test
H0 : 2 0 v ersus H1 : 2 1 :
(11.1)
180
11. Hypothesis Testing and p-values
We call H0 the null hypothesis and H1 the alternative hypothesis. Let X be a random variable and let X be the range of X . We test a hypothesis by nding an appropriate subset of outcomes R X called the rejection region. If X 2 R we reject the null hypothesis, otherwise, we do not reject the null hypothesis:
X 2 R =) reject H0 X 2= R =) retain (do not reject) H0 Usually the rejection region R is of the form n
o
R = x : T (x) > c
where T is a test statistic and c is a critical value. The problem in hypothesis testing is to nd an appropriate test statistic T and an appropriate cuto value c.
Warning! There is a tendency to use hypthesis testing methods even when they are not appropriate. Often, estimation and con dence intervals are better tools. Use hypothesis testing only when you want to test a well de ned hypothesis. Hypothesis testing is like a legal trial. We asume someone is innocent unless the evidence strongly suggests that he is guilty. Similarly, we retain H0 unless there is strong evidence to reject H0 . There are two types of errors when can make. Rejecting H0 when H0 is true is called a type I error. Rejecting H1 when H1 is true is called a type II error. The possible outcomes for hypothesis testing are summarized in the Table below:
11. Hypothesis Testing and p-values
Retain Null p
H0 true H1 true type II error
181
Reject Null type p I error
Summary of outcomes of hypothesis testing. De nition 11.1 The power function of a test with rejection re-
gion R is de ned by
() = P (X 2 R):
(11.2)
The size of a test is de ned to be
= sup (): 20
(11.3)
A test is said to have level if its size is less than or equal to .
A hypothesis of the form = 0 is called a simple hypothesis. A hypothesis of the form > 0 or < 0 is called a composite hypothesis. A test of the form
H0 : = 0 versus H1 : 6= 0 is called a two-sided test. A test of the form
H0 : 0 versus H1 : > 0 or
H0 : 0 versus H1 : < 0
is called a one-sided test. The most common tests are twosided.
182
11. Hypothesis Testing and p-values
N (; ) where is known. We want to test H0 : 0 versus H1 : > 0. Hence, 0 = ( 1; 0] and 1 = (0; 1). Consider the test:
Example 11.2 Let X1 ; : : : ; Xn
reject H0 if T > c
n
o
where T = X . The rejection region is R = xn : T (xn ) > c . Let Z denote a standard normal random variable. The power function is
() = P X > c p pn(c ) n(X ) = P > pn(c ) = P Z> p n(c ) : = 1 This function is increasing in . Hence
size = sup () = (0) = 1 <0
pnc
:
To get a size test, set this equal to and solve for c to get
c=
1 (1 ) pn :
p
So we reject when X > 1 (1 )= n. Equivalently, we reject when pn (X 0) > z : Finding most powerful tests is hard and, in many cases, most powerful tests don't even exist. Instead of searching for most powerful tests, we'll just consider three widely used tests: the Wald test, the 2 test and the permutation test. A fourth test, the likelihood ratio test, is dicussed in the appendix.
11.1 The Wald Test
183
11.1 The Wald Test Let be a scalar parameter, let b be an estimate of and let b be the estimated standard error of b. se The test is named
after Abraham Wald (1902-1950), who was a very in uential mathematical statistician.
De nition 11.3 The Wald Test
Consider testing
H0 : = 0 versus H1 : 6= 0 : Assume that b is asymptotically Normal:
pn(b ) 0
N (0; 1):
b se
The size Wald test is: reject H0 when jW j > z=2 where b 0 W= : se b Theorem 11.4 Asymptotically, the Wald test has size , that is,
jZ j > z=2 !
P0
as n ! 1.
Under = 0 , (b 0 )=seb N (0; 1). Hence, the probability of rejecting when the null = 0 is true is Proof.
P0
jW j > z=2
=
P0
!
P
=
jb 0 j > z =2 b se
jN (0; 1)j > z=2 :
!
Remark 11.5 Most texts de ne the Wald test slightly dierently.
They use the standard error computed at = 0 rather then at the estimated value b. Both versions are valid.
184
11. Hypothesis Testing and p-values
Let us consider the power of the Wald test when the null hypothesis is false. Theorem 11.6 Suppose the true value of is ? = 6 0 . The power
(? ) { the probability of correctly rejecting the null hypothesis { is given (approximately) by 1
0
b se
?
+ z=2 +
0
b se
?
z=2 :
(11.4)
Recall that seb tends to 0 as the sample size increases. Inspecting (11.4) closely we note that: (i) the power is large if ? is far from 0 and (ii) the power is large if the sample size is large. Example 11.7 (Comparing Two Prediction Algorithms) We test a
prediction algorithm on a test set of size m and we test a second prediction algorithm on a second test set of size n. Let X be the number of incorrect predictions for algorithm 1 and let Y be the number of incorrect predictions for algorithm 2. Then X Binomial(m; p1 ) and Y Binomial(n; p2 ). To test the null hypothesis that p1 = p2 write
H0 : Æ = 0 versus H1 : Æ 6= 0 where Æ = p1 standard error
p2 . The r b = se
mle
is Æb = pb1
pb2 with estimated
pb1 (1 pb1 ) pb2 (1 pb2 ) + : m n
The size Wald test is to reject H0 when jW j > z=2 where
W=
Æb 0 b se
=
q
pb1
pb1 (1 pb1 ) m
pb2 +
pb2 (1 pb2 ) n
:
The power of this test will be largest when p1 is far from p2 and when the sample sizes are large.
11.1 The Wald Test
185
What if we used the same test set to test both algorithms? The two samples are no longer independent. Instead we use the following strategy. Let Xi = 1 if algorithm 1 is correct on test case i and Xi = 0 otherwise. Let Yi = 1 if algorithm 2 is correct on test case i Yi = 0 otherwise. A typical data set will look something like this: Test Case Xi Yi Di = Xi
1 2 3 4 5 .. . n
1 1 1 0 0 .. . 0
0 1 1 1 0 .. . 1
1 0 0 -1 0 .. . -1
Yi
Let
Æ = E (Di ) = E (Xi )
E (Yi ) = P(Xi
= 1)
P(Yi
= 1):
p
P
ThenPÆb = D = n 1 ni=1 Di and seb (Æb) = S= n where S 2 = n 1 ni=1 (Di D)2 . To test H0 : Æ = 0 versus H1 : Æ 6= 0 we use b se b and reject H0 if jW j > z=2 . This is called a paired W = Æ= comparison. Example 11.8 (Comparing Two Means.) Let X1 ; : : : ; Xm and Y1 ; : : : ; Yn
be two independent samples from populations with means 1 and 2 , respectively. Let's test the null hypothesis that 1 = 2 . Write this as H0 : Æ = 0 versus H1 : Æ 6= 0 where Æ = 1 2 . Recall that the nonparametric plug-in estimate of Æ is Æb = X Y with estimated standard error r b = se
s21 s22 + m n
186
11. Hypothesis Testing and p-values
where s21 and s22 are the sample variances. The size Wald test rejects H0 when jW j > z=2 where
W=
Æb 0 b se
=
X
q
Y
s2 m+ n
s21
2
:
Example 11.9 (Comparing Two Medians.) Consider the previous
example again but let us test whether the medians of the two distributions are the same. Thus, H0 : Æ = 0 versus H1 : Æ 6= 0 where Æ = 1 2 where 1 and 2 are the medians. The nonparametric plug-in estimate of Æ is Æb = b1 b2 where b1 and b2 are the sample medians. The estimated standard error seb of Æb can be obtained from the bootstrap. The Wald test statistic is b se b. W = Æ=
There is a relationship between the Wald test and the 1 asymptotic con dence interval b seb z=2 .
Theorem 11.10 The size Wald test rejects H0 : = 0 versus
H1 : 6= 0 if and only if 0 2= C where
C = (b seb z=2 ; b + seb z=2 ): Thus, testing the hypothesis is equivalent to checking whether the null value is in the con dence interval.
11.2 p-values Reporting \reject H0 " or \retain H0 " is not very informative. Instead, we could ask, for every , whether the test rejects at that level. Generally, if the test rejects at level it will also reject at level 0 > . Hence, there is a smallest at which the test rejects and we call this number the p-value.
11.2 p-values
187
De nition 11.11 Suppose that for every 2 (0; 1) we have
a size test with rejection region R . Then, n
p-value = inf :
T (X n)
o
2 R :
That is, the p-value is the smallest level at which we can reject H0 .
Informally, the p-value is a measure of the evidence against H0 : the smaller the p-value, the stronger the evidence against H0 . Typically, researchers use the following evidence scale: p-value
< :01 .01 - .05 .05 - .10 > :1
evidence
very strong evidence againt H0 strong evidence againt H0 weak evidence againt H0 little or no evidence againt H0
Warning! A large p-value is not strong evidence in favor of H0 . A large p-value can occur for two reasons: (i) H0 is true or (ii) H0 is false but the test has low power. But do not confuse the p-value with P(H0 jData). The pvalue is not the probability that the null hypothesis is true. The following result explains how to compute the p-value. Theorem 11.12 Suppose that the size test is of the form
reject H0 if and only if T (X n) c : Then,
p-value = sup P (T (X n ) T (xn )): 20
In words, the p-value is the probability (under H0 ) of observing a value of the test statistic as or more extreme than what was actually observed.
We discuss quantities like P(H0 jData) in the chapter on Bayesian inference.
188
11. Hypothesis Testing and p-values
For a Wald test, W has an aproximate N (0; 1) distribution under H0 . Hence, the p-value is
p-value P jZ j > jwj = 2P Z <
jw j
= 2 Z <
jw j
(11.5) 0 )=seb is the observed value of
where Z N (0; 1) and w the test statistic. Here is an important property of p-values. = (b
Theorem 11.13 If the test statistic has a continuous distribu-
tion, then under H0 : = 0 , the p-value has a Uniform (0,1) distribution.
If the p-value is less than .05 then people often report that \the result is statistically signi cant at the 5 per cent level." This just means that the null would be rejected if we used a size = 0:05 test. In general, the size test rejects if and only if p-value < . Example 11.14 Recall the cholesterol data from Example 8.11.
To test of the means ard dierent we compute
W=
Æb 0 b se
=
X
q
Y
s21 m
+ sn2 2
=
216:2 195:3 = 3:78: 52 + 2:42
To compute the p-value, let Z N (0; 1) denote a standard Normal random variable. Then,
p-value = P(jZ j > 3:78) = 2P(Z < 3:78) = :0002 which is very strong evidence against the null hypothesis. To test if the medians are dierent, let b1 and b2 denote the sample medians. Then,
b W= 1
b se
b2
=
212:5 194 = 2:4 7:7
11.3 The 2 distribution
189
where the standard error 7.7 was found using the boostrap. The p-value is
p-value = P(jZ j > 2:4) = 2P(Z < 2:4) = :02 which is strong evidence against the null hypothesis.
Warning! A result might be statistically signi cant and yet the size of the eect might be small. In such a case we have a result that is statistically signi cant but not practically signi cant. It is wise to report a con dence interval as well.
11.3 The distribution 2
Let Z ; : : : ; Zk be independent, standard normals. Let V = Then we say that V has a 2 distribution with k degrees of freedom, written V 2k . The probability density of V is v (k=2) 1 e v=2 f (v ) = k=2 2 (k=2) for v > 0. It can be shown that E (V ) = k and V (k) = 2k. We de ne the upper quantile 2k; = F 1 (1 ) where F is the 2 2 cdf . That is, P (k > k; ) = .
1 Pk 2. Z i=1 i
11.4 Pearson's Test For Multinomial Data 2
Pearson's 2 test is used for multinomial data. Recall that X = (X1 ; : : : ; Xk ) has a multinomial distribution if
n f (x1 ; : : : ; xk ; p) = px1 pxkk x1 : : : xk 1 where
n n! = : x1 : : : xk x1 ! xk ! The mle of p is pb = (pb1 ; : : : ; pbk ) = (X1 =n; : : : ; Xk =n).
190
11. Hypothesis Testing and p-values
Let (p01 ; : : : ; p0k ) be some xed set of probabilities and suppose we want to test H0 : (p1 ; : : : ; pk ) = (p01 ; : : : ; p0k ) versus H1 : (p1 ; : : : ; pk ) 6= (p01 ; : : : ; p0k ):
Pearson's 2 statistic is k X (Xj
k np0j )2 X (Oj Ej )2 T= = np0j Ej j =1 j =1 where Oj = Xj is the observed data and Ej = E (Xj ) = np0j is the expected value of Xj under H0 .
2k 1 . Hence the test: reject H0 if T > 2k 1; has asymptotic level . The p-value is P(2k > t) where t is the observed value of the test statistic. Theorem 11.15 Under H0 , T
Example 11.16 (Mendel's peas.) Mendel bred peas with round yel-
low seeds and wrinkled green seeds. There are four types of progeny: round yellow, wrinkled yellow, round green and (4) wrinkled green. The number of each type is multinomial with probability (p1 ; p2 ; p3 ; p4 ). His theory of inheritance predicts that 9 3 3 1 p= ; ; ; p0 : 16 16 16 16 In n = 556 trials he observed X = (315; 101; 108; 32). Since, np10 = 312:75; np20 = np30 = 104:25 and np40 = 34:75, the test statistic is (315 312:75)2 (101 104:25)2 (108 104:25)2 (32 34:75)2 + + + = 0:47: 2 = 312:75 104:25 104:25 34:75 The = :05 value for a 23 is 7.815. Since 0:47 is not larger than 7.815 we do not reject the null. The p-value is
p-value = P(23 > :47) = :07 which is only moderate evidence againt H0 . Hence, the data do not contradict Mendel's theory. Interestingly, there is some controversey about whether Mendel's results are \too good."
11.5 The Permutation Test
191
11.5 The Permutation Test The permutation test is a nonparametric method for testing whether two distribution are the same. This test is \exact" meaning that it is not based on large sample theory approximations. Suppose that X1 ; : : : ; Xm FX and Y1 ; : : : ; Yn FY are two independent samples and H0 is the hypothesis that the two samples are identically distributed. This is the type of hypothesis we would consider when testing whether a treatment diers from a placebo. More precisely we are testing
H0 : FX = FY versus H1 : FX = 6 FY : Let T (x1 ; : : : ; xm ; y1 ; : : : ; yn) be some test statistic, for example,
T (X1 ; : : : ; Xm ; Y1; : : : ; Yn ) = jX m
Y n j:
Let N = m + n and consider forming all N ! permutations of the data X1 ; : : : ; Xm ; Y1 ; : : : ; Yn . For each permutation, compute the test statistic T . Denote these values by T1 ; : : : ; TN ! . Under the null hypothesis, each of these values is equally likely. The distribution P0 that puts mass 1=N ! on each Tj is called the permutation distribution of T . Let tobs be the observed value of the test statistic. Assuming we reject when T is large, the pvalue is N! 1 X p-value = P0 (T > tobs ) = I (T > t ): N ! j =1 j obs
Example 11.17 Here is a toy example to make the idea clear.
Suppose the data are: (X1 ; X2 ; Y1 ) = (1; 9; 3). Let T (X1 ; X2 ; Y1) = jX Y j = 2. The permutations are:
Under the null hypothesis, given the ordered data values,
X1 ; : : : ; Xm ; Y1; : : : ; Yn is uniformly distributed over the N ! permutations of the data.
192
11. Hypothesis Testing and p-values
permutation value of T probability (1,9,3) 2 1/6 (9,1,3) 2 1/6 (1,3,9) 7 1/6 (3,1,9) 7 1/6 (3,9,1) 5 1/6 (9,3,1) 5 1/6 The p-value is P(T > 2) = 4=6.
Usually, it is not practical to evaluate all N ! permutations. We can approximate the p-value by sampling randomly from the set of permutations. The fraction of times Tj > tobs among these samples approximates the p-value. Algorithm for Permutation Test
1. Compute the observed value of the test statistic tobs = T (X1 ; : : : ; Xm ; Y1 ; : : : ; Yn). 2. Randomly permute the data. Compute the statistic again using the permuted data. 3. Repeat the previous step B times and let T1 ; : : : ; TB denote the resulting values. 4. The approximate p-value is B 1X I (T > t ): B j =1 j obs
Example 11.18 DNA microarrays allow researchers to measure
the expression levels of thousands of genes. The data are the levels of messenger RNA (mRNA) of each gene, which is thought
11.6 Multiple Testing
193
to provide a measure how much protein that gene produces. Roughly, the larger the number, the more active the gene. The table below, reproduced from Efron, et. al. (JASA, 2001, p. 1160) shows the expression levels for genes from two types of liver cancer cells. There are 2,638 genes in this experiment but here we show just the rst two. The data are log-ratios of the intensity levels of two dierent color dyes used on the arrays. Type I
Type II
Patient
1
2
3
4
5
6
7
8
9
10
Gene 1
230.0
-1,350
-1,580.0
-400
-760
970
110
-50
-190
-200
Gene 2
470.0
-850
-.8
-280
120
390
-1730
-1360
-.8
-330
. . .
. . .
. . .
. . .
. . .
. . .
. . .
. . .
. . .
. . .
. . .
Let's test whether the median level of gene 1 is dierent between the two groups. Let 1 denote the median level of gene 1 of Type I and let 2 denote the median level of gene 1 of Type II. The absolute dierence of sample medians is T = jb1 b2 j = 710. Now we estimate the permutation distribution by simulation and we nd that the estimated p-value is .045. Thus, if we use a = :05 level of signi cance, we would say that there is evidence to reject the null hypothesis of no dierence.
In large samples, the permutation test usually gives similar results to a test that is based on large sample theory. The permutation test is thus most useful for small samples.
11.6 Multiple Testing In some situations we may conduct many hypothesis tests. In example 11.18, there were actually 2,638 genes. If we tested for a dierence for each gene, we would be conducting 2,638 separate hypothesis tests. Suppose each test is conducted at level . For any one test, the chance of a false rejection of the null is . But the chance of at least one false rejection is much higher.
194
11. Hypothesis Testing and p-values
This is the multiple testing problem. The problem comes up in many data mining situations where one may end up testing thousands or even millions of hypotheses. There are many ways to deal with this problem. Here we discuss two methods. Consider m hypothesis tests:
H0i versus H1i ; i = 1; : : : ; m and let P1 ; : : : ; Pm denote m p-values for these tests. The Bonferroni Method
Given p-values P1 ; : : : ; Pm , reject null hypothesis H0i if Pi < =m. Theorem 11.19 Using the Bonferroni method, the probability of
falsely rejecting any null hypotheses is less than or equal to .
Let R be the event that at least one null hypotheses is falsely rejected. Let Ri be the event that the ith null hypothesisSis falsely rejected. Recall that if A1 ; : : : ; Ak are events then Pk k P( i=1 Ai ) i=1 P(Ai ). Hence, Proof.
P(R) = P
m [ i=1
!
Ri
m X i=1
P (Ri ) =
m X
= m i=1
from Theorem 11.13. Example 11.20 In the gene example, using = :05, we have
that :05=2638 = :00001895375. Hence, for any gene with p-value less than .00001895375, we declare that there is a signi cant dierence.
The Bonferroni method is very conservative because it is trying to make it unlikely that you would make even one false rejection. Sometimes, a more reasonable idea is to control the
11.6 Multiple Testing
195
false discovery rate (FDR) which is de ned as the mean of the number of false rejections divided by the number of rejections. Suppose we reject all null hypotheses whose p-values fall below some threshold. Let m0 be the number of null hypotheses are true and m1 = m m0 null hypotheses are false. The tests can be categorize in a 2 2 as in the following table. H0 True H0 False Total
H0 Not Rejected H0 Rejected U V T S m R R
Total m0 m1 m
De ne the False Discovery Proportion (FDP) FDP =
V=R if R > 0 0 if R = 0:
The FDP is the proportion of rejections that are incorrect. Next de ne FDR = E (FDP). The Benjamini-Hochberg (BH) Method
1. Let P(1) < < P(m) denote the ordered p-values. 2. De ne
`i =
i ; Cm m
and
n
R = max i : P(i) < `i
o
(11.6)
where Cm isPde ned to be 1 if the p-values are independent and Cm = m i=1 (1=i) otherwise. 3. Let t = P(R) ; we call t the BH rejection threshold. 4. Rejects all null hypotheses H0i for which Pi t:
196
11. Hypothesis Testing and p-values
Theorem 11.21 (Benjamini and Hochberg) If the procedure
above is applied, then regardless of how many nulls are true and regardless of the distribution of the p-values when the null hypthesis is false,
FDR = E (FDP)
m0 : m
Example 11.22 Figure 11.1 shows 7 ordered p-values plotted as
vertical lines. If we tested at level without doing any corection for mutiple testing, we would reject all hypotheses whose p-values are less than . In this case, the 5 hypotheses corresponding to the 5 smallest p-values are rejected. The Bonferroni method rejects all hypotheses whose p-values are less than =m. In this example, this leads to no rejections. The BH threshold corresponds to the last p-value that falls under the line with slope . This leads to three hypotheses being rejected in this case.
The Bonferonni method controls the probability of a single false rejection. This is very strict and leads to low power when there are many tests. The FDR method controls the fraction of false discoveries which is a more reasonable criterion when there are many tests. Summary.
11.7 Technical Appendix 11.7.1 The Neyman-Pearson Lemma
In the special case of a simple null H0 : = 0 and a simple alternative H1 : = 1 we can say precisely what the most powerful test is. Theorem 11.23 (Neyman-Pearson.) Suppose we test H0 : = 0 versus H1 : = 1 . Let Qn f (x ; ) L (1 ) T= = Qni=1 i 1 : L(0) i=1 f (xi ; 0 )
11.7 Technical Appendix
197
p-values
t
=m 0
reject
don't reject
threshold
FIGURE 11.1. Schematic illustration of Benjamini-Hochberg procedure. All hypotheses corresponding to the last undercrossing are rejected.
1
198
11. Hypothesis Testing and p-values
Suppose we reject H0 when T > k. If we choose k so that P0 (T > k ) = then this test is the most powerful, size test. That is among all tests with size , this test maximizes the power (1 ).
11.7.2 Power of the Wald Test
11.6. Let Z N (0; 1). Then, Proof of Theorem
Power = (? ) = P? (Reject H0 ) = P? (jW j > z=2 ) ! j b 0 j = P ? > z=2 b se
= = =
P ?
b 0 b se
!
> z=2 + P?
b 0 b se
!
< z=2
b z=2 + P? b < 0 se b z=2 P? b > 0 + se ! b ? 0 ? b ? 0 ? > + z=2 + P? < P ? b b b b se se se se 0 ? + z=2 + P Z < 0 ? z=2 P Z> b b se se
= 1
0
b se
?
+ z=2 +
0
b se
?
z=2 :
11.7.3 The t-test
To test H0 : = 0 where is the mean, we can use the Wald test. When the data are assumed to be Normal and the sample size is small, it is common instead to use the t-test. A random variable T has a t-distribution with k degrees of freedom if it has
!
z=2
11.7 Technical Appendix
density
f (t) =
p
k
k+1
2
k
t2 2 1+ k
199
(k+1)=2 :
When the degrees of freedom k ! 1, this tends to a Normal distribution. When k = 1 it reduces to a Cauchy. Let X1 ; : : : ; Xn N (; 2 ) where = (; 2 ) are both unknown. Suppose we want to test = 0 versus 6= 0 . Let
T=
pn(X
n
Sn
0 )
where Sn2 is the sample variance. For large samples T N (0; 1) under H0 . The exact distribution of T under H0 is tn 1 . Hence if we reject when jT j > tn 1;=2 then we get a size test. 11.7.4 The Likelihood Ratio Test
11.7.4 The Likelihood Ratio Test

Let θ = (θ_1, ..., θ_q, θ_{q+1}, ..., θ_r) and suppose that Θ_0 consists of all parameter values θ such that (θ_{q+1}, ..., θ_r) = (θ_{0,q+1}, ..., θ_{0,r}).

Definition 11.24 Define the likelihood ratio statistic by
\[
\lambda = 2 \log\left( \frac{\sup_{\theta\in\Theta} L(\theta)}{\sup_{\theta\in\Theta_0} L(\theta)} \right)
        = 2 \log\left( \frac{L(\hat\theta)}{L(\hat\theta_0)} \right)
\]
where θ̂ is the mle and θ̂_0 is the mle when θ is restricted to lie in Θ_0. The likelihood ratio test is: reject H_0 when λ(x^n) > χ²_{r-q,α}.
For example, if θ = (θ_1, θ_2, θ_3, θ_4) and we want to test the null hypothesis that θ_3 = θ_4 = 0, then the limiting distribution has 4 − 2 = 2 degrees of freedom.
Theorem 11.25 Under H_0, λ(x^n) converges in distribution to χ²_{r−q}. Hence, asymptotically, the LR test is level α.
11.8 Bibliographic Remarks The most complete book on testing is (1986). See also Casella and Berger (1990, Chapter 8). The FDR method is due to Benjamini and Hochberg (1995).
11.9 Exercises

1. Prove Theorem 11.13.

2. Prove Theorem 11.10.

3. Let X_1, ..., X_n ∼ Uniform(0, θ) and let Y = max{X_1, ..., X_n}. We want to test H_0: θ = 1/2 versus H_1: θ > 1/2. The Wald test is not appropriate since Y does not converge to a Normal. Suppose we decide to test this hypothesis by rejecting H_0 when Y > c.
   (a) Find the power function.
   (b) What choice of c will make the size of the test .05?
   (c) In a sample of size n = 20 with Y = 0.48, what is the p-value? What conclusion about H_0 would you make?
   (d) In a sample of size n = 20 with Y = 0.52, what is the p-value? What conclusion about H_0 would you make?

4. There is a theory that people can postpone their death until after an important event. To test the theory, Phillips
and King (1988) collected data on deaths around the Jewish holiday Passover. Of 1919 deaths, 922 died the week before the holiday and 997 died the week after. Think of this as a binomial and test the null hypothesis that θ = 1/2. Report and interpret the p-value. Also construct a confidence interval for θ. Reference: Phillips, D.P. and King, E.W. (1988). Death takes a holiday: Mortality surrounding major social occasions. The Lancet, 2, 728-732.

5. In 1861, 10 essays appeared in the New Orleans Daily Crescent. They were signed "Quintus Curtius Snodgrass" and some people suspected they were actually written by Mark Twain. To investigate this, we will consider the proportion of three-letter words found in an author's work. From eight Twain essays we have:
   .225 .262 .217 .240 .230 .229 .235 .217
   From 10 Snodgrass essays we have:
   .209 .205 .196 .210 .202 .207 .224 .223 .220 .201
   (source: Rice xxxx)
   (a) Perform a Wald test for equality of the means. Use the nonparametric plug-in estimator. Report the p-value and a 95 per cent confidence interval for the difference of means. What do you conclude?
   (b) Now use a permutation test to avoid the use of large sample methods. What is your conclusion?

6. Let X_1, ..., X_n ∼ N(θ, 1). Consider testing
   H_0: θ = 0 versus H_1: θ = 1.
   Let the rejection region be R = {x^n : T(x^n) > c} where T(x^n) = n^{-1} Σ_{i=1}^n X_i.
   (a) Find c so that the test has size α.
   (b) Find the power under H_1, i.e. find β(1).
   (c) Show that β(1) → 1 as n → ∞.

7. Let θ̂ be the mle of a parameter θ and let ŝe = {n I(θ̂)}^{-1/2} where I(θ) is the Fisher information. Consider testing
   H_0: θ = θ_0 versus θ ≠ θ_0.
   Consider the Wald test with rejection region R = {x^n : |Z| > z_{α/2}} where Z = (θ̂ − θ_0)/ŝe. Let θ_1 > θ_0 be some alternative. Show that β(θ_1) → 1.

8. Here are the numbers of elderly Jewish and Chinese women who died just before and after the Chinese Harvest Moon Festival.

   Week   Chinese   Jewish
    -2       55       141
    -1       33       145
     1       70       139
     2       49       161

   Compare the two mortality patterns.

9. A randomized, double-blind experiment was conducted to assess the effectiveness of several drugs for reducing postoperative nausea. The data are as follows.

                              Number of Patients   Incidence of Nausea
   Placebo                            80                    45
   Chlorpromazine                     75                    26
   Dimenhydrinate                     85                    52
   Pentobarbital (100 mg)             67                    35
   Pentobarbital (150 mg)             85                    37
   (source: )
   (a) Test each drug versus the placebo at the 5 per cent level. Also, report the estimated odds-ratios. Summarize your findings.
   (b) Use the Bonferroni and the FDR method to adjust for multiple testing.

10. Let X_1, ..., X_n ∼ Poisson(λ).
   (a) Let λ_0 > 0. Find the size α Wald test for
   H_0: λ = λ_0 versus H_1: λ ≠ λ_0.
   (b) (Computer Experiment.) Let λ_0 = 1, n = 20 and α = .05. Simulate X_1, ..., X_n ∼ Poisson(λ_0) and perform the Wald test. Repeat many times and count how often you reject the null. How close is the type I error rate to .05?
12 Bayesian Inference
12.1 The Bayesian Philosophy

The statistical theory and methods that we have discussed so far are known as frequentist (or classical) inference. The frequentist point of view is based on the following postulates:

(F1) Probability refers to limiting relative frequencies. Probabilities are objective properties of the real world.

(F2) Parameters are fixed, (usually unknown) constants. Because they are not fluctuating, no probability statements can be made about parameters.

(F3) Statistical procedures should be designed to have well-defined long run frequency properties. For example, a 95 per cent confidence interval should trap the true value of the parameter with limiting frequency at least 95 per cent.

There is another approach to inference called Bayesian inference. The Bayesian approach is based on the following postulates:
(B1) Probability describes degree of belief, not limiting frequency. As such, we can make probability statements about lots of things, not just data which are subject to random variation. For example, I might say that "the probability that Albert Einstein drank a cup of tea on August 1, 1948" is .35. This does not refer to any limiting frequency. It reflects my strength of belief that the proposition is true.

(B2) We can make probability statements about parameters, even though they are fixed constants.

(B3) We make inferences about a parameter θ by producing a probability distribution for θ. Inferences, such as point estimates and interval estimates, may then be extracted from this distribution.

Bayesian inference is a controversial approach because it inherently embraces a subjective notion of probability. In general, Bayesian methods provide no guarantees on long run performance. The field of Statistics puts more emphasis on frequentist methods although Bayesian methods certainly have a presence. Certain data mining and machine learning communities seem to embrace Bayesian methods very strongly. Let's put aside philosophical arguments for now and see how Bayesian inference is done. We'll conclude this chapter with some discussion on the strengths and weaknesses of each approach.
12.2 The Bayesian Method
Bayesian inference is usually carried out in the following way.

1. We choose a probability density f(θ), called the prior distribution, that expresses our degrees of belief about a parameter θ before we see any data.
2. We choose a statistical model f(x|θ) that reflects our beliefs about x given θ. Notice that we now write this as f(x|θ) instead of f(x; θ).

3. After observing data X_1, ..., X_n, we update our beliefs and form the posterior distribution f(θ | X_1, ..., X_n).

To see how the third step is carried out, first suppose that θ is discrete and that there is a single, discrete observation X. We should use a capital letter now to denote the parameter since we are treating it like a random variable, so let Θ denote the parameter. Now, in this discrete setting,
\[
P(\Theta = \theta \mid X = x)
 = \frac{P(X = x, \Theta = \theta)}{P(X = x)}
 = \frac{P(X = x \mid \Theta = \theta)\, P(\Theta = \theta)}{\sum_\theta P(X = x \mid \Theta = \theta)\, P(\Theta = \theta)}
\]
which you may recognize from earlier in the course as Bayes' theorem. The version for continuous variables is obtained by using density functions:
\[
f(\theta \mid x) = \frac{f(x \mid \theta)\, f(\theta)}{\int f(x \mid \theta)\, f(\theta)\, d\theta}.   \tag{12.1}
\]
If we have n iid observations X_1, ..., X_n, we replace f(x|θ) with f(x_1, ..., x_n | θ) = ∏_{i=1}^n f(x_i|θ) = L_n(θ). Let us write X^n to mean (X_1, ..., X_n) and x^n to mean (x_1, ..., x_n). Then
\[
f(\theta \mid x^n)
 = \frac{f(x^n \mid \theta)\, f(\theta)}{\int f(x^n \mid \theta)\, f(\theta)\, d\theta}
 = \frac{L_n(\theta)\, f(\theta)}{\int L_n(\theta)\, f(\theta)\, d\theta}
 \propto L_n(\theta)\, f(\theta).   \tag{12.2}
\]
In the right hand side of the last equation, we threw away the denominator ∫ L_n(θ) f(θ) dθ, which is a constant that does not depend on θ; we call this quantity the normalizing constant. We can summarize all this by writing:

   posterior is proportional to likelihood times prior.   (12.3)

You might wonder, doesn't it cause a problem to throw away the constant ∫ L_n(θ) f(θ) dθ? The answer is that we can always
recover the constant since we know that ∫ f(θ|x^n) dθ = 1. Hence, we often omit the constant until we really need it.

What do we do with the posterior? First, we can get a point estimate by summarizing the center of the posterior. Typically, we use the mean or mode of the posterior. The posterior mean is
\[
\bar\theta_n = \int \theta\, f(\theta \mid x^n)\, d\theta
 = \frac{\int \theta\, L_n(\theta)\, f(\theta)\, d\theta}{\int L_n(\theta)\, f(\theta)\, d\theta}.   \tag{12.4}
\]
We can also obtain a Bayesian interval estimate. Define a and b by
\[
\int_{-\infty}^{a} f(\theta \mid x^n)\, d\theta = \int_{b}^{\infty} f(\theta \mid x^n)\, d\theta = \alpha/2.
\]
Let C = (a, b). Then
\[
P(\theta \in C \mid x^n) = \int_a^b f(\theta \mid x^n)\, d\theta = 1 - \alpha
\]
so C is a 1 − α posterior interval.
Example 12.1 Let X_1, ..., X_n ∼ Bernoulli(p). Suppose we take the uniform distribution f(p) = 1 as a prior. By Bayes' theorem the posterior has the form
\[
f(p \mid x^n) \propto f(p)\, L_n(p) = p^s (1-p)^{n-s} = p^{s+1-1} (1-p)^{n-s+1-1}
\]
where s = Σ_i x_i is the number of heads. Recall that a random variable has a Beta distribution with parameters α and β if its density is
\[
f(p;\, \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha-1} (1-p)^{\beta-1}.
\]
We see that the posterior for p is a Beta distribution with parameters s + 1 and n − s + 1. That is,
\[
f(p \mid x^n) = \frac{\Gamma(n+2)}{\Gamma(s+1)\,\Gamma(n-s+1)}\, p^{(s+1)-1} (1-p)^{(n-s+1)-1}.
\]
We write this as
\[
p \mid x^n \sim \mathrm{Beta}(s+1,\, n-s+1).
\]
Notice that we have figured out the normalizing constant without actually doing the integral ∫ L_n(p) f(p) dp. The mean of a Beta(α, β) is α/(α+β), so the Bayes estimator is
\[
\bar p = \frac{s+1}{n+2}.
\]
It is instructive to rewrite the estimator as
\[
\bar p = \lambda_n \hat p + (1 - \lambda_n)\, \tilde p
\]
where p̂ = s/n is the mle, p̃ = 1/2 is the prior mean, and λ_n = n/(n+2) ≈ 1. A 95 per cent posterior interval can be obtained by numerically finding a and b such that ∫_a^b f(p|x^n) dp = .95.

Suppose that instead of a uniform prior, we use the prior p ∼ Beta(α, β). If you repeat the calculations above, you will see that p | x^n ∼ Beta(α + s, β + n − s). The flat prior is just the special case with α = β = 1. The posterior mean is
\[
\bar p = \frac{\alpha + s}{\alpha + \beta + n}
       = \left( \frac{n}{\alpha+\beta+n} \right) \hat p
       + \left( \frac{\alpha+\beta}{\alpha+\beta+n} \right) p_0
\]
where p_0 = α/(α+β) is the prior mean.
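The calculations in Example 12.1 are easy to reproduce numerically. Here is a brief sketch (scipy is assumed; the data are hypothetical, not from the text):

```python
import numpy as np
from scipy import stats

x = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1])   # hypothetical Bernoulli data
n, s = len(x), int(x.sum())

posterior = stats.beta(s + 1, n - s + 1)      # p | x^n ~ Beta(s+1, n-s+1) under f(p) = 1
post_mean = (s + 1) / (n + 2)                 # Bayes estimator (posterior mean)
interval = posterior.ppf([0.025, 0.975])      # equal-tailed 95 per cent posterior interval
print(post_mean, interval)
```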
In the previous example, the prior was a Beta distribution and the posterior was a Beta distribution. When the prior and the posterior are in the same family, we say that the prior is conjugate.
Example 12.2 Let X_1, ..., X_n ∼ N(θ, σ²). For simplicity, let us assume that σ is known. Suppose we take as a prior θ ∼ N(a, b²). In problem 1 in the homework, it is shown that the posterior for θ is
\[
\theta \mid X^n \sim N(\bar\theta, \tau^2)   \tag{12.5}
\]
where
\[
\bar\theta = w \overline{X} + (1-w)a, \qquad
w = \frac{1/\mathrm{se}^2}{1/\mathrm{se}^2 + 1/b^2}, \qquad
\frac{1}{\tau^2} = \frac{1}{\mathrm{se}^2} + \frac{1}{b^2},
\]
and se = σ/√n is the standard error of the mle X̄. This is another example of a conjugate prior. Note that w → 1 and τ/se → 1 as n → ∞. So, for large n, the posterior is approximately N(θ̂, ŝe²). The same is true if n is fixed but b → ∞, which corresponds to letting the prior become very flat.

Continuing with this example, let us find C = (c, d) such that P(θ ∈ C | X^n) = .95. We can do this by choosing c such that P(θ < c | X^n) = .025 and P(θ > d | X^n) = .025. So, we want to find c such that
\[
P(\theta < c \mid X^n) = P\!\left( \frac{\theta - \bar\theta}{\tau} < \frac{c - \bar\theta}{\tau} \right)
 = P\!\left( Z < \frac{c - \bar\theta}{\tau} \right) = .025.
\]
Now, we know that P(Z < −1.96) = .025. So
\[
\frac{c - \bar\theta}{\tau} = -1.96
\]
implying that c = θ̄ − 1.96 τ. By similar arguments, d = θ̄ + 1.96 τ. So a 95 per cent Bayesian interval is θ̄ ± 1.96 τ. Since θ̄ ≈ θ̂ and τ ≈ se, the 95 per cent Bayesian interval is approximated by θ̂ ± 1.96 se, which is the frequentist confidence interval.
12.3 Functions of Parameters

How do we make inferences about a function τ = g(θ)? Remember in Chapter 3 we solved the following problem: given the density f_X for X, find the density for Y = g(X). We now simply apply the same reasoning. The posterior cdf for τ is
\[
H(\tau \mid x^n) = P(g(\theta) \le \tau \mid x^n) = \int_A f(\theta \mid x^n)\, d\theta
\]
where A = {θ : g(θ) ≤ τ}. The posterior density is h(τ|x^n) = H'(τ|x^n).

Example 12.3 Let X_1, ..., X_n ∼ Bernoulli(p) and f(p) = 1 so that p | x^n ∼ Beta(s+1, n−s+1) with s = Σ_{i=1}^n x_i. Let ψ = log(p/(1−p)). Then
\[
\begin{aligned}
H(\psi \mid x^n)
 &= P(\Psi \le \psi \mid x^n)
  = P\!\left( \log\frac{p}{1-p} \le \psi \,\Big|\, x^n \right)
  = P\!\left( p \le \frac{e^\psi}{1+e^\psi} \,\Big|\, x^n \right) \\
 &= \int_0^{e^\psi/(1+e^\psi)} f(p \mid x^n)\, dp
  = \frac{\Gamma(n+2)}{\Gamma(s+1)\,\Gamma(n-s+1)} \int_0^{e^\psi/(1+e^\psi)} p^s (1-p)^{n-s}\, dp
\end{aligned}
\]
and
\[
h(\psi \mid x^n) = H'(\psi \mid x^n)
 = \frac{\Gamma(n+2)}{\Gamma(s+1)\,\Gamma(n-s+1)}
   \left( \frac{e^\psi}{1+e^\psi} \right)^{s+1}
   \left( \frac{1}{1+e^\psi} \right)^{n-s+1}
\]
for ψ ∈ ℝ.
12.4 Simulation

The posterior can often be approximated by simulation. Suppose we draw θ^1, ..., θ^B ∼ p(θ|x^n). Then a histogram of θ^1, ..., θ^B approximates the posterior density p(θ|x^n). An approximation to the posterior mean θ̄_n = E(θ|x^n) is B^{-1} Σ_{j=1}^B θ^j. The 1 − α posterior interval can be approximated by (θ_{α/2}, θ_{1−α/2}) where θ_{α/2} is the α/2 sample quantile of θ^1, ..., θ^B.

Once we have a sample θ^1, ..., θ^B from f(θ|x^n), let τ^i = g(θ^i). Then τ^1, ..., τ^B is a sample from f(τ|x^n). This avoids the need to do any analytical calculations. Simulation is discussed in more detail later in the book.

Example 12.4 Consider again Example 12.3. We can approximate the posterior for ψ without doing any calculus. Here are the steps:

1. Draw P_1, ..., P_B ∼ Beta(s+1, n−s+1).

2. Let ψ_i = log(P_i/(1−P_i)) for i = 1, ..., B.

Now ψ_1, ..., ψ_B are iid draws from h(ψ|x^n). A histogram of these values provides an estimate of h(ψ|x^n).
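The two steps of Example 12.4 are only a few lines of code. A minimal sketch (numpy assumed; the values of n and s come from a hypothetical data set, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, B = 20, 12, 10_000                      # hypothetical sample size and number of heads

P = rng.beta(s + 1, n - s + 1, size=B)        # step 1: draw from the posterior of p
psi = np.log(P / (1 - P))                     # step 2: transform each draw

post_mean = psi.mean()                        # approximate posterior mean of psi
interval = np.quantile(psi, [0.025, 0.975])   # approximate 95 per cent posterior interval
print(post_mean, interval)
```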
12.5 Large Sample Properties of Bayes' Procedures

In the Bernoulli and Normal examples we saw that the posterior mean was close to the mle. This is true in greater generality.

Theorem 12.5 Under appropriate regularity conditions, the posterior is approximately N(θ̂, ŝe²) where θ̂_n is the mle and ŝe = 1/√(n I(θ̂_n)). Hence, θ̄_n ≈ θ̂_n. Also, if C_n = (θ̂_n − z_{α/2} ŝe, θ̂_n + z_{α/2} ŝe) is the asymptotic frequentist 1 − α confidence interval, then C_n is also an approximate 1 − α Bayesian posterior interval: P(θ ∈ C_n | X^n) → 1 − α.

There is also a Bayesian delta method. Let τ = g(θ). Then
\[
\tau \mid X^n \approx N(\hat\tau, \widetilde{\mathrm{se}}^2)
\]
where τ̂ = g(θ̂) and s̃e = ŝe |g'(θ̂)|.
12.6 Flat Priors, Improper Priors and "Noninformative" Priors

A big question in Bayesian inference is: where do you get the prior f(θ)? One school of thought, called "subjectivism," says that the prior should reflect our subjective opinion about θ before the data are collected. This may be possible in some cases but seems impractical in complicated problems, especially if there are many parameters. An alternative is to try to define some sort of "noninformative prior."

An obvious candidate for a noninformative prior is to use a "flat" prior f(θ) ∝ constant. In the Bernoulli example, taking f(p) = 1 leads to p|X^n ∼ Beta(s+1, n−s+1), as we saw earlier, which seemed very reasonable. But unfettered use of flat priors raises some questions.

Improper Priors. Consider the N(θ, 1) example. Suppose we adopt a flat prior f(θ) ∝ c where c > 0 is a constant. Note that ∫ f(θ) dθ = ∞, so this is not a real probability density in the usual sense. We call such a prior an improper prior. Nonetheless, we can still carry out Bayes' theorem and compute the posterior density f(θ|X^n) ∝ L_n(θ) f(θ) ∝ L_n(θ). In the Normal example, this gives θ|X^n ∼ N(X̄, σ²/n), and the resulting point and interval estimators agree exactly with their frequentist counterparts. In general, improper priors are not a problem as long as the resulting posterior is a well-defined probability distribution.
Flat Priors are Not Invariant. Go back to the Bernoulli example and consider using the flat prior f(p) = 1. Recall that a flat prior presumably represents our lack of information about p before the experiment. Now let ψ = log(p/(1−p)). This is a transformation of p and we can compute the resulting distribution
for ψ. It turns out that
\[
f(\psi) = \frac{e^\psi}{(1+e^\psi)^2}.
\]
But one could argue that if we are ignorant about p then we are also ignorant about ψ, so shouldn't we use a flat prior for ψ? This contradicts the prior f(ψ) for ψ that is implied by using a flat prior for p. In short, the notion of a flat prior is not well-defined, because a flat prior on a parameter does not imply a flat prior on a transformed version of the parameter. Flat priors are not transformation invariant.

Jeffreys' Prior. Jeffreys came up with a "rule" for creating priors. The rule is: take f(θ) ∝ I(θ)^{1/2} where I(θ) is the Fisher information function. This rule turns out to be transformation invariant. There are various reasons for thinking that this prior might be a useful prior, but we will not go into details here.

Example 12.6 Consider the Bernoulli(p). Recall that
\[
I(p) = \frac{1}{p(1-p)}.
\]
Jeffreys' rule says to use the prior
\[
f(p) \propto \sqrt{I(p)} = p^{-1/2} (1-p)^{-1/2}.
\]
This is a Beta(1/2, 1/2) density, which is very close to a uniform density.
In a multiparameter problem, the Jeffreys' prior is defined to be f(θ) ∝ √(det I(θ)), where det(A) denotes the determinant of a matrix A.
12.7 Multiparameter Problems
In principle, multiparameter problems are handled the same way. Suppose that θ = (θ_1, ..., θ_p). The posterior density is still given by
\[
p(\theta \mid x^n) \propto L_n(\theta)\, f(\theta).
\]
The question now arises of how to extract inferences about one parameter. The key is to find the marginal posterior density for the parameter of interest. Suppose we want to make inferences about θ_1. The marginal posterior for θ_1 is
\[
f(\theta_1 \mid x^n) = \int \cdots \int f(\theta_1, \ldots, \theta_p \mid x^n)\, d\theta_2 \cdots d\theta_p.
\]
In practice, it might not be feasible to do this integral. Simulation can help. Draw randomly from the posterior:
θ^1, ..., θ^B ∼ f(θ|x^n), where the superscripts index the different draws. Each θ^j is a vector θ^j = (θ_1^j, ..., θ_p^j). Now collect together the first component of each draw: θ_1^1, ..., θ_1^B. These are a sample from f(θ_1|x^n) and we have avoided doing any integrals.

Example 12.7 (Comparing two binomials.) Suppose we have n_1 control patients and n_2 treatment patients and that X_1 control patients survive while X_2 treatment patients survive. We want to estimate τ = g(p_1, p_2) = p_2 − p_1. Then,
X_1 ∼ Binomial(n_1, p_1) and X_2 ∼ Binomial(n_2, p_2). Suppose we take f(p_1, p_2) = 1. The posterior is
\[
f(p_1, p_2 \mid x_1, x_2) \propto p_1^{x_1} (1-p_1)^{n_1 - x_1}\, p_2^{x_2} (1-p_2)^{n_2 - x_2}.
\]
Notice that (p_1, p_2) live on a rectangle (a square, actually) and that
\[
f(p_1, p_2 \mid x_1, x_2) = f(p_1 \mid x_1)\, f(p_2 \mid x_2)
\]
where f(p_1|x_1) ∝ p_1^{x_1}(1−p_1)^{n_1−x_1} and f(p_2|x_2) ∝ p_2^{x_2}(1−p_2)^{n_2−x_2}, which implies that p_1 and p_2 are independent under the posterior. Also, p_1|x_1 ∼ Beta(x_1+1, n_1−x_1+1) and p_2|x_2 ∼ Beta(x_2+1, n_2−x_2+1). If we simulate P_{1,1}, ..., P_{1,B} ∼ Beta(x_1+1, n_1−x_1+1) and P_{2,1}, ..., P_{2,B} ∼ Beta(x_2+1, n_2−x_2+1), then τ_b = P_{2,b} − P_{1,b}, b = 1, ..., B, is a sample from f(τ|x_1, x_2).
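Example 12.7 translates directly into a few lines of simulation. A sketch (the counts are hypothetical, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, x1 = 50, 30        # hypothetical control group: 30 of 50 survive
n2, x2 = 50, 40        # hypothetical treatment group: 40 of 50 survive
B = 10_000

p1 = rng.beta(x1 + 1, n1 - x1 + 1, size=B)    # draws from p1 | x1
p2 = rng.beta(x2 + 1, n2 - x2 + 1, size=B)    # draws from p2 | x2
tau = p2 - p1                                 # draws from f(tau | x1, x2)

print(tau.mean(), np.quantile(tau, [0.05, 0.95]))   # posterior mean and 90 per cent interval
```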
12.8 Strengths and Weaknesses of Bayesian Inference

Bayesian inference is appealing when prior information is available since Bayes' theorem is a natural way to combine prior information with data. Some people find Bayesian inference psychologically appealing because it allows us to make probability statements about parameters. In contrast, frequentist inference provides confidence sets C_n which trap the parameter 95 per cent of the time, but we cannot say that P(θ ∈ C_n | X^n) is .95. In the frequentist approach we can make probability statements about C_n, not θ. However, psychological appeal is not really an argument for using one type of inference over another.

In parametric models, with large samples, Bayesian and frequentist methods give approximately the same inferences. In general, they need not agree. Consider the following example.

Example 12.8 Let X ∼ N(θ, 1) and suppose we use the prior θ ∼ N(0, τ²). From (12.5), the posterior is
\[
\theta \mid x \sim N\!\left( \frac{\tau^2}{1+\tau^2}\, x,\ \frac{\tau^2}{1+\tau^2} \right) = N(cx,\, c)
\]
where c = τ²/(τ²+1). A 1 − α posterior interval is C = (a, b) where
\[
a = cx - \sqrt{c}\, z_{\alpha/2} \quad\text{and}\quad b = cx + \sqrt{c}\, z_{\alpha/2}.
\]
Thus, P(θ ∈ C | X) = 1 − α. We can now ask, from a frequentist perspective, what is the coverage of C, that is, how often will this interval contain the true value? The answer is
\[
\begin{aligned}
P_\theta(a < \theta < b)
 &= P_\theta\bigl( cX - \sqrt{c}\, z_{\alpha/2} < \theta < cX + \sqrt{c}\, z_{\alpha/2} \bigr) \\
 &= P_\theta\!\left( \frac{\theta - \sqrt{c}\, z_{\alpha/2}}{c} < X < \frac{\theta + \sqrt{c}\, z_{\alpha/2}}{c} \right) \\
 &= P\!\left( \frac{\theta(1-c) - \sqrt{c}\, z_{\alpha/2}}{c} < Z < \frac{\theta(1-c) + \sqrt{c}\, z_{\alpha/2}}{c} \right) \\
 &= \Phi\!\left( \frac{\theta(1-c) + \sqrt{c}\, z_{\alpha/2}}{c} \right)
  - \Phi\!\left( \frac{\theta(1-c) - \sqrt{c}\, z_{\alpha/2}}{c} \right)
\end{aligned}
\]
where Z ∼ N(0, 1). Figure 12.1 shows the coverage as a function of θ, for τ = 1 and α = .05. Unless the true value of θ is close to 0, the coverage can be very small. Thus, upon repeated use, the Bayesian 95 per cent interval might contain the true value with frequency near 0! In contrast, a confidence interval has 95 per cent coverage no matter what the true value of θ is.

What should we conclude from all this? The important thing is to understand that frequentist and Bayesian methods are answering different questions. To combine prior beliefs with data in a principled way, use Bayesian inference. To construct procedures with guaranteed long run performance, such as confidence intervals, use frequentist methods. It is worth remarking that it is possible to develop nonparametric Bayesian methods similar to plug-in estimation and the bootstrap. Be forewarned, however, that the frequency properties of nonparametric Bayesian methods can sometimes be quite poor.
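The coverage formula from Example 12.8 can be evaluated directly, which reproduces the qualitative behavior shown in Figure 12.1. A sketch (scipy assumed; the function name and the grid of θ values are illustrative):

```python
import numpy as np
from scipy.stats import norm

def coverage(theta, tau=1.0, alpha=0.05):
    """Frequentist coverage of the 1-alpha Bayesian interval when X ~ N(theta, 1)."""
    c = tau**2 / (tau**2 + 1.0)
    z = norm.ppf(1 - alpha / 2)
    upper = (theta * (1 - c) + np.sqrt(c) * z) / c
    lower = (theta * (1 - c) - np.sqrt(c) * z) / c
    return norm.cdf(upper) - norm.cdf(lower)

for theta in [0.0, 1.0, 2.0, 3.0, 5.0]:
    print(theta, coverage(theta))    # coverage falls far below 0.95 as theta moves away from 0
```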
12.9 Appendix

Proof of Theorem 12.5. It can be shown that the effect of the prior diminishes as n increases so that f(θ|X^n) ∝ L_n(θ) f(θ) ≈ L_n(θ). Hence,
FIGURE 12.1. Frequentist coverage of the 95 per cent Bayesian posterior interval as a function of the true value of θ. The dotted line marks the 95 per cent level.
log f(θ|X^n) ≈ ℓ(θ). Now,
\[
\ell(\theta) \approx \ell(\hat\theta) + (\theta - \hat\theta)\,\ell'(\hat\theta) + \frac{(\theta-\hat\theta)^2}{2}\,\ell''(\hat\theta)
 = \ell(\hat\theta) + \frac{(\theta-\hat\theta)^2}{2}\,\ell''(\hat\theta)
\]
since ℓ'(θ̂) = 0. Exponentiating, we get approximately that
\[
f(\theta \mid X^n) \propto \exp\left\{ -\frac{1}{2}\, \frac{(\theta - \hat\theta)^2}{\sigma_n^2} \right\}
\]
where σ_n² = −1/ℓ''(θ̂_n). So the posterior of θ is approximately Normal with mean θ̂ and variance σ_n². Let ℓ_i(θ) = log f(X_i|θ); then
\[
\frac{1}{\sigma_n^2} = -\ell''(\hat\theta_n) = \sum_i -\ell_i''(\hat\theta_n)
 = n \cdot \frac{1}{n} \sum_i -\ell_i''(\hat\theta_n)
 \approx n\, E_\theta\bigl[ -\ell_i''(\hat\theta_n) \bigr] = n\, I(\hat\theta_n)
\]
and hence σ_n ≈ se(θ̂).
12.10 Bibliographic Remarks

Some references on Bayesian inference include Carlin and Louis (1996), Gelman, Carlin, Stern and Rubin (1995), Lee (1997), Robert (1994) and Schervish (1995). See Cox (1997), Diaconis and Freedman (1996), Freedman (2001), Barron, Schervish and Wasserman (1999), Ghosal, Ghosh and van der Vaart (2001), Shen and Wasserman (2001) and Zhao (2001) for discussions of some of the technicalities of nonparametric Bayesian inference.
12.11 Exercises

1. Verify (12.5).

2. Let X_1, ..., X_n ∼ Normal(θ, 1).
   (a) Simulate a data set (using θ = 5) consisting of n = 100 observations.
   (b) Take f(θ) = 1 and find the posterior density. Plot the density.
   (c) Simulate 1000 draws from the posterior. Plot a histogram of the simulated values and compare the histogram to the answer in (b).
   (d) Let ψ = e^θ. Find the posterior density for ψ analytically and by simulation.
   (e) Find a 95 per cent posterior interval for ψ.
   (f) Find a 95 per cent confidence interval for ψ.

3. Let X_1, ..., X_n ∼ Uniform(0, θ). Let f(θ) ∝ 1/θ. Find the posterior density.

4. Suppose that 50 people are given a placebo and 50 are given a new treatment. 30 placebo patients show improvement while 40 treated patients show improvement. Let τ = p_2 − p_1, where p_2 is the probability of improving under treatment and p_1 is the probability of improving under placebo.
   (a) Find the mle of τ. Find the standard error and 90 per cent confidence interval using the delta method.
   (b) Find the standard error and 90 per cent confidence interval using the parametric bootstrap.
   (c) Use the prior f(p_1, p_2) = 1. Use simulation to find the posterior mean and posterior 90 per cent interval for τ.
   (d) Let
   \[
   \psi = \log\left( \frac{p_1}{1-p_1} \div \frac{p_2}{1-p_2} \right)
   \]
   be the log-odds ratio. Note that ψ = 0 if p_1 = p_2. Find the mle of ψ. Use the delta method to find a 90 per cent confidence interval for ψ.
   (e) Use simulation to find the posterior mean and posterior 90 per cent interval for ψ.
5. Consider the Bernoulli(p) observations
   0 1 0 1 0 0 0 0 0 0
   Plot the posterior for p using these priors: Beta(1/2,1/2), Beta(1,1), Beta(10,10), Beta(100,100).

6. Let X_1, ..., X_n ∼ Poisson(λ).
   (a) Let λ ∼ Gamma(α, β) be the prior. Show that the posterior is also a Gamma. Find the posterior mean.
   (b) Find the Jeffreys' prior. Find the posterior.
13 Statistical Decision Theory
Part III Statistical Models and Methods
14 Linear Regression
Regression is a method for studying the relationship between a response variable Y and a covariate X. The covariate is also called a predictor variable or a feature. Later, we will generalize and allow for more than one covariate. The data are of the form
\[
(Y_1, X_1), \ldots, (Y_n, X_n).
\]
One way to summarize the relationship between X and Y is through the regression function
\[
r(x) = E(Y \mid X = x) = \int y\, f(y \mid x)\, dy.   \tag{14.1}
\]
Most of this chapter is concerned with estimating the regression function. (The term "regression" is due to Sir Francis Galton (1822-1911), who noticed that tall and short men tend to have sons with heights closer to the mean. He called this "regression towards the mean.")

14.1 Simple Linear Regression
The simplest version of regression is when X_i is simple (a scalar, not a vector) and r(x) is assumed to be linear:
\[
r(x) = \beta_0 + \beta_1 x.
\]
This model is called the simple linear regression model. Let ε_i = Y_i − (β_0 + β_1 X_i). Then,
\[
\begin{aligned}
E(\varepsilon_i \mid X_i)
 &= E\bigl( Y_i - (\beta_0 + \beta_1 X_i) \mid X_i \bigr)
  = E(Y_i \mid X_i) - (\beta_0 + \beta_1 X_i) \\
 &= r(X_i) - (\beta_0 + \beta_1 X_i)
  = (\beta_0 + \beta_1 X_i) - (\beta_0 + \beta_1 X_i) = 0.
\end{aligned}
\]
Let σ²(x) = V(ε_i | X = x). We will make the further simplifying assumption that σ²(x) = σ² does not depend on x. We can thus write the linear regression model as follows.

Definition 14.1 The Linear Regression Model
\[
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i   \tag{14.2}
\]
where E(ε_i | X_i) = 0 and V(ε_i | X_i) = σ².
Example 14.2 Figure 14.1 shows a plot of log surface temperature (Y) versus log light intensity (X) for some nearby stars. Also on the plot is an estimated linear regression line, which will be explained shortly.

The unknown parameters in the model are the intercept β_0, the slope β_1, and the variance σ². Let β̂_0 and β̂_1 denote estimates of β_0 and β_1. The fitted line is defined to be
\[
\hat r(x) = \hat\beta_0 + \hat\beta_1 x.   \tag{14.3}
\]
The predicted values or fitted values are Ŷ_i = r̂(X_i) and the residuals are defined to be
\[
\hat\varepsilon_i = Y_i - \hat Y_i = Y_i - \bigl( \hat\beta_0 + \hat\beta_1 X_i \bigr).   \tag{14.4}
\]
The residual sums of squares or rss is defined by
\[
\mathrm{rss} = \sum_{i=1}^n \hat\varepsilon_i^2.
\]
FIGURE 14.1. Data on nearby stars: log surface temperature (Y) versus log light intensity (X).
The quantity rss measures how well the fitted line fits the data.

Definition 14.3 The least squares estimates are the values β̂_0 and β̂_1 that minimize rss = Σ_{i=1}^n ε̂_i².

Theorem 14.4 The least squares estimates are given by
\[
\hat\beta_1 = \frac{\sum_{i=1}^n (X_i - \overline{X}_n)(Y_i - \overline{Y}_n)}{\sum_{i=1}^n (X_i - \overline{X}_n)^2},   \tag{14.5}
\]
\[
\hat\beta_0 = \overline{Y}_n - \hat\beta_1 \overline{X}_n.   \tag{14.6}
\]
An unbiased estimate of σ² is
\[
\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^n \hat\varepsilon_i^2.   \tag{14.7}
\]
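Equations (14.5)-(14.7) take only a few lines to compute. A minimal sketch (numpy assumed; the data below are simulated for illustration, not the star data):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(4.0, 5.5, size=40)
y = 3.58 + 0.166 * x + rng.normal(0, 0.2, size=40)   # hypothetical (x, y) pairs

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)   # (14.5)
b0 = y.mean() - b1 * x.mean()                                              # (14.6)
resid = y - (b0 + b1 * x)
sigma2_hat = np.sum(resid**2) / (n - 2)                                    # (14.7)
print(b0, b1, sigma2_hat)
```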
Example 14.5 Consider the star data from Example 14.2. The least squares estimates are β̂_0 = 3.58 and β̂_1 = 0.166. The fitted line r̂(x) = 3.58 + 0.166x is shown in Figure 14.1.

Example 14.6 (The 2000 Presidential Election.) Figure 14.2 shows the plot of votes for Buchanan (Y) versus votes for Bush (X) in Florida. The least squares estimates (omitting Palm Beach County) and the standard errors are
\[
\hat\beta_0 = 66.0991, \quad \widehat{\mathrm{se}}(\hat\beta_0) = 17.2926,
\qquad
\hat\beta_1 = 0.0035, \quad \widehat{\mathrm{se}}(\hat\beta_1) = 0.0002.
\]
The fitted line is
\[
\widehat{\mathrm{Buchanan}} = 66.0991 + 0.0035\ \mathrm{Bush}.
\]
(We will see later how the standard errors were computed.) Figure 14.2 also shows the residuals. The inferences from linear
regression are most accurate when the residuals behave like random normal numbers. Based on the residual plot, this is not the case in this example. If we repeat the analysis replacing votes with log(votes) we get
\[
\hat\beta_0 = -2.3298, \quad \widehat{\mathrm{se}}(\hat\beta_0) = 0.3529,
\qquad
\hat\beta_1 = 0.7303, \quad \widehat{\mathrm{se}}(\hat\beta_1) = 0.0358.
\]
This gives the fit
\[
\log(\mathrm{Buchanan}) = -2.3298 + 0.7303 \log(\mathrm{Bush}).
\]
The residuals look much healthier. Later, we shall address two interesting questions: (1) how do we see if Palm Beach County has a statistically plausible outcome? (2) how do we do this problem nonparametrically?
14.2 Least Squares and Maximum Likelihood

Suppose we add the assumption that ε_i|X_i ∼ N(0, σ²), that is,
\[
Y_i \mid X_i \sim N(\mu_i, \sigma^2)
\]
where µ_i = β_0 + β_1 X_i. The likelihood function is
\[
\prod_{i=1}^n f(X_i, Y_i)
 = \prod_{i=1}^n f_X(X_i)\, f_{Y|X}(Y_i \mid X_i)
 = \prod_{i=1}^n f_X(X_i) \times \prod_{i=1}^n f_{Y|X}(Y_i \mid X_i)
 = \mathcal{L}_1 \times \mathcal{L}_2   \tag{14.8}
\]
where
\[
\mathcal{L}_1 = \prod_{i=1}^n f_X(X_i)
\quad\text{and}\quad
\mathcal{L}_2 = \prod_{i=1}^n f_{Y|X}(Y_i \mid X_i).
\]
FIGURE 14.2. Voting Data for Election 2000: Buchanan versus Bush votes with residual plots, on the raw scale and on the log scale.
The term L_1 does not involve the parameters β_0 and β_1. We shall focus on the second term L_2, which is called the conditional likelihood, given by
\[
\mathcal{L}_2 \equiv \mathcal{L}(\beta_0, \beta_1, \sigma)
 = \prod_{i=1}^n f_{Y|X}(Y_i \mid X_i)
 \propto \sigma^{-n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_i (Y_i - \mu_i)^2 \right\}.
\]
The conditional log-likelihood is
\[
\ell(\beta_0, \beta_1, \sigma) = -n \log\sigma - \frac{1}{2\sigma^2} \sum_{i=1}^n \bigl( Y_i - (\beta_0 + \beta_1 X_i) \bigr)^2.   \tag{14.9}
\]
To find the mle of (β_0, β_1) we maximize ℓ(β_0, β_1, σ). From (14.9) we see that maximizing the likelihood is the same as minimizing the rss Σ_{i=1}^n (Y_i − (β_0 + β_1 X_i))². Therefore, we have shown the following.

Theorem 14.7 Under the assumption of Normality, the least squares estimator is also the maximum likelihood estimator.

We can also maximize ℓ(β_0, β_1, σ) over σ, yielding the mle
\[
\hat\sigma^2 = \frac{1}{n} \sum_i \hat\varepsilon_i^2.   \tag{14.10}
\]
This estimator is similar to, but not identical to, the unbiased estimator. Common practice is to use the unbiased estimator (14.7).

14.3 Properties of the Least Squares Estimators
We now record the standard errors and limiting distribution of the least squares estimator. In regression problems, we usually focus on the properties of the estimators conditional on X n = (X1 ; : : : ; Xn ). Thus, we state the means and variances as conditional means and variances.
Theorem 14.8 Let β̂ = (β̂_0, β̂_1)^T denote the least squares estimators. Then,
\[
E(\hat\beta \mid X^n) = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix},
\qquad
V(\hat\beta \mid X^n) = \frac{\sigma^2}{n\, s_X^2}
\begin{pmatrix}
\frac{1}{n}\sum_{i=1}^n X_i^2 & -\overline{X}_n \\
-\overline{X}_n & 1
\end{pmatrix}   \tag{14.11}
\]
where s_X² = n^{-1} Σ_{i=1}^n (X_i − X̄_n)². The estimated standard errors of β̂_0 and β̂_1 are obtained by taking the square roots of the corresponding diagonal terms of V(β̂|X^n) and inserting the estimate σ̂ for σ. Thus,
\[
\widehat{\mathrm{se}}(\hat\beta_0) = \frac{\hat\sigma}{s_X \sqrt{n}} \sqrt{\frac{\sum_{i=1}^n X_i^2}{n}},   \tag{14.12}
\]
\[
\widehat{\mathrm{se}}(\hat\beta_1) = \frac{\hat\sigma}{s_X \sqrt{n}}.   \tag{14.13}
\]
We should really write these as ŝe(β̂_0 | X^n) and ŝe(β̂_1 | X^n), but we will use the shorter notation ŝe(β̂_0) and ŝe(β̂_1).

Theorem 14.9 Under appropriate conditions we have:

1. (Consistency): β̂_0 converges in probability to β_0 and β̂_1 converges in probability to β_1.

2. (Asymptotic Normality):
\[
\frac{\hat\beta_0 - \beta_0}{\widehat{\mathrm{se}}(\hat\beta_0)} \rightsquigarrow N(0,1)
\quad\text{and}\quad
\frac{\hat\beta_1 - \beta_1}{\widehat{\mathrm{se}}(\hat\beta_1)} \rightsquigarrow N(0,1).
\]

3. Approximate 1 − α confidence intervals for β_0 and β_1 are
\[
\hat\beta_0 \pm z_{\alpha/2}\, \widehat{\mathrm{se}}(\hat\beta_0)
\quad\text{and}\quad
\hat\beta_1 \pm z_{\alpha/2}\, \widehat{\mathrm{se}}(\hat\beta_1).   \tag{14.14}
\]
The Wald test statistic for testing H_0: β_1 = 0 versus H_1: β_1 ≠ 0 is: reject H_0 if |W| > z_{α/2} where W = β̂_1/ŝe(β̂_1).
Example 14.10 For the election data, on the log scale, a 95 per cent confidence interval for β_1 is .7303 ± 2(.0358) = (.66, .80); note that the interval excludes 0. The Wald statistic for testing H_0: β_1 = 0 versus H_1: β_1 ≠ 0 is W = |.7303 − 0|/.0358 = 20.40, with a p-value of P(|Z| > 20.40) ≈ 0. This is strong evidence that the true slope is not 0.
14.4 Prediction
Suppose we have estimated a regression model r̂(x) = β̂_0 + β̂_1 x from data (X_1, Y_1), ..., (X_n, Y_n). We observe the value X = x_* of the covariate for a new subject and we want to predict their outcome Y_*. An estimate of Y_* is
\[
\hat Y_* = \hat\beta_0 + \hat\beta_1 x_*.   \tag{14.15}
\]
Using the formula for the variance of the sum of two random variables,
\[
V(\hat Y_*) = V(\hat\beta_0 + \hat\beta_1 x_*) = V(\hat\beta_0) + x_*^2\, V(\hat\beta_1) + 2 x_*\, \mathrm{Cov}(\hat\beta_0, \hat\beta_1).
\]
Theorem 14.8 gives the formulas for all the terms in this equation. The estimated standard error ŝe(Ŷ_*) is the square root of this variance, with σ̂² in place of σ². However, the confidence interval for Y_* is not of the usual form Ŷ_* ± z_{α/2} ŝe. The appendix explains why. The correct form of the confidence interval is given in the following theorem. We call the interval a prediction interval.
Theorem 14.11 (Prediction Interval) Let
\[
\hat\xi_n^2 = \widehat{\mathrm{se}}^2(\hat Y_*) + \hat\sigma^2
 = \hat\sigma^2 \left( \frac{\sum_{i=1}^n (X_i - x_*)^2}{n \sum_i (X_i - \overline{X})^2} + 1 \right).   \tag{14.16}
\]
An approximate 1 − α prediction interval for Y_* is
\[
\hat Y_* \pm z_{\alpha/2}\, \hat\xi_n.   \tag{14.17}
\]
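Theorem 14.11 is simple to apply. A sketch (numpy and scipy assumed; the function name is illustrative, and x, y, x_star are whatever data and new covariate value you have at hand):

```python
import numpy as np
from scipy.stats import norm

def prediction_interval(x, y, x_star, alpha=0.05):
    """Approximate 1-alpha prediction interval for Y* at covariate value x_star, per (14.16)-(14.17)."""
    n = len(x)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0 = y.mean() - b1 * x.mean()
    sigma2 = np.sum((y - (b0 + b1 * x))**2) / (n - 2)
    xi2 = sigma2 * (np.sum((x - x_star)**2) / (n * np.sum((x - x.mean())**2)) + 1)   # (14.16)
    y_hat = b0 + b1 * x_star
    z = norm.ppf(1 - alpha / 2)
    return y_hat - z * np.sqrt(xi2), y_hat + z * np.sqrt(xi2)                        # (14.17)
```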
Example 14.12 (Election Data Revisited.) On the log scale, our linear regression gives the following prediction equation: log(Buchanan) = −2.3298 + .7303 log(Bush). In Palm Beach, Bush had 152954 votes and Buchanan had 3467 votes. On the log scale this is 11.93789 and 8.151045. How likely is this outcome, assuming our regression model is appropriate? Our prediction for log Buchanan votes is −2.3298 + .7303(11.93789) = 6.388441. Now 8.151045 is bigger than 6.388441, but is it "significantly" bigger? Let us compute a confidence interval. We find that ξ̂_n = .093775 and the approximate 95 per cent confidence interval is (6.200, 6.578), which clearly excludes 8.151. Indeed, 8.151 is nearly 20 standard errors from Ŷ_*. Going back to the vote scale by exponentiating, the confidence interval is (493, 717), compared to the actual number of votes, which is 3467.

14.5 Multiple Regression
Now suppose we have k covariates X_1, ..., X_k. The data are of the form (Y_1, X_1), ..., (Y_i, X_i), ..., (Y_n, X_n) where
\[
X_i = (X_{i1}, \ldots, X_{ik}).
\]
Here, X_i is the vector of k covariate values for the ith observation. The linear regression model is
\[
Y_i = \sum_{j=1}^k \beta_j X_{ij} + \varepsilon_i   \tag{14.18}
\]
for i = 1, ..., n, where E(ε_i | X_{1i}, ..., X_{ki}) = 0. Usually we want to include an intercept in the model, which we can do by setting X_{i1} = 1 for i = 1, ..., n. At this point it will be more convenient to express the model in matrix notation. The outcomes will be denoted by
\[
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
\]
and the covariates will be denoted by
\[
X = \begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1k} \\
X_{21} & X_{22} & \cdots & X_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{nk}
\end{pmatrix}.
\]
Each row is one observation; the columns correspond to the k covariates. Thus, X is an (n × k) matrix. Let
\[
\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}
\quad\text{and}\quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}.
\]
Then we can write (14.18) as
\[
Y = X\beta + \varepsilon.   \tag{14.19}
\]
Theorem 14.13 Assuming that the (k × k) matrix X^T X is invertible, the least squares estimate is
\[
\hat\beta = (X^T X)^{-1} X^T Y.   \tag{14.20}
\]
The estimated regression function is
\[
\hat r(x) = \sum_{j=1}^k \hat\beta_j x_j.   \tag{14.21}
\]
The variance-covariance matrix of β̂ is
\[
V(\hat\beta \mid X^n) = \sigma^2 (X^T X)^{-1}.
\]
Under appropriate conditions,
\[
\hat\beta \approx N\bigl( \beta,\ \sigma^2 (X^T X)^{-1} \bigr).
\]
An unbiased estimate of σ² is
\[
\hat\sigma^2 = \frac{1}{n-k} \sum_{i=1}^n \hat\varepsilon_i^2
\]
where ε̂ = Xβ̂ − Y is the vector of residuals. An approximate 1 − α confidence interval for β_j is
\[
\hat\beta_j \pm z_{\alpha/2}\, \widehat{\mathrm{se}}(\hat\beta_j)   \tag{14.22}
\]
where ŝe²(β̂_j) is the jth diagonal element of the matrix σ̂²(X^T X)^{-1}.

Example 14.14 Crime data on 47 states in 1960 can be obtained at http://lib.stat.cmu.edu/DASL/Stories/USCrime.html. If we fit a linear regression of crime rate on 10 variables we get the following:
Covariate               Least Squares   Estimated        t value   p-value
                        Estimate        Standard Error
(Intercept)               -589.39         167.59          -3.51     0.001 **
Age                          1.04           0.45           2.33     0.025 *
Southern State              11.29          13.24           0.85     0.399
Education                    1.18           0.68           1.70     0.093
Expenditures                 0.96           0.25           3.86     0.000 ***
Labor                        0.11           0.15           0.69     0.493
Number of Males              0.30           0.22           1.36     0.181
Population                   0.09           0.14           0.65     0.518
Unemployment (14-24)        -0.68           0.48          -1.40     0.165
Unemployment (25-39)         2.15           0.95           2.26     0.030 *
Wealth                      -0.08           0.09          -0.91     0.367

This table is typical of the output of a multiple regression program. The "t value" is the Wald test statistic for testing H_0: β_j = 0 versus H_1: β_j ≠ 0. The asterisks denote "degree of significance," with more asterisks being significant at a smaller level. The example raises several important questions. In particular: (1) should we eliminate some variables from this model? (2) should we interpret these relationships as causal? For example, should we conclude that low crime prevention expenditures cause high crime rates? We will address question (1) in the next section. We will not address question (2) until a later chapter.
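Output like the table above is easy to reproduce from (14.20) and (14.22). A sketch with a made-up design matrix (not the crime data; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # first column is the intercept
beta_true = np.array([1.0, 2.0, 0.0, -1.5])
Y = X @ beta_true + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y                      # least squares estimate, eq. (14.20)
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)              # unbiased estimate of sigma^2
se_hat = np.sqrt(sigma2_hat * np.diag(XtX_inv))   # standard errors, as in (14.22)
t_values = beta_hat / se_hat                      # Wald statistics for H0: beta_j = 0
print(np.round(np.column_stack([beta_hat, se_hat, t_values]), 3))
```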
14.6 Model Selection
Example 14.14 illustrates a problem that often arises in multiple regression. We may have data on many covariates but we may not want to include all of them in the model. A smaller model with fewer covariates has two advantages: it might give better predictions than a big model and it is more parsimonious (simpler). Generally, as you add more variables to a regression,
the bias of the predictions decreases and the variance increases. Too few covariates yields high bias; too many covariates yields high variance. Good predictions result from achieving a good balance between bias and variance.

In model selection there are two problems: (i) assigning a "score" to each model which measures, in some sense, how good the model is, and (ii) searching through all the models to find the model with the best score. Let us first discuss the problem of scoring models. Let S ⊂ {1, ..., k} and let X_S = {X_j : j ∈ S} denote a subset of the covariates. Let β_S denote the coefficients of the corresponding set of covariates and let β̂_S denote the least squares estimate of β_S. Also, let X_S denote the X matrix for this subset of covariates, and define r̂_S(x) to be the estimated regression function from (14.21). The predicted values from model S are denoted by Ŷ_i(S) = r̂_S(X_i). The prediction risk is defined to be
\[
R(S) = \sum_{i=1}^n E\bigl( \hat Y_i(S) - Y_i^* \bigr)^2   \tag{14.23}
\]
where Y_i^* denotes the value of a future observation of Y_i at covariate value X_i. Our goal is to choose S to make R(S) small. The training error is defined to be
\[
\hat R_{\mathrm{tr}}(S) = \sum_{i=1}^n \bigl( \hat Y_i(S) - Y_i \bigr)^2.
\]
This estimate is very biased and under-estimates R(S).

Theorem 14.15 The training error is a downward biased estimate of the prediction risk:
\[
E\bigl( \hat R_{\mathrm{tr}}(S) \bigr) < R(S).
\]
In fact,
\[
\mathrm{bias}\bigl( \hat R_{\mathrm{tr}}(S) \bigr) = E\bigl( \hat R_{\mathrm{tr}}(S) \bigr) - R(S) = -2 \sum_{i=1}^n \mathrm{Cov}\bigl( \hat Y_i, Y_i \bigr).   \tag{14.24}
\]
The reason for the bias is that the data are being used twice: to estimate the parameters and to estimate the risk. When we t a complex model with many parameters, the covariance Cov(Ybi ; Yi) will be large and the bias of the training error gets worse. In summary, the training error is a poor estimate of risk. Here are some better estimates. Mallow's Cp statistic is de ned by
Rb(S ) = Rbtr (S ) + 2jS jb2
(14.25)
where $|S|$ denotes the number of terms in $S$ and $\hat\sigma^2$ is the estimate of $\sigma^2$ obtained from the full model (with all covariates in the model). This is simply the training error plus a bias correction. This estimate is named in honor of Colin Mallows, who invented it. The first term in (14.25) measures the fit of the model while the second measures the complexity of the model. Think of the $C_p$ statistic as: lack of fit + complexity penalty. Thus, finding a good model involves trading off fit and complexity.
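As a concrete illustration, here is a minimal R sketch of (14.25). The function and the model names are hypothetical, not from the text; it simply refits a submodel, takes the estimate of $\sigma^2$ from the full model, and adds the penalty.

# Mallows' Cp for a submodel S: training error plus 2|S| * estimated sigma^2,
# where the sigma^2 estimate comes from the full model (a sketch).
mallows_cp <- function(full_formula, sub_formula, data) {
  full <- lm(full_formula, data = data)
  sub  <- lm(sub_formula,  data = data)
  sigma2_hat <- summary(full)$sigma^2      # sigma^2 estimate from the full model
  rss_sub    <- sum(residuals(sub)^2)      # training error of the submodel
  p_sub      <- length(coef(sub))          # number of terms |S|
  rss_sub + 2 * p_sub * sigma2_hat
}

# Hypothetical usage: mallows_cp(y ~ x1 + x2 + x3, y ~ x1 + x3, data = dat)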
A related method for estimating risk is AIC (Akaike Information Criterion). The idea is to choose $S$ to maximize
$$\ell_S - |S| \qquad (14.26)$$
where $\ell_S$ is the log-likelihood of the model evaluated at the mle. This can be thought of as "goodness of fit" minus "complexity." (Some texts use a slightly different definition of AIC which involves multiplying the definition here by 2 or $-2$; this has no effect on which model is selected.) In linear regression with Normal errors, maximizing AIC is equivalent to minimizing Mallows' $C_p$; see Exercise 8. Yet another method for estimating risk is leave-one-out cross-validation. In this case, the risk estimator is
$$\hat R_{CV}(S) = \sum_{i=1}^n \big(Y_i - \hat Y_{(i)}\big)^2 \qquad (14.27)$$
where $\hat Y_{(i)}$ is the prediction for $Y_i$ obtained by fitting the model with $Y_i$ omitted. It can be shown that
$$\hat R_{CV}(S) = \sum_{i=1}^n \left(\frac{Y_i - \hat Y_i(S)}{1 - U_{ii}(S)}\right)^2 \qquad (14.28)$$
where $U_{ii}(S)$ is the $i$th diagonal element of the matrix
$$U(S) = X_S (X_S^T X_S)^{-1} X_S^T. \qquad (14.29)$$
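The shortcut (14.28) is easy to use in practice. The following R sketch (illustrative only; the formula and data frame are whatever model and data you are working with) computes the leave-one-out score both by brute force and with the hat-matrix identity; for linear least squares the two agree.

# Leave-one-out cross-validation for a linear model, two ways.
loocv_direct <- function(formula, data) {
  n <- nrow(data)
  y <- data[[all.vars(formula)[1]]]
  sum(sapply(1:n, function(i) {
    fit <- lm(formula, data = data[-i, ])            # drop observation i and refit
    (y[i] - predict(fit, newdata = data[i, ]))^2
  }))
}

loocv_shortcut <- function(formula, data) {
  fit <- lm(formula, data = data)
  h   <- hatvalues(fit)                              # diagonal entries U_ii(S)
  sum((residuals(fit) / (1 - h))^2)                  # equation (14.28)
}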
Thus, one need not actually drop each observation and refit the model. A generalization is $k$-fold cross-validation. Here we divide the data into $k$ groups; often people take $k = 10$. We omit one group of data and fit the models to the remaining data. We use the fitted model to predict the data in the omitted group. We then estimate the risk by $\sum_i (Y_i - \hat Y_i)^2$ where the sum is over the data points in the omitted group. This process is repeated for each of the $k$ groups and the resulting risk estimates are averaged. For linear regression, Mallows' $C_p$ and cross-validation often yield essentially the same results, so one might as well use Mallows' method. In some of the more complex problems we will discuss later, cross-validation will be more useful. Another scoring method is BIC (Bayesian information criterion). Here we choose a model to maximize
$$BIC(S) = \ell_S - \frac{|S|}{2}\log n. \qquad (14.30)$$
The BIC score has a Bayesian interpretation. Let S = fS1 ; : : : ; Sm g denote a set of models. Suppose we assign the prior P(Sj ) = 1=m over the models. Also, assume we put a smooth prior on the parameters within each model. It can be shown that the posterior probability for a model is approximately,
$$P(S_j \mid \mathrm{data}) \approx \frac{e^{BIC(S_j)}}{\sum_r e^{BIC(S_r)}}.$$
Hence, choosing the model with highest BIC is like choosing the model with highest posterior probability. The BIC score also has an information-theoretic interpretation in terms of something called minimum description length. The BIC score is identical to Mallows' $C_p$ except that it puts a more severe penalty for complexity. It thus leads one to choose a smaller model than the other methods. Now let us turn to the problem of model search. If there are $k$ covariates then there are $2^k$ possible models. We need to search through all these models, assign a score to each one, and choose the model with the best score. If $k$ is not too large we can do a complete search over all the models. When $k$ is large, this is infeasible. In that case we need to search over a subset of all the models. Two common methods are forward and backward stepwise regression. In forward stepwise regression, we start with no covariates in the model. We then add the one variable that leads to the best score. We continue adding variables one at a time until the score does not improve. Backward stepwise regression is the same except that we start with the biggest model and drop one variable at a time. Both are greedy searches; neither is guaranteed to find the model with the best score. Another popular method is to do random searching through the set of all models. However, there is no reason to expect this to be superior to a deterministic search.
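In R, greedy searches of this kind can be carried out with the built-in step function, which scores models by its own sign convention for AIC (smaller is better). A minimal sketch, assuming a data frame crime with response Crime and the covariates as its remaining columns (hypothetical names, in the spirit of the next example):

# Backward stepwise regression scored by AIC, starting from the full model.
full <- lm(Crime ~ ., data = crime)
best <- step(full, direction = "backward")   # drop one variable at a time
summary(best)

# Forward stepwise: start from the intercept-only model and add variables.
null <- lm(Crime ~ 1, data = crime)
fwd  <- step(null, direction = "forward", scope = formula(full))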
Example 14.16 We apply backward stepwise regression to the crime data using AIC. The following was obtained from the program R. This program uses the negative of our version of AIC; hence, we are seeking the smallest possible AIC. This is the same as minimizing Mallows' $C_p$.
The full model (which includes all covariates) has AIC = 310.37. The AIC scores, in ascending order, for deleting one variable are as follows:

variable  Pop  Labor  South  Wealth  Males  U1   Educ.  U2   Age  Expend
AIC       308  309    309    309     310    310  312    314  315  324

For example, if we dropped Pop from the model and kept the other terms, then the AIC score would be 308. Based on this information we drop "Population" from the model and the current AIC score is 308. Now we consider dropping a variable from the current model. The AIC scores are:

variable  South  Labor  Wealth  Males  U1   Education  U2   Age  Expend
AIC       308    308    308     309    309  310        313  313  329

We then drop "Southern" from the model. This process is continued until there is no gain in AIC by dropping any variables. In the end, we are left with the following model:
$$\text{Crime} = 1.2\,\text{Age} + .75\,\text{Education} + .87\,\text{Expenditure} + .34\,\text{Males} - .86\,\text{U1} + 2.31\,\text{U2}.$$
Warning! This does not yet address the question of which variables are causes of crime.

14.7 The Lasso
There is an easier model search method, although it addresses a slightly different question. The method, due to Tibshirani, is called the Lasso. In this section we assume that the covariates have all been rescaled to have the same variance. This puts each covariate on the same scale. Consider estimating $\beta = (\beta_1, \ldots, \beta_k)$ by minimizing the loss function
$$\sum_{i=1}^n (Y_i - \hat Y_i)^2 + \lambda \sum_{j=1}^k |\beta_j| \qquad (14.31)$$
where $\lambda > 0$. The idea is to minimize the sums of squares but we include a penalty that gets large if any of the $\beta_j$'s are large. The solution $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_k)$ can be found numerically and will depend on the choice of $\lambda$. It can be shown that some of the $\hat\beta_j$'s will be 0. We interpret this as meaning that the $j$th covariate is omitted from the model. Hence, we are doing estimation and model selection simultaneously. We need to choose a value of $\lambda$. We can do this by estimating the prediction risk $R(\lambda)$ as a function of $\lambda$ and choosing $\lambda$ to minimize the estimated risk. For example, we can estimate the risk using leave-one-out cross-validation. Example 14.17 Returning to the crime data, Figure 14.3 shows the results of the Lasso. The first plot shows the leave-one-out cross-validation score as a function of $\lambda$. The minimum occurs at $\lambda = .7$. The second plot shows the estimated coefficients as a function of $\lambda$. You can see how some estimated parameters are zero until $\lambda$ gets larger. At $\lambda = .7$, all the $\hat\beta_j$ are non-zero so the Lasso chooses the full model.
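The minimization in (14.31) has to be done numerically for each $\lambda$. One simple way to do it (not necessarily the algorithm used to produce Figure 14.3) is coordinate-wise minimization with soft-thresholding. The R sketch below assumes, as in the text, that y has been centered and the columns of X rescaled to have the same variance.

# Coordinate descent for the Lasso objective (14.31): an illustrative sketch.
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

lasso_fit <- function(X, y, lambda, n_sweeps = 100) {
  k <- ncol(X)
  beta <- rep(0, k)
  for (sweep in 1:n_sweeps) {
    for (j in 1:k) {
      r <- y - X[, -j, drop = FALSE] %*% beta[-j]                # partial residual
      beta[j] <- soft_threshold(sum(X[, j] * r), lambda / 2) / sum(X[, j]^2)
    }
  }
  beta                                                           # some entries are exactly 0
}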
14.8 Technical Appendix
Why is the prediction interval of a different form than the other confidence intervals we have seen? The reason is that the quantity we want to estimate, $Y_*$, is not a fixed parameter; it is a random variable. To understand this point better, let $\theta = \beta_0 + \beta_1 x_*$ and let $\hat\theta = \hat\beta_0 + \hat\beta_1 x_*$. Thus, $\hat Y_* = \hat\theta$ while $Y_* = \theta + \epsilon$. Now, $\hat\theta \approx N(\theta, \hat{\mathrm{se}}^2)$ where $\hat{\mathrm{se}}^2 = V(\hat\theta) = V(\hat\beta_0 + \hat\beta_1 x_*)$.
Note that $V(\hat\theta)$ is the same as $V(\hat Y_*)$. Now, $\hat\theta \pm 2\sqrt{V(\hat\theta)}$ is a 95 per cent confidence interval for $\theta = \beta_0 + \beta_1 x_*$ using the usual
FIGURE 14.3. The Lasso applied to the crime data. Top panel: the leave-one-out cross-validation score as a function of $\lambda$. Bottom panel: the estimated coefficients $\hat\beta(\lambda)$ as a function of $\lambda$.
argument for a confidence interval. It is not a valid confidence interval for $Y_*$. To see why, let us compute the probability that $\hat Y_* \pm 2\sqrt{V(\hat Y_*)}$ contains $Y_*$. Let $s = \sqrt{V(\hat Y_*)}$. Then,
$$P(\hat Y_* - 2s < Y_* < \hat Y_* + 2s) = P\left(-2 < \frac{\hat Y_* - Y_*}{s} < 2\right) = P\left(-2 < \frac{\hat\theta - \theta - \epsilon}{s} < 2\right)$$
$$\approx P\left(-2 < \frac{N(0, s^2) - N(0, \sigma^2)}{s} < 2\right) = P\left(-2 < N\!\left(0,\, 1 + \frac{\sigma^2}{s^2}\right) < 2\right) \neq .95.$$
The problem is that the quantity of interest $Y_*$ is equal to a parameter $\theta$ plus a random variable. We can fix this by defining
$$\xi_n^2 = V(\hat Y_*) + \sigma^2 = \sigma^2\left[\frac{\sum_i (x_i - x_*)^2}{n\sum_i (x_i - \bar x)^2} + 1\right].$$
In practice, we substitute $\hat\sigma$ for $\sigma$ and we denote the resulting quantity by $\hat\xi_n$. Now consider the interval $\hat Y_* \pm 2\hat\xi_n$. Then,
$$P(\hat Y_* - 2\hat\xi_n < Y_* < \hat Y_* + 2\hat\xi_n) = P\left(-2 < \frac{\hat Y_* - Y_*}{\hat\xi_n} < 2\right) = P\left(-2 < \frac{\hat\theta - \theta - \epsilon}{\hat\xi_n} < 2\right)$$
$$\approx P\left(-2 < \frac{N(0, s^2 + \sigma^2)}{\hat\xi_n} < 2\right) \approx P\left(-2 < \frac{N(0, s^2 + \sigma^2)}{\xi_n} < 2\right)$$
$$= P(-2 < N(0, 1) < 2) = .95.$$
Of course, a $1 - \alpha$ interval is given by $\hat Y_* \pm z_{\alpha/2}\,\hat\xi_n$.

14.9 Exercises
1. Prove Theorem 14.4.

2. Prove the formulas for the standard errors in Theorem 14.8. You should regard the $X_i$'s as fixed constants.

3. Consider the regression through the origin model:
$$Y_i = \beta X_i + \epsilon_i.$$
Find the least squares estimate for $\beta$. Find the standard error of the estimate. Find conditions that guarantee that the estimate is consistent.

4. Prove equation (14.24).

5. In the simple linear regression model, construct a Wald test for $H_0: \beta_1 = 17\beta_0$ versus $H_1: \beta_1 \neq 17\beta_0$.

6. Get the passenger car mileage data from http://lib.stat.cmu.edu/DASL/Datafiles/carmpgdat.html
(a) Fit a simple linear regression model to predict MPG (miles per gallon) from HP (horsepower). Summarize your analysis including a plot of the data with the fitted line.
(b) Repeat the analysis but use log(MPG) as the response. Compare the analyses.

7. Get the passenger car mileage data from http://lib.stat.cmu.edu/DASL/Datafiles/carmpgdat.html
(a) Fit a multiple linear regression model to predict MPG (miles per gallon) from HP (horsepower). Summarize your analysis.
(b) Use Mallows' $C_p$ to select a best sub-model. To search through the models try (i) all possible models, (ii) forward stepwise, (iii) backward stepwise. Summarize your findings.
(c) Repeat (b) but use BIC. Compare the results.
(d) Now use the Lasso and compare the results.

8. Assume that the errors are Normal. Show that the model with highest AIC (equation (14.26)) is the model with the lowest Mallows' $C_p$ statistic.

9. In this question we will take a closer look at the AIC method. Let $X_1, \ldots, X_n$ be iid observations. Consider two models $M_0$ and $M_1$. Under $M_0$ the data are assumed to be $N(0, 1)$ while under $M_1$ the data are assumed to be $N(\mu, 1)$ for some unknown $\mu \in \mathbb{R}$:
$$M_0: X_1, \ldots, X_n \sim N(0, 1)$$
$$M_1: X_1, \ldots, X_n \sim N(\mu, 1), \quad \mu \in \mathbb{R}.$$
This is just another way to view the hypothesis testing problem: $H_0: \mu = 0$ versus $H_1: \mu \neq 0$. Let $\ell_n(\mu)$ be the log-likelihood function. The AIC score for a model is the log-likelihood at the mle minus the number of parameters. (Some people multiply this score by 2 but that is irrelevant.) Thus, the AIC score for $M_0$ is $AIC_0 = \ell_n(0)$ and the AIC score for $M_1$ is $AIC_1 = \ell_n(\hat\mu) - 1$. Suppose we choose the model with the highest AIC score. Let $J_n$ denote the selected model:
$$J_n = \begin{cases} 0 & \text{if } AIC_0 > AIC_1 \\ 1 & \text{if } AIC_1 > AIC_0. \end{cases}$$
(a) Suppose that $M_0$ is the true model, i.e. $\mu = 0$. Find
$$\lim_{n\to\infty} P(J_n = 0).$$
Now compute $\lim_{n\to\infty} P(J_n = 0)$ when $\mu \neq 0$.

(b) The fact that $\lim_{n\to\infty} P(J_n = 0) \neq 1$ when $\mu = 0$ is why some people say that AIC "overfits." But this is not quite true, as we shall now see. Let $\phi_\mu(x)$ denote a Normal density function with mean $\mu$ and variance 1. Define
$$\hat f_n(x) = \begin{cases} \phi_0(x) & \text{if } J_n = 0 \\ \phi_{\hat\mu}(x) & \text{if } J_n = 1. \end{cases}$$
If $\mu = 0$, show that $D(\phi_0, \hat f_n) \xrightarrow{P} 0$ as $n \to \infty$ where
$$D(f, g) = \int f(x)\log\left(\frac{f(x)}{g(x)}\right)dx$$
is the Kullback-Leibler distance. Show also that $D(\phi_\mu, \hat f_n) \xrightarrow{P} 0$ if $\mu \neq 0$. Hence, AIC consistently estimates the true density even if it "overshoots" the correct model.

REMARK: If you are feeling ambitious, repeat this analysis for BIC, which is the log-likelihood minus $(p/2)\log n$ where $p$ is the number of parameters and $n$ is the sample size.
15 Multivariate Models
In this chapter we revisit the Multinomial model and the multivariate Normal, as we will need them in future chapters. Let us first review some notation from linear algebra. Recall that if $x$ and $y$ are vectors then $x^T y = \sum_j x_j y_j$. If $A$ is a matrix then $\det(A)$ denotes the determinant of $A$, $A^T$ denotes the transpose of $A$ and $A^{-1}$ denotes the inverse of $A$ (if the inverse exists). The trace of a square matrix $A$, denoted by $\mathrm{tr}(A)$, is the sum of its diagonal elements. The trace satisfies $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ and $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$. Also, $\mathrm{tr}(a) = a$ if $a$ is a scalar (i.e. a real number). A matrix $\Sigma$ is positive definite if $x^T \Sigma x > 0$ for all non-zero vectors $x$. If a matrix $\Sigma$ is symmetric and positive definite, there exists a matrix $\Sigma^{1/2}$, called the square root of $\Sigma$, with the following properties: 1. $\Sigma^{1/2}$ is symmetric; 2. $\Sigma = \Sigma^{1/2}\Sigma^{1/2}$; 3. $\Sigma^{1/2}\Sigma^{-1/2} = \Sigma^{-1/2}\Sigma^{1/2} = I$ where $\Sigma^{-1/2} = (\Sigma^{1/2})^{-1}$.
15.1 Random Vectors
Multivariate models involve a random vector $X$ of the form
$$X = \begin{pmatrix} X_1 \\ \vdots \\ X_k \end{pmatrix}.$$
The mean of a random vector $X$ is defined by
$$\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_k \end{pmatrix} = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_k) \end{pmatrix} \qquad (15.1)$$
and the covariance matrix $\Sigma$ is defined to be
$$\Sigma = V(X) = \begin{pmatrix} V(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_k) \\ \mathrm{Cov}(X_2, X_1) & V(X_2) & \cdots & \mathrm{Cov}(X_2, X_k) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_k, X_1) & \mathrm{Cov}(X_k, X_2) & \cdots & V(X_k) \end{pmatrix}. \qquad (15.2)$$
This is also called the variance matrix or the variance-covariance matrix.
Theorem 15.1 Let $a$ be a vector of length $k$ and let $X$ be a random vector of the same length with mean $\mu$ and variance $\Sigma$. Then $E(a^T X) = a^T\mu$ and $V(a^T X) = a^T \Sigma a$. If $A$ is a matrix with $k$ columns then $E(AX) = A\mu$ and $V(AX) = A\Sigma A^T$.
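A quick simulation makes Theorem 15.1 concrete. The following R sketch (the numbers are chosen arbitrarily for illustration) draws a large sample of vectors $X$ with a given mean and variance and checks that the sample mean and variance of $a^T X$ are close to $a^T\mu$ and $a^T\Sigma a$.

# Numerical check of Theorem 15.1 by simulation (illustrative only).
set.seed(1)
mu    <- c(1, 2)
Sigma <- matrix(c(2, 1, 1, 3), 2, 2)
a     <- c(0.5, -1)

eig  <- eigen(Sigma)
root <- eig$vectors %*% diag(sqrt(eig$values)) %*% t(eig$vectors)  # Sigma^{1/2}
Z    <- matrix(rnorm(2 * 10000), nrow = 2)
X    <- mu + root %*% Z            # each column is a draw of X = mu + Sigma^{1/2} Z

aX <- as.vector(t(a) %*% X)
mean(aX)    # should be close to a^T mu  = -1.5
var(aX)     # should be close to a^T Sigma a = 2.5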
Now suppose we have a random sample of $n$ vectors
$$\begin{pmatrix} X_{11} \\ X_{21} \\ \vdots \\ X_{k1} \end{pmatrix},\ \begin{pmatrix} X_{12} \\ X_{22} \\ \vdots \\ X_{k2} \end{pmatrix},\ \ldots,\ \begin{pmatrix} X_{1n} \\ X_{2n} \\ \vdots \\ X_{kn} \end{pmatrix}.$$
The sample mean $\bar X$ is a vector defined by
$$\bar X = \begin{pmatrix} \bar X_1 \\ \vdots \\ \bar X_k \end{pmatrix} \qquad (15.3)$$
271
P
n where X i = n j Xij . The sample variance matrix, also called the covariance matrix or the variance-covariance matrix, is 2 3 1
=1
S=
where
6 6 6 4
s11 s12 s12 s22
s k s k
s1k s2k
skk
...
...
1
...
...
2
7 7 7 5
(15.4)
n X 1 sab = (Xaj X a )(Xbj X b): n 1j =1
It follows that E (X ) = . and E (S ) = . 15.2
Estimating the Correlation
Consider n data points from a bivariate distribution:
X11 X21
;
X12 X22
;;
X1n X2n
:
Recall that the correlation between X and X is E ((X )(X )) = : 1
1
1
1
2
2
2
2
(15.5)
The sample correlation (the plug-in estimator) is b =
Pn
i=1 (X1i
X 1 )(X2i s1 s2
X 2)
:
(15.6)
We can construct a con dence interval for by applying the delta method as usual. However, it turns out that we get a more accurate con dence interval by rst constructing a con dence interval for a function = f () and then applying the inverse function f . The method, due to Fisher, is as follows. De ne 1 f (r) = log(1 + r) log(1 r) 2 1
272
15.
Multivariate Models
akd let = f (). The inverse of r is g (z ) f 1 (z ) =
e2z 1 : e2z + 1
Now do the following steps: Approximate Con dence Interval for The Correlation
1. Compute b
1 = f (b) = 2 log(1 + b) log(1 b) :
2. Compute the approximate standard error of b which can be shown to be 1 : b (b) = p se n 3 3. An approximate 1 con dence interval for = f () is
(a; b) b
pz=2 ; b + pz=2 : n 3 n 3
4. Apply the inverse transformation f (z) to get a con dence interval for : a e 1; e b 1 : e a+1 e b+1 1
15.3
2
Let us now review the Multinomial distribution. Consider drawing a ball from an urn which has balls with k dierent colors labeled color 1, color 2, ... , color k. Let p = (p ; : : : ; pk ) 1
15.3 Multinomial
273
P
where pj 0 and kj pj = 1 and suppose that pj is the probability of drawing a ball of color j . Draw n times (independent draws with replacement) and let X = (X ; : : : ; Xk ) where Xj is Pk the number of times that color j appeared. Hence, n = j Xj . We say that X has a Multinomial (n,p) distribution. The probability function is =1
1
=1
n f (x; p) = px1 pxkk x1 : : : xk 1
where
n n! = : x1 : : : xk x1 ! xk ! Theorem 15.2 Let X Multinomial(n; p). Then the marginal distribution of Xj is Xj Binomial(n; pj ). The mean and variance of X are 0 1 np1 B C E (X ) = @ ... A npk and
V (X ) =
0 B B B @
np1 (1 p1 ) np1 p2 np1 p2 np2 (1 p2 ) .. .
np1 pk
.. .
np2 pk
.. .
np1 pk np2 pk .. .
npk (1 pk )
1 C C C: A
That Xj Binomial(n; pj ) follows easily. Hence, E (Xj ) = npj and V (Xj ) = npj (1 pj ). To compute Cov(Xi ; Xj ) we proceed as follows. Notice that Xi + Xj Binomial(n; pi + pj ) and so V (Xi + Xj ) = n(pi + pj )(1 pi pj ). On the other hand, Proof.
V (Xi + Xj )
= V (Xi) + V (Xj ) + 2Cov(Xi; Xj ) = npi (1 pi) + npj (1 pj ) + 2Cov(Xi; Xj ):
Equating this last expression with n(pi + pj )(1 pi pj ) implies that Cov(Xi; Xj ) = npi pj .
274
15.
Theorem
Multivariate Models
15.3
The maximum likelihood estimator of
0
pb = B @
1
0
.. C . A pbk
=B @
pb1
X1 n .. .
Xk n
p is
1 C A
= Xn :
The log-likelihood (ignoring the constant) is
Proof.
`(p) =
k X j =1
Xj log pj :
When we maximize `Pwe have to be careful since we must enforce the constraint that j pj = 1. We use the method of Lagrange multipliers and instead maximize A(p) =
k X j =1
Now
Xj log pj +
X
j
pj
1 :
@A(p) Xj = p + : @pj j P
Setting @A@p p = 0 yields pbj = Xj =. Since j pbj = 1 we see that = n and hence pbj = Xj =n as claimed. Next we would like to know the variability of the mle . We can either compute the variance matrix of pb directly or we can approximate the variability of the mle by computing the Fisher information matrix. These two approaches give the same answer in this case. The direct approach is easy: V (pb) = V (X=n) = n V (X ) and so ( ) j
2
0
V (pb) =
1B B
n
B @
p1 (1 p1 ) p1 p2 p1 p2 p2 (1 p2 )
...
p1 p k
...
p2 pk
...
p1 pk p2 pk
...
pk (1 pk )
1 C C C: A
15.4 Multivariate Normal
15.4
275
Multivariate Normal
Let us recall how the multivariate Normal distribution is de ned. To begin, let 0 1 Z=
Z1
...
B @
Zk
C A
where Z ; : : : ; Zk N (0; 1) are independent. The density of Z is 1
(
k X
)
2
1 z = 1 exp 2j j (2)k=
1 zT z : 2 (15.7) The variance matrix of Z is the identity matrix I . We write Z N (0; I ) where it is understood that 0 denotes a vector of k zeroes. We say that Z has a standard multivariate Normal distribution. More generally, a vector X has a multivariate Normal distribution, denoted by X N (; ), if its density is 1 exp f (z ) = (2)k=
2
2
=1
1 f (x; ; ) = (2)k= det() = exp
1 (x )T (x ) 2 (15.8) where det() denotes the determinant of a matrix, is a vector of length k and is a k k symmetric, positive de nite matrix. Then E (X ) = and V (X ) = . Setting = 0 and = I gives back the standard Normal. 2
Theorem
15.4
1
1 2
The following properties hold:
1. If
Z N (0; 1) and X = + 1=2 Z
2. If
X N (; ), then
=
1 2
then
X N (; ).
(X ) N (0; 1).
N (; ) a is a vector of the same length as X , then aT X N (aT ; aT a).
3. If
X
276
15.
Multivariate Models
4. Let
V = (X Then
V
)T 1 (X
):
k . 2
Suppose we partition a random Normal vector X into two parts X = (Xa; Xb) We can similarly partition the the mean = (a ; b ) and the variance =
aa ab : ba bb
X N (; ). Then: (1) The marginal distribition of Xa is Xa N (a ; aa ). (2) The conditional distribition of Xb given Xa = xa is
Theorem
15.5
Let
Xb jXa = xa N ((xa ); (xa )) where
(xa ) = b + ba aa1 (xa a ) (xa ) = bb ba aa1 ab :
(15.9) (15.10)
n from a N (; ), the log-likelihood is (up to a constant not depending on or ) Theorem
15.6
Given a random sample of size
is given by
n
n n ( X )T (X ) tr( S ) 2 2 2 log det():
`(; ) = The
mle
1
1
is
b = n 1 S: b = X and n
(15.11)
15.5 Appendix
15.5
277
Appendix
Proof of Theorem 15.6. Denote the i random vector by X i. The log-likelihood is X n 1 X(X i )T (X i ): kn log(2 ) log det() `(; ) = f (X i; ; ) = 2 2 2 i i th
1
Now, X
i
(X i )T (X i ) = 1
= P
X
[(X i X ) + (X )]T [(X i X ) + (X )]
i
1
X
(X i X )T (X i X ) + n(X )T (X )
i
1
1
since i(X i X ) (X ) = 0. Also, notice that (X i )T (X i ) is a scalar, so 1
1
X
i
(X i )T (X i ) =
X
=
X
1
i i
tr (X i
)T 1 (X i
)
tr 1 (X i
)(X i
)T
X
)(X i
)T
"
= tr
1
i
= n tr S 1
#
(X i
and the conclusion follows. 15.6
Exercises
1. Prove Theorem 15.1. 2. Find the Fisher information matrix for the mle of a Multinomial. 3. Prove Theorem 15.5. 4. (Computer Experiment. Write a function to generate nsim observations from a Multinomial(n; p) distribution.
278
15.
Multivariate Models
5. (Computer Experiment. Write a function to generate nsim observations from a Multivariate normal with given mean and covariance matrix . 6. (Computer Experiment. Generate 1000 random vectors from a N (; ) distribution where =
3 ; = 8
2 6 : 6 2
Plot the simulation as a scatterplot. Find the distribution of X jX = x using theorem 15.5. In particular, what is the formula for E (X jX = x )? Plot E (X jX = x ) on your scatterplot. Find the correlation between X and X . Compare this with the sample correlations from your simulation. Find a 95 per cent con dence interval for . Estimate the covariance matrix . 2
1
1
2
1
1
2
1
1
1
2
7. Generate 100 random vectors from a multivariate Normal with mean (0; 2)T and variance
1 3 : 3 1
Find a 95 per cent con dence interval for the correlation . What is the true value of ?
This is page 279 Printer: Opaque this
16 Inference about Independence
In this chapter we address the following questions: (1) How do we test if two random variables are independent? (2) How do we estimate the strength of dependence between two random variables? When Y and Z are not independent, we say that they are dependent, associated, or related. If Y and Z are associated, it does not imply that Y causes Z or that Z causes Y. If Y does cause Z then changing Y will change the distribution of Z; otherwise it will not change the distribution of Z. For example, quitting smoking (Y) will reduce your probability of heart disease (Z). In this case Y does cause Z. As another example, owning a TV (Y) is associated with having a lower incidence of starvation (Z). This is because if you own a TV you are less likely to live in an impoverished nation. But giving a starving person a TV will not cause them to stop being hungry. In this case, Y and Z are associated but the relationship is not causal.
Recall that we write $Y \perp\!\!\!\perp Z$ to mean that Y and Z are independent.
280
16.
Inference about Independence
We defer detailed discussion of the important question of causation until a later chapter.
16.1 Two Binary Variables
Suppose that Y and Z are both binary. Consider a data set (Y1 ; Z1 ); : : : ; (Yn ; Zn). Represent the data as a two-by-two table:
           Y = 0    Y = 1
Z = 0      X_00     X_01     X_0.
Z = 1      X_10     X_11     X_1.
           X_.0     X_.1     n = X_..

where the $X_{ij}$ represent counts:
$$X_{ij} = \text{number of observations for which } Y = i \text{ and } Z = j.$$
The dotted subscripts denote sums. For example, $X_{i\cdot} = \sum_j X_{ij}$. This is a convention we use throughout the remainder of the book. Denote the corresponding probabilities by:
           Y = 0    Y = 1
Z = 0      p_00     p_01     p_0.
Z = 1      p_10     p_11     p_1.
           p_.0     p_.1     1

where $p_{ij} = P(Z = i, Y = j)$. Let $X = (X_{00}, X_{01}, X_{10}, X_{11})$ denote the vector of counts. Then $X \sim$ Multinomial$(n, p)$ where $p = (p_{00}, p_{01}, p_{10}, p_{11})$. It is now convenient to introduce two new parameters.
Definition 16.1 The odds ratio is defined to be
$$\psi = \frac{p_{00}\,p_{11}}{p_{01}\,p_{10}}. \qquad (16.1)$$
The log odds ratio is defined to be
$$\gamma = \log(\psi). \qquad (16.2)$$
Theorem
16.2 The following statements are equivalent:
1. Y q Z . 2. = 1. 3. = 0. 4. For i; j 2 f0; 1g,
pij = pi pj :
(16.3)
Now consider testing
H0 : Y
qZ
versus H1 : Y
q6 Z:
First we consider the likelihood ratio test. Under H1 , X Multinomial(n; p) and the mle is pb = X=n. Under H0 , we again have that X Multinomial(n; p) but p is subject to the constraint pij = pi pj ; j = 0; 1: This leads to the following test. Theorem
Let
16.3 (Likelihood Ratio Test for Independence in a 2-by-2 table) T =2
1 X 2 X
i=0
X X Xij log ij : XiXj j =0
(16.4)
Under H0 , T 21 . Thus, an approximate level test is obtained by rejecting H0 then T > 21; .
282
16.
Inference about Independence
Another popular test for independence is Pearson's 2 test. Theorem
Let
16.4 (Pearson's test for Independence in a 2-by-2 table) 2
U= where
1 X 1 X (Xij
i=0 j =0
Eij =
Eij )2
Eij
(16.5)
Xi Xj : n
Under H0 , U 21 . Thus, an approximate level test is obtained by rejecting H0 then U > 21; .
Here is the intuition for the Pearson test. Under H0 , pij = pipj , so the maximum likelihood estimator of pij under H0 is
pbij = pbi pbj =
Xi Xj : n n
Thus, the expected number of observations in the (i,j) cell is
Eij = npbij =
XiXj : n
The statistic U compares the observed and expected counts. Example 16.5 The following data from Johnson and Johnson
(1972) relate tonsillectomy and Hodgkins disease. (The data are actually from a case-control study; we postpone discussion of this point until the next section.)
                    Hodgkins Disease    No Disease    Total
Tonsillectomy              90               165         255
No Tonsillectomy           84               307         391
Total                     174               472         646

We would like to know if tonsillectomy is related to Hodgkins disease. The likelihood ratio statistic is $T = 14.75$ and the p-value is $P(\chi_1^2 > 14.75) = .0001$. The $\chi^2$ statistic is $U = 14.96$
16.1 Two Binary Variables
283
and the p-value is $P(\chi_1^2 > 14.96) = .0001$. We reject the null hypothesis of independence and conclude that tonsillectomy is associated with Hodgkins disease. This does not mean that tonsillectomies cause Hodgkins disease. Suppose, for example, that doctors gave tonsillectomies to the most seriously ill patients. Then the association between tonsillectomies and Hodgkins disease may be due to the fact that those with tonsillectomies were the most ill patients and hence more likely to have a serious disease.
We can also estimate the strength of dependence by estimating the odds ratio and the log-odds ratio . Theorem
16.6 The mle 's of b=
and are
X00 X11 ; b = log b: X01 X10
(16.6)
The asymptotic stanrad errors (computed from the delta method) are r 1 1 1 1 b (b se
) = + + + (16.7)
X00 X01 b ( b) = bse b (b se
):
X10
X11
(16.8)
Remark 16.7 For small sample sizes, b and b
can have a very
large variance. In this case, we often use the modi ed estimator
X00 + 21 X11 + 12 b= : X01 + 21 X10 + 12
(16.9)
Yet another test for indepdnence is the Wald test for = 0 b (b given by W = (b 0)=se
). A 1 con dence interval for is
z=2 seb (b ). A 1 con dence interval for can be obtained b b ( b). Second, since in two ways. First, we could use b z=2 se = e we could use
b (b exp b z=2 se
) :
This second method is usually more accurate.
(16.10)
284
16.
Inference about Independence
Example 16.8 In the previous example, b=
and
b
90 307 = 1:99 165 84
= log(1:99) = :69:
So tonsillectomy patients were twice as likely to have Hodgkins disease. The standard error of b is r
1 1 1 1 + + + = :18: 90 84 165 307 The Wald statistic is W = :69=:18 = 3:84 whose p-value is P(jZ j > 3:84) = :0001, the same as the other tests. A 95 per cent con dence interval for is b
$\pm 2(.18) = (.33, 1.05)$. A 95 per cent confidence interval for $\psi$ is $(e^{.33}, e^{1.05}) = (1.39, 2.86)$.
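These calculations are easy to script. Here is an R sketch reproducing the estimates of Theorem 16.6 and the interval (16.10) for the tonsillectomy counts, using $z_{\alpha/2}$ in place of the rounded value 2:

# Odds ratio, log odds ratio, its standard error, and a confidence interval
# for a 2-by-2 table (the counts are those of Examples 16.5 and 16.8).
X <- matrix(c(90, 84, 165, 307), nrow = 2,
            dimnames = list(c("Tonsillectomy", "No Tonsillectomy"),
                            c("Hodgkins Disease", "No Disease")))
psi_hat   <- X[1, 1] * X[2, 2] / (X[1, 2] * X[2, 1])   # odds ratio estimate
gamma_hat <- log(psi_hat)                              # log odds ratio
se_gamma  <- sqrt(sum(1 / X))                          # se from the delta method
z         <- qnorm(0.975)
exp(gamma_hat + c(-1, 1) * z * se_gamma)               # interval (16.10) for psi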
16.2 Interpreting The Odds Ratios
Suppose event A as probability P (A). The odds of A are de ned as P (A) : odds(A) = 1 P (A) It follows that odds(A) P(A) = : 1 + odds(A) Let E be the event that someone is exposed to something (smoking, radiation, etc) and let D be the event that they get a disease. The odds of getting the disease given that you are exposed are P(D jE ) odds(DjE ) = 1 P(DjE ) and the odds of getting the disease given that you are not exposed are P(D jE c ) : odds(DjE c) = 1 P(DjE c)
16.2 Interpreting The Odds Ratios
285
The odds ratio is de ned to be =
odds(DjE ) : odds(DjE c)
If = 1 then disease probability is the same for exposed and unexposed. This implies that these events are independent. Recall that the log-odds ratio is de ned as = log( ). Independence corresponds to = 0. Consider this table of probabilities:
Dc E c p00 E p10 p0
D p01 p0 p11 p1 p1 1
Denote the data by
Dc D E c X00 X01 X0 E X10 X11 X1 X0 X1 X Now P (D
jE ) = p p+11p 10 11
and P(DjE c) =
p01
p00 + p01
and so odds(DjE ) = and therefore,
p11 p and odds(DjE c) = 01 p10 p00 =
p11 p00 : p01 p10
To estimate the parameters, we have to rst consider how the data were collected. There are three methods. Multinomial Sampling. We draw a sample from the population and, for each person, record their exposure and disease status. In this case, X = (X00 ; X01 ; X10 ; X11 ) Multinomial(n; p).
286
16.
Inference about Independence
We then estimate the probabilities in the table by pbij = Xij =n and b = pb11 pb00 = X11 X00 :
pb01 pb10
X01 X10
We get some exposed and unexposed people and count the number with disease in each group. Thus, Prospective Sampling. (Cohort Sampling).
X01 X11
Binomial(X0 ; P (DjE c)) Binomial(X1 ; P (DjE )):
We should really write x0 and x1 instead of X0 and X1 since in this case, these are xed not random but for notational simplicity I'll keep using capital letters. We can estimate P (DjE ) and P (DjE c) but we cannot estimate all the probabilities in the table. Still, we can estimate since is a function of P (DjE ) and P (DjE c). Now X Pb(DjE ) = 11 X1 and X Pb(DjE c) = 01 : X0 Thus, b = X11 X00
X01 X10
just as before.
Case-Control (Retrospective) Sampling. Here we get
some diseased and non-diseased people and we observe how many are exposed. This is much more eÆcient if the disease is rare. Hence,
X10 X11
Binomial(X0 ; P (E jDc)) Binomial(X1 ; P (E jD)):
From these data we can estimate P(E jD) and P(E jDc). Surprisingly, we can also still estimate . To understand why, note
16.2 Interpreting The Odds Ratios
that P(E
jD) = p p+11p ; 01 11
1 P(E jD) =
p01
p01 + p11
By a similar argument, odds(E jDc) =
287
; odds(E jD) =
p11 : p01
p10 : p00
Hence,
odds(E jD) p11 p00 = = : odds(E jDc) p01 p10 From the data, we form the following estimates:
Pb(E jD) = Therefore,
X11 X [ X [ X ; 1 Pb(E jD) = 01 ; odds( E jD) = 11 ; odds( E jDc) = 10 : X1 X1 X01 X00 b=
X00 X11 : X01 X10
So in all three data collection methods, the estimate of out to be the same.
turns
It is tempting to try to estimate P(DjE ) P(DjE c). In a case-control design, this quantity is not estimable. To see this, we apply Bayes' theorem to get P(E jD )P(D ) P(E c jD )P(D ) P(D jE ) P(D jE c ) = : P(E ) P(E c ) Because of the way we obtained the data, P(D) is not estimable from the data. However, we can estimate = P(DjE )=P(DjE c), which is called the , under the
relative risk rare disease assumption. Theorem 16.9 Let = P(DjE )=P(DjE c). Then
as P(D) ! 0.
!1
288
16.
Inference about Independence
Thus, under the rare disease assumption, the relative risk is approximately the same as the odds ratio and, as we have seen, we can estimate the odds ratio. In a randomized experiment, we can interpret a strong association, that is $\psi \neq 1$, as a causal relationship. In an observational (non-randomized) study, the association can be due to confounding, that is, other unobserved variables. We'll discuss causation in more detail later.
16.3
Two Discrete Variables
Now suppose that Y 2 f1; : : : ; I g and Z 2 f1; : : : ; J g are two discrete variables. The data can be represented as an I by J table of counts:
Z=1
.. . Z=i .. . Z=I
Y =1 Y =2 X11 X12 .. .
.. .
Y = j Y = J X1j X1J X1 .. .
.. .
.. .
.. .
.. .
.. .
.. .
Xi1
Xi2
Xij
XiJ
Xi
XI 1 X1
XI 2 X2
XIj Xj
XIJ XJ
XI
.. .
.. .
Consider testing H0 : Y
.. .
q Z versus H1 : Y q6 Z .
.. .
.. .
n
16.3 Two Discrete Variables
Theorem
16.10 Let T =2
I X J X i=1
289
X X Xij log ij : XiXj j =1
(16.11)
The limiting distribution of T under the null hypothesis of independence is 2 where = (I 1)(J 1). Pearson's 2 test statistic is
U=
I X J X (nij i=1 j =1
Eij )2
Eij
:
(16.12)
Asymptotically, under H0 , U has a 2 distribution where = (I 1)(J 1). Example 16.11 These data are from a study by Hancock et al
(1979). Patients with Hodkins disease are classi ed by their response to treatment and by histological type. Type LP NS MC LD
Type    Positive Response    Partial Response    No Response    Total
LP              74                  18                12          104
NS              68                  16                12           96
MC             154                  54                58          266
LD              18                  10                44           72
The $\chi^2$ test statistic is 75.89 with $2 \times 3 = 6$ degrees of freedom. The p-value is $P(\chi_6^2 > 75.89) \approx 0$. The likelihood ratio test statistic is 68.30 with $2 \times 3 = 6$ degrees of freedom. The p-value is $P(\chi_6^2 > 68.30) \approx 0$. Thus there is strong evidence that response to treatment and histological type are associated.
There are a variety of ways to quantify the strength of dependence between two discrete variables Y and Z . Most of them are not very intuitive. The one we shall use is not standard but is more interpretable.
290
16.
Inference about Independence
We de ne
Æ (Y; Z ) = max jPY;Z (Y A;B
2 A; Z 2 B )
P Y (Y
2 A)
PZ (Z
2 B )j
(16.13) where the maximum is over all pairs of events A and B . Theorem
16.12 Properties of Æ:
1. 0 Æ (Y; Z ) 1.
2. Æ (Y; Z ) = 0 if and only if Y
q Z.
3. The following identity holds: I X J 1X Æ (X; Y ) = p 2 i=1 j =1 ij
4. The
mle
(16.14)
of Æ is
Æ (X; Y ) = where
pi pj :
I
J
1 X X pb 2 i=1 j =1 ij
pbi pbj
(16.15)
Xij X X ; pbi = i ; pbj = j : n n n The interpretation of Æ is this: if one person makes probability pbij =
statements assuming independence and another person makes probability statements without assuming independence, their probability statements may differ by as much as $\delta$. Here is a suggested scale for interpreting $\delta$:

0 < $\delta$ <= .01      negligible association
.01 < $\delta$ <= .05    non-negligible association
.05 < $\delta$ <= .10    substantial association
$\delta$ > .10           very strong association

A confidence interval for $\delta$ can be obtained by bootstrapping. The steps for bootstrapping are:
16.4 Two Continuous Variables
291
1. Draw $X^* \sim$ Multinomial$(n, \hat p)$;
2. Compute $\hat p_{ij}^*, \hat p_{i\cdot}^*, \hat p_{\cdot j}^*$;
3. Compute $\hat\delta^*$;
4. Repeat.

Now we use any of the methods we learned earlier for constructing bootstrap confidence intervals. However, we should not use a Wald interval in this case. The reason is that if $Y \perp\!\!\!\perp Z$ then $\delta = 0$ and we are on the boundary of the parameter space. In this case, the Wald method is not valid. Example 16.13 Returning to Example 16.11 we find that $\hat\delta = .11$.
Using a pivotal bootstrap with 10,000 bootstrap samples, a 95 per cent confidence interval for $\delta$ is (.09, .22). Our conclusion is that the association between histological type and response is substantial.
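A minimal R sketch of this bootstrap (counts is the I-by-J table of counts; this is an illustration, not the exact code used for the example):

# delta-hat of (16.15) and its parametric bootstrap distribution.
delta_hat <- function(counts) {
  p  <- counts / sum(counts)
  pi <- rowSums(p); pj <- colSums(p)
  0.5 * sum(abs(p - outer(pi, pj)))
}

boot_delta <- function(counts, B = 10000) {
  n <- sum(counts)
  p <- as.vector(counts / sum(counts))
  replicate(B, {
    x <- rmultinom(1, size = n, prob = p)        # step 1: draw X* ~ Multinomial(n, p-hat)
    delta_hat(matrix(x, nrow = nrow(counts)))    # steps 2-3: recompute delta
  })
}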
16.4 Two Continuous Variables
Now suppose that Y and Z are both continuous. If we assume that the joint distribution of Y and Z is bivariate Normal, then we measure the dependence between Y and Z by means of the correlation coefficient $\rho$. Tests, estimates and confidence intervals for $\rho$ in the Normal case are given in the previous chapter. If we do not assume Normality then we need a nonparametric method for assessing dependence. Recall that the correlation is
$$\rho = \frac{E\big((X_1 - \mu_1)(X_2 - \mu_2)\big)}{\sigma_1\sigma_2}.$$
1 2 A nonparametric estimator of is the plug-in estimator which is
b =
Pn (X1i X 1 )(X2i X 2 ) qP i=1 Pn n 2 2 i=1 (X1i X 1 ) i=1 (X2i X 2 )
292
16.
Inference about Independence
which is just the sample correlation. A confidence interval can be constructed using the bootstrap. A test for $\rho = 0$ can be based on the Wald test using the bootstrap to estimate the standard error. The plug-in approach is useful for large samples. For small samples, we measure the correlation using the Spearman rank correlation coefficient $\hat\rho_S$. We simply replace the data by their ranks, ranking each variable separately, then we compute the correlation coefficient of the ranks. To test the null hypothesis that $\rho_S = 0$ we need the distribution of $\hat\rho_S$ under the null hypothesis. This can be obtained easily by simulation. We fix the ranks of the first variable as $1, 2, \ldots, n$. The ranks of the second variable are chosen at random from the set of $n!$ possible orderings. Then we compute the correlation. This is repeated many times and the resulting distribution $P_0$ is the null distribution of $\hat\rho_S$. The p-value for the test is $P_0(|R| > |\hat\rho_S|)$ where $R$ is drawn from the null distribution $P_0$.
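Here is an R sketch of the rank correlation and its simulated permutation null distribution (nsim permutations; illustrative only, with arbitrary variable names):

# Spearman rank correlation with a permutation p-value, following the recipe above.
spearman_test <- function(x1, x2, nsim = 10000) {
  rho_s <- cor(rank(x1), rank(x2))                              # observed rank correlation
  null  <- replicate(nsim, cor(rank(x1), sample(rank(x2))))     # permute one variable
  list(rho_s = rho_s, p_value = mean(abs(null) >= abs(rho_s)))
}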
Example 16.14 The following data (Snedecor and Cochran, 1980,
p. 191) are systolic blood pressure X1 and are diastolic blood pressure X2 in millimeters:
X1   100  105  110  110  120  120  125  130  130  150  170  195
X2    65   65   75   70   78   80   75   82   80   90   95   90

The sample correlation is $\hat\rho = .88$. The bootstrap standard error is .046 and the Wald test statistic is $.88/.046 = 19.23$. The p-value is near 0 giving strong evidence that the correlation is not 0. The 95 per cent pivotal bootstrap confidence interval is (.78, .94). Because the sample size is small, consider Spearman's rank correlation. In ranking the data, we will use average ranks if there are ties. So if the third and fourth lowest numbers are the same, they each get rank 3.5. The ranks of the data are:
16.5 One Continuous Variable and One Discrete
293
X1   1    2    3.5  3.5  5.5  5.5  7    8.5  8.5  10    11  12
X2   1.5  1.5  4.5  3    6    7.5  4.5  9    7.5  10.5  12  10.5

The rank correlation is $\hat\rho_S = .94$ and the p-value for testing the null hypothesis that there is no correlation is $P_0(|R| > .94) \approx 0$, which was obtained by simulation.
One Continuous Variable and One Discrete
Suppose that Y 2 f1; : : : ; I g is discrete and Z is continuous. Let Fi (z ) = P(Z z jY = i) denote the cdf of Z conditional on Y = i. Theorem
16.15 When Y 2 f1; : : : ; I g is discrete and Z is con-
tinuous, then Y
q Z if and only if F1 = = FI .
It follows from the previous theorem that to test for independence, we need to test
H0 : F1 = = FI versus H1 : not H0 : For simplicity, we consider the case where I = 2. To test the null hypothesis that F1 = F2 we will use the . Let n1 denote the nimber of observations for which Yi = 1 and let n2 denote the nimber of observations for which Yi = 2. Let
two sample
Kolmogorov-Smirnov test Fb1 (z ) =
and
Fb2 (z )
=
n 1 X
n1
i=1
n 1 X
n2
i=1
I (Zi z )I (Yi = 1) I (Zi z )I (Yi = 2)
denote the empirical distribution function of Z given Y = 1 and Y = 2 respectively. De ne the test statistic
D = sup jFb1 (x) Fb2 (x)j: x
294
16.
Theorem
Inference about Independence
16.16 Let H (t) = 1 2
1 X j =1
( 1)j 1 e
Under the null hypthesis that F1 = F2 , r
lim P
n!1
2j 2 t2
:
n1 n2 D t = H (t): n1 + n2
It follows from the theorem that an approximate level test is obtained by rejecting H0 when r
16.6
n1 n2 D > H 1 (1 ): n1 + n 2
Bibliographic Remarks
Johnson, S.K. and Johnson, R.E. (1972). New England Journal of Medicine. 287. 1122-1125. Hancock, B.W. (1979). Clinical Oncology, 5, 283-297. 16.7
Exercises
1. Prove Theorem 16.2. 2. Prove Theorem 16.3. 3. Prove Theorem 16.9. 4. Prove equation (16.14). The data here an approximate creation using information in article.
are rethe the
5. The New York Times (January 8, 2003, page A12) reported the following data on death sentencing and race, from a study in Maryland: Death Sentence No Death Sentence Black Victim 14 641 White Victim 62 594
16.7 Exercises
295
Analyze the data using the tools from this Chapter. Interpret the results. Explain why, based only on this information, you can't make causal conclusions. (The authors of the study did use much more information in their full report.) 6. Analyze the data on the variables Age and Financial Status from: http://lib.stat.cmu.edu/DASL/Data les/montanadat.html 7. Estimate the correlation between temperature and latitude using the data from http://lib.stat.cmu.edu/DASL/Data les/USTemperatures.html Use the correlation coeÆcient and the Spearman rank correaltion. Provide estimates, tests and con dence intervals. 8. Test whether calcium intake and drop in blood pressure are associated. Use the data in http://lib.stat.cmu.edu/DASL/Data les/Calcium.html
This is page 297 Printer: Opaque this
17
Undirected Graphs and Conditional Independence
Graphical models are a class of multivariate statistical models that are useful for representing independence relations. They are also useful for developing parsimonious models for multivariate data. To see why parsimonious models are useful in the multivariate setting, consider the problem of estimating the joint distribution of several discrete random variables. Two binary variables $Y_1$ and $Y_2$ can be represented as a two-by-two table which corresponds to a multinomial with four categories. Similarly, $k$ binary variables $Y_1, \ldots, Y_k$ correspond to a multinomial with $N = 2^k$ categories. When $k$ is even moderately large, $N = 2^k$ will be huge. It can be shown in that case that the mle is a poor estimator. The reason is that the data are sparse: there are not enough data to estimate so many parameters. Graphical models often require fewer parameters and may lead to estimators with smaller risk. There are two main types of graphical models: undirected and directed. Here, we introduce undirected graphs. We'll discuss directed graphs later.
k
k
298
17. Undirected Graphs and Conditional Independence
17.1
Conditional Independence
Underlying graphical models is the concept of conditional independence. De nition 17.1 Let
X , Y and Z be discrete random variables. We say that X and Y are conditionally independent given Z , written X q Y j Z , if
P(X
=
x; Y
=
j
y Z
=
z)
=
P(X = xjZ = z )P(Y
=
j
y Z
=
z)
(17.1)
for all x; y; z . If X , Y and Z are continuous random variables, we say that X and Y are conditionally independent given Z if fX;Y
for all x,
y
j (x; y jz ) = f j (xjz )f j (y jz ): Z
X Z
Y Z
and z .
Intuitively, this means that, once you know Z , Y provides no extra information about X . The conditional independence relation satis es some basic properties. Theorem 17.2 The following implications hold:
$$X \perp\!\!\!\perp Y \mid Z \implies Y \perp\!\!\!\perp X \mid Z$$
$$X \perp\!\!\!\perp Y \mid Z \text{ and } U = h(X) \implies U \perp\!\!\!\perp Y \mid Z$$
$$X \perp\!\!\!\perp Y \mid Z \text{ and } U = h(X) \implies X \perp\!\!\!\perp Y \mid (Z, U)$$
$$X \perp\!\!\!\perp Y \mid Z \text{ and } X \perp\!\!\!\perp W \mid (Y, Z) \implies X \perp\!\!\!\perp (W, Y) \mid Z$$
$$X \perp\!\!\!\perp Y \mid Z \text{ and } X \perp\!\!\!\perp Z \mid Y \implies X \perp\!\!\!\perp (Y, Z).$$
The last property requires the assumption that all events have positive probability; the rst four do not.
17.2 Undirected Graphs
299
Y
b
b
X
b
FIGURE 17.1. A graph with vertices E
=
f(X; Y ); (Y; Z )g.
17.2
V
=
Z
fX; Y ; Z g.
The edge set is
Undirected Graphs
An undirected graph G = (V; E ) has a nite set V of vertices (or nodes) and a set E of edges (or arcs) consisting of pairs of vertices. The vertices correspond to random variables X; Y; Z; : : : and edges are written as unordered pairs. For example, (X; Y ) 2 E means that X and Y are joined by an edge. An example of a graph is in Figure 17.1. Two vertices are adjacent, written X Y , if there is an edge between them. In Figure 17.1, X and Y are adjacent but X and Z are not adjacent. A sequence X0 ; : : : ; X is called a path if X 1 X for each i. In Figure 17.1, X; Y; Z is a path. A graph is complete if there is an edge between every pair of vertices. A subset U V of vertices together with their edges is called a n
i
i
subgraph.
If
are three distinct subsets if V , we say that C separates A and B if every path from a variable in A to a variable in B intersects a variable in C . In Figure 17.2 fY; W g and fZ g are separated by fX g. Also, W and Z are separated by fX; Y g. A; B
and
C
300
17. Undirected Graphs and Conditional Independence
FIGURE 17.2. Z
W
b
b
X
Y
b
b
Z
fY; W g and fZ g fX; Y g.
are separated by
fX g.
Also,
W
and
are separated by
Y
b
X
b
b
FIGURE 17.3. 17.3
X
Z
q Z jY .
Probability and Graphs
Let V be a set of random variables with distribution P. Construct a graph with one vertex for each random variable in V . Suppose we omit the edge between a pair of variables if they are independent given the rest of the variables: no edge between
X
and Y
()
X
q jrest Y
where \rest" refers to all the other variables besides X and Y . This type of graph is called a pairwise Markov graph. Some examples are shown in Figures 17.3, 17.4 17.6 and 17.5. The graph encodes a set of pairwise conditional independence relations. These relations imply other conditional independence
17.3 Probability and Graphs
301
Y
b
X
b
b
Z
FIGURE 17.4. No implied independence relations.
X
b
bW
Y
b
b
Z
and
Y
FIGURE 17.5.
b X
X
q Z jfY; W g
b Y
q W jfX; Z g.
b
b
Z
W
FIGURE 17.6. Pairwise independence implies that But is
X
q Z jY ?
X
q Z jfY ; W g.
302
17. Undirected Graphs and Conditional Independence
relations. How can we gure out what they are? Fortunately, we can read these other conditional independence relations directly from the graph as well, as is explained in the next theorem. Theorem 17.3 Let
G=(
) be a pairwise Markov graph for a distribution P. Let A; B and C be distinct subsets of V such that C separates A and B . Then A q B jC . V; E
Remark 17.4 If A and B are not connected (i.e. there is no path
from A to B ) then we may regard A and B as being separated by the empty set. Then Theorem 17.3 implies that A q B .
The independence condition in Theorem 17.3 is called the global Markov property. We thus see that the pairwise and global Markov properties are equivalent. Let us state this more precisely. Given a graph G , let Mpair (G ) be the set of distributions which satisfy the pairwise Markov property: thus P 2 Mpair (G ) if, under P , X q Y jrest if and only if there is no edge between X and Y . Let Mglobal (G ) be the set of distributions which satisfy the global Markov property: thus P 2 Mpair (G ) if, under P, A q B jC if and only if C separates A and B . Theorem 17.5 Let
G be a graph. Then,
Mpair
(G ) = Mglobal (G ).
This theorem allows us to construct graphs using the simpler pairwise property and then we can deduce other independence relations using the global Markov property. Think how hard this would be to do algebraically. Returning to 17.6, we now see that X q Z jY and Y q W jZ . Example 17.6 Figure 17.7 implies that (Y; Z ). Example 17.7 Figure 17.8 implies that
j
Z Y
.
X
X
q
Y
,
X
q j( W
q
and
X
) and
X
Z
Y; Z
q q
17.3 Probability and Graphs
Y
b
X
b
b
FIGURE 17.7.
X
X
b
qY,
X
qZ
and
X
Z
q (Y; Z ).
Y
b bW b
Z
FIGURE 17.8.
X
q W j(Y; Z )
and
X
q Z jY .
303
304 17.4
17. Undirected Graphs and Conditional Independence Fitting Graphs to Data
Given a data set, how do we nd a graphical model that ts the data. Some authors have devoted whole books to this subject. We will only treat the discrete case and we will consider a method based on log-linear models which are the subject of the next chapter.
17.5
Bibliographic Remarks
Thorough treatments of undirected graphs can be found in Whittaker (1990) and Lauritzen (1996). Some of the exercises below are adapted from Whittaker (1990).
17.6
Exercises
1. Consider random variables (X1 ; X2 ; X3 ). In each of the following cases, draw a graph that has the given independence relations. (a) X1 q X3 j X2 . (b) X1 q X2 j X3 and
X1
q
X3
j
X2
(c) X1 q X2 j X3 and
X1
q
X3
j
X2
. and
X2
q
X3
j
X1
.
2. Consider random variables (X1 ; X2 ; X3 ; X4 ). In each of the following cases, draw a graph that has the given independence relations. (a) X1 q X3 j X2 ; X4 and X1 q X4 j X2 ; X3 and X2 q X4 j X1 ; X3 .
17.6 Exercises
X1
X2
b
b
305
X4
b
b X3
FIGURE 17.9.
X1
X2
b
b
X3
b
X4
b
FIGURE 17.10.
(b) X1 q X2 j X3 ; X4 and X1 q X3 j X2 ; X4 and X2 q X3 j X1 ; X4 . (c)
X1
q
X3
j
X2 ; X4
and X2 q X4 j X1 ; X3 .
3. A conditional independence between a pair of variables is minimal if it is not possible to use the Separation Theorem to eliminate any variable from the conditioning set, i.e. from the right hand side of the bar (Whittaker 1990). Write down the minimal conditional independencies from: (a) Figure 17.9; (b) Figure 17.10; (c) Figure 17.11; (d) Figure 17.12. 4. Here are breast cancer data on diagnostic center (X1 ), nuclear grade (X2 ), and survival (X3 ):
306
17. Undirected Graphs and Conditional Independence
X3
X4
b
bX
b
bX
2
1
FIGURE 17.11.
X1
b
X2
b
b
X3
b
b
b
X4
X5
X6
FIGURE 17.12.
17.6 Exercises
307
malignant malignant benign benign X3 died survived died survived Boston 35 59 47 112 Glamorgan 42 77 26 76 X2
X1
(a) Treat this as a multinomial and nd the maximum likelihood estimator. (b) If someone has a tumour classi ed as benign at the Glamorgan clinic, what is the estimated probability that they will die? Find the standard error for this estimate. (c) Test the following hypotheses:
q 2j 1 q 3j 2 q 3j
X1
X
X3
X
X
X2
X
X
X1
versus versus versus
q= 2j = 3j 1 q = 3j 2 q
X1
X
X3
X
X
X2
X
X
X1
Based on the results of your tests, draw and interpret the resulting graph.
18 Loglinear Models
In this chapter we study loglinear models which are useful for modelling multivariate discrete data. There is a strong connection between loglinear models and undirected graphs. Parts of this Chapter draw on the material in Whittaker (1990). 18.1
The Loglinear Model
Let X = (X1 ; : : : ; Xm) be a random vector with probability function f (x) = P(X
= x) = P(X1 = x1; : : : ; Xm = xm )
where x = (x1; : : : ; xm ). Let rj be the number of values that Xj takes. Without loss of generality, we can assume that Xj 2 f0; 1; : : : ; rj 1g. Suppose now that we have n such random vectors. We can think of the data as a sample from a Multinomial with N = r1 r2 rm categories. The data can be represented as counts in a r1 r2 rm table. Let p = (p1; : : : ; pN ) denote the multinomial parameter.
This is page 309 Printer: Opaque this
310
18.
Loglinear Models
Let S = f1; : : : ; mg. Given a vector x = (x1 ; : : : ; xm ) and a subset A S , let xA = (xj : j 2 A). For example, if A = f1; 3g then xA = (x1; x3 ). Theorem 18.1 The joint probability function f (x) of a single random vector X = (X1 ; : : : ; Xm ) can be written as X log f (x) = (18.1) A (x) AS
where the sum is over all subsets A of S 's satisfy the following conditions:
= f1; : : : ; mg and the
1. ; (x) is a constant; 2. For every A S , the rest of the x0j s.
A
(x) is only a function of xA and not
3. If i 2 A and xi = 0, then
(x) = 0. The formula in equation (18.1) is called the log-linear expansion of f . Note that this is the probability function for a single draw. Each A (x) will depend on some unknown parameters A. Let = ( A : A S ) be the set of all these parameters. We will write f (x) = f (x; ) when we want to estimate the dependence on the unknown parameters . In terms of the multinomial, the parameter space is
P=
n
A
p = (p1 ; : : : ; pN ) : pj
0;
N X j =1
pj
o
=1
:
This is an N 1 dimensional space. In the log-linear represenation, the parameter space is n o = = ( 1 ; : : : ; N ) : = (p); p 2 P where (p) is the set of values associated with p. The set is a N 1 dimensinal surface in R N . We can always go back and forth betwee the two parameterizations we can write = (p) and p = p( ).
18.1 The Loglinear Model
Example 18.2 Let X
311
Bernoulli(p) where 0 < p < 1. We can
write the probability mass function for X as f (x) = px (1
p)1
x
= px1 p12
for x = 0; 1, where p1 = p and p2 = 1
x
p. Hence,
log f (x) = ; (x) + 1 (x) where ; (x)
= log(p2) p1 1 (x) = x log p : 2
Notice that ; (x) is a constant (as a function of x) and 1 (x) = 0 when x = 0. Thus the three conditions of the Theorem hold. The loglinear parameters are 0 = log(p2 ); 1 = log
p1 : p2
The original, multinomial parameter space is P = f(p1 ; p2 ) pj 0; p1 + p2 = 1g. The log-linear parameter space is
= f( 0 ; 1) 2 R 2 :
e 0 + 1 + e 0
:
= 1:g
Given (p1 ; p2 ) we can solve for ( 0 ; 1 ). Conversely, given ( 0 ; 1 ) we can solve for (p1 ; p2 ). Example 18.3 Let X
= (X1; X2) where X1 2 f0; 1g and X2 2
f0; 1; 2g. The joint distribution of n such random vectors is a
multinomial with 6 categories. The multinomial parameters can be written as a 2-by-3 table as follows: multinomial x2 0 1 2 x1 0 p00 p01 p02 1 p10 p11 p12
312
18.
Loglinear Models
The n data vectors can be summarized as counts: data x2 0 1 2 x1 0 C00 C01 C02 1 C10 C11 C12 For x = (x1 ; x2 ), the log-linear expansion takes the form
log f (x) = ; (x) + 1 (x) + 2 (x) +
12 (x)
where ; (x)
= log p00 p10 1 (x) = x1 log p 2 (x) 12 (x)
00
= I (x2 = 1) log
p01 p00
= I (x1 = 1; x2 = 1) log
+ I (x2 = 2) log
p11 p00 p01 p10
p02 p00
+ I (x1 = 1; x2 = 2) log
Convince yourself that the three conditions on the 's of the theorem are satis ed. The six parameters of this model are: 1 = log p00 4 = log
p02 p00
2
= log
5
= log
p10 p00
p11 p00 p01 p10
3 = log 6 = log
p01 p00
p12 p00 p02 p10
:
The next Theorem gives an easy way to check for conditional independence in a loglinear model. Theorem 18.4 Let (Xa ; Xb ; Xc) be a partition of a vectors (X1 ; : : : ; Xm ). Then Xb q Xc jXa if and only if all the -terms in the log-linear
expansion that have at least one coordinate in b and one coordinate in c are 0.
To prove this Theorem, we will use the following Lemma whose proof follows easily from the de nition of conditional independence.
p12 p00 : p02 p10
18.2 Graphical Log-Linear Models
313
Lemma 18.5 A partition (Xa ; Xb ; Xc) satis es Xb q XcjXa if and
only if f (xa ; xb ; xc ) = g (xa ; xb )h(xa ; xc ) for some functions g and h
(Theorem 18.4.) Suppose that t is S 0 whenever tShas coordinates in b and c. Hence, t is 0 if t 6 a b or t 6 a c. Therefore X X X log f (x) = S t (x) + S t (x) t (x): Proof.
ta
b
ta
ta
c
Exponentiating, we see that the joint density is of the form g (xa ; xb )h(xa ; xc ). By Lemma 18.5, Xb q Xc jXa . The converse follows by reversing the argument. 18.2
Graphical Log-Linear Models
A log-linear model is graphical if missing terms correspond only to conditional independence constraints. De nition 18.6 Let log f (x) =
P
AS
A
(x) be a log-linear
model. Then f is graphical if all -terms are non-zero except for any pair of coordinates not in the edge set for some graph G . In other words, A (x) = 0 if and only if fi; j g A and (i; j ) is not an edge.
Here is way to think about the de nition above: If you can add a term to the model and the graph does not change, then the model is not graphical. Example 18.7 Consider the graph in Figure 18.1.
is
The graphical log-linear model that corresponds to this graph
log f (x) = +
; + 1 (x) + 2 (x) + 3 (x) + 4 (x) + 5 (x) 12 (x) + 23 (x) + 25 (x) + 34 (x) + 35 (x) + 45 (x) + 235 (x) + 345 (x):
314
18.
Loglinear Models
X5
X4
b
b
b
X1
X2
X3
b
b
FIGURE 18.1. Graph for Example 18.7.
Let's see why this model is graphical. The edge (1; 5) is missing in the graph. Hence any term containing that pair of indices is omitted from the model. For example, 15 ;
125 ;
135 ;
145 ;
1235 ;
1245 ;
1345 ;
12345
are all omitted. Similarly, the edge (2; 4) is missing and hence 24 ;
124 ;
234 ;
245 ;
1234 ;
1245 ;
2345 ;
12345
are all omitted. There are other missing edges as well. You can check that the model omits all the corresponding terms. Now consider the model
log f (x) = +
; (x) + 1 (x) + 2 (x) + 3 (x) + 4 (x) + 5 (x) 12 (x) + 23 (x) + 25 (x) + 34 (x) + 35 (x) + 45 (x):
This is the same model except that the three way interactions were removed. If we draw a graph for this model, we will get the same graph. For example, no terms contain (1; 5) so we omit the edge between X1 and X5 . But this is not graphical since it has extra terms omitted. The independencies and graphs for the two models are the same but the latter model has other constraints besides conditional independence constraints. This is not a bad
18.3 Hierarchical Log-Linear Models
315
thing. It just means that if we are only concerned about presence or absence of conditional independences, then we need not consider such a model. The presence of the three-way interaction 235 means that the strength of association between X2 and X3 varies as a function of X5 . Its absence indicates that this is not so. 18.3
Hierarchical Log-Linear Models
There is a set of log-linear models that is larger than the set of graphical models and that are used quite a bit. These are the hierarchical log-linear models. De nition 18.8 A log-linear model is hierarchical if
0 and a t implies that
t
= 0.
a
=
Lemma 18.9 A graphical model is hierarchical but the reverse
need not be true.
Example 18.10 Let
log f (x) = ; (x) + 1(x) + 2 (x) + 3 (x) +
12 (x) + 13 (x):
The model is hierarchical; its graph is given in Figure 18.2. The model is graphical because all terms involving (1,3) are omitted. It is also hierarchical. Example 18.11 Let
log f (x) = ; (x)+ 1 (x)+ 2 (x)+ 3 (x)+ 12 (x)+ 13 (x)+ 23 (x): The model is hierarchical. It is not graphical. The graph corresponding to this model is complete; see Figure 18.3. It is not graphical because 123 (x) = 0 which does not correspond to any pairwise conditional independence.
316
18.
Loglinear Models
b
b
b
X1
X2
X3
FIGURE 18.2. Graph for Example 18.10.
X1
X2
b
b
b
X3 FIGURE 18.3. The graph is complete. The model is hierarchical but not graphical.
18.4 Model Generators
b
b
b
X1
X2
X3
317
FIGURE 18.4. The model for this graph is not hierarchical.
Example 18.12 Let
log f (x) = ; (x) + 3 (x) +
12 (x):
The graph corresponding is in Figure 18.4. This model is not hierarchical since 2 = 0 but 12 is not. Since it is not hierarchical, it is not graphical either.
18.4
Model Generators
Hierarchical models can be written succinctly using generators. This is most easily explained by example. Suppose that X = (X1 ; X2 ; X3 ). Then, M = 1:2 + 1:3 stands for log f =
; + 1 + 2 + 3 + 12 + 13 :
The formula M = 1:2+1:3 says: \include 12 and 13 ." We have to also include the lower order terms or it won't be hierarchical. The generator M = 1:2:3 is the saturated model log f =
; + 1 + 2 + 3 + 12 + 13 + 23 + 123 :
The saturated models corresponds to tting an unconstrained multinomial. Consider M = 1 + 2 + 3 which means log f =
; + 1 + 2 + 3:
318
18.
Loglinear Models
1.2
1+2 1
2
; FIGURE 18.5. The lattice with two variables.
This is the mutual independence model. Finally, consider M = 1:2 which has log-linear expansion log f =
; + 1 + 2 + 12 :
This model makes X3 jX2 = x2 ; X1 = x1 a uniform distribution. 18.5
Lattices
Hierarchical models can be organized into something called a lattice. This is the set of all hierarchical models partially ordered by inclusion. The set of all hierarchical models for two variables can be illustrated as in Figure 18.5. M = 1:2 is the saturated model, M = 1 + 2 is the independence model, M = 1 is independence plus X2jX1 is uniform, M = 2 is independence plus X1 jX2 is uniform, M = 0 is the uniform distribution. The lattice of trivariate models is shown in gure 18.6. 18.6
Fitting Log-Linear Models to Data
Let denote all the parameters in a log-linear model M . The loglikelihood for is `( ) =
X j
xj log pj ( )
18.6 Fitting Log-Linear Models to Data
saturated (graphical)
1.2.3
two-way
1.2 + 1.3 + 2.3
319
graphical
1.2 + 1.3
1.2 + 2.3
1.3 + 2.3
graphical
1.2 + 3
1.3 + 2
2.3 + 1
mutual independence (graphical) conditional uniform independence uniform
1+2+3 1.2
1.3
2.3
1+2
1+3
2+3
1
2
3
;
FIGURE 18.6. The lattice of models for three variables.
320
18.
Loglinear Models
where the sum is over the cells and p( ) denotes the cell probabilities corresponding to . The mle b generally has to be found numerically. The model with all possible -terms is called the saturated model. We can also t any sub-model which corresponds to setting some subset of terms to 0. De nition 18.13 For any submodel M , de ne the deviance
dev(M ) by
dev(M ) = 2(`bsat
`bM )
where `bsat is the log-likelihood of the saturated model evaluated at the mle and `bM is the log-likelihood of the model M evaluated at its mle . Theorem
for
18.14
The deviance is the likelihood ratio test statistic
H0 : the
model is M versus H1 : the model is not M: d Under H0 , dev (M ) ! 2 with degrees of freedom equal to the
dierence in the number of parameters between the saturated model and M .
One way to nd a good model is to use the deviance to test every sub-model. Every model that is not rejected by this test is then considered a plausible model. However, this is not a good strategy for two reasons. First, we will end up doing many tests which means that there is ample opportunity for making Type I and Type II errors. Second, we will end up using models where we failed to reject H0. But we might fail to reject H0 due to low power. The result is that we end up with a bad model just due to low power. There are many model searching strategies. A common approach is to use some form of penalized likelihood. One version of penalized is the AIC that we used in regression. For any model
18.6 Fitting Log-Linear Models to Data
M
de ne
321
AIC(M ) = 2 jM j (18.2) where jM j is the number of parameters. Here is the explanation of where AIC comes from. Consider a set of models fM1; M2 ; : : :g. Let fbj (x) denote the estimated probability function obtained by using the maximum likelihood estimator ofr mode Mj . Thus, fbj (x) = fb(x; bj ) where bj is the mle of the set of parameters j for model Mj . We will use the loss function D(f; fb) where X f (x) f (x) log D(f; g ) = g (x) x `b(M )
is the Kullback-Leibler distance between two probability functions. The corresponding risk function is R(f;P fb) = E (D(f; fb). Notice that D(f; fb) = c A(f; fb) where c = x f (x) log f (x) deos not depend on fb and A(f; fb) =
X x
f (x) log fb(x):
Thus, minimizing the risk is equivalent to maximizing a(f; fb) E (A(f; fb)). It is tempting to estimate a(f; fb) by Px fb(x) log fb(x) but, just as the training error in regression is a highly P bbiased estimate of prediction risk, it is also the case that x f (x) log fb(x) is a highly biased estimate of a(f; fb). In fact, the bias is approximately equal to jMj j. Thus: Theorem 18.15 AIC (Mj ) is an approximately unbiased estimate of a(f; fb). After nding a \best model" this way we can draw the corresponding graph. We can also check the overall t of the selected model using the deviance as described above.
322
18.
Loglinear Models
Example 18.16 Data on breast cancer from Morrison et al (1973)
were presented in homework question 4 of Chapter 17. The data are on diagnostic center (X1 ), nuclear grade (X2 ), and survival (X3 ): (Morrison et al 1973): X2 malignant malignant benign benign X3 died survived died survived X1 Boston 35 59 47 112 Glamorgan 42 77 26 76 The saturated log-linear model is:
CoeÆcient Standard Error Wald Statistic p-value (Intercept) 3.56 0.17 21.03 center 0.18 0.22 0.79 grade 0.29 0.22 1.32 survival 0.52 0.21 2.44 centergrade -0.77 0.33 -2.31 centersurvival 0.08 0.28 0.29 gradesurvival 0.34 0.27 1.25 centergradesurvival 0.12 0.40 0.29 The best sub-model, selected using AIC and backward searching is
0.00 *** 0.42 0.18 0.01 * 0.02 * 0.76 0.20 0.76
CoeÆcient Standard Error Wald Statistic p-value (Intercept) 3.52 0.13 25.62 < 0.00 *** center 0.23 0.13 1.70 0.08 grade 0.26 0.18 1.43 0.15 survival 0.56 0.14 3.98 6.65e-05 *** centergrade -0.67 0.18 -3.62 0.00 *** gradesurvival 0.37 0.19 1.90 0.05 The graph for this model M is shown in Figure 18.7. To test the t of this model, we compute the deviance of M which is 0.6. The appropriate 2 has 8 6 = 2 degrees of freedom. The p-value is P(22 > :6) = :74. So we have no evidence to suggest that the model is a poor t.
18.6 Fitting Log-Linear Models to Data
Center
Grade
323
Survival
FIGURE 18.7. The graph for Example 18.16.
The example above is a toy example. With so few variables, there is little to be gained by searching for a best model. The overall multinomial mle would be suÆcient. More importantly, we are probably interested in predicting survivial from grade and center in which case this should really be treated as a regresion problem with outcome Y = survival and covariates center and grade. Example 18.17 Here is a synthetic example. We generate n = 100 random vectors X = (X1; : : : ; X5) of length 5. We generated the data as follows:
X1 Bernoulli(1=2)
and
Xj jX1 ; : : : ; Xj 1
It follows that f (x1 ; : : : ; x5 )
= =
1=4 if Xj 1 = 0 3=4 if Xj 1 = 1:
x1 1 x1 x2 1 x2 x3 1 x3 x4 1 x4 3 3 3 3 3 3 3 1 3
2
4
2
4
4
4
4
x1+x2+x3+x4 4 x1 x2 x3 x4 1 3 3
4
4
:
We estimated f using three methods: (i) maximum likelihood treating this as a multinomial with 32 categories, (ii) maximum likelihood from the best loglinear model using AIC and forward selection and (iii) maximum likelihood from the best loglinear
4
4
4
324
18.
Loglinear Models
model using BIC and forward selection. We estimated the risk by simulating the example 100 times. The average risks were: Method Risk MLE 0.63 AIC 0.54 BIC 0.53 In this example, there is little dierence between AIC and BIC. Both are better than maximum likelihood. 18.7
Bibliographic Remarks
A classic references on loglinear models is Bishop, Fienberg and Holland (1975). See also Whittaker (1990) from which some of the exercises are borrowed. 18.8
Exercises
1. Solve for the p0ij s in terms of the 's in Example 18.3. 2. Repeat example 18.17 using 7 covariates and n = 1000. To avoid numerical problems, replace any zero count with a one. 3. Prove Lemma 18.5. 4. Prove Lemma 18.9. 5. Consider random variables (X1; X2; X3 ; X4). Suppose the log-density is log f (x) = ; (x) + 12 (x) + 13 (x) + 24 (x) + 34 (x): (a) Draw the graph G for these variables. (b) Write down all independence and conditional independence relations implied by the graph.
18.8 Exercises
325
(c) Is this model graphical? Is it hierarchical? 6. Suppose that parameters p(x1; x2 ; x3) are proportional to the following values: x1
0 0 0 1 0 2 8 1 16 128
x2 x3
1 1 0 1 4 16 32 256
Find the -terms for the log-linear expansion. Comment on the model. 7. Let X1; : : : ; X4 be binary. Draw the independence graphs corresponding to the following log-linear models. Also, identify whether each is graphical and/or hierarchical (or neither). (a) log f = 7 + 11x1 + 2x2 + 1:5x3 + 17x4 (b) log f = 7+11x1 +2x2 +1:5x3 +17x4 +12x2 x3 +78x2 x4 + 3x3 x4 + 32x2x3 x4 (c) log f = 7+11x1 +2x2 +1:5x3 +17x4 +12x2x3 +3x3x4 + x1 x4 + 2x1 x2 (d) log f = 7 + 5055x1x2 x3 x4
This is page 327 Printer: Opaque this
19
Causal Inference
In this Chapter we discuss causation. Roughly speaking, the statement \X causes Y " means that changing the value of X will change the distribution of Y . When X causes Y , X and Y will be associated but the reverse is not, in general, true. Association does not necessarily imply causation. We will consider two frameworks for discussing causation. The rst uses the notation of counterfactual random variables. The second, presented in the next Chapter, uses directed acyclic graphs.
19.1
The Counterfactual Model
Suppose that X is a binary treatment variable where X = 1 means \treated" and where X = 0 means \not treated." We are using the word \treatment" in a very broad sense: treatment might refer to a medication or something like smoking. An alternative to \treated/not treated" is \exposed/not exposed" but we shall use the former.
328
19. Causal Inference
Let Y be some outcome variable such as presence or absence of disease. To distinguish the statement \X is associated Y " from the statement \X causes Y " we need to enrich our probabilistic vocabulary. We will decompose the response Y into a more negrained object. We introduce two new random variables (C0 ; C1 ), called potential outcomes with the following interpretation: C0 is the outcome if the subject is not treated (X = 0) and C1 is the outcome if the subject is treated (X = 1). Hence,
Y=
C0 if X = 0 C1 if X = 1:
We can express the relationship between Y and (C0 ; C1 ) more succintly by Y = CX : (19.1) This equation is called the consistency relationship. Here is a toy data set to make the idea clear:
X Y C0 C1
0 0 0 0 1 1 1 1
4 7 2 8 3 5 8 9
4 7 2 8 * * * *
* * * * 3 5 8 9
The asterisks denote unobserved values. When X = 0 we don't observe C1 in which case we say that C1 is a counterfactual since it is the outcome you would have had if, counter to the fact, you had been treated (X = 1). Similarly, when X = 1 we don't observe C0 and we say that C0 is counterfactual. Notice that there are four types of subjects:
19.1 The Counterfactual Model
329
Type C0 C1 Survivors 1 1 Responders 0 1 Anti-repsonders 1 0 Doomed 0 0 Think of the potential outcomes (C0 ; C1 ) as hidden variables that contain all the relavent information about the subject. De ne the average causal eect or average treatment eect to be = E (C1 ) E (C0 ): (19.2) The parameter has the following interpretation: is the mean if everyone were treated (X = 1) minus the mean if everyone were not treated (X = 0). There are other ways of measuring the causal eect. For example, if C0 and C1 are binary, we de ne the causal odds ratio P(C1 = 1) P(C1 = 0)
1) PP((CC0 == 0) 0
and the causal relative risk P(C1 = 1) : P(C0 = 1)
The main ideas will be the same whatever causal eect we use. For simplicity, we shall work with the average causal eect . De ne the association to be
= E (Y jX = 1)
E (Y
jX = 0):
(19.3)
Again, we could use odds ratios or other summaries if we wish.
19.1 (Association is not equal to Causation) In general, 6= . Theorem
Example 19.2
Suppose the whole population is as follows:
330
19. Causal Inference
X Y C0 C1 0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
Again, the asterisks denote unobserved values. Notice that C0 = C1 for every subject, thus, this treatment has no eect. Indeed, 8 8 1X 1X = E (C1 ) E (C0 ) = C C 8 i=1 1i 8 i=1 0i 1+1+1+1+0+0+0+0 1+1+1+1+0+0+0+0 = 8 8 = 0:
Thus the average causal eect is 0. The observed data are X 's and Y 's only from which we can estimate the association:
jX = 1) E (Y jX = 0) 1+1+1+1 0+0+0+0 = 1: = 4 4
=
E (Y
Hence, 6= . To add some intutition to this example, imagine that the outcome variable is 1 if \healthy" and 0 if \sick". Suppose that X = 0 means that the subject does not take vitamin C and that X = 1 means that the subject does take vitamin C. Vitamin C has no causal eect since C0 = C1 . In this example there are two types of people: healthy people (C0 ; C1 ) = (1; 1) and unhealthy people (C0 ; C1 ) = (0; 0). Healthy people tend to take vitamin C while unhealty people don't. It is this association between (C0 ; C1 ) and X that creates an association between X and Y . If we only had data on X and Y we would conclude that X
19.1 The Counterfactual Model
331
and Y are associated. Suppose we wrongly interpret this causally and conclude that vitamin C prevents illness. Next we might encourage everyone to take vitamin C. If most people comply the population will look something like this:
X Y C0 C1
0 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
Now = (4=7) (0=1) = 4=7. We see that went down from 1 to 4/7. Of course, the causal eect never changed but the naive observer who does not distinguish association and causation will be confused because his advice seems to have made things worse instead of better. In the last example, = 0 and = 1. It is not hard to create examples in which > 0 and yet < 0. The fact that the association and causal eects can have dierent signs is very confusing to many people including well educated statisticians. The example makes it clear that, in general, we cannot use the association to estimate the causal eect . The reason that 6= is that (C0 ; C1 ) was not independent of X . That is, treatment assigment was not independent of person type. Can we ever estimate the causal eect? The answer is: sometimes. In particular, random asigment to treatment makes it possible to estimate .
19.3 Suppose we randomly assign subjects to treatment and that P(X = 0) > 0 and P(X = 1) > 0. Then = . Hence, any consistent estimator of is a consistent estimator Theorem
332
19. Causal Inference
of . In particular, a consistent estimator is
b =
Eb (Y
= Y1
jX = 1) Eb (Y jX = 0) Y0
is a consistent estimator of , where
Y1 =
1
X
n1 i:X =1
Yi ;
Y0 =
i
n1 =
1
X
Y; n0 i:X =0 i i
Pn X and n = Pn (1 X ). 0 i i=1 i i=1
Since X is randomly assigned, X is independent of (C0 ; C1 ). Hence, Proof.
=
E (C1 )
E (C0 )
= E (C1 jX = 1) E (C0 jX = 0) = E (Y jX = 1) E (Y jX = 0) = :
since X q (C0 ; C1 ) since Y = CX
The consistency follows from the law of large numbers. If Z is a covariate, we de ne the conditional causal eect by
z = E (C1 jZ = z )
E (C0
jZ = z):
For example, if Z denotes gender with values Z = 0 (women) and Z = 1 (men), then 0 is the causal eect among women and 1 is the causal eect among men. In a randomized experiment, z = E (Y jX = 1; Z = z ) E (Y jX = 0; Z = z ) and we can estimate the conditional causal eect using appropriate sample averages.
19.2 Beyond Binary Treatments
333
Summary
Random variables: (C0 ; C1 ; X; Y ). Consistency relationship: Y = CX . Causal Eect: = E (C1 ) E (C0 ). Association: = E (Y jX = 1) E (Y jX = 0). Random assigment =) (C0 ; C1 ) q X =) = . 19.2
Beyond Binary Treatments
Let us now generalize beyond the binary case. Suppose that X 2 X . For example, X could be the dose of a drug in which case X 2 R . The counterfactual vector (C0 ; C1 ) now becomes the counterfactual process
fC (x) : x 2 Xg
(19.4)
where C (x) is the outcome a subject would have if he received dose x. The observed reponse is given by the consistency relation
Y
C (X ):
(19.5)
See Figure 19.1. The causal regression function is
(x) = E (C (x)):
(19.6)
The regression function, which measures association, is r(x) = E (Y jX = x).
19.4 In general, (x) 6= r(x). However, when X is randomly assigned, (x) = r(x). Theorem
An example in which (x) is constant but r(x) is not constant is shown in Figure 19.2. The gure shows the counterfactuals processes for four subjects. The dots represent Example 19.5
334
19. Causal Inference
C (x)
7 6 5
Y = C (X ) 4 3 2 1 0
0
1
2
3
4
5
6
X
7
8
9
10
FIGURE 19.1. A counterfactual process C (x). The outcome Y is the value of the curve C (x) evaluated at the observed dose X .
11
x
19.3 Observational Studies and Confounding
335
their X values X1 ; X2 ; X3 ; X4 . Since Ci(x) is constant over x for all i, there is no causal eect and hence C (x) + C2 (x) + C3 (x) + C4 (x) (x) = 1 4 is constant. Changing the dose x will not change anyone's outcome. The four dots in the lower plot represent the observed data points Y1 = C1 (X1 ); Y2 = C2 (X2 ); Y3 = C3 (X3 ); Y4 = C4 (X4 ). The dotted line represents the regression r(x) = E (Y jX = x). Although there is no causal eect, there is an association since the regression curve r(x) is not constant. 19.3
Observational Studies and Confounding
A study in which treatment (or exposure) is not randomly assigned, is called an observational study. In these studies, subjects select their own value of the exposure X . Many of the health studies your read about in the newspaper are like this. As we saw, association and causation could in general be quite dierent. This descrepancy occurs in non-randomized studies because the potential outcome C is not independent of treatment X . However, suppose we could nd groupings of subjects such that, within groups, X and fC (x) : x 2 Xg are independent. This would happen if the subjects are very similar within groups. For example, suppose we nd people who are very similar in age, gender, educational background and ethnic background. Among these people we might feel it is reasonable to assume that the choice of X is essentially random. These other variables are called confounding variables. If we denote these other variables collectively as Z , then we can express this idea by saying that fC (x) : x 2 Xg q X jZ: (19.7) Equation (19.7) means that, within groups of Z , the choice of treatment X does not depend on type, as represented by fC (x) :
A more precise definition of confounding is given in the next Chapter.
336
19. Causal Inference
3
b
C1 (x)
b
C2 (x)
2
b
C3 (x)
1
b
C4 (x)
0
0
1
2
3
4 b
3
b
2
Y3
b
1
Y4
b
0
0
1
2
3
5
x
Y1 Y2 (x)
4
5
FIGURE 19.2. The top plot shows the counterfactual process C (x) for four subjects. The dots represent their X values. Since Ci (x) is constant over x for all i, there is no causal eect. Changing the dose will not change anyone's outcome. The lower plot shows the causal regression function (x) = (C1 (x) + C2 (x) + C3 (x) + C4 (x))=4. The four dots represent the observed data points Y1 = C1 (X1 ); Y2 = C2 (X2 ); Y3 = C3 (X3 ); Y4 = C4 (X4 ). The dotted line represents the regression r(x) = E (Y jX = x). There is no causal eect since Ci (x) is constant for all i. But there is an association since the regression curve r(x) is not constant.
19.3 Observational Studies and Confounding
337
x 2 Xg. If (19.7) holds and we observe Z then we say that there is no unmeasured confounding. Theorem
19.6 Suppose that (19.7) holds. Then, (x) =
Z
E (Y
jX = x; Z = z)dFZ (z)dz:
(19.8)
If rb(x; z ) is a consistent estimate of the regression function E (Y jX = x; Z = z ), then a consistent estimate of (x) is n b(x) = 1 X rb(x; Zi): n i=1 In particular, if r(x; z ) = 0 + 1 x + 2 z is linear, then a consistent estimate of (x) is
b(x) = b0 + b1 x + b2 Z n
(19.9)
where ( b0 ; b1 ; b2 ) are the least squares estimators. It is useful to compare equation R(19.8) to E (Y jX = x) which can be written as E (Y jX = x) = E (Y jX = x; Z = z )dFZ jX (z jx). Remark 19.7
Epidemiologists call (19.8) the adjusted treatment eect. The process of computing adjusted treatment eects is called adjusting (or controlling) for confounding. The selection of what confounders Z to measure and control for requires scienti c insight. Even after adjusting for confounders, we cannot be sure that there are not other confounding variables that we missed. This is why observational studies must be treated with healthy skepticism. Results from observational studies start to become believable when: (i) the results are replicated in many studies, (ii) each of the studies controlled for plausible confounding variables, (iii) there is a plausible scienti c explanation for the existence of a causal relationship.
338
19. Causal Inference
A good example is smoking and cancer. Numerous studies have shown a relationship between smoking and cancer even after adjusting for many confouding variables. Moreover, in laboratory studies, smoking has been shown to damage lung cells. Finally, a causal link between smoking and cancer has been found in randomized animal studies. It is this collection of evidence over many years that makes this a convincing case. One single observational study is not, by itself, strong evidence. Remember that when you read the newspaper.
19.4
Simpson's Paradox
Simpson's paradox is a puzzling phenomenon that is dicussed in most statistics texts. Unfortunately, it is explained incorrectly in most statistics texts. The reason is that it is nearly impossible to explain the paradox without using counterfactuals (or directed acyclic graphs). Let X be a binary treatment variable, Y a binary outcome and Z a third binary variable such as gender. Suppose the joint distribution of X; Y; Z is
X=1 X=0
Y =1 Y =0 .1500 .2250 .0375 .0875 Z = 1 (men)
Y =1
Y =0
.1000 .0250 .2625 .1125 Z = 0 (women)
The marginal distribution for (X; Y ) is
19.4 Simpson's Paradox
X=1 X=0
339
Y =1 Y =0 .25 .30 .55
.25 .50 .20 .50 .45 1
Now,
j
P(Y = 1 X = 1)
j
P(Y = 1 X = 0) =
P(Y = 1; X = 1) P(X = 1)
:25 :30 :50 :50 = 0:1:
P(Y = 1; X = 0) P(X = 0)
=
We might naively interpret this to mean that the treatment is bad for you since P(Y = 1jX = 1) < P(Y = 1jX = 0). Furthermore, among men,
j
P(Y = 1 X = 1; Z = 1)
j
P(Y = 1 X = 0; Z = 1) P(Y = 1; X = 1; Z = 1) = P(X = 1; Z = 1)
=
:15
:3750 = 0:1:
:0375 :1250
P(Y = 1; X = 0; Z = 1) P(X = 0; Z = 1)
Among women,
j
P(Y = 1 X = 1; Z = 0)
j
P(Y = 1 X = 0; Z = 0) P(Y = 1; X = 1; Z = 0) = P(X = 1; Z = 0)
=
:1
:1250 = 0:1:
:2625 :3750
P(Y = 1; X = 0; Z = 0) P(X = 0; Z = 0)
To summarize, we seem to have the following information:
340
19. Causal Inference
Mathematical Statement P(Y = 1jX = 1) < P(Y = 1jX = 0) P(Y = 1jX = 1; Z = 1) > P(Y = 1jX = 0; Z = 1) P(Y = 1jX = 1; Z = 0) > P(Y = 1jX = 0; Z = 0)
English Statement? treatment is harmful treatment is bene cial to men treatment is bene cial to women
Clearly, something is amiss. There can't be a treatment which is good for men, good for women but bad overall. This is nonsense. The problem is with the set of English statements in the table. Our translation from math into english is specious.
The inequality P(Y = 1jX = 1) < P(Y = 1jX = 0) does not mean that treatment is harmful.
The phrase \treatment is harmful" should be written mathematically as P(C1 = 1) < P(C0 = 1). The phrase \treatment is harmful for men" should be written P(C1 = 1jZ = 1) < P(C0 = 1jZ = 1). The three mathematical statements in the table are not at all contradictory. It is only the translation into English that is wrong. Let us now show that a real Simpson's paradox cannot happen, that is, there cannot be a treatment that is bene cial for men and women but harmful overall. Suppose that treatment is bene cial for both sexes. Then
j
P(C1 = 1 Z = z )
> P(C0 = 1jZ = z )
for all z . It then follows that P(C1 = 1) =
X z X
j
P(C1 = 1 Z = z )P(Z = z )
P(C0 = 1jZ = z )P(Z = z ) z = P(C0 = 1):
>
Hence, P(C1 = 1) > P(C0 = 1) so treatment is bene cial overall. No paradox.
19.5 Bibliographic Remarks 19.5
341
Bibliographic Remarks
The use of potential outcomes to clarify causation is due mainly to Jerzy Neyman and Don Rubin. Later developments are due to Jamie Robins, Paul Rosenbaum and others. A parallel development took place in econometrics by various people including Jim Heckman and Charles Manski. Currently, there are no textbook treatments of causal inference from the potential outcomes viewpoint. 19.6
Exercises
1. Create an example like Example 19.2 in which > 0 and < 0. 2. Prove Theorem 19.4. 3. Suppose you are given data (X1 ; Y1 ); : : : ; (Xn; Yn) from an observational study, where Xi 2 f0; 1g and Yi 2 f0; 1g. Although it is not possible to estimate the causal eect , it is possible to put bounds on . Find upper and lower bounds on that can be consistently estimated from the data. Show that the bounds have width 1. Hint: Note that E (C1 ) = E (C1 jX = 1)P(X = 1) + E (C1 jX = 0)P(X = 0). 4. Suppose that X 2 R and that, for each subject i, Ci (x) = 1i x. Each subject has their own slope 1i . Construct a joint distribution on ( 1 ; X ) such that P( 1 > 0) = 1 but E (Y jX = x) is a decreasing function of x, where Y = C (X ). Interpret. Hint: Write f ( 1 ; x) = f ( 1 )f (xj 1 ). Choose f (xj 1 ) so that when 1 is large, x is small and when 1 is small x is large.
This is page 343 Printer: Opaque this
20
Directed Graphs
20.1
Introduction
Directed graphs are similar to undirected graphs except that there are arrows between vertices instead of edges. Like undirected graphs, directed graphs can be used to represent independence relations. They can also be used as an alternative to counterfactuals to represent causal relationships. Some people use the phrase Bayesian network to refer to a directed graph endowed with a probability distribution. This is a poor choice of terminology. Statistical inference for directed graphs can be performed using frequentist or Bayesian methods so it is misleading to call them Bayesian networks.
20.2
DAG's
A directed graph G consists of a set of vertices V and an edge set E of ordered pairs of variables. If (X; Y ) 2 E then there is an arrow pointing from X to Y . See Figure 20.1.
344
20. Directed Graphs
Y
X
Z
FIGURE 20.1. A directed graph with E = f(Y; X ); (Y ; Z )g.
V
= fX; Y ; Z g and
If an arrow connects two variables X and Y (in either direction) we say that X and Y are adjacent. If there is an arrow from X to Y then X is a parent of Y and Y is a child of X . The set of all parents of X is denoted by or (X ). A directed path from X to Y is a set of vertices beginning with X , ending with Y such that each pair is connected by an arrow and all the arrows point in the same direction as follows: X
X
! ! Y
or X
Y
A sequence of adjacent vertices staring with X and ending with Y but ignoring the direction of the arrows is called an undirected path. The sequence fX; Y; Z g in Figure 20.1 is an undirected path. X is an ancestor of Y is there is a directed path from X to Y . We also say that Y is a descendant of X . A con guration of the form:
X
!Y
Z
is called a collider. A con guration not of that form is called a non-collider, for example,
X or
X
!Y !Z Y
!Z
20.3 Probability and DAG's
345
A directed path that starts and ends at the same variable is called a cycle. A directed graph is acyclic if it has no cycles. In this case we say that the graph is a directed acyclic graph or DAG. From now on, we only deal with graphs that are DAG's. 20.3
Probability and DAG's
Let G be a DAG with vertices V = (X1 ; : : : ; X ). k
De nition 20.1 If P is a distribution for V with probability
function p, we say that If P is represents P if
p(v ) =
Y k
Markov to
p(x
i
G or that G
j )
(20.1)
i
i=1
where are the parents of X . The set of distributions represented by G is denoted by M (G ). i
i
Example 20.2 For the DAG in Figure 20.2, P
only if its probability function p has the form
2 M (G ) if and
p(x; y; z; w) = p(x)p(y )p(z j x; y )p(w j z ):
The following theorem says that P 2 M (G ) if and only if the Markov Condition holds. Roughly speaking, the Markov Condition says that every variable W is independent of the \past" given its parents. Theorem 20.3 A distribution P
lowing
Markov Condition
2 M (G ) if and only if the fol-
holds: for every variable W ,
fj W qW
W
(20.2)
f denotes all the other variables except the parents and where W descendants of W .
346
20. Directed Graphs
X Z
W
Y FIGURE 20.2. Another DAG.
B
A
D
E
C
FIGURE 20.3. Yet another DAG.
Example 20.4 In Figure 20.2, the Markov Condition implies that
X qY
and W
q fX; Y g j Z:
Example 20.5 Consider the DAG in Figure 20.3. In this case
probability function must factor like
p(a; b; c; d; e) = p(a)p(bja)p(cja)p(djb; c)p(ejd):
The Markov Condition implies the following independence relations:
D q A j fB; C g; E q fA; B; C g j D and B q C j A
X2
20.4 More Independence Relations
347
X3
X1
X5
X4
FIGURE 20.4. And yet another DAG. 20.4
More Independence Relations
With undirected graphs, we saw that pairwise independence relations implied other independence relations. Luckily, we were able to deduce these other relations from the graph. The situation is similar for DAG's. The Markov Condition allows us to list some independence relations. These relations may logically imply other other independence relations. Consider the DAG in Figure 20.4. The Markov Condition implies:
X1 q X2 ;
X2 q fX1 ; X4 g;
X4 q fX2 ; X3 g j X1 ;
X3 q X4 j fX1 ; X2 g;
X5 q fX1 ; X2 g j fX3 ; X4 g
It turns out (but it is not obvious) that these conditions imply that
fX ; X g q X j fX ; X g: 4
5
2
1
3
How do we nd these extra independence relations? The answer is \d-separation." d-separation can be summarized by three rules. Consider the four DAG's in Figure 20.5 and the DAG in Figure 20.6. The rst 3 DAG's in Figure 20.5 are non-colliders. The DAG in the lower right of Figure 20.5 is a collider. The DAG in Figure 20.6 is a collider with a descendent.
In what follows, we implicitly assume that P is faithful to G which means that P has no extra independence relations other than those logically implied by the Markov Condition.
348
20. Directed Graphs
X
Y
Z
X
Y
Z
X
Y
Z
X
Y
Z
FIGURE 20.5. The rst three DAG's have no colliders. The fourth DAG in the lower right corner has a collider at Y .
X
Y
Z
W FIGURE 20.6. A collider with a descendent.
The rules of d-separation.
Consider the DAG's in Figures 20.5 and 20.6. Rule (1). In a non-collider, X and Z are d-connected, but they are d-separated given Y . Rule (2). If X and Z collide at Y then X and Z are d-separated but they are d-connected given Y . Rule (3). Conditioning on the descendant of a collider has the same eect as conditioning on the collider. Thus in Figure 20.6, X and Z are d-separated but they are d-connected given W . Here is a more formal de nition of d-separation. Let X and Y be distinct vertices and let W be a set of vertices not containing X or Y . Then X and Y are d-separated given W if there
20.4 More Independence Relations
X
U S1
V
W
349
Y
S2
FIGURE 20.7. d-separation explained.
exists no undirected path U between X and Y such that (i) every collider on U has a descendant in W and (ii) no other vertex on U is in W . If U; V and W are distinct sets of vertices and U and V are not empty, then U and V are d-separated given W if for every X 2 U and Y 2 V , X and Y are d-separated given W . Vertices that are not d-separated are said to be dconnected. Example 20.6 Consider the DAG in Figure 20.7. From the d-
seperation rules we conclude that: X and Y are d-separated (given the empty set) X and Y are d-connected given fS1 ; S2 g X and Y are d-separated given fS1 ; S2 ; V g
Theorem 20.7 (Spirtes, Glymour and Scheines) Let A, B
and C be disjoint sets of vertices. Then A q B j C if and only if A and B are d-separated by C . Example 20.8 The fact that conditioning on a collider creates
dependence might not seem intuitive. Here is a whimsical example from Jordan (2003) that makes this idea more palatable. Your friend appears to be late for a meeting with you. There are two explanations: she was abducted by aliens or you forgot to set your watch ahead one hour for daylight savings time. See Figure 20.8 Aliens and Watch are blocked by a collider which implies they are marginally independent. This seems reasonable since, before we know anything about your friend being late, we would expect these variables to be independent. We would also expect that P(Aliens = yesjLate = Y es) > P(Aliens = yes);
350
20. Directed Graphs
aliens
watch
late
FIGURE 20.8. Jordan's alien example. Was your friend kidnapped by aliens or did you forget to set your watch?
learning that your friend is late certainly increases the probability that she was abducted. But when we learn that you forgot to set your watch properly, we would lower the chance that your friend was abducted. Hence, P(Aliens = yesjLate = Y es) 6= P(Aliens = yesjLate = Y es; W atch = no). Thus, Aliens and Watch are dependent given Late.
Graphs that look dierent may actually imply the same independence relations. If G is a DAG, we let I (G ) denote all the independence statements implied by G . Two DAG's G1 and G2 for the same variables V are Markov equivalent if I (G1 ) = I (G2 ). Given a DAG G , let skeleton(G ) denote the undirected graph obtained by replacing the arrows with undirected edges. and G2 are Markov equivalent if and only if (i) skeleton(G1 ) = skeleton(G2 ) and (ii) G1 and G2 have the same colliders. Theorem 20.9 Two DAG's
G
1
Example 20.10 The rst three DAG's in Figure 20.5 are Markov
equivalent. The DAG in the lower right of the Figure is not Markov equivalent to the others. 20.5
Estimation for DAG's
Let G be a DAG. Assume that all the variables V = (X1 ; : : : ; X ) are discrete. The probability function can be written m
p(v ) =
Y k
p(x
i
i=1
j ): i
(20.3)
20.5 Estimation for DAG's
351
To estimate p(v ) we need to estimate p(x j ) for each i. Think of the parents of X as one discrete variable Xe with many levels. For example, suppose that X3 has parents X1 and X2 and that X1 2 f0; 1; 2g while X2 2 f0; 1g. We can regard the parents X1 and X2 as a single variable Xe3 de ned by i
i
i
Xe3 =
i
8 1 > > > > > > <2 > > > > > :
3 4 5 6
if if if if if if
Hence we can write
p(v ) =
X1 = 0; X2 = 0 X1 = 0; X2 = 1 X1 = 1; X2 = 0 X1 = 1; X2 = 1 X1 = 2; X2 = 0 X1 = 2; X2 = 1:
Y k
p(x j xe ): i
(20.4)
i
i=1
Theorem 20.11 Let V1 ; : : : ; V be iid random vectors from disn
tribution p given in (20.4). The maximum likelihood estimator of p is
pb(v ) =
Y k
pb(x j xe ) i
(20.5)
i
i=1
where
#fi : X = x and Xe = xe g : #fi : Xe = xe g Example 20.12 Let V = (X; Y; Z ) have the DAG in the top left of Figure 20.5. Given n observations (X1 ; Y1; Z1 ); : : : ; (X ; Y ; Z ), the mle of p(x; y; z ) = p(x)p(y jx)p(z jy ) is #fi : X = xg ; pb(x) =
pb(x j xe ) = i
i
i
i
i
i
i
i
n
i
n #fi : X = x and Y = y g pb(y jx) = ; #fi : Y = y g i
i
i
and
pb(z jy ) =
#fi : Y = y and Z = z g : #fi : Z = z g i
i
i
n
n
352
20. Directed Graphs
It is possible to extend these ideas to continuous random variables as well. For example, we might use some parametric model p(xj ; ) for each condiitonal density. The likelihood function is then x
x
L() =
Y n
p(V ; ) =
YY n
m
i
i=1
p(X j ; ) ij
j
j
i=1 j =1
where X is the value of X for the ith data point and are the parameters for the j th conditional density. We can then proceed using maximum likelihood. So far, we have assumed that the structure of the DAG is given. One can also try to estimate the structure of the DAG itself from the data. For example, we could t every possible DAG using maximum likelihood and use AIC (or some other method) to choose a DAG. However, there are many possible DAG's so you would need much data for such a method to be reliable. Producing a valid, accurate con dence set for the DAG structure would require astronomical sample sizes. DAG's are thus most useful for encoding conditional independence information rather than discovering it. ij
20.6
j
j
Causation Revisited
We discussed causation in Chapter 19 using the idea of counterfactual random variables. A dierent approach to causation uses DAG's. The two approaches are mathematically equivalent though they appear to be quite dierent. In Chapter 19, the extra element we added to clarify causation was the idea of a counterfactual. In the DAG approach, the extra element is the idea of intervention. Consider the DAG in Figure 20.9. The probability function for a distribution consistent with this DAG has the form p(x; y; z ) = p(x)p(y jx)p(z jx; y ). Here is pseudo-code for generating from this distribution:
20.6 Causation Revisited
X
Y
353
Z
FIGURE 20.9. Conditioning versus intervening.
for i = 1; : : : ; n : x < p (x ) y < p j (y jx ) z < p j (z jx ; y ) i
X
i
i
Y X
i
Z X;Y
i
i
i
i
i
Suppose we repeat this code many times yielding data (x1 ; y1 ; z1 ); : : : ; (x ; y ; z ). Among all the times that we observe Y = y , how often is Z = z ? The answer to this question is given by the conditional distribution of Z jY . Speci cally, n
P(Z = z
jY = y)
=
P(Y = y; Z = z ) p(y; z ) = P(Y = y ) p(y )
P
P
p(x; y; z ) p(x) p(y jx) p(z jx; y ) = = p(y ) p(y ) X p(y jx) p(x) X p(x; y ) = p(z jx; y ) = p(z jx; y ) p(y ) p(y ) x
X x
=
x
p(z jx; y ) p(xjy ):
x
x
Now suppose we intervene by changing the computer code. Speci cally, suppose we x Y at the value y . The code now looks like this:
n
n
354
20. Directed Graphs
set Y = y for i = 1; : : : ; n x < p (x ) z < p j (z jx ; y ) i
X
i
i
Z X;Y
i
i
Having set Y = y , how often was Z = z ? To answer, note that the intervention has changed the joint probability to be p (x; z ) = p(x)p(z jx; y ): The answer to our question is given by the marginal distribution X X p (z ) = p (x; z ) = p(x)p(z jx; y ): x
x
We shall denote this as P(Z = z jY := y ) or p(z jY := y ). We call P(Z = z jY = y ) conditioning by observation or passive conditioning. We call P(Z = z jY := y ) conditioning by intervention or active conditioning. Passive conditioning is used to answer a predictive question like: \Given that Joe smokes, what is the probability he will get lung cancer?"
Active conditioning is used to answer a causal question like:
\If Joe quits smoking, what is the probability he will get lung cancer?"
Consider a pair (G ; P) where G is a DAG and P is a distribution for the variables V of the DAG. Let p denote the probability function for P. Consider intervening and xing a variable X to be equal to x. We represent the intervention by doing two things: (1) Create a new DAG G by removing all arrows pointing into X ; (2) Create a new distribution p (v ) = P(V = v jX := x) by removing the term p(xj ) from p(v ). X
20.6 Causation Revisited
355
The new pair (G ; P ) represents the intervention \set X = x." Example 20.13 You may have noticed a correlation between rain
and having a wet lawn, that is, the variable \Rain" is not independent of the variable \Wet Lawn" and hence p (r; w) 6= p (r)p (w) where R denotes Rain and W denotes Wet Lawn. Consider the following two DAGS: R;W
R
W
Rain
! Wet Lawn
Rain
Wet Lawn:
The rst DAG implies that p(w; r) = p(r)p(wjr) while the second implies that p(w; r) = p(w)p(rjw) No matter what the joint distribution p(w; r) is, both graphs are correct. Both imply that R and W are not independent. But, intuitively, if we want a graph to indicate causation, the rst graph is right and the second is wrong. Throwing water on your lawn doesn't cause rain. The reason we feel the rst is correct while the second is wrong is because the interventions implied by the rst graph are correct. Look at the rst graph and form the intervention W = 1 where 1 denotes \wet lawn." Following the rules of intervention, we break the arrows into W to get the modi ed graph:
Rain
set
Wet Lawn =1
with distribution p (r) = p(r). Thus P(R = r j W := w) = P(R = r ) tells us that \wet lawn" does not cause rain. Suppose we (wrongly) assume that the second graph is the correct causal graph and form the intervention W = 1 on the second graph. There are no arrows into W that need to be broken so the intervention graph is the same as the original graph. Thus p (r) = p(rjw) which would imply that changing \wet" changes \rain." Clearly, this is nonsense. Both are correct probability graphs but only the rst is correct causally. We know the correct causal graph by using background knowledge.
356
20. Directed Graphs
Remark 20.14 We could try to learn the correct causal graph
from data but this is dangerous. In fact it is impossible with two variables. With more than two variables there are methods that can nd the causal graph under certain assumptions but they are large sample methods and, furthermore, there is no way to ever know if the sample size you have is large enough to make the methods reliable.
We can use DAG's to represent confouding variables. If X is a treatment and Y is an outcome, a confounding variable Z is a variable with arrows into both X and Y ; see Figure 20.10. It is easy to check, using the formalism of interventions, that the following facts are true. In a randomized study, the arrow betwen Z and X is broken. In this case, even with Z unobserved (represented by enclosing Z in a circle), the causal relationship between X and Y is estimable because it can be shown that E (Y jX := x) = E (Y jX = x) which does not involve the unobserved Z . In an observational study, with all confounders observed, we get R E (Y jX := x) = E (Y jX = x; Z = z )dF (z ) as in formula (19.8). If Z is unobserved then we R cannot estimate the causal eect because E (Y jX := x) = E (Y jX = x; Z = z )dF (z ) involves the unobserved Z . We can't just use X and Y since in this case. P(Y = y jX = x) 6= P(Y = y jX := x) which is just another way of saying that causation is not association. In fact, we can make a precise connection between DAG's and counterfactuals as follows. Suppose that X and Y are binary. De ne the confounding variable Z by Z
Z
8 > > < 12 Z= 3 > :
if if if 4 if
(C0 ; C1 ) = (0; 0) (C0 ; C1 ) = (0; 1) (C0 ; C1 ) = (1; 0) (C0 ; C1 ) = (1; 1):
20.7 Bibliographic Remarks
Z
X
Z
Y
X
357
Z
Y
X
FIGURE 20.10. Randomized study; Observational study with measured confounders; Observational study with unmeasured confounders. The circled variables are unobserved.
From this, you can make the correspondence between the DAG approach and the counterfactual approach explicit. I leave this for the interested reader. 20.7
Bibliographic Remarks
There are a number of texts on DAG's including Edwards (1996) and Jordan (2003). The rst use of DAG's for representing causal relationships was by Wright (1934). Modern tratments are contained in Spirtes, Glymour, and Scheines (1990) and Pearl (2000). Robins, Scheines, Spirtes and Wasserman (2003) discuss the problems with estimating causal structure from data. 20.8
Exercises
1. Consider the three DAG's in Figure 20.5 without a collider. Prove that X q Z jY . 2. Consider the DAG in Figure 20.5 with a collider. Prove that X q Z and that X and Z are dependent given Y . 3. Let X 2 f0; 1g, Y 2 f0; 1g, Z 2 f0; 1; 2g. Suppose the distribution of (X; Y; Z ) is Markov to:
X
!Y !Z
Y
358
20. Directed Graphs
Create a joint distribution p(x; y; z ) that is Markov to this DAG. Generate 1000 random vectors from this distribution. Estimate the distribution from the data using maximum likelihood. Compare the estimated distribution to the true distribution. Let = (000 ; 001 ; : : : ; 112 ) where = P(X = r; Y = s; Z = t). Use the bootstrap to get standard errors and 95 per cent con dence intervals for these 12 parameters. rst
4. Let V = (X; Y; Z ) have the following joint distribution
X Y jX=x Z j X = x; Y = y
Bernoulli
1
2 e
4x
2
Bernoulli
e2( + ) 2 Bernoulli 1 + e2( + )
1 + e
4x
x
2
y
x
y
2
:
(a) Find an expression for P(Z = z j Y = y ). In particular, nd P(Z = 1 j Y = 1).
(b) Write a program to simulate the model. Conduct a simulation and compute P(Z = 1 j Y = 1) empirically. Plot this as a function of the simulation size N . It should converge to the theoretical value you computed in (a). (c) Write down an expression for P(Z = 1 j Y := y ). In particular, nd P(Z = 1 j Y := 1).
20.8 Exercises
359
(d) Modify your program to simulate the intervention \set Y = 1." Conduct a simulation and compute P(Z = 1 j Y := 1) empirically. Plot this as a function of the simulation size N . It should converge to the theoretical value you computed in (c). 5. This is a continuous, Gaussian version of the last question. Let V = (X; Y; Z ) have the following joint distribution
X Y jX=x Z j X = x; Y = y
Normal (0; 1) Normal (x; 1) Normal ( y + x; 1):
Here, ; and are xed parameters. Economists refer to models like this as structural equation models. (a) Find R an explicit expression for f (z j y) and E (Z j Y = y ) = zf (z j y )dz . (b) Find an explicit expression for f (z j Y := y ) and then R nd E (Z j Y := y ) zf (z j Y := y )dy . Compare to (b). (c) Find the joint distribution of (Y; Z ). Find the correlation between Y and Z . (d) Suppose that X is not observed and we try to make causal conclusions from the marginal distribution of (Y; Z ). (Think of X as unobserved confounding variables.) In particular, suppose we declare that Y causes Z if 6= 0 and
360
20. Directed Graphs
we declare that Y does not cause Z if = 0. Show that this will lead to erroneous conclusions. (e) Suppose we conduct a randomized experiment in which Y is randomly assigned. To be concrete, suppose that
X Y Z j X = x; Y = y
Normal(0; 1) Normal(; 1) Normal( y + x; 1):
Show that the method in (d) now yields correct conclusions i.e. = 0 if and only if f (z j Y := y ) does not depend on y .
This is page 359 Printer: Opaque this
21
Nonparametric Curve Estimation
In this Chapter we discuss nonparametric estimation of probability density functions and regression functions which we refer to as curve estimation. In Chapter 8 we saw that it is possible to consistently estimate a cumulative distribution function F without making any assumptions about F . If we want to estimate a probability density function f (x) or a regression function r(x) = E (Y jX = x) the situation is dierent. We cannot estimate these functions consistently without making some smoothness assumptions. Correspondingly, we will perform some sort of smoothing operation on the data. A simple example of a density estimator is a histogram, which we discuss in detail in Section 21.2. To form a histogram estimator of a density f , we divide the real line to disjoint sets called bins. The histogram estimator is piecewise constant function where the height of the function is proportional to number of observations in each bin; see Figure 21.3. The number of bins is an example of a smoothing parameter. If we smooth too
360
21. Nonparametric Curve Estimation
gb(x) This is a function of the data
This is the point at which we are evaluating bg()
FIGURE 21.1. A curve estimate gb is random because it is a function of the data. The point x at which we evaluate gb is not a random variable.
much (large bins) we get a highly biased estimator while if we smooth too little (small bins) we get a highly variable estimator. Much of curve estimation is concerned with trying to optimally balance variance and bias. 21.1 The Bias-Variance Tradeo
Let g denote an unknown function and let bgn denote an estimator of g. Bear in mind that bgn(x) is a random function evaluated at a point x. bgn is random because it depends on the data. Indeed, we could be more explicit and write bgn(x) = hx (X ; : : : ; Xn ) to show that bgn (x) is a function of the data X ; : : : ; Xn and that the function could be dierent for each x. See Figure 21.1. As a loss function, we will use the integrated squared error (ISE): Z L(g; bgn) = (g (u) bgn(u)) du: (21.1) 1
1
2
The risk or mean integrated squared error (MISE) is
R(f; fb) = E L(g; bg) :
(21.2)
21.2 Histograms
361
Lemma 21.1 The risk can be written as
R(g; bgn) = where
Z
b (x) dx + 2
Z
v (x) dx
(21.3)
b(x) = E (bgn (x)) g (x)
(21.4)
is the bias of gbn (x) at a xed x and
v (x) = V (bgn(x)) = E (bgn(x)
E (b gn(x))
2
)
(21.5)
is the variance of bgn (x) at a xed x.
In summary, RISK = BIAS + VARIANCE: 2
(21.6)
When the data are over-smoothed, the bias term is large and the variance is small. When the data are under-smoothed the opposite is true; see Figure 21.2. This is called the bias-variance trade-o. Minimizing risk corresponds to balancing bias and variance. 21.2 Histograms
Let X ; : : : ; Xn be iid on [0; 1] with density f . The restriction to [0; 1] is not crucial; we can always rescale the data to be on this interval. Let m be an integer and de ne bins 1
1 ; B = 1 ; 2 ; : : : ; B = m 1 ; 1 : (21.7) B = 0; m m m m m 1
2
De ne the binwidth h = 1=m, let j beRthe number of observations in Bj , let pbj = j =n and let pj = Bj f (u)du.
21. Nonparametric Curve Estimation
0.06
0.08
0.10
362
0.04
Risk
0.02
Variance
0.00
Bias squared Optimal Smoothing
0.1
0.2
0.3
0.4
more smoothing −−−>
FIGURE 21.2. The Bias-Variance trade-o. The bias increases and the variance decreases with the amount of smoothing. The optimal amount of smoothing, indicated by the vertical line, minimizes the risk = bias2 + variance.
0.5
21.2 Histograms
363
The histogram estimator is de ned by fbn (x) =
8 > pb1 =h > > < pb =h 2
.
.. > > > : pb =h m
x 2 B1 x 2 B2
...
x 2 Bm
which we can write more succinctly as fbn (x)
=
n X pbj j =1
h
I (x 2 Bj ):
(21.8) R Bj
To
understand the motivation for this estimator, let pj = f (u)du and note that, for x 2 Bj and h small, R f (u)du f (x)h p b p fbn (x) = j j = Bj f (x) = f (x): h h h Example 21.2 Figure 21.3 shows three dierent histograms based
on n = 1; 266 data points from an astronomical sky survey. Each data point represents the distance from us to a galaxy. The galaxies lie on a \pencilbeam" pointing directly from the Earth out into space. Because of the nite speed of light, looking at galaxies farther and farther away corresponds to looking back in time. Choosing the right number of bins involves nding a good tradeo between bias and variance. We shall see later that the top left histogram has too few bins resulting in oversmoothing and too much bias. The bottom left histogram has too many bins resulting in undersmoothing and too few bins. The top right histogram is just right. The histogram reveals the presence of clusters of galaxies. Seeing how the size and number of galaxy clusters varies with time, helps cosmologists understand the evolution of the universe.
The mean and variance of fbn(x) are given in the following Theorem.
21. Nonparametric Curve Estimation
0
0
2
10
4
20
6
30
8
40
10
50
12
60
14
364
0.00
0.05
0.10
0.15
0.20
0.00
0.05
0.10 Just Right
0
−8 −10 −14
−12
20
40
cross validation score
60
−6
Oversmoothed
0.15
0.00
0.05
0.10 Undersmoothed
0.15
0.20
0
200
400
600
800
number of bins
FIGURE 21.3. Three versions of a histogram for the astronomy data. The top left histogram has too few bins. The bottom left histogram has too many bins. The top right histogram is just right. The lower, right plot shows the estimated risk versus the number of bins.
1000
0.20
21.2 Histograms
365
21.3 Consider xed x and xed m, and let Bj be the bin containing x. Then, pj (1 pj ) pj and V (fbn (x)) = : (21.9) E (fbn (x)) = h nh2 Theorem
Let's take a closer look at the bias-variance tradeo using equation (21.9) . Consider some x 2 Bj . For any other u 2 Bj , f (u) f (x) + (u x)f 0 (x) and so Z Z 0 pj = f (u)du f (x) + (u x)f (x) du Bj
Bj
=
f (x)h + hf 0 (x) h j
Therefore, the bias b(x) is p b(x) = E (fbn (x)) f (x) = j f (x) h f (x)h + hf 0 (x) h j x
1 = 2 If xej is the center of the bin, then Z
b (x) dx 2
Bj
Z
Bj
(f 0(xe
Z
1
j ))
0
Bj
h = (f 0(xej )) 12 :
b (x)dx = 2
m Z X
2
x
dx
2
x
dx
3
b (x)dx
h (f 0 (xe
=1
1 h j 2
2
j =1 Bj m h2 X
= 12 j
1 2
Z
x :
x :
2
f (x)
(f 0(x))2 h j
2
Therefore,
1 2
h
f 0 (x) h j
1 2
j ))
2
m X j =1
h (f 0(xej )) 12
h2
12
2
Z
1 0
3
(f 0(x)) dx: 2
366
21. Nonparametric Curve Estimation
Note that this increases as a function of h. Now consider the variance. For h small, 1 pj 1, so v (x)
pj nh
2
f (x)h + hf 0 (x) h j = nh2 (x) fnh
1 2
x
where we have kept only the dominant term. So, Z 1 v (x)dx : 1
nh
0
Note that this decreases with h. Putting all this together, we get: Theorem
R
21.4 Suppose that (f 0 (u))2 du < 1. Then Z
1 ( f 0 (u)) du + : 12 nh h2
R(fbn ; f )
2
(21.10)
The value h that minimizes (21.10) is
h =
1
R
n1=3
1=3
6
(f 0(u)) du 2
:
(21.11)
With this choice of binwidth,
R(fbn ; f ) where C = (3=4)2=3
R
C n2=3 1=3
(f 0(u)) du 2
(21.12) .
Theorem 21.4 is quite revealing. We see that with an optimally chosen binwidth, the MISE decreases to 0 at rate n = . By comparison, most parametric estimators converge at rate 2 3
21.2 Histograms
367
n 1 . The slower rate of convergence is the price we pay for being nonparametric. The formula for the optimal binwidth h is
of theoretical interest but it is not useful in practice since it depends on the unknown function f . A practical way to choose the binwidth is to estimate the risk function and minimize over h. Recall that the loss function, which we now write as a function of h, is L(h) =
=
Z
(fbn(x) f (x)) dx 2
Z
fbn (x) dx 2
Z
2
fbn (x)f (x)dx +
Z
f 2 (x) dx:
The last term does not depend on the binwidth h so minimizing the risk is equivalent to minimizing the expected value of J (h) =
Z
fbn2 (x) dx
Z
2 fbn(x)f (x)dx:
We shall refer to E (J (h)) as the risk, although it diers from R the true risk by the constant term f (x) dx. 2
De nition 21.5 The cross-validation estimator of risk
is
Jb(h) =
Z
2
fbn (x) dx
n 2X fb i (Xi ) n i=1
(
)
(21.13)
where fb( i) is the histogram estimator obtained after removing the ith observation. We refer to Jb(h) as the crossvalidation score or estimated risk. Theorem
21.6 The cross-validation estimator is nearly unbiased: E (Jb(x))
E (J (x)):
368
21. Nonparametric Curve Estimation
In principle, we need to recompute the histogram n times to compute Jb(h). Moreover, this has to be done for all values of h. Fortunately, there is a shortcut formula. Theorem
21.7 The following identity holds: Jb(h)
m n+1 X pb2j : (n 1) j=1
2
= (n 1)h
(21.14)
Example 21.8 We used cross-validation in the astronomy exam-
ple. The cross-validation function is quite at near its minimum. Any m in the range of 73 to 310 is an approximate minimizer but the resulting histogram does not change much over this range. The histogram in the top right plot in Figure 21.3 was constructed using m = 73 bins. The bottom right plot shows the estimated risk, or more precisely, Ab, plotted versus the number of bins.
Next we want some sort of con dence set for f . Suppose fbn is a histogram with m bins and binwidth h = 1=m. We cannot realistically make con dence statements about the ne details of the true density f . Instead, we shall make con dence statements about f at the resolution of the histogram. To this end, de ne pj for x 2 Bj h
fe(x) =
where pj =
R
Bj
(21.15)
f (u)du which is a \histogramized" version of f .
21.9 Let m = m(n) be the number of bins in the histogram Assume that m(n) ! 1 and m(n) log n=n ! 0 as n ! 1. De ne Theorem
fbn .
`(x) = u(x) =
nq
max
q
o2
fbn (x) c; 0
2 b fn (x) + c
(21.16)
21.2 Histograms
where
c=
Then,
z=(2m)
r
2
m : n
369
(21.17)
lim P `(x) fe(x) u(x) for all x 1 : n!1
(21.18)
Here is an outline of the proof. A rigorous proof requires fairly sophisticated tools. From the central limit thep pbj orem, pbj N (pj ; pj (1 pj )=n). By the delta method, p p N ( pj ; 1=(4n)). Moreover, it can be shown that the pbj 's are approximately independent. Therefore, Proof.
p
p
2 n
pbj
pp Z
(21.19)
j
j
where Z ; : : : ; Zm N (0; 1). Let A be the event that `(x) fe(x) u(x) for all x. So, 1
A =
= = = Therefore, P(Ac )
= = =
n
`(x) fe(x) u(x) for all x
np
`(x)
q
nq
o
p
u(x) for all x
fe(x) q
fbn (x) c q n fbn (x) max x
fe(x) q e f (x)
o
q
fbn (x) + c o
for all x
o
c :
r ! r q q p b p j j fe(x) > c = P max P max fbn (x) >c x j h h p p p p P max pbj j > c h = P max 2 n pbj j > 2c j j p p P max 2 n pbj > z=(2m) j j m X P max Zj > z=(2m) P Zj > z=(2m) j j =1
p
p
p
j j
p
p
j j
p
p
hn
370
21. Nonparametric Curve Estimation
=
m X
= : m j =1
Example 21.10 Figure 21.4 shows a 95 per cent con dence enve-
lope for the astronomy data. We see that even with over 1,000 data points, there is still substantial uncertainty.
21.3 Kernel Density Estimation
Histograms are discontinuous. Kernel density estimators are smoother and they converge faster to the true density than histograms. Let X ; : : : ; Xn denote the observed data, a sample from f . In this chapter, a kernel is de ned to be anyR smooth funcR tion K suchR that K (x) 0, K (x) dx = 1, xK (x)dx = 0 and K x K (x)dx > 0. Two examples of kernels are the 1
2
2
Epanechnikov kernel K (x) =
3 4
0
p
p
(1 x =5)= 5 jxj < 5 2
(21.20)
otherwise
and the Gaussian (Normal) kernel K (x) = (2)
=
1 2
e
x2 =2 .
De nition 21.11 Given a kernel K and a positive number
h, called the bandwidth, the kernel density estimator is de ned to be fb(x)
n X 1 = n h1 K x h Xi : i
(21.21)
=1
An example of a kernel density estimator is show in Figure 21.5. The kernel estimator eectively puts a smoothed out lump of mass of size 1=n over each data point Xi. The bandwidth h
371
0
10
20
30
40
50
21.3 Kernel Density Estimation
0.00
0.05
0.10
0.15
FIGURE 21.4. 95 per cent con dence envelope for astronomy data using m = 73 bins.
0.20
21. Nonparametric Curve Estimation
0.00
0.05
0.10
0.15
372
−10
−5
0
FIGURE 21.5. A kernel density estimator fb. At each point x, fb(x) is the average of the kernels centered over the data points Xi . The data points are indicated by short vertical bars.
5
21.3 Kernel Density Estimation
373
controls the amount of smoothing. When h is close to 0, fbn consists of a set of spikes, one at each data point. The height of the spikes tends to in nity as h ! 0. When h ! 1, fbn tends to a uniform density. Example 21.12 Figure 21.6 shows kernel density estimators for
the astronomy data using for three dierent bandwidths. In each case we used a Gaussian kernel. The properly smoothed kernel density estimator in the top right panel shows similar structure as the histogram. However, it is easier to see the clusters in the kernel estimator.
To construct a kernel density estimator, we need to choose a kernel K and a bandwidth h. It can be shown theoretically and empirically that the choice of K is not crucial. However, the choice of bandwidth h is very important. As with the histogram, we can make a theoretical statement about how the risk of the estimator depends on the bandwidth.

Theorem 21.13 Under weak assumptions on f and K,

R(f, \hat f_n) \approx \frac{1}{4} \sigma_K^4 h^4 \int (f''(x))^2\,dx + \frac{\int K^2(x)\,dx}{nh}    (21.22)

where \sigma_K^2 = \int x^2 K(x)\,dx. The optimal bandwidth is

h^* = \frac{c_1^{-2/5}\, c_2^{1/5}\, c_3^{-1/5}}{n^{1/5}}    (21.23)

where c_1 = \int x^2 K(x)\,dx, c_2 = \int K(x)^2\,dx and c_3 = \int (f''(x))^2\,dx. With this choice of bandwidth,

R(f, \hat f_n) \approx \frac{c_4}{n^{4/5}}

for some constant c_4 > 0.
Proof. Write K_h(x, X) = h^{-1} K((x - X)/h) and \hat f_n(x) = n^{-1} \sum_i K_h(x, X_i). Thus, E[\hat f_n(x)] = E[K_h(x, X)] and V[\hat f_n(x)] = n^{-1} V[K_h(x, X)]. Now,

E[K_h(x, X)] = \int \frac{1}{h} K\!\left( \frac{x - t}{h} \right) f(t)\,dt
             = \int K(u)\, f(x - hu)\,du
             = \int K(u) \left[ f(x) - hu\, f'(x) + \frac{h^2 u^2}{2} f''(x) + \cdots \right] du
             = f(x) + \frac{1}{2} h^2 f''(x) \int u^2 K(u)\,du + \cdots

since \int K(x)\,dx = 1 and \int x K(x)\,dx = 0. The bias is therefore

E[K_h(x, X)] - f(x) \approx \frac{1}{2} \sigma_K^2 h^2 f''(x).

By a similar calculation,

V[\hat f_n(x)] \approx \frac{f(x) \int K^2(x)\,dx}{nh}.

The second result follows from integrating the squared bias plus the variance.

We see that kernel estimators converge at rate n^{-4/5} while histograms converge at the slower rate n^{-2/3}. It can be shown that, under weak assumptions, there does not exist a nonparametric estimator that converges faster than n^{-4/5}. The expression for h^* depends on the unknown density f, which makes the result of little practical use. As with histograms, we shall use cross-validation to find a bandwidth. Thus, we estimate the risk (up to a constant) by

\hat J(h) = \int \hat f^2(x)\,dx - \frac{2}{n} \sum_{i=1}^n \hat f_{(-i)}(X_i)    (21.24)

where \hat f_{(-i)} is the kernel density estimator after omitting the ith observation.

FIGURE 21.6. Kernel density estimators (undersmoothed, just right, oversmoothed) and the estimated cross-validation risk versus bandwidth for the astronomy data.
Theorem 21.14 For any h > 0,

E[\hat J(h)] = E[J(h)].

Also,

\hat J(h) \approx \frac{1}{h n^2} \sum_i \sum_j K^*\!\left( \frac{X_i - X_j}{h} \right) + \frac{2}{nh} K(0)    (21.25)

where K^*(x) = K^{(2)}(x) - 2K(x) and K^{(2)}(z) = \int K(z - y) K(y)\,dy. In particular, if K is a N(0,1) Gaussian kernel then K^{(2)}(z) is the N(0, 2) density.

We then choose the bandwidth h_n that minimizes \hat J(h). (For large data sets, \hat f and (21.25) can be computed quickly using the fast Fourier transform.) A justification for this method is given by the following remarkable theorem due to Stone.

Theorem 21.15 (Stone's Theorem.) Suppose that f is bounded. Let \hat f_h denote the kernel estimator with bandwidth h and let h_n denote the bandwidth chosen by cross-validation. Then,

\frac{ \int \bigl( f(x) - \hat f_{h_n}(x) \bigr)^2\,dx }{ \inf_h \int \bigl( f(x) - \hat f_h(x) \bigr)^2\,dx } \xrightarrow{P} 1.    (21.26)
Example 21.16 The top right panel of Figure 21.6 is based on cross-validation. These data are rounded, which causes problems for cross-validation. Specifically, it causes the minimizer to be h = 0. To overcome this problem, we added a small amount of random Normal noise to the data. The result is that \hat J(h) is very smooth with a well-defined minimum.

Remark 21.17 Do not assume that, if the estimator \hat f is wiggly, then cross-validation has let you down. The eye is not a good judge of risk.
To construct confidence bands, we use something similar to the method for histograms, although the details are more complicated. The version described here is from Chaudhuri and Marron (1999).
Confidence Band for Kernel Density Estimator
1. Choose an evenly spaced grid of points V = \{v_1, \ldots, v_N\}. For every v \in V, define

   Y_i(v) = \frac{1}{h} K\!\left( \frac{v - X_i}{h} \right).    (21.27)

   Note that \hat f_n(v) = \bar Y_n(v), the average of the Y_i(v)'s.

2. Define

   se(v) = \frac{s(v)}{\sqrt{n}}    (21.28)

   where s^2(v) = (n-1)^{-1} \sum_{i=1}^n \bigl( Y_i(v) - \bar Y_n(v) \bigr)^2.

3. Compute the effective sample size

   ESS(v) = \frac{ \sum_{i=1}^n K\bigl( (v - X_i)/h \bigr) }{ K(0) }.    (21.29)

   Roughly, this is the number of data points being averaged together to form \hat f_n(v).

4. Let V^* = \{ v \in V : ESS(v) \ge 5 \}. Now define the number of independent blocks m by

   m = \frac{n}{\overline{ESS}}    (21.30)

   where \overline{ESS} is the average of ESS(v) over V^*.

5. Let

   q = \Phi^{-1}\!\left( \frac{ 1 + (1 - \alpha)^{1/m} }{2} \right)    (21.31)

   and define

   \ell(v) = \hat f_n(v) - q\,se(v) \quad \text{and} \quad u(v) = \hat f_n(v) + q\,se(v)    (21.32)

   where we replace \ell(v) with 0 if it becomes negative. Then,

   P\bigl( \ell(v) \le f(v) \le u(v) \ \text{for all } v \bigr) \approx 1 - \alpha.
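A rough R sketch of these five steps with a Gaussian kernel follows (the grid size, the 1 - alpha level, and all variable names are our choices, not part of the algorithm):

    # Approximate 1 - alpha confidence band for a kernel density estimate.
    kde.band <- function(data, h, alpha = 0.05, N = 100) {
      v  <- seq(min(data), max(data), length.out = N)                    # step 1: grid
      Y  <- outer(v, data, function(vv, xx) dnorm((vv - xx) / h) / h)    # Y_i(v), N x n
      fhat <- rowMeans(Y)
      n  <- length(data)
      se <- apply(Y, 1, sd) / sqrt(n)                                    # step 2
      ESS <- rowSums(dnorm(outer(v, data, "-") / h)) / dnorm(0)          # step 3
      m  <- n / mean(ESS[ESS >= 5])                                      # step 4
      q  <- qnorm((1 + (1 - alpha)^(1 / m)) / 2)                         # step 5
      data.frame(v = v, fhat = fhat,
                 lower = pmax(fhat - q * se, 0), upper = fhat + q * se)
    }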
Example 21.18 Figure 21.7 shows approximate 95 per cent confidence bands for the astronomy data.
Suppose now that the data X_i = (X_{i1}, \ldots, X_{id}) are d-dimensional. The kernel estimator can easily be generalized to d dimensions. Let h = (h_1, \ldots, h_d) be a vector of bandwidths and define

\hat f_n(x) = \frac{1}{n} \sum_{i=1}^n K_h(x - X_i)    (21.33)

where

K_h(x - X_i) = \frac{1}{h_1 \cdots h_d} \prod_{j=1}^d K\!\left( \frac{x_j - X_{ij}}{h_j} \right)    (21.34)

and h_1, \ldots, h_d are bandwidths. For simplicity, we might take h_j = s_j h where s_j is the standard deviation of the jth variable. There is now only a single bandwidth h to choose. Using calculations like those in the one-dimensional case, the risk is given by

R(f, \hat f_n) \approx \frac{\sigma_K^4}{4} \left[ \sum_{j=1}^d h_j^4 \int f_{jj}^2(x)\,dx + \sum_{j \ne k} h_j^2 h_k^2 \int f_{jj} f_{kk}\,dx \right] + \frac{ \bigl( \int K^2(x)\,dx \bigr)^d }{ n h_1 \cdots h_d }

where f_{jj} is the second partial derivative of f. The optimal bandwidth satisfies h_i \approx c_1 n^{-1/(4+d)}, leading to a risk of order n^{-4/(4+d)}. From this fact, we see that the risk increases quickly with dimension, a problem usually called the curse of dimensionality. To get a sense of how serious this problem is, consider the following table from Silverman (1986), which shows the sample size required to ensure a relative mean squared error less than 0.1 at 0 when the density is multivariate normal and the optimal bandwidth is selected.
FIGURE 21.7. 95 per cent confidence bands for the kernel density estimate for the astronomy data.
Dimension   Sample Size
    1                 4
    2                19
    3                67
    4               223
    5               768
    6             2,790
    7            10,700
    8            43,700
    9           187,000
   10           842,000

This is bad news indeed. It says that having 842,000 observations in a ten-dimensional problem is really like having 4 observations in a one-dimensional problem.
21.4 Nonparametric Regression
Consider pairs of points (x_1, Y_1), \ldots, (x_n, Y_n) related by

Y_i = r(x_i) + \epsilon_i    (21.35)

where E(\epsilon_i) = 0 and we are treating the x_i's as fixed. In nonparametric regression, we want to estimate the regression function r(x) = E(Y | X = x). There are many nonparametric regression estimators. Most involve estimating r(x) by taking some sort of weighted average of the Y_i's, giving higher weight to those points near x. In particular, the Nadaraya-Watson kernel estimator is defined by

\hat r(x) = \sum_{i=1}^n w_i(x) Y_i    (21.36)

where the weights w_i(x) are given by

w_i(x) = \frac{ K\!\left( \frac{x - x_i}{h} \right) }{ \sum_{j=1}^n K\!\left( \frac{x - x_j}{h} \right) }.    (21.37)
The form of this estimator comes from first estimating the joint density f(x, y) using kernel density estimation and then inserting this into

r(x) = E(Y | X = x) = \int y f(y | x)\,dy = \frac{ \int y f(x, y)\,dy }{ \int f(x, y)\,dy }.

Theorem 21.19 Suppose that V(\epsilon_i) = \sigma^2. The risk of the Nadaraya-Watson kernel estimator is

R(\hat r_n, r) \approx \frac{h^4}{4} \left( \int x^2 K(x)\,dx \right)^2 \int \left( r''(x) + 2 r'(x) \frac{f'(x)}{f(x)} \right)^2 dx + \frac{ \sigma^2 \int K^2(x)\,dx }{ n h } \int \frac{dx}{f(x)}.    (21.38)

The optimal bandwidth decreases at rate n^{-1/5} and with this choice the risk decreases at rate n^{-4/5}.
In practice, to choose the bandwidth h we minimize the cross-validation score

\hat J(h) = \sum_{i=1}^n \bigl( Y_i - \hat r_{(-i)}(x_i) \bigr)^2    (21.39)

where \hat r_{(-i)} is the estimator we get by omitting the ith observation. An approximation to \hat J is given by

\hat J(h) = \sum_{i=1}^n \left( \frac{ Y_i - \hat r(x_i) }{ 1 - \dfrac{K(0)}{ \sum_{j=1}^n K\!\left( \frac{x_i - x_j}{h} \right) } } \right)^2.    (21.40)
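To make this concrete, here is a brief R sketch of the Nadaraya-Watson estimator and the approximation (21.40), again with a Gaussian kernel (function names are ours; x and y are assumed to hold the design points and responses):

    # Nadaraya-Watson estimator (21.36)-(21.37) with a Gaussian kernel.
    nw <- function(x0, x, y, h) {
      w <- dnorm(outer(x0, x, "-") / h)        # kernel weights, length(x0) x n
      as.vector((w %*% y) / rowSums(w))
    }

    # Approximate cross-validation score (21.40).
    cv.nw <- function(h, x, y) {
      W <- dnorm(outer(x, x, "-") / h)
      rhat <- as.vector((W %*% y) / rowSums(W))
      sum(((y - rhat) / (1 - dnorm(0) / rowSums(W)))^2)
    }

    hs <- seq(0.01, 1, length.out = 50)
    h.cv <- hs[which.min(sapply(hs, cv.nw, x = x, y = y))]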
Example 21.20 Figure 21.8 shows cosmic microwave background (CMB) data from BOOMERaNG (Netterfield et al. 2001), Maxima (Lee et al. 2001) and DASI (Halverson 2001). The data consist of n pairs (x_1, Y_1), \ldots, (x_n, Y_n) where x_i is called the multipole moment and Y_i is the estimated power spectrum of the temperature fluctuations. What you are seeing are sound waves in the cosmic microwave background radiation, which is heat left over from the big bang. If r(x) denotes the true power spectrum, then

Y_i = r(x_i) + \epsilon_i

where \epsilon_i is a random error with mean 0. The location and size of peaks in r(x) provide valuable clues about the behavior of the early universe. Figure 21.8 shows the fit based on cross-validation as well as an undersmoothed and an oversmoothed fit. The cross-validation fit shows the presence of three well-defined peaks, as predicted by the physics of the big bang.
The procedure for finding confidence bands is similar to that for density estimation. However, we first need to estimate \sigma^2. Suppose that the x_i's are ordered. Assuming r(x) is smooth, we have r(x_{i+1}) - r(x_i) \approx 0 and hence

Y_{i+1} - Y_i = \bigl[ r(x_{i+1}) + \epsilon_{i+1} \bigr] - \bigl[ r(x_i) + \epsilon_i \bigr] \approx \epsilon_{i+1} - \epsilon_i

and hence

V(Y_{i+1} - Y_i) \approx V(\epsilon_{i+1} - \epsilon_i) = V(\epsilon_{i+1}) + V(\epsilon_i) = 2\sigma^2.

We can thus use the average of the n - 1 squared differences (Y_{i+1} - Y_i)^2 to estimate \sigma^2. Hence, define

\hat\sigma^2 = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} \bigl( Y_{i+1} - Y_i \bigr)^2.    (21.41)
FIGURE 21.8. Regression analysis of the CMB data. The first fit is undersmoothed, the second is oversmoothed and the third is based on cross-validation. The last panel shows the estimated risk versus the bandwidth of the smoother. The data are from BOOMERaNG, Maxima and DASI.
Confidence Bands for Kernel Regression

Follow the same procedure as for kernel density estimators except change the definition of se(v) to

se(v) = \hat\sigma \sqrt{ \sum_{i=1}^n w_i^2(v) }    (21.42)
where \hat\sigma is defined in (21.41).

Example 21.21 Figure 21.9 shows a 95 per cent confidence envelope for the CMB data. We see that we are highly confident of the existence and position of the first peak. We are more uncertain about the second and third peaks. At the time of this writing, more accurate data are becoming available that apparently provide sharper estimates of the second and third peaks.
The extension to multiple regressors X = (X_1, \ldots, X_p) is straightforward. As with kernel density estimation we just replace the kernel with a multivariate kernel. However, the same caveats about the curse of dimensionality apply. In some cases, we might consider putting restrictions on the regression function which will then reduce the curse of dimensionality. For example, additive regression is based on the model

Y = \sum_{j=1}^p r_j(X_j) + \epsilon.    (21.43)

Now we only need to fit p one-dimensional functions. The model can be enriched by adding various interactions, for example,

Y = \sum_{j=1}^p r_j(X_j) + \sum_{j < k} r_{jk}(X_j X_k) + \epsilon.    (21.44)

Additive models are usually fit by an algorithm called backfitting.
FIGURE 21.9. 95 per cent confidence envelope for the CMB data.
Backfitting

1. Initialize \hat r_1(x_1), \ldots, \hat r_p(x_p).

2. For j = 1, \ldots, p:
   (a) Let \epsilon_i = Y_i - \sum_{s \ne j} \hat r_s(x_{is}).
   (b) Let \hat r_j be the function estimate obtained by regressing the \epsilon_i's on the jth covariate.

3. If converged, STOP. Else, go back to step 2.

Additive models have the advantage that they avoid the curse of dimensionality and they can be fit quickly, but they have one disadvantage: the model is not fully nonparametric. In other words, the true regression function r(x) may not be of the form (21.43).
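The backfitting loop is easy to sketch in code. Below is a minimal R version that uses the Nadaraya-Watson smoother nw() from the earlier sketch as the one-dimensional regression step (the smoother choice, bandwidth, convergence test, and the omission of an intercept term are our simplifying assumptions, not prescribed by the algorithm):

    # Backfitting for an additive model Y = sum_j r_j(X_j) + eps.
    # X is an n x p matrix, y a length-n response; nw() is the kernel smoother sketched earlier.
    backfit <- function(X, y, h = 0.5, max.iter = 20, tol = 1e-6) {
      n <- nrow(X); p <- ncol(X)
      fits <- matrix(0, n, p)                                 # current estimates r_j(x_ij)
      for (iter in 1:max.iter) {
        old <- fits
        for (j in 1:p) {
          resid <- y - rowSums(fits[, -j, drop = FALSE])      # partial residuals
          fits[, j] <- nw(X[, j], X[, j], resid, h)           # regress residuals on jth covariate
        }
        if (max(abs(fits - old)) < tol) break                 # crude convergence check
      }
      fits
    }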
21.5 Appendix: Confidence Sets and Bias

The confidence bands we computed are not for the density function or regression function but rather for the smoothed function. For example, the confidence band for a kernel density estimate with bandwidth h is a band for the function one gets by smoothing the true function with a kernel with the same bandwidth. Getting a confidence set for the true function is complicated, for reasons we now explain. First, let's review the parametric case. When estimating a scalar quantity \theta with an estimator \hat\theta, the usual confidence interval is of the form

\hat\theta \pm z_{\alpha/2}\, s_n

where \hat\theta is the maximum likelihood estimator and s_n = \sqrt{ \hat V(\hat\theta) } is the estimated standard error of the estimator. Under standard regularity conditions,

\hat\theta \approx N(\theta, s_n^2)
and

\lim_{n\to\infty} P\bigl( \hat\theta - 2 s_n \le \theta \le \hat\theta + 2 s_n \bigr) \approx 0.95.

But let us take a closer look. Let b_n = E(\hat\theta_n) - \theta denote the bias. We know that \hat\theta \approx N(\theta, s_n^2), but a more accurate statement is \hat\theta \approx N(\theta + b_n, s_n^2). We usually ignore the bias term b_n in large-sample calculations, but let's keep track of it. The coverage of the usual confidence interval is

P\bigl( \hat\theta - z_{\alpha/2} s_n \le \theta \le \hat\theta + z_{\alpha/2} s_n \bigr)
  = P\left( -z_{\alpha/2} \le \frac{\hat\theta - \theta}{s_n} \le z_{\alpha/2} \right)
  = P\left( -z_{\alpha/2} \le N\!\left( \frac{b_n}{s_n},\, 1 \right) \le z_{\alpha/2} \right).

In a well-behaved parametric model, s_n is of size n^{-1/2} and b_n is of size n^{-1}. Hence, b_n/s_n \to 0, so the last displayed probability statement becomes P( -z_{\alpha/2} \le N(0, 1) \le z_{\alpha/2} ) = 1 - \alpha. What makes parametric confidence intervals have the right coverage is the fact that b_n/s_n \to 0.

The situation is more complicated for kernel methods. Consider estimating a density f(x) at a single point x with a kernel density estimator. Since \hat f(x) is a sum of iid random variables, the central limit theorem implies that

\hat f(x) \approx N\!\left( f(x) + b_n(x),\ \frac{c_2 f(x)}{nh} \right)    (21.45)

where

b_n(x) = \frac{1}{2} h^2 f''(x)\, c_1    (21.46)

is the bias, c_1 = \int x^2 K(x)\,dx and c_2 = \int K^2(x)\,dx. The estimated standard error is

s_n(x) = \left( \frac{c_2 \hat f(x)}{nh} \right)^{1/2}.    (21.47)
Suppose we use the usual interval \hat f(x) \pm z_{\alpha/2}\, s_n(x). Arguing as above, the coverage is approximately

P\left( -z_{\alpha/2} \le N\!\left( \frac{b_n(x)}{s_n(x)},\, 1 \right) \le z_{\alpha/2} \right).

The optimal bandwidth is of the form h = c n^{-1/5} for some constant c. If we plug h = c n^{-1/5} into (21.46) and (21.47), we see that b_n(x)/s_n(x) does not tend to 0. Thus, the confidence interval will have coverage less than 1 - \alpha.
21.6 Bibliographic Remarks
Two very good books on curve estimation are Scott (1992) and Silverman (1986). I drew heavily on those books for this chapter.

21.7 Exercises
1. Let X_1, \ldots, X_n \sim f and let \hat f_n be the kernel density estimator using the boxcar kernel:

   K(x) = \begin{cases} 1 & -\tfrac{1}{2} < x < \tfrac{1}{2} \\ 0 & \text{otherwise.} \end{cases}

   (a) Show that

   E\bigl( \hat f(x) \bigr) = \frac{1}{h} \int_{x-(h/2)}^{x+(h/2)} f(y)\,dy

   and

   V\bigl( \hat f(x) \bigr) = \frac{1}{nh^2} \left[ \int_{x-(h/2)}^{x+(h/2)} f(y)\,dy - \left( \int_{x-(h/2)}^{x+(h/2)} f(y)\,dy \right)^2 \right].
   (b) Show that if h \to 0 and nh \to \infty as n \to \infty, then \hat f_n(x) \xrightarrow{P} f(x).

2. Get the data on fragments of glass collected in forensic work. You need to download the MASS library for R; then:

   library(MASS)
   data(fgl)
   x <- fgl[[1]]
   help(fgl)

   The data are also on my website. Estimate the density of the first variable (refractive index) using a histogram and using a kernel density estimator. Use cross-validation to choose the amount of smoothing. Experiment with different binwidths and bandwidths. Comment on the similarities and differences. Construct 95 per cent confidence bands for your estimators.

3. Consider the data from question 2. Let Y be refractive index and let x be aluminium content (the fourth variable). Do a nonparametric regression to fit the model Y = f(x) + \epsilon. Use cross-validation to estimate the bandwidth. Construct 95 per cent confidence bands for your estimate.

4. Prove Lemma 21.1.

5. Prove Theorem 21.3.

6. Prove Theorem 21.7.

7. Prove Theorem 21.14.
8. Consider regression data (x_1, Y_1), \ldots, (x_n, Y_n). Suppose that 0 \le x_i \le 1 for all i. Define bins B_j as in equation (21.7). For x \in B_j define \hat r_n(x) = \bar Y_j where \bar Y_j is the mean of all the Y_i's corresponding to those x_i's in B_j. Find the approximate risk of this estimator. From this expression for the risk, find the optimal bandwidth. At what rate does the risk go to zero?

9. Show that, with suitable smoothness assumptions on r(x), \hat\sigma^2 in equation (21.41) is a consistent estimator of \sigma^2.
22 Smoothing Using Orthogonal Functions
In this chapter we study a different approach to nonparametric curve estimation based on orthogonal functions. We begin with a brief introduction to the theory of orthogonal functions. Then we turn to density estimation and regression.

22.1 Orthogonal Functions and L2 Spaces

Let v = (v_1, v_2, v_3) denote a three-dimensional vector, that is, a list of three real numbers. Let V denote the set of all such vectors. If a is a scalar (a number) and v is a vector, we define av = (av_1, av_2, av_3). The sum of vectors v and w is defined by v + w = (v_1 + w_1, v_2 + w_2, v_3 + w_3). The inner product between two vectors v and w is defined by \langle v, w \rangle = \sum_{i=1}^3 v_i w_i. The norm (or length) of a vector v is defined by

||v|| = \sqrt{ \langle v, v \rangle } = \sqrt{ \sum_{i=1}^3 v_i^2 }.    (22.1)
Two vectors are orthogonal (or perpendicular) if \langle v, w \rangle = 0. A set of vectors is orthogonal if each pair in the set is orthogonal. A vector is normal if ||v|| = 1. Let e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1). These vectors are said to be an orthonormal basis for V since they have the following properties: (i) they are orthogonal; (ii) they are normal; (iii) they form a basis for V, which means that if v \in V then v can be written as a linear combination of e_1, e_2, e_3:

v = \sum_{j=1}^3 \beta_j e_j   where   \beta_j = \langle e_j, v \rangle.    (22.2)

For example, if v = (12, 3, 4) then v = 12 e_1 + 3 e_2 + 4 e_3. There are other orthonormal bases for V, for example,

\psi_1 = \left( \tfrac{1}{\sqrt 3}, \tfrac{1}{\sqrt 3}, \tfrac{1}{\sqrt 3} \right), \quad \psi_2 = \left( \tfrac{1}{\sqrt 2}, -\tfrac{1}{\sqrt 2}, 0 \right), \quad \psi_3 = \left( \tfrac{1}{\sqrt 6}, \tfrac{1}{\sqrt 6}, -\tfrac{2}{\sqrt 6} \right).

You can check that these three vectors also form an orthonormal basis for V. Again, if v is any vector then we can write

v = \sum_{j=1}^3 \beta_j \psi_j   where   \beta_j = \langle \psi_j, v \rangle.

For example, if v = (12, 3, 4) then v = 10.97\,\psi_1 + 6.36\,\psi_2 + 2.86\,\psi_3.

Now we make the leap from vectors to functions. Basically, we just replace vectors with functions and sums with integrals. Let L_2(a, b) denote the set of all functions f defined on the interval [a, b] such that \int_a^b f(x)^2\,dx < \infty:

L_2(a, b) = \left\{ f : [a, b] \to \mathbb{R},\ \int_a^b f(x)^2\,dx < \infty \right\}.    (22.3)
We sometimes write L_2 instead of L_2(a, b). The inner product between two functions f, g \in L_2 is defined by \int f(x) g(x)\,dx. The norm of f is

||f|| = \sqrt{ \int f(x)^2\,dx }.    (22.4)

Two functions are orthogonal if \int f(x) g(x)\,dx = 0. A function is normal if ||f|| = 1. A sequence of functions \phi_1, \phi_2, \phi_3, \ldots is orthonormal if \int \phi_j^2(x)\,dx = 1 for each j and \int \phi_i(x) \phi_j(x)\,dx = 0 for i \ne j. An orthonormal sequence is complete if the only function that is orthogonal to each \phi_j is the zero function. In this case, the functions \phi_1, \phi_2, \phi_3, \ldots form a basis, meaning that if f \in L_2 then f can be written as

f(x) = \sum_{j=1}^\infty \beta_j \phi_j(x),   where   \beta_j = \int_a^b f(x) \phi_j(x)\,dx.    (22.5)

(The equality in the displayed equation means that \int \bigl( f(x) - f_n(x) \bigr)^2\,dx \to 0 where f_n(x) = \sum_{j=1}^n \beta_j \phi_j(x).)

Parseval's relation says that

||f||^2 \equiv \int f^2(x)\,dx = \sum_{j=1}^\infty \beta_j^2 \equiv ||\beta||^2    (22.6)

where \beta = (\beta_1, \beta_2, \ldots).

Example 22.1 An example of an orthonormal basis for L_2[0, 1] is the cosine basis, defined as follows. Let \phi_1(x) = 1 and for j \ge 2 define

\phi_j(x) = \sqrt{2} \cos\bigl( (j - 1)\pi x \bigr).    (22.7)

The first six functions are plotted in Figure 22.1.

FIGURE 22.1. The first six functions in the cosine basis.

Example 22.2 Let

f(x) = \sqrt{ x(1 - x) } \, \sin\!\left( \frac{2.1\pi}{x + 0.05} \right)
which is called the "doppler function." Figure 22.2 shows f (top left) and its approximation

f_J(x) = \sum_{j=1}^J \beta_j \phi_j(x)

with J equal to 5 (top right), 20 (bottom left) and 200 (bottom right). As J increases we see that f_J(x) gets closer to f(x). The coefficients \beta_j = \int_0^1 f(x) \phi_j(x)\,dx were computed numerically.
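The numerical computation of the coefficients is straightforward. Here is a small R sketch (the grid size and the use of simple Riemann sums are our choices):

    # Cosine basis on [0,1] and numerical coefficients for the doppler function.
    phi <- function(j, x) if (j == 1) rep(1, length(x)) else sqrt(2) * cos((j - 1) * pi * x)
    doppler <- function(x) sqrt(x * (1 - x)) * sin(2.1 * pi / (x + 0.05))

    N <- 10000
    x <- (1:N - 0.5) / N                          # midpoint grid on [0,1]
    J <- 20
    beta <- sapply(1:J, function(j) mean(doppler(x) * phi(j, x)))   # Riemann sum for (22.5)

    # Partial-sum approximation f_J at the grid points:
    fJ <- rowSums(sapply(1:J, function(j) beta[j] * phi(j, x)))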
Example 22.3 The Legendre polynomials on [-1, 1] are defined by

P_j(x) = \frac{1}{2^j j!} \frac{d^j}{dx^j} (x^2 - 1)^j, \qquad j = 0, 1, 2, \ldots    (22.8)

It can be shown that these functions are complete and orthogonal and that

\int_{-1}^1 P_j^2(x)\,dx = \frac{2}{2j + 1}.    (22.9)

It follows that the functions \phi_j(x) = \sqrt{(2j + 1)/2}\, P_j(x), j = 0, 1, \ldots, form an orthonormal basis for L_2(-1, 1). The first few Legendre polynomials are

P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = \frac{3x^2 - 1}{2}, \quad P_3(x) = \frac{5x^3 - 3x}{2}.

These polynomials may be constructed explicitly using the following recursive relation:

P_{j+1}(x) = \frac{ (2j + 1)\, x\, P_j(x) - j\, P_{j-1}(x) }{ j + 1 }.    (22.10)
FIGURE 22.2. Approximating the doppler function f(x) = \sqrt{x(1-x)} \sin\bigl( 2.1\pi/(x + 0.05) \bigr) with its expansion in the cosine basis. The function f (top left) and its approximation f_J(x) = \sum_{j=1}^J \beta_j \phi_j(x) with J equal to 5 (top right), 20 (bottom left) and 200 (bottom right). The coefficients \beta_j = \int_0^1 f(x)\phi_j(x)\,dx were computed numerically.
The coefficients \beta_1, \beta_2, \ldots are related to the smoothness of the function f. To see why, note that if f is smooth, then its derivatives will be finite. Thus we expect that, for some k, \int_0^1 \bigl( f^{(k)}(x) \bigr)^2\,dx < \infty where f^{(k)} is the kth derivative of f. Now consider the cosine basis (22.7) and let f(x) = \sum_{j=1}^\infty \beta_j \phi_j(x). Then,

\int_0^1 \bigl( f^{(k)}(x) \bigr)^2\,dx = 2 \sum_{j=1}^\infty \beta_j^2 \bigl( \pi (j - 1) \bigr)^{2k}.

The only way that \sum_{j=1}^\infty \beta_j^2 ( \pi (j - 1) )^{2k} can be finite is if the \beta_j's get small when j gets large. To summarize:

If the function f is smooth, then the coefficients \beta_j will be small when j is large.
For the rest of this chapter, assume we are using the cosine basis unless otherwise specified.

22.2 Density Estimation

Let X_1, \ldots, X_n be iid observations from a distribution on [0, 1] with density f. Assuming f \in L_2 we can write

f(x) = \sum_{j=1}^\infty \beta_j \phi_j(x)

where \phi_1, \phi_2, \ldots is an orthonormal basis. Define

\hat\beta_j = \frac{1}{n} \sum_{i=1}^n \phi_j(X_i).    (22.11)

Theorem 22.4 The mean and variance of \hat\beta_j are

E( \hat\beta_j ) = \beta_j,    (22.12)
V( \hat\beta_j ) = \frac{\sigma_j^2}{n}    (22.13)
where

\sigma_j^2 = V\bigl( \phi_j(X_i) \bigr) = \int \bigl( \phi_j(x) - \beta_j \bigr)^2 f(x)\,dx.    (22.14)

Proof. We have

E( \hat\beta_j ) = \frac{1}{n} \sum_{i=1}^n E\bigl( \phi_j(X_i) \bigr) = E\bigl( \phi_j(X_1) \bigr) = \int \phi_j(x) f(x)\,dx = \beta_j.

The calculation for the variance is similar.

Hence, \hat\beta_j is an unbiased estimate of \beta_j. It is tempting to estimate f by \sum_{j=1}^\infty \hat\beta_j \phi_j(x), but this turns out to have a very high variance. Instead, consider the estimator

\hat f(x) = \sum_{j=1}^J \hat\beta_j \phi_j(x).    (22.15)
22.5 The risk of fb is given by R(J ) =
J X j2 j =1
n
+
1 X
j =J +1
(22.16)
j2 :
In kernel estimation, we used cross-validation to estimate the risk. In the orthogonal function approach, we instead use the risk estimator Rb
(J ) =
J X bj2 j =1
n
+
p X
j =J +1
bj2
bj2 n
+
(22.17)
where a_+ = \max\{a, 0\} and

\hat\sigma_j^2 = \frac{1}{n - 1} \sum_{i=1}^n \bigl( \phi_j(X_i) - \hat\beta_j \bigr)^2.    (22.18)

To motivate this estimator, note that \hat\sigma_j^2 is an unbiased estimate of \sigma_j^2 and \hat\beta_j^2 - \hat\sigma_j^2/n is an unbiased estimator of \beta_j^2. We take the positive part of the latter term since we know that \beta_j^2 cannot be negative. We now choose 1 \le \hat J \le p to minimize \hat R(J). Here is a summary.
Summary of Orthogonal Function Density Estimation

1. Let

   \hat\beta_j = \frac{1}{n} \sum_{i=1}^n \phi_j(X_i).

2. Choose \hat J to minimize \hat R(J) over 1 \le J \le p = \sqrt{n}, where

   \hat R(J) = \sum_{j=1}^J \frac{\hat\sigma_j^2}{n} + \sum_{j=J+1}^p \left( \hat\beta_j^2 - \frac{\hat\sigma_j^2}{n} \right)_+

   and

   \hat\sigma_j^2 = \frac{1}{n - 1} \sum_{i=1}^n \bigl( \phi_j(X_i) - \hat\beta_j \bigr)^2.

3. Let

   \hat f(x) = \sum_{j=1}^{\hat J} \hat\beta_j \phi_j(x).
402
22. Smoothing Using Orthogonal Functions
can truncate the estimate and then normalize it. That, we take R fb = maxffbn (x); 0g= maxffbn(u); 0gdu. Now let us construct a con dence band for f . Suppose we estimate f using JPorthogonal functions. We are essentially estimating fe(x) = Jj j j (x) not the true density f (x) = P1 j j j (x). Thus, the con dence band should be regarded as a band for fe(x). Theorem 22.6 An approximate 1 con dence band for fe is (`(x); u(x)) where `(x) = fbn (x) c; u(x) = fbn (x) + c (22.19) 1
0
=1
=1
where
JK c= p n
and
p 1 + p2z
s
2
J
(22.20)
K = 1max max jj (x)j: j J x
For the cosine basis, K
p = 2.
Example 22.7 Let
5 1 X (x; ; :1) f (x) = (x; 0; 1) + j 6 6j 5
=1
where (x; ; ) denotes a Normal density with mean and standard deviation , and (1 ; : : : ; 5 ) = ( 1; 1=2; 0; 1=2; 1). Marron and Wand (1990) call this \the claw" although I prefer the \Bart Simpson." Figure 22.3 shows the true density as well as the estimated density based on n = 5; 000 observations and a 95 per cent con dence band. The density has been rescaled to have most of its mass between 0 and 1 using the transformation y = (x + 3)=6.
FIGURE 22.3. The top plot is the true density for the Bart Simpson distribution (rescaled to have most of its mass between 0 and 1). The bottom plot is the orthogonal function density estimate and 95 per cent confidence band.
22.3 Regression

Consider the regression model

Y_i = r(x_i) + \epsilon_i, \qquad i = 1, \ldots, n    (22.21)

where the \epsilon_i are independent with mean 0 and variance \sigma^2. We will initially focus on the special case where x_i = i/n. We assume that r \in L_2[0, 1] and hence we can write

r(x) = \sum_{j=1}^\infty \beta_j \phi_j(x)   where   \beta_j = \int_0^1 r(x) \phi_j(x)\,dx    (22.22)

and \phi_1, \phi_2, \ldots is an orthonormal basis for [0, 1]. Define

\hat\beta_j = \frac{1}{n} \sum_{i=1}^n Y_i \phi_j(x_i), \qquad j = 1, 2, \ldots    (22.23)

Since \hat\beta_j is an average, the central limit theorem tells us that \hat\beta_j will be approximately normally distributed.

Theorem 22.8

\hat\beta_j \approx N\!\left( \beta_j, \frac{\sigma^2}{n} \right).    (22.24)
Proof. The mean of \hat\beta_j is

E( \hat\beta_j ) = \frac{1}{n} \sum_{i=1}^n E(Y_i) \phi_j(x_i) = \frac{1}{n} \sum_{i=1}^n r(x_i) \phi_j(x_i) \approx \int_0^1 r(x) \phi_j(x)\,dx = \beta_j

where the approximate equality follows from the definition of a Riemann integral: \sum_i \Delta_n h(x_i) \to \int h(x)\,dx where \Delta_n = 1/n. The variance is

V( \hat\beta_j ) = \frac{1}{n^2} \sum_{i=1}^n V(Y_i) \phi_j^2(x_i) = \frac{\sigma^2}{n^2} \sum_{i=1}^n \phi_j^2(x_i) = \frac{\sigma^2}{n} \cdot \frac{1}{n} \sum_{i=1}^n \phi_j^2(x_i) \approx \frac{\sigma^2}{n} \int \phi_j^2(x)\,dx = \frac{\sigma^2}{n}
rb(x) =
Let
R(J ) = E
Z
J X j =1
bj j (x):
(r(x)
rb(x))2 dx
be the risk of the estimator. P Theorem 22.9 The risk R(J ) of the estimator rbn (x) = Jj
=1
is
R(J ) =
1 J 2 X + 2: n j =J +1 j
bj j (x)
(22.25)
To estimate for = V ar(i ) we use 2
b2 =
n n X b2 k i=n k+1 j
(22.26)
where k = n=4. To motivate this estimator, recall that if f is smooth then j p0 for large j . So, for j k, bj N (0; =n). Thus, bj bj = n for for j k, where bj N (0; 1). Therefore, 2
b
2
=
n n X b2 k i=n k+1 j
n n X k i=n k+1
pn bj
2
406
22. Smoothing Using Orthogonal Functions
= =
n 2 X b2 k i=n k+1 j 2 2 k k
since a sum of k Normals has a k distribution. Now E (k ) = k and hence E (b ) . Also, V (k ) = 2k and hence V (b ) ( =k )(2k) = (2 =k) ! 0 as n ! 1. Thus we expect b to be a consistent estimator of . There is nothing special about the choice k = n=4. Any k that increases with n at an appropriate rate will suÆce. We estimate the risk with 2
2
4
2
2
2
2
2
4
2
2
n b2 X Rb(J ) = J + b2 n j =J +1 j
b2 n
: +
(22.27)
We are now ready to give a complete description of the method, which Beran (2000) calls REACT (Risk Estimation and Adaptation by Coordinate Transformation).
Orthogonal Series Regression Estimator

1. Let

   \hat\beta_j = \frac{1}{n} \sum_{i=1}^n Y_i \phi_j(x_i), \qquad j = 1, \ldots, n.

2. Let

   \hat\sigma^2 = \frac{n}{k} \sum_{j=n-k+1}^n \hat\beta_j^2    (22.28)

   where k \approx n/4.

3. For 1 \le J \le n, compute the risk estimate

   \hat R(J) = \frac{J \hat\sigma^2}{n} + \sum_{j=J+1}^n \left( \hat\beta_j^2 - \frac{\hat\sigma^2}{n} \right)_+.

4. Choose \hat J \in \{1, \ldots, n\} to minimize \hat R(J).

5. Let

   \hat r(x) = \sum_{j=1}^{\hat J} \hat\beta_j \phi_j(x).
Yi = r(xi ) + i N (0; (:1)2 ). Then gure shows the true
where xi = i=n, i function, the data, the estimated function and the estimated risk. The estimated gure was based on Jb = 234 terms.
Finally, we turn to con dence bands. As before, these bands are not really for the true function r(xP) but rather for the smoothed version of the function re(x) = Jjb j j (x). =1
FIGURE 22.4. The doppler test function: the true function, the data, the function estimate, and the estimated risk versus J.
Theorem 22.11 Suppose the estimate \hat r is based on J terms and \hat\sigma is defined as in equation (22.28). Assume that J < n - k + 1. An approximate 1 - \alpha confidence band for \tilde r is (\ell, u) where

\ell(x) = \hat r_n(x) - K \sqrt{J} \sqrt{ \frac{z_\alpha \hat\sigma}{\sqrt{n}} + \frac{\hat\sigma_J^2}{n} }    (22.29)

u(x) = \hat r_n(x) + K \sqrt{J} \sqrt{ \frac{z_\alpha \hat\sigma}{\sqrt{n}} + \frac{\hat\sigma_J^2}{n} }    (22.30)

where

K = \max_{1 \le j \le J} \max_x | \phi_j(x) |

and

\hat\sigma_J^2 = \frac{\hat\sigma^2}{n} \left( 1 + \frac{J}{4k} \right)    (22.31)

and k = n/4 is from equation (22.28). In the cosine basis, we may use K = \sqrt{2}.
Example 22.12 Figure 22.5 shows the confidence envelope for the doppler signal. The first plot is based on J = 234 (the value of J that minimizes the estimated risk). The second is based on J = 45 \approx \sqrt{n}. Larger J yields a higher resolution estimator at the cost of wider confidence bands. Smaller J yields a lower resolution estimator but has tighter confidence bands.

So far, we have assumed that the x_i's are of the form \{1/n, 2/n, \ldots, 1\}. If the x_i's are on an interval [a, b], then we can rescale them so that they are in the interval [0, 1]. If the x_i's are not equally spaced, the methods we have discussed still apply so long as the x_i's "fill out" the interval [0, 1] in such a way as not to be too clumped together. If we want to treat the x_i's as random instead of fixed, then the method needs significant modifications which we shall not deal with here.
FIGURE 22.5. The confidence envelope for the doppler test function using n = 2,048 observations. The top plot shows the estimate and envelope using J = 234 terms. The bottom plot shows the estimate and envelope using J = 45 terms.
22.4 Wavelets

Suppose there is a sharp jump in a regression function f at some point x but that f is otherwise very smooth. Such a function f is said to be spatially inhomogeneous. See Figure 22.6 for an example. It is hard to estimate f using the methods we have discussed so far. If we use a cosine basis and only keep low order terms, we will miss the peak; if we allow higher order terms, we will find the peak but we will make the rest of the curve very wiggly. Similar comments apply to kernel regression. If we use a large bandwidth, then we will smooth out the peak; if we use a small bandwidth, then we will find the peak but we will make the rest of the curve very wiggly. One way to estimate inhomogeneous functions is to use a more carefully chosen basis that allows us to place a "blip" in some small region without adding wiggles elsewhere. In this section, we describe a special class of bases called wavelets that are aimed at fixing this problem. Statistical inference using wavelets is a large and active area. We will just discuss a few of the main ideas to get a flavor of this approach.

We start with a particular wavelet called the Haar wavelet. The Haar father wavelet or Haar scaling function is defined by

\phi(x) = \begin{cases} 1 & \text{if } 0 \le x < 1 \\ 0 & \text{otherwise.} \end{cases}    (22.32)

The mother Haar wavelet is defined by

\psi(x) = \begin{cases} -1 & \text{if } 0 \le x \le \tfrac{1}{2} \\ 1 & \text{if } \tfrac{1}{2} < x \le 1. \end{cases}    (22.33)

For any integers j and k define

\phi_{j,k}(x) = 2^{j/2} \phi(2^j x - k) \quad \text{and} \quad \psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k).    (22.34)

The function \psi_{j,k} has the same shape as \psi but it has been rescaled by a factor of 2^{j/2} and shifted by a factor of k.
FIGURE 22.6. An inhomogeneous function. The function is smooth except for a sharp peak in one place.
See Figure 22.7 for some examples of Haar wavelets. Notice that for large j, \psi_{j,k} is a very localized function. This makes it possible to add a blip to a function in one place without adding wiggles elsewhere. Increasing j is like looking in a microscope at increasing degrees of resolution. In technical terms, we say that wavelets provide a multiresolution analysis of L_2(0, 1). Let W_j = \{ \psi_{j,k},\ k = 0, 1, \ldots, 2^j - 1 \} be the set of rescaled and shifted mother wavelets at resolution j.

Theorem 22.13 The set of functions

\{ \phi, W_0, W_1, W_2, \ldots \}

is an orthonormal basis for L_2(0, 1).

It follows from this theorem that we can expand any function f \in L_2(0, 1) in this basis. Because each W_j is itself a set of functions, we write the expansion as a double sum:

f(x) = \alpha\, \phi(x) + \sum_{j=0}^\infty \sum_{k=0}^{2^j - 1} \beta_{j,k}\, \psi_{j,k}(x)    (22.35)

where

\alpha = \int_0^1 f(x) \phi(x)\,dx, \qquad \beta_{j,k} = \int_0^1 f(x) \psi_{j,k}(x)\,dx.

We call \alpha the scaling coefficient, and the \beta_{j,k}'s are called the detail coefficients. We call the finite sum

\tilde f(x) = \alpha\, \phi(x) + \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} \beta_{j,k}\, \psi_{j,k}(x)    (22.36)
FIGURE 22.7. Some Haar wavelets. Top left: the father wavelet \phi(x); top right: the mother wavelet \psi(x); bottom left: \psi_{2,2}(x); bottom right: \psi_{4,10}(x).
the resolution J approximation to f. The total number of terms in this sum is

1 + \sum_{j=0}^{J-1} 2^j = 1 + 2^J - 1 = 2^J.

Example 22.14 Figure 22.8 shows the doppler signal and its resolution J approximation for J = 3, 5 and 8.

Haar wavelets are localized, meaning that they are zero outside an interval. But they are not smooth. This raises the question of whether there exist smooth, localized wavelets that form an orthonormal basis. In 1988, Ingrid Daubechies showed that such wavelets do exist. These smooth wavelets are difficult to describe. They can be constructed numerically, but there is no closed form formula for the smoother wavelets. To keep things simple, we will continue to use Haar wavelets. For your interest, Figure 22.9 shows an example of a smooth wavelet called the "symmlet 8."
FIGURE 22.8. The doppler signal and its reconstruction \tilde f(x) = \alpha\phi(x) + \sum_{j=0}^{J-1}\sum_k \beta_{j,k}\psi_{j,k}(x) based on J = 3, J = 5 and J = 8.

FIGURE 22.9. The symmlet 8 wavelet. The top plot is the mother and the bottom plot is the father.
418
22. Smoothing Using Orthogonal Functions
Haar Wavelet Regression
1. Let J = log (n) and de ne 1 X (x )Y and D = 1 X b = k i i j;k n n 2
i
j;k
i
(xi )Yi (22.37)
for 0 j J 1. 2. Estimate by p median j DJ ;k j : k = 0; : : : ; 2J b = n 0:6745 1
3. Apply universal thresholding: bj;k =
4. Set
(
Dj;k
0
if jDj;kj > b otherwise:
( ) = b(x) +
fb x
q
j 1 J 1 2X X
j =j0 k=0
2 log
n
bj;k
n
)
j;k
1
1 : (22.38) (22.39)
(x):
In practice, we do not compute Sk and Dj;k using (22.37). Instead, we use the discrete wavelet transform (DWT) which is very fast. For Haar wavelets, the DWT works as follows.
22.4 Wavelets
419
DWT for Haar Wavelets
Let y be the vector of Yi's (length n) and let J = log (n). Create a list D with elements D[[0]]; : : : ; D[[J 1]]: Set: p 2
temp < y= n:
Then do: for(j m I D[[j ]]
in < < <
temp <
(J 1) : 0)f 2j (1 : m) p temp[2 I ] temp[(2 I ) 1] = 2 p temp[2 I ] + temp[(2 I ) 1] = 2 g
Remark 22.15 If you use a programming language that does not allow a 0 index for a list, then index the list as
D[[1]]; : : : ; D[[J ]]
and replace the second last line with
p 1] = 2 The estimate for probably looks strange. It is similar to the estimate we used for the cosine basis but it is designed to be insensitive to sharp peaks in the function. To understand the intuition behind universal thresholding, consider what happens when there is no signal, that is, when j;k = 0 for all j and k.
D[[j + 1]] < temp[2 I ] temp[(2 I )
420
22. Smoothing Using Orthogonal Functions
22.16
Theorem Suppose that j;k = 0 for all j and k and let bj;k be the universal threshold estimator. Then
(
P bj;k
= 0 for all j; k) ! 1
as n ! 1.
To simplify the proof, assume that is known. Now Dj;k N (0; =n). We will need Mill's inequality : if Z p t = N (0; 1) then P(jZ j > t) (c=t)e where c = 2= is a constant. Thus, X P(max jDj;k j > ) P(jDj;k j > ) j;k p pn X njDj;k j > P Proof.
2
2 2
=
j;k X
c pn exp j;k c p2 log ! 0: n
1 n 2
2
2
Example 22.17 Consider Yi = r(xi ) + i where f is the doppler signal, = :1 and n = 2048. Figure 22.10 shows the data and the estimated function using universal thresholding. Of course, the estimate is not smooth since Haar wavelets are not smooth. Nonetheless, the estimate is quite accurate.
22.5
Bibliographic Remarks
A reference for orthogonal function methods is Efromovich (1999). See also Beran (2000) and Beran and Dumbgen (1998). An introduction to wavelets is given in Ogden (1997). A more advanced treatment can be found in Hardle, Kerkyacharian, Picard and Tsybakov (1998). The theory of statistical estimation
421
−0.5
0.0
0.5
22.5 Bibliographic Remarks
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
−0.4
−0.2
0.0
0.2
0.4
0.0
FIGURE 22.10. Estimate of the Doppler function using Haar wavelets and universal thresholding.
422
22. Smoothing Using Orthogonal Functions
using wavelets has been developed by many authors, especially David Donoho and Ian Johnstone. There is a series of papers especially worth reading: Donoho and Johnstone (1994), Donoho and Johnstone (1995a), Donoho and Johnstone (1995b), and Donoho and Johnstone (1998). 22.6
Exercises
1. Prove Theorem 22.5. 2. Prove Theorem 22.9. 3. Let 1 1 1 1 1 1 1 2 = p3 ; p3 ; p3 ; = p2 ; p2 ; 0 ; = p6 ; p6 ; p6 : Show that these vectors have norm 1 and are orthogonal. 4. Prove Parseval's relation equation (22.6). 5. Plot the rst ve Legendre polynomials. Very, numerically, that they are orthonormal. 6. Expand the following functions in the cosine basis on [0; 1]. For (a) and (b), nd the coeÆcients j analytically. For (c) and (d), nd the coeÆcients j numerically, i.e. 1
2
j =
Z
1 0
3
N r r X 1 f f (x)j (x) N N j N r=1
for some large integer N . Then plot the partial sum Pnj j j (x) for increasing values of n. p (a) f (x) = 2cos(3x). (b) f (x) = sin(x). (c) f (x) = Pj hj K (x tj ) where K (t) = (1+sign(t))=2, =1
11 =1
22.6 Exercises
423
(tj ) = (:1; :13; :15; :23; :25; :40; :44; :65; :76; :78; :81), (hj ) = (4; 5; 3; 4; 5; 4:2; 2:1; 4:3; 3:1; 2:1; 4:2). p (d) f = x(1 x) sin x : : : 7. Consider the glass fragments data. Let Y be refractive index and let X be aluminium content (the fourth variable). (a) Do a nonparametric regression to t the model Y = f (x)+ using the cosine basis method. The data are not on a regular grid. Ignore this when estimating the function. (But do sort the data rst.) Provide a function estimate, an estimate of the risk and a con dence band. (b) Use the wavelet method to estimate f . 8. Show that the Haar wavelets are orthonormal. 9. Consider again the doppler signal: p 2 :1 f (x) = x(1 x) sin : x + 0:05 21 ( + 05)
Let n = 1024 and let (x ; : : : ; xn ) = (1=n; : : : ; 1). Generate data Yi = f (xi ) + i where i N (0; 1). (a) Fit the curve using the cosine basis method. Plot the function estimate and con dence band for J = 10; 20; : : : ; 100. (b) Use Haar wavelets to t the curve. 10. (Haar density Estimation.) Let X ; : : : ; Xn f for some density f on [0; 1]. Let's consider constructing a wavelet histogram. Let and be the Haar father and mother wavelet. Write 1
1
f (x) (x) +
j 1 J 1 2X X
j =0 k=0
j;k
j;k
(x)
424
22. Smoothing Using Orthogonal Functions
where J log (n). Let 2
bj;k
n X 1 =
n
i=1
j;k
(Xi):
(a) Show that bj;k is an unbiased estimate of j;k. (b) De ne the Haar histogram ( ) = (x) +
fb x
j 1 B 2X X
j =0 k=0
bj;k
j;k
(x)
for 0 B J 1. (c) Find an approximate expression for the MSE as a function of B . (d) Generate n = 1000 observations from a Beta (15,4) density. Estimate the density using the Haar histogram. Use leave-one-out cross vaidation to choose B . 11. In this question, we will explore the motivation for equation (22.38). Let X ; : : : ; Xn N (0; ). Let p median(jX j; : : : ; jXnj) b = n : 0:6745 (a) Show that E (b) = . (b) Simulate n = 100 observations from a N(0,1) distribution. Compute b as well as the usual estimate of . Repeat 1000 times and compare the MSE. (c) Repeat (b) but add some outliers to the data. To do this, simulate each observation from a N(0,1) with probability .95 and simulate each observation from a N(0,10) with probability .95. 12. Repeat question 6 using the Haar basis. 2
1
1
This is page 425 Printer: Opaque this
23 Classi cation
23.1 Introduction

The problem of predicting a discrete random variable Y from another random variable X is called classification, supervised learning, discrimination or pattern recognition. In more detail, consider iid data (X_1, Y_1), \ldots, (X_n, Y_n) where X_i = (X_{i1}, \ldots, X_{id}) \in \mathcal{X} \subset \mathbb{R}^d is a d-dimensional vector and Y_i takes values in some finite set \mathcal{Y}. A classification rule is a function h : \mathcal{X} \to \mathcal{Y}. When we observe a new X, we predict Y to be h(X).

Example 23.1 Here is an example with fake data. Figure 23.1 shows 100 data points. The covariate X = (X_1, X_2) is 2-dimensional and the outcome Y \in \mathcal{Y} = \{0, 1\}. The Y values are indicated on the plot with the triangles representing Y = 1 and the squares representing Y = 0. Also shown is a linear classification rule represented by the solid line. This is a rule of the form

h(x) = \begin{cases} 1 & \text{if } a + b_1 x_1 + b_2 x_2 > 0 \\ 0 & \text{otherwise.} \end{cases}
426
23. Classi cation
Everything above the line is classified as a 0 and everything below the line is classified as a 1.

Example 23.2 The Coronary Risk-Factor Study (CORIS) data involve 462 males between the ages of 15 and 64 from three rural areas in South Africa (Rousseauw et al. 1983, Hastie and Tibshirani 1987). The outcome Y is the presence (Y = 1) or absence (Y = 0) of coronary heart disease. There are 9 covariates: systolic blood pressure, cumulative tobacco (kg), ldl (low density lipoprotein cholesterol), adiposity, famhist (family history of heart disease), typea (type-A behavior), obesity, alcohol (current alcohol consumption), and age. Figure 23.2 shows the outcomes and two of the covariates, systolic blood pressure and tobacco consumption. People with and without heart disease are coded with triangles and squares. Also shown is a linear decision boundary, which we will explain shortly. In this example, the groups are much harder to tell apart. In fact, 141 people are misclassified using this classification rule.
involve 462 males between the ages of 15 and 64 from three rural areas in South Africa (Rousseauw et al 1983, Hastie and Tibshirani, 1987). The outcome Y is the presence (Y = 1) or absence (Y = 0) of coronary heart disease. There are 9 covariates: systolic blood pressure, cumulative tobacco (kg), ldl (low density lipoprotein cholesterol), adiposity, famhist (family history of heart disease), typea (type-A behavior), obesity, alcohol (current alcohol consumption), and age. Figure 23.2 shows the outcomes and two of the covariates, systolic blood pressure and tobaco consumption. People with/without heart disease are coded with triangles and squares. Also shown is a linear decision boundary which we will explain shortly. In this example, the groups are much harder to tell apart. In fact, 141 people are misclassi ed using this classi cation rule.
At this point, it is worth revisiting the Statistics/Data Mining dictionary: Statistics classi cation data covariates classi er estimation
Computer Science supervised learning training sample features hypothesis learning
Meaning predicting a discrete Y from X (X1 ; Y1 ); : : : ; (Xn; Yn) the Xi 's map h : X ! Y nding a good classi er
In most cases in this Chapter, we deal with the case f0; 1g.
Y
=
427
0.0
0.2
0.4
x2
0.6
0.8
1.0
23.1 Introduction
0.0
0.2
0.4
0.6 x1
FIGURE 23.1. Two covariates and a linear decision boundary. 4 means Y = 1. means Y = 0. You won't see real data like this.
0.8
1.0
FIGURE 23.2. Two covariates and a linear decision boundary with data from the Coronary Risk-Factor Study (CORIS). Triangles denote Y = 1; squares denote Y = 0. This is real life.
23.2 Error Rates and The Bayes Classifier

Our goal is to find a classification rule h that makes accurate predictions. We start with the following definitions. (One can use other measures of error as well.)

Definition 23.3 The true error rate of a classifier h is

L(h) = P\bigl( \{ h(X) \ne Y \} \bigr)    (23.1)

and the empirical error rate or training error rate is

\hat L_n(h) = \frac{1}{n} \sum_{i=1}^n I\bigl( h(X_i) \ne Y_i \bigr).    (23.2)

First we consider the special case where \mathcal{Y} = \{0, 1\}. Let

r(x) = E(Y | X = x) = P(Y = 1 | X = x)

denote the regression function. From Bayes' theorem we have that

r(x) = \frac{ \pi f_1(x) }{ \pi f_1(x) + (1 - \pi) f_0(x) }    (23.3)

where

f_0(x) = f(x | Y = 0), \qquad f_1(x) = f(x | Y = 1)

and \pi = P(Y = 1).

Definition 23.4 The Bayes classification rule h^* is defined to be

h^*(x) = \begin{cases} 1 & \text{if } r(x) > \tfrac{1}{2} \\ 0 & \text{otherwise.} \end{cases}    (23.4)

The set D(h) = \{ x : P(Y = 1 | X = x) = P(Y = 0 | X = x) \} is called the decision boundary.
430
23. Classi cation
Warning! The Bayes rule has nothing to do with Bayesian inference. We could estimate the Bayes rule using either frequentist or Bayesian methods. The Bayes rule may be written in several equivalent forms:

h^*(x) = \begin{cases} 1 & \text{if } P(Y = 1 | X = x) > P(Y = 0 | X = x) \\ 0 & \text{otherwise} \end{cases}    (23.5)

and

h^*(x) = \begin{cases} 1 & \text{if } \pi f_1(x) > (1 - \pi) f_0(x) \\ 0 & \text{otherwise.} \end{cases}    (23.6)

Theorem 23.5 The Bayes rule is optimal; that is, if h is any other classification rule then L(h^*) \le L(h).
The Bayes rule depends on unknown quantities, so we need to use the data to find some approximation to the Bayes rule. At the risk of oversimplifying, there are three main approaches:

1. Empirical Risk Minimization. Choose a set of classifiers \mathcal{H} and find \hat h \in \mathcal{H} that minimizes some estimate of L(h).

2. Regression. Find an estimate \hat r of the regression function r and define

   \hat h(x) = \begin{cases} 1 & \text{if } \hat r(x) > \tfrac{1}{2} \\ 0 & \text{otherwise.} \end{cases}

3. Density Estimation. Estimate f_0 from the X_i's for which Y_i = 0, estimate f_1 from the X_i's for which Y_i = 1 and let \hat\pi = \frac{1}{n} \sum_{i=1}^n Y_i. Define

   \hat r(x) = \hat P(Y = 1 | X = x) = \frac{ \hat\pi \hat f_1(x) }{ \hat\pi \hat f_1(x) + (1 - \hat\pi) \hat f_0(x) }

   and

   \hat h(x) = \begin{cases} 1 & \text{if } \hat r(x) > \tfrac{1}{2} \\ 0 & \text{otherwise.} \end{cases}
Now let us generalize to the case where Y takes on more than two values, as follows.

Theorem 23.6 Suppose that Y \in \mathcal{Y} = \{1, \ldots, K\}. The optimal rule is

h(x) = \operatorname{argmax}_k P(Y = k | X = x)    (23.7)
     = \operatorname{argmax}_k \pi_k f_k(x)    (23.8)

where

P(Y = k | X = x) = \frac{ f_k(x) \pi_k }{ \sum_r f_r(x) \pi_r },    (23.9)

\pi_r = P(Y = r), f_r(x) = f(x | Y = r) and \operatorname{argmax}_k means "the value of k that maximizes that expression."
23.3 Gaussian and Linear Classifiers

Perhaps the simplest approach to classification is to use the density estimation strategy and assume a parametric model for the densities. Suppose that \mathcal{Y} = \{0, 1\} and that f_0(x) = f(x | Y = 0) and f_1(x) = f(x | Y = 1) are both multivariate Gaussians:

f_k(x) = \frac{1}{ (2\pi)^{d/2} |\Sigma_k|^{1/2} } \exp\left\{ -\frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) \right\}, \qquad k = 0, 1.

Thus, X | Y = 0 \sim N(\mu_0, \Sigma_0) and X | Y = 1 \sim N(\mu_1, \Sigma_1).
432
23. Classi cation
Theorem 23.7 If X | Y = 0 \sim N(\mu_0, \Sigma_0) and X | Y = 1 \sim N(\mu_1, \Sigma_1), then the Bayes rule is

h^*(x) = \begin{cases} 1 & \text{if } r_1^2 < r_0^2 + 2\log\left( \frac{\pi_1}{\pi_0} \right) + \log\left( \frac{|\Sigma_0|}{|\Sigma_1|} \right) \\ 0 & \text{otherwise} \end{cases}    (23.10)

where

r_i^2 = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i), \qquad i = 0, 1,    (23.11)

is the Mahalanobis distance. An equivalent way of expressing the Bayes rule is

h(x) = \operatorname{argmax}_k \delta_k(x)

where

\delta_k(x) = -\frac{1}{2} \log |\Sigma_k| - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k    (23.12)

and |A| denotes the determinant of a matrix A.

The decision boundary of the above classifier is quadratic, so this procedure is often called quadratic discriminant analysis (QDA). In practice, we use sample estimates of \pi_0, \pi_1, \mu_0, \mu_1, \Sigma_0, \Sigma_1 in place of the true values, namely:

\hat\pi_0 = \frac{1}{n} \sum_{i=1}^n (1 - Y_i), \qquad \hat\pi_1 = \frac{1}{n} \sum_{i=1}^n Y_i,

\hat\mu_0 = \frac{1}{n_0} \sum_{i:\, Y_i = 0} X_i, \qquad \hat\mu_1 = \frac{1}{n_1} \sum_{i:\, Y_i = 1} X_i,

S_0 = \frac{1}{n_0} \sum_{i:\, Y_i = 0} (X_i - \hat\mu_0)(X_i - \hat\mu_0)^T, \qquad S_1 = \frac{1}{n_1} \sum_{i:\, Y_i = 1} (X_i - \hat\mu_1)(X_i - \hat\mu_1)^T

where n_0 = \sum_i (1 - Y_i) and n_1 = \sum_i Y_i.
A simplification occurs if we assume that \Sigma_0 = \Sigma_1 = \Sigma. In that case, the Bayes rule is

h(x) = \operatorname{argmax}_k \delta_k(x)    (23.13)

where now

\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k.    (23.14)

The parameters are estimated as before, except that the mle of \Sigma is

S = \frac{ n_0 S_0 + n_1 S_1 }{ n_0 + n_1 }.

The classification rule is

h^*(x) = \begin{cases} 1 & \text{if } \delta_1(x) > \delta_0(x) \\ 0 & \text{otherwise} \end{cases}    (23.15)

where

\delta_j(x) = x^T S^{-1} \hat\mu_j - \frac{1}{2} \hat\mu_j^T S^{-1} \hat\mu_j + \log \hat\pi_j

is called the discriminant function. The decision boundary \{ x : \delta_0(x) = \delta_1(x) \} is linear, so this method is called linear discriminant analysis (LDA).
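In R, LDA and QDA of this form are available in the MASS package. A rough sketch on the heart-disease data follows (the data frame chd.data and the 0/1 outcome column chd are our assumed names, not the book's):

    library(MASS)
    # Linear and quadratic discriminant analysis on two covariates.
    fit.lda <- lda(factor(chd) ~ sbp + tobacco, data = chd.data)
    fit.qda <- qda(factor(chd) ~ sbp + tobacco, data = chd.data)
    # Training (empirical) error rate of the linear rule:
    mean(predict(fit.lda)$class != factor(chd.data$chd))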
Example 23.8 Let us return to the South African heart disease data. The decision rule in Figure 23.2 was obtained by linear discrimination. The outcome was

            classified as 0   classified as 1
   y = 0         277                25
   y = 1         116                44

The observed misclassification rate is 141/462 = .31. Including all the covariates reduces the error rate to .27. The results from quadratic discrimination are

            classified as 0   classified as 1
   y = 0         272                30
   y = 1         113                47

which has about the same error rate, 143/462 = .31. Including all the covariates reduces the error rate to .26. In this example, there is little advantage to QDA over LDA.
Now we generalize to the case where Y takes on more than two values.

Theorem 23.9 Suppose that Y \in \{1, \ldots, K\}. If f_k(x) = f(x | Y = k) is Gaussian, the Bayes rule is

h(x) = \operatorname{argmax}_k \delta_k(x)

where

\delta_k(x) = -\frac{1}{2} \log |\Sigma_k| - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k.    (23.16)

If the variances of the Gaussians are equal, then

\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k.    (23.17)

We estimate \delta_k(x) by inserting estimates of \mu_k, \Sigma_k and \pi_k.

There is another version of linear discriminant analysis due to Fisher. The idea is to first reduce the dimension of the covariates to one dimension by projecting the data onto a line. Algebraically, this means replacing the covariate X = (X_1, \ldots, X_d) with a linear combination U = w^T X = \sum_{j=1}^d w_j X_j. The goal is to choose the vector w = (w_1, \ldots, w_d) that "best separates the data." Then we perform classification with the new covariate U instead of X. We need to define what we mean by separation of the groups. We would like the two groups to have means that are far apart relative to their spread. Let \mu_j denote the mean of X for Y = j and let \Sigma be the variance matrix of X. Then E(U | Y = j) = E(w^T X | Y = j) = w^T \mu_j and V(U) = w^T \Sigma w. Define the separation by

J(w) = \frac{ \bigl( E(U | Y = 0) - E(U | Y = 1) \bigr)^2 }{ w^T \Sigma w }
     = \frac{ ( w^T \mu_0 - w^T \mu_1 )^2 }{ w^T \Sigma w }
     = \frac{ w^T (\mu_0 - \mu_1)(\mu_0 - \mu_1)^T w }{ w^T \Sigma w }.

(The quantity J arises in physics, where it is called the Rayleigh coefficient.) We estimate J as follows. Let n_j = \sum_{i=1}^n I(Y_i = j) be the number of observations in group j, let \bar X_j be the sample mean vector of the X's for group j, and let S_j be the sample covariance matrix in group j. Define

\hat J(w) = \frac{ w^T S_B w }{ w^T S_W w }    (23.18)

where

S_B = (\bar X_0 - \bar X_1)(\bar X_0 - \bar X_1)^T, \qquad S_W = \frac{ (n_0 - 1) S_0 + (n_1 - 1) S_1 }{ (n_0 - 1) + (n_1 - 1) }.

Theorem 23.10 The vector

w = S_W^{-1} ( \bar X_0 - \bar X_1 )    (23.19)

maximizes \hat J(w). We call

U = w^T X = ( \bar X_0 - \bar X_1 )^T S_W^{-1} X    (23.20)

the Fisher linear discriminant function. The midpoint m between \bar X_0 and \bar X_1 is

m = \frac{1}{2} w^T ( \bar X_0 + \bar X_1 ) = \frac{1}{2} ( \bar X_0 - \bar X_1 )^T S_W^{-1} ( \bar X_0 + \bar X_1 ).    (23.21)

Fisher's classification rule is

h(x) = \begin{cases} 0 & \text{if } w^T X \ge m \\ 1 & \text{if } w^T X < m \end{cases}
     = \begin{cases} 0 & \text{if } ( \bar X_0 - \bar X_1 )^T S_W^{-1} x \ge m \\ 1 & \text{if } ( \bar X_0 - \bar X_1 )^T S_W^{-1} x < m. \end{cases}

Fisher's rule is the same as the Bayes linear classifier in equation (23.14) when \hat\pi = 1/2.
23.4 Linear Regression and Logistic Regression A more direct approach to classi cation is to estimate the regression function r(x) = E (Y jX = x) without bothering to estimste the densities fk . For the rest of this section, we will only consider the case where Y = f0; 1g. Thus, r(x) = P(Y = 1jX = x) and once we have an estimate rb, we will use the classi cation rule if rb(x) > 21 h(x) = 01 otherwise (23.22) : The simplest regression model is the linear regression model
Y = r(x) + = 0 +
d X j =1
j Xj +
(23.23)
where E () = 0. This model can't be correct since it does not force Y = 0 or 1. Nonetheless, it can sometimes lead to a decent classi er. Recall that the least squares estimate of = ( 0 ; 1 ; : : : ; d )T minimizes the residual sums of squares
RSS ( ) =
n X i=1
Yi
0
d X j =1
2
Xij j :
Let us brie y review what that estimator is. Let X denote the N (d + 1) matrix of the form 2 6
X=6 6 4
1 1 .. . 1
X11 X21 .. . Xn1
::: ::: .. . :::
X1d X2d .. . Xnd
3 7 7 7: 5
Also let Y = (Y1 ; : : : ; Yn)T . Then,
RSS ( ) = (Y
X )T (Y X )
23.4 Linear Regression and Logistic Regression
437
and the model can be written as
Y = X + where = (1 ; : : : ; n )T . The least squares solution b that minimizes RSS can be found by elementary calculus and is given by b = (XT X) 1 XT Y: The predicted values are b b = X : Y P
Now we use (23.22) to classify, where rb(x) = b0 + j bj xj . It seems sensible to use a regression method that takes into account that Y 2 f0; 1g. The most common method for doing so is logistic regression. Recall that the model is
P
e 0 + P x : r(x) = P (Y = 1jX = x) = 1 + e 0 + x j
j
j
j
j
j
(23.24)
We may write this is logit P (Y = 1jX = x) = 0 +
X j
j xj
where logit (a) = log(a=(1 a)). Under this model, each Yi is a Bernoulli with success probability
P
e 0 + P X pi ( ) = : 1 + e 0 + X j
j
ij
j
j
ij
The likelihood function for the data set is
L( ) =
n Y i=1
pi ( )Y (1 pi ( ))1
We obtain the mle numerically.
i
Yi :
438
23. Classi cation
Example 23.11 Let us return to the heart disease data. A logistic
regression yields the following estimates and Wald tests for the coeÆcients: Covariate Estimate Std. Error Wald statistic p-value Intercept -6.145 1.300 -4.738 0.000 sbp 0.007 0.006 1.138 0.255 tobacco 0.079 0.027 2.991 0.003 ldl 0.174 0.059 2.925 0.003 adiposity 0.019 0.029 0.637 0.524 famhist 0.925 0.227 4.078 0.000 typea 0.040 0.012 3.233 0.001 obesity -0.063 0.044 -1.427 0.153 alcohol 0.000 0.004 0.027 0.979 age 0.045 0.012 3.754 0.000 Are surprised by the fact that systolic blood pressure is not signi cant or by the minus sign for the obesity coeÆcient? If yes, then you are confusing correlation and causation. There are lots of unobserved confounding variables so the coeÆcients above cannot be interpreted causally. The fact that blood pressure is not signi cant does not mean that blood pressure is not an important cause of heart disease. It means that it is not an important predictor of heart disease relative to the other variables in the model. The error rate, using this model for classi cation, is .27. The error rate from a linear regression is .26.
We can get a better classi er by tting a richer model. For example we could t X X logit P (Y = 1jX = x) = 0 + j xj + jk xj xk : (23.25) j
j;k
More generally, we could add terms of up to order r for some integer r. Large values of r give a more complicated model which should t the data better. But there is a bias-variance tradeo which we'll discuss later. Example 23.12 If we use model (23.25) for the heart disease data
with r = 2, the error rate is reduced to .22.
23.5 Relationship Between Logistic Regression and LDA
439
The logistic regression model can easily be extended to k groups but we shall not give the details here.
23.5 Relationship Between Logistic Regression and LDA LDA and logistic regression are almost the same thing. If we assume that each group is Gaussian with the same covariance matrix then we saw that P(Y = 1jX = x) 1 = log 0 ( + 1 )T 1 (1 0 ) + xT 1 (1 log P(Y = 0jX = x) 1 2 0 0 + T x: On the other hand, the logistic model is, by assumption, P(Y = 1jX = x) = 0 + T x: log P(Y = 0jX = x) These are the same model since they both lead to classi cation rules that are linear in x. The dierence is in how we estimate the parameters. The joint density of a single observation is f (x; y ) = f (xjy )f (y ) = f (y jx)f (x). In LDA we estimated the whole joint distribution by estimating f (xjy ) and f (y ); speci cally, we estimated fk (x) = f (xjY = k), 1 = fY (1) and 0 = fY (0). We maximized the likelihood Y i
f (xi ; yi) =
Y |
i
f (xi jyi) {z
Y i
f (yi ) :
(23.26)
} | {z }
Gaussian
Bernoulli
In logistic regression we maximized the conditional likelihood Q i f (yi jxi ) but we ignored the second term f (xi ): Y i
f (xi ; yi) =
Y |
i
f (yi jxi ) {z
logistic
Y i
f (xi ) :
} | {z
ignored
}
(23.27)
0 )
440
23. Classi cation
Since classi cation only requires knowing f (y jx), we don't really need to estimate the whole joint distribution. Here is an analogy: we can estimate the mean of a distribution with the sample mean X without bothering to estimate the whole density function. Logistic regression leaves the marginal distribution f (x) un-speci ed so it is more nonparametric than LDA. To summarize: LDA and logistic regression both lead to a linear classi cation rule. In LDA we estimate the entire joint distribution f (x; y ) = f (xjy )f (y ). In logistic regression we only estimate f (y jx) and we don't bother estimating f (x).
23.6 Density Estimation and Naive Bayes Recall that the Bayes rule is h(x) = argmaxk k fk (x): If we can estimate k and fk then we can estimate the Bayes classi cation rule. Estimating k is easy but what about fk ? We did this previously by assuming fk was Gaussian. Another strategy is to estimate fk with some nonparametric density estimator fbk such as a kernel estimator. But if x = (x1 ; : : : ; xd ) is high dimensional, nonparametric density estimation is not very reliable. This problem is ameliorated if we assume that X1 ; : : : ; Xd Qd are independent, for then, fk (x1 ; : : : ; xd ) = j =1 fkj (xj ). This reduces the problem to d one-dimensional density estimation problems, within each of the k groups. The resulting classi er is called the naive Bayes classi er. The assumption that the components of X are independent is usually wrong yet the resulting classi er might still be accurate. Here is a summary of the steps in the naive Bayes classi er:
The Naive Bayes Classifier

1. For each group k, compute an estimate $\hat f_{kj}$ of the density $f_{kj}$ for $X_j$, using the data for which $Y_i = k$.

2. Let
\[
\hat f_k(x) = \hat f_k(x_1, \ldots, x_d) = \prod_{j=1}^d \hat f_{kj}(x_j).
\]

3. Let
\[
\hat\pi_k = \frac{1}{n}\sum_{i=1}^n I(Y_i = k)
\]
where $I(Y_i = k) = 1$ if $Y_i = k$ and $I(Y_i = k) = 0$ if $Y_i \neq k$.

4. Let
\[
h(x) = \mathrm{argmax}_k\, \hat\pi_k \hat f_k(x).
\]

The naive Bayes classifier is especially popular when x is high dimensional and discrete. In that case, $\hat f_{kj}(x_j)$ is especially simple.
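As a rough sketch (an illustration, not the book's code), the steps above can be implemented with one-dimensional kernel density estimates; here X is an n x d numeric matrix and Y a vector of 0's and 1's, both assumed names.

   naive.bayes <- function(X, Y, xnew) {
     score <- numeric(2)
     for (k in 0:1) {
       pik <- mean(Y == k)                        # step 3: estimate of pi_k
       fk  <- 1
       for (j in 1:ncol(X)) {                     # step 2: product of 1-d estimates
         dens <- density(X[Y == k, j])            # step 1: kernel estimate of f_kj
         fk   <- fk * approx(dens$x, dens$y, xout = xnew[j], rule = 2)$y
       }
       score[k + 1] <- pik * fk
     }
     which.max(score) - 1                         # step 4: argmax_k pi_k f_k(x)
   }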
23.7 Trees

Trees are classification methods that partition the covariate space X into disjoint pieces and then classify the observations according to which partition element they fall in. As the name implies, the classifier can be represented as a tree. For illustration, suppose there are two covariates, $X_1$ = age and $X_2$ = blood pressure. Figure 23.3 shows a classification tree using these variables. The tree is used in the following way. If a subject has Age $\geq$ 50 then we classify him as Y = 1. If a subject has Age < 50 then we check his blood pressure. If blood pressure is < 100
FIGURE 23.3. A simple classification tree.
then we classify him as Y = 1, otherwise we classify him as Y = 0. Figure 23.4 shows the same classifier as a partition of the covariate space. Here is how a tree is constructed. For simplicity, we focus on the case in which $Y \in \{0, 1\}$. First, suppose there is only a single covariate X. We choose a split point t that divides the real line into two sets $A_1 = (-\infty, t]$ and $A_2 = (t, \infty)$. Let $\hat p_s(j)$ be the proportion of observations in $A_s$ such that $Y_i = j$:
\[
\hat p_s(j) = \frac{\sum_{i=1}^n I(Y_i = j, X_i \in A_s)}{\sum_{i=1}^n I(X_i \in A_s)} \tag{23.28}
\]
for s = 1, 2 and j = 0, 1. The impurity of the split t is defined to be
\[
I(t) = \sum_{s=1}^2 \gamma_s \tag{23.29}
\]
where
\[
\gamma_s = 1 - \sum_{j=0}^1 \hat p_s(j)^2. \tag{23.30}
\]
FIGURE 23.4. Partition representation of the classification tree.
This measure of impurity is known as the Gini index. If a partition element $A_s$ contains all 0's or all 1's, then $\gamma_s = 0$. Otherwise, $\gamma_s > 0$. We choose the split point t to minimize the impurity. (Other indices of impurity can be used besides the Gini index.) When there are several covariates, we choose whichever covariate and split leads to the lowest impurity. This process is continued until some stopping criterion is met. For example, we might stop when every partition element has fewer than $n_0$ data points, where $n_0$ is some fixed number. The bottom nodes of the tree are called the leaves. Each leaf is assigned a 0 or 1 depending on whether there are more data points with Y = 0 or Y = 1 in that partition element. This procedure is easily generalized to the case where $Y \in \{1, \ldots, K\}$: we simply define the impurity by
\[
\gamma_s = 1 - \sum_{j=1}^K \hat p_s(j)^2 \tag{23.31}
\]
where $\hat p_s(j)$ is the proportion of observations in the partition element for which Y = j.

Example 23.13 Figure 23.5 shows a classification tree for the
heart disease data. The misclassification rate is .21. Suppose we do the same example but only using two covariates: tobacco and age. The misclassification rate is then .29. The tree is shown in Figure 23.6. Because there are only two covariates, we can also display the tree as a partition of $X = \mathbb{R}^2$, as shown in Figure 23.7.
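For what it is worth, here is a hedged sketch of how such a tree might be fit in R with the tree package (the same package used in the exercises); the data frame name sa and the outcome name chd are assumptions, and the exact error rates depend on the data and the stopping rule.

   library(tree)
   sa$chd <- as.factor(sa$chd)
   fit  <- tree(chd ~ ., data = sa)
   plot(fit); text(fit)
   pred <- predict(fit, sa, type = "class")
   mean(pred != sa$chd)              # observed misclassification rate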
Our description of how to build trees is incomplete. If we keep splitting until there are few cases in each leaf of the tree, we are likely to overfit the data. We should choose the complexity of the tree in such a way that the estimated true error rate is low. In the next section, we discuss estimation of the error rate.
FIGURE 23.5. A classification tree for the heart disease data.
FIGURE 23.6. A classification tree for the heart disease data using two covariates.
FIGURE 23.7. The tree pictured as a partition of the covariate space.
23.8 Assessing Error Rates and Choosing a Good Classifier

How do we choose a good classifier? We would like to have a classifier h with a low true error rate L(h). Usually, we can't use the training error rate $\hat L_n(h)$ as an estimate of the true error rate because it is biased downward.

Example 23.14 Consider the heart disease data again. Suppose
we fit a sequence of logistic regression models. In the first model we include one covariate. In the second model we include two covariates, and so on. The ninth model includes all the covariates. We can go even further. Let's also fit a tenth model that includes all nine covariates plus the first covariate squared. Then we fit an eleventh model that includes all nine covariates plus the first covariate squared and the second covariate squared. Continuing this way, we will get a sequence of 18 classifiers of increasing complexity. The solid line in Figure 23.8 shows the observed classification error, which steadily decreases as we make the model more complex. If we keep going, we can make a model with zero observed classification error. The dotted line shows the cross-validation estimate of the error rate (to be explained shortly), which is a better estimate of the true error rate than the observed classification error. The estimated error decreases for a while, then increases. This is the bias-variance tradeoff phenomenon we have seen before.
There are many ways to estimate the error rate. We'll consider two: cross-validation and probability inequalities.

Cross-Validation. The basic idea of cross-validation, which we have already encountered in curve estimation, is to leave out some of the data when fitting a model. The simplest version of cross-validation involves randomly splitting the data into two
FIGURE 23.8. Solid line is the observed error rate and dashed line is the cross-validation estimate of the true error rate, plotted against the number of terms in the model.
FIGURE 23.9. Cross-validation: the classifier $\hat h$ is built from the training data T and the error estimate $\hat L$ is computed from the validation data V.
pieces: the training set T and the validation set V. Often, about 10 per cent of the data might be set aside as the validation set. The classifier h is constructed from the training set. We then estimate the error by
\[
\hat L(h) = \frac{1}{m} \sum_{X_i \in V} I(h(X_i) \neq Y_i) \tag{23.32}
\]
where m is the size of the validation set. See Figure 23.9. Another approach to cross-validation is K-fold cross-validation, which is obtained from the following algorithm.
K-fold cross-validation.

1. Randomly divide the data into K chunks of approximately equal size. A common choice is K = 10.

2. For k = 1 to K, do the following:
   (a) Delete chunk k from the data.
   (b) Compute the classifier $\hat h_{(k)}$ from the rest of the data.
   (c) Use $\hat h_{(k)}$ to predict the data in chunk k. Let $\hat L_{(k)}$ denote the observed error rate.

3. Let
\[
\hat L(h) = \frac{1}{K} \sum_{k=1}^K \hat L_{(k)}. \tag{23.33}
\]
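The algorithm is easy to code directly. Here is a minimal sketch for a logistic regression classifier, assuming a data frame d with a binary outcome y (both names are assumptions).

   K     <- 10
   n     <- nrow(d)
   folds <- sample(rep(1:K, length.out = n))   # step 1: random chunks
   err   <- numeric(K)
   for (k in 1:K) {                            # step 2
     fit    <- glm(y ~ ., family = binomial, data = d[folds != k, ])
     phat   <- predict(fit, newdata = d[folds == k, ], type = "response")
     yhat   <- ifelse(phat > 0.5, 1, 0)
     err[k] <- mean(yhat != d$y[folds == k])
   }
   mean(err)                                   # step 3: cross-validation estimate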
This is how the dotted line in Figure 23.8 was computed. An alternative approach to K-fold cross-validation is to split the data into two pieces called the training set and the test set. All the models are fit on the training set. The models are then used to make predictions on the test set, and the errors are estimated from these predictions. When the classification procedure is time consuming and there are lots of data, this approach is preferred to K-fold cross-validation. Cross-validation can be applied to any classification method. To apply it to trees, one begins by fitting an initial tree. Smaller trees are obtained by pruning the tree, that is, removing splits from the tree. We can do this for trees of various sizes, where size refers to the number of terminal nodes of the tree. Cross-validation is then used to estimate the error rate as a function of tree size.

Example 23.15 Figure 23.10 shows the cross-validation plot for
a classification tree fitted to the heart disease data. The error
is shown as a function of tree size. Figure 23.11 shows the tree with the size chosen by minimizing the cross-validation error.
Probability Inequalities. Another approach to estimating the error rate is to find a confidence interval for $\hat L_n(h)$ using probability inequalities. This method is useful in the context of empirical risk minimization. Let H be a set of classifiers, for example, all linear classifiers. Empirical risk minimization means choosing the classifier $\hat h \in H$ to minimize the training error $\hat L_n(h)$, also called the empirical risk. Thus,
\[
\hat h = \mathrm{argmin}_{h \in H}\, \hat L_n(h) = \mathrm{argmin}_{h \in H}\, \frac{1}{n}\sum_i I(h(X_i) \neq Y_i). \tag{23.34}
\]
Typically, $\hat L_n(\hat h)$ underestimates the true error rate $L(\hat h)$ because $\hat h$ was chosen to make $\hat L_n(\hat h)$ small. Our goal is to assess how much underestimation is taking place. Our main tool for this analysis is Hoeffding's inequality. Recall that if $X_1, \ldots, X_n \sim \mathrm{Bernoulli}(p)$, then, for any $\epsilon > 0$,
\[
P(|\hat p - p| > \epsilon) \leq 2 e^{-2n\epsilon^2} \tag{23.35}
\]
where $\hat p = n^{-1}\sum_{i=1}^n X_i$. First, suppose that $H = \{h_1, \ldots, h_m\}$ consists of finitely many classifiers. For any fixed h, $\hat L_n(h)$ converges almost surely to L(h) by the law of large numbers. We will now establish a stronger result.
Theorem 23.16 (Uniform Convergence.) Assume H is finite and has m elements. Then,
\[
P\Bigl(\max_{h \in H} |\hat L_n(h) - L(h)| > \epsilon\Bigr) \leq 2m e^{-2n\epsilon^2}.
\]
Proof. We will use Hoeffding's inequality and we will also use the fact that if $A_1, \ldots, A_m$ is a set of events then $P(\bigcup_{i=1}^m A_i) \leq \sum_{i=1}^m P(A_i)$.
FIGURE 23.10. Cross-validation plot: the cross-validation score as a function of tree size.
FIGURE 23.11. Smaller classification tree with size chosen by cross-validation.
Now,
\[
P\Bigl(\max_{h \in H} |\hat L_n(h) - L(h)| > \epsilon\Bigr)
= P\Bigl(\bigcup_{h \in H} \bigl\{|\hat L_n(h) - L(h)| > \epsilon\bigr\}\Bigr)
\leq \sum_{h \in H} P\bigl(|\hat L_n(h) - L(h)| > \epsilon\bigr)
\leq \sum_{h \in H} 2 e^{-2n\epsilon^2} = 2m e^{-2n\epsilon^2}.
\]

Theorem 23.17 Let
\[
\epsilon = \sqrt{\frac{1}{2n}\log\Bigl(\frac{2m}{\alpha}\Bigr)}.
\]
Then $\hat L_n(\hat h) \pm \epsilon$ is a $1-\alpha$ confidence interval for $L(\hat h)$.

Proof. This follows from the fact that
\[
P\bigl(|\hat L_n(\hat h) - L(\hat h)| > \epsilon\bigr)
\leq P\Bigl(\max_{h \in H} |\hat L_n(h) - L(h)| > \epsilon\Bigr)
\leq 2m e^{-2n\epsilon^2} = \alpha.
\]
When H is large, the confidence interval for $L(\hat h)$ is large. The more functions there are in H, the more likely it is we have "overfit," which we compensate for by having a larger confidence interval. In practice we usually use sets H that are infinite, such as the set of linear classifiers. To extend our analysis to these cases we want to be able to say something like
\[
P\Bigl(\sup_{h \in H} |\hat L_n(h) - L(h)| > \epsilon\Bigr) \leq \text{something not too big}.
\]
All the other results followed from this inequality. One way to develop such a generalization is by way of the Vapnik-Chervonenkis or VC dimension. We now consider the main ideas in VC theory.
Let A be a class of sets. Given a finite set $F = \{x_1, \ldots, x_n\}$, let
\[
N_A(F) = \#\bigl\{ F \cap A : A \in A \bigr\} \tag{23.36}
\]
be the number of subsets of F "picked out" by A. Here #(B) denotes the number of elements of a set B. The shatter coefficient is defined by
\[
s(A, n) = \max_{F \in F_n} N_A(F) \tag{23.37}
\]
where $F_n$ consists of all finite sets of size n. Now let $X_1, \ldots, X_n \sim P$ and let
\[
P_n(A) = \frac{1}{n}\sum_i I(X_i \in A)
\]
denote the empirical probability measure. The following remarkable theorem bounds the distance between P and $P_n$.
Theorem 23.18 (Vapnik and Chervonenkis (1971).) For any P, n and $\epsilon > 0$,
\[
P\Bigl(\sup_{A \in A} |P_n(A) - P(A)| > \epsilon\Bigr) \leq 8\, s(A, n)\, e^{-n\epsilon^2/32}. \tag{23.38}
\]
The proof, though very elegant, is long and we omit it. If H is a set of classifiers, define A to be the class of sets of the form $\{x : h(x) = 1\}$. We then define s(H, n) = s(A, n).

Theorem 23.19
\[
P\Bigl(\sup_{h \in H} |\hat L_n(h) - L(h)| > \epsilon\Bigr) \leq 8\, s(H, n)\, e^{-n\epsilon^2/32}.
\]
A $1-\alpha$ confidence interval for $L(\hat h)$ is $\hat L_n(\hat h) \pm \epsilon_n$ where
\[
\epsilon_n^2 = \frac{32}{n}\log\Bigl(\frac{8\, s(H, n)}{\alpha}\Bigr).
\]
These theorems are only useful if the shatter coefficients do not grow too quickly with n. This is where VC dimension enters.

Definition 23.20 The VC (Vapnik-Chervonenkis) dimension of a class of sets A is defined as follows. If $s(A, n) = 2^n$ for all n, set $VC(A) = \infty$. Otherwise, define VC(A) to be the largest k for which $s(A, k) = 2^k$.
Thus, the VC dimension is the size of the largest finite set F that can be shattered by A, meaning that A picks out each subset of F. If H is a set of classifiers, we define VC(H) = VC(A) where A is the class of sets of the form $\{x : h(x) = 1\}$ as h varies in H. The following theorem shows that if A has finite VC dimension, then the shatter coefficients grow as a polynomial in n.

Theorem 23.21 If A has finite VC dimension v, then
\[
s(A, n) \leq n^v + 1.
\]
Example 23.22 Let $A = \{(-\infty, a] : a \in \mathbb{R}\}$. Then A shatters every one-point set $\{x\}$ but it shatters no set of the form $\{x, y\}$. Therefore, VC(A) = 1.

Example 23.23 Let A be the set of closed intervals on the real line. Then A shatters $S = \{x, y\}$ but it cannot shatter sets with 3 points. Consider $S = \{x, y, z\}$ where x < y < z. One cannot find an interval A such that $A \cap S = \{x, z\}$. So, VC(A) = 2.

Example 23.24 Let A be all linear half-spaces on the plane. Any three-point set (not all on a line) can be shattered. No 4-point set can be shattered. Consider, for example, 4 points forming a diamond. Let T be the left and rightmost points. This can't be picked out. Other configurations can also be seen to be unshatterable. So VC(A) = 3. In general, half-spaces in $\mathbb{R}^d$ have VC dimension d + 1.
Example 23.25 Let A be all rectangles on the plane with sides parallel to the axes. Any 4-point set can be shattered. Let S be a 5-point set. There is one point that is not leftmost, rightmost, uppermost, or lowermost. Let T be all points in S except this point. Then T can't be picked out. So VC(A) = 4.
Theorem 23.26 Let x have dimension d and let H be the set of linear classifiers. The VC dimension of H is d + 1. Hence, a $1-\alpha$ confidence interval for the true error rate is $\hat L(\hat h) \pm \epsilon_n$ where
\[
\epsilon_n^2 = \frac{32}{n}\log\Bigl(\frac{8\,(n^{d+1} + 1)}{\alpha}\Bigr).
\]
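To get a feel for the formula, here is a direct transcription of $\epsilon_n$ in R (this snippet is illustrative, not from the text); the sample size, dimension and level are supplied by the user.

   vc.epsilon <- function(n, d, alpha = 0.05) {
     sqrt((32 / n) * log(8 * (n^(d + 1) + 1) / alpha))
   }
   vc.epsilon(n = 500, d = 9)   # a few hundred observations, nine covariates

For moderate n the value typically exceeds 1, which illustrates how conservative VC-based confidence intervals can be.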
23.9 Support Vector Machines

In this section we consider a class of linear classifiers called support vector machines. Throughout this section, we assume that Y is binary. It will be convenient to label the outcomes as -1 and +1 instead of 0 and 1. A linear classifier can then be written as h(x) = sign(H(x)) where $x = (x_1, \ldots, x_d)$,
\[
H(x) = a_0 + \sum_{i=1}^d a_i x_i
\]
and
\[
\mathrm{sign}(z) = \begin{cases} -1 & \text{if } z < 0 \\ \ \ 0 & \text{if } z = 0 \\ \ \ 1 & \text{if } z > 0. \end{cases}
\]
First, suppose that the data are linearly separable, that is, there exists a hyperplane that perfectly separates the two classes.
Lemma 23.27 The data can be separated by some hyperplane if and only if there exists a hyperplane $H(x) = a_0 + \sum_{i=1}^d a_i x_i$ such that
\[
Y_i H(x_i) \geq 1, \qquad i = 1, \ldots, n. \tag{23.39}
\]
Proof. Suppose the data can be separated by a hyperplane $W(x) = b_0 + \sum_{i=1}^d b_i x_i$. It follows that there exists some constant c such that $Y_i = 1$ implies $W(X_i) \geq c$ and $Y_i = -1$ implies $W(X_i) \leq -c$. Therefore, $Y_i W(X_i) \geq c$ for all i. Let $H(x) = a_0 + \sum_{i=1}^d a_i x_i$ where $a_j = b_j / c$. Then $Y_i H(X_i) \geq 1$ for all i. The reverse direction is straightforward.
In the separable case, there will be many separating hyperplanes. How should we choose one? Intuitively, it seems reasonable to choose the hyperplane "furthest" from the data in the sense that it separates the +1's and -1's and maximizes the distance to the closest point. This hyperplane is called the maximum margin hyperplane. The margin is the distance from the hyperplane to the nearest point. Points on the boundary of the margin are called support vectors. See Figure 23.12.
Theorem 23.28 The hyperplane $\hat H(x) = \hat a_0 + \sum_{i=1}^d \hat a_i x_i$ that separates the data and maximizes the margin is given by minimizing $(1/2)\sum_{j=1}^d a_j^2$ subject to (23.39).

It turns out that this problem can be recast as a quadratic programming problem. Recall that $\langle X_i, X_k \rangle = X_i^T X_k$ is the inner product of $X_i$ and $X_k$.
Theorem 23.29 Let $\hat H(x) = \hat a_0 + \sum_{i=1}^d \hat a_i x_i$ denote the optimal (largest margin) hyperplane. Then, for $j = 1, \ldots, d$,
\[
\hat a_j = \sum_{i=1}^n \hat\alpha_i Y_i X_j(i)
\]
where $X_j(i)$ is the value of the covariate $X_j$ for the ith data point, and $\hat\alpha = (\hat\alpha_1, \ldots, \hat\alpha_n)$ is the vector that maximizes
\[
\sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i=1}^n\sum_{k=1}^n \alpha_i \alpha_k Y_i Y_k \langle X_i, X_k \rangle \tag{23.40}
\]
subject to
\[
\alpha_i \geq 0 \quad\text{and}\quad 0 = \sum_i \alpha_i Y_i.
\]
The points $X_i$ for which $\hat\alpha_i \neq 0$ are called support vectors. $\hat a_0$ can be found by solving
\[
\hat\alpha_i \bigl[ Y_i (X_i^T \hat a + \hat a_0) - 1 \bigr] = 0
\]
for any support point $X_i$. $\hat H$ may be written as
\[
\hat H(x) = \hat a_0 + \sum_{i=1}^n \hat\alpha_i Y_i \langle x, X_i \rangle.
\]

FIGURE 23.12. The hyperplane H(x) has the largest margin of all hyperplanes that separate the two classes.
There are many software packages that will solve this problem quickly. If there is no perfect linear classifier, then one allows overlap between the groups by replacing the condition (23.39) with
\[
Y_i H(x_i) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n. \tag{23.41}
\]
The variables $\xi_1, \ldots, \xi_n$ are called slack variables. We now maximize (23.40) subject to $0 \leq \alpha_i \leq c$, $i = 1, \ldots, n$, and
\[
\sum_{i=1}^n \alpha_i Y_i = 0.
\]
The constant c is a tuning parameter that controls the amount of overlap.
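As a hedged illustration, the soft-margin classifier can be fit in R with the e1071 package (a package choice that is an assumption here; the exercises instead point to the SVMlight program). The cost argument plays the role of the constant c.

   library(e1071)
   yf  <- factor(ifelse(y == 1, 1, -1))          # relabel the outcomes as +1 / -1
   fit <- svm(x, yf, kernel = "linear", cost = 1)
   mean(predict(fit, x) != yf)                   # observed error rate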
23.10 Kernelization

There is a trick for improving a computationally simple classifier h called kernelization. The idea is to map the covariate X, which takes values in $\mathcal{X}$, into a higher dimensional space $\mathcal{Z}$ and apply the classifier in the bigger space $\mathcal{Z}$. This can yield a more flexible classifier while retaining computational simplicity. The standard example of this idea is illustrated in Figure 23.13. The covariate is $x = (x_1, x_2)$. The $Y_i$'s can be separated into two groups using an ellipse. Define a mapping $\phi$ by
\[
z = (z_1, z_2, z_3) = \phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2).
\]
This maps $\mathcal{X} = \mathbb{R}^2$ into $\mathcal{Z} = \mathbb{R}^3$. In the higher dimensional space $\mathcal{Z}$, the $Y_i$'s are separable by a linear decision boundary. In other words, a linear classifier in a higher dimensional space corresponds to a non-linear classifier in the original space.
FIGURE 23.13. Kernelization. Mapping the covariates into a higher dimensional space can make a complicated decision boundary into a simpler decision boundary.
There is a potential drawback. If we significantly expand the dimension of the problem, we might increase the computational burden. For example, if x has dimension d = 256 and we wanted to use all fourth-order terms, then $z = \phi(x)$ has dimension 183,181,376. What saves us is two observations. First, many classifiers do not require that we know the values of the individual points but, rather, just the inner product between pairs of points. Second, notice in our example that the inner product in $\mathcal{Z}$ can be written
\[
\langle z, \tilde z \rangle = \langle \phi(x), \phi(\tilde x) \rangle
= x_1^2 \tilde x_1^2 + 2 x_1 \tilde x_1 x_2 \tilde x_2 + x_2^2 \tilde x_2^2
= \bigl(\langle x, \tilde x \rangle\bigr)^2 \equiv K(x, \tilde x).
\]
Thus, we can compute $\langle z, \tilde z \rangle$ without ever computing $Z_i = \phi(X_i)$. To summarize, kernelization involves finding a mapping $\phi : \mathcal{X} \to \mathcal{Z}$ and a classifier such that:

1. $\mathcal{Z}$ has higher dimension than $\mathcal{X}$ and so leads to a richer set of classifiers.

2. The classifier only requires computing inner products.

3. There is a function K, called a kernel, such that $\langle \phi(x), \phi(\tilde x) \rangle = K(x, \tilde x)$.

4. Everywhere the term $\langle x, \tilde x \rangle$ appears in the algorithm, replace it with $K(x, \tilde x)$.

In fact, we never need to construct the mapping $\phi$ at all. We only need to specify a kernel $K(x, \tilde x)$ that corresponds to $\langle \phi(x), \phi(\tilde x) \rangle$ for some $\phi$. This raises an interesting question: given a function of two variables K(x, y), does there exist a function $\phi(x)$ such that $K(x, y) = \langle \phi(x), \phi(y) \rangle$? The answer is provided
by Mercer's theorem, which says, roughly, that if K is positive definite, meaning that
\[
\int\!\!\int K(x, y) f(x) f(y)\, dx\, dy \geq 0
\]
for square integrable functions f, then such a $\phi$ exists. Examples of commonly used kernels are:
\[
\begin{aligned}
\text{polynomial:} \quad & K(x, \tilde x) = \bigl(\langle x, \tilde x \rangle + a\bigr)^r \\
\text{sigmoid:} \quad & K(x, \tilde x) = \tanh(a \langle x, \tilde x \rangle + b) \\
\text{Gaussian:} \quad & K(x, \tilde x) = \exp\bigl(-\|x - \tilde x\|^2 / (2\sigma^2)\bigr).
\end{aligned}
\]
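A quick numerical check of the identity above (purely illustrative): for the quadratic mapping of this section, the inner product in $\mathcal{Z}$ agrees with the squared inner product in $\mathcal{X}$.

   phi <- function(x) c(x[1]^2, sqrt(2) * x[1] * x[2], x[2]^2)
   x  <- c(1.3, -0.7)
   xt <- c(0.4,  2.0)
   sum(phi(x) * phi(xt))     # inner product in Z
   sum(x * xt)^2             # K(x, xtilde) = (<x, xtilde>)^2; same value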
J (w) = where and
SeB = (Z 0
wT SeB w wT SeW w
Z 1 )(Z 0 !
(23.42)
Z 1 )T !
(n0 1)Se0 (n1 1)Se1 SW = + : (n0 1) + (n1 1) (n0 1) + (n1 1)
23.10 Kernelization
465
Here, Sej is the sample of covariance of the Zi 's for which Y = j . However, to take advantage of kernelization, we need to reexpress this in terms of inner products and then replace the inner products with kernels. It can be shown that the maximizing vector w is a linear combination of the Zi 's. Hence we can write
w= Also,
n X i=1
i Zi :
n 1 X Zj = (Xi)I (Yi = j ): nj i=1
Therefore,
wT Z j =
n X
i Zi
T
i=1 n X n 1 X
n 1 X (Xi )I (Yi = j ) nj i=1
I (Y = j )ZiT (Xs) nj i=1 s=1 i s n n X 1 X I (Y = j )(Xi)T (Xs) = nj i=1 i s=1 s n n X 1 X = i I (Ys = j )K (Xi ; Xs) nj i=1 s=1 = T Mj =
where Mj is a vector whose ith component is
Mj (i) = It follows that where M = (M0 can write
n 1 X K (Xi ; Xs)I (Yi = j ): nj s=1
wT SeB w = T M M1 )(M0 M1 )T . By similar calcuations, we wT SeW w = T N
466
23. Classi cation
where
1 1 N = K0 I 1 K0T + K1 I 1 K1T ; n0 n1 I is the identity matrix, 1 is a matrix of all one's, and Kj is the n nj matrix with entries (Kj )rs = K (xr ; xs ) with xs varying over the observations in group j . Hence, we now nd to maximize T M J ( ) = T : N Notice that all the qauntities are expressed in terms of the kernel. Formally, the solution is = N 1 (M0 M1 ). However, N might be non-invertible. In this case on replaces N by N + bI form some constant b. Finally, the projection onto the new subspace can be written as
U=
wT (x)
=
n X i=1
i K (xi ; x):
The support vector machine can similarly be kernelized. We simply replace hXi ; Xj i with K (Xi ; Xj ). For example, instead of maximizing (23.40), we now maximize n X i=1
i
n X n 1X Y Y K (Xi ; Xj ): 2 i=1 k=1 i k i k
(23.43)
Pn bi Yi K (X; Xi ). i=1
The hyperplane can be written as Hb (x) = ba0 +
23.11 Other Classi ers There are many other classi ers and space precludes a full dicussion of all of them. Let us brie y mention a few. The k-nearest-neighbors classi er is very simple. Given a point x, nd the k data points closest to x. Classify x using the majority vote of these k neighbors. Ties can be broke randomly. The parameter k can be chosen by cross-validation.
23.11 Other Classi ers
467
Bagging is a method for reducing the variability of a clas-
si er. It is most helpful for highly non-linear classi ers such as a tree. We draw B boostrap samples from the data. The bth bootstrap sample yields a classi er hb . The nal classi er is b h(x)
=
P 1 if B1 Bb=1 hb (x) 12 0 otherwise:
Boosting is a method for starting with a simple classi er
and gradually improving it by re tting the data giving higher weight to misclassi ed samples. Suppose that H is a collection of classi ers, for example, tress with onely one split. Assume that Yi 2 f 1; 1g and that each h is such that h(x) 2 f 1; 1g. We usually give equal weight to all data points in the methods we have discussed. But one can incorporate unequal weights quite easily in most algorithms. For example, in constructing a tree, we could replace the impurity measure with a weighted impurity measure. The original version of bossting, called AdaBoost, is as follows. 1. Set the weights wi = 1=n, i = 1; : : : ; n. 2. For j = 1; : : : ; J , do the following steps: (a) Constructing a classi er hj from the data using the weights w1 ; : : : ; wn. (b) Compute the weighted error estimate:
Lbj =
Pn i I (Yi = hj (Xi )) i=1 wP : n i=1 wi
6
(c) Let j = log((1 Lbj )=Lbj ). (d) Update the weights:
wi
wi e I (Y 6=h (X )) j
i
j
i
3. The final classifier is
\[
\hat h(x) = \mathrm{sign}\Bigl(\sum_{j=1}^J \alpha_j h_j(x)\Bigr).
\]
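Here is a compact sketch of the algorithm using rpart stumps (one-split trees); X is a data frame of covariates and y a vector coded -1/+1, both assumed names, and no care is taken with edge cases such as a zero weighted error.

   library(rpart)
   n <- nrow(X); J <- 50
   w <- rep(1 / n, n)
   alpha <- numeric(J)
   H     <- matrix(0, n, J)                      # h_j(X_i) for each round
   for (j in 1:J) {
     fit <- rpart(factor(y) ~ ., data = X, weights = w,
                  control = rpart.control(maxdepth = 1))
     H[, j] <- ifelse(predict(fit, X)[, "1"] > 0.5, 1, -1)
     Lj <- sum(w * (y != H[, j])) / sum(w)       # weighted error estimate
     alpha[j] <- log((1 - Lj) / Lj)
     w <- w * exp(alpha[j] * (y != H[, j]))      # upweight misclassified points
   }
   yhat <- sign(H %*% alpha)                     # the final classifier
   mean(yhat != y)                               # training error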
There is now an enormous literature trying to explain and improve on boosting. Whereas bagging is a variance reduction technique, boosting can be thought of as a bias reduction technique. We start with a simple, and hence highly biased, classifier, and we gradually reduce the bias. The disadvantage of boosting is that the final classifier is quite complicated.
23.12 Bibliographic Remarks

An excellent reference on classification is Hastie, Tibshirani and Friedman (2001). For more on the theory, see Devroye, Györfi and Lugosi (1996) and Vapnik (2001).
23.13 Exercises

1. Prove Theorem 23.5.

2. Prove Theorem 23.7.

3. Download the spam data from:
   http://www-stat.stanford.edu/tibs/ElemStatLearn/index.html
   The data file can also be found on the course web page. The data contain 57 covariates relating to email messages. Each email message was classified as spam (Y = 1) or not spam (Y = 0). The outcome Y is the last column in the file. The goal is to predict whether an email is spam or not.
(a) Construct classification rules using (i) LDA, (ii) QDA, (iii) logistic regression and (iv) a classification tree. For each, report the observed misclassification error rate and construct a 2-by-2 table of the form

              \hat{h}(x) = 0    \hat{h}(x) = 1
   Y = 0
   Y = 1
Hint: Here are some things to remember when using R. (i) To fit a logistic regression and get the classified values:

   d <- data.frame(x,y)
   n <- nrow(x)
   out <- glm(y ~ .,family=binomial,data=d)
   p <- out$fitted.values
   pred <- rep(0,n)
   pred[p > .5] <- 1
   table(y,pred)
(ii) To build a tree you must turn y into a factor and you must do it before you make the data frame: y <- as.factor(y) d <- data.frame(x,y) library(tree) out <- tree(y ~ .,data=d) print(summary(out)) plot(out,type="u",lwd=3) text(out) pred <- predict(out,d,type="class") table(y,pred)
(b) Use 5-fold cross-validation to estimate the prediction accuracy of LDA and logistic regression.

(c) Sometimes it helps to reduce the number of covariates. One strategy is to compare $X_i$ for the spam and email groups. For each of the 57 covariates, test whether the mean of the covariate is the same or different between the two groups. Keep the 10 covariates with the smallest p-values. Try LDA and logistic regression using only these 10 variables.

4. Let A be the set of two-dimensional spheres. That is, $A \in \mathcal{A}$ if $A = \{(x, y) : (x - a)^2 + (y - b)^2 \leq c^2\}$ for some a, b, c. Find the VC dimension of $\mathcal{A}$.

5. Classify the spam data using support vector machines. Free software for the support vector machine is at
   http://svmlight.joachims.org/
   It is very easy to use. After installing, do the following. First, you must recode the $Y_i$'s as +1 and -1. Then type

   svm_learn data output
where "data" is the file with the data. It expects the data in the following form:

   1  1:3.4  2:.17  3:129  etc

where the first number is Y (+1 or -1) and the rest of the line means: $X_1$ is 3.4, $X_2$ is .17, etc. If the file "test" contains test data, then type
svm_classify test output predictions
to get the predictions on the test data. There are many options available, as explained on the web site.

6. Use VC theory to get a confidence interval on the true error rate of the LDA classifier in question 3. Suppose that the covariate $X = (X_1, \ldots, X_d)$ has d dimensions. Assume that each variable has a continuous distribution.

7. Suppose that $X_i \in \mathbb{R}$ and that $Y_i = 1$ whenever $|X_i| \leq 1$ and $Y_i = 0$ whenever $|X_i| > 1$. Show that no linear classifier can perfectly classify these data. Show that the kernelized data $Z_i = (X_i, X_i^2)$ can be linearly separated.

8. Repeat question 5 using the kernel $K(x, \tilde x) = (1 + x^T \tilde x)^p$. Choose p by cross-validation.

9. Apply the k-nearest-neighbors classifier to the "iris data." Get the data in R from the command data(iris). Choose k by cross-validation.

10. (Curse of Dimensionality.) Suppose that X has a uniform distribution on the d-dimensional cube $[-1/2, 1/2]^d$. Let R be the distance from the origin to the closest neighbor. Show that the median of R is
\[
\left( \frac{1 - \bigl(\tfrac{1}{2}\bigr)^{1/n}}{v_d(1)} \right)^{1/d}
\]
where
\[
v_d(r) = r^d \frac{\pi^{d/2}}{\Gamma((d/2) + 1)}
\]
is the volume of a sphere of radius r. When does R exceed the edge of the cube when n = 100, 1000, 10000?
11. Fit a tree to the data in question 3. Now apply bagging and report your results.

12. Fit a tree that uses only one split on one variable to the data in question 3. Now apply boosting.

13. Let $r(x) = P(Y = 1 \mid X = x)$ and let $\hat r(x)$ be an estimate of r(x). Consider the classifier
\[
h(x) = \begin{cases} 1 & \text{if } \hat r(x) \geq 1/2 \\ 0 & \text{otherwise.} \end{cases}
\]
Assume that $\hat r(x) \approx N(\overline{r}(x), \sigma^2(x))$ for some functions $\overline{r}(x)$ and $\sigma^2(x)$. Show that
\[
P(Y \neq h(X)) \approx 1 - \Phi\left( \frac{\mathrm{sign}\bigl(\hat r(x) - (1/2)\bigr)\,\bigl(\overline{r}(x) - (1/2)\bigr)}{\sigma(x)} \right)
\]
where $\Phi$ is the standard Normal cdf. Regard $\mathrm{sign}(\hat r(x) - (1/2))(\overline{r}(x) - (1/2))$ as a type of bias term. Explain the implications for the bias-variance tradeoff in classification (Friedman, 1997).
24 Stochastic Processes
24.1
Introduction
Most of this book has focused on iid sequences of random variables. Now we consider sequences of dependent random variables. For example, daily temperatures will form a sequence of time-ordered random variables and clearly the temperature on one day is not independent of the temperature on the previous day. A stochastic process fXt : t 2 T g is a collection of random variables. We shall sometimes write X (t) instead of Xt . The variables Xt take values in some set X called the state space. The set T is called the index set and for our purposes can be thought of as time. The index set can be discrete T = f0; 1; 2; : : :g or continuous T = [0; 1) depending on the application. Example 24.1 (iid observations.) A sequence of iid random vari-
ables can be written as fXt : t 2 T g where T = f1; 2; 3; : : : ; g.
474
24. Stochastic Processes
Thus, a sequence of iid random variables is an example of a stochastic process. Example 24.2 (The Weather.) Let
X = fsunny; cloudyg. A typi-
cal sequence (depending on where you live) might be
sunny; sunny; cloudy; sunny; cloudy; cloudy; This process has a discrete state space and a discrete index set.
Example 24.3 (Stock Prices.) Figure 24.1 shows the price of a
stock over time. The price is monitored continuously so the index set T is not discrete. Price is discrete but, for practical purposes we can imagine treating it as a continuous variable. Example 24.4 (Empirical Distribution Function.) Let X1 ; : : : ; Xn
F where F is some cdf on [0,1]. Let
1X I (Xi t) Fbn (t) = n i=1 n
be the empirical cdf . For any xed value t, Fbn (t) is a random variable. But the whole empirical cdf
n o Fbn (t) : t 2 [0; 1]
is a stochastic process with a continuous state space and a continuous index set.
We end this section by recalling a basic fact. If X1 ; : : : ; Xn are random variables then we can write the joint density as f (x1 ; : : : ; xn ) = f (x1 )f (x2 jx1 ) f (xn jx1 ; : : : ; xn 1 ) n Y = f (xi jpasti ) (24.1) i=1
where pasti refers to all the variables before Xi .
7
8
9
price
10
11
12
13
24.1 Introduction
0
2
4
6
8
10
time
FIGURE 24.1. Stock price over ten week period.
475
24.2
Markov Chains
The simplest stochastic process is a Markov chain, in which the distribution of $X_t$ depends only on $X_{t-1}$. In this section we assume that the state space is discrete, either $\mathcal{X} = \{1, \ldots, N\}$ or $\mathcal{X} = \{1, 2, \ldots\}$, and that the index set is $T = \{0, 1, 2, \ldots\}$. Typically, most authors write $X_n$ instead of $X_t$ when discussing Markov chains and I will do so as well.

Definition 24.5 The process $\{X_n : n \in T\}$ is a Markov chain if
\[
P(X_n = x \mid X_0, \ldots, X_{n-1}) = P(X_n = x \mid X_{n-1}) \tag{24.2}
\]
For a Markov chain, equation (24.1) simplifes to f (x1 ; : : : ; xn ) = f (x1 )f (x2 jx1 )f (x3 jx2 ) f (xn jxn 1 ):
A Markov chain can be represented by the following DAG: X1
X2
X3
Xn
Each variable has a single parent, namely, the previous observation. The theory of Markov chains is a very rich and complex. We have to get through many de nitions before we can do anything interesting. Our goal is to answer the following questions: 1. When does a Markov chain \settle down" into some sort of equilibrium? 2. How do we estimate the parameters of a Markov chain?
24.2 Markov Chains
477
3. How can we construct Markov chains that converge to a given equilibrium distribution and why would we want to do that? We will answer questions 1 and 2 in this chapter. We will answer question 3 in the next chapter. To understand question 1, look at the two chains in Figure 24.2. The rst chain oscillates all over the place and will continue to do so forever. The second chain eventually settles into an equilibrium. If we constructed a histogram of the rst process, it would keep changing as we got more and more observations. But a histogram from the second chain would eventually converge to some xed distribution. The key quantities of a Markov chain are the probabilities of jumping from one state into another state. Transition Probabilities.
De nition 24.6 We call P(Xn+1
= j jXn = i)
(24.3)
the transition probabilities. If the transition probabilities do not change with time, we say the chain is homogeneous. In this case we de ne pij = P(Xn+1 = j jXn = i). The matrix P whose (i; j ) element is pij is called the transition matrix.
We will only consider homogeneous P chains. Notice that P has two properties: (i) pij 0 and (ii) i pij = 1. Each row is a probability mass function. A matrix with these properties is called a stochastic matrix. Example 24.7 (Random Walk With Absorbing Barriers.) Let
f1; : : : ; N g.
X
=
Suppose you are standing at one of these points.
24. Stochastic Processes
−50 −150
−100
x
0
478
0
200
400
600
800
1000
600
800
1000
4 2 0 −2
x
6
8
10
time
0
200
400 time
FIGURE 24.2. The rst chain does not settle down into an equilibrium. The second does.
24.2 Markov Chains
479
Flip a coin with P(Heads) = p and P(T ails) = q = 1 p. If it is heads, take one step to the right. If it is tails, take one step to the left. If you hit one of the endpoints, stay there. The transition matrix is
2 6 6 6 P=6 6 6 4
3
0 0 q p 0 0 77 0 q 0 p 0 0 7 7 77 : 1
0 0
0
0 0
0 0
0 0
0 0
0 0
q
0
p 5
0 0
Example 24.8 Suppose the state space is
1
X
= fsunny; cloudyg.
Then $X_1, X_2, \ldots$ represents the weather for a sequence of days. The weather today clearly depends on yesterday's weather. It might also depend on the weather two days ago, but as a first approximation we might assume that the dependence is only one day back. In that case the weather is a Markov chain and a typical transition matrix might be

              Sunny   Cloudy
   Sunny        0.4      0.6
   Cloudy       0.8      0.2

For example, if it is sunny today, there is a 60 per cent chance it will be cloudy tomorrow.
Let
pij (n) = P(Xm+n = j jXm = i)
(24.4)
be the probability of of going from state i to state j in n steps. Let Pn be the matrix whose (i; j ) element is pij (n). These are called the n-step transition probabilities. Theorem
24.9 (The Chapman-Kolmogorov equations.)
n-step probabilities satisfy pij (m + n) =
X k
pik (m)pkj (n):
The
(24.5)
480
24. Stochastic Processes
Proof.
Recall that, in general,
P(X
= x; Y = y ) = P(X = x)P(Y = y jX = x):
This fact is true in the more general form P(X
= x; Y = y jZ = z ) = P(X = xjZ = z )P(Y = y jX = x; Z = z ):
Also, recall the law of total probability: X P(X = x) = P(X = x; Y = y ): y
Using these facts and the Markov property we have pij (m + n) = P(Xm+n = j jX0 = i) X = P(Xm+n = j; Xm = k jX0 = i)
= = =
k X
k X k X k
P(Xm+n
= j jXm = k; X0 = i)P(Xm = kjX0 = i)
P(Xm+n
= j jXm = k)P(Xm = kjX0 = i)
pik (m)pkj (n):
Look closely at equation (24.5). This is nothing more than the equation for matrix multiplication. Hence we have shown that Pm+n
= Pm Pn :
(24.6)
By de nition, P1 = P. Using the above theorem, P2 = P1+1 = 2 P1 P1 = PP = P . Continuing this way, we see that Pn
= Pn
|
P
P {z P}
multiply the matrix n times
:
(24.7)
24.2 Markov Chains
481
Let n = (n(1); : : : ; n(N )) be a row vector where n (i) = P(Xn = i)
(24.8)
is the marginal probability that the chain is in state i at time n. In particular, $\mu_0$ is called the initial distribution. To simulate a Markov chain, all you need to know is $\mu_0$ and P. The simulation would look like this:

Step 1: Draw $X_0 \sim \mu_0$. Thus, $P(X_0 = i) = \mu_0(i)$.

Step 2: Suppose the outcome of step 1 is i. Draw $X_1 \sim P$. In other words, $P(X_1 = j \mid X_0 = i) = p_{ij}$.

Step 3: Suppose the outcome of step 2 is j. Draw $X_2 \sim P$. In other words, $P(X_2 = k \mid X_1 = j) = p_{jk}$. And so on.
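A small sketch of this scheme in R (illustrative only): the chain is advanced by sampling each new state from the row of P indexed by the current state. The weather matrix of Example 24.8 is used as the transition matrix.

   simulate.chain <- function(P, mu0, N) {
     x    <- numeric(N + 1)
     x[1] <- sample(1:nrow(P), 1, prob = mu0)             # Step 1: X0 ~ mu0
     for (n in 1:N)                                       # Steps 2, 3, ...
       x[n + 1] <- sample(1:nrow(P), 1, prob = P[x[n], ])
     x
   }
   P   <- matrix(c(0.4, 0.6, 0.8, 0.2), 2, 2, byrow = TRUE)
   out <- simulate.chain(P, mu0 = c(0.5, 0.5), N = 1000)
   table(out) / length(out)      # long-run proportion of time in each state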
It might be difficult to understand the meaning of $\mu_n$. Imagine simulating the chain many times. Collect all the outcomes at time n from all the chains. This histogram would look approximately like $\mu_n$. A consequence of Theorem 24.9 is the following.

Lemma 24.10 The marginal probabilities are given by
n = 0 Pn : Proof.
n (j ) = P(Xn = j ) X = P(Xn = j jX0 = i)P (X0 = i) i X = 0 (i)pij (n) = 0 Pn : i
482
24. Stochastic Processes
Summary
1. Transition matrix: P(i; j ) = P(Xn+1 = j jXn = i). 2. n-step matrix: Pn(i; j ) = P(Xn+m = j jXm = i). 3.
Pn
= Pn .
4. Marginal: n (i) = P(Xn = i). 5. n = 0 Pn .
The states of a Markov chain can be classi ed according to various properties. States.
De nition 24.11 We say that i
j (or j is accessible from i) if pij (n) > 0 for some n, and we write i ! j . If i ! j and j ! i then we write i $ j and we say that i and j communicate. reaches
The communication relation satis es the following properties: Theorem
24.12
1. i $ i. 2. If i $ j then j $ i. 3. If i $ j and j 4.
$ k then i $ k. The set of states X can be written as a disjoint union S S of classes X = X X where two states i and j 1
2
communicate with each other if and only if they are in the same class.
24.2 Markov Chains
483
If all states communicate with each other then the chain is called irreducible. A set of states is closed if, once you enter that set of states you never leave. A closed set consisting of a single state is called an absorbing state. Example 24.13 Let
X = f1; 2; 3; 4g and 0 B B P=B B @
1 0 0 C 0 0C C 1 1 C 4 4 A 0 0 0 1 1 3 2 3 1 4
2 3 1 3 1 4
The classes are f1; 2g; f3g and f4g. State 4 is an absorbing state.
Suppose we start a chain in state i. Will the chain ever return to state i? If so, that state is called persistent or recurrent. De nition 24.14 State i is P(Xn
24.15
or
persistent
if
= i for some n 1 j X0 = i) = 1:
Otherwise, state i is Theorem
recurrent
transient
.
A state i is recurrent if and only if
X n
pii (n) = 1:
(24.9)
A state i is transient if and only if
X n
Proof.
De ne In =
pii (n) < 1:
1 if Xn = i 0 if Xn 6= i:
(24.10)
484
24. Stochastic Processes
P The number of times that the chain is in state i is Y = 1 n=0 In . The mean of Y , given that the chain starts in state i, is 1 X E (Y jX0 = i) = E (In jX0 = i) n=0
= =
1 X n=0
1 X n=0
= ijX0 = i)
P(Xn
pii (n):
De ne ai = P(Xn = i for some n 1 j X0 = i). If i is recurrent, ai = 1. Thus, the chain will eventually return to i. Once it does return to i, we argue again that since ai = 1, the chain will return to state i again. By repeating this argument, we conclude that E (Y jX0 = i) = 1. If i is transient, then ai < 1. When the chain is in state i, there is a probability 1 ai > 0 that it will never return to state i. Thus, the probability that the chain is in state i exactly n times is ani 1 (1 ai ). This is a geometric distribution which has nite mean. Theorem
24.16
Facts about recurrence.
1. If state i is recurrent and i $ j then j is recurrent. 2. If state i is transient and i $ j then j is transient. 3. A nite Markov chain must have at least one recurrent state. 4. The states of a nite, irreducible Markov chain are all recurrent. Theorem
X
24.17 (Decomposition Theorem.)
can be written as the disjoint union
X = XT
[
X
[
1
X 2
The state space
24.2 Markov Chains
where XT are the transient states and each ducible set of recurrent states.
485
Xi is a closed, irre-
To discuss the convregence of chains, we need a few more de nitions. Suppose that X0 = i. De ne the recurrence time Convergence of Markov Chains.
Tij = minfn > 0 : Xn = j g
(24.11)
assuming Xn ever returns to state i, otherwise de ne Tij = 1. The mean recurrence time of a recurrent state i is X mi = E (Tii ) = nfii (n) (24.12) n
where fij (n) = P (X1 6= j; X2 6= j; : : : ; Xn
1
6= j; Xn = j jX
0
= i):
A recurrent state is null if mi = 1 otherwise it is called nonnull or positive. Lemma 24.18 If a state is null and recurrent, then pnii
! 0.
Lemma 24.19 In a nite state Markov chain, all recurrent states
are positive.
Consider a three state chain with transition matrix 2 3 0 1 0 4 0 0 1 5: 1 0 0 Suppose we start the chain in state 1. Then we will be in state 3 at times 3, 6, 9, . . . . This is an example of a periodic chain. Formally, the period of state i is d if pii (n) = 0 whenever n is not divisible by d and d is the largest integer with this property. Thus, d = gcdfn : pii (n) > 0g where gcd means \greater common divisor." State i is periodic if d(i) > 1 and aperiodic if d(i) = 1. A state with period 1 is called aperiodic.
486
24. Stochastic Processes
Lemma 24.20 If state i has period d and i $ j then j has period
d.
De nition 24.21 A state is ergodic if it is recurrent, non-
null and aperiodic. A chain is ergodic if all its states are ergodic.
Let = (i : i 2 X ) be a vector of non-negative numbers that sum to one. Thus can be thought of as a probability mass function. De nition 24.22 We say that is a variant
) distribution if = P.
stationary
(or
in-
Here is the intuition. Draw X0 from distribution and suppose that is a stationary distribution. Now draw X1 according to the transition probability of the chain. The distribition of X1 is then 1 = 0 P = P = . The distribution of X2 is P2 = ( P)P = P = . Continuing this way, we see that the distribution of Xn is Pn = . In other words:
If at any time the chain has distribution then it will continue to have distribution forever.
24.2 Markov Chains
De nition 24.23 We say that a chain has tribution
if
2 6 6 n P ! 6 4
.. .
487
limiting dis-
3 7 7 7 5
for some . In other words, j = limn!1 Pnij exists and is independent of i.
Here is the main theorem. An irreducible, ergodic Markov chain has a unique stationary distribution . The limiting distrubution exists and is equal to . If g is any bounded function, then, with probability 1, Theorem
lim
N !1
24.24
N 1X
N
n=1
g (Xn ) ! E (g )
X j
g (j )j :
(24.13)
The last statement of the theorem, is the law of large numbers for Markov chains. It says that sample averages converge to their expectations. Finally, there is a special condition which will be useful later. We say that satis es detailed balance if i pij = pjij :
(24.14)
Detailed balance guarantees that is a stationary distribution. If satis es detailed balance then is a stationary distribution.
Theorem
24.25
th P We need P to show that P P = . The j element of P is i i pij = i j pji = j i pji = j :
Proof.
488
24. Stochastic Processes
The importance of detailed balance will become clear when we discuss Markov chain Monte Carlo methods. Just because a chain has a stationary distribution does not mean it converges. Warning!
Example 24.26 Let
2
3 0 1 0 P = 4 0 0 1 5: 1 0 0
Let = (1=3; 1=3; 1=3). Then P = so is a stationary distribution. If the chain is started with the distribution it will stay in that distribution. Imagine simulating many chains and checking the marginal distribution at each time n. It will always be the uniform distribution . But this chain does not have a limit. It continues to cycle around forever.
Examples of Markov Chains
Example 24.27 (Random Walk) . Let X = f: : : ; 2; 1; 0; 1; 2; : : : ; g
and suppose that pi;i+1 = p, pi;i 1 = q = 1 p. All states communicate hence either all the states are recurrent or all are transient. To see which, suppose we start at X0 = 0. Note that
2n n n p00 (2n) = p q n
(24.15)
since, the only way to get back to 0 is to have n heads (steps to the right) and n tails (steps to the left). We can approximate this expression using Stirling's formula which says that
p
n! nn ne
n
p
2:
24.2 Markov Chains
489
Inserting this approximation into (24.15) shows that
p00 (2n)
P
(4pq )n
pn
:
P
It is easy to check that n p00 (n) < 1 if and only if n p00 (2n) < 1. Moreover, Pn p00 (2n) = 1 if and only if p = q = 1=2. By Theorem (24.15), the chain is recurrent if p = 1=2 otherwise it is transient. Example 24.28 Let
X = f1; 2; 3; 4; 5; 6g. Let
2 1 1 0 0 6 21 32 6 4 4 0 0 6 1 1 1 1 6 6 4 4 4 4 P= 6 6 41 0 41 14 6 60 0 0 0 4 0 0 0 0
3 0 0 7 0 07 7 0 07 7 7 1 0 4 7 7 1 7 1 2 2 5 1 2
1 2
Then C1 = f1; 2g and C2 = f5; 6g are irreducible closed sets. States 3 and 4 are transient because the of the path 3 ! 4 ! 6 and once you hit state 6 you cannot return to 3 or 4. Since pii (1) > 0, all the states are aperiodic. In summary, 3 and 4 are transient while 1,2,5 and 6 are ergodic. Example 24.29 (Hardy-Weinberg.) Here is a famous example from
genetics. Suppose a gene can be type A or type a. There are three types of people (called genotypes): AA, Aa and aa. Let (p; q; r) denote the fraction of people of each genotype. We assume that everyone contributes one of their two copies of the gene at random to their children. We also assume that mates are selected at random. The latter is not realistic however, it is often reasonable to assume that you do not choose your mate based on whether they are AA, Aa or aa. (This would be false if the gene was for eye color and if people chose mates based on eye color.) Imagine if we pooled eveyone's genes together. The proportion
490
24. Stochastic Processes
of A genes is P = p + (q=2) and the proportion of a genes is Q = r + (q=2). A child is AA with probability P 2, aA with probability 2P Q and aa with probability Q2 . Thus, the fraction of A genes in this generation is q 2 q q P2 + PQ = p + + p+ r+ :
2
2
2
However, r = 1 p q . Substitute this in the above equation and you get P 2 + P Q = P . A similar calculation shows that fraction of a genes is Q. We have shown that the proportion of type A and type a is P and Q and this remains stable after the rst generation. The proportion of people of type AA, Aa, aa is thus (P 2 ; 2P Q; Q2) from the second generation and on. This is called the Hardy-Weinberg law. Assume everyone has exactly one child. Now consider a xed person and let Xn be the genotype of their nth descendent. This is a Markov chain with state space X = fAA; Aa; aag. Some basic calculations will show you that the transition matrix is
2 4
P Q P 2
P +Q
0 P
2
0
Q 2
Q
3
5:
The stationary distribution is = (P 2; 2P Q; Q2 ).
Consider a chain with nite state space X = f1; 2; : : : ; N g. Suppose we observe n observations X1 ; : : : ; Xn from this chain. The unknown parameters of a Markov chain are the initial probabilities 0 = (0 (1); 0(2); : : : ; ) and the elements of the transition matrix P. Each row of P is a multinomial distribution. So we are essentially estimating N distributions (plus the initial probabilities). Let nij be the observed number of transitions from state i to state j . The likelihood function is n N Y N Y Y L(0; P) = 0(x0 ) pX 1;X = 0(x0 ) pnij : Inference for Markov Chains.
ij
r=1
r
r
i=1 j =1
24.3 Poisson Processes
491
There is only one observation on 0 so we can't estimate that. Rather, we focus on estimating P. The mle is obtained by maximizing L(0 ; P) subject to the constraint that the elements are non-negative and the rows sum to 1. The solution is pbij =
nij ni
P where ni = Nj=1 nij . Here we are assuming that ni > 0. If note, then we set pbij = 0 by convention. Theorem
24.30 (Consistency and Asymptotic Normality of the mle .)
Assume that the chain is ergodic. Let pbij (n) denote the mle after n observations. Then pbij (n) P! pij . Also,
hp Ni (n)(pbij
i
pij )
N (0; )
P
where the left hand side is a matrix, Ni (n) = nr=1 I (Xr = i) and 8 < pij (1 pij ) (i; j ) = (k; `) ij;k` = p p i = k; j 6= ` : 0 ij i` otherwise: 24.3
Poisson Processes
One of the most studied and useful stochastic processes is the Poisson process. It arises when we count occurences of events over time, for example, traÆc accidents, radioactive decay, arrival of email messages etc. As the name suggests, the Poisson process is intimately related to the Poisson distribution. Let's rst review the Poisson distribution. Recall that X has a Poisson distribution with parameter { written X Poisson() { if P (X
= x) p(x; ) =
e x ; x = 0; 1; 2; : : : x!
492
24. Stochastic Processes
Also recall that E (X ) = and V (X ) = . If X Poisson(), Y Poisson( ) and X q Y , then X + Y Poisson( + ). Finally, if N Poisson() and Y jN = n Binomial(n; p), then the marginal distribution of Y is Y Poisson(p). Now we describe the Poisson process. Imaging you are at your computer. Each time a new email message arrives you record the time. Let Xt be the number of messages you have received up to and including time t. Then, fXt : t 2 [0; 1)g is a stochastic process with state space X = f0; 1; 2; : : :g. A process of this form is called a counting process. A Poisson process is a counting process that satis es certain conditions. In what follows, we will sometimes write X (t) instead of Xt . Also, we need the following notation. Write f (h) = o(h) if f (h)=h ! 0 as h ! 0. This means that f (h) is smaller than h when h is close to 0. For example, h2 = o(h).
24.3 Poisson Processes
De nition 24.31 A
493
is a stochastic process fXt : t 2 [0; 1)g with state space X = f0; 1; 2; : : :g such that Poisson process
1. X (0) = 0. 2. For any 0 = t0 < t1 < t2 < < tn , the increments
X (t1 )
X (t0 ); X (t2 )
X (t1 );
; X (tn)
X (tn 1 )
are independent. 3. There is a function (t) such that P(X (t + h)
P(X (t + h)
We call (t) the
X (t) = 1) = (t)h + o(h) (24.16) X (t) 2) = o(h): (24.17)
intensity function
.
The last condition means that the probability of an event in [t; t + h] is approximately h(t) while the probability of more than one event is small. Theorem
24.32
tion (t), then
If Xt is a Poisson process with intensity func-
X (s + t)
X (s) Poisson(m(s + t)
where
m(t) =
Z
t 0
m(s))
(s) ds:
In particular, X (t) Poisson(m(t)). Hence, E (X (t)) = m(t) and V (X (t)) = m(t).
494
24. Stochastic Processes
De nition 24.33 A Poisson process with intensity func-
tion (t) for some > 0 is called a homogeneous Poisson process with rate . In this case,
X (t) Poisson(t):
Let X (t) be a homogeneous Poisson process with rate . Let Wn be the time at which the nth event occurs and set W0 = 0. The random variables W0 ; W1 ; : : : ; are called waiting times. Let Sn = Wn+1 Wn . Then S0 ; S1 ; : : : ; are called sojourn times or interarraival times. The sojourn times S0 ; S1 ; : : : are iid random variables. Their distribution is exponential with mean 1=, that is, they have density Theorem
24.34
f (s) = e
s ;
s 0:
The waiting time Wn Gamma(n; 1=) i.e. it has density
f (w) =
1 n n 1 w e (n)
t
:
Hence, E (Wn ) = n= and V (Wn ) = n=2 . Proof.
First, we have P(S1 > t) = P(X (t) = 0) = e t
with shows that the cdf for S1 is 1 e for S1 . Now,
j
P(S2 > t S1
t .
This shows the result
= s) = P(no events in (s; s + t]jS1 = s) = P(no events in (s; s + t]) (increments are independent) = e t :
24.3 Poisson Processes
495
Hence, S2 has an exponential distribution and is independent of S1 . The result follows by repeating the argument. The result for Wn follows since a sum of exponentials has a Gamma distribution. Example 24.35 Figures 24.3 and 24.4 show requests to a WWW
server in Calgary.1 Assuming that this is a homogeneous Poisson process, N X (T ) Poisson(T ). The likelihood is
L() / e
T
(T )N
which is maxmimized at
N b = = 48:0077 T in units per minute.
In the preceding example we might want to check if the assumption that the data follow a Poisson process is reasonable. To proceed, let us rst note that if a Poisson process is observed on [0; T ] then, given N = X (T ), the event times W1 ; : : : ; WN are a sample from a Uniform(0; T ) distribution. To see intutively why this is true, note that the probability of nding an event in a small interval [t; t + h] is proportional to h. But this doesn't depend on t. Since the probability is constant over t it must be uniform. Figure 24.5 shows a histogram and a some density estimates. The density does not appear to be uniform. To test this formally, let is divide into 4 equal length intervals I1 ; I2 ; I3 ; I4 . Let pi be the probability of a point being in Ii . The null hypothesis is that p1 = p2 = p3 = p4 . We can test this hypothesis using either a likelihood ratio test or a 2 test. The latter is 4 X (Oi Ei )2 i=1
Ei
1 See http://ita.ee.lbl.gov/html/contrib/Calgary-HTTP.html for more information.
24. Stochastic Processes
0.6
0.8
1.0
1.2
1.4
496
0
2
4
6
0.6
0.8
1.0
1.2
1.4
time
0
200
400
600
800
time
FIGURE 24.3. Hits on a web server. First plot is rst 20 events. Second plot is several hours worth of data.
1000
1200
497
10 5
X(t)
15
20
24.3 Poisson Processes
24/09
24/09
24/09
24/09
24/09
24/09
600 200 0
X(t)
1000
time (seconds)
24/09
24/09
25/09 time (seconds)
FIGURE 24.4. X (t) for hits on a web server. First plot is rst 20 events. Second plot is several hours worth of data.
25/09
25/09
498
24. Stochastic Processes
where $O_i$ is the number of observations in $I_i$ and $E_i = n/4$ is the expected number under the null. This yields $\chi^2 = 252$, which has a p-value of 0. Strong evidence against the null.
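The check can be coded directly (a sketch, not the book's code); w denotes the vector of event times on [0, T], an assumed name.

   Tmax <- max(w)
   bins <- cut(w, breaks = seq(0, Tmax, length.out = 5), include.lowest = TRUE)
   O    <- table(bins)                      # observed counts O_i
   E    <- length(w) / 4                    # expected counts under the null
   chisq <- sum((O - E)^2 / E)
   pchisq(chisq, df = 3, lower.tail = FALSE)   # p-value from the chi-squared(3)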
Hidden Markov Models
24.5
Bibliographic Remarks
This is standard material and there are many good references. In this Chapter, I followed the following references quite closely: Probability and Random Processes by Grimmett and Stirzaker (1982), An Introduction to Stochastic Modeling by Taylor and Karlin (1998), Stochastic Modeling of Scienti c Data by Guttorp (1995) and Probability Models for Computer Science by Ross (2002). Some of the exercises are from these texts. 24.6
Exercises
1. Let X0 ; X1 ; : : : be a Markov chain with states f0; 1; 2g and transition matrix 2 3 0:1 0:2 0:7 P = 4 0:9 0:1 0:0 5 0:1 0:8 0:1 Assume that 0 = (0:3; 0:4; 0:3). Find P(X0 = 0; X1 = 1; X2 = 2) and P(X0 = 0; X1 = 1; X2 = 1). 2. Let Y1 ; Y2; : : : be a sequence of iid observations such that P(Y = 0) = 0:1, P(Y = 1) = 0:3, P(Y = 2) = 0:2, P(Y = 3) = 0:4. Let X0 = 0 and let Xn = maxfY1; : : : ; Yng:
Show that X0 ; X1 ; : : : is a Markov chain and nd the transition matrix.
24.6 Exercises
density(x = w/60) 0.0015 0.0005
0.0010
Density
150 100 0
0.0000
50
Frequency
200
Histogram of w/60
0
200
400
600
800
1000
1200
0
w/60
0.000
0.001
0.002
0.003
0.004
density(x = w/60, bw = "ucv")
Density
499
0
200
400
600
800
1000
1200
FIGURE 24.5. Density estimate of event times.
500
1000
1500
500
24. Stochastic Processes
3. Consider a two state Markov chain with states X = f1; 2g and transition matrix 1 a a P= b 1 b where 0 < a < 1 and 0 < b < 1. Prove that b a n a +b a +b lim P = b : a n!1 a+b
a+b
4. Consider the chain from question 3 and set a = :1 and b = :3. Simulate the chain. Let n 1X pbn (1) = I (Xi = 1) n
i=1
n 1X I (Xi = 2) pbn (2) = n i=1
be the proportion of times the chain is in state 1 and state 2. Plot pbn (1) and pbn (2) versus n and verify that they converge to the values predicted from the answer in the previous question. 5. An important Markov chain is the branching process which is used in biology, genetics, nuclear physics and many other elds. Suppose that an animal has Y children. P1 Let pk = P (Y = k). Hence, pk 0 for all k and k=0 pk = 1. Assume each animal has the same lifespan and that they produce ospring according to the distribution pk . Let Xn be the number of animals in the nth generation. Let Y1(n) ; : : : ; YX(n) be the ospring produced in the nth generation. Note that n
Xn+1 = Y1(n) + + YX(n) : n
Let = E (Y ) and = V (Y ). Assume throughout this question that X0 = 1. Let M (n) = E (Xn ) and V (n) = V (Xn ). 2
24.6 Exercises
501
(a) Show that M (n+1) = M (n) and V (n+1) = 2 M (n)+ 2 V (n). (b) Show that M (n) = n and that V (n) = 2 n 1 (1 + + + n 1 ). (c) What happens to the variance if > 1? What happens to the variance if = 1? What happens to the variance if < 1? (d) The population goes extinct if Xn = 0 for some n. Let us thus de ne the extinction time N by N = minfn : Xn = 0g:
Let F (n) = P (N N . Show that F (n) =
n) be the cdf of the random variable
1 X k=0
pk (F (n
1))k ; n = 1; 2; : : :
Hint: Note that the event fN ng is the same as event fXn = 0g. Thus, P(fN ng) = P(fXn = 0g). Let k be the number of ospring of the original parent. The population becomes extinct at time n if and only if each of the k sub-populations generated from the k ospring goes extinct in n 1 generations. (e) Suppose that p0 = 1=4, p1 = 1=2, p2 = 1=4. Use the formula from (5d) to compute the cdf F (n).
502
24. Stochastic Processes
6. Let
2
3 0:40 0:50 0:10 P = 4 0:05 0:70 0:25 5 0:05 0:50 0:45 Find the stationary distribution .
7. Show that if i is a recurrent state and i $ j , then j is a recurrent state. 8. Let
2 1 3 0 13 0 0 13 3 6 21 14 14 0 0 0 7 7 6 60 0 0 0 1 0 7 6 7 P= 6 41 14 14 0 0 14 7 7 6 40 0 1 0 0 0 5 0 0 0 0 0 1 Which states are transient? Which states are recurrent?
9. Let

P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.

Show that π = (1/2, 1/2) is a stationary distribution. Does this chain converge? Why/why not?

10. Let 0 < p < 1 and q = 1 − p. Let

P = \begin{pmatrix} q & p & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 \\ q & 0 & 0 & p & 0 \\ q & 0 & 0 & 0 & p \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}.

Find the limiting distribution of the chain.

11. Let X(t) be an inhomogeneous Poisson process with intensity function λ(t) > 0. Let Λ(t) = \int_0^t λ(u)\, du. Define Y(s) = X(t) where s = Λ(t). Show that Y(s) is a homogeneous Poisson process with intensity λ = 1.
12. Let X(t) be a Poisson process with intensity λ. Find the conditional distribution of X(t) given that X(t + s) = n.

13. Let X(t) be a Poisson process with intensity λ. Find the probability that X(t) is odd, i.e. P(X(t) = 1, 3, 5, ...).

14. Suppose that people log in to the University computer system according to a Poisson process X(t) with intensity λ. Assume that a person stays logged in for some random time with cdf G. Assume these times are all independent. Let Y(t) be the number of people on the system at time t. Find the distribution of Y(t).

15. Let X(t) be a Poisson process with intensity λ. Let W_1, W_2, ... be the waiting times. Let f be an arbitrary function. Show that

E\left( \sum_{i=1}^{X(t)} f(W_i) \right) = λ \int_0^t f(w)\, dw.

16. A two-dimensional Poisson point process is a process of random points on the plane such that (i) for any set A, the number of points falling in A is Poisson with mean λ μ(A), where μ(A) is the area of A, and (ii) the numbers of events in nonoverlapping regions are independent. Consider an arbitrary point x_0 in the plane. Let X denote the distance from x_0 to the nearest random point. Show that

P(X > t) = e^{-λ \pi t^2}   and   E(X) = \frac{1}{2\sqrt{λ}}.
25
Simulation Methods
In this chapter we will see that by generating data in a clever way, we can solve a number of problems such as integrating or maximizing a complicated function. (My main source for this chapter is Monte Carlo Statistical Methods by C. Robert and G. Casella.) For integration, we will study three methods: (i) basic Monte Carlo integration, (ii) importance sampling and (iii) Markov chain Monte Carlo (MCMC).

25.1 Bayesian Inference Revisited

Simulation methods are especially useful in Bayesian inference, so let us briefly review the main ideas in Bayesian inference. Given a prior f(θ) and data X^n = (X_1, ..., X_n), the posterior density is

f(θ | X^n) = \frac{L(θ) f(θ)}{\int L(u) f(u)\, du}
where L(θ) is the likelihood function. The posterior mean is

\bar{θ} = \int θ f(θ | X^n)\, dθ = \frac{\int θ L(θ) f(θ)\, dθ}{\int L(θ) f(θ)\, dθ}.

If θ = (θ_1, ..., θ_k) is multidimensional, then we might be interested in the posterior for one of the components, θ_1, say. This marginal posterior density is

f(θ_1 | X^n) = \int \cdots \int f(θ_1, \ldots, θ_k | X^n)\, dθ_2 \cdots dθ_k,

which involves high-dimensional integration. You can see that integrals play a big role in Bayesian inference. When θ is high-dimensional, it may not be feasible to calculate these integrals analytically. Simulation methods will often be very helpful.
25.2 Basic Monte Carlo Integration

Suppose you want to evaluate the integral I = \int_a^b h(x)\, dx for some function h. If h is an "easy" function like a polynomial or trigonometric function, then we can do the integral in closed form. In practice, h can be very complicated and there may be no known closed form expression for I. There are many numerical techniques for evaluating I such as Simpson's rule, the trapezoidal rule, Gaussian quadrature and so on. In some cases these techniques work very well. But other times they may not work so well. In particular, it is hard to extend them to higher dimensions. Monte Carlo integration is another approach to evaluating I which is notable for its simplicity, generality and scalability. Let us begin by writing

I = \int_a^b h(x)\, dx = \int_a^b h(x)(b-a) \frac{1}{b-a}\, dx = \int_a^b w(x) f(x)\, dx
where w(x) = h(x)(b − a) and f(x) = 1/(b − a). Notice that f is the density for a uniform random variable over (a, b). Hence,

I = E_f(w(X))

where X ∼ Unif(a, b). Suppose we generate X_1, ..., X_N ∼ Unif(a, b) where N is large. By the law of large numbers,

\hat{I} \equiv \frac{1}{N} \sum_{i=1}^N w(X_i) \stackrel{P}{\to} E(w(X)) = I.

This is the basic Monte Carlo integration method. We can also compute the standard error of the estimate,

\hat{se} = \frac{s}{\sqrt{N}}   where   s^2 = \frac{\sum_i (Y_i - \hat{I})^2}{N-1}

and Y_i = w(X_i). A 1 − α confidence interval for I is \hat{I} ± z_{α/2} \hat{se}. We can take N as large as we want and hence make the length of the confidence interval very small.
Example 25.1. Let's try this on an example where we know the true answer. Let h(x) = x^3. Then I = \int_0^1 x^3\, dx = 1/4. Based on N = 10,000 I got \hat{I} = 0.2481101 with a standard error of 0.0028. Not bad at all.
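A minimal R sketch of this calculation, with variable names of my own choosing (not from the text), is the following; each run gives a slightly different estimate, with a standard error of about 0.0028, consistent with the value reported above.

N <- 10000
x <- runif(N)              ## X_1, ..., X_N ~ Unif(0,1)
y <- x^3                   ## Y_i = w(X_i); here w = h since b - a = 1
I.hat <- mean(y)           ## Monte Carlo estimate of I
se.hat <- sd(y)/sqrt(N)    ## estimated standard error s/sqrt(N)
print(c(I.hat, se.hat))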
A simple generalization of the basic method is to consider integrals of the form

I = \int h(x) f(x)\, dx

where f(x) is a probability density function. Taking f to be a Unif(a, b) density gives us the special case above. The only difference is that now we draw X_1, ..., X_N ∼ f and take

\hat{I} \equiv \frac{1}{N} \sum_{i=1}^N h(X_i)

as before.
Example 25.2. Let

f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}

be the standard Normal pdf. Suppose we want to compute the cdf at some point x:

I = \int_{-\infty}^x f(s)\, ds = \Phi(x).

Of course, we could look this up in a table or use the pnorm command in R. But suppose you don't have access to either. Write I = \int h(s) f(s)\, ds where

h(s) = 1 if s < x,   and   h(s) = 0 if s ≥ x.

Now we generate X_1, ..., X_N ∼ N(0, 1) and set

\hat{I} = \frac{1}{N} \sum_i h(X_i) = \frac{\text{number of observations below } x}{N}.
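In R this estimator is essentially one line; the sketch below (not from the text) uses the indicator trick directly.

N <- 10000
x <- 2
I.hat <- mean(rnorm(N) < x)   ## proportion of N(0,1) draws below x
print(I.hat)                  ## compare with pnorm(2) = 0.9772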
For example, with x = 2, the true answer is Φ(2) = 0.9772 and the Monte Carlo estimate with N = 10,000 yields 0.9751. Using N = 100,000 I get 0.9771.

Example 25.3 (Two Binomials). Let X ∼ Binomial(n, p_1) and Y ∼ Binomial(m, p_2). We would like to estimate δ = p_2 − p_1. The mle is \hat{δ} = \hat{p}_2 − \hat{p}_1 = (Y/m) − (X/n). We can get the standard error \hat{se} using the delta method (remember?), which yields

\hat{se} = \sqrt{ \frac{\hat{p}_1 (1 - \hat{p}_1)}{n} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{m} }
and then construct a 95 per cent confidence interval \hat{δ} ± 2\, \hat{se}. Now consider a Bayesian analysis. Suppose we use the prior f(p_1, p_2) = f(p_1) f(p_2) = 1, that is, a flat prior on (p_1, p_2). The posterior is

f(p_1, p_2 | X, Y) ∝ p_1^X (1 - p_1)^{n-X}\, p_2^Y (1 - p_2)^{m-Y}.

The posterior mean of δ is

\bar{δ} = \int_0^1 \int_0^1 δ(p_1, p_2) f(p_1, p_2 | X, Y)\, dp_1\, dp_2 = \int_0^1 \int_0^1 (p_2 - p_1) f(p_1, p_2 | X, Y)\, dp_1\, dp_2.
If we want the posterior density of δ, we can first get the posterior cdf

F(c | X, Y) = P(δ ≤ c | X, Y) = \int_A f(p_1, p_2 | X, Y)\, dp_1\, dp_2

where A = {(p_1, p_2) : p_2 − p_1 ≤ c}. The density can then be obtained by differentiating F. To avoid all these integrals, let's use simulation. Note that f(p_1, p_2 | X, Y) = f(p_1 | X) f(p_2 | Y), which implies that p_1 and p_2 are independent under the posterior distribution. Also, we see that p_1 | X ∼ Beta(X + 1, n − X + 1) and p_2 | Y ∼ Beta(Y + 1, m − Y + 1). Hence, we can simulate (P_1^{(1)}, P_2^{(1)}), ..., (P_1^{(N)}, P_2^{(N)}) from the posterior by drawing

P_1^{(i)} ∼ Beta(X + 1, n − X + 1)   and   P_2^{(i)} ∼ Beta(Y + 1, m − Y + 1)

for i = 1, ..., N. Now let δ^{(i)} = P_2^{(i)} − P_1^{(i)}. Then,

\bar{δ} ≈ \frac{1}{N} \sum_i δ^{(i)}.

We can also get a 95 per cent posterior interval for δ by sorting the simulated values and finding the 0.025 and 0.975 quantiles. The posterior density f(δ | X, Y) can be obtained by applying density
estimation techniques to δ^{(1)}, ..., δ^{(N)} or, simply, by plotting a histogram. For example, suppose that n = m = 10, X = 8 and Y = 6. The R code is:

x <- 8; y <- 6
n <- 10; m <- 10
B <- 10000
p1 <- rbeta(B, x+1, n-x+1)
p2 <- rbeta(B, y+1, m-y+1)
delta <- p2 - p1
print(mean(delta))
left <- quantile(delta, .025)
right <- quantile(delta, .975)
print(c(left, right))

This prints the posterior mean \bar{δ}; for these data the 95 per cent posterior interval is (−0.52, 0.20). The posterior density can be estimated from a histogram of the simulated values as shown in Figure 25.1.

Example 25.4 (Dose Response). Suppose we conduct an experiment by giving rats one of ten possible doses of a drug, denoted by x_1 < x_2 < ... < x_{10}. For each dose level x_i we use n rats and we observe Y_i, the number that die. Thus we have ten independent binomials Y_i ∼ Binomial(n, p_i). Suppose we know from biological considerations that higher doses should have higher probability of death. Thus, p_1 ≤ p_2 ≤ ... ≤ p_{10}. Suppose we want to estimate the dose at which the animals have a 50 per cent chance of dying. This is called the LD50. Formally, δ = x_j where

j = \min\{ i : p_i ≥ 0.50 \}.
Notice that δ is implicitly a (complicated) function of p_1, ..., p_{10}, so we can write δ = g(p_1, ..., p_{10}) for some g. This just means that if we know (p_1, ..., p_{10}) then we can find δ.
FIGURE 25.1. Posterior of δ from simulation. (Histogram of f(delta | data) plotted against delta = p2 − p1.)
The posterior mean of δ is

\int \cdots \int_A g(p_1, \ldots, p_{10}) f(p_1, \ldots, p_{10} | Y_1, \ldots, Y_{10})\, dp_1\, dp_2 \cdots dp_{10}.

The integral is over the region

A = \{ (p_1, \ldots, p_{10}) : p_1 ≤ p_2 ≤ \cdots ≤ p_{10} \}.

Similarly, the posterior cdf of δ is

F(c | Y_1, \ldots, Y_{10}) = P(δ ≤ c | Y_1, \ldots, Y_{10}) = \int \cdots \int_B f(p_1, \ldots, p_{10} | Y_1, \ldots, Y_{10})\, dp_1\, dp_2 \cdots dp_{10}

where

B = A \cap \{ (p_1, \ldots, p_{10}) : g(p_1, \ldots, p_{10}) ≤ c \}.
We would need to do a 10-dimensional integral over a restricted region A. Instead, let's use simulation. Let us take a flat prior truncated over A. Except for the truncation, each P_i again has a Beta distribution. To draw from the posterior we do the following steps:

(1) Draw P_i ∼ Beta(Y_i + 1, n − Y_i + 1), i = 1, ..., 10.
(2) If P_1 ≤ P_2 ≤ ... ≤ P_{10}, keep this draw. Otherwise, throw it away and draw again until you get one you can keep.
(3) Let δ = x_j where j = \min\{ i : P_i > 0.50 \}.

We repeat this N times to get δ^{(1)}, ..., δ^{(N)} and take

E(δ | Y_1, \ldots, Y_{10}) ≈ \frac{1}{N} \sum_i δ^{(i)}.

Note that δ is a discrete variable. We can estimate its probability mass function by

P(δ = x_j | Y_1, \ldots, Y_{10}) ≈ \frac{1}{N} \sum_{i=1}^N I(δ^{(i)} = x_j).
For example, suppose that x = (1, 2, ..., 10). Figure 25.2 shows some data with n = 15 animals at each dose. The R code is a bit tricky:

## assume x and y are given
n <- rep(15,10)
B <- 10000
p <- matrix(0,B,10)
check <- rep(0,B)
for(i in 1:10){
  p[,i] <- rbeta(B, y[i]+1, n[i]-y[i]+1)
}
for(i in 1:B){
  check[i] <- min(diff(p[i,]))       ## nonnegative iff p[i,] is nondecreasing
}
good <- (1:B)[check >= 0]            ## keep only the ordered draws
p <- p[good,]
B <- nrow(p)
delta <- rep(0,B)
for(i in 1:B){
  j <- min((1:10)[p[i,] > .5])       ## first dose with P_i > 0.5, i.e. the LD50
  delta[i] <- x[j]
}
print(mean(delta))
left <- quantile(delta,.025)
right <- quantile(delta,.975)
print(c(left,right))
FIGURE 25.2. Dose response data. (Panels: proportion who died versus dose; the posterior draws of (p_1, ..., p_{10}); the posterior f(delta | data) for the LD50.)
The posterior draws for p_1, ..., p_{10} are shown in the second panel of the figure. We find that \bar{δ} = 4.04 with a 95 per cent interval of (3, 5). The third panel shows the posterior for δ, which is necessarily a discrete distribution.
25.3 Importance Sampling

Consider the integral I = \int h(x) f(x)\, dx where f is a probability density. The basic Monte Carlo method involves sampling from f. However, there are cases where we may not know how to sample from f. For example, in Bayesian inference, the posterior density is obtained by multiplying the likelihood L(θ) times the prior f(θ). There is no guarantee that f(θ | x) will be a known distribution like a Normal or Gamma or whatever. Importance sampling is a generalization of basic Monte Carlo which overcomes this problem. Let g be a probability density that we know how to simulate from. Then

I = \int h(x) f(x)\, dx = \int \frac{h(x) f(x)}{g(x)} g(x)\, dx = E_g(Y)

where Y = h(X) f(X)/g(X) and the expectation E_g(Y) is with respect to g. We can simulate X_1, ..., X_N ∼ g and estimate I by

\hat{I} = \frac{1}{N} \sum_i Y_i = \frac{1}{N} \sum_i \frac{h(X_i) f(X_i)}{g(X_i)}.

This is called importance sampling. By the law of large numbers, \hat{I} \stackrel{P}{\to} I. However, there is a catch. It's possible that \hat{I} might have an infinite standard error. To see why, recall that I is the mean of w(x) = h(x) f(x)/g(x). The second moment of this quantity is
E_g(w^2(X)) = \int \left( \frac{h(x) f(x)}{g(x)} \right)^2 g(x)\, dx = \int \frac{h^2(x) f^2(x)}{g(x)}\, dx.
If g has thinner tails than f, then this integral might be infinite. To avoid this, a basic rule in importance sampling is to sample from a density g with thicker tails than f. Also, suppose that g(x) is small over some set A where f(x) is large. Again, the ratio f/g could be large, leading to a large variance. This implies that we should choose g to be similar in shape to f. In summary, a good choice for an importance sampling density g should be similar to f but with thicker tails. In fact, we can say what the optimal choice of g is.

Theorem 25.5. The choice of g that minimizes the variance of \hat{I} is

g^*(x) = \frac{|h(x)| f(x)}{\int |h(s)| f(s)\, ds}.
PROOF. The variance of w = fh/g is

E_g(w^2) - (E_g(w))^2 = \int w^2(x) g(x)\, dx - \left( \int w(x) g(x)\, dx \right)^2
  = \int \frac{h^2(x) f^2(x)}{g^2(x)} g(x)\, dx - \left( \int \frac{h(x) f(x)}{g(x)} g(x)\, dx \right)^2
  = \int \frac{h^2(x) f^2(x)}{g(x)}\, dx - \left( \int h(x) f(x)\, dx \right)^2.

The second integral does not depend on g, so we only need to minimize the first integral. Now, from Jensen's inequality we have

E_g(W^2) ≥ (E_g(|W|))^2 = \left( \int |h(x)| f(x)\, dx \right)^2.

This establishes a lower bound on E_g(W^2). However, E_{g^*}(W^2) equals this lower bound, which proves the claim.

This theorem is interesting but it is only of theoretical interest. If we did not know how to sample from f, then it is unlikely
that we could sample from |h(x)| f(x) / \int |h(s)| f(s)\, ds. In practice, we simply try to find a thick-tailed distribution g which is similar to f |h|.

Example 25.6 (Tail Probability). Let's estimate I = P(Z > 3) = 0.0013 where Z ∼ N(0, 1). Write I = \int h(x) f(x)\, dx where f(x) is the standard Normal density and h(x) = 1 if x > 3, and 0 otherwise. The basic Monte Carlo estimator is \hat{I} = N^{-1} \sum_i h(X_i). Using N = 100 we find (from simulating many times) that E(\hat{I}) = 0.0015 and Var(\hat{I}) = 0.0039. Notice that most observations are wasted in the sense that most are not near the right tail. Now we will estimate this with importance sampling, taking g to be a Normal(4, 1) density. We draw values from g and the estimate is now \hat{I} = N^{-1} \sum_i f(X_i) h(X_i)/g(X_i). In this case we find that E(\hat{I}) = 0.0011 and Var(\hat{I}) = 0.0002. We have reduced the standard deviation by a factor of 20.
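A hedged R sketch of this comparison is below; the number of repetitions and the use of the empirical mean and standard deviation across repetitions are my own choices, not prescribed by the text.

N <- 100
nrep <- 1000
I.basic <- I.is <- rep(0, nrep)
for(j in 1:nrep){
  ## basic Monte Carlo: sample from f = N(0,1)
  x <- rnorm(N)
  I.basic[j] <- mean(x > 3)
  ## importance sampling: sample from g = N(4,1) and reweight by f/g
  x <- rnorm(N, mean = 4)
  I.is[j] <- mean((x > 3) * dnorm(x)/dnorm(x, mean = 4))
}
print(c(mean(I.basic), sd(I.basic)))   ## basic Monte Carlo: mean and spread
print(c(mean(I.is), sd(I.is)))         ## importance sampling: mean and spread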
Example 25.7 (Measurement Model With Outliers). Suppose we have measurements X_1, ..., X_n of some physical quantity θ. We might model this as

X_i = θ + ε_i.

If we assume that ε_i ∼ N(0, 1), then X_i ∼ N(θ, 1). However, when taking measurements, it is often the case that we get the occasional wild observation, or outlier. This suggests that a Normal might be a poor model since Normals have thin tails, which implies that extreme observations are rare. One way to improve the model is to use a density for ε_i with a thicker tail, for example, a t-distribution with ν degrees of freedom, which has the form

t_ν(x) = \frac{\Gamma\left(\frac{ν+1}{2}\right)}{\Gamma\left(\frac{ν}{2}\right) \sqrt{ν\pi}} \left( 1 + \frac{x^2}{ν} \right)^{-(ν+1)/2}.

Smaller values of ν correspond to thicker tails. For the sake of illustration we will take ν = 3. Suppose we observe X_i = θ + ε_i, i = 1, ..., n, where ε_i has a t distribution with ν = 3. We will
take a flat prior on θ. The likelihood is L(θ) = \prod_{i=1}^n t_ν(X_i - θ) and the posterior mean of θ is

\bar{θ} = \frac{\int θ L(θ)\, dθ}{\int L(θ)\, dθ}.

We can estimate the top and bottom integrals using importance sampling. We draw θ_1, ..., θ_N ∼ g and then

\bar{θ} ≈ \frac{ \frac{1}{N} \sum_{j=1}^N θ_j \frac{L(θ_j)}{g(θ_j)} }{ \frac{1}{N} \sum_{j=1}^N \frac{L(θ_j)}{g(θ_j)} }.

To illustrate the idea, I drew n = 2 observations. The posterior mean (computed numerically) is −0.54. Using a Normal importance sampler g yields an estimate of −0.74. Using a Cauchy (t-distribution with 1 degree of freedom) importance sampler yields an estimate of −0.53.
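The following R sketch implements the Cauchy importance sampler. The two data values are made up for illustration; the text does not report the observations it drew.

X <- c(-1.5, 0.5)            ## hypothetical data; the text used n = 2 observations
N <- 100000
theta <- rcauchy(N)          ## draws from g = Cauchy(0, 1)
lik <- sapply(theta, function(th) prod(dt(X - th, df = 3)))   ## L(theta) with t(3) errors
w <- lik/dcauchy(theta)      ## importance weights L(theta)/g(theta)
post.mean <- sum(theta*w)/sum(w)   ## ratio of the two importance sampling estimates
print(post.mean)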
25.4 MCMC Part I: The Metropolis-Hastings Algorithm

Consider again the problem of estimating the integral I = \int h(x) f(x)\, dx. In this chapter we introduce Markov chain Monte Carlo (MCMC) methods. The idea is to construct a Markov chain X_1, X_2, ... whose stationary distribution is f. Under certain conditions it will then follow that

\frac{1}{N} \sum_{i=1}^N h(X_i) \stackrel{P}{\to} E_f(h(X)) = I.

This works because there is a law of large numbers for Markov chains called the ergodic theorem. The Metropolis-Hastings algorithm is a specific MCMC method that works as follows. Let q(y | x) be an arbitrary, friendly distribution (i.e., we know how to sample from q(y | x)). The conditional density q(y | x) is called the proposal distribution.
Metropolis-Hastings Algorithm. The Metropolis-Hastings algorithm creates a sequence of observations X_0, X_1, ... as follows. Choose X_0 arbitrarily. Suppose we have generated X_0, X_1, ..., X_i. To generate X_{i+1} do the following:

(1) Generate a proposal or candidate value Y ∼ q(y | X_i).
(2) Evaluate r ≡ r(X_i, Y) where

r(x, y) = \min\left\{ \frac{f(y)}{f(x)} \frac{q(x | y)}{q(y | x)},\ 1 \right\}.

(3) Set X_{i+1} = Y with probability r, and X_{i+1} = X_i with probability 1 − r.

REMARK 1: A simple way to execute step (3) is to generate U ∼ Unif(0, 1). If U < r, set X_{i+1} = Y; otherwise set X_{i+1} = X_i.

REMARK 2: A common choice for q(y | x) is N(x, b^2) for some b > 0. This means that the proposal is drawn from a Normal centered at the current value.

REMARK 3: If the proposal density q is symmetric, q(y | x) = q(x | y), then r simplifies to

r = \min\left\{ \frac{f(Y)}{f(X_i)},\ 1 \right\}.

The Normal proposal distribution mentioned in Remark 2 is an example of a symmetric proposal density.
By construction, X_0, X_1, ... is a Markov chain. But why does this Markov chain have f as its stationary distribution? Before we explain why, let us first do an example.

Example 25.8. The Cauchy distribution has density

f(x) = \frac{1}{\pi} \frac{1}{1 + x^2}.

Our goal is to simulate a Markov chain whose stationary distribution is f. As suggested in the remark above, we take q(y | x) to be N(x, b^2). So in this case,

r(x, y) = \min\left\{ \frac{f(y)}{f(x)},\ 1 \right\} = \min\left\{ \frac{1 + x^2}{1 + y^2},\ 1 \right\}.

So the algorithm is to draw Y ∼ N(X_i, b^2) and set X_{i+1} = Y with probability r(X_i, Y), and X_{i+1} = X_i with probability 1 − r(X_i, Y).
The R code is very simple:

metrop <- function(N, b){
  x <- rep(0, N)
  for(i in 2:N){
    y <- rnorm(1, x[i-1], b)
    r <- min((1 + x[i-1]^2)/(1 + y^2), 1)   ## acceptance probability r(x[i-1], y)
    u <- runif(1)
    if(u < r) x[i] <- y else x[i] <- x[i-1]
  }
  return(x)
}
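For example, one chain of length 1,000 with b = 1 can be run and summarized as follows (a usage sketch, not from the text):

out <- metrop(1000, 1)
plot(out, type = "l")                   ## trace plot of the chain
hist(out, breaks = 30, freq = FALSE)    ## compare with the Cauchy density
curve(1/(pi*(1 + x^2)), add = TRUE)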
The simulator requires a choice of b. Figure 25.3 shows three chains of length N = 1000 using b = 0.1, b = 1 and b = 10. The histograms of the simulated values are shown in Figure 25.4. Setting b = 0.1 forces the chain to take small steps. As a result, the chain doesn't "explore" much of the sample space. The histogram
from the sample does not approximate the true density very well. Setting b = 10 causes the proposals to often be far in the tails, making r small; hence we reject the proposal and keep the chain at its current position. The result is that the chain "gets stuck" at the same place quite often. Again, this means that the histogram from the sample does not approximate the true density very well. The middle choice avoids these extremes and results in a Markov chain sample that better represents the density sooner. In summary, there are tuning parameters and the efficiency of the chain depends on these parameters. We'll discuss this in more detail later.
If the sample from the Markov chain starts to "look like" the target distribution f quickly, then we say that the chain is "mixing well." Constructing a chain that mixes well is somewhat of an art.

Why It Works. Recall from the previous chapter that a distribution π satisfies detailed balance for a Markov chain if

π_i p_{ij} = π_j p_{ji}.

We then showed that if π satisfies detailed balance, then π is a stationary distribution for the chain. Because we are now dealing with continuous state Markov chains, we will change notation a little and write p(x, y) for the probability of making a transition from x to y. Also, let's use f(x) instead of π for a distribution. In this new notation, f is a stationary distribution if f(x) = \int f(y) p(y, x)\, dy, and detailed balance holds for f if

f(x) p(x, y) = f(y) p(y, x).

Detailed balance implies that f is a stationary distribution since, if detailed balance holds, then

\int f(y) p(y, x)\, dy = \int f(x) p(x, y)\, dy = f(x) \int p(x, y)\, dy = f(x),
FIGURE 25.3. Three Metropolis chains corresponding to b = 0.1, b = 1, b = 10.
FIGURE 25.4. Histograms from the three chains. The true density is also plotted.
which shows that f(x) = \int f(y) p(y, x)\, dy, as required. Our goal is to show that f satisfies detailed balance, which will imply that f is a stationary distribution for the chain. Consider two points x and y. Either

f(x) q(y | x) < f(y) q(x | y)   or   f(x) q(y | x) > f(y) q(x | y).

We will ignore ties (which we can do in the continuous setting). Without loss of generality, assume that f(x) q(y | x) > f(y) q(x | y). This implies that

r(x, y) = \frac{f(y)}{f(x)} \frac{q(x | y)}{q(y | x)}

and that r(y, x) = 1. Now p(x, y) is the probability of jumping from x to y. This requires two things: (i) the proposal distribution must generate y, and (ii) you must accept y. Thus,

p(x, y) = q(y | x) r(x, y) = q(y | x) \frac{f(y)}{f(x)} \frac{q(x | y)}{q(y | x)} = \frac{f(y)}{f(x)} q(x | y).

Therefore,

f(x) p(x, y) = f(y) q(x | y).     (25.1)

On the other hand, p(y, x) is the probability of jumping from y to x. This requires two things: (i) the proposal distribution must generate x, and (ii) you must accept x. This occurs with probability p(y, x) = q(x | y) r(y, x) = q(x | y). Hence,

f(y) p(y, x) = f(y) q(x | y).     (25.2)

Comparing (25.1) and (25.2), we see that we have shown that detailed balance holds.
25.5 MCMC Part II: Different Flavors

There are different types of MCMC algorithms. Here we will consider a few of the most popular versions.
Random-walk-Metropolis-Hastings. In the previous section we considered drawing a proposal Y of the form

Y = X_i + ε_i

where ε_i comes from some distribution with density g. In other words, q(y | x) = g(y − x). We saw that in this case,

r(x, y) = \min\left\{ 1,\ \frac{f(y)}{f(x)} \right\}.

This is called a random-walk-Metropolis-Hastings method. The reason for the name is that, if we did not do the accept-reject step, we would be simulating a random walk. The most common choice for g is a N(0, b^2). The hard part is choosing b so that the chain mixes well. A good rule of thumb is: choose b so that you accept the proposals about 50 per cent of the time. Warning: this method doesn't make sense unless X takes values on the whole real line. If X is restricted to some interval, then it is best to transform X, say Y = m(X), where Y takes values on the whole real line. For example, if X ∈ (0, ∞) then you might take Y = log X and then simulate the distribution for Y instead of X.

Independence-Metropolis-Hastings. This is like an importance-sampling version of MCMC. In this case we draw the proposal from a fixed distribution g. Generally, g is chosen to be an approximation to f. The acceptance probability becomes

r(x, y) = \min\left\{ 1,\ \frac{f(y)}{f(x)} \frac{g(x)}{g(y)} \right\}.

Gibbs Sampling. The two previous methods can be easily adapted, in principle, to work in higher dimensions. In practice, tuning the chains to make them mix well is hard. Gibbs sampling is a way to turn a high-dimensional problem into several one-dimensional problems. Here's how it works for a bivariate problem. Suppose that (X, Y) has density f_{X,Y}(x, y). First, suppose that it is possible to simulate from the conditional distributions f_{X|Y}(x | y) and f_{Y|X}(y | x). Let (X_0, Y_0) be starting values. Assume we have drawn (X_0, Y_0), ..., (X_n, Y_n). Then the Gibbs sampling algorithm for getting (X_{n+1}, Y_{n+1}) is:

The (n + 1)st step in Gibbs sampling:
X_{n+1} ∼ f_{X|Y}(x | Y_n)
Y_{n+1} ∼ f_{Y|X}(y | X_{n+1})
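As a concrete illustration (not an example from the text), here is a minimal R sketch of this two-step update when (X, Y) is standard bivariate Normal with correlation rho, in which case both full conditionals are Normal: X | Y = y ∼ N(rho*y, 1 − rho^2) and Y | X = x ∼ N(rho*x, 1 − rho^2).

gibbs.bvn <- function(N, rho){
  x <- y <- rep(0, N)
  s <- sqrt(1 - rho^2)
  for(n in 2:N){
    x[n] <- rnorm(1, rho*y[n-1], s)   ## draw X_{n+1} from f(x | Y_n)
    y[n] <- rnorm(1, rho*x[n], s)     ## draw Y_{n+1} from f(y | X_{n+1})
  }
  cbind(x, y)
}
out <- gibbs.bvn(5000, 0.8)
print(cor(out[,1], out[,2]))          ## should be close to 0.8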
This generalizes in the obvious way to higher dimensions. Gibbs sampling is very useful for a class of models called hierarchical models. Here is a simple case.

Example 25.9 (Normal Hierarchical Model). Suppose we draw a sample of k cities. From each city we draw n_i people and observe how many people Y_i have a disease. Thus, Y_i ∼ Binomial(n_i, p_i). We are allowing for different disease rates in different cities. We can also think of the p_i's as random draws from some distribution F. We can write this model in the following way:

P_i ∼ F
Y_i | P_i = p_i ∼ Binomial(n_i, p_i).

We are interested in estimating the p_i's and the overall disease rate \int p\, dF(p). To proceed, it will simplify matters if we make some transformations that allow us to use some Normal approximations. Let
\hat{p}_i = Y_i/n_i. Recall that \hat{p}_i ≈ N(p_i, s_i^2) where s_i^2 = \hat{p}_i (1 − \hat{p}_i)/n_i. Let ψ_i = \log(p_i/(1 − p_i)) and define Z_i ≡ \hat{ψ}_i = \log(\hat{p}_i/(1 − \hat{p}_i)). By the delta method,

\hat{ψ}_i ≈ N(ψ_i, σ_i^2)

where σ_i^2 = 1/(n \hat{p}_i (1 − \hat{p}_i)). Experience shows that the Normal approximation for ψ is more accurate than the Normal approximation for p, so we shall work with ψ. We shall treat σ_i as known. Furthermore, we shall take the distribution of the ψ_i's to be Normal. The hierarchical model is now

ψ_i ∼ N(μ, τ^2)
Z_i | ψ_i ∼ N(ψ_i, σ_i^2).

As yet another simplification we take τ = 1. The unknown parameters are θ = (μ, ψ_1, ..., ψ_k). The likelihood function is

L(θ) ∝ \prod_i f(ψ_i | μ) \prod_i f(Z_i | ψ_i) ∝ \prod_i \exp\left\{ -\frac{1}{2} (ψ_i - μ)^2 \right\} \exp\left\{ -\frac{1}{2σ_i^2} (Z_i - ψ_i)^2 \right\}.

If we use the prior f(μ) ∝ 1, then the posterior is proportional to the likelihood. To use Gibbs sampling, we need to find the conditional distribution of each parameter conditional on all the others. Let us begin by finding f(μ | rest), where "rest" refers to all the other variables. We can throw away any terms that don't involve μ. Thus,

f(μ | rest) ∝ \prod_i \exp\left\{ -\frac{1}{2} (ψ_i - μ)^2 \right\} ∝ \exp\left\{ -\frac{k}{2} (μ - b)^2 \right\}

where

b = \frac{1}{k} \sum_i ψ_i.
Hence we see that μ | rest ∼ N(b, 1/k). Next we will find f(ψ_i | rest). Again, we can throw away any terms not involving ψ_i, leaving us with

f(ψ_i | rest) ∝ \exp\left\{ -\frac{1}{2} (ψ_i - μ)^2 \right\} \exp\left\{ -\frac{1}{2σ_i^2} (Z_i - ψ_i)^2 \right\} ∝ \exp\left\{ -\frac{1}{2 d_i^2} (ψ_i - e_i)^2 \right\}

where

e_i = \frac{μ + Z_i/σ_i^2}{1 + 1/σ_i^2}   and   d_i^2 = \frac{1}{1 + 1/σ_i^2},

and so ψ_i | rest ∼ N(e_i, d_i^2). The Gibbs sampling algorithm then involves iterating the following steps N times:

draw μ ∼ N(b, v^2)
draw ψ_1 ∼ N(e_1, d_1^2)
...
draw ψ_k ∼ N(e_k, d_k^2)

where v^2 = 1/k.
It is understood that at each step, the most recently drawn version of each variable is used. Let's now consider a numerical example. Suppose that there are k = 20 cities and we sample n = 20 people from each city. The R function for this problem is:

gibbs.fun <- function(y, n, N){
  k <- length(y)
  p.hat <- y/n
  Z <- log(p.hat/(1 - p.hat))
  sigma <- sqrt(1/(n*p.hat*(1 - p.hat)))
  v <- sqrt(1/k)                 ## sd of mu given rest, since mu | rest ~ N(b, 1/k)
  mu <- rep(0, N)
  psi <- matrix(0, N, k)
  for(i in 2:N){
    ### draw mu given rest:  mu | rest ~ N(b, 1/k) with b = mean(psi)
    b <- mean(psi[i-1,])
    mu[i] <- rnorm(1, b, v)
    ### draw psi given rest:  psi_i | rest ~ N(e_i, d_i^2)
    e <- (mu[i] + Z/sigma^2)/(1 + 1/sigma^2)
    d <- sqrt(1/(1 + 1/sigma^2))
    psi[i,] <- rnorm(k, e, d)
  }
  list(mu = mu, psi = psi)
}
After running the chain, we can convert each ψ_i back into p_i by way of p_i = e^{ψ_i}/(1 + e^{ψ_i}). The raw proportions are shown in Figure 25.6. Figure 25.5 shows "trace plots" of the Markov chain for p_1 and μ. (We should also look at the plots for p_2, ..., p_{20}.) Figure 25.6 shows the posterior for μ based on the simulated values. The second panel of Figure 25.6 shows the raw proportions and the Bayes estimates. Note that the Bayes estimates are "shrunk" together. The parameter τ controls the amount of shrinkage. We set τ = 1, but in practice we should treat τ as another unknown parameter and let the data determine how much shrinkage is needed.
25. Simulation Methods
0.6 0.4
p1
0.8
530
0
200
400
600
800
1000
800
1000
−0.2 −0.6
mu
0.2
Simulated values of p1
0
200
400
600
Simulated values of mu
FIGURE 25.5. Simulated values of p1 and .
FIGURE 25.6. Top panel: posterior histogram of μ. Lower panel: raw proportions and the Bayes posterior estimates.
Metropolis within Gibbs:

(1a) Draw a proposal Z ∼ q(z | X_n).
(1b) Evaluate

r = \min\left\{ \frac{f(Z, Y_n)}{f(X_n, Y_n)} \frac{q(X_n | Z)}{q(Z | X_n)},\ 1 \right\}.

(1c) Set X_{n+1} = Z with probability r, and X_{n+1} = X_n with probability 1 − r.

(2a) Draw a proposal Z ∼ \tilde{q}(z | Y_n).
(2b) Evaluate

r = \min\left\{ \frac{f(X_{n+1}, Z)}{f(X_{n+1}, Y_n)} \frac{\tilde{q}(Y_n | Z)}{\tilde{q}(Z | Y_n)},\ 1 \right\}.

(2c) Set Y_{n+1} = Z with probability r, and Y_{n+1} = Y_n with probability 1 − r.

Again, this generalizes to more than two dimensions.
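To make the steps concrete, here is a minimal R sketch of Metropolis within Gibbs for a generic bivariate density f(x, y), using symmetric Normal random-walk proposals for both coordinates; the proposal scales bx and by are illustrative choices, not prescribed by the text.

mwg <- function(f, N, bx = 1, by = 1){
  x <- y <- rep(0, N)
  for(n in 2:N){
    ## (1) Metropolis step for x, holding y fixed; the proposal is symmetric,
    ##     so the q terms cancel in r.
    z <- rnorm(1, x[n-1], bx)
    r <- min(f(z, y[n-1])/f(x[n-1], y[n-1]), 1)
    x[n] <- if(runif(1) < r) z else x[n-1]
    ## (2) Metropolis step for y, holding x fixed.
    z <- rnorm(1, y[n-1], by)
    r <- min(f(x[n], z)/f(x[n], y[n-1]), 1)
    y[n] <- if(runif(1) < r) z else y[n-1]
  }
  cbind(x, y)
}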
"!$#%'&)(+*,(+"-.#/ 0213*5476'0/(+8:9<;= >6?(@#%A3(+"-.#CBD(+A$ 9E6?(+#%3FG473(H4 &I-.J'*5(+(K-L!M1N(O9+P (+9Q-.#/8RS-5#/47UTV;+47VBDA3(+';+(K95(O-59W4 *XJY!M1?4 -.J'(+95#%9<-5(+9Q-.#/'F3Z ][ \?^`_NaWbMcOad^feg'ad^)\?_G*5(h&)(+*59i-.4]13*,4kjV#%A3#/'F]]95#%3F70%(ml,6?(+9Q-IF7n3(O959,op4 &q95478:(srYnDSVt -.#C-L!H4S&D#/"-.(O*5(O9,-=ZvuXJ3(wrYnD "-.#C-L!K4 &q#/"-.(O*5(O9,-x;O47n30%Am6N(sE1D *5 8:(O-.(O*i#%yW13 *. 8:(O-5*5#%; 8R4MA3(O0zTYH{U|i}>~TMH13*,476D 63#%0/#C-L!@A3(O395#C-L!&)n33;O-5#/47iTM*5(+F *5(+9,95#%47R&)n3';O-.#%47iTY47* y13*5(OA3#/;h-.#/4 &)4 *y&)nV-.n3*,(jS 0%n3(
4 &i95478:(*. 3A'478jS *,# 630%( Z +VkdSk.VYvIG"d >7VkHOQ.]kG W¡ O'¢ £¤NSWSK 7¥ §¦kwR H¨Y©SODª""«dk<@¬3ªYk. `+£x<¦dpOQ.]k y¢ "S7d"X'§¦d YkQ@, <X¥Yd"V®S¥`Y "¯ 7£ °²± \?_´³xµ2bM_ ± b¶c+b"aK·s¸H#/9X@9,(O-X-5JDS-X;O47"-§ #%39wmrYnD "-5#%-L!G4S&2#/"-.(O*5(+9Q- H¹ #C-.J 95478:(E1'*5(+9,;+*,#/6?(+AG13*,476D 6'#/0/#C-L!ºI»$¼½ZI¾D47*½(h¿' 8:130/(ST ¹ (]8:#/F7J"- ¹ SY-s-.4@;O4739Q-.*5n';O :#/"-.(O*,jS 0q·s¸KÀÂÁfÃDħÅhÆx-5JDS-Ç-.*5 139½ :n33ÈY34 ¹ G1DS*. 8:(O-5(+* É"Ê 1N(O*½;O(+"-Ë4 &-5J3( -.#%8R(SZ LÌ RÍ?δÏI\DadÍÐbMc=^fcWakbMcOak^`_2Ñ2T ¹ (X9Q-§ *Q- ¹ #%-5Jy95478:(wA3(h&f n30C-x-.J3(O47*,!pÒH;=S0/0/(OAyp_UÓÐÔ`Ô ÍNδÏI\DadÍ2bMc=^`cwÒ: 'A ¹ (E 9,È#%&-5J3(
< ¦dE¥zÙØ5¥/OQ7Ú7dÛ §¦dR"ª"¯¯q¦k" §¦d+§Ü X3ªd]Ýv¦"¥fEÞÐS=ß d+Q7£HàX§¦d¥IS¥`K ß d'¯ 7ÛSK :Ø×O+Sk.dÛ §¦d¶"ª"¯¯ Ü$"¥GØ`f"¯áÚdÛ â¥/×ãL+hs§¦dy"ª"¯¯Ú£äÜ
º7º Y© "K"¯ 3£ QÝVÚ 3¯áÚ"7dÛD£ º]» ¼ 0%74 FqÁ ¼xÆ ¸À $Àº
*
,+ .-0/
"!"# %$&'(&' )#
Ä +Ä ¸ s(+*,347n30%0/#`Á ?Æ ¢ À ·s¸GÀ Á 3¢ H¸ » ,¸3Ä '¢ ¸ ,¸"Æ 3¢ ¸K» 3¢ ¸DÁº» 3¢ ¸"Æ
436587:9;9=<:>:?%@BA DCCC @ ,EGF .H "I ?KJLD[X\^]_?a`N>:KMfQSgh`iM'?B<'j H Qe> H A @ [ \ 9=?K]rUV?KJ=MUV H xw H y w #z%{ ?K]V? A [R<2MO?V>KM)M { ?J=7dSd { 9=KQe>M { `iM)M { ?2UVj`iQS]_M { `iM wa| ~} P k
A Qe>V H 8 H
z ?hUa`iJ]_??VUKMYM { ?#J=7dSd { 9=KQe> z%{ ?KJ H H P | Qe>td`i]?K]tM { `iJ[ { ?v7;>KMfQ sUa`iMQ:? 9b]_<:UV? I 7]_?V> z QSdSdcK?#M { ?>K7c?VUKM <'j M { ?JL?
M8j? z
:
21
U { `K9bMO?K]_>D[
¡G¢h£¤X¥§¦G¨ ¥©£Dª
«¥©£¢r¤
´(O- Ä OÄ ¸6N( | DA -§1N4 #/"-&)*5478 9,478:(:A'#/9,-5*5#%63n'-.#%47¡~Z 1?47#/"(+9Q-.#%8S-547* ¢ ¸ 4 &vy1D *5 8R(h-.(O* %# 9X95478:(]&)n33;h-.#%47¶4S& Ä +Ä ¸MP ¢ ¸HÀ Á Ä +Ä ¸7Æ (KA'(OBD3( 7äd Á ¢ ¸"ÆÇÀ dÁ ¢ ¸"Æv» Á 'ZCºdÆ -.46?(:-5J3(63#S9p4 & ¢ ¸VZ (9.=!>-.J3S- ¢ ¸#/9KÓi_ v^)gDc+bMµ #%& pÁ ¢ ¸7ÆpÀ Z <M6'# 9,(+AVt 3(O959À HQ À ]Á 3¢ ¸7Æ@À pÁ Æ@À '¢ ¸ Q¢ À Т ÁQº» ?¢ Æ Á 3¢ ¸7ÆÇÀ ÐÁº» ?Æ @BA CCC @
¬
Px
r®
@BA CDCC @
°¯
@BA CDCC @
KC
±
²´³
h±
µ
¶
©²
¸
h·
©®
G b¹ º
R»
¼ ¶
¼
¿¾
À
KC
½¼N¼ 8¼ µ Z
Á
*
²
H
Ë
À
H
A m p p ,+ ðÄl?KM´@BA CDCC @ ÅEÆF `iJ I ed ?KM H P k @ [Ç { K? J .H p A m p P k Qe>r7JbcDQf`N>:? I [È { ?0>KM`iJ I `i] I ?K]v]_ ² @ H >:< H P [ É H ÊH a P©[# { ?V? >KMQSgh`iMO? I >KM`iJ I `i] I ?KK] ]_ ¿ H H ©
ººÊ uXJ3( rMn3 0/#C-L!®4 &R 1N4 #/"->(+9Q-.#%8S-5(#/9>9,478:(O-.#%8R(O9> 9,95(+9,95(OA 6"!®-.J'( ebMg3_ c UÓ2g dbMµ b d\ MT'47* HT3A3(hBD3(OA6"! À dÁ ¢ ¸» Æ <(O;= 0%0-.J3S- kÁ Æs*,(O&)(O*59X-.4:(h¿V1?(+;O-.S-.#%47 ¹ #%-5Jâ*,(+9,1N(O;O-<-.4-.J3(A3#%9,-.*,#/63nV-.#/4 ¸ xÁ Ä OÄ D¸ ÆÇÀ xÁ Æ -.JD-pF7(+'(+*.-.(+A¡-.J'(yADS-§VZKÌ×-A'4V(O9p34 -p8:(= ¹ (: *5(mkj (+*5 F7#/'F4dj (+*KGA3(O395#C-L! &)47* Z <¦dO"¥S À 7äd Á ¢ ¸YÆ dÁ ¢ ¸YÆ Á VZå"Æ
)!&'0 Y$ & &K!"
¼
½¼¼ 8¼
| C
L* °²Y³
²³
A CCC
{ ? L*
¸
p.q
A
p
¸
Ua`iJRcK? z ]vQSMMO?KJR`N>
| y
*
À³
vC
µ
v} ´(O- ¸HÀ Á ¢ ¸7ÆhZIuXJ3(O dÁ ¢ ¸]» Æ À dÁ ¢ ¸» ¸ ¸p» Æ À dÁ ¢ ¸» ¸7Æ 3Á ¸p» Æ dÁ ¢ ¸» ¸7Æ dÁ ¸p» Æ À Á ¸p» Æ dÁ ¢ ¸p» ¸"Æ À 7äd Á ¢ ¸7Æ <¦dO"¥S 7 , $¢ ¸ ¢ ¸ » v} âÌ×& 7äd â 3A Q -5J3(+Tv6"!¡uXJ3(+4 *5(+8 'Záå'T 'ZRÌ×&)470/0%4 ¹ 9x-.JDS- ¢ ¸ » ZÁ (+;+ 0/0DA'(OBD3#C-.#%47 'Záå'ZÚÆ@uXJ'(W*,(+95n'0%-Ç&)470/0%4 ¹ 9v&)*5478 1D *,-EÁ)6qÆ 4 &iuXJ3(+47*,(+8 'Z 3Z Y© "H"¯ 3£ Á 3¢ ¸7ÆIÀ Q À ÐÁº<» qÆ 63#/ 9À » $À 3¢ ¸ » 3¢ ¸
"¬
³
"!
|
²Y³
²Y³
|
)y
| y
²Y³
| ²Y³
| y²´³
| y
j
º
$#&%
Qe>V
¹ º
|
`iJ I
KCÊË º
`N> P
'
º
M { ?KJ
(
Qe>0UVKQe>KMO?KJ=M´M { `iM
[
º
º Ã Ë
*)+
/-
*
'
À
|
y²Y³
+
'
,
º
µ
'
L*
º
.-
?KM7]vJ=QSJ#M'<M { ?UV
1032
>:<XM { i ` M °H 6H &' `iJ I Qe>V H eQ >t` UVKQe>KM'?KJ=M?V>KMfQSgh`iM'
'
H
ÅH a P
º
'
[54?KJLUV?V H
¹ º
H
H "M { `iM
º7º
-
"!"# %$&'(&' )#
s4739,#/A3(O* AD-§y1D #%*59KÁ Ä.
Æ Ä OÄdÁ ¸Mħ
D¸"ÆË&)*,478 *5(OF7*5(O959,#/47â8R4MA3(O0
À®xÁ Æ ¹ J3(O*5( pÁ ÆÀ G 'A #/9E $n33ÈY34 ¹ ¡*5(OF7*5(O95#%47&)n33;h-.#%47UZ W$(+9Q-.#/8RS-5(@4 &Ë #/9 k¢ ¸DÁ NƽÀ kj (+*5 F7(4 & 0%0´
95n3;§Jâ-.JDS- » &)47*y9,478R( 'Z <4 -.#%;+(G-5JDS-&)47*y(Oj (+*Q! ¹ (âJD=j7( n33ÈY34 ¹ 1DS*. 8:(O-5(+* xÁ NÆS3AS¶(O9,-.#%8-.47* k¢ ¸DÁ NÆhZ½uXJ3(m^f_NakbYÑ dg'akbMµ ebYgD_ c ÓÐg dbMµ b d\ #/9XA'(OBD3(OA6"! À Á d¢ ¸3Á NÆ5Æ
À ´ 7 Á d¢ ¸3Á NÆ,Æ Á k¢ ¸DÁ NÆ5Æ
À 7äd Á k¢ ¸DÁ NÆ5Æ Á d¢ ¸3Á NÆ,Æ
À #/"-.(OF7*.-.(+A63#/ 9 #/"-.(OF7*.S-5(+AjS *5#/ 3;O( Á 'Z YÆ 8R 0/0Vj S0/n3(O9Ë4 & ;= n39,(E9,8 0%0363#/ 9Ë6'n'-s0/ *5F (wj S*5#S3;+(SZ 2 *5F7(jS 0/n'(+9Ç4 & ;+ n395( 0S*5F7(Ë63# 9i63n'-I9,8S0/0Yj S*5#S3;+(SZvuXJ3#%9x0/(+ A39v-54E-.J3( i^)gDc 3g ^)gD_ ± bya dgDµ2bY\ -5JDS¹ ( ¹ #%0/0UA'#/95;On3959#%âA'(O-§S#/0 ¹ J3(O ¹ (H;O4kj7(+*<347'1D *.S8R(h-.*5#%;]*5(OF7*5(O959,#/47UZ ÌL8R "!;= 9,(+9OT3H1?47#%Y-½(O9,-.#%8-.(JD 9Ë GS9,!M8R1V-.471'-5#/; W47*,8S03A3#/9Q-.*,#/63n'-5#/47T -.J3S-#/9OT ¢ ¸ Á Ä Q Æ Á 'Z Ê Æ uXJ3#%9X#/9<m&f ;O- ¹ ( ¹ #%0/0N4 & -.(O¶8R È (pn'95(K4 &Z P
»
@BA
A CCC
p
² w
p
@
p
@
yxw
p
®
'
p
p
@
'
8¼
L*
L*
|
|
¼¼ 8¼
L*
| y
¼
#y
À
~À
|
|
| y
iC
µ Ã
©¬
"¶
| KC
¼
:¼
µ
~¢h¤ "!$#¤&%'# ()# ¥©¨ ®ºI»¼ ± \?_´³xµ2bM_ ± b^f_Nakb½¼*3 g3Ô&)47*sK1D *5 8R(h-.(O* #/9Ë #%"-.(+*QjS 0?·s¸HÀ Á`ÃDħÅhÆ ¹ J3(O*5(ÃÀ Ã?Á @BA ÄDCCC+Ä @ ¸7ÆHS3A ÅGÀ ÅÁ @BA ÄCCDCOÄ @ ¸7ÆH *,(&)n33;h-.#%4739@4S&X-.J3(GA3S-§ 95n';§J¶-.J3S+ ³dÁ -, ·s¸7Æ/. º<»¼ËÄ &)47*S0/0 -,10 C Áµ'Z "Æ ÌL ¹ 47*,A39+TvÁ`ÃDħÅhÆs-.*5 139 m¹ #%-.Jâ13*,476D 63#%0/#C-L!º<» ¼ËZ±(K;=S0/0xº<» ¼-.J3( ± \2? b½¼dg3Ñqb
-
4 &p-5J3($;+4 'BDA3(O3;+(>#%Y-5(+*Qj S0zZ s4 8R8:4730C!7TX1?(+47130%(n39,( É"Ê 1?(+*;+(+"-;O47'BDA3(O3;+( (»
ºº #/"-.(O*,jS 0%9 ¹ J3#/;§J ;+47*,*5(+9,1N4 3A39-.4 ;§J34V4 95#/'F¼ À Ê Z <4 -.(SP·s¸ ^fc dgD_еÐ\?e gD_е ^fc³ ´bMµ ]Ì×& #/9yj7(O;O-.4 *<-5J3(+ ¹ (Kn395(H;+47'B3A3(+3;O(@9,(O-mÁf95n';§J>y9513J3(O*5( 47* â(+0%0/#/1'95(kƽ#/39Q-.(=SA¶4 &v #/"-5(+*,jS 0`Z g _Ð^`_2Ñ "d +VdQ§¥`ªdO´ ]7¥Y+Sk+'k¨V"SdOwÚkS¥%Y¯§)"¥M"ª""¥S¯ dOHS¥YWO¥ £ àpYO 'ÇO'ª$O'¯¯ +Odh YkQ$Yd+'d§¥zªdh: 7¥WOSk +Vk¨'"d+@Úk¥/"¯ )"¥´"yª""¥¯k+S¥YWO¥2 d£ N'ªK+'k."ªdE§¦7 ÇhO'd§¥zªdO.dÛ O'k¨V"Sd+kS¥%Y¯ )"¥Q+¬3ªdd+¡ yª""¥¯k+ S¥YWO¥H Ä Ä ¦dS S¥½+kp+Vª"¥<kS¥/"¯ ]Ú¯áN§¥YR§¦d §¥zªdS¥ "EhS¥]"¯ªd £ ¦dS¥/ d dO+ Úk§¥/V3ªdO§¦d "= ¥7+d.dÛ:§¦d,YW ©M7¥`EkX dS¥Ð"d: dS¥×£ Y© "H"¯ 3£ G"!" &'t
$ %$
'
¼
C'
¼
®
®
-
A
0
6Â
0
|
A
*
+
|
CCC
0
0
Yui?K] I ` JL? z >O9 `K9=?K]_>]_?O9=D[ D` M { `i M 9=?K]UV?KJ=M<'j M { ?9= z QSM { i7J D> [
>K7 i ` dSd :?V?`t>KM `iMO?Kg2?KJ=Md Q"!;?#M { eQ >Y9=`UVUK7]`iMO?MO< z QSM { QSJ%$#9= 9=?K] UV?KJ=MX<'j6M { ?MQSg2?D[ { ? `i]V?Ê>D` QSJxM { `iM µ ( Qe>` &' 9=?K] UV?KJ=M &'
7å
VU z QSM { i7J >D[ %Sj z ` K? ui?K] iI ` jKM<'j z QSdSd´UVQe>#Mf]v78?r?Kui?KJRM { KMfQSgh`iMfQSJ` I Q *?K]_?KJ=M,+D7 `iJ=MQSM - ` I Q *?K]_?KJ=M 9=KMfQ
Ð-.(+*OT ¹ ( ¹ #%0/0?A3#/9,;+n39,9 sk! (+9,# â8R(h-.J34MA39X#% ¹ J3#/;§J ¹ (p-5*5(=- 9X#%&Ð#%- ¹ (O*5( R*5 3A3478jS *,# 630%(K 3A ¹ (mA34R8SÈ (@1'*5476DS63#/0%#%-L!G9Q-§S-5(+8:(+"-.9] 6N4 n'- ZÌL$1D *t -.#%;+n30/ *+T ¹ ( ¹ #/0/0?8R È (]9Q-§S-5(+8:(+"-.9s0/#/ÈS(lQ-5J3(]13*,476D 6'#/0/#C-L!y-.JD- #%9X·s¸MT3F7#%j (+G-5J3( ¬
F
7º º ADS-.'T´#/9 É7Ê 1?(+*;O(+"-=Z o<4 ¹ (Oj (+*OTi-.J'(+95( w=!7(O95#/ #%"-.(+*QjS 0/9p*,(O&)(+*p-54âA'(+F7*,(+(ht×4 &t 6?(+0/#%(O&s13*54763 63#/0%#%-5#/(+9OZuXJ3(+9,( w=!7(O95#/ #/"-.(O*,jS 0%9 ¹ #%0/0Ç34 -+Tv#/F7(+'(+*.S0zTi-5*. 1-.J3( 1D *5 8:(O-.(O* É"Ê 1N(O*;+(+"-<4 &Ð-.J3(p-5#/8:( Z Y© "K"¯ 3£ ·s¸âÀ Á 3¢ ¸H» ,¸VÄ 3¢ ¸ Q¸"Æ ¸ À 0/47FqÁ ¼vÆ 3Á Æ + Á , ·s¸7/Æ . º» ¼ ·s¸ º» ¼ E98:(+"-.#%473(+Aâ(=S*50/#%(+*OT31N4 #/"-(+9Q-.#/8RS-547*59w4S& -.(+âJD=j7(H0/#/8:#%-5#/3F W4 *58R 0UA3#%9Qt -.*,#/63nV-.#/4 UTY8R(+ 3#/'FH-.J3S-s(OrYnDS-.#%47¡Á 'Z Ê Æ½J3470%A39+TY-.J3S-w#%9+T ¢ ¸ Á Ä ,¢ Æ ZxÌLG-.J'#/9 ;=S95( ¹ (K;=S¶;O4739,-5*5n3;h-@Á` 131'*54=¿V#/8RS-.(=Æs;+47'B3A3(+3;O(H#%"-.(+*QjS 0/9 9X&)470%0/4 ¹ 9+Z ¦d+"¥ v"¥zE"¯ ß. dQ+âÝ'k¨V"SdO kS¥%Y¯Ú£ ¢ ¸ Á Ä Q¢ Æ +
À ÁQº<» Áf¼ Æ5Æ {U|Ð} + Á ƽÀ ¼ Á» ÆÇÀ º½»¡¼ Á VÄ=ºdÆ ·s¸HÀ Á ¢ ¸» Q¢ Ä ¢ ¸ Q¢ Æ Á 'Z 7Æ + dÁ , ·s¸ Æ º<» ¼ Á 'Z "Æ v} ´(O-I¸ À Á ¢ ¸» Æ ,¢ Z Ë! 959,n38R1V-.#/4 x¸ ¹ J3(O*5( Á 'Ä=ºdÆ ZW(O+ 3;+(ST + dÁ - , ·s¸7Æ À ¢ ¸p» ,¢ ¢ ¸ Q¢ + À +#" v» ¢ ¸,¢ » ! » %$ À º» ¼ ¾D47* É"Ê 1N(O*x;+(OY-Ç;O47'BDA'(+3;O(<#%"-.(+*QjS 0/9OT"¼¡À Ê 'A À º É W0%(= A3#%3F -.4H-.J3( 1'13*54=¿V#/8RS-.( É"Ê 1?(+*w;O(+"-;+4 'BDA3(O3;+(#/"-5(+*,jS 0 ¢ ¸ ,¢ Z ( ¹ #/0%0?A'#/95;On3959 -.J'(H;O4739,-5*5n3;h-.#%47¶4S&x;+4 'BDA3(O3;+(m#%"-.(+*QjS 0/9#/8:47*5(F7(O3(+*5 0/#C-L!#/-.J'(H*,(+9,-W4 &i-.J3( 6?4V47ÈqZ :µ
"!"# %$&'(&' )#
rF
BF
*
,+
w |
vJ
M{ B ? UV:?KMfMfQSJN de?KM a P [ ]_,QSJL? +D7 `id QSM
w H yÈw 0z%{ ?K]V? [ $0/0QSM8j,M { `iM H
%
- '
H
j
Qe>t`
UV
®
|
fµ
Äl?KM
Z3Ê587:9;9=<:>:?M { i ` M A
cK?M { ?
Qe>V
|
Äl?KM
<'j#`r>KM`iJ I `i] I
`iJ I
y
|
|
KC
| [ M { `iM [ ,' µ
{ ?KJ
³
¬
º
a
C
µ 4µ
ÇF
E
'
³
³
)y
|
³
|
º
C
|
|
|
|
Ë
'
C'
(
|
iC
±
-
ººÉ
©)! $& $ $l &'
¸ Y© "H"¯ 3£ ¸ Ä +Ä ¸ s(O*5347¸ n'0/0/#`Á ?Æ 3¢ ¸@À :Á 3¢ ¸7Æ À :Á ÆÀ ÐÁQº:» ?Æ¡À ÐÁº:» qÆÀ 2ÁQº:» Q¢ À 3¢ ¸3ÁQº» 3¢ ¸7Æ I, À ÐÁºX» ?Æ ?Æ 3¢ ¸Á ´Ä ¢ Æ ºÇ» ¼ ¢ Q¢ À ¢ '¢ ¸3Áº» 3¢ ¸"Æ
p Am p.q + oÄl?KM @BA CDCC @ E(F `iJ I de?KM H P k H A @ [2 { ?KJ p m p.q m p.q Plk| Pk| Plk|VP H À H oH oH H A À @ A H ÅH a P `iJ I H a P©[ M { ? Y?KJ=M]`id H a P©[ 4?KJLUV?V H H Ä QSgQSM { ?V
H H H ( H ( C | | P *
Y z QSM { M { ?xUV ?½`ig 9bde?D[ { ? :? I SQ J=M'?K]vuN`idYQe>2> { K` 9;9b]_<
QSgh`iMO?Kd - d`i]?2>D`igT
9bde? X / UV
$#¨£¨
¢¥
#¨ ¥£¤
½47n3*Ç;+470%0/(+ F7n3(wJ3 9 ¹ *5#%-,-.(O R 0%F747*5#C-.J38 &)47*I9,#/8mn'0S-5#/3F 3#/139Ç4 &]&f #/*x;O47#/Z ½47n *,(R9,n3951'#/;+#%47n39H4 &X!747n'*@;+4 0/0/(+ F7n3( á9K13*,47F7*.+ S8R8:#/3F9,ÈM#%0/0%9+Z ´(O- Ä hÄ ¸ A3(+'4 -.( 9,#/8mn'0S-5(+A ;+47#% 3#/139: 3A 0/(h- À Á ÀºdÆ Z ´(O- RA'(+34 -5(¶-5J3( J"!V1?4 -5J3(+9,#/9-.JDS-y-5J3(95#%8mn30-.47*H#/9y;+4 *5*5(O;O-R 3A 0/(h- A3(O34 -5(-5J3(J"!V1?4 -5J3(+9,#/9 -.JD-p-.J3(y9,#/8mn30/S-.4 *E#/9E34 -p&f #%*+Z K#%9p;= 0%0/(OA-.J3(:_UÓÐÔfÔWÍNÎUÏI\qakÍÐbMc=^fc 3A #%9 ;= 0%0/(OA-5J3(gDÔ akb S_2g'ad^ ?bÍ?δÏI\DadÍÐbMc=^fc Z (;= ¹ *5#C-.(]-.J'(KJ"!M1?4 -.J3(O95(O9] 9
¬
P
@
,H
¬
%!
@A DCCC @
"!$#
A
&!$#
'!
½¼
A
±
!$#
' '
H
A H)(
!
:
C
' '
¬
*!$#
+!#
A
H
¾
±
.-
#%
%
6¤6£ «10
/
23
A
|
C
,
#¤&!6£
54
+
'n *mA3(OB33#%-5#/474 &;+47VBDA3(+';+(#/"-5(+*,jS 0½*5(OrYn3#/*,(+9@-5JDS- dÁ , ·s¸7Æ . º» ¼ &)47*]S0/0 ),0 Z E$ÏI\?^f_Na R^`c+b¡gDchÎ2eÏ2ak\Dad^ ± ;+47'B3A3(+3;O(y#/"-.(O*,jS 0Ð*5(OrMn'#/*5(O9E-.J3SÁ
®
76
³
º + /0 #%8 #/V&z¸ Á , ·s¸ Æ). º@» ¼ &)47*yS0/0 , 0 Z W+ Ói_Ð^ z\ e gDcOδeÏÐa=\qak^ ± ;+4 'BDA3(O3;+(y#%Y-5(+*Qj S0v*5(OrYn3*5(O9]-.JDS-E0/#%8 #/'&`¸ #/'& kÁ ), ·s¸7Æ . ºW» ¼ËZpuXJ3( 131'*54=¿V#/8RS-.( <47*58R 0Ct×6D 9,(+A#/"-5(+*,jS 0´#%9:1?47#%Y- ¹ #%95(K 9Q!V8:1'-54 -.#%;p;+47'B3A3(+3;O(@#%Vt -.(O*,jS 0`ZxÌLF7(+3(O*. 0`T'#C-X8R#%F7J"-w34S-6N(mn3'#%&)47*,8 9Q!M8R1'-54 -.#%;];+4 'BDA3(O3;+(K#/"-.(O*,jS 0`Z
'
"!"# %$&'(&' )#
³
b¼
®
³
³