Encyclopaedic Companion to Medical Statistics Second Edition
Edited by Brian S. Everitt, Professor Emeritus, King's College, London, UK
and Christopher R. Palmer
Director of the Centre for Applied Medical Statistics, University of Cambridge, UK
With a Foreword by Richard Horton
WILEY A John Wiley and Sons, Ltd, Publication
This edition first published 2011 © 2011 John Wiley & Sons, Ltd
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom. For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books. Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
The encyclopaedic companion to medical statistics / edited by Brian S. Everitt and Christopher R. Palmer; with a foreword by Richard Horton. - 2nd ed.
p.; cm.
Includes bibliographical references.
Summary: "The Encyclopaedic Companion to Medical Statistics contains readable accounts of almost 400 statistical topics central to current medical research. Each entry has been written by an individual chosen for both their expertise in the field and their ability to communicate statistical concepts successfully to medical researchers. Real examples from the biomedical literature and relevant illustrations feature in many entries, and extensive cross-referencing signposts the reader to related entries" - Provided by publisher.
ISBN 978-0-470-68419-1 (hb)
I. Medical ~Iopedjal I. Ewritt,. __. D. Palmer. aIrisIopbi:r .Rat,h. . (DNlM: l. S1aIiItics as ~"'s-EagIisIL 2. ~ 'J'bcaIaic~~ WA 13 E562 2010) RA409.ES2120IO 6Io.120~·
2010018141
A catalogue record for this book is available from the British Library.
Print ISBN: 978-0-470-68419-1
ePDF ISBN: 978-0-470-66974-7
To Mary-Elizabeth
Brian S. Everitt
To Cathy-Joan, Laura, Carolyn and David
Christopher R. Palmer
Contents
Foreword
Preface to the Second Edition
Preface
Biographical Information on the Editors
List of Contributors
Abbreviations and Acronyms
Encyclopaedic Companion to Medical Statistics A-Z
Foreword
This encyclopaedia contains no entry for 'Peer review'. In my small corner of the medical statistical universe, this seems like a gross sin of omission. Instead, the process of evaluating research papers is discussed under 'Critical appraisal'. Are these two procedures synonymous? And, irrespective of whether they are or are not, should anybody care? I believe that peer review and critical appraisal do differ, that these differences matter a great deal when considering the ways in which readers should interpret the medical literature, and that an understanding of these differences helps to place medical statistics in its proper context when surveying the wide horizon of clinical and public health research.
The editors of this quite wonderfully rewarding treatise on statistical terms have defined critical appraisal as 'the process of evaluating research reports and assessing their contribution to scientific knowledge'. This statement follows naturally from the meaning of the words 'criticism' (the art of judging) and 'appraisal' (the estimation of quality). That is to say, critical appraisal is an estimation of worth followed by some kind of judgment - a judgment that leans more towards an art than a science. As a non-statistician, I rather warm to the precise imprecision of this definition.
Now consider the more commonly embedded term 'peer review' and look how inferior it is! Who is this anonymous idealised peer? Generally, one would consider a peer to be an equal, somebody who comes from a group comparable to that from which the person under scrutiny has emerged. This intellectual egalitarian is subsequently set the task of viewing again (to take 'review' at its most literal meaning) the work under consideration. But to view with what purpose? None is specified. Despite these practical shortcomings, editors of biomedical journals remain wedded to 'peer review'. We feel uncomfortable with the notion of critical appraisal.
The embodiment of peer review as a distinct scientific discipline is the series of international congresses devoted to peer review in biomedical publication, organised jointly by JAMA and the BMJ. These congresses have spawned hundreds of abstracts, dozens of research papers, and four theme issues of JAMA. They are entirely commendable in every way. For the editors of JAMA and the BMJ, peer review encompasses a broad range of activities: mechanisms of editorial decision making, together with their quality, validity, and practicality; online peer review and publication; pre-publication posting of information; quality assurance of reviewers and editors; authorship and contributorship; conflicts of interest; scientific misconduct; peer review of grant proposals; economic aspects of peer review; and the future of scientific publication. In other words, peer review is a tremendously elastic concept, allowing editors to stretch it to mean whatever interests them at a given (whimsical) moment in time and place. Indeed, its elasticity is seen by many of us as its great strength. The concept grows in richness and understanding as our own appreciation of its complexity and nuance soars. The impenetrable nature of peer review, and the obscure and hard-to-learn expertise it demands, feeds our brittle egos. The notion of critical appraisal, by contrast, is far thinner in meaning, with much less room for editorial manipulation and aggrandisement.
Even if peer review and critical appraisal do differ, should anyone actually care? Yes, they should, and for a very simple reason: the idea of peer review is now bankrupt. Its retention as an operation within the biomedical sciences reflects the interests of those who wish to preserve their own power and position. Peer review is fundamentally anti-democratic. It elevates the mediocre. It asphyxiates originality and it kills careers. How so? Peer review is not about intelligent engagement with a piece of research. It is about defining the margins of what is acceptable and unacceptable to the reviewer. The mythical 'peer' is being asked to view again, after the editor, the work in question and to offer a comment about the geographical location of that work on the map of existing knowledge. If there is space on this map, and provided the work does not disrupt (too much) the terrain established by others, its location can be secured and marked by sanctioning publication. If the disruption is too great, the work's wish to seek a place of rest must be vetoed. Peer review is about the agency of power to preserve established orthodoxy. It has nothing to do with science. It has everything to do with ideology - and the maintenance of a quiet life of privilege and mystique.
Instead, critical appraisal is about incrementally working one's way towards truth.¹ It can never be about truth itself. The essence of biomedical research is estimation. Our world resists certainty. Critical appraisal is about transparent, measurable analysis that cuts a path towards greater precision. Critical appraisal refuses to veil itself in the gaudy adornments that editors pin to peer review in order to embellish their own importance in the cartography of scientific inquiry. A far more robust instrument critical appraisal is for that refusal.
What do these differences tell us about the proper place of medical statistics in biomedicine today? In my view, as a lapsed doctor and a now wrinkled editor, medical statistics is the most important aspect of our critical appraisal of any piece of new research. The evaluations by so-called peers in the clinical specialties that concern a particular research paper provide valuable insight into how that work will be received by a community of practitioners or scholars. However, as an editor I am less interested in reception than I am in meaning.² I want a tough interrogation of new work before its publication, according to commonly agreed standards of questioning - standards that I can see and evaluate for myself. To return to my personal definition of critical appraisal, I want an estimation of quality combined with a judgment. I do not want a view from the club culture of one particular academic discipline. The rejection of peer review by the editors of this encyclopaedia is therefore a triumph of liberty against the forces of conformity.
Yet still today, too much of medicine takes medical statistics for granted. Time and again, we see research that has clearly not been within a hundred miles of a statistical brain. Physicians usually make poor scientists, and physicians and scientists together too often play the part of amateur statistician - with appalling consequences. The future of a successful biomedical research enterprise depends on the flourishing of the discipline we call medical statistics. It is not at all clear to me that those who so depend on medical statistics appreciate either that dependence or the fragility of its foundation. If this magnificent encyclopaedia can be deployed in the ongoing argument about the future of twenty-first century academic medicine, then not only the research enterprise but also the public's health and well-being will be far stronger tomorrow than it is today.
Richard Horton
Editor, Lancet
1. Horton, R. 2002: Postpublication criticism and the shaping of clinical knowledge. JAMA 287, 2843-7.
2. Horton, R. 2000: Common sense and figures: the rhetoric of validity in medicine. Statist Med 19, 3149-64.
Preface
Statistical science plays an important role in medical research. Indeed a major part of the key to the progress in medicine from the 17th century to the present day has been the collection and valid interpretation of evidence, particularly quantitative evidence, provided by the application of statistical methods to medical investigations. Current medical journals are full of statistical material, both relatively simple (for example, t-tests, p-values, linear regression) and, increasingly, more complex (for example, generalised estimating equations, cluster analysis, Bayesian methods). The latter material reflects the vibrant state of statistical research with many new methods having practical implications for medicine being developed in the last two decades or so. But why is statistics important in medicine? Some possible answers are:
(1) Medical practice and medical research generate large amounts of data. Such data are generally full of uncertainty and variation, and extracting the 'signal' from the 'noise' is usually not trivial.
(2) Medicine involves asking questions that have strong statistical overtones. How common is the disease? Who is especially likely to contract a particular condition? What are the chances that a patient diagnosed with breast cancer will survive more than five years?
(3) The evaluation of competing treatments or preventative measures relies heavily on statistical concepts in both the design and analysis phase.
Recognition of the importance of statistics in medicine has increased considerably in recent years. The last decade, in particular, has seen the emergence of evidence-based medicine, and with it the need for clinicians to keep one step ahead of their patients, many of whom nowadays have access to virtually unlimited information (much of it being virtual, yet some of it being limited in its reliability). Compared with previous generations of medical students, today's pre-clinical undergraduates are being taught more about statistical principles than their predecessors. Furthermore, today's clinical researchers are faced (happily, in our view) with growing numbers of biomedical journals utilising statistical referees as part of their peer review processes (see CRITICAL APPRAISAL and STATISTICAL REFEREEING). This enhances the quality of the papers journal editors select, although from the clinical researcher's perspective it has made publication in leading journals more challenging than ever before.
So statistics is (and are) prevalent in the medical world now and is set to remain so for the future. Clearly, clinicians and medical researchers need to know something about the subject, even if only to make their discussion with a friendly statistician more fruitful. The article on consulting a statistician quotes one of the forefathers of modern statistics, R.A. Fisher who, back in 1938, observed wryly: 'To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of.' Thus, one of our hopes for the usefulness and helpfulness of the Encyclopaedic Companion to Medical Statistics is that it may serve to encourage both productive and timely interactions between medical researchers and statisticians. Another sincere hope is that it fills a gap between, on the one hand, textbooks that delve into possibly too much theory and, on the other hand, shorter dictionaries that may not necessarily focus on the needs of medical researchers, or else have entries that are tantalisingly succinct.
To meet these ends, the present reference work contains concise, informative, relatively nontechnical, and hence, we trust, readable accounts of over 350 topics central to modern medical statistics. Topics are covered either briefly or more extensively, in general, in accordance with the subject matter's perceived importance, although we acknowledge there will be disagreement, inevitably, about our choice of article lengths. Many entries benefit from containing real-life, clinical examples. Each has been written by an individual chosen not only for subject-matter expertise in the field but, just as importantly, also by ability to communicate statistical concepts to others. The extensive cross-referencing supplied using SMALL CAPITALS to indicate terms that appear as separate entries should help the reader to find his or her way around and also serves to point out associated topics that might be of interest elsewhere within the Encyclopaedic Companion. All but the shortest entries contain references to further resources where the interested reader can learn in greater depth about the particular topic. Thus, while hoping this work is found to be mostly comprehensible we do not claim it to be fully comprehensive. As co-editors we take joint responsibility for any errors ('sins of commission') and would positively welcome suggestions for possible new topics to consider for future inclusion to rectify perceived missing entries ('sins of omission').
Our thanks are due to numerous people - first, to all of the many contributors for providing such excellent material, mostly on time (mostly!) with particular gratitude extended to those who contributed multiple articles or who handled requests for additional articles so gracefully. Next, we appreciated the tremendous and indispensable efforts of staff at Arnold, especially Liz Gooster and Liz Wilson, and not least for their remaining calm during an editor's moments of anxiety and neurosis about the entire project. In addition we would like to thank Harriet Meteyard for her constant support and encouragement throughout the preparation of this book. Finally, our family members deserve especial thanks for having been extra tolerant of our time spent on developing and executing this extensive project from beginning to end. It is our hope that the Encyclopaedic Companion proves all these efforts and sacrifices to be well worthwhile, becoming a useful, regularly-thumbed reference added to the bookshelf of many of those involved in contemplating, conducting or contributing to medical research.
Brian S. Everitt and Christopher R. Palmer
January 2005
Biographical Information on the Editors
Brian S. Everitt - Professor Emeritus, King's College, London. After 35 years at the Institute of Psychiatry, University of London, Brian Everitt retired in May 2004. Author of approximately 100 journal articles and over 50 books on statistics, and also co-editor of Statistical Methods in Medical Research. Writing continues apace in retirement but now punctuated by tennis, walks in the country, guitar playing and visits to the gym, rather than by committees, committees and more committees.
Christopher R. Palmer, founding Director of Cambridge University's Centre for Applied Medical Statistics, regularly teaches and collaborates with current and future doctors. His first degree was from Oxford, while graduate and postdoctoral studies were in the USA (at UNC-Chapel Hill and Harvard). He has shifted from mathematical towards applied statistics, with particular interest in the ethics of clinical trials and the use of flexible designs whenever appropriate. Fundamentally, he likes to promote sound statistical thinking in all areas of medical research and hopes this volume might help towards that end. Chris served as Deputy or Acting Editor for Statistics in Medicine, 1996-2000, and is a longstanding statistical reviewer for The Lancet. He and his wife have three children they consider to be more than statistically significant.
List of Contributors
K. R. Abrams (KRA), Centre for Biostatistics and Genetic Epidemiology, Department of Health Sciences, University of Leicester, Leicester, LE1 7RH, UK
Colin Baigent (CB), Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Richard Doll Building, Old Road Campus, Roosevelt Drive, Oxford OX3 7LF, UK
Alun Bedding (AB), Quantitative Sciences, GlaxoSmithKline, Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, UK
Tijl De Bie (TDB), ISIS Research Group, Building 1, University of Southampton, Southampton SO17 1BJ, UK
Kathe Bjork (KB), Primetrics, Inc., Arvada, Colorado, USA (kathe@primetrics.net)
J. Martin Bland (JMB), Professor of Health Statistics, Department of Health Sciences, University of York, Heslington, York YO10 5DD, UK
Matteo Bottai (MtB), Division of Biostatistics, Arnold School of Public Health, University of South Carolina, 800 Sumter Street, Columbia, SC 29208, USA, and also Unit of Biostatistics, Institute of Environmental Medicine, Karolinska Institutet, Nobels väg 13, Stockholm, Sweden
Michelle Bradley (MMB), Health Information and Quality Authority, George's Court, George's Lane, Dublin 7, Ireland
Sara Brookes (SB), Department of Social Medicine, University of Bristol, Canynge Hall, Whiteladies Road, Clifton, Bristol BS8 2PR, UK
Marc Buyse (MB), International Drug Development Institute (IDDI), 30 avenue Provinciale, 1340 Louvain-la-Neuve, Belgium
M. J. Campbell (MJC), School of Health and Related Research, University of Sheffield, Regent Court, 30 Regent Street, Sheffield S1 4DA, UK
James R. Carpenter (JRC), Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
Lucy M. Carpenter (LMC), Department of Public Health, University of Oxford, and Nuffield College, Oxford OX1 1NF, UK
Susan Chinn (SC), Respiratory Epidemiology and Public Health, Imperial College, Emmanuel Kaye Building, Manresa Road, London SW3 6LR, UK
Tim Cole (TJC), MRC Centre of Epidemiology for Child Health, UCL Institute of Child Health, 30 Guilford Street, London WC1N 1EH, UK
Chris Corcoran (CCo), Department of Mathematics and Statistics, Utah State University, Logan, UT 84322-3900, USA
Nello Cristianini (NC), UC Davis Department of Statistics, 360 Kerr Hall, One Shields Avenue, Davis, CA 95616, USA
Sarah Crozier (SRC), MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton General Hospital, Southampton SO16 6YD, UK
Carole Cummins (CLC), The University of Birmingham, Department of Public Health, Epidemiology and Biostatistics, 90 Vincent Drive, Edgbaston, Birmingham B15 2TH
George Davey-Smith (GDS), School of Social and Community Medicine, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK
Simon Day (SD), Roche Products Limited, Welwyn Garden City, Hertfordshire, AL7 1TW, UK
Daniela De Angelis (DDA), Statistics, Modelling and Economics Department, Health Protection Agency, Centre for Infections, London and MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
Jonathan Deeks (JD), Public Health, Epidemiology and Biostatistics, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
Graham Dunn (GD), Health Sciences Research Group, School of Community Based Medicine, University of Manchester, Jean McFarlane Building, Oxford Road, Manchester M13 9PL, UK
Doug Easton (DE), Department of Public Health and Primary Care, University of Cambridge, Strangeways Research Laboratory, Worts Causeway, Cambridge CB1 8RN, UK
Jonathan Emberson (JE), Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Richard Doll Building, Old Road Campus, Roosevelt Drive, Oxford OX3 7LF, UK
Richard Emsley (RE), Health Sciences Research Group, School of Community Based Medicine, University of Manchester, Jean McFarlane Building, Oxford Road, Manchester M13 9PL, UK
Brian S. Everitt (BSE), Biostatistics Department, Institute of Psychiatry, Denmark Hill, London SE5 8AF, UK
David Faraggi (DF), Department of Statistics, University of Haifa, Haifa 31905, Israel
W. Harper Gilmour (WHG), Section for Public Health and Health Policy, Division of Community Based Sciences, University of Glasgow, Glasgow G12 8RZ, UK
Els Goetghebeur (EG), Department of Applied Mathematics and Statistics, Ghent University, Krijgslaan 281-S9, 9000 Ghent, Belgium
Andrew Grieve (AG), Division of Health and Social Care Research, Department of Primary Care and Public Health Sciences, School of Medicine, King's College, Floor 7, Capital House, 42 Weston St, London SE1 3QD, UK
Julian P. T. Higgins (JPTH), MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
Theodore R. Holford (TRH), Division of Biostatistics, Yale School of Public Health, Yale, New Haven, CT 06520, USA
Sally Hollis (SH), AstraZeneca, Parklands, Alderley Park, Macclesfield, Cheshire SK10 4TF, UK
Torsten Hothorn (TH), Institut für Statistik, Ludwig-Maximilians-Universität München, Ludwigstraße 33, DE-80539 München, Germany
Hazel Inskip (HI), MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton General Hospital, Southampton SO16 6YD, UK
Tony Johnson (TJ), MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK and MRC Clinical Trials Unit, 222 Euston Road, London, NW1 2DA
Karen Kafadar (KKa), Department of Mathematics, University of Colorado at Denver, PO Box 173364, Campus Box 170, Denver, CO 80217-3364, USA
KyungMann Kim (KK), Department of Biostatistics and Medical Informatics, University of Wisconsin Medical School, 600 Highland Ave., Madison, WI 53792-4675, USA
Ruth King (RK), School of Mathematics and Statistics, Mathematical Institute, University of St Andrews, Fife KY16 9SS, UK
Wojtek Krzanowski (WK), School of Engineering, Mathematics and Physical Science, University of Exeter, Harrison Building, North Park Road, Exeter EX4 4QF, UK
Ranjit Lall (RL), Warwick Emergency Care and Rehabilitation, Division of Health in the Community, Warwick Medical School, University of Warwick, The Farmhouse, Gibbet Hill Campus, Coventry CV4 7AL, UK
Sabine Landau (SL), Biostatistics Department, Institute of Psychiatry, King's College, Denmark Hill, London SE5 8AF, UK
Andrew B. Lawson (AL), Division of Biostatistics and Epidemiology, College of Medicine, Medical University of South Carolina, Charleston, SC 29425, USA
Morven Leese (ML), Health Service and Population Research Department, Institute of Psychiatry, King's College, Denmark Hill, London SE5 8AF, UK
Andy Lynch (AGL), Department of Oncology, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
Cyrus Mehta (CM), President, Cytel Software Corporation, 675 Massachusetts Avenue, Cambridge, MA 02139, USA
Richard Morris (RM), Department of Primary Care and Population Health, UCL Medical School, Royal Free Campus, London NW3 2PF, UK
Paul Murrell (PM), Department of Statistics, The University of Auckland, Private Bag 92019, Auckland, New Zealand
Christopher R. Palmer (CRP), Department of Public Health and Primary Care, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
Max Parmar (MP), MRC Clinical Trials Unit, 222 Euston Road, London NW1 2DA, UK
Nitin Patel (NP), Cytel Software Corporation, 675 Massachusetts Avenue, Cambridge, MA 02139-3309, USA
John Powles (JP), Department of Public Health and Primary Care, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
P. Prescott (PP), Faculty of Mathematical Studies, University of Southampton, Southampton SO17 1BJ, UK
Sophia Rabe-Hesketh (SRH), Graduate School of Education and Graduate Group in Biostatistics, University of California, Berkeley, 3659 Tolman Hall, California 94720, USA and Institute of Education, University of London
Ben Reiser (BR), Department of Statistics, University of Haifa, Haifa 31905, Israel
Shaun Seaman (SRS), MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
Mark Segal (MRS), Division of Biostatistics, University of California, 185 Berry Street, Suite 5700, San Francisco, CA 94107, USA
Pralay Senchaudhuri (PSe), Cytel Software Corporation, 675 Massachusetts Avenue, Cambridge, MA 02139-3309, USA
Stephen Senn (SS), Department of Statistics, The University of Glasgow, Glasgow G12 8QQ, UK
Pak Sham (PS), Department of Psychiatry, The University of Hong Kong, Queen Mary Hospital, 102 Pokfulam Rd, Hong Kong
C. Sharp (CS), Computing Department, Institute of Psychiatry, Denmark Hill, London SE5 8AF, UK
Arvid Sjölander (ArS), Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels väg 12A, 171 77 Stockholm, Sweden
Anders Skrondal (AS), Division of Epidemiology, Norwegian Institute of Public Health, PO Box 4404 Nydalen, N-0403 Oslo, Norway
Nigel Smeeton (NCS), King's College London, Department of Primary Care and Public Health Sciences, Division of Health and Social Care Research, 7th Floor Capital House, 42 Weston Street, London SE1 3QD, UK
Nigel Stallard (NS), Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
Jonathan Sterne (JS), School of Social and Community Medicine, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol BS8 2PS, UK
Elisabeth Svensson (ES), Swedish Business School/Statistics, Örebro University, Örebro, Sweden
Matthew Sydes (MS), MRC Clinical Trials Unit, 222 Euston Road, London NW1 2DA, UK
Jeremy Taylor (JMGT), Department of Biostatistics, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109-2029, USA
Kate Tilling (KT), School of Social and Community Medicine, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol BS8 2PS, UK
Brian Tom (BT), MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
Rebecca Turner (RT), MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
Andy Vail (AV), Biostatistics Group, University of Manchester, Oxford Road, Manchester M13 9PL, UK
Stijn Vansteelandt (SV), Ghent University, Dept. of Applied Mathematics and Computer Science, Krijgslaan 281, S9, B-9000 Ghent, Belgium
Sarah L. Vowler (SLV), Bioinformatics Core, Cancer Research UK, Cambridge Research Institute, Robinson Way, Cambridge CB2 0RE, UK
Stephen J. Walters (SJW), Medical Statistics Group, School of Health and Related Research, University of Sheffield, Regent Court, 30 Regent Street, Sheffield S1 4DA, UK
J. G. Wheeler (JGW), Quanticate Ltd, Bevan House, 9-11 Bancroft Court, Hitchin, Herts, SG5 1LH, UK
Brandon Whitcher (BW), Clinical Imaging Centre, GlaxoSmithKline, Hammersmith Hospital, Du Cane Road, London W12 0HS, UK
Ian White (IW), MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
Janet Wittes (JW), 1710 Rhode Island Ave. NW, Suite 200, Washington DC 20036, USA
Mark Woodward (MW), The George Institute for International Health, PO Box M201, Missenden Road, Sydney NSW 2050, Australia
Ru-Fang Yeh (RY), University of California San Francisco, Campus Box Number 0560, 500 Parnassus, 420 MU-W, San Francisco, CA 94143-0560, USA
Abbreviations and Acronyms

ACES  Active control equivalence studies
ACET  Active control equivalence test
AD  Adaptive design
AI  Artificial intelligence
AIC  Akaike's information criterion
ANCOVA  Analysis of covariance
ANOVA  Analysis of variance
AR  Autoregressive
ARMA  Autoregressive moving average
AUC  Area under curve
BIC  Bayesian information criterion
BUGS  Bayesian inference Using Gibbs Sampling (software)
CACE  Complier average causal effect
CART  Classification and regression tree
CAT  Computer-adaptive testing
CBA  Cost-benefit analysis
CEA  Cost-effectiveness analysis
CI  Confidence interval
CONSORT  Consolidation of standards of reporting trials
COREC  Central Office for Research Ethics Committees
CPMP  Committee for Proprietary Medicinal Products
CPO  Conditional predictive ordinate
CrI  Credible interval
CRM  Continual reassessment method
CSM  Committee on Safety of Medicines
CUA  Cost-utility analysis
CV  Coefficient of variation
CWT  Continuous wavelet transform
DAG  Directed acyclic graph
DALY  Disability adjusted life-year
DAR  Dropout at random
DCAR  Dropout completely at random
DDD  Data-dependent design
DE  Design effect
DoF  Degrees of freedom
DIC  Deviance information criterion
DM  Data mining
DMC  Data monitoring committee
DSMC  Data and safety monitoring committee
DWT  Discrete wavelet transform
DZ  Dizygotic
EBM  Evidence-based medicine
EDA  Exploratory data analysis
EM  Expectation-maximisation
EMEA  European Medicines Evaluation Agency
FDA  Food and Drug Administration
GAM  Generalised additive model
GEE  Generalised estimating equations
GFR  General fertility rate
GIS  Geographical information system
GLIM  Generalised linear interactive modelling (software)
GLIMM  Generalised linear mixed model
GLM  Generalised linear model
GLMM  Generalised linear mixed model
GRR  Gross reproduction rate
GWAS  Genome-wide association studies
HALE  Health-adjusted life expectancy
HMM  Hidden Markov model
HPDI  Highest posterior density interval
HREC  Human research ethics committee
HRQoL  Health-related quality of life
IBD  Identity-by-descent
ICC  Intraclass (or intracluster) correlation coefficient
ICER  Incremental cost-effectiveness ratio
ICH  International Conference on Harmonization
IRB  Institutional review board
ITT  Intention-to-treat
IV  Instrumental variable
KDD  Knowledge discovery in databases
KM  Kaplan-Meier
kNN  k-nearest neighbour
LDF  Linear discriminant function
LR  Likelihood ratio
LREC  Local research ethics committee
LS  Least squares
LST  Large simple trial
MA  Moving average
MANOVA  Multivariate analysis of variance
MAR  Missing at random
MCA  Medicines Control Agency
MCAR  Missing completely at random
MCMC  Markov chain Monte Carlo
MHRA  Medicines and Healthcare Products Regulatory Agency
MLE  Maximum likelihood estimate (or estimation)
MREC  Multicentre research ethics committee
MSE  Mean square error
MTD  Maximum tolerated dose
MZ  Monozygotic
NI  Non-informative (or noninformative)
NMB  Net monetary benefit
NNH  Number needed to harm
NNT  Number needed to treat
NPV  Negative predictive value
NRES  National Research Ethics Service
NRR  Net reproduction rate
OLS  Ordinary least squares
OR  Odds ratio
PCA  Principal components analysis
PDF  Probability density function
PEST  Planning and Evaluation of Sequential Trials (software)
PGM
PH  Proportional hazards
PK/PD  Pharmacokinetics/pharmacodynamics
POP
PP  Per protocol
P-P  Percentile-percentile (probability-probability)
PPP
PPV  Positive predictive value
QALY  Quality adjusted life-year
QoL  Quality of life
Q-Q  Quantile-quantile
QTL  Quantitative trait loci
RCT  Randomised controlled trial
REB  Research ethics board
REC  Research ethics committee
REML  Restricted maximum likelihood
ROC  Receiver operating characteristic
ROI  Region of interest
RPW  Randomised play-the-winner
RR  Relative risk
SD  Standard deviation
SE  Standard error
SEM  Standard error of the mean; structural equation model
SMR  Standardised mortality ratio
SPM  Statistical parametric map
SPRT  Sequential probability ratio test
SS  Sum of squares
SSE  Sum of squares due to error
SVM  Support vector machine
TDT  Transmission distortion test
TFR  Total fertility rate
TT  Triangular test
VAS  Visual analogue scale
WLSE  Weighted least squares estimate (or estimation)
A

accelerated factor See SURVIVAL ANALYSIS

accelerated failure time models See SURVIVAL ANALYSIS, TRANSFORMATION
active control equivalence studies The classic randomised CLINICAL TRIAL seeks to prove superiority of a new treatment to an existing one and a successful conclusion is one in which such proof is demonstrated. The famous MRC trial of streptomycin is a case in point (Medical Research Council Streptomycin in Tuberculosis Trials Committee, 1948). The trial concluded with a significant difference in outcome in favour of the group given streptomycin compared to the group that was not. In recent years, however, there has been an increasing interest in trials whose objective is to show that some new therapy is no worse as regards some outcome than an existing treatment. Such trials have particular features and difficulties that were described in an important paper by Makuch and Johnson (1989) in which they used the term 'active control equivalence studies' (ACES). Actually, the term is not ideally chosen. In bioequivalence studies the object is to show that the bioavailability of a new formulation is neither more than 20% lower than that of an existing formulation nor more than 25% higher, so that equivalence to some degree is genuinely the aim; in ACES, by contrast, it is almost always the case that only noninferiority is the goal.

It may be questioned why the rather modest goal of noninferiority should be of any interest in drug regulation. There are several reasons. The first is that the new drug may have advantages in terms of tolerability. Second, the new drug, while showing no net advantage to the existing one, may increase patient choice and this can be useful. For example, many people have an aspirin allergy. Hence, it is desirable to have alternative analgesics, even if no better on average than aspirin. Third, it may become necessary to withdraw treatments from the market and one can never predict when this may happen. There are now several statins on the market.
The fact that this is so means that withdrawal of cerivastatin does not make it impossible for physicians to continue to treat their patients with this class of drug. Fourth, introduction of further equivalent therapies before patent expiry of an innovator in the class may permit price competition to the advantage of reimbursers (although such competition is probably not particularly effective; Senn and Rosati, 2003). However, the fifth reason is probably the most important. Drug regulation is designed to satisfy some minimum requirements for pharmaceuticals: that they are of sufficient quality, are safe and efficacious. Efficacy is demonstrated if the treatment is better than placebo, even if it is not as good as some other treatments. The comparison of a new drug to an active treatment may be dictated by ethics but the object of the trial may simply be an indirect proof that the treatment is better than placebo through comparison to an agent whose efficacy is accepted.

Recently the issue of the indirect comparison to placebo has been taken more seriously. Consider the case where we have a single effective treatment on the market, say A, whose efficacy has been demonstrated in a series of trials comparing it to placebo. We now run some new trials comparing a further treatment, B, to A. Taking all these trials together, they then have the structure of an incomplete blocks design. The effect of B compared to placebo can then be estimated using the double contrast of B compared to A and A compared to placebo. This approach has been examined in detail by Hasselblad and Kong (2001). A consequence of taking this particular view of matters is that the precision with which the effect of A was established compared to placebo cannot be exceeded by the indirect comparison of B to placebo, since the variance of this indirect contrast is the sum of the variances of the two direct contrasts. This is, however, not the only difficulty with such studies. The following are some of those that apply.
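The double contrast described above can be illustrated with a minimal sketch. It assumes that the two direct contrasts come from independent sets of trials and are summarised by an estimate and a standard error on an additive scale; the function name and the numbers are hypothetical and are not taken from Hasselblad and Kong (2001).

```python
import math


def indirect_contrast(est_b_vs_a, se_b_vs_a, est_a_vs_placebo, se_a_vs_placebo):
    """Estimate B versus placebo from B versus A and A versus placebo.

    Assumes both contrasts are on the same additive scale and are
    statistically independent (hypothetical helper for illustration).
    """
    estimate = est_b_vs_a + est_a_vs_placebo
    # For independent contrasts the variances add, so the indirect
    # standard error can never be smaller than either direct one.
    se = math.sqrt(se_b_vs_a ** 2 + se_a_vs_placebo ** 2)
    return estimate, se


# Illustrative (made-up) inputs: B is slightly worse than A,
# and A is clearly better than placebo.
est, se = indirect_contrast(-0.5, 1.0, 4.0, 1.5)
print(f"B vs placebo: {est:.2f} (SE {se:.2f})")
```

Because the variances add, the standard error of the indirect contrast is at least as large as that of the original A versus placebo comparison, which is the limitation noted in the text.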
Establishing a clinically irrelevant difference. If the route of a formal analysis compared to placebo via an indirect contrast is taken, this particular difficulty may be finessed. The new treatment is shown to be 'significantly' better than placebo, albeit using an indirect argument, and the extent of its inferiority to the comparator is only of relevance to the extent that it impinges on the proof of efficacy compared to placebo. If this proof is provided, then the comparison to the active comparator is 'water under the bridge'. If this particular approach is not taken, however, then any proof of efficacy of the new treatment rests on a demonstration that it is not 'substantially inferior' to the comparator, which comparator is accepted as being efficacious. This raises the issue as to what it means for a drug to be not substantially inferior to another one. This appears to require that some margin $\Delta$, $\Delta > 0$, be adopted such that if $\tau$ is the difference between the new treatment and the standard (where $\tau < 0$ indicates inferiority) then the new treatment is judged substantially inferior if $\tau \le -\Delta$ and not substantially inferior or 'equivalent' if $\tau > -\Delta$.
Technical statistical aspects. In a Neyman-Pearson framework (see Salsburg, 1998) the test of noninferiority requires one to use a shifted NULL HYPOTHESIS. One might, therefore, adopt $H_0: \tau \le -\Delta$. The situation is not as controversial as that for true bioequivalence, where the fact that two hypotheses have to be rejected, that of inferiority and that of superiority, means that an intuitive approach of seeing that the confidence limits for the difference lie within the limits of equivalence is not 'optimal' (Berger and Hsu, 1996), although the 'optimal' test may in practice be worse (Perlman and Wu, 1999; Senn, 2001). In practice, in the case of ACES if the lower limit of the conventional $1 - \alpha$ two-sided CONFIDENCE INTERVAL for $\tau$ exceeds $-\Delta$, the hypothesis of substantial inferiority may be rejected at the level $\alpha$ and noninferiority asserted. It might be thought that a one-sided confidence interval would be sufficient for this purpose. However, the general regulatory convention is that all tests designed to show superiority are two-sided (despite apparent purpose) and, since such tests are a special case of a noninferiority test with $\Delta = 0$, use of one-sided tests for noninferiority would lead to inconsistencies (Senn, 1997; Committee for Proprietary Medicinal Products, 2000). In a Bayesian framework (see BAYESIAN METHODS) one might require that the posterior probability of substantial inferiority were less than some specified amount. Alternatively, use of a loss function would permit a decision analytic method, such as has been proposed for bioequivalence (Lindley, 1998), to be used.
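The confidence interval rule described above can be sketched as follows. This is a minimal illustration that assumes the estimate of the treatment difference is approximately Normally distributed with a known standard error; the function name, argument names and numbers are hypothetical.

```python
from statistics import NormalDist


def noninferiority_check(tau_hat, se, margin, alpha=0.05):
    """Compare the lower two-sided (1 - alpha) confidence limit with -margin.

    tau_hat < 0 means the new treatment looks worse than the standard.
    A Normal approximation with known standard error is assumed.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = tau_hat - z * se
    upper = tau_hat + z * se
    # Noninferiority is asserted only if the whole interval lies above -margin.
    return lower, upper, lower > -margin


# Made-up example: observed difference -1.0, standard error 1.0, margin 4.0.
lower, upper, noninferior = noninferiority_check(-1.0, 1.0, 4.0)
print(f"95% CI ({lower:.2f}, {upper:.2f}); noninferior: {noninferior}")
```

Setting the margin to zero turns the same check into the conventional two-sided superiority test, which is the consistency argument made in the text.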
Power of trials. Note that the reason one does not employ a value of $\Delta = 0$ in practice is that unless it is expected that the new treatment really is better than the standard, the power of the resulting test could never exceed 50%. However, the clinically irrelevant difference is likely to be less than the clinically relevant difference used in conventional trials. Hence, if the new treatment is actually no better than the standard treatment, then, for a given sample size, the noncentrality parameter, $\delta = \Delta/\mathrm{SE}(\hat{\tau})$, is likely to be smaller for ACES than for trials designed to show superiority. Consequently, ACES either have lower power or higher sample sizes than conventional trials.
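The role of the noncentrality parameter can be illustrated with a short sketch. It assumes a Normally distributed estimate with known standard error and a test based on the lower two-sided confidence limit; the function name and numerical values are hypothetical.

```python
from statistics import NormalDist


def noninferiority_power(true_tau, margin, se, alpha=0.05):
    """Approximate power of the test that rejects H0: tau <= -margin.

    Assumes a Normal estimate with known standard error `se`; rejection
    occurs when the lower two-sided (1 - alpha) limit exceeds -margin.
    """
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    # Noncentrality parameter; equals margin / se when true_tau is 0.
    delta = (true_tau + margin) / se
    return nd.cdf(delta - z)


# If the treatments are truly identical, a zero margin gives almost no power.
power_zero_margin = noninferiority_power(0.0, 0.0, 1.0)
# A margin of three standard errors gives usable power.
power_wide_margin = noninferiority_power(0.0, 3.0, 1.0)
print(round(power_zero_margin, 3), round(power_wide_margin, 3))
```

Under these assumptions the power depends only on the ratio of the margin to the standard error, so a smaller margin must be offset by a smaller standard error, that is, by a larger sample size.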
Assay sensitivity. A problem with ACES is that if the trial appears to show noninferiority of the new treatment, then there are three plausible explanations. The first, that of chance, is one that statistical analysis is designed to address. The second, that the new treatment is indeed noninferior, is what was desired to prove. However, a third possibility, that the experiment was not sensitive enough to find a difference, is difficult to exclude. This issue has been referred to as one of 'competence' (Senn, 1993) and affects whatever inferential framework one decides to use. An analogy may be useful here. In a game of hunt the thimble, a found thimble renders the quality of the strategy used for finding it irrelevant. It is no more 'found' if a good strategy were used than if a bad one were. However, a failure to find a thimble does not automatically justify the conclusion that the room does not contain one and the quality of the search employed is a crucial consideration in any judgement that it does not.
The effect of DROPOUTS, NONCOMPLIANCE and the role of INTENTION-TO-TREAT analysis. It is plausible that in many circumstances in conventional superiority trials, if noncompliance or dropouts are a problem, an intention-to-treat analysis will give a more modest estimate of the treatment effect than will a PER PROTOCOL analysis. In ACES, it is at least plausible that this may not be the case.
Conflict of requirements of additivity and clinical relevance. It may be that the clinically irrelevant difference is most meaningfully established on a scale that is not additive. For example, in a trial of an anti-infective, it could be most appropriate to establish that the difference in cure rate on the probability scale was not greater than some specified amount. Contrariwise, the log-odds scale might lend itself more readily to statistical modelling. This can lead to considerable difficulties (Holmgren, 1999), in particular because a trial does not recruit a random sample from the target population. It may be that further modelling using additional data may be necessary (Senn, 2000).

A common circumstance likely to make regulatory authorities ask questions is that a trial that was designed with optimism to show superiority to an active comparator fails to do so, but then is used to attempt to demonstrate noninferiority. This particular set of circumstances has become the subject of one of the European Medicines Evaluation Agency's 'points to consider' (Senn, 1997; Committee for Proprietary Medicinal Products, 2000). This stresses the desirability of establishing the trial's purpose pre-performance and also warns against establishing the clinically irrelevant difference, $\Delta$, after the trial is complete. It regards putting a trial that was designed to show superiority to the purpose of noninferiority as an unacceptable use but accepts the converse. The guideline recognises that there are no issues of multiple testing involved with such switches (Bauer and Kieser, 1996) but that establishing values of $\Delta$ retrospectively may be biasing. Thus, it is preferable for investigators to specify in advance (e.g. by means of formal change to the CLINICAL TRIALS PROTOCOL) their intended switch of purpose and to fix the value of $\Delta$ prior to data unblinding. This, however, raises the issue as to whether
the value of $\Delta$ is not something the regulator should declare for given indications rather than relying on the sponsor to do so. Otherwise, a regulator could be faced with the following position. Drug B is registered on the basis of comparison to a standard treatment A because the lower confidence limit for the treatment effect, $\tau_{B-A}$, exceeds some pre-specified value $-\Delta$. However, a further drug, C, which has also been compared to A, is not granted a licence because a superiority trial was planned. Although superiority to A was not proven, the lower confidence limit for the treatment effect $\tau_{C-A}$ excludes a smaller possible difference between C and A than is excluded for the difference between B and A by the trial that has led to registration of B.
SS

Bauer, P. and Kieser, M. 1996: A unifying approach for confidence intervals and testing of equivalence and difference. Biometrika 83, 4, 934-7.
Berger, R. L. and Hsu, J. C. 1996: Bioequivalence trials, intersection-union tests and equivalence confidence sets. Statistical Science 11, 4, 283-302.
Committee for Proprietary Medicinal Products 2000: Points to consider on switching between superiority and non-inferiority.
Hasselblad, V. and Kong, D. F. 2001: Statistical methods for comparison to placebo in active-control studies. Drug Information Journal 35, 435-49.
Holmgren, E. B. 1999: Establishing equivalence by showing that a specified percentage of the effect of the active control over placebo is maintained. Journal of Biopharmaceutical Statistics 9, 4, 651-9.
Lindley, D. V. 1998: Decision analysis and bioequivalence trials. Statistical Science 13, 2, 136-41.
Makuch, R. and Johnson, M. 1989: Issues in planning and interpreting active control equivalence studies. Journal of Clinical Epidemiology 42, 6, 503-11.
Medical Research Council Streptomycin in Tuberculosis Trials Committee 1948: Streptomycin treatment for pulmonary tuberculosis. British Medical Journal ii, 769-82.
Perlman, M. D. and Wu, L. 1999: The emperor's new tests. Statistical Science 14, 4, 355-69.
Salsburg, D. 1998: Hypothesis testing. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd.
Senn, S. J. 1993: Inherent difficulties with active control equivalence studies. Statistics in Medicine 12, 24, 2367-75.
Senn, S. J. 1997: Statistical issues in drug development. Chichester: John Wiley & Sons, Ltd.
Senn, S. J. 2000: Consensus and controversy in pharmaceutical statistics (with discussion). The Statistician 49, 135-76.
Senn, S. J. 2001: Statistical issues in bioequivalence. Statistics in Medicine 20, 17-18, 2785-99.
Senn, S. J. and Rosati, N. 2003: Editorial: Pharmaceuticals, patents and competition - some statistical issues. Journal of the Royal Statistical Society Series A - Statistics in Society 166, 271-7.
adaptive designs CLINICAL TRIALS that are adaptive are modified in some way by the data that have already been collected within that trial. The most common way the designs adapt is in the allocation of treatment, as a function of the response. For example, we may be interested in a dose that gives a 20% chance of toxicity, where toxicity in excess of this level would be harmful. Therefore, we may want to design the trial in such a way that, as more information is gathered, doses are allocated to optimise the estimate of that dose. If we were to use a traditional fully randomised approach to running the trial, which is not adaptive, we would probably not look at the data until the end of the trial, thereby risking exposing subjects to toxic doses and also possibly failing to produce an optimal estimate of the required dose.

Another such example of an adaptive design is given in Rosenberger and Lachin (1993), whereby there are two treatments in the study, A and B, and as information emerges from the trial the treatment assignment probabilities are adapted in an attempt to assign more patients to the treatment performing better thus far. Therefore, when a patient enters the study, if treatment A appears to be better than treatment B, the patient has a greater than 50% chance of being allocated treatment A, and vice versa.

Because adaptive designs modify the allocation of treatment on an ongoing basis, and thus protect patients from ineffective or toxic doses, they can be said to be more ethical than traditional designs. Rosenberger and Palmer (1999) consider the ethical dilemma between collective and individual ethics (see ETHICS AND CLINICAL TRIALS) and argue that in a clinical trial setting individual ethics should be uppermost; i.e. consideration should be towards doing what is best for patients in the current trial as opposed to doing what is best for future patients who stand to benefit from the results of the current trial. The Declaration of Helsinki of October 2000 outlines the tension between these two types of ethics by stating: 'Considerations related to the well-being of the human subject should take precedence over the interests of science and society.' It is adaptive designs that address the individual ethics, as opposed to fully randomised designs, which address the collective ethics.

We will be dealing primarily with response adaptive designs here, such as those just outlined, and will not be describing those designs that attempt dynamically to balance the randomisation for covariate information, such as outlined by Pocock and Simon (1975) (see DATA-DEPENDENT DESIGNS, MINIMISATION).
The randomised play-the-winner (RPW) design attempts to allocate treatments to patients sequentially based on a simple probability model. Rosenberger (1999) emphasises that the RPW design specifically applies to the situation where the outcome from a trial is binary, i.e. either 'success' or 'failure', and where there are only two treatments, e.g. drug A and drug B. At the start of the trial there is an assumed urn of $\alpha$ balls of type A (which relate to drug A) and $\beta$ balls of type B (which relate to drug B). When a subject is recruited, a ball is drawn from the urn and then replaced. If the ball is type A then the subject is allocated to drug A; if type B then the subject is allocated to drug B. When the subject's outcome is available (and we assume that the outcome is available before the next subject is randomised), the urn is updated. If the response is a success on drug A, then a ball of type A is put into the urn, and similarly for a success on drug B. If the outcome is a failure on drug A, then a ball of type B is put into the urn, and again similarly for a failure on drug B. In this way, the balls build up such that a new subject has a better chance of being allocated to the better treatment. Rosenberger (1999) concludes with a table of conditions under which the RPW rule is reasonable and provides a realistic alternative to the standard clinical trial design. These are given in the table.
adaptive designs Conditions under which the RPW is reasonable (Rosenberger, 1999)
• The therapies have been evaluated previously for toxicity
• The response is binary
• Delay in response is moderate, allowing adapting to take place
• Sample sizes are moderate (at least 50 subjects)
• Duration of the trial is limited and recruitment can take place during the entire trial
• The trial is carefully planned with extensive computations done under different models and initial urn compositions
• The experimental therapy is expected to have significant benefits to public health if it proves effective
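The urn scheme for the RPW design can be simulated in a few lines. The sketch below is illustrative only: it assumes each outcome is known before the next subject is randomised, and the function name, success probabilities and seed are hypothetical.

```python
import random


def simulate_rpw(p_success_a, p_success_b, n_subjects, alpha=1, beta=1, seed=0):
    """Simulate a randomised play-the-winner urn for two drugs.

    alpha and beta are the initial numbers of type A and type B balls.
    Each outcome is assumed to be available before the next allocation.
    """
    rng = random.Random(seed)
    urn = {"A": alpha, "B": beta}
    allocations = []
    for _ in range(n_subjects):
        # Draw a ball (with replacement) to choose the treatment.
        arm = "A" if rng.random() < urn["A"] / (urn["A"] + urn["B"]) else "B"
        other = "B" if arm == "A" else "A"
        p_success = p_success_a if arm == "A" else p_success_b
        success = rng.random() < p_success
        # Success adds a ball of the same type; failure adds the other type.
        urn[arm if success else other] += 1
        allocations.append(arm)
    return allocations, urn


# Made-up success rates: drug A 70%, drug B 40%.
allocations, urn = simulate_rpw(0.7, 0.4, n_subjects=50)
print(allocations.count("A"), "of 50 subjects allocated to A; final urn:", urn)
```

Running the simulation under several success probabilities and initial urn compositions is one simple way to carry out the planning computations mentioned in the table.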
Traditional dose-response studies, where patients are allocated to a limited number of doses along an assumed dose-response curve, are limited and, some would say, wrong. For example, if the assumed dose-response model is incorrect then patients may be allocated to ineffective or unsafe doses. One answer could be to increase the number of doses. However, this would result in many patients allocated to wasted doses. It would be much better to increase the number of doses and allocate doses to a subject based on current knowledge of the dose-response curve, choosing the dose that best optimises some pre-specified criteria. This is precisely what Bayesian response adaptive designs attempt to do, by applying Bayesian DECISION THEORY to a utility function. Thus, the dose that most optimally addresses the utility is allocated to the next available subject or cohort of subjects.

One of the first BAYESIAN METHODS described was the continual reassessment method (CRM), introduced by O'Quigley, Pepe and Fisher (1990), and originally devised for dose-escalation studies in oncology. Whitehead et al. (2001a) suggest that the method could also be used for applications in other serious diseases. The CRM envisages a study whereby human volunteers are treated sequentially, in order to detect a dose with a probability of toxicity of 20%, i.e. TD20. The response is a binary response, 'toxicity' or 'no toxicity'. Before the study starts, investigators are asked to provide their best guess of the probability of toxicity at each of the series of doses. The first patient is then treated with the dose that is considered to be the closest to the TD20. Once the outcome is observed the PROBABILITY of toxicity at each of the doses is recalculated using Bayesian methods. The procedure continues in this way until it settles on a single dose. Whitehead et al. (2001a) point out that the CRM does home in on the TD20 quickly and efficiently, but there has been concern that early on in the trial subjects could be allocated to too high a dose, leading to potential toxicity problems. This has led to a number of modifications, such as starting at the lowest dose and never skipping a dose during the escalation.

Whitehead et al. (2001b) suggest practical extensions to the CRM for pharmacokinetic data, employing Bayesian decision theory to allocate treatments optimally to subjects. They argue that conventional dose-escalation studies carried out in healthy volunteers do not normally employ statistical methodology or formal guidelines for dose escalation. As such the studies can take a long time to complete with little opportunity to skip doses. The methods proposed allocate doses in order to maximise the information about the dose-response curve, given a pre-specified safety constraint. They use two simple utility or gain functions, one that allocates the highest allowable dose under the safety constraint and the other that allocates doses in order to optimise the shape of the dose-response curve.

Krams et al. (2003) also use a Bayesian decision theory approach with sequential dose allocation in a Phase II study in acute stroke therapy by inhibition of neutrophils (ASTIN), which employs up to 15 dose levels. They use a response-adaptive procedure in order to find a dose that gives an improvement over that of placebo in the primary ENDPOINT, allocating the next subject either to the optimal dose or to PLACEBO.
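The Bayesian updating step of the CRM can be sketched as follows. This is a simplified illustration rather than the procedure of O'Quigley, Pepe and Fisher (1990): it assumes a one-parameter 'power' working model in which the toxicity probability at each dose is the investigators' prior guess raised to the power exp(a), a Normal prior on a, and a grid approximation to the posterior. All names and numbers are hypothetical.

```python
import math


def crm_update(skeleton, doses_given, toxicities, target=0.20,
               prior_sd=1.0, grid_size=401):
    """Return posterior mean toxicity per dose and the recommended next dose.

    skeleton: prior guesses of toxicity probability at each dose level.
    doses_given: dose indices already used; toxicities: matching 0/1 outcomes.
    Working model (an assumption): P(toxicity at dose i) = skeleton[i] ** exp(a),
    with a ~ Normal(0, prior_sd**2), evaluated on a grid of a values.
    """
    grid = [(-4 + 8 * k / (grid_size - 1)) * prior_sd for k in range(grid_size)]
    weights = []
    for a in grid:
        log_w = -0.5 * (a / prior_sd) ** 2  # log prior density, up to a constant
        for dose, tox in zip(doses_given, toxicities):
            p = skeleton[dose] ** math.exp(a)
            p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
            log_w += math.log(p) if tox else math.log(1 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    post_tox = [
        sum(w * s ** math.exp(a) for w, a in zip(weights, grid)) / total
        for s in skeleton
    ]
    # Recommend the dose whose estimated toxicity is closest to the target.
    next_dose = min(range(len(skeleton)), key=lambda i: abs(post_tox[i] - target))
    return post_tox, next_dose


skeleton = [0.05, 0.10, 0.20, 0.35, 0.50]  # made-up prior guesses
post_tox, next_dose = crm_update(skeleton, doses_given=[2], toxicities=[1])
print([round(p, 3) for p in post_tox], "next dose index:", next_dose)
```

The modifications mentioned in the text, such as starting at the lowest dose and never skipping a dose, would be imposed as additional constraints on the recommended dose.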
Stopping rules were employed by which if the posterior probability of an effective drug or ineffective drug were greater than 0.9 then the decision would be made either to go on to a confirmatory trial (effective drug) or to stop development (ineffective drug). In this way, they were able to stop development of a compound more quickly than would have been possible under the traditional paradigm.

In 2006, the Pharmaceutical Research and Manufacturers of America (PhRMA) Adaptive Design Working Group published a series of papers in an issue of the Drug Information Association (DIA) journal detailing various aspects of these trials. Topics included terminology and classification; implementation; confidentiality and trial integrity; adaptive dose response; seamless Phase II/III; and sample size re-estimation (see Drug Information Journal 40, 425-84, 2006). In addition, and reflecting the growing interest in adaptive designs, there have been numerous special editions of other journals devoted to these trials, including Journal of Statistical Planning and Inference, issue 136(2), 2006; Journal of Biopharmaceutical Statistics, issues 16(5), 2006 and 17(6), 2007; and Statistics in Medicine, issue 27(10), 2008.
AS
Krams, M., Lees, K. R., Hacke, W., Grieve, A. P., Orgogozo, J.-M. and Ford, G. A. 2003: Acute stroke therapy by inhibition of neutrophils (ASTIN). An adaptive dose-response study of UK-279,276 in acute ischemic stroke. Stroke 34, 2543-8.
Pocock, S. and Simon, R. 1975: Sequential treatment assignment with balancing for prognostic factors in controlled clinical trials. Biometrics 31, 103-15.
O'Quigley, J., Pepe, M. and Fisher, L. 1990: Continual reassessment method: a practical design for Phase 1 clinical trials in cancer. Biometrics 46, 33-48.
Rosenberger, W. F. 1999: Randomised play-the-winner clinical trials: review and recommendations. Controlled Clinical Trials 20, 328-42.
Rosenberger, W. F. and Lachin, J. M. 1993: The use of response-adaptive designs in clinical trials. Controlled Clinical Trials 14, 471-84.
Rosenberger, W. F. and Palmer, C. R. 1999: Ethics and practice: alternative designs for Phase III randomised clinical trials. Controlled Clinical Trials 20, 172-86.
Whitehead, J., Zhou, Y., Patterson, S., Webber, D. and Francis, S. 2001a: Easy-to-implement Bayesian methods for dose-escalation studies in healthy volunteers. Biostatistics 2, 47-61.
Whitehead, J., Zhou, Y., Stallard, N., Todd, S. and Whitehead, A. 2001b: Learning from previous responses in Phase 1 dose-escalation studies. British Journal of Clinical Pharmacology 52, 1-7.
adaptive randomisation See ADAPTIVE DESIGNS, RANDOMISATION
adjustment for noncompliance in randomised controlled trials In clinical medicine, 'noncompliance' occurs when a patient does not fully follow a prescribed course of treatment. The alternative terms 'adherence' and 'concordance' attempt to avoid the authoritarian overtones of 'compliance'. In randomised CLINICAL TRIALS, we are concerned with any departure from a randomised treatment, whether due to noncompliance or a treatment change agreed with medical staff. In a trial to compare two types of medication (drug A and drug B, say) for the treatment of heart disease, for example, patients may refuse or forget to take any of their medication or forget to take it some of the time (partial compliance). Patients allocated to receive drug A might switch to drug B, and vice versa. Some of the patients may even take another medication altogether (drug C, say) or, particularly if the therapy appears to be failing, receive a much more radical intervention such as surgery. A further complication for the estimation of treatment effects arises when patients who fail to comply with their prescribed treatment are also those who are more likely to be lost to follow-up.
Rationale. Conventionally, trials with departures from randomised treatment are analysed by INTENTION-TO-TREAT. This directly compares the effectiveness of the different treatment policies as actually implemented in the trial, e.g. 'drug A plus changes' versus 'drug B plus changes'. Unlike effectiveness, efficacy relates to the effects of the treatments themselves, and is not estimated by an intention-to-treat analysis. Researchers may also be interested in the effectiveness of an intervention in other circumstances, e.g. if public suspicion of the intervention had been reduced by the positive results of a clinical trial. In these circumstances, the rates of compliance may be improved and adjustment for this change may be attempted.

It is important to define the aim of adjustment for noncompliance. For example, in a trial of immediate versus deferred zidovudine in asymptomatic HIV infection, the initial plan was to defer zidovudine until the onset of symptomatic disease. However, following a protocol amendment, some individuals started zidovudine before the onset of symptomatic disease (White et al., 1997). There was interest in estimating the effect that would have been observed under the original protocol. Zidovudine before the onset of symptomatic disease was therefore regarded as 'noncompliance'. Other individuals stopped zidovudine treatment because of adverse events. Additional adjustment for stopping treatment would not answer a clinically relevant question, so the analysis did not aim to estimate efficacy.

Adjustment for noncompliance is useful in a variety of situations. Patients may be most interested in treatment efficacy. Differences in compliance may help to explain variation of a treatment effect with time, between subgroups in a trial or between trials in a META-ANALYSIS. Reconciling trial data with observational data may require adjustment for noncompliance in the trial. Policy analysis may require projections for situations with improved compliance.

Most attempts to allow for noncompliance use on-treatment analysis or PER PROTOCOL analysis.
This only proviclc:s a valid comparison of the In:alments themselYc:s (efflcacy) if complic:rs and noncomplien do not cliffeI' systematically in their disease stale or prognosis. In practice this is unlikely to be the case. so selection bias occurs. Heart clisc:ase palic:nts who comply with their pn:scribc:cl medication. rar example. 1ft also those who an: likely to impro~ thc:ir diet or lab man: exc:n:isc and thesecllanges. in lOm.1Ire likely to lead toa better outcome. SELECTION BIAS may often be n:duced by adjustment ror baseline m\'ariates~ but thc:n: is still no guanntcc of an unbiased analysis. For elUUllple. in the: Coronal)' Drug Pmjc:cl. 5-year manaUty of poor eampUc:n was 28.2 CJt compared with 15.1 CJt in good mmplic:~ and adjustment far 40 baseline ractors only raIu4"ed the diITerence to 25.8 CJt vc:nus 16.4 CJt (The Coronary Drug Project
Rc:scan:h Group. 1980). Newer ·randomisation-basecl' methods am estimate emcacy while aw.iclinl selection bias by din:dly comparing the groups as randomiscd as in an ination-IO-tn:at analysis (While., 20(5). This is made possible: by considc:ring the
subgroup of 'compliers' who would have received their randomised treatment, whichever group they were randomised to. For example, a trial in Indonesian children compared vitamin A supplementation with no intervention, the outcome being 12-month mortality. Vitamin A supplementation was actually received by only 80 % of the intervention arm and by none of the control arm. Sommer and Zeger (1991) considered the subgroup who did not receive vitamin A in the intervention arm and a corresponding subgroup of the control arm who would not have received vitamin A if they had been allocated to receive it. These 'noncomplier' subgroups were assumed to be unaffected by allocation to vitamin A. It is then straightforward to estimate the number of noncompliers in the control arm and their mean outcome, and hence the risk difference, risk ratio or odds ratio in compliers. This is often called the 'complier average causal effect' (CACE) estimate (Little and Rubin, 2000). This approach is a special case of PRINCIPAL STRATIFICATION.
A more general approach requires a model relating potential outcomes for each individual under different counterfactual treatments. A simple model might say that each individual would have blood pressure b mmHg lower if they took the drug with perfect compliance than if they did not take the drug, with proportional blood pressure reductions for partial compliance. Such a model may be fitted by observing that untreated blood pressure must have the same distribution in each randomised group (Fischer-Lapp and Goetghebeur, 1999).
An important advantage of these methods is that no assumption is required about the relationship between compliance and potential outcomes. They are closely related to the use of INSTRUMENTAL VARIABLES methods (Dunn and Bentall, 2007).
The approaches just described are generally only able to estimate one treatment effect in a two-arm trial. They tend to be hopelessly imprecise in situations such as EQUIVALENCE STUDIES where patients may stop all treatment during the trial, so that the analysis requires estimation of the effect of both treatments. In this case it is possible to adjust the randomised comparison using observational estimation of one or more treatment effects, i.e. assuming there are no unmeasured confounders for treatment. Methods such as marginal structural modelling can work even when actual treatment is both a consequence of symptomatic deterioration and a cause of slower disease progression (see Little and Rubin, 2000, for references to this literature).
A trial with noncompliance has less POWER than one with perfect compliance, as a result of the reduced effect size as estimated in an intention-to-treat analysis, and it is natural to want to recover the lost power. However, many of the new procedures preserve the intention-to-treat SIGNIFICANCE LEVEL and therefore do not affect power. In some cases, it is impossible to regain power without making some assumption about comparability of noncompliers and compliers. In other situations, some gain in power is theoretically possible, but this is unlikely to be appreciable in practice (Becque and White, 2008). Significance testing should therefore rely on intention-to-treat analysis even when other methods are used to estimate efficacy. IW/GD
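The complier average causal effect described in this entry can be sketched numerically. The following is a minimal illustration, assuming one-sided noncompliance (nobody in the control arm receives the intervention) and no effect of allocation on noncompliers; the function name and the counts are hypothetical and are not taken from any trial.

```python
def cace_risk_difference(events_int, n_int, events_ctl, n_ctl, prop_received):
    """Complier average causal effect on the risk-difference scale.

    Assumes one-sided noncompliance: nobody in the control arm receives
    the intervention, and allocation has no effect on noncompliers.
    """
    if not 0 < prop_received <= 1:
        raise ValueError("prop_received must lie in (0, 1]")
    risk_int = events_int / n_int
    risk_ctl = events_ctl / n_ctl
    itt_difference = risk_int - risk_ctl
    # Only compliers contribute to the intention-to-treat difference,
    # so it is rescaled by the proportion of compliers.
    return itt_difference / prop_received


# Hypothetical counts, chosen only for illustration.
itt = 40 / 10000 - 60 / 10000
cace = cace_risk_difference(40, 10000, 60, 10000, prop_received=0.8)
print(f"ITT risk difference:  {itt:.4f}")
print(f"CACE risk difference: {cace:.4f}")
```

Under these assumptions the estimate is simply the intention-to-treat difference divided by the proportion who received the intervention, which is why the significance level of the randomised comparison is preserved.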
Angrist, J. D., Imbens, G. W. and Rubin, D. B. 1996: Identification of causal effects using instrumental variables (with discussion). Journal of the American Statistical Association 91, 444-72. Becque, T. and White, I. R. 2008: Regaining power lost by non-compliance via full probability modelling. Statistics in Medicine 27, 5640-63. Dunn, G. and Bentall, R. 2007: Modelling treatment-effect heterogeneity in randomized controlled trials of complex interventions (psychological treatments). Statistics in Medicine 26, 4719-45. Fischer-Lapp, K. and Goetghebeur, E. 1999: Practical properties of some structural mean analyses of the effect of compliance in randomized trials. Controlled Clinical Trials 20, 531-46. Little, R. J. and Rubin, D. B. 2000: Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annual Review of Public Health 21, 121-45. The Coronary Drug Project Research Group 1980: Influence of adherence to treatment and response to cholesterol on mortality in the Coronary Drug Project. New England Journal of Medicine 303, 1038-41. White, I. R. 2005: Uses and limitations of randomization-based efficacy estimators. Statistical Methods in Medical Research 14, 327-47. White, I. R., Walker, S., Babiker, A. G. and Darbyshire, J. H. 1997: Impact of treatment changes on the interpretation of the Concorde trial. AIDS 11, 999-1006.
age-period cohort analysis
To understand the effect of time on a particular outcome for an individual it is essential to realise the relevant temporal perspective. Age affects many aspects of life, including the risk of disease, so this is an essential component of any analysis of time trends. Period denotes the date of the outcome and if the outcome varies with period it is likely to be due to some underlying factor that affects the outcome and varies in the same way for the entire population under study. Cohort, contrariwise, refers to generational effects caused by factors that only affect particular age groups when their level changes with time.
An example of a period effect would be a potential effect of an air contaminant that affected all age groups in the same way. If the level of exposure to that factor increased/decreased with time, exerting a change in the outcome in all age groups, then we would expect a related pattern across all age groups in the study. In studies that take place over long periods of time, the technology for measuring the outcome may change, giving rise to an artifactual effect that was not due to change in exposure to a causative agent. For example, intensive screening for disease can identify disease cases that would not previously have been identified, thus artificially increasing the disease rate in a population that has had no change in exposure over time.
Cohort (also called birth cohort) effects may be due to factors related to exposures associated with the date of birth, such as the introduction of a particular drug or practice during pregnancy that was brought in at a particular point in time. For example, a pregnancy practice associated with increased risk and adopted by the population of mothers during a particular time period could affect the risk during the life span of the entire generation born during that period. While it is common to refer to these effects as being associated with year of birth, they could also be the result of changes in exposure that occurred after birth. In many individuals, lifestyle factors that may affect disease risk over a lifetime are fixed as they approach adulthood. A quantification of these effects on such a generation would give rise to a comparison of these cohort or generational effects.
An inherent redundancy among these three temporal factors arises from the fact that knowing any two factors implies the value of the third. For example, if we know an individual's age ($a$) at a given date or period ($p$), then the cohort is the difference ($c = p - a$). This linear dependence gives rise to an identifiability problem in a formal regression model that attempts to obtain quantitative estimates of regression parameters associated with each temporal element:
\[ E[Y] = \beta_0 + a\beta_a + p\beta_p + c\beta_c \]
Using the linear relationship between the temporal factors gives rise to:
\[ E[Y] = \beta_0 + a\beta_a + p\beta_p + (p - a)\beta_c = \beta_0 + a(\beta_a - \beta_c) + p(\beta_p + \beta_c) \]
which has only two identifiable parameters besides the intercept instead of the expected three. Another way of visualising this phenomenon is that all combinations of age, period and cohort may be displayed in the LEXIS DIAGRAM (see the figure), which is obviously a representation of a two-dimensional plane instead of the three dimensions expected for three separate factors.
age-period cohort analysis Lexis diagram showing the relationship between age, period and cohort. The diagonal line traces the age-period lifetime for an individual born in 1947
In general, these analyses are not limited to linear effects applied to a continuous measure of time, but instead they are applied to temporal intervals, such as disease rates observed for 5- or 10-year intervals of age and period. When the widths of these intervals are equal, the model may be expressed as:
\[ E[Y_{ijk}] = \mu + \alpha_i + \pi_j + \gamma_k \]
where $\mu$ is the intercept, $\alpha_i$ the effect of age for the $i$th ($i = 1, \ldots, I$) interval, $\pi_j$ the effect of period for the $j$th ($j = 1, \ldots, J$) interval and $\gamma_k$ the effect of the $k$th cohort ($k = j - i + I = 1, \ldots, K = I + J - 1$). The usual constraints in this model imply that $\sum_i \alpha_i = \sum_j \pi_j = \sum_k \gamma_k = 0$. The identifiability problem manifests itself through a single unidentifiable parameter (Fienberg and Mason, 1979), which can be more easily seen if we partition each temporal effect into components of overall linear trend and curvature or departure from linear trend. For example, age can be given by $\alpha_i = \bar{i}\beta_\alpha + \tilde{\alpha}_i$, where $\bar{i} = i - 0.5(I + 1)$, $\beta_\alpha$ is the overall slope and $\tilde{\alpha}_i$ the curvature. The overall model can be expressed as:
\[ E[Y_{ijk}] = \mu + (\bar{i}\beta_\alpha + \tilde{\alpha}_i) + (\bar{j}\beta_\pi + \tilde{\pi}_j) + (\bar{k}\beta_\gamma + \tilde{\gamma}_k) = \mu + \bar{i}(\beta_\alpha - \beta_\gamma) + \bar{j}(\beta_\pi + \beta_\gamma) + \tilde{\alpha}_i + \tilde{\pi}_j + \tilde{\gamma}_k \]
because $\bar{k} = \bar{j} - \bar{i}$. Thus, each of the curvatures can be uniquely determined, but the overall slopes are hopelessly entangled so that only certain combinations can be uniquely estimated (Holford, 1983).
The implication of the identifiability problem is that the overall direction of the effect for any of the three temporal components cannot be determined from a regression analysis. Thus, we cannot even determine whether the trends are increasing or decreasing with cohort, for instance. The second figure displays several combinations of age, period and cohort parameters, each set of which provides an identical set of fitted rates. Notice that as the period parameters are rotated clockwise, the age and cohort parameters are comparably rotated in the counterclockwise direction. Each of these parameters can be rotated a full 180°, but it is important also to realise that they cannot be rotated one at a time, only all together. Thus, even though the specific trends cannot be uniquely estimated, certain combinations of the overall trend can be uniquely determined, such as $\beta_\pi + \beta_\gamma$, which is called the net drift (Clayton and Schifflers, 1987a, 1987b). Alternative drift estimates covering shorter timespans can also be determined and these have practical significance in that they describe the experience of following a particular age group in time, because both period and cohort will advance together. Curvatures, by way of contrast, are completely determined, including polynomial parameters for the square and higher powers, changes in slopes and second differences. The significance test for any one of the temporal effects in the presence of the other two will generally be a test of the corresponding curvature and not the slope. Holford provides further detail on how software can be set up for fitting these models (Holford, 2004). TRH
Clayton, D. and Schifflers, E. 1987a: Models for temporal variation in cancer rates I: age-period and age-cohort models. Statistics in Medicine 6, 449-67. Clayton, D. and Schifflers, E. 1987b: Models for temporal variation in cancer rates II: age-period-cohort models. Statistics in Medicine 6, 469-81. Fienberg, S. E. and Mason, W. M. 1979: Identification and estimation of age-period-cohort models in the analysis of discrete archival data. Sociological Methodology 1979, 1-67. Holford, T. R. 1983: The estimation of age, period and cohort effects for vital rates. Biometrics 39, 311-24. Holford, T. R. 2004: Temporal factors in public health surveillance: sorting out age, period and cohort effects. In Brookmeyer, R. and Stroup, D. F. (eds), Monitoring the health of populations. Oxford: Oxford University Press, pp. 99-126.
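The identifiability problem in the linear age-period-cohort model of this entry can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and slope values are hypothetical, and it simply shows that two quite different sets of slopes give identical fitted values whenever cohort = period - age.

```python
def linear_apc(age, period, b0, b_age, b_period, b_cohort):
    """Linear age-period-cohort predictor with cohort = period - age."""
    cohort = period - age
    return b0 + age * b_age + period * b_period + cohort * b_cohort


def shifted(params, d):
    """Move an arbitrary amount d of linear trend between the three slopes."""
    b0, b_age, b_period, b_cohort = params
    return (b0, b_age + d, b_period - d, b_cohort + d)


original = (1.0, 0.25, -0.5, 0.75)      # hypothetical intercept and slopes
alternative = shifted(original, d=2.0)   # a very different-looking set

# Evaluate both parameter sets over a grid of ages and calendar years.
grid = [(a, p) for a in range(30, 80, 10) for p in range(1980, 2011, 10)]
same_fit = all(
    abs(linear_apc(a, p, *original) - linear_apc(a, p, *alternative)) < 1e-9
    for a, p in grid
)
print("identical fitted values:", same_fit)

# The two identifiable combinations are unchanged by the shift.
print("b_age - b_cohort:   ", original[1] - original[3], alternative[1] - alternative[3])
print("b_period + b_cohort:", original[2] + original[3], alternative[2] + alternative[3])
```

The last quantity printed corresponds to the net drift, which is why it can be estimated even though the individual slopes cannot.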
age-period cohort analysis Age, period and cohort effects for pre-menopausal breast cancer incidence for SEER 19~1~ (curves shown for period slopes of -0.00, -0.05, -0.10, -0.15 and -0.20)
age-related reference ranges
These are ranges of values of a measurement that identify the upper and lower limit of normality in the population, where the range varies according to the subject's age. Reference ranges are an important part of medical diagnosis, where a continuous measurement (e.g. blood pressure) needs converting to a binary variable for decision-making purposes. If the patient's value lies outside the measurement's reference range it is treated as abnormal and the patient is investigated further.
The construction of reference ranges involves estimating the range of values that covers a specified percentage of the reference population, often 95 %. Usually this is the central part of the distribution with equal tail area probabilities, although in some cases the reference range is bounded at zero or infinity. For normally distributed data the range can be derived from the population mean and STANDARD DEVIATION (SD), the 95 % range, for example, being the mean plus or minus 2 SDs. For nonnormal data the simplest approach is to use quantiles, i.e. rank and count the data, then the 2.5 % and 97.5 % points are the lower and upper limits of the 95 % reference range. However, this is inefficient and requires a large sample. If the data are skew they can be transformed, e.g. to logarithms, and then the reference range can be calculated from the mean and SD on the transformed scale and transformed back to the original scale. A more flexible variant is to use a Box-Cox power transformation (of which the logarithm is a special case), which adjusts for skewness more precisely (see TRANSFORMATION).
Age-related reference ranges are reference ranges that depend on age. They arise most commonly in paediatrics, notably for age-related measures of body size like height and weight, which can be displayed as GROWTH CHARTS. The principles of reference range estimation are essentially the same when they are age related, except that the ranges for adjacent age groups need to be consistent. Avoiding discontinuities at the age group boundaries requires the summary statistics that define the reference range (e.g. the mean and SD) to change smoothly with age, but imposing this constraint complicates the fitting process.
For normally distributed homoscedastic data, where the SD is constant across age, the age-related mean can be estimated by LINEAR REGRESSION and the reference range constructed around the regression curve using the residual SD. The regression curve is estimated using a smoothing regression function, e.g. a polynomial, fractional polynomial or generalised additive (cubic spline) curve. If the SD changes with age, as is often the case, a curve of the age-related SD also needs to be estimated by the regression methods of Aitkin (1987) or Altman (1993) and the age-related mean obtained using weighted linear regression with weights corresponding to the inverse square of the age-related SD. The age-related reference range is again constructed around the regression curve using the SD curve.
When the data are skew it may be possible to adjust for the skewness using a single, e.g. logarithmic, transformation at all ages. However, often the degree of skewness is itself age related, although this needs a large sample to show it. In this case an age-related summary statistic for the skewness has to be estimated, along with the age-related mean and SD. The LMS METHOD is a popular way to do this, or alternatively the EN method of Royston and Wright (1998). For more extreme nonnormal data, a nonparametric approach based on QUANTILE REGRESSION is needed, a form of least absolute errors regression, where smooth curves are constructed for the age-related upper and lower limits of the reference range.
The figure gives age-related reference ranges for systolic and diastolic blood pressure in boys aged 4-24, estimated by the LMS method.
age-related reference ranges Age-related 95 % reference ranges for blood pressure in boys: systolic (solid lines) and diastolic (dotted lines)
There are two advantages of reference ranges based on an underlying frequency distribution, as opposed to those derived using quantile regression. The first is efficiency: the standard errors of the reference range limits are smaller. The second is analytical convenience: data for individuals can be converted to z-SCORES, indicating how many SDs they are above or below the median of the distribution, which is a convenient way of adjusting for age prior to further analysis. TJC [See also GROWTH CHARTS]
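The simplest calculations in this entry (a mean plus or minus 2 SD range, a range computed on the log scale and back-transformed, and a z-score) can be sketched as follows. This is a minimal illustration for a single age group, not an implementation of the LMS or EN methods; the function names and the sample values are hypothetical.

```python
import math
import statistics


def normal_reference_range(values, n_sd=2.0):
    """Mean +/- n_sd standard deviations, for roughly normal data."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - n_sd * sd, mean + n_sd * sd


def log_reference_range(values, n_sd=2.0):
    """Range computed on the log scale and transformed back (positively skew data)."""
    low, high = normal_reference_range([math.log(v) for v in values], n_sd)
    return math.exp(low), math.exp(high)


def z_score(value, mean, sd):
    """Number of SDs a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Hypothetical measurements for a single age group.
sample = [98, 102, 105, 107, 110, 111, 113, 116, 120, 128]
low, high = normal_reference_range(sample)
log_low, log_high = log_reference_range(sample)
print("normal-scale range:", round(low, 1), round(high, 1))
print("log-scale range:   ", round(log_low, 1), round(log_high, 1))
print("z-score of 128:    ", round(z_score(128, statistics.mean(sample), statistics.stdev(sample)), 2))
```

For an age-related range the mean and SD passed to these functions would themselves be smooth functions of age, estimated as the entry describes.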
Aitkin, M. 1987: Modelling variance heterogeneity in normal regression using GLIM. Applied Statistics 36, 332-9. Altman, D. G. 1993: Construction of age-related reference centiles using absolute residuals. Statistics in Medicine 12, 917-24. Cole, T. J. and Green, P. J. 1992: Smoothing reference centile curves: the LMS method and penalized likelihood. Statistics in Medicine 11, 1305-19. Koenker, R. W. and D'Orey, V. 1987: Computing regression quantiles. Applied Statistics 36, 383-93. Royston, P. and Wright, E. M. 1998: A method for estimating age-specific reference intervals ('normal ranges') based on fractional polynomials and exponential transformation. Journal of the Royal Statistical Society Series A 161, 79-101.
age-specific rates
These are rates calculated within a number of relatively narrow age bands. A crude rate is the number of events occurring in a population during a specified time period divided by an estimate of the size of the population. However, when comparing rates between populations with different age distributions, it is necessary to consider rates at specific ages separately.
In the table, death rates are presented for Costa Rica and the United Kingdom for 1999, derived from data from the United Nations (2002). The final column gives the age-specific rates for broad age bands and the crude (total) rate. The age-specific rate is calculated as the number of deaths in the particular age group divided by the population of that age group. In Costa Rica, the death rate at ages 0-15 is calculated as 1296/1 070 000. The rate is expressed per 1000 persons so the rate is multiplied by 1000 to give the rate of 1.2 per 1000 in the final column of the table.
age-specific rates Population, number of deaths and death rates from all causes for Costa Rica and the United Kingdom for the year 1999
Costa Rica
Age group    Population (100 000s)    % in age group    Deaths    Death rate/1000
0-15         10.7                     32 %              1296      1.2
15-49        17.4                     52 %              2766      1.6
50-69        3.9                      12 %              3447      8.8
70+          1.3                      4 %               7523      56.6
Total        33.4                                       15032     4.5
United Kingdom
Age group    Population (100 000s)    % in age group    Deaths    Death rate/1000
0-15         113.9                    19 %              5850      0.5
15-49        288.0                    48 %              31228     1.1
50-69        126.1                    21 %              120759    9.6
70+          67.0                     11 %              474225    70.1
Total        595.0                                      632062    10.6
The crude (total) rate for Costa Rica is less than half that for the UK. However, at no age is the rate in the UK double that for Costa Rica and for some age groups the rate is higher in Costa Rica than in the UK. Note that the percentages of the population in each age group (third column) differ markedly. The UK population is much older (11 % of the population are over 70 compared with 4 % in Costa Rica). The different age structure explains the misleading comparison between the crude rates. Age-specific rates are cumbersome to compare across a number of populations. Standardisation methods are often used to provide an age-adjusted summary rate for each population.
Many countries publish age-specific rates for all cause and specific causes of death, e.g. the annual publications of the Office of National Statistics (ONS) in England and Wales (ONS, 2002). Age-specific disease incidence rates are also published in various countries, most notably cancer incidence, for which international data are compiled by the International Agency for Research on Cancer (Parkin et al., 2003). Age-specific prevalence rates for exposures such as smoking can also be derived, but are more usually obtained from specific surveys such as the General Household Survey (Walker et al., 2001). HI
[See also CAUSE-SPECIFIC DEATH RATE, STANDARDISED MORTALITY RATIO]
Office for National Statistics 2002: Mortality statistics: cause. Review of the Registrar General on deaths by cause, sex and age, in England and Wales, 2001. London: Office for National Statistics. Parkin, D. M., Whelan, S. L., Ferlay, J., Teppo, L. and Thomas, D. B. 2003: Cancer incidence in five continents, Vol. VIII. Lyon: IARC Scientific Publications. United Nations 2002: 2000 demographic yearbook. New York: United Nations. Walker, A., Maher, J., Coulthard, M., Goddard, E. and Thomas, M. 2001: Living in Britain: results from the 2000 General Household Survey. London: The Stationery Office.
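The arithmetic behind the table in this entry can be reproduced directly. The sketch below uses the populations (in units of 100 000) and death counts as they read in the table; the variable and function names are illustrative. Because the printed populations are rounded, a recomputed rate can differ slightly from the printed column, most visibly in the oldest age group.

```python
# (population in 100 000s, deaths) per age group, as read from the table.
costa_rica = {
    "0-15": (10.7, 1296),
    "15-49": (17.4, 2766),
    "50-69": (3.9, 3447),
    "70+": (1.3, 7523),
}
united_kingdom = {
    "0-15": (113.9, 5850),
    "15-49": (288.0, 31228),
    "50-69": (126.1, 120759),
    "70+": (67.0, 474225),
}


def rate_per_1000(population_100k, deaths):
    """Deaths per 1000 persons, given a population in units of 100 000."""
    return 1000 * deaths / (population_100k * 100_000)


def age_specific_rates(table):
    """Death rate per 1000 within each age group."""
    return {group: rate_per_1000(pop, deaths) for group, (pop, deaths) in table.items()}


def crude_rate(table):
    """Total deaths over total population, per 1000 persons."""
    total_pop = sum(pop for pop, _ in table.values())
    total_deaths = sum(deaths for _, deaths in table.values())
    return rate_per_1000(total_pop, total_deaths)


for name, table in (("Costa Rica", costa_rica), ("United Kingdom", united_kingdom)):
    rates = {group: round(rate, 1) for group, rate in age_specific_rates(table).items()}
    print(name, rates, "crude:", round(crude_rate(table), 1))
```

Comparing the two crude rates with the age-specific ones shows the point made in the entry: the crude rates differ by more than a factor of two, while no single age group does.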
agreement
Agreement in repeated assessments is a fundamental criterion for quality of assessments on rating scales. The use of rating scales and other kinds of ordered classifications of complex qualitative variables is interdisciplinary and unlimited. Rating scale assessments produce ordinal data, the ordered categories representing only a rank order of the intensity of a particular variable and not a numerical value in a mathematical sense, although the use of numerical labelling could give a false impression of quantitative data (see RANK INVARIANCE).
The main quality concepts of scale assessments are reliability and validity. Reliability (see MEASUREMENT PRECISION AND RELIABILITY) refers to the extent to which repeated measurements of the same object yield the same result, which means agreement in repeated assessments of various designs. In interrater reliability (see MEASUREMENT PRECISION AND RELIABILITY) studies are made of the level of agreement between observers that classify the same object or individual, and intrarater reliability (see INTRACLASS CORRELATION COEFFICIENT) studies refer to agreement in test-retest scale assessments by the same rater.
The frequency distribution of pairs of ordinal data is described in a square CONTINGENCY TABLE (see the figure with parts I, II and III), and in the case of continuous assessments on a visual analogue scale, VAS, by a scatter plot. The percentage agreement (PA) is a basic agreement measure. When the agreement is unsatisfactorily small, the reasons for disagreement can be evaluated by a statistical method that takes account of the rank-invariant properties of ordinal data and that makes it possible to identify and measure systematic disagreement, when present, separately from disagreement caused by individual variability in assessments. Systematic disagreement is population based and reveals a systematic change in conditions or memory bias between test-retest assessments, or between raters who interpret the scale categories differently. Large individual variability, on the other hand, is a sign of poor quality of a rating scale as it allows for uncertainty in repeated assessments.
The presence of systematic disagreement in the use of the scale categories between the two assessments is revealed by different frequency distributions, which means marginal distributions (parts I and II). Systematic disagreement regarding the categorical levels and in the way of concentrating the assessments on the categories is measured by the relative position (RP) and the relative concentration (RC) respectively. The RP expresses the extent to which the marginal distribution of assessments Y is shifted towards higher categories than the marginal distribution of X, rather than the opposite. A theoretical description is the difference between the probabilities $P(X < Y) - P(Y < X)$. Possible values of RP range from (-1) to 1, and RP is positive when higher scale categories are more frequently used in the assessments Y than in X when compared with the opposite. Correspondingly, the RC expresses the extent to which the marginal distribution of Y assessments is more concentrated to central scale categories than is the marginal distribution of X, theoretically described by the difference in probabilities $P(X_l < Y_k < X_v) - P(Y_l < X_k < Y_v)$. Possible values range from (-1) to 1, and a positive RC indicates that the assessments Y are more concentrated than X. Zero or very small values of both RP and RC mean that the systematic part of an observed disagreement in paired assessments is negligible.
Systematic disagreement is evident by the marginal heterogeneity, and by pairing off the two sets of marginal frequencies, the so-called rank-transformable pattern of agreement (RTPA) is constructed. The RTPA describes the expected pattern in the case of systematic disagreement only. All pairs of observations of the RTPA will have the same rank ordering in the two assessments provided that the ranks are tied to the cells, which is the definition of the augmented ranking procedure (aug-ranks) (see RANKING). Part II in the figure is the RTPA of the pattern in part I. The observed distribution of pairs in part I deviates from this RTPA, which means that some of the pairs of aug-ranks given to the observations differ. The relative rank variance (RV) is a rank-based measure of this observed individual variability, i.e. unexplained by the measures of systematic disagreement:
\[ RV = \frac{6}{n^3} \sum_{i=1}^{m} \sum_{j=1}^{m} x_{ij} \left( \Delta \bar{R}_{ij} \right)^2 \]
where $n$ is the number of paired assessments, $x_{ij}$ is the number of pairs in the $ij$th cell and $(\Delta \bar{R}_{ij})^2$ is the square of the mean aug-rank difference of the $ij$th cell, and the summation is made over all cells $ij$ of the $m \times m$ square table, $0 \le RV \le 1$ (Svensson et al., 1996; Svensson, 1998a).
agreement Examples of paired ordinal data from interrater assessments on a four-point scale with the ordered categories labelled A < B < C < D. The rank-transformable pattern of agreement (RTPA) is shaded. The measures of percentage agreement (PA), the relative position (RP), the relative concentration (RC) and the relative rank variance (RV) are given
Cohen's coefficient kappa ($\kappa$) is a commonly used measure of agreement adjusted for the chance expected agreement (see KAPPA AND WEIGHTED KAPPA). The calculations of Cronbach's alpha and other so-called reliability coefficients are based on the assumption of quantitative, normally distributed data, which is not achievable in data from rating scales. There is also a widespread misuse of the correlation coefficient as a reliability measure. The correlation coefficient (see CORRELATION) measures the degree of association between two variables and does not measure the level of agreement.
In part I of the figure the PA is 12 %, and the observed disagreement is mainly explained by a systematic disagreement in position. The negative RP value (-0.49) and the RTPA (part II) show that the assessments Y systematically used lower categories than X. A slight additional individual variability, RV = 0.08, is observed. SPEARMAN'S RANK CORRELATION COEFFICIENT, $r_s$, is 0.66 in part I of the figure and 0.97 in part II, ignoring the fact that the assessments are systematically biased and unreliable. The same holds for the coefficient kappa (-0.14). In part III the marginal homogeneity and the zero RP and RC values confirm that the disagreement (39 %) is entirely explained by slight individual dispersion (RV = 0.05) from the RTPA, which is the main diagonal in this case. The $r_s$ is 0.61 and the $\kappa$ is 0.45.
Besides reliability studies, the level of disagreement is of main interest in paired assessments 'before and after' treatment for analysing change in outcome or treatment effect. In this application of the disagreement measures, nonzero RP and RC values indicate the level of common group change in outcomes, and the heterogeneity in changes among the individuals is measured by the RV (Svensson, 1998b). ES
Svensson, E. 1998a: Application of a rank-invariant method to evaluate reliability of ordered categorical assessments. Journal of Epidemiology and Biostatistics 3, 403-9. Svensson, E. 1998b: Ordinal invariant measures for individual and group changes in ordered categorical data. Statistics in Medicine 17, 2923-36. Svensson, E., Starmark, J.-E., Ekholm, S., von Essen, C. and Johansson, A. 1996: Analysis of inter-observer disagreement in the assessment of subarachnoid blood and acute hydrocephalus on CT scans. Neurological Research 18, 487-94.
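The relative position and the rank-transformable pattern of agreement described in this entry depend only on the two marginal distributions, so both can be sketched briefly. The marginal frequencies below are an assumption: they are the values that appear to read from the figure for parts I and II (rater X: 3, 11, 17, 19; rater Y: 14, 16, 18, 2, for categories A to D), and the function names are illustrative rather than taken from any published software.

```python
def relative_position(x_marginal, y_marginal):
    """RP = P(X < Y) - P(Y < X), computed from the two marginal distributions."""
    n_x, n_y = sum(x_marginal), sum(y_marginal)
    p_x_less_y = p_y_less_x = 0.0
    for i, fx in enumerate(x_marginal):
        for j, fy in enumerate(y_marginal):
            weight = (fx / n_x) * (fy / n_y)
            if i < j:
                p_x_less_y += weight
            elif j < i:
                p_y_less_x += weight
    return p_x_less_y - p_y_less_x


def rtpa(x_marginal, y_marginal):
    """Pair off the two ordered sets of assessments; table[y][x] holds counts."""
    m = len(x_marginal)
    table = [[0] * m for _ in range(m)]
    x_left, y_left = list(x_marginal), list(y_marginal)
    i = j = 0
    while i < m and j < m:
        paired = min(x_left[i], y_left[j])
        table[j][i] += paired
        x_left[i] -= paired
        y_left[j] -= paired
        if x_left[i] == 0:
            i += 1
        if y_left[j] == 0:
            j += 1
    return table


def percentage_agreement(table):
    """Percentage of pairs on the main diagonal of a square table."""
    total = sum(sum(row) for row in table)
    return 100 * sum(table[k][k] for k in range(len(table))) / total


x_freq = [3, 11, 17, 19]   # rater X, categories A..D (assumed reading of the figure)
y_freq = [14, 16, 18, 2]   # rater Y, categories A..D (assumed reading of the figure)
pattern = rtpa(x_freq, y_freq)
print("RP:", round(relative_position(x_freq, y_freq), 2))
print("PA of the RTPA:", percentage_agreement(pattern))
```

With these assumed marginals the relative position rounds to the value of -0.49 quoted in the entry, which gives some reassurance that the frequencies were read correctly.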
AIcaIke'slnformaUon criterion
Akaike's information criterion (AIC) is an index used to discriminate between competing models. It is widely used when there is the issue of model choice, where we wish to find the most parsimonious model (see Akaike, 1974). Often there may be a number of possible models that can be fitted to the data, from which parameters can be estimated using, for example, MAXIMUM LIKELIHOOD ESTIMATION. Generally, complex models are more flexible, but contain a relatively large number of parameters, whereas simpler models with fewer parameters may compromise the fit of the model to the data. Essentially, the AIC statistic compares competing models by considering the trade-off between the complexity of the model and the corresponding fit of the model to the data. The AIC statistic is widely used, particularly as it can be used to compare nonnested models, when likelihood ratio tests cannot be applied.
Let $x$ denote the data and $\hat{\theta}$ the corresponding maximum likelihood estimates (MLEs) of the parameters. Then the AIC for a given model is:
$$\mathrm{AIC} = -2 \log L(\hat{\theta}; x) + 2p$$
where $p$ denotes the number of parameters in the given model being fitted to the data and $\log L(\hat{\theta}; x)$ the corresponding log-likelihood evaluated at the MLEs of the parameters. The AIC statistic is calculated for each possible model being considered. The model deemed optimal is the one with the smallest AIC value, i.e. a model with a relatively small number of parameters that adequately fits the data. The AIC is generally easy to calculate given the maximum of the likelihood function and is very versatile, allowing us to compare, for example, nonnested models. We note that corrections have been suggested to the AIC statistic to allow for data with overdispersion (denoted by QAIC) and small sample sizes ($\mathrm{AIC}_c$). See, for example, Burnham and Anderson (2002), Sections 2.4-5.
The AIC statistic has also been used to compare the performance of different models, relative to each other (Buckland, Burnham and Augustin, 1997; Burnham and Anderson, 2002, Section 2.6). It is not the absolute values of the AIC statistics that are important but their relative values, in particular their difference. For each model the term $\Delta\mathrm{AIC} = \mathrm{AIC} - \min \mathrm{AIC}$ is calculated, where $\min \mathrm{AIC}$ is the value of the AIC statistic for the model deemed optimal. Clearly, $\Delta\mathrm{AIC} = 0$ for the model deemed optimal; the larger the value of $\Delta\mathrm{AIC}$, the poorer the model. The relative penalised likelihood weights $w_i$ can also be calculated for each model $i = 1, \ldots, m$, where:
$$w_i = \frac{\exp(-\Delta\mathrm{AIC}_i/2)}{\sum_{j=1}^{m} \exp(-\Delta\mathrm{AIC}_j/2)}$$
and $\Delta\mathrm{AIC}_i$ denotes the corresponding $\Delta\mathrm{AIC}$ value associated with model $i$. The weights provide a scale to interpret the difference in values for the models. Finally, these model weights can be used to obtain a (weighted) model-averaged estimate of parameters of interest. RK
[See also DEVIANCE, LIKELIHOOD RATIO]
Akaike, H. 1974: A new look at the statistical model identification. IEEE Transactions on Automatic Control AC 19, 716-23. Buckland, S. T., Burnham, K. P. and Augustin, N. H. 1997: Model selection: an integral part of inference. Biometrics 53, 603-18. Burnham, K. P. and Anderson, D. R. 2002: Model selection and multimodel inference, 2nd edition. Heidelberg: Springer Verlag.
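To make the AIC entry's quantities concrete, the following is a minimal sketch of the AIC, the differences from the optimal model and the relative weights. The three models, their maximised log-likelihoods and their parameter counts are hypothetical values chosen for illustration only; the function names are likewise assumptions, not part of any library.

```python
import math


def aic(log_likelihood, n_params):
    """AIC = -2 log L + 2p, with log L evaluated at the MLEs."""
    return -2.0 * log_likelihood + 2.0 * n_params


def akaike_weights(aic_values):
    """Return (delta AIC values, relative weights) for a list of AIC values."""
    best = min(aic_values)
    deltas = [a - best for a in aic_values]
    raw = [math.exp(-d / 2.0) for d in deltas]
    total = sum(raw)
    return deltas, [r / total for r in raw]


# Hypothetical (log-likelihood, number of parameters) for three candidate models.
models = {"A": (-120.3, 2), "B": (-118.9, 3), "C": (-118.7, 5)}
aics = {name: aic(ll, p) for name, (ll, p) in models.items()}
deltas, weights = akaike_weights(list(aics.values()))

for name, d, w in zip(aics, deltas, weights):
    print(name, round(aics[name], 1), round(d, 1), round(w, 3))
```

In this illustration model B has the smallest AIC, so its difference is zero and it receives the largest weight; model C fits marginally better than B but is penalised for its two extra parameters.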
allelic association
This is an association between two alleles (at two different loci), or between an allele and a phenotypic trait, in the population. Since humans are diploid a more technical definition of the former is necessary: two alleles are associated if their frequency of co-occurrence in the same haplotype (i.e. the genetic material transmitted from one parent) is greater than the product of the marginal frequencies of the two alleles. Association between two alleles is also known as linkage disequilibrium. The reason is that, in a large population under random mating, the extent of association between two alleles (as measured by the difference between the frequency of the haplotype containing the two alleles and the product of the frequencies of the two alleles) decreases by a factor equal to one minus the recombination fraction (see GENETIC LINKAGE) between the two loci, per generation. Thus allelic association represents a state of disequilibrium that tends to dissipate, at a rate determined by the strength of linkage, towards the state of equilibrium between the two alleles, when the frequency of the haplotype is equal to the product of the frequencies of the two constituent alleles. Associations between two alleles can arise in a population for a number of reasons. The mutation that gave rise to the more recent allele may have occurred on a chromosome that happened to contain the other allele. Random genetic drift during a population bottleneck may have led to the overrepresentation of some haplotypes. The mixing of two populations with different allele frequencies may have resulted in associations between alleles in the overall population. When, for any of these reasons, such allelic associations arose many generations ago, only those occurring between tightly linked loci are likely to have persisted to the current generation. We would therefore expect an imperfect inverse relationship between the extent of association between two alleles and the distance between them.
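The decay of association described above can be sketched numerically. Writing D for the difference between the haplotype frequency and the product of the two allele frequencies, and theta for the recombination fraction, the text implies D after t generations is D multiplied by (1 - theta) to the power t. The frequencies and recombination fractions below are hypothetical, and the function names are illustrative assumptions.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """D = haplotype frequency minus the product of the two allele frequencies."""
    return p_ab - p_a * p_b


def decay(d0, theta, generations):
    """Expected D after the given number of generations of random mating."""
    return d0 * (1.0 - theta) ** generations


# Hypothetical frequencies: haplotype 0.30, alleles 0.40 and 0.50.
d0 = linkage_disequilibrium(p_ab=0.30, p_a=0.40, p_b=0.50)

# Unlinked loci (theta = 0.5) versus tightly linked loci (theta = 0.01).
for theta in (0.5, 0.01):
    print(theta, [round(decay(d0, theta, t), 4) for t in (0, 10, 100)])
```

Under these assumed values the association between unlinked loci is essentially gone within ten generations, whereas a substantial fraction survives a hundred generations when the loci are tightly linked, which is the pattern the entry relies on.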
An association between an allele and a disease may be the result of a direct causal relationship. In other words, the allele is a causal variant that is functional and increases the risk of the disease. However, it could also be indirect, with the allele being in linkage disequilibrium with a causal variant. The presence of linkage disequilibrium between tightly linked loci means that it is possible to screen a chromosomal region for a causal variant without examining all the alleles, only a sufficient number to ensure that any causal variant in the region is likely to be in linkage disequilibrium with one or more of the alleles examined. The polymorphisms chosen to represent themselves and associated polymorphisms in their vicinity in an association study are called tag polymorphisms. The International HapMap Project (www.hapmap.org) has characterised the pattern of allelic associations among over 3 million single nucleotide polymorphisms (SNPs) in the human genome in three major populations (Europeans, Africans and Asians).
Classical epidemiological designs (CASE-CONTROL STUDIES, COHORT STUDIES, CROSS-SECTIONAL STUDIES) are readily applicable to the study of disease-allele associations, as are the statistical methods developed for these designs (e.g. LOGISTIC REGRESSION, SURVIVAL ANALYSIS). These designs are potentially susceptible to the problem of hidden population stratification, which can lead to spurious associations or mask true associations. Family-based association designs are robust to population stratification and usually consist of the use of either parental or sibling controls. Methods for the analysis of matched samples, such as McNEMAR'S TEST (also called the transmission disequilibrium test in the context of parental controls) and CONDITIONAL LOGISTIC REGRESSION, are applicable to these designs. The study of disease-allele associations is a complementary strategy to linkage analysis in the localisation and identification of genes that increase the risk of disease. In general, allelic association is unlikely to be detected when the marker locus is quite far (>1 megabase) from the disease locus, but can be much more powerful than linkage when the marker locus is close enough to the disease locus to be in substantial linkage disequilibrium with it, particularly when the effect size of the disease locus is small. For this reason, allelic association is particularly appealing for searching regions that demonstrate linkage to the disease or for the investigation of specific candidate genes. However, technological developments have enabled the efficient genotyping of up to 1 million SNPs in a single array, and this has led to association studies on the whole-genome scale (called genome-wide association studies, or GWAS) that have coverage of over 90 % of common variants (allele frequency > 5 %) in the genome. PS
all subsets regression
A form of regression in which all possible models are compared using some appropriate criterion for indicating the 'best' models. If there are p explanatory variables in the data, there are a total of $2^p - 1$ possible regression models because each explanatory variable can be in or out of the model and the model containing no explanatory variables is excluded. One possible criterion for comparing models is MALLOWS' $C_p$ STATISTIC and to illustrate its use we will apply it to data that arise from a study of 25 patients with cystic fibrosis reported in O'Neill et al. (1983), and also given in Altman (1991). Data for the first three patients are given in the first table. The dependent variable in this case is a measure of malnutrition (PEmax). Some of the models considered in the all subsets regression of these data are shown in the second table, together with their associated $C_p$ values, where p refers to the number of parameters in a particular model, i.e. a model that includes a subset of p - 1 of the explanatory variables plus an intercept. If $C_p$ is plotted against p, the subsets of explanatory variables most worth considering in trying to find a parsimonious model are those lying close to the line $C_p = p$. All subsets regression has been found to be particularly useful in applications of COX'S REGRESSION MODEL (see Kuk, 1984). SSE
[See also MULTIPLE LINEAR REGRESSION]

all subsets regression Cystic fibrosis data; first three subjects

Sub  Age  Sex  Height  Weight  BMP  FEV  RV   FRC  TLC  PEmax
1    7    0    109     13.1    68   32   258  183  137  95
2    7    1    112     12.9    65   19   449  245  134  85
3    8    0    124     14.1    65   22   441  268  147  100

Sub: subject number
Sex: 0 = male, 1 = female
BMP: body mass (weight/height^2) as a percentage of the age-specific median in normal individuals
FEV: forced expiratory volume in one second
RV: residual volume
FRC: functional residual capacity
TLC: total lung capacity
PEmax: maximal static expiratory pressure (cmH2O)

all subsets regression Some of the models fitted in the all subsets regression of the cystic fibrosis data (size is one more than the number of variables in a model, to include the intercept)

Model  Size  Terms                                            Cp
7      2     Sex                                              17.24
14     3     Sex, weight                                      4.63
21     4     Age, FEV, RV                                     2.62
28     4     Age, BMP, FEV                                    4.5
35     5     Sex, weight, BMP, FEV                            2.95
42     6     Age, weight, BMP, FEV, RV                        2.8
49     6     Age, sex, height, FEV, TLC                       6.99
56*    7     Age, sex, height, FEV, RV, TLC                   7.06
63     8     Sex, weight, BMP, FEV, RV, FRC, TLC              6.49
70     9     Age, height, weight, BMP, FEV, RV, FRC, TLC      8.06
77     9     Age, sex, height, BMP, FEV, RV, FRC, TLC         10.29

* Models close to the line Cp = p.

Altman, D. G. 1991: Practical statistics for medical research. London: CRC/Chapman & Hall. Kuk, A. Y. C. 1984: All subsets regression in a proportional hazards model. Biometrika 71, 587-92. O'Neill, S., Leahy, F., Pasterkamp, H. and Tal, A. 1983: The effects of chronic hyperinflation, nutritional status and posture on respiratory muscle strength in cystic fibrosis. American Review of Respiratory Disorders 128, 1051-4.
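As an illustration of the all subsets regression entry, the sketch below enumerates every non-empty subset of the nine explanatory variables named in the cystic fibrosis tables and confirms the count of candidate models. It also includes Mallows' statistic in its usual textbook form, Cp = RSS_p / s^2 - n + 2p, with s^2 the residual mean square of the full model; that formula is not given in the entry and is stated here as an assumption, and the residual sum of squares used is a hypothetical number.

```python
from itertools import combinations


def all_subsets(variables):
    """Yield every non-empty subset of the explanatory variables."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)


def mallows_cp(rss_subset, s2_full, n_obs, n_params):
    """Mallows' Cp = RSS_p / s^2 - n + 2p (assumed standard definition)."""
    return rss_subset / s2_full - n_obs + 2 * n_params


variables = ["age", "sex", "height", "weight", "bmp", "fev", "rv", "frc", "tlc"]
subsets = list(all_subsets(variables))
print(len(subsets), 2 ** len(variables) - 1)

# For the full model Cp equals p by construction. The RSS below is hypothetical.
n_obs, p_full, rss_full = 25, len(variables) + 1, 30.0
s2_full = rss_full / (n_obs - p_full)
print(mallows_cp(rss_full, s2_full, n_obs, p_full))
```

Because the full model always lies exactly on the line Cp = p, the interest is in smaller models that lie close to that line, as the entry describes.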
alternative hypothesis See HYPOTHESIS TESTS
AMOS See STRUCTURAL EQUATION MODELLING SOFTWARE
analysis of covariance (ANCOVA, ANOCOVA) This is an extension of the analysis of variance (ANOVA) that incorporates a continuous explanatory variable. Where ANOVA aims to detect if there is a change in the mean value of a variable across two or more groups, ANCOVA (or rarely ANOCOVA) does the same but adjusts for a continuous covariate. Most commonly this covariate will be a baseline measurement, allowing the analysis to adjust for initial variation between participants and isolate the effects due to the treatment factor. However, sometimes a different covariate is used. For example, Karhunen et al. (1994) consider the association between alcohol intake (divided into four categories) and numbers of Purkinje cells. In doing so they introduce age as a continuous covariate in order to 'control' or 'adjust' for the effects of age on cell numbers. Under other circumstances the authors could have been interested in the effects of age and wanting to adjust for alcohol intake. Despite being the same analysis computationally, this is not typically what is thought of as analysis of covariance and might more commonly be presented as a 'regression'. Indeed the various analysis of variance methods can all be viewed from within a regression framework, which demonstrates that ANCOVA can be extended to cope with much more than one continuous covariate. Mathematically, ANCOVA follows a similar path to that for ANOVA and the output is usually summarised in a similar table, although the details may vary. The promised benefits of the analysis of covariance are clear. If one has an unbalanced observational study, then ANCOVA can adjust for differences in baseline values and remove a potential bias from the results. By the same token, if one has a randomised trial that is naturally balanced, then ANCOVA reduces the amount of unexplained variation in the data and thus increases the power of the test. However, ANCOVA can only be employed if the appropriate assumptions are met. These include those of ANOVA (i.e. normality of residuals, homoscedasticity) as well as the appropriateness of the ANCOVA model. Is the relationship with the covariate truly linear? Does the effect of the covariate vary between groups? Failing to meet these assumptions can lead to the introduction of important but subtle biases. It is a frequent concern that medical research papers report a covariate as having been 'controlled' or 'adjusted' for, with no evidence that the control or adjustment was appropriate. For further details see Altman (1991), Owen and Froman (1998), Miller and Chapman (2001) and Vickers and Altman (2001). AGL
[See also GENERALISED LINEAR MODEL]
Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall. Karhunen, P. J., Erkinjuntti, T. and Laippala, P. 1994: Moderate alcohol consumption and loss of cerebellar Purkinje cells. British Medical Journal 308, 1663-7. Miller, G. A. and Chapman, J. P. 2001: Misunderstanding analysis of covariance. Journal of Abnormal Psychology 110, 40-8. Owen, S. V. and Froman, R. D. 1998: Uses and abuses of the analysis of covariance. Research in Nursing and Health 21, 557-62. Vickers, A. J. and Altman, D. G. 2001: Analysing controlled trials with baseline and follow-up measurements. British Medical Journal 323, 1123-4.
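The adjustment that the analysis of covariance entry describes can be sketched for the simplest case of two groups and one baseline covariate. The sketch assumes the covariate has the same (parallel) slope in both groups, estimates that slope by pooling the within-group sums of squares and cross-products, and subtracts the part of the raw difference in means that is explained by the difference in baseline means. The data are hypothetical and constructed so that the true group effect is 3 and the slope is 0.5; the function names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def ancova_adjusted_difference(x1, y1, x2, y2):
    """Return (pooled slope, unadjusted difference, adjusted difference).

    Differences are group 2 minus group 1. A common slope in both groups
    is assumed, which is the basic ANCOVA model.
    """
    sxx = sxy = 0.0
    for xs, ys in ((x1, y1), (x2, y2)):
        mx, my = mean(xs), mean(ys)
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    unadjusted = mean(y2) - mean(y1)
    adjusted = unadjusted - slope * (mean(x2) - mean(x1))
    return slope, unadjusted, adjusted


# Hypothetical baseline (x) and outcome (y); group 2 starts from higher baselines.
x1, y1 = [10, 12, 14, 16], [10, 11, 12, 13]
x2, y2 = [14, 16, 18, 20], [15, 16, 17, 18]
slope, unadjusted, adjusted = ancova_adjusted_difference(x1, y1, x2, y2)
print(slope, unadjusted, adjusted)
```

In this constructed example the raw difference in means overstates the group effect because the groups differ at baseline, and the adjustment recovers the effect built into the data. The sketch does not check the assumptions listed in the entry, such as linearity or equal slopes.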
analysis of variance (ANOVA) Often referring to the one-way analysis of variance, it is a test for a common MEAN in multiple groups that we describe in detail here. Analysis of variance frequently arises in the comparison of more complicated models, but the same logical arguments apply. In all cases, the underlying concept is to partition the observed variance into quantities attributable to specific explanatory sources, and then consider important those sources that explain 'more than their fair share' of the variance. Despite the confusion sometimes caused by the name, the one-way analysis of variance is a method for testing to see whether multiple samples come from populations that share the same mean. In this respect it can be viewed as an extension to the t-test, which assesses whether samples from two populations share a common mean. An analysis of variance performed on two samples is equivalent to performing a t-test. ANOVA assumes that all the samples come from populations with a NORMAL DISTRIBUTION that share the same VARIANCE. It can be viewed in a number of ways, but essentially compares the estimate of the variance obtained within samples (that makes no assumption that the populations have a common mean) with an estimate of the variance from the sample means (which will require the assumption that the populations have the same mean). If the two estimates of the variance are different, then this is evidence that our assumption of equality failed and, therefore, that the populations do not all have the same mean. Note that the variance of a single sample is estimated as the sum of squared differences from the mean divided by the sample size minus one. The sum of squared differences term is interpretable as a measure of the total variation in the sample. In the analysis of variance, by combining all groups together, one can calculate this measure for all the data. This is termed the 'total sum of squares' or 'total SS'. Variation in the data is either 'between' or 'within' the samples. The 'within groups sum of squares' or 'within SS' can be calculated as the sum of squared differences from the individual sample means (rather than the differences from the overall mean that produced the total SS). 'Between groups sum of squares' or 'between SS' can be calculated directly, but is most easily calculated by subtraction of the within SS from the total SS. The two estimates of the variance (or 'mean square' as it is often termed in this context) can then be calculated. The between groups mean square is equal to the between SS divided by the number of groups minus one. The within groups mean square is equal to the within SS divided by the number of observations minus the number of groups. An F-statistic is then calculated as the between groups variance divided by the within groups variance. Under the assumptions of normality and homoscedasticity (common variance) this statistic will be an observation from an F-DISTRIBUTION if the groups come from populations with a common mean. The DEGREES OF FREEDOM of the F-distribution are the number of groups minus one and the number of observations minus the number of groups. From the F-distribution, we can calculate the probability of observing such an extreme value of the F-statistic if the populations have a common mean.
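The calculation just described can be followed step by step in a short sketch: total SS from the overall mean, within SS from the individual sample means, between SS by subtraction, the two mean squares and their ratio. The three samples are hypothetical and the function name is illustrative; obtaining the P-value from the F-distribution is left out because it needs a statistical library.

```python
def one_way_anova(groups):
    """Return (between SS, within SS, F-statistic) for a list of samples."""
    all_values = [value for group in groups for value in group]
    n_obs, n_groups = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_obs

    # Total SS: squared differences from the overall mean.
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)

    # Within SS: squared differences from each sample's own mean.
    within_ss = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        within_ss += sum((v - group_mean) ** 2 for v in group)

    between_ss = total_ss - within_ss
    between_ms = between_ss / (n_groups - 1)
    within_ms = within_ss / (n_obs - n_groups)
    return between_ss, within_ss, between_ms / within_ms


# Hypothetical measurements in three groups.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
between_ss, within_ss, f_statistic = one_way_anova(groups)
print(between_ss, within_ss, f_statistic)
```

With three groups and nine observations, the F-statistic here would be referred to an F-distribution with 2 and 6 degrees of freedom.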
This is a one-tailed test. If the value is unusually small, this suggests the between groups variance is unusually small and so is not evidence of variation between the groups. Therefore, the test is to find the probability, if the populations do have a common mean, of observing a value greater than that observed. A natural way of presenting ANOVA is the ANOVA table. Given N observations that fall into k groups, it is necessary to calculate the total SS and the within SS as described earlier and then the analysis can be completed as presented in the first table. Murphy et al. (1994) conducted an analysis of variance to see if milk consumption before the age of 25 affects bone density of the hip in later life. A total of 248 women participated in this part of their study (N = 248) and were divided into groups that represent low, medium and high milk consumptions (k = 3). The samples had similar variances and so at least one of the assumptions for ANOVA was satisfied. As is common for reasons of space, the ANOVA table was not presented in the published paper, just the P-value, but enough data were presented for an approximate reconstruction. We can infer that the within SS is approximately 4.4 and the between SS is approximately 0.15. This leads to an F-statistic of approximately 4. From the reported P-value (0.023), it can be calculated from the F-distribution (with 2 and 245 respectively for numerator and denominator degrees of freedom) that the F-statistic was 3.8. The conclusion then is that there is evidence that these samples do not come from populations that share a common mean. The reconstructed table is presented in the second table (in the printed table, entries inferred from the paper are set in bold; the rest simply follow from the calculations). It is preferable to conduct an analysis of variance rather than to conduct t-tests between all pairs of groups. ANOVA avoids problems of multiple testing and thus keeps control of the SIGNIFICANCE LEVEL. Having conducted an ANOVA and rejected the hypothesis of common means, it may then be desired to test to see which groups are responsible (although a plot of the data might be as informative). In this case, care must be taken to correct for the problems of making MULTIPLE COMPARISONS.

analysis of variance (ANOVA) The analysis of variance table

Source of variance  Degrees of freedom  Sums of squares                       Mean squares                       F                       P
Between groups      k - 1               Between SS = Total SS - Within SS     Between MS = Between SS/(k - 1)    Between MS/Within MS    P
Within groups       N - k               Within SS                             Within MS = Within SS/(N - k)
Total               N - 1               Total SS

analysis of variance (ANOVA) Approximate reconstruction of the analysis of variance table from Murphy et al. (1994)

Source of variance  Degrees of freedom  Sums of squares  Mean squares  F    P
Between groups      2                   0.15             0.08          3.8  0.023
Within groups       245                 4.4              0.02
Total               247                 4.6
It is important to take note of the assumptions being made, rather than simply ignoring them. ANOVA can be quite robust to variations from normality, but heteroscedasticity can be a serious problem. Residual plots can be used to help assess the normality and BOXPLOTS can be used to help assess the heteroscedasticity. Possible formal tests for the assumptions are the KOLMOGOROV-SMIRNOV TEST and LEVENE'S TEST respectively.
If the assumptions do not hold, then TRANSFORMATION of the data might correct this. Otherwise a number of nonparametric alternatives to ANOVA exist, the most commonly used being the KRUSKAL-WALLIS TEST and the FRIEDMAN TEST. The one-way analysis of variance is appropriate when our data are simply divided into a number of groups. There are many other forms of analysis of variance. The TWO-WAY ANALYSIS OF VARIANCE should be used when the groups are defined by two factors. Suppose, for example, we had six groups: the three groups of women in Murphy et al. (1994) and three groups of men at the same levels of milk consumption. Rather than a one-way analysis of variance, a two-way analysis of variance with gender and milk consumption as the two factors would be appropriate in this instance. If the data are multiple observations from the same subjects, perhaps measurements of cholesterol levels 0, 7, 14, 21 and 28 days after starting a new diet on several individuals, then a REPEATED MEASURES ANALYSIS OF VARIANCE would be appropriate. This is a special case of the two-way ANOVA and can be viewed as an extension of the paired sample t-test. If there are observations of more than one characteristic from the individuals in several groups, i.e. measures of both the diastolic and systolic blood pressure, then a multivariate analysis of variance (MANOVA) can be used. If, however, it is desired to correct for a measured baseline covariate, such as body mass index, in the analysis, then an ANALYSIS OF COVARIANCE (ANCOVA) may be used. All these techniques could be implemented through a regression framework, in most cases MULTIPLE LINEAR REGRESSION. The advantages of doing so would be the transition from the use of a HYPOTHESIS TEST to an actual estimate of effect sizes. This approach would also allow more flexibility; for instance in the case of Murphy et al. (1994) we could account for the natural ordering of the levels of milk consumption that ANOVA ignores. As a general principle, estimation and modelling are usually preferred to testing of hypotheses. For further details see Altman (1991) and Altman and Bland (1996). AGL
Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall. Altman, D. G. and Bland, J. M. 1996: Statistics notes: comparing several groups using analysis of variance. British Medical Journal 312, 1472-3. Murphy, S., Khaw, K.-T., May, H. and Compston, J. E. 1994: Milk consumption and bone mineral density in middle aged and elderly women. British Medical Journal 308, 939-41.
area under the curve (AUC)
This is a simple and useful method of obtaining a summary measure from plotted data. Medical research is frequently concerned with serial data, as in repeated measurements (see REPEATED MEASURES ANALYSIS OF VARIANCE) on a subject over time, e.g. blood aspirin concentration measured at various times over a 2 hour interval (Matthews et al., 1990). Say we have n measurements $y_i$ taken at times $t_i$ ($i = 1, \ldots, n$). Such data are frequently exhibited by plotting $y_i$ versus $t_i$ and joining the resulting points by straight-line segments, resulting in a 'curve'. The resulting area under the curve (AUC) is often used as a single-number summary measure for the individual subject. Further analysis of the subjects or comparison of groups of subjects is carried out based on the summary measures. The AUC for the set of points $(y_i, t_i)$, $i = 1, \ldots, n$, is typically calculated by the trapezium rule:
$$\mathrm{AUC} = \frac{1}{2} \sum_{i=1}^{n-1} (t_{i+1} - t_i)(y_i + y_{i+1})$$
The AUC is used as a summary measure in many areas of medical research, including bioequivalence and pharmacokinetics. It plays an especially important role in the analysis of RECEIVER OPERATING CHARACTERISTIC (ROC) CURVES. The area under the ROC curve of a diagnostic marker (test) measures the ability of the marker to discriminate between healthy and diseased subjects. It is the most commonly used measure of performance of a marker. We use the convention that larger marker values are more indicative of disease. Then if we randomly pick one subject from the healthy population and one from the diseased population we would 'expect' that the value of the marker for the healthy subject would be smaller than the corresponding value for the diseased subject. AUC is the probability that this, in fact, occurs. The larger the AUC, the better the overall discriminatory accuracy of the marker. An area of 1 represents a perfect test while an area of 1/2 represents a worthless test having a discriminatory ability which is the equivalent of differentiating between healthy and diseased subjects by a fair coin toss. Consider the example discussed in the entry for the ROC curve. The points on the curve are given in the table.

area under the curve Summary data used in an ROC curve

1 - Specificity  Sensitivity
0                0
0.04             0.56
0.12             0.84
0.32             0.94
0.60             0.98
1.00             1.00

The data presented result in an AUC as follows:
$$\mathrm{AUC} = 0.5\,\big((0.04 - 0)(0 + 0.56) + (0.12 - 0.04)(0.56 + 0.84) + (0.32 - 0.12)(0.84 + 0.94) + (0.60 - 0.32)(0.94 + 0.98) + (1.00 - 0.60)(0.98 + 1.00)\big) = 0.91$$
An area of 0.91 indicates the high discriminatory ability of the marker. For the ROC curve, estimating the area by the trapezium rule is equivalent to computing the Wilcoxon or Mann-Whitney statistic divided by the product of the sample sizes from the healthy and diseased populations. For smoothed ROC curves, alternative estimates of the AUC are available (Faraggi and Reiser, 2002). The effectiveness of alternative diagnostic markers is usually studied by comparing their AUCs (Wieand et al., 1989). Adjustments of these areas for covariate information, selection bias and pooling effects are discussed in the references given in the entry for the ROC curve. Schisterman et al. (2001) consider corrections of the AUC for measurement error. For further details see Hanley and McNeil (1982). DF/BR
Faraggi, D. and Reiser, B. 2002: Estimation of the area under the ROC curve. Statistics in Medicine 21, 3093-106. Hanley, J. A. and McNeil, B. J. 1982: The meaning and use of the area under the receiver operating characteristic (ROC) curve. Radiology 143, 29-36. Matthews, J. N. S., Altman, D. G., Campbell, M. J. and Royston, P. 1990: Analysis of serial measurements in medical research. British Medical Journal 300, 230-5. Schisterman, E., Faraggi, D., Reiser, B. and Trevisan, M. 2001: Statistical inference for the area under the ROC curve in the presence of random measurement error. American Journal of Epidemiology 154, 174-9. Wieand, S., Gail, M. H., James, B. R. and James, K. L. 1989: A family of non-parametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika 76, 585-92.
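The trapezium rule in the area under the curve entry is straightforward to express in code. The sketch below applies it to the ROC points tabulated in that entry (1 - specificity on the horizontal axis, sensitivity on the vertical axis); the function name is an illustrative choice rather than a library routine.

```python
def auc_trapezium(t, y):
    """Area under the straight-line curve joining the points (t[i], y[i])."""
    if len(t) != len(y) or len(t) < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    return 0.5 * sum(
        (t[i + 1] - t[i]) * (y[i] + y[i + 1]) for i in range(len(t) - 1)
    )


# ROC points from the entry's table.
one_minus_specificity = [0.0, 0.04, 0.12, 0.32, 0.60, 1.00]
sensitivity = [0.0, 0.56, 0.84, 0.94, 0.98, 1.00]

auc = auc_trapezium(one_minus_specificity, sensitivity)
print(round(auc, 2))  # 0.91, matching the worked calculation in the entry
```

The same function applies unchanged to serial measurements on a subject, with the measurement times in place of 1 - specificity and the measured values in place of sensitivity.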
artificial intelligence (AI) This branch of computer science is devoted to the simulation of intelligent behaviour in machines. Traditional focus areas of AI are machine vision, MACHINE LEARNING, natural-language processing and speech recognition. Historically an interdisciplinary field, and hence characterised by the presence of several competing paradigms and approaches, recently AI has started developing a more unified conceptual framework, based largely on the convergence of statistical and algorithmic ideas. A constant theme of AI throughout its history has been 'pattern recognition', the crucial task of detecting 'patterns' (regularities, relations, laws) within data. This task has emerged as a roadblock in all the traditional areas mentioned earlier and hence has attracted significant attention. Since most current approaches to pattern recognition involve significant use of statistics, this has become an important tool in AI in general. Recently, AI has been applied to a new series of important problems and this, in turn, has heavily affected general AI research. Important applications of modern AI include: intelligent data analysis (see also DATA MINING IN MEDICINE); information retrieval and filtering from the web; bioinformatics; and computational biology. Traditional application areas, by way of contrast, included the design of EXPERT SYSTEMS for medical or industrial diagnosis, methods for scheduling in logistics and creation of other decision-making assistant software. The imprecise definition of what AI actually is has made it harder in time to gauge the impact of this research field on everyday applications. A number of widely used computer programs would have met early definitions of artificial intelligence, e.g. popular web-based recommendation systems or air travel planning advisors. Popular techniques for pattern recognition such as NEURAL NETWORKS, decision trees and cluster analysis (see CLUSTER ANALYSIS IN MEDICINE) have made their way into the standard toolbox of data analysis and are commonly found in the toolbox of any biology lab. Machine vision methods are routinely used in analysing medical images, as well as parts of systems such as microarray machines for collecting gene expression data.
Web retrieval and email filtering software also incorporate several ideas from natural-language processing and pattern recognition and the modern sequence analysis of genomic data heavily relies on techniques originally developed for speech recognition. Intelligent web agents exist to find, assess and retrieve relevant information for the user and speech-recognition systems are routinely used in automatic phone information systems. The field of artificial intelligence has clearly produced a number of practical applications, but (the critics say) these have been achieved without solving the general problem of building intelligent machines. Maybe for this reason, generally the main success story of AI is reported to be the defeat of the chess world champion Garry Kasparov by an IBM algorithm in 1997. The origin of the field of AI is often identified with a paper by A. M. Turing, which appeared in 1950 in the journal Mind, and with a workshop held at Dartmouth College in the summer of 1956, although many key ideas had already been debated before, during the early years of cybernetics. Modern techniques of artificial intelligence include Bayesian belief networks, part of the more general field of probabilistic graphical models; pattern-recognition algorithms such as SUPPORT VECTOR MACHINES, which represent the convergence of ideas from classical statistics and from neural networks analysis; statistical analysis of natural language text and machine vision algorithms; reinforcement learning algorithms, which represent a connection with control theory; and many other methods. NC/TDB
Bishop, C. 1996: Neural networks for pattern recognition. Oxford: Oxford University Press. Mitchell, T. 1995: Machine learning. Maidenhead: McGraw-Hill. Russell, S. and Norvig, P. 2002: Artificial intelligence: a modern approach, 2nd edition. Harlow: Prentice Hall. Shawe-Taylor, J. and Cristianini, N. 2004: Kernel methods for pattern analysis. Cambridge: Cambridge University Press.
association
This is the statistical dependence between two variables. Measures of association, unlike descriptive statistics of a single variable, summarise the extent to which one variable increases or decreases in relation to a change in a second variable. The basic graphical analysis of two variables is the SCATTERPLOT, which provides evidence of association in the shape and direction of the scatter of points. In the example given here, there appears to be an association between body mass index and systolic blood pressure values in a sample of a few thousand middle-aged men and women: higher values of body mass index tend to be associated with higher values of systolic blood pressure, suggesting a 'positive' association. A 'negative' association, in contrast, would describe a situation where an increase in one variable tends to be related to a decrease in the second variable. Various statistical measures can be used to interpret the degree of association.
Correlation coefficient. This specifically measures the degree of linear association between two quantitative variables on a scale from negative one to positive one. A value of zero indicates a total absence of linear association, while a value of positive or negative one indicates a perfect linear relationship. The correlation coefficient between body mass index and systolic blood pressure in our example was 0.25, indicating a positive association that is less than perfectly linear. However, adherence to a linear relationship is only one form of association and it is easy to imagine other plausible patterns of association, such as a parabolic scatter, in which the change in one variable may be perfectly reflected in the change in the second variable, but the correlation coefficient might be close to zero.
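The point about a parabolic scatter can be checked directly. The sketch below computes the Pearson correlation coefficient for two hypothetical data sets on the same x values: one in which y is an exact quadratic function of x and one in which y is an exact linear function of x. The data and the function name are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
parabola = [v ** 2 for v in x]   # y determined exactly by x, but not linearly
line = [2 * v + 1 for v in x]    # y determined exactly and linearly by x

r_parabola = pearson_r(x, parabola)
r_line = pearson_r(x, line)
print(r_parabola, r_line)
```

Both data sets show perfect dependence of y on x, yet only the linear one yields a correlation coefficient of one; for the symmetric parabola the coefficient is zero.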
Regression coefficient. In the case of simple linear regression, there is a complete correspondence between the correlation coefficient and the regression coefficient for the slope ($\beta$). The regression coefficient, therefore, also measures association, but its value is interpreted as the magnitude of change in the dependent variable that arises, on average, from a unit change in the independent variable. In our example, an estimate of $\beta$ = 1.37 indicated that a 1 kg/m2 increase in the body mass index was associated with an average increase of 1.37 mmHg systolic blood pressure. However, in more complex regression models, the regression coefficient can measure other forms of association beyond linear dependence. For example, either the dependent or independent variable may be mathematically transformed, such as raising to a higher power, taking logarithms, etc., and the association measured by the regression coefficient would express a nonlinear change in one variable in response to a change in the second variable.
Relative risk. In the special case of two binary variables, various ratio measures are often used to quantify the degree of association. For example, one variable might be a measure of disease occurrence, the other a biological or environmental quantity. Most commonly the ratio would compare probability of disease expressed as an odds, a risk or some other relevant approximation to the risk. A relative risk value of 1, indicating equal risks in both groups, suggests that no association exists between the biological or environmental quantity and disease.
If a statistical measure suggests positive or negative association, this should not immediately be taken to imply that the association is valid and generalisable. Several considerations might lead us to question the importance of an observed statistical association. First, consideration of the STANDARD ERROR of the measure of association, generally reflecting the size of the sample, places the magnitude of association in perspective with the magnitude of random error. Apparently strong associations may in fact be poorly estimated and fall short of statistical significance. Second, an apparent association may be entirely spurious (i.e. 'confounded') due to the influence of other measured, or unmeasured, variables that have not been accounted for in the analysis. For example, in a preliminary statistical enquiry, risk of coronary heart disease may appear to be associated with watching television, although consideration of the underlying relationship with obesity and physical exercise would probably suggest that the preliminary finding was spurious. An association may alter after adjustment for the interdependence of other variables and the general validity of a measure of association would often depend on the extent to which such potential interdependencies have been taken into account. Studies measuring several variables often utilise multiple regression models to estimate adjusted regression coefficients and partial correlation coefficients by including all relevant variables in the model. However, even after allowing for such interdependencies, the much stronger claim of CAUSALITY between two variables would generally require examination of more stringent criteria. Third, an observed association may be specific to the chosen range of the variables or to the particular group of subjects studied and any inference beyond the range of the data to hand would require careful consideration of the method of sample selection. Various forms of selection bias may limit the generalisability of the association. JGW
as treated
See INTENTION-TO-TREAT
attenuation due to measurement error This is a bias reducing the size of a correlation or a regression coefficient due to imprecision of data measurement. Consider an analytical epidemiological study in which the aim is to estimate the CORRELATION between true average consumption of alcohol (mg per day) and true average systolic blood pressure (mmHg). Blood pressure measurements are well known to be variable within individuals and a single measurement is likely to be rather imprecise (see MEASUREMENT PRECISION AND RELIABILITY). Such a statement is even more true of a single day's intake of alcohol as a measure of the true average daily intake of alcohol (even if that day's intake were found to be measured without error). Now, in the epidemiological study we chose, for each participant, to measure systolic blood pressure once and then ask them to recall their alcohol intake the previous day. If we now calculate the Pearson product-moment correlation between the two measures we are likely to get a positive value that may be statistically significant (assuming we have a large enough sample) but will not be particularly high (i.e. not far above zero). Suppose, for the sake of argument, that we have found a value of this correlation to be 0.20. It should be fairly obvious that as the measures of systolic blood pressure and alcohol get less precise (equivalent for a fixed population to lowering their reliabilities) the correlation will tend to zero. This is attenuation due to measurement error. Let the observed measurement of blood pressure for the ith participant be $Y_i$ and the corresponding true average blood pressure be $\tau_i$. Similarly, let the measured alcohol intake be $X_i$ with a true average of $\xi_i$. We have estimated the correlation between Y and X, $\rho_{YX}$, when we are really interested in the correlation between the true values, $\rho_{\tau\xi}$. If the errors of measurement for blood pressure are uncorrelated with those for alcohol consumption then it can be shown that the following relationship holds:
\[ \rho_{YX} = \rho_{\tau\xi}\sqrt{\kappa_Y \kappa_X} \qquad (1) \]
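Equation (1) can be explored numerically. The following is a minimal sketch rather than part of the original entry: the function names are hypothetical, $\kappa_Y$ and $\kappa_X$ are taken to be the reliabilities of the two measurements, and the true correlation of 0.5 used in the loop is an arbitrary illustrative value.

```python
import math


def attenuated_correlation(true_corr, reliability_y, reliability_x):
    """Equation (1): correlation expected between the error-prone measures."""
    return true_corr * math.sqrt(reliability_y * reliability_x)


def disattenuated_correlation(observed_corr, reliability_y, reliability_x):
    """Invert equation (1) to recover the correlation between true values."""
    if not (0 < reliability_y <= 1 and 0 < reliability_x <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return observed_corr / math.sqrt(reliability_y * reliability_x)


# Illustrative (assumed) true correlation; watch it shrink as reliability falls.
for reliability in (1.0, 0.8, 0.5, 0.2):
    observed = attenuated_correlation(0.5, reliability, reliability)
    print(f"reliability={reliability:.1f}  observed correlation={observed:.3f}")
```

With both reliabilities equal to 1 the observed and true correlations coincide; as either reliability approaches zero the observed correlation is pulled towards zero, which is the attenuation the entry describes.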
Here, $\kappa_Y$ and $\kappa_X$ are the reliabilities of the blood pressure and alcohol consumption measurements respectively. It follows that:
\[ \rho_{\tau\xi} = \frac{\rho_{YX}}{\sqrt{\kappa_Y \kappa_X}} \qquad (2) \]
Provided we know the reliabilities for the two measurements, this equation can be used to adjust the observed correlation between Y and X to obtain the required correlation between their true average values. If we know that $\kappa_X = 0.3$ and $\kappa_Y = 0.7$, for example, the required correlation is $0.2/\sqrt{0.3 \times 0.7} = 0.44$. If, instead of a correlation, the linear regression coefficient for the regression of blood pressure on alcohol consumption were of key interest then:
\[ \beta_{\tau\xi} = \frac{\beta_{YX}}{\kappa_X} \qquad (3) \]
and, again, the required adjustment is straightforward. Equation (3) also holds approximately if we were to use a logistic regression to predict the presence/absence of hypertension. These calculations are fine as long as we have valid estimates of the reliabilities. However, they are only valid in these very simple situations as described. Epidemiologists almost always wish to adjust their estimates to allow for confounding and some of these confounders are inevitably going to be prone to MEASUREMENT ERROR. Under these circumstances life is considerably more complicated! We cannot even be certain that the estimate of the required parameter will be attenuated, never mind being attenuated in a way described by equation (3). Readers are referred elsewhere to these much more challenging but more realistic situations (Carroll, Ruppert and Stefanski, 1995; Cheng and Van Ness, 1999; Gustafson, 2003). GO
Carroll, R. J., Ruppert, D. and Stefanski, L. A. 1995: Measurement error in nonlinear models. London: Chapman & Hall. Cheng, C.-L. and Van Ness, J. W. 1999: Statistical regression with measurement error. London: Arnold. Gustafson, P. 2003: Measurement error and misclassification in statistics and epidemiology. London: Chapman & Hall/CRC.

attributable risk As a measure of the public health significance of exposure to a risk factor for disease, the attributable risk provides an estimate of the proportion of diseased subjects that may be attributed to the exposure. It is defined by:
\[ \lambda = \frac{\Pr\{D\} - \Pr\{D \mid \bar{E}\}}{\Pr\{D\}} \]
where $\Pr\{D\}$ is the probability that an individual develops disease and $E$ and $\bar{E}$ represent whether an individual is exposed or not exposed to the factor of interest (Levin, 1953). Ideally, one would like to know both $\Pr\{D\}$ and $\Pr\{D \mid \bar{E}\}$ for the population under study, but for some study designs this is not possible, so if one wishes to use the measure, care is needed to design a study that will provide as good an estimate as possible, especially when one employs an observational study. Using BAYES' THEOREM and rearranging the equation, we can obtain an expression in terms of the relative risk (RR):
\[ \lambda = \frac{\Pr\{E\}(RR - 1)}{1 + \Pr\{E\}(RR - 1)} \]
where $\Pr\{E\}$ is the prevalence of exposure in the population at large. This is a convenient way of expressing the measure of association, because RR is often estimated using alternative study designs, including CASE-CONTROL, COHORT AND CROSS-SECTIONAL STUDIES.
Attributable risk is most easily interpreted when the factor of interest increases risk, i.e. RR > 1, and in these cases the possible range of the measure is from 0 to 1. An attributable risk of zero can occur when no individuals in the population are exposed to the factor of interest, or if the factor is not related to risk of disease, RR = 1. The measure is not easily interpreted when the exposure is protective, RR < 1, so it is generally not used in this case. By redefining the reference group, one can always express the results of a study in a form in which RR is greater than 1, so this is not a serious limitation. In addition, the measure is often expressed as a percent. As RR becomes large, $\lambda$ goes to 1, but $\lambda$ goes to zero either as the proportion exposed, $\Pr\{E\}$, becomes small or as the relative risk, RR, approaches the null value of 1. If an entire population is exposed to a particular factor, $\Pr\{E\} = 1$, then the second equation (above) reduces to $\lambda = (RR - 1)/RR$. The table shows a typical 2 x 2 table that can be used to display the results from an epidemiological study. In a case-control study, the column totals are generally regarded as being fixed by design and the odds ratio or cross-product ratio is used as a good approximation to the estimate of RR when the disease is rare. In addition, the exposure distribution in the controls, $\Pr\{E\} = \Pr\{E \mid \bar{D}\}$, is considered to be representative of the exposure distribution in the overall population. Substituting in the sample estimates of these quantities gives rise to what is the maximum likelihood estimate of $\lambda$:
\[ \hat{\lambda} = \frac{ad - bc}{d(a + c)} \]
attributable risk Results from an epidemiological study with two levels of exposure and disease status

                   Disease status
Exposure      D          $\bar{D}$      Total
E             a          b              a + b
$\bar{E}$     c          d              c + d
Total         a + c      b + d          N
When setting a confidence interval about the estimate, Walter (1975) suggests using the normal approximation on the log transformation of the complement of the estimate:
\[ \operatorname{var}[\log(1 - \hat{\lambda})] = \frac{a}{c(a + c)} + \frac{b}{d(b + d)} \]
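The point estimate and the log-based interval can be put together in a few lines. This is only a sketch under stated assumptions: the function name and the cell counts are hypothetical, a, b, c and d follow the layout of the 2 x 2 table (a and b are the exposed diseased and non-diseased, c and d the unexposed), all cells are assumed positive, and a 95 % interval is formed with the usual normal quantile of 1.96.

```python
import math


def attributable_risk_case_control(a, b, c, d, z=1.96):
    """Attributable risk estimate with a log-based confidence interval.

    Returns (estimate, lower limit, upper limit).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d - b * c) / (d * (a + c))
    var_log = a / (c * (a + c)) + b / (d * (b + d))
    half_width = z * math.sqrt(var_log)
    centre = math.log(1 - estimate)
    # log(1 - lambda) decreases as lambda increases, so the limits swap over.
    lower = 1 - math.exp(centre + half_width)
    upper = 1 - math.exp(centre - half_width)
    return estimate, lower, upper


# Hypothetical case-control counts, for illustration only.
estimate, lower, upper = attributable_risk_case_control(a=60, b=40, c=40, d=60)
print(f"attributable risk = {estimate:.3f} ({lower:.3f} to {upper:.3f})")
```

Working on the scale of log(1 - estimate) keeps the upper limit below 1, which a symmetric interval on the original scale would not guarantee.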
Alternatively, Leung and Kupper (1981) have suggested basing the interval on a logit transformation of the estimate.
In a cohort study, the row totals in the table are regarded as fixed; therefore such a study does not provide a good internal estimate of the exposure distribution, nor does it provide a good unconditional estimate of the probability of disease. In this case, the proportion exposed is usually derived from another study, perhaps an earlier case-control study or a survey of the entire population. A cross-sectional study provides both an estimate of the relative risk and the overall population distribution, so in that sense it is ideal for estimating attributable risk. However, a cross-sectional study suffers in other ways (see CROSS-SECTIONAL STUDIES). Walter (1976) discusses the properties of estimates of attributable risk using these alternative study designs.
Methods for estimating attributable risk for a particular exposure while adjusting for potential confounding factors depend on whether the effect is constant over the levels of the covariates under consideration. When the effect is constant, it can be represented as having a common relative risk over the strata when using a stratified approach, such as the Mantel-Haenszel method, or it can be represented by a main effect only in a model, such as the linear logistic model. In these situations, one can directly use the adjusted estimator of the relative risk, along with an estimate of the exposure distribution in the diseased group in the second equation (above) to obtain an estimate of the adjusted attributable risk (Walter, 1976; Greenland, 1987). However, the assumption that the association can be described without the inclusion of an interaction term is a strong one and it is critical in that a seriously biased estimate can result if it is not true. An estimate of attributable risk that can be used either in a stratified analysis in which the effect is not homogeneous across strata or in a generalised linear model that includes interaction terms can be expressed as:
\[ \hat{\lambda} = 1 - \sum_{i}\sum_{j} \frac{\hat{\rho}_{ij}}{\widehat{RR}_{i|j}} \]
where j represents the levels of the factor(s) being adjusted, i represents the levels of exposure, $\rho_{ij}$ is the proportion of diseased individuals in (i, j) and $RR_{i|j}$ the relative risk for exposure level i for individuals with level j of the covariates being adjusted (Walter, 1976; Benichou, 1993). TRH
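The adjusted estimator is a weighted sum that is easy to mis-index, so a small sketch may help. It assumes, beyond what the entry states, that the exposure levels include the unexposed reference level (with relative risk 1) and that the proportions of diseased individuals sum to 1 over all (i, j) cells; the function name and all input values are hypothetical.

```python
def adjusted_attributable_risk(case_proportions, relative_risks):
    """Compute 1 - sum_i sum_j rho_ij / RR_i|j.

    case_proportions[i][j]: proportion of all diseased individuals with
        exposure level i and covariate level j (assumed to sum to 1).
    relative_risks[i][j]: relative risk for exposure level i within
        covariate level j, relative to the unexposed level (RR = 1).
    """
    total = sum(sum(row) for row in case_proportions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("case proportions must sum to 1")
    weighted = 0.0
    for rho_row, rr_row in zip(case_proportions, relative_risks):
        for rho, rr in zip(rho_row, rr_row):
            weighted += rho / rr
    return 1.0 - weighted


# Hypothetical inputs: rows are unexposed/exposed, columns are two strata.
rho = [[0.20, 0.30], [0.25, 0.25]]
rr = [[1.0, 1.0], [2.0, 4.0]]
print(round(adjusted_attributable_risk(rho, rr), 4))
```

With a single stratum and a single exposed level the sum collapses to the proportion of cases exposed multiplied by (RR - 1)/RR, which gives a quick check on any implementation.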
Benichou, J. 1993: Methods of adjustment for estimating the attributable risk in case-control studies: a review. Statistics in Medicine 10, 1753-73. Greenland, S. 1987: Variance estimators for attributable fraction estimates consistent in both large strata and sparse data. Statistics in Medicine 6, 701-8. Leung, H. K. and Kupper, L. L. 1981: Comparison of confidence intervals for attributable risk. Biometrics 37, 293-302. Levin, M. L. 1953: The occurrence of lung cancer in man. Acta Unio Internationalis Contra Cancrum 9, 531-41. Walter, S. D. 1975: The distribution of Levin's measure of attributable risk. Biometrika 62, 371-4. Walter, S. D. 1976: The estimation and interpretation of attributable risk in health research. Biometrics 32, 829-49.
AUC
See AREA UNDER THE CURVE
autocorrelation
See CORRELATION
automatic selection procedures These are procedures for identifying a parsimonious model in regression in general and MULTIPLE LINEAR REGRESSION in particular. Such methods are needed because in regression analysis an underfitted model can lead to severely biased estimation and prediction. In contrast, an overfitted model can seriously degrade the efficiency of the resulting parameter estimates and predictions. Consequently a variety of techniques, all with the aim of selecting the most important explanatory variables for predicting the response variable and thereby obtaining a parsimonious and effectively predictive model, have been developed. Perhaps the three most commonly used methods are forward selection, backward elimination and a combination of both of these, known as stepwise regression. The forward selection approach begins with an initial model that contains only an intercept and successively adds explanatory variables to the model from the pool of candidate variables until a stage is reached where none of the candidate variables, if added to the current model, would contribute information that is statistically important concerning the expected value of the response. The backward elimination method begins with an initial model that contains all the explanatory variables being used in the study and then first identifies the single variable that contributes the least information about the expected value of the response; if this is deemed not to be 'significant' then the variable is eliminated from the current model. Successive steps of the method result in a 'final' model from which no further variables can be eliminated without adversely affecting, in a statistical sense, the predicted value of the expected response. The stepwise regression method combines elements of both forward selection and backward elimination. The initial model considered is one that contains only an intercept.
Explanatory variables are then considered for inclusion in the current model, as described previously for forward selection, but now in each step of the procedure variables
included previously are also considered for possible elimination as in the backward method, and they might be removed if the presence of new variables in the model makes their contribution to predicting the expected response no longer significant. In multiple linear regression the criterion used for assessing whether or not a variable should be added to an existing model in forward selection or removed from an existing model in backward elimination is, essentially, the change in the residual sum-of-squares produced by the inclusion or exclusion of the variable. Specifically in forward selection an 'F-statistic', known as the F-to-enter, is calculated as:
\[ F = \frac{RSS_m - RSS_{m+1}}{RSS_{m+1}/(n - m - 2)} \]
where $RSS_m$ and $RSS_{m+1}$ are the residual sums of squares when models with m and m + 1 explanatory variables have been fitted. The F-to-enter is then compared with a preset term; calculated Fs greater than the preset value lead to the variable under consideration being added to the model. In backward selection a calculated F less than a corresponding F-to-remove leads to a variable being removed from the current model. In the stepwise procedure variables are entered as with forward selection, but after each addition of a new variable those variables currently in the model are considered for removal by the backward elimination process. (For more details see Petrie and Sabin, 2005.) In other types of regression, for example, LOGISTIC REGRESSION, other criteria are used for judging whether or not a variable should be entered into or removed from the current model. When applying regression techniques to HIGH-DIMENSIONAL DATA more sophisticated variable selection techniques are needed (see, for example, Francois, 2008).
None of the automatic procedures for selecting subsets of variables is foolproof and it is possible for them to be seriously misleading in some circumstances (see Agresti, 1996). That said, at least one can be more confident in a chosen model if all three procedures converge on to the same set of variables, as occurs quite frequently, but not always, in practice. When different subsets of variables are indicated, judgement is necessary to decide on a preferred model, such judgement being based on the desire to create a parsimonious model that is likely to be generalisable, not overly complex as if modelling mere quirks of the particular dataset on which it is based, and yet including important or standard parameters deemed to be of clinical relevance. SSE (See also ALL SUBSETS REGRESSION)
Agresti, A. 1996: Introduction to categorical data analysis. New York: John Wiley & Sons, Inc. Francois, D. 2008: High-dimensional data analysis: from optimal metrics to feature selection. VDM Verlag. Petrie, A. and Sabin, C. 2005: Medical statistics at a glance, 2nd edition. Chichester: Wiley-Blackwell.
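The F-to-enter criterion described in this entry amounts to a one-line calculation once the two residual sums of squares are available. The sketch below assumes that n is the number of observations and that the model includes an intercept; the function names, the numerical inputs and the preset threshold of 4.0 are all hypothetical choices made for illustration.

```python
def f_to_enter(rss_m, rss_m_plus_1, n, m):
    """F-to-enter for adding one variable to a model with m variables.

    rss_m: residual sum of squares with m explanatory variables.
    rss_m_plus_1: residual sum of squares after adding the candidate.
    n: number of observations (assumed meaning of n in the formula).
    """
    residual_df = n - m - 2
    if residual_df <= 0:
        raise ValueError("too few observations for this model size")
    if rss_m_plus_1 <= 0:
        raise ValueError("residual sum of squares must be positive")
    return (rss_m - rss_m_plus_1) / (rss_m_plus_1 / residual_df)


def should_enter(rss_m, rss_m_plus_1, n, m, preset_f=4.0):
    """Add the candidate only if its F-to-enter exceeds the preset value."""
    return f_to_enter(rss_m, rss_m_plus_1, n, m) > preset_f


# Hypothetical sums of squares for a candidate variable.
print(f_to_enter(120.0, 100.0, n=52, m=2))
print(should_enter(120.0, 100.0, n=52, m=2))
```

In a full forward selection this calculation would be repeated for every candidate at every step, entering the candidate with the largest F provided it exceeds the preset value.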
available case analysis This is an approach to multivariate data containing missing values on a number of variables, in which MEANS, VARIANCES and covariances (see COVARIANCE MATRIX) are calculated from all available subjects with nonmissing values on the variable (means and variances) or pair of variables (covariances) involved. Although this approach makes use of as much of the observed data as possible, it does have disadvantages. For example, the summary statistics for each variable may be based on different numbers of observations and the calculated variance-covariance matrix may now not be suitable for methods of multivariate analysis such as PRINCIPAL COMPONENTS ANALYSIS and FACTOR ANALYSIS for reasons described in Schafer (1997). (See also MISSING DATA, MULTIPLE IMPUTATION) SSE
Schafer, J. L. 1997: Analysis of incomplete multivariate data. Boca Raton, Florida: Chapman & Hall/CRC.
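The way different statistics come to rest on different subsets of subjects can be seen in a short sketch of the available case calculation. It shows one common variant, in which each covariance uses the means of the subjects observed on both variables; the function name, the use of None to mark a missing value and the data are assumptions made for illustration.

```python
def available_case_covariance(x, y):
    """Sample covariance from subjects with both x and y observed.

    Returns (covariance, number of subjects used). Passing the same list
    twice gives the available case variance of that variable.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    return cross_products / (n - 1), n


# Hypothetical data with missing values marked by None.
x = [1.0, 2.0, 3.0, None, 5.0]
y = [2.0, None, 6.0, 8.0, 10.0]
print(available_case_covariance(x, x))  # variance of x from 4 subjects
print(available_case_covariance(x, y))  # covariance from only 3 subjects
```

Because each entry of the matrix can come from a different set of subjects, the assembled matrix is not guaranteed to be positive semi-definite, which is one source of the difficulties the entry mentions.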
average age at death
This flawed statistic is sometimes used for summarising life expectancy and other aspects of mortality. For example, Andersen (1990) comments on a study that compared average age at death for male symphony orchestra conductors and for the entire US male population and showed that, on average, the conductors lived about 4 years longer. The difference is, however, largely illusory because as age at entry was birth, those in the US male population who died in infancy and childhood were included in the calculation of the average lifespan, whereas only men who survived long enough to become conductors could enter the conductor cohort. The apparent difference in longevity disappeared after accounting for infant and perinatal mortality. In the other direction, a study in the USA that used average age at death of rock stars (which, on the basis of 321 such deaths, they found to be 36.9 years) to warn of the perils of rock music also got it wrong. It took no account of the rock stars still alive. Proper analysis of mortality involves the determination of AGE-SPECIFIC RATES for mortality, which requires denominator data on the age distribution of the population (see Colton, 1974). SSE
Andersen, B. 1990: Methodological errors in medical research. Oxford: Blackwell Scientific. Colton, T. 1974: Statistics in medicine. Boston: Little, Brown and Co.
average treatment effect on the treated (ATT) See PROPENSITY SCORES
average treatment effect on the untreated (ATU) See PROPENSITY SCORES
B
back-calculation Also known as back-projection, this is a means of estimating, for example, past HIV infection rates and predicting the number of new AIDS cases in the future and was first proposed in the mid-1980s (Brookmeyer and Gail, 1986). The essence of the method is contained in the equation:
\[ d(t) = \int_0^t h(s)\,p(t - s)\,ds \]
where $d(t)$ and $h(s)$ denote the disease diagnosis rate at time t and the infection rate at time s, and $p(\cdot)$ indicates the probability distribution (density) of the incubation time (or INCUBATION PERIOD). This expression states that the rate of disease diagnosis at time t depends on the rate of new infections at time s and on the distribution of the incubation time t - s. Therefore, if any two of these three components are known, the third can be inferred. Typically, the disease diagnosis rate for t up to the current time T and the distribution of the incubation time are assumed known and the infection rate is estimated.
The figure explains the idea in a discrete time framework using the HIV epidemic as an example. Here, the interest is in estimating HIV incidence and in predicting future AIDS cases. Suppose data on new AIDS cases over time up to the current time T are available together with the information on the distribution of the incubation time. It is then possible to reconstruct the number of past infections that have resulted in the observed AIDS cases. The estimated incidence of HIV can be used in conjunction with the distribution of the incubation time to produce short-term projections of new AIDS diagnoses. Note that in this particular case, the MEDIAN length of the incubation time is of the order of 10 years, with very few individuals developing AIDS within a short time period from infection. The observed AIDS cases therefore provide information on infections that occurred in the distant past, rather than in recent years. Estimates of incidence of infection for the years just preceding T will necessarily be quite inaccurate, as they are based on little information. Care should then be taken in the interpretation of recent trends in the number of infections. However, this problem will not affect projections of AIDS cases as long as they are short term.
[Figure: back-calculation. Back-calculation of HIV incidence and prediction of future AIDS cases. Horizontal axis: calendar time, up to the current time T; curve: estimated HIV incidence]
A number of formulations of the back-calculation equation have been proposed. To give a flavour of the estimation problem, it is convenient to use a discrete version of our first equation. Let $t_0$ be the beginning of the epidemic and $y_k$ the number of individuals that develop the disease endpoint of interest (e.g. AIDS in an HIV context) in the kth time interval $[t_{k-1}, t_k)$ for k = 1, ..., K. Suppose that $f_{ij}$, the probability of developing the disease endpoint in the jth time interval given infection in the ith interval, is also known. Then the expected number of new disease cases in $[t_{k-1}, t_k)$ can be expressed as:
\[ E(y_k) = \sum_{i=1}^{k} E(h_i)\,f_{ik} \]
where $h_i$ is the unobserved number of new infections in the ith time interval. Assuming that the $h_i$ are independently distributed according to a POISSON DISTRIBUTION with parameter $E(h_i)$, then the $y_k$ are also Poisson distributed with parameter $E(y_k)$. From this the likelihood for the observed data can be constructed and maximised to obtain estimates of the number of new infections over time (see MAXIMUM LIKELIHOOD ESTIMATION). In practice, estimation of $h = (h_1, \ldots, h_K)$ is not so straightforward. The high dimensionality of h can lead to unstable estimates. In order to avoid lack of identifiability,
some structure needs to be imposed on the shape of h. This has typically been achieved by choosing fully parametric models for $h = h(\theta)$. The problem is then reduced to an estimation of $\theta$, conveniently chosen to be of a lower dimension than h. Alternatively, to retain some flexibility, weakly parametric models (i.e. step functions constant over a long period of time) have been specified or smoothness constraints on h have been introduced. This has created a rich literature, especially in the HIV field (see Brookmeyer and Gail, 1994).
Attractive in principle, given the simplicity of the idea, the method does require precise knowledge of at least two of the three components introduced already. However, perfect information is rarely available. For example, as in HIV, the incidence of the disease endpoint, typically acquired from surveillance schemes, might be affected by reporting delay or underreporting. Further, the distribution of the incubation time may also be imprecisely known. Results can be highly sensitive to misspecification of the inputs. It is therefore important that data are appropriately adjusted for delay in reporting before they are used in the back-calculation. Equally, it is essential that sensitivity analyses to the model chosen for the distribution of the incubation time are carried out. One more limitation of the method is the inability to provide precise estimates of the incidence of infection in recent times. This is a particularly serious problem for diseases with long incubation times, as seen in the HIV example.
These limitations notwithstanding, the back-calculation method has been widely used and developed in various ways, especially in the HIV area. Notably, the original methodology assumed a fixed distribution for the incubation time, independent of calendar time or age at infection. However, therapeutic changes over time and the discovery of a clear dependence of HIV progression on age at infection have made the time-age independence assumption untenable. This has led to the development of age-time specific versions of back-calculation. Equally, the need to estimate the number of individuals at different stages of the development of HIV has resulted in the development of 'staged' back-calculation, where the incubation time is divided into stages according to the value of markers of HIV disease. A final example is given by the need to refine estimation of HIV incidence, especially in recent years, and AIDS projections. This has resulted in a further development of the method, now able to incorporate external information on the disease spread as well as other surveillance data, in addition to AIDS diagnoses (see De Angelis, Gilks and Day, 1998; Becker, Lewis and Li, 2003).
The method and its developments have found important application in other contexts besides HIV. Examples include the assessment of the bovine spongiform encephalopathy epidemic in cattle and the consequent Creutzfeldt-Jakob disease epidemic in humans in Great Britain, the estimation of the Hepatitis C virus epidemic in France and the estimation of the number of new injecting drug users in Australia. DDA
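The discrete form of the back-calculation relationship, in which expected diagnoses are a sum of past expected infections weighted by the incubation distribution, can be sketched in a few lines. The code below is illustrative only: the function names are hypothetical, intervals are indexed from 0, the incubation distribution is assumed not to depend on calendar time, and the infection numbers and incubation probabilities are invented. It computes the expected diagnoses and the Poisson log-likelihood that would be maximised; it does not perform the maximisation.

```python
import math


def incubation_matrix(incubation_pmf, n_intervals):
    """Build f[i][k]: probability of diagnosis in interval k given infection in i.

    incubation_pmf[l] is the assumed probability that the incubation time
    lasts l intervals, taken to be the same for every infection interval.
    """
    f = [[0.0] * n_intervals for _ in range(n_intervals)]
    for i in range(n_intervals):
        for k in range(i, n_intervals):
            lag = k - i
            if lag < len(incubation_pmf):
                f[i][k] = incubation_pmf[lag]
    return f


def expected_diagnoses(expected_infections, f):
    """Expected diagnoses per interval: sum over i <= k of E(h_i) * f[i][k]."""
    n_intervals = len(expected_infections)
    return [
        sum(expected_infections[i] * f[i][k] for i in range(k + 1))
        for k in range(n_intervals)
    ]


def poisson_log_likelihood(observed, expected):
    """Log-likelihood of observed counts under independent Poisson means."""
    return sum(
        y * math.log(mu) - mu - math.lgamma(y + 1)
        for y, mu in zip(observed, expected)
    )


# Invented inputs for illustration only.
f = incubation_matrix([0.1, 0.2, 0.3], n_intervals=3)
mu = expected_diagnoses([100.0, 50.0, 0.0], f)
print(mu)
print(poisson_log_likelihood([10, 25, 40], mu))
```

Infections in the most recent intervals contribute only through the first few terms of the incubation distribution, so they are weakly informed by the observed counts, which matches the entry's warning about recent incidence estimates.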
Bacchetti, P. 1998: Back-calculation. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics, Vol. 1. Chichester: John Wiley & Sons, Ltd, pp. 235-42. Becker, N. G., Lewis, J. J. C. and Li, Z. F. 2003: Age-specific back-projection of HIV diagnosis data. Statistics in Medicine 22, 2177-90. Brookmeyer, R. and Gail, M. H. 1986: Minimum size of the acquired immunodeficiency syndrome (AIDS) epidemic in the United States. Lancet 2(8519), 1320-2. Brookmeyer, R. and Gail, M. H. 1994: AIDS epidemiology: a quantitative approach. New York: Oxford University Press. De Angelis, D., Gilks, W. R. and Day, N. E. 1998: Bayesian projection of the acquired immune deficiency syndrome epidemic. Journal of the Royal Statistical Society C - Applied Statistics 47, 449-81.
backwards regression
See LOGISTIC REGRESSION, MULTIPLE LINEAR REGRESSION
balance
See RANDOMISATION
bar chart
A graphical display of data classified into a number of (usually unordered) categories. Equal width rectangular bars are used to represent each category, with the heights of the bars being proportional to the observed frequency in the corresponding category. An example is shown in the first figure.

[Figure: bar chart. Mortality rates per 1000 live births, for children under five in five different countries]

An extension of the simple bar chart is the component bar chart (also known as the stacked bar chart) in which particular lengths of each bar are differentiated to represent a number of frequencies associated with each category forming the chart. Shading or colour can be used to enhance the display. An example is given in the second figure; here the numbers of patients in the four categories of a response variable for two treatments (BP and CP) are displayed.

[Figure: bar chart. Response to treatment. Stacked bars for treatments CP and BP; legend entries include partial response and complete response]

The basic bar chart is often of little more help in understanding categorical data than the numerical data themselves. However, sophisticated adaptations of the graphic can become an extremely effective tool for displaying a complex set of categorical data. That this is so can be illustrated by an example taken from Sarkar (2008) that uses data summarising the fates of the 2201 passengers on the Titanic. The data are categorised by economic status (class of ticket, first, second or third, or crew), sex (male or female), age (adult or child) and whether they survived or not (the data are available on Sarkar's website, http://lmdvr.r-forge.r-project.org/). The first diagram produced by Sarkar is shown in the third figure. This plot looks impressive but is dominated by the third 'panel' (adult males) as heights of bars represent counts and all panels have the same limits. Sadly, all the plot tells us is that there were many more males than females aboard (particularly among the crew, which is the largest group) and that there were even fewer children.

[Figure: bar chart. Summary of the fate of passengers of the Titanic, classified by sex, age and class (used with the permission of Springer). Panels: child male, child female, adult male, adult female; bars by class (including 3rd and crew) split by survived (no/yes); horizontal axis: frequency]

The plot becomes more illuminating about what really happened to the passengers if the proportion of survivors is plotted and by allowing independent horizontal scales for the different 'panels' in the plot; this plot is shown in the fourth figure, which emphasises the proportion of survivors within each subgroup rather than the absolute numbers. The proportion of survivors is lowest among third-class passengers and the diagram makes it very clear that the 'women and children first' policy did not work very well for this class of passenger.

[Figure: bar chart. Survival among different subgroups of passengers on the Titanic, with a different horizontal scale in each panel (used with the permission of Springer)]

(I am grateful to Dr Sarkar and to Springer for allowing me to reproduce the two diagrams.) SSE (See also HISTOGRAM, PIE CHART)
Sarkar, D. 2008: Lattice: multivariate data visualization with R. New York: Springer.
baseline measurements These are measurements taken at the beginning of a study. (This section, however, concentrates on their role within the context of a CLINICAL TRIAL.) Baseline measurements come in different varieties and may also have a variety of purposes (Senn, 1998). First are demographic characteristics of the patient, which either do not change (such as, for example, sex), change slowly, if at all (such as height) or change at the same rate for all patients (such as age). The second and simplest sort is a measurement of the same type as the outcome variable, but taken on one or more occasions prior to RANDOMISATION; these might be referred to as true baselines. Third, one has baseline correlates, measurements taken before randomisation on variables other than the true outcome variable but predictive of it and which may vary during the trial. Some such measurements are invariably collected as part of the process of deciding which patients may enter the trial. For example, it may be required in a trial of asthma that patients be aged 18-65, have a baseline forced expiratory volume in 1 second (FEV1) no more than 75 % of that predicted by age, sex and height, can demonstrate 10 % reversibility when given a bronchodilator and are normotensive. Simply fulfilling these requirements will necessitate taking at least two FEV1 measurements prior to randomisation, as well as diastolic and systolic blood pressures and recording height, sex and age of patients, the last two being variables that are always recorded anyway. In practice, many other things, such as, for example, concomitant medication and the centre in which the patient is treated, will also be
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ BASELINE MEASUREMENTS
recorded and such things are also potential candidates for any model. All three kinds of baseline may be used for four further common purposes. First, to help characterise the patients in the trial; second, to compare the groups in various arms of the trials; third, to provide conditional estimates, which will generally have greater precision and will be conditionally unbiased given the observed baseline measurements; and fourth, to investigate the constancy or otherwise of the treatment effect as patients vary. There is a fifth purpose to which true baselines can be put: as part of a general repeated measures framework, such as, for example, the random slopes approach of Laird and her co-writers (Laird and Ware, 1982; Laird and Wang, 1990). The following are some issues that arise.
Generalising results. Using the actual baseline measurements observed is clearly superior to using the inclusion criteria to characterise the trial, as the latter simply define values the patients might have had, rather than values they did have. It is unclear, however, to what extent the baseline characteristics measured can be used as a basis for generalising the results, since many things that might be important have not been measured. In any case, the logic of clinical trials is comparative rather than representative. In many areas it is accepted that the patients in a clinical trial will be unrepresentative. It is hoped rather that an additive scale of measurement may permit useful application of the results. This may require use of additional covariate information from the target population (Lane and Nelder, 1982).
Comparing groups. This suggests perhaps that the second use of comparing the groups is more valuable. However, from another point of view, the comparison of groups is simply an unimportant resting point on the road to adjustment. If the baseline measurement is prognostic a superior inference will be made by conditioning on it, whether or not it is imbalanced. Particularly questionable is the common practice of comparing groups at baseline in terms of significance tests (Altman, 1985; Senn, 1989, 1994). This has no useful role as part of a general strategy for analysis but could possibly have some limited use as a test of the randomisation process itself (to detect fraud, for example) as part of some general quality control of the trial. That being so, however, the commonly employed significance levels of 5 % seem inappropriate.
Covariate adjustment. This in turn suggests that the third
use of baseline measurements, to provide conditional estimates by stratification or adjustment using ANALYSIS OF COVARIANCE, is the most important of these purposes. However, many trialists appear to have a strong (one might say unreasonably strong) preference for simple analyses over more complicated ones, preferring simple t-TESTS to analysis of covariance and the log-rank test to PROPORTIONAL HAZARDS regression.
Effects in subgroups. The fourth use of baselines is also
controversial. An issue of BIAS-VARIANCE trade-off arises. Some, the 'splitters', are more worried about bias and consider that it is important to report treatment effects by subgroups defined by baseline measurements; others, the 'poolers', regard variance as being the bigger concern and point to the unreliability of inferences based on small groups.
Use of true baselines in repeated measures analysis. A controversial matter here is that the baselines are sometimes explicitly measured as part of the outcome despite the fact that, obviously, the treatments cannot affect the baselines. If the baselines, or some function of them, are also included as covariates this may lead to causally acceptable inferences. The simplest example is where change scores are used and baselines are fitted as covariates. Inferences as to the effect of treatment are then identical to those that would be made using raw outcomes and baselines as covariates (Laird, 1983).
Nonlinear models. For the general linear model, the
expected value of the estimator conditioning on the covariate is the same as the unconditional estimator. This is not generally true for nonlinear cases. Gail et al. (1984) have considered where this does and does not hold for a variety of models. Robinson and Jewell have concentrated on the case of LOGISTIC REGRESSION (Robinson and Jewell, 1991) and Ford et al. on the proportional hazards model (Ford et al., 1995). It is usually the case where nonlinear models are involved that fitting covariates leads to an increase in variance. However, there is a biasing of the treatment effect towards the null if prognostic covariates are not fitted, so it does not follow that fitting such covariates necessarily leads to a loss of power. There are many arguments, in fact, as to why the conditional estimators should be preferred (Lindsey and Lambert, 1998), but so-called marginal approaches using working correlation matrices have also become extremely popular, in particular via the GENERALISED ESTIMATING EQUATIONS approach of Liang and Zeger (1986).
Measurement error. It is well known that where a
covariate is measured with error, the estimate of its effect on outcome is attenuated (see ATTENUATION DUE TO MEASUREMENT ERROR). The false conclusion is sometimes drawn that under such circumstances, analysis of covariance does not yield conditionally unbiased estimators (Chambless and Roeback, 1993). What has been overlooked is a second attenuation: that of the true baseline difference on the observed baseline difference (Senn, 1994, 1995). The variance of the observed covariate will exceed that of the 'true' covariate. The covariance of the two can be shown to be the variance of the true covariate. Hence, since the regression of observed on true is the covariance divided by the variance of true, this regression is 1. However, the regression of true on observed is the covariance divided by the variance of observed and hence is less than 1. On average the true baseline difference is closer to zero than the observed baseline difference. The two attenuations exactly cancel out and so it turns out that correcting for an imbalance in observed covariates using the observed covariates is the right thing to do.
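The variance and covariance argument above can be checked numerically. The following is a minimal simulation sketch, not part of the original entry: the variances, the sample size and all variable names are illustrative assumptions, and the error is taken to be independent of the true value. It shows the slope of observed on true close to 1 and the slope of true on observed close to var(true) / (var(true) + var(error)).

```python
import numpy as np

# Illustrative assumptions: a 'true' baseline with variance 4 and
# independent measurement error with variance 1.
rng = np.random.default_rng(0)
n = 200_000
var_true, var_err = 4.0, 1.0

true = rng.normal(0.0, np.sqrt(var_true), n)
observed = true + rng.normal(0.0, np.sqrt(var_err), n)

# Sample covariance matrix: row/column 0 is observed, 1 is true.
cov = np.cov(observed, true)
cov_obs_true = cov[0, 1]

# Regression of observed on true: covariance / variance of true.
slope_obs_on_true = cov_obs_true / cov[1, 1]

# Regression of true on observed: covariance / variance of observed.
slope_true_on_obs = cov_obs_true / cov[0, 0]

# Theoretical value of the second slope under these assumptions.
expected_attenuation = var_true / (var_true + var_err)

print(slope_obs_on_true, slope_true_on_obs, expected_attenuation)
```

With these assumed variances the second slope should settle near 0.8, so an observed baseline difference overstates the true one on average by the same factor by which the covariate effect is attenuated.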
Correcting for true baselines. It has also been claimed that, in the case where the covariate is of the same kind as the outcome measure, in other words is a true baseline, analysis of covariance is only appropriate if the baselines are balanced and that unadjusted change scores provide an unbiased estimate in the more general case (Liang and Zeger, 2000). This is incorrect as the following counterexample shows. Imagine a trial in hypertension in which, quite irrationally, but as is theoretically possible, we include only patients who have diastolic blood pressures (DBP) of either 95 mmHg or 105 mmHg (to the nearest mmHg). Forty patients of each sort are recruited and are allocated, otherwise at random, but in proportions 3:1 in the first stratum and 1:3 in the second stratum. In the absence of any further knowledge, a perfectly reasonable estimate under these unreasonable circumstances would be obtained by subtracting mean DBP under the active treatment from mean DBP under placebo separately in each of the two strata and averaging the results. What would be misleading would be to take DBP at baseline from DBP at outcome and compare the average over both strata under active treatment with that under placebo. Yet, since we can recode 95 and 105 to a dummy variable with values 0 and 1 by dividing by 10 and subtracting 9.5, the first approach is formally equivalent to analysis of covariance and the second approach is simply that of change scores.
Choice of covariates. Regulatory authorities are naturally
nervous that sponsors may unfairly manipulate results by choosing the model most favourable to them and journal editors ought to have similar fears about their authors. One way of protecting the Type I error rate is to prespecify the model, which is common practice within the pharmaceutical industry and recommended by various guidelines (International Conference on Harmonisation, 1999). A model-checking approach based on randomisation tests not using the treatment information is an alternative
(Edwards, 1999). From the Bayesian point of view, however, this is simply a formal and pointless game. Having a variable in a model is equivalent to saying that one knows nothing about its effects. An excluded variable is one for which the effect is known to be zero. Other positions are possible. If users cannot agree which model is appropriate, then even given a position of prior ignorance about the treatment effect, different posterior opinions will obtain. This may give a sort of justification for frequentist sensitivity analysis. However, it should not be forgotten that whereas little may be known about the effect of treatment in advance of a trial, the same cannot be said for covariates, and such prior knowledge is an important guide in the choice of model (Senn, 2000).
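Returning to the hypertension counterexample given under 'Correcting for true baselines', the difference between the stratified (analysis of covariance) estimate and the change-score estimate can be sketched numerically. This is an illustration only: the noise-free outcome model, its slope of 0.5 on baseline, the true effect of -10 mmHg and the direction of the 3:1 allocation (active:placebo in the 95 mmHg stratum) are all assumptions, and effects are computed as active minus placebo.

```python
import numpy as np

# 40 patients per stratum. Assumed allocation: 30 active / 10 placebo at
# 95 mmHg and 10 active / 30 placebo at 105 mmHg.
baseline = np.repeat([95.0, 95.0, 105.0, 105.0], [30, 10, 10, 30])
active = np.repeat([1.0, 0.0, 1.0, 0.0], [30, 10, 10, 30])

# Hypothetical noise-free outcome model: slope on baseline below 1
# (regression to the mean) and a true treatment effect of -10 mmHg.
true_effect, slope = -10.0, 0.5
outcome = 40.0 + slope * baseline + true_effect * active

# Approach 1: treatment contrast within each stratum, then averaged.
strata_diffs = [
    outcome[(baseline == b) & (active == 1)].mean()
    - outcome[(baseline == b) & (active == 0)].mean()
    for b in (95.0, 105.0)
]
stratified = float(np.mean(strata_diffs))

# The same thing as analysis of covariance on the recoded 0/1 baseline.
dummy = baseline / 10.0 - 9.5
X = np.column_stack([np.ones_like(dummy), dummy, active])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
ancova = float(coef[2])

# Approach 2: unadjusted change scores compared across arms.
change = outcome - baseline
change_score = float(change[active == 1].mean() - change[active == 0].mean())

print(stratified, ancova, change_score)
```

Under these assumptions the first two estimates recover the assumed effect, whereas the change-score contrast is shifted by (slope - 1) times the baseline imbalance between arms, here (0.5 - 1) x (97.5 - 102.5) = 2.5 mmHg.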
Choice of true baseline to fit. It sometimes happens that in a run-in period a number of measurements are made of the eventual outcome variable. Often the last only is used as a covariate, although this is in fact wasteful as it implicitly assumes, which is unlikely, an autoregressive process. It is better either to fit the mean baseline or, more generally, each of the baseline measurements (Senn, 1997).
Subgroup analyses and treatment by covariate interactions. Such analyses are sometimes undertaken to examine the constancy of the treatment effects. The former is the natural extension of the stratification approach and the latter of analysis of covariance. Once such analyses have been undertaken a problem of combining results from individual strata arises. This issue arises frequently in the analysis of multicentre trials where centre-specific effects may be examined. If such effects are weighted by precision, this is equivalent to using SAS Type II sums of squares. If unweighted averages are used, the equivalence is to Type III sums of squares (Gallo, 2001). From one point of view, once an interaction has been fitted an overall effect is no longer of interest, but it can also be maintained that this attitude naively maintains the claims of reducing bias against reducing variance. Compromise positions involving random effects are sometimes used but not nearly as commonly in combining effects from various centres as in the analogous problem in meta-analysis of combining effects from various trials and hardly ever when other sorts of covariate are involved. Great care must be taken in interpreting effects when interactions are involved (Chuang-Stein and Tong, 1996) and it may be wise to fit a model without interactions as a check (Senn, 2000). This is recommended by international guidelines (International Conference on Harmonisation, 1999). Note that it is the interaction of covariates with the treatment effect that causes problems in this way. The interaction of covariates with each
other is not an issue of the same importance, since it is merely the joint effect of these for which one seeks adjustment. Of course, there are also very many technical issues to be confronted when considering adjustment for baseline measurements. A particularly instructive case to consider is that where a number, say $k$, of binary covariates have been measured. One approach is that of stratification. One creates $2^k$ strata based on these covariates, forms a treatment contrast within each and then a combination, usually weighted, of them all. In the case of a linear model, if within-stratum variances are assumed constant and combined efficiently, this is equivalent to carrying out analysis of covariance using ordinary least squares having formed a factor for each binary covariate and fitting all interactions between covariates up to and including the highest. Note that if we characterise an interaction by the number of covariates $r$ involved, with $r = 1$ corresponding to main effects and $r = 0$ to the general intercept, then there are in general $\binom{k}{r}$ such terms and that since

\[ \sum_{r=0}^{k} \binom{k}{r} = 2^k, \]

one is fitting the same number of degrees of freedom in both cases, as is indeed necessary, since the two are equivalent. If there are $q$ treatments in the trial, fitting the treatment effect as a main effect only removes a further $q - 1$ degrees of freedom. Most trials have two treatments and so $q = 2$, $q - 1 = 1$ and one further degree of freedom is removed. Going further and considering interactions between covariates and the treatment is moving a step along the road to splitting the treatment effect by subgroup. Interactions between treatment and covariates will not be considered here and instead the issue of interactions of covariates among themselves is considered. The approach via analysis of covariance has greater flexibility than that using strata, since it is possible to fit the main effect of covariates only or a limited degree of interactions between them. Furthermore, if we move to covariates with more than two levels, models with reduced degrees of freedom are possible. Suppose, for example, that we have measured baseline severity on a three-point scale as 1, 2, 3. We can code this using two dummy variables. This is equivalent to fitting a linear and a quadratic covariate. Possible schemes for both approaches are illustrated in the table. Note that these are equivalent since $Z_1 = X_2 + 2X_3 - 1$ and $Z_2 = 1 - 3X_2$. The circumstances under which one would wish to fit $X_3$ alone and especially $X_2$ alone are fewer than those where one might choose to fit $Z_1$ alone. Thus, there is an attractive flexibility and economy of the analysis of covariance approach. Of course, where truly continuous measures are involved the advantages for the covariate modelling approach increase as arbitrary cut-points have to be used if stratification is employed. SS
baseline measurements  Coding schemes for a three-point severity scale

                        Dummy variables                        Covariates
Severity   Intercept    Severity level 2   Severity level 3    Linear   Quadratic
                        X2                 X3                  Z1       Z2
1          1            0                  0                   -1        1
2          1            1                  0                    0       -2
3          1            0                  1                    1        1
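The equivalence of the two coding schemes in the table, and the degrees-of-freedom count for k binary covariates, can be verified directly. The short sketch below is an illustration rather than part of the original entry; the variable names and the choice k = 4 are assumptions.

```python
from math import comb

import numpy as np

severity = np.array([1, 2, 3])

# Dummy-variable coding: indicators for severity levels 2 and 3.
x2 = (severity == 2).astype(float)
x3 = (severity == 3).astype(float)

# Linear and quadratic covariates obtained from the dummies,
# using the relations stated in the text.
z1 = x2 + 2 * x3 - 1
z2 = 1 - 3 * x2

intercept = np.ones(3)
dummy_design = np.column_stack([intercept, x2, x3])
poly_design = np.column_stack([intercept, z1, z2])

# Both designs have full rank 3, and stacking them side by side does not
# increase the rank, so they span the same space and give the same fit.
rank_dummy = np.linalg.matrix_rank(dummy_design)
rank_poly = np.linalg.matrix_rank(poly_design)
rank_joint = np.linalg.matrix_rank(np.hstack([dummy_design, poly_design]))

# Degrees of freedom for k binary covariates: the intercept, main effects
# and all interactions together number 2**k, one per stratum.
k = 4  # illustrative choice
n_terms = sum(comb(k, r) for r in range(k + 1))

print(z1, z2, rank_dummy, rank_poly, rank_joint, n_terms)
```

Fitting $Z_1$ alone corresponds to a single-degree-of-freedom linear trend in severity, which is the economy the analysis of covariance approach offers over stratification.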
Altman, D. G. 1985: Comparability of randomized groups. Statistician 34, 1, 125-36.
Chambless, L. E. and Roeback, J. R. 1993: Methods for assessing difference between groups in change when initial measurement is subject to intra-individual variation. Statistics in Medicine 12, 13, 1213-37.
Chuang-Stein, C. and Tong, D. M. 1996: The impact of parametrization on the interpretation of the main-effect terms in the presence of an interaction. Drug Information Journal 30, 421-4.
Edwards, D. 1999: On model prespecification in confirmatory randomized studies. Statistics in Medicine 18, 7, 771-85.
Ford, I. et al. 1995: Model inconsistency, illustrated by the Cox proportional hazards model. Statistics in Medicine 14, 735-46.
Gail, M. H. et al. 1984: Biased estimates of treatment effects in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 71, 431-44.
Gallo, P. 2001: Center-weighting issues in multicenter clinical trials. Journal of Biopharmaceutical Statistics 10, 2, 145-63.
International Conference on Harmonisation 1999: Statistical principles for clinical trials (ICH E9). Statistics in Medicine 18, 1905-42.
Laird, N. 1983: Further comparative analyses of pretest post-test research designs. The American Statistician 37, 329-30.
Laird, N. M. and Wang, F. 1990: Estimating rates of change in randomized clinical trials. Controlled Clinical Trials 11, 6, 405-19.
Laird, N. M. and Ware, J. H. 1982: Random-effects models for longitudinal data. Biometrics 38, 4, 963-74.
Lane, P. W. and Nelder, J. A. 1982: Analysis of covariance and standardization as instances of prediction. Biometrics 38, 3, 613-21.
Liang, K. Y. and Zeger, S. L. 1986: Longitudinal data-analysis using generalized linear models. Biometrika 73, 1, 13-22.
Liang, K. Y. and Zeger, S. L. 2000: Longitudinal data analysis of continuous and discrete responses for pre-post designs. Sankhya - the Indian Journal of Statistics Series B 62, 1, 134-48.
Lindsey, J. K. and Lambert, P. 1998: On the appropriateness of marginal models for repeated measurements in clinical trials. Statistics in Medicine 17, 4, 447-69.
Robinson, L. D. and Jewell, N. P. 1991: Some surprising results about covariate adjustment in logistic regression models. International Statistical Review 58, 227-40.
Senn, S. J. 1994: Testing for baseline balance in clinical trials. Statistics in Medicine 13, 17, 1715-26.
Senn, S. 1995: In defence of analysis of covariance: a reply to Chambless and Roeback. Statistics in Medicine 14, 20, 2283-5.
Senn, S. J. 1997: Statistical issues in drug development. Chichester: John Wiley & Sons, Ltd.
Senn, S. J. 1998: Baseline adjustment in longitudinal studies. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics, Vol. 1. New York: John Wiley & Sons, Inc., pp. 253-7.
Senn, S. J. 2000: The many modes of meta. Drug Information Journal 34, 535-49.
basic reproduction number A term used in the theory of infectious diseases for the average number of secondary cases that an infectious individual produces in a completely susceptible population. The basic reproduction number (R) of an infectious agent is a key factor determining the rate of spread and the proportion of the host population affected. The number depends on the duration of the infectious period, the probability of infecting a susceptible individual during one contact and the number of new susceptible individuals contacted per unit time; consequently, it may vary considerably for different infectious diseases and also for the same disease in different populations. The value of R has implications for whether there is a positive probability that an epidemic may occur and the proportion of the population infected were an epidemic to take place. The larger the value of R, the larger the fraction of the population that must be immunised to prevent an epidemic. A recent account of the use of the basic reproduction number is given in Wesley and Allen (2009). SSE
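As a rough illustration of how R depends on the three quantities listed above, the sketch below uses the simplest textbook form, in which R is their product, together with the standard threshold 1 - 1/R for the fraction that must be immunised. Both formulae are assumptions that hold for simple homogeneously mixing models and are not stated in this entry; the function names and input values are hypothetical.

```python
def basic_reproduction_number(contact_rate, transmission_prob, infectious_duration):
    """R as contacts per unit time x probability of infection per contact
    x duration of the infectious period (simple homogeneous-mixing form)."""
    return contact_rate * transmission_prob * infectious_duration


def critical_immunisation_fraction(r):
    """Fraction to immunise so that each case produces fewer than one
    secondary case on average, 1 - 1/R; zero if R is already at most 1."""
    if r <= 0:
        raise ValueError("R must be positive")
    return max(0.0, 1.0 - 1.0 / r)


# Hypothetical inputs: 10 new susceptible contacts per day, a 5 % chance
# of infection per contact and an infectious period of 4 days.
r_example = basic_reproduction_number(10, 0.05, 4)
fraction_example = critical_immunisation_fraction(r_example)

print(r_example, fraction_example)
```

Because the threshold increases with R, the sketch reproduces the qualitative statement in the entry that larger values of R require a larger immunised fraction.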
Wesley, C. L. and Allen, L. J. S. 2009: The basic reproduction number in epidemic models with periodic demographics. Journal of Biological Dynamics 3, 116-29.
Bayes' theorem Bayes' theorem is a method by which conditional probabilities (see CONDITIONAL PROBABILITY) may be manipulated. In particular, it provides a means of reversing the conditioning in order to obtain probability statements regarding specific events of interest. Bayes' theorem itself was described originally in 'An essay ...' and published two years after the death of the Reverend Thomas Bayes in 1761 (Bayes, 1763). Bayes' theorem is widely used for manipulating conditional probabilities, even if users may not be aware of it, but its use for more general quantities, for example relative risks, gives rise to considerable controversy (see BAYESIAN METHODS). Following Spiegelhalter, Abrams and Myles (2004), consider two events $a$ and $b$. Using the multiplication rule of PROBABILITY, the probability of both $a$ and $b$ occurring, denoted '$a \wedge b$', is given by:
\[ P(a \wedge b) = P(a \mid b)\,P(b) = P(b \mid a)\,P(a) \tag{1} \]

Rearranging this equation yields an expression for $P(b \mid a)$:

\[ P(b \mid a) = \frac{P(a \mid b)\,P(b)}{P(a)} \tag{2} \]

Considering the events '$a \wedge b$' and '$a \wedge \bar{b}$', where $\bar{b}$ represents the event 'not $b$', then these are mutually exclusive and using the addition rule of probability (see PROBABILITY), we can 'extend the argument' for $a$ to include $b$:

\[ P(a) = P(a \wedge b) + P(a \wedge \bar{b}) = P(a \mid b)\,P(b) + P(a \mid \bar{b})\,P(\bar{b}) \tag{3} \]

Combining equations (2) and (3) yields Bayes' theorem, which expresses $P(b \mid a)$ in terms of conditional probabilities for $a$ and the probability of $b$:

\[ P(b \mid a) = \frac{P(a \mid b)\,P(b)}{P(a \mid b)\,P(b) + P(a \mid \bar{b})\,P(\bar{b})} \tag{4} \]

Conversely, this equation could be considered in terms of $\bar{b}$, i.e.:

\[ P(\bar{b} \mid a) = \frac{P(a \mid \bar{b})\,P(\bar{b})}{P(a \mid b)\,P(b) + P(a \mid \bar{b})\,P(\bar{b})} \tag{5} \]

Dividing equation (4) by (5) yields:

\[ \frac{P(b \mid a)}{P(\bar{b} \mid a)} = \frac{P(a \mid b)\,P(b)}{P(a \mid \bar{b})\,P(\bar{b})} = \frac{P(a \mid b)}{P(a \mid \bar{b})} \times \frac{P(b)}{P(\bar{b})} \tag{6} \]

Hence, equation (6) is Bayes' theorem in terms of the odds of event $b$, in which the prior odds of $b$, i.e. $P(b)/P(\bar{b})$, are modified in the light of the data, i.e. the LIKELIHOOD RATIO, to yield the posterior odds of $b$, conditional on knowing $a$. In the case when $b$ is not a simple binary event, i.e. $b$ or $\bar{b}$, but $b_1, \ldots, b_n$ are in fact $n$ mutually exclusive (and exhaustive) events, equation (4) can be generalised so that the probability of $b_i \mid a$ is given by:

\[ P(b_i \mid a) = \frac{P(a \mid b_i)\,P(b_i)}{\sum_{j=1}^{n} P(a \mid b_j)\,P(b_j)} \tag{7} \]

Consider the case of wishing to determine whether a patient has a particular disease, $D$. Suppose that the background prevalence of the disease in the population is 30 %, but a test is available. The characteristics of the test are such that a patient who has disease $D$ will test positive with probability 0.8, i.e. the SENSITIVITY, while the probability of a positive test result for nondiseased patients is 0.2, i.e. one minus the SPECIFICITY. Using Bayes' theorem we can calculate the probability that a patient who has tested positive does have disease $D$ as:

\[ P(D \mid T+) = \frac{P(T+ \mid D)\,P(D)}{P(T+ \mid D)\,P(D) + P(T+ \mid \bar{D})\,P(\bar{D})} = \frac{0.8 \times 0.3}{0.8 \times 0.3 + 0.2 \times 0.7} = 0.63 \tag{8} \]

In terms of odds, the prior odds of having the disease are 0.3/0.7 = 0.43, i.e. just under 1 in 2, while the likelihood ratio is 4 for a positive test result, i.e. 0.8/0.2, and the posterior odds of having the disease, having tested positive, is therefore 1.71. KRA

Bayes, T. 1763: An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society 53, 418.
Spiegelhalter, D. J., Abrams, K. R. and Myles, J. P. 2004: Bayesian approaches to clinical trials and health-care evaluation. Chichester: John Wiley & Sons, Ltd.
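The diagnostic testing example in this entry can be reproduced in a few lines. The sketch below is illustrative only and its function names are hypothetical; it computes the posterior probability from the two-event form of Bayes' theorem and the posterior odds from the odds form, and confirms that the two agree.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """P(D | T+) from Bayes' theorem for the two events D and not-D."""
    numerator = sensitivity * prior
    return numerator / (numerator + false_positive_rate * (1.0 - prior))


def posterior_odds(prior, sensitivity, false_positive_rate):
    """Posterior odds of D given T+ as prior odds x likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = sensitivity / false_positive_rate
    return prior_odds * likelihood_ratio


# Values from the worked example: prevalence 0.3, sensitivity 0.8 and
# probability of a positive result in the nondiseased 0.2.
prob = posterior_probability(0.3, 0.8, 0.2)
odds = posterior_odds(0.3, 0.8, 0.2)

# Converting the odds back to a probability gives the same answer.
prob_from_odds = odds / (1.0 + odds)

print(round(prob, 2), round(odds, 2))
```

The odds form makes clear that the test result multiplies the prior odds by the likelihood ratio of 4, whatever the prevalence.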
Bayesian methods Bayes' theorem is widely used, without controversy, for manipulating conditional probabilities of specific events of interest (see BAYES' THEOREM). However, Bayes' theorem may also be applied to more general quantities, e.g. relative risks, and in such settings the inclusion of external information in the form of the unconditional probability distribution (see CONDITIONAL PROBABILITY) for the quantity of interest, the prior distribution, rather than the prevalence as in diagnostic testing, is controversial and has attracted considerable debate (Spiegelhalter, Abrams and Myles, 2004). In short, a Bayesian approach (generally) has been described as 'the explicit quantitative use of external evidence in the design, monitoring, analysis, interpretation and reporting of a health-care evaluation' (Spiegelhalter, Abrams and Myles, 2004). As such, it has been argued that a Bayesian approach is often more flexible than traditional methods as it can adapt to each unique situation; is more efficient in that it uses all available evidence thought to be relevant; is more useful in providing predictions and inputs for making decisions about individual patients and summarising evidence regarding a problem, e.g. making direct probability statements that are clinically relevant; and more ethical in both clarifying the basis for randomisation and fully exploiting the experience provided by past patients. There are three elements of a Bayesian approach to medical statistics: subjective probability, assessment of evidence and decision theory. While the second is the one that is most often thought of as a Bayesian approach per se, i.e. the use of external evidence, the first underpins many of the purported advantages of a Bayesian approach, while the third illustrates the wider perspective that a Bayesian approach can give. A frequentist view of probability relies on a long-run view of the world,
with probability being defined as the long-run frequency of events occurring (see PROBABILITY). While such a view is entirely consistent with replicable events, when considering unique events, such as the probability that a patient has a particular disease, such a reliance on repeatability makes little sense. A Bayesian approach views probability as a degree of belief in an event occurring, which does not rely only on repeatability but also encompasses a subjective nature of probability, as we all bring our own experiences and background information in making probability assessments (Lindley, 1985). The use of a decision theoretic approach to statistical inference places the decision regarding a parameter within the context of the potential loss/gain in utility associated with making decisions (Lindley, 1985). Fundamental in both frequentist and Bayesian approaches to statistical inference is the likelihood function (see LIKELIHOOD). From a frequentist perspective the likelihood function
summarises how plausible different values of a parameter are by using an inverse argument, i.e. for a given value of the unknown parameter how plausible are the data that have been observed. A Bayesian approach uses the likelihood function, $P(y \mid \theta)$, in the same manner, i.e. as a summary of the relationship between data observed ($y$) and unknown parameter ($\theta$), but using Bayes' theorem reverses the conditioning to obtain the probability distribution for the unknown parameter conditional on both the data and any background information summarised in the prior distribution, $P(\theta)$. Thus:
\[ P(\theta \mid y) = \frac{P(\theta)\,P(y \mid \theta)}{P(y)} = \frac{P(\theta)\,P(y \mid \theta)}{\int P(\theta)\,P(y \mid \theta)\,d\theta} \tag{1} \]
Although this equation is applicable whether the model contains a single unknown parameter or multiple unknown parameters, in the latter case the specification of $P(\theta)$ (see PRIOR DISTRIBUTIONS) and the computation of $P(y)$ (see COMPUTATIONAL METHODS) can be more difficult. An added complexity is that we are often only interested in certain key parameters, e.g. a treatment effect, and wish to consider the other parameters as nuisance parameters. Thus, in addition to obtaining the joint posterior distribution, we often obtain the marginal posterior distribution for one or more parameters. Say $\theta = (\delta, \psi)$: then the marginal posterior distribution for $\delta$ is given by:

\[ P(\delta \mid y) = \int P(\theta \mid y)\,d\psi \tag{2} \]
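Equations (1) and (2) can be illustrated with a brute-force grid approximation for a two-parameter model. The sketch below is not a recommended computational method and is not taken from the entry: the data values, the normal likelihood, the flat prior over the grid and the grid limits are all illustrative assumptions. Here $\delta$ is the mean of interest and $\psi$ the nuisance standard deviation.

```python
import numpy as np

# Hypothetical observed data y (illustrative values only).
y = np.array([-4.1, -6.3, -2.8, -5.0, -7.2, -3.9, -5.6, -4.4])
n = y.size

# Grids for the parameter of interest (delta) and the nuisance
# parameter (psi, a standard deviation).
delta = np.linspace(-12.0, 2.0, 281)
psi = np.linspace(0.5, 8.0, 151)
D, S = np.meshgrid(delta, psi, indexing="ij")

# Log-likelihood log P(y | delta, psi) for independent normal data.
sum_sq = ((y[None, None, :] - D[..., None]) ** 2).sum(axis=-1)
log_lik = -n * np.log(S) - sum_sq / (2.0 * S**2)

# Assumed prior: flat over the grid, so log P(theta) is constant.
log_prior = np.zeros_like(log_lik)

# Equation (1): posterior = prior x likelihood / P(y), with P(y)
# approximated by a Riemann sum over the grid.
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
d_delta = delta[1] - delta[0]
d_psi = psi[1] - psi[0]
post /= post.sum() * d_delta * d_psi

# Equation (2): integrate the nuisance parameter out of the joint posterior.
marginal_delta = post.sum(axis=1) * d_psi

# Posterior mean of delta from its marginal distribution.
post_mean_delta = (delta * marginal_delta).sum() * d_delta

print(post_mean_delta, y.mean())
```

A grid of this kind becomes impractical as the number of parameters grows, which is one reason the integrals in equations (1) and (2) are usually handled by the simulation methods referred to under COMPUTATIONAL METHODS.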
As with the computation of $P(y)$ in equation (1), the integration out of the remaining model parameters in equation (2) is very rarely analytically tractable. The prior distribution does not necessarily have to be temporally prior to the study in question, but rather is a summary of the pertinent external information, i.e. either based on other studies, subjective beliefs or a combination of the two. When there are multiple sources of external evidence in the form of other study results, then the prior distribution may be based on a synthesis of such evidence using meta-analysis or generalised evidence synthesis techniques, which may downweight some sources of external evidence, e.g. observational studies, or may adjust the results for potential confounders or in order to make the synthesis of more relevance to the study in question (Spiegelhalter, Abrams and Myles, 2004). In terms of using subjective prior beliefs, a number of methods have been advocated for the elicitation of such beliefs, ranging from informal discussion, through the use of structured questionnaires possibly using a 'trial roulette' format, to the use of interactive computer elicitation techniques (Chaloner et al., 1993; Spiegelhalter, Freedman and Parmar, 1994). When the beliefs of multiple individuals are elicited then consideration has to be given as to whether these should
be pooled in a formal manner or used independently (Genest and Zidek, 1986). A particular type of prior distribution that is often used is what is termed a 'noninformative' or 'vague' prior distribution. Such a distribution is deemed to be 'vague' relative to the likelihood so that the data from the study in question dominate the analysis. Such an analysis appeals to analysts wishing to maintain a sense of objectivity but nevertheless take advantage of other aspects of adopting a Bayesian approach, e.g. the ability to make direct probability statements. However, when considering prior distributions for parameters in complex models other than main effects, e.g. variance components, careful consideration has to be given to what 'vague' really means, and this should be assessed as part of a sensitivity analysis (Spiegelhalter, Abrams and Myles, 2004). A related issue is that of whether a 'vague' prior distribution is invariant to transformations, i.e. what is vague on one scale may in fact be informative on another, and in such circumstances Jeffreys' priors may be considered, which, although not necessarily 'vague', are invariant to transformations (Bernardo and Smith, 1994). In complex multiparameter models the specification of a joint prior distribution can be a difficult task in itself, since assuming independence between all parameters, and thus being able to specify a series of univariate prior distributions, is usually unreasonable. A consequence is that we often have to specify conditional prior distributions. In summary, there is no such thing as a 'correct' or single prior distribution, and consideration of a range (or 'community') of prior distributions is advocated (Spiegelhalter, Freedman and Parmar, 1994; Spiegelhalter, Abrams and Myles, 2004). Such a 'community' could contain a 'vague' prior distribution, a 'sceptical' prior distribution, i.e. one that places only a small probability on an intervention being beneficial,
an 'enthusiastic' prior distribution and a prior essentially based at the null (Spiegelhalter, Freedman and Parmar, 1994; Spiegelhalter, Abrams and Myles, 2004). Having obtained the posterior distribution using Bayes' theorem (1) all subsequent inference is based on it. Standard measures of location and uncertainty may be obtained, e.g. posterior mean and variance, and the posterior density itself may be plotted, which is especially important when it exhibits unusual behaviour, e.g. multimodality. CREDIBLE INTERVALS (CrIs) can also be calculated, which are analogous to CONFIDENCE INTERVALS but which have the interpretation often incorrectly ascribed to CIs, namely that they are intervals in which the unknown parameter lies with a specific posterior probability. CrIs can be obtained in a number of ways, either as equal-tail area intervals or as highest posterior density intervals (HPDIs), which have the property that no point outside the interval has a higher point probability than a point inside the interval and are particularly informative when the posterior distribution is either skew or multimodal
(Spiegelhalter, Abrams and Myles, 2004). In addition to obtaining CrIs, a particularly appealing advantage of a Bayesian approach is that direct probability statements can be made that are of direct clinical relevance, e.g. the posterior probability that a relative risk is above a certain value or is within a certain specified range (Spiegelhalter, Abrams and Myles, 2004). Another advantage of adopting a Bayesian approach is the ability to make predictive statements regarding future data by obtaining the posterior predictive distribution. The posterior predictive distribution for future data is obtained by integrating the likelihood function for the future data over the posterior distribution, i.e. current state of knowledge regarding the parameter, so that the predictive distribution for future data, $x$, having observed data $y$ is given by:
\[ P(x \mid y) = \int P(x \mid \theta)\,P(\theta \mid y)\,d\theta \tag{3} \]
This equation can be used specifically in the monitoring of studies, since having obtained the posterior predictive distribution, direct probability statements can be made regarding the eventual 'observed' study result and thus decisions made as to whether to continue or not (see CLINICAL TRIALS).
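For a binary outcome with a Beta prior, the integral in equation (3) has a closed form (the beta-binomial distribution), which makes the monitoring use of the predictive distribution easy to sketch. The example below is illustrative only: the uniform Beta(1, 1) prior, the interim data (12 responders among 20 patients), the 20 further patients and the threshold of 25 responders in total are all assumptions, and the function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def predictive_pmf(x, m, a, b):
    """P(x | y): probability of x responders among m future patients when
    the current posterior for the response rate is Beta(a, b)."""
    return comb(m, x) * exp(log_beta(a + x, b + m - x) - log_beta(a, b))


# Assumed prior Beta(1, 1) updated with assumed interim data.
prior_a, prior_b = 1.0, 1.0
responders, n_so_far = 12, 20
post_a = prior_a + responders
post_b = prior_b + (n_so_far - responders)

# Posterior predictive distribution for the next 20 patients.
m_future = 20
pmf = [predictive_pmf(x, m_future, post_a, post_b) for x in range(m_future + 1)]

# Predictive probability that the completed study of 40 patients will
# show at least 25 responders, i.e. at least 13 among the next 20.
prob_at_least_25 = sum(pmf[13:])

print(prob_at_least_25)
```

A monitoring committee could compare such a predictive probability with a prespecified level when deciding whether continuing the study is worthwhile.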
An alternative form of the predictive distribution is to use the prior distribution rather than the posterior distribution, so that the resulting predictive distribution is in fact that for the data observed. Comparison of this with the observed data has been advocated as a means by which prior-data conflict can be assessed, although this raises fundamental questions when subjective beliefs are used (Spiegelhalter, Abrams and Myles, 2004). In many biomedical settings data accumulate sequentially over time and an important advantage in the use of a Bayesian approach is the ability of Bayes' theorem to naturally accommodate such scenarios (Bernardo and Smith, 1994). Essentially, the posterior distribution at one time point becomes the prior distribution for the subsequent time point, assuming that the data can be considered to be conditionally independent. Thus, if data $y_1$ are observed first, followed by data $y_2$, then:
$$P(\theta \mid y_1, y_2) \propto P(y_1 \mid \theta)\, P(y_2 \mid \theta)\, P(\theta) \propto P(y_2 \mid \theta)\, P(\theta \mid y_1) \qquad (4)$$

Of fundamental importance to the practical application of Bayesian methods in a medical setting are the assumptions made regarding model parameters. In many situations specific model parameters may represent subgroups of individuals within a single study (see SUBGROUP ANALYSIS), studies within a meta-analysis or units within an institutional comparison setting. Such multiplicity of parameters requires assumptions to be made regardless of whether a frequentist or Bayesian approach is adopted. From a
Bayesian perspective three possibilities exist: the parameters can be thought to be identical and therefore all the data pooled and the common parameter estimated; the parameters can be thought to be independent and therefore each subgroup/study/unit analysed separately (specifying an independent prior distribution for each); or the parameters can be thought to be 'similar' in the sense that we do not think them to be systematically different, in which case they are termed 'exchangeable'. If the assumption of exchangeability a priori is thought to be a reasonable one then the parameters are assumed to be drawn from a common distribution (with unknown hyperparameters); this specifies a hierarchical or multilevel model (see MULTILEVEL MODELS). Consequently, in estimating a specific parameter, i.e. the underlying effect in a subgroup/study/unit, we 'borrow strength' from the other parameters via the common distribution. In practical terms, this means that a Bayesian approach to problems of multiplicity ensures that individual parameters are shrunk towards some overall common effect and that the 'borrowing of strength' ensures that there is less uncertainty surrounding the underlying effect within an individual subgroup/study/unit than had been originally observed in the data (Spiegelhalter, Abrams and Myles, 2004). Specification of prior distributions for the unknown hyperparameters in the model then encompasses the degree to which we believe individual subgroups/studies/units may be different from one another.

As with statistical modelling generally, model criticism can take the form of answering these questions: If a different statistical model were used would different conclusions be reached? How well does the model perform, i.e. how well does it model the data? In terms of different statistical models,
obviously different models could be used and results compared or some form of model selection process may be used (see later). Regarding 'model fit', one approach is to consider predictions of the observed data based on the model and to compare these with the actual observed data using a cross-validation approach to produce the conditional predictive ordinate (CPO) (Gilks, Richardson and Spiegelhalter, 1996). Alternatively, an overall assessment of model performance can be calculated using the deviance information criterion (DIC) (Spiegelhalter et al., 2002). In addition, the use of specific prior distributions raises the question of whether different conclusions would be drawn, legitimately, by individuals holding different prior beliefs. However, sometimes equally important is the precise specification of PRIOR DISTRIBUTIONS even though they may be intended to be 'vague'. Consequently, the use of a Bayesian approach dictates the need for careful and conscientious sensitivity analyses and this may appear daunting to the uninitiated analyst. Model selection, whether relating to the specific parametric form or covariates included in a model, can be
achieved either by qualitatively comparing aspects of model fit, e.g. the CPOs and DIC discussed earlier, or quantitatively via the use of Bayes' factors (Bernardo and Smith, 1994). Bayes' factors provide a means of assessing the relative plausibility of two competing models, in an analogous manner to a LIKELIHOOD RATIO, but having integrated over the prior distributions for model hyperparameters. Consequently, the specification of improper prior distributions, which often arise when attempting to represent 'vague' beliefs, causes computational difficulties (Bernardo and Smith, 1994). While Bayes' factors themselves can be used to compare competing models directly, and the models do not have to be nested, they can also be used in conjunction with prior model probabilities to obtain the posterior model probabilities, i.e. the plausibility of the competing models based on both data and subjective prior beliefs. These can, in turn, be used to average across models, so that the estimation of a treatment effect, for example, takes into account both the within- and between-model uncertainty present (Kass and Raftery, 1995).

As has already been mentioned, the application of Bayesian methods to realistic biomedical problems can be computationally intensive, with only highly stylised examples being analytically tractable. In order to evaluate integrals such as those in equations (1) and (2) three broad techniques have been considered: asymptotic approximations, quadrature (numerical integration) techniques and simulation methods (Bernardo and Smith, 1994). The development of MARKOV CHAIN MONTE CARLO (MCMC) simulation methods together with user-friendly software such as WinBUGS (see BUGS AND WINBUGS) has enabled the use of a Bayesian approach to be a realistic choice for many analysts regardless of philosophical credence.

The table summarises the differences between a frequentist and a Bayesian approach to many of the issues that arise in the design, monitoring,
analysis and interpretation of RCTs and which are now discussed briefly. Although in practice Bayesian methods have been applied more frequently in the analysis of RCTs, the use of Bayesian methods specifically in the design of early phase trials, in which decisions as to the appropriate dose level or whether to initiate a confirmatory trial have to be taken as data accumulate, has received attention (Gatsonis and Greenhouse, 1992; Stallard, 1998). The role that elicitation of prior beliefs and demands from various stakeholders (clinicians, patients and policymakers) has to play in confirming (or refuting) the need for a randomised trial on the basis of equipoise has also been advocated, whether or not these are used in a formal assessment of whether a proposed RCT is likely to lead to a definitive answer given the resources available and uncertainty in, for example, the event rate in the control group (Spiegelhalter, Abrams and Myles, 2004).
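The Bayes' factors and posterior model probabilities described earlier in this entry can be sketched for a deliberately simple case. The example below assumes binomial data and compares two non-nested models chosen purely for illustration: a response rate fixed at 0.5 and a response rate with a uniform Beta(1, 1) prior. The data, the equal prior model probabilities and all function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def marginal_point_model(y, n, theta0):
    """Marginal likelihood when the response rate is fixed at theta0."""
    return comb(n, y) * theta0**y * (1.0 - theta0) ** (n - y)


def marginal_beta_model(y, n, a, b):
    """Marginal likelihood after integrating the binomial likelihood
    over a Beta(a, b) prior for the response rate."""
    return comb(n, y) * exp(log_beta(a + y, b + n - y) - log_beta(a, b))


def posterior_model_probs(marginals, prior_probs):
    """Combine marginal likelihoods with prior model probabilities."""
    weighted = [m * p for m, p in zip(marginals, prior_probs)]
    total = sum(weighted)
    return [w / total for w in weighted]


y, n = 15, 20  # hypothetical data: 15 responders among 20 patients
m0 = marginal_point_model(y, n, 0.5)      # model 0: rate fixed at 0.5
m1 = marginal_beta_model(y, n, 1.0, 1.0)  # model 1: uniform prior on the rate

bayes_factor_10 = m1 / m0
post_m0, post_m1 = posterior_model_probs([m0, m1], [0.5, 0.5])

# Model-averaged posterior mean of the response rate: each model's
# posterior mean weighted by its posterior model probability.
averaged_mean = post_m0 * 0.5 + post_m1 * (1 + y) / (2 + n)
print(f"Bayes' factor (model 1 vs model 0): {bayes_factor_10:.2f}")
print(f"Posterior probability of model 1:   {post_m1:.3f}")
print(f"Model-averaged response rate:       {averaged_mean:.3f}")
```

Because the marginal likelihood of model 1 integrates over its prior, replacing Beta(1, 1) with an improper prior would leave that integral undefined, which is the difficulty with 'vague' priors noted above.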
Bayesian methods  Comparison of frequentist and Bayesian approaches to design, monitoring and analysis/interpretation of RCTs (adapted from Spiegelhalter, Abrams and Myles, 2004)

Issue | Frequentist | Bayesian
External information | Informally used in design | Used formally to specify prior
Sample size | Required to detect minimum clinically significant difference at prespecified level of Type I and II errors | Assumed fixed, but assessment of probability of final CrI excluding clinically significant difference, allowing for uncertainty in inputs
Parameter of interest | Fixed state of nature | Unknown quantity
Randomisation | Justifies hypothesis testing | Not necessary due to subjective nature of probability
Basic question | How likely are data given value of parameter? | How likely is value of parameter given data?
Presentation of results | Likelihood functions, P-values and CIs | Plots of posterior, posterior probabilities of quantities of interest, CrIs, posterior used in decision model
Interim analyses | P-values and estimates adjusted for number of analyses | Inference not affected by number of analyses
Interim predictions | Conditional power | Use posterior predictive distribution
Subsets | Adjusted P-values, e.g. Bonferroni | Subset effects 'shrunk' using 'sceptical' prior
A crucial aspect of conducting large-scale PHASE III TRIALS is the issue of monitoring the trial as data accumulate in order to minimise exposure of patients to less effective (or even harmful) interventions. From a frequentist perspective such monitoring raises issues of multiplicity, for which methods of adjustment exist. The use of a Bayesian approach to accumulating evidence is entirely natural, in that at various stages during a trial the posterior distribution for the outcome is an assessment of the current state of knowledge, on which decisions regarding continuation/termination should be based without the need for
adjustment (Fayers, Ashby and Parmar, 1997). An additional advantage is the ability to predict, using the posterior predictive distribution at interim inspections, what the consequences of continuation would be in terms of the eventual posterior distribution, conditional on the data so far (Abrams, 1998). An alternative approach that has been suggested extends the consideration of the posterior distribution to incorporate the potential losses of making an incorrect decision (Spiegelhalter, Abrams and Myles, 2004). One key question, however, is what prior to use in such monitoring situations. As regards the situation in which a difference in favour of one intervention has been detected, a 'sceptical' prior (see PRIOR DISTRIBUTIONS) has been advocated, on the grounds that if the data so far are sufficient to convince a sceptic of the merits of a particular intervention then continuation would appear inappropriate (Fayers, Ashby and Parmar, 1997). Similarly, when no difference has been detected at an interim analysis, an enthusiastic prior distribution could be used to assess whether there is sufficient evidence for a proponent of an intervention to rule out a benefit.

Having conducted an RCT, how should the results be analysed and interpreted from a Bayesian perspective and what advantages does this confer? Ultimately, a Bayesian approach allows an exploration of how and why individuals interpreting the same RCT evidence may reach differing conclusions, namely that they held different a priori beliefs, although in the light of substantial evidence even 'sceptics' and 'enthusiasts' should converge to a consensus. The use of Bayesian methods also focuses attention on estimation and/or decision making and enables direct probability statements to be made that are of clinical relevance.
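The use of sceptical and enthusiastic priors during monitoring, and the way the posterior at one interim look becomes the prior for the next, can be sketched with a conjugate normal model. This is only an illustrative sketch: it assumes the treatment effect is a log hazard ratio whose estimate from each batch of patients is approximately normal, and the priors, batch estimates and function names are all hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def update(prior_mean, prior_sd, estimate, se):
    """Conjugate normal update: returns the posterior mean and standard deviation."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = 1.0 / se**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, sqrt(post_var)


def prob_benefit(mean, sd):
    """Posterior probability that the log hazard ratio is below 0
    (i.e. that the new treatment is better)."""
    return NormalDist(mean, sd).cdf(0.0)


# Hypothetical priors on the log hazard ratio.
priors = {
    "sceptical": (0.0, 0.2),          # centred on no difference
    "enthusiastic": (log(0.7), 0.2),  # centred on a 30% hazard reduction
}

# Hypothetical estimates (log hazard ratio, standard error) from three
# successive, non-overlapping batches of patients.
batches = [(-0.45, 0.30), (-0.35, 0.30), (-0.40, 0.30)]

history = {name: [] for name in priors}
for name, (mean, sd) in priors.items():
    for estimate, se in batches:
        # The posterior after one look becomes the prior for the next.
        mean, sd = update(mean, sd, estimate, se)
        history[name].append((mean, sd, prob_benefit(mean, sd)))

for name, looks in history.items():
    for look, (mean, sd, p) in enumerate(looks, start=1):
        print(f"{name:12s} look {look}: mean {mean:+.3f}, sd {sd:.3f}, P(benefit) {p:.3f}")
```

Under these assumptions the two posterior means move closer together at every look, which is the convergence of 'sceptics' and 'enthusiasts' described in the text, and no adjustment for the number of looks is involved.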
A Bayesian approach also enables the inclusion of pertinent external information, which in the case of RCTs that are relatively small, but which have produced large effects and appear to be 'too good to be true', provides a means by which such results can be ameliorated (Spiegelhalter, Abrams and Myles, 2004). An alternative approach in such circumstances is to ask the question: What prior beliefs would I have to hold in order not to accept the findings of an RCT? If the prior beliefs required to overturn such findings are so 'extreme' that it is unlikely for them to be held by a rational individual then the RCT results are accepted at 'face value'. Frequently in RCTs, interest focuses on subgroups of patients (see SUBGROUP ANALYSIS) and interpretation of the effects of an intervention within such subgroups raises issues of multiplicity. A Bayesian approach to subgroup analyses considers the simultaneous analysis of the subgroups within a hierarchical model, in which a 'sceptical' prior distribution is placed on the degree to which the estimates of effectiveness within individual subgroups differ from one another; a consequence of such an approach is that aberrant effects in relatively small subgroups of patients are 'shrunk' towards
a common overall effect, the degree of shrinkage depending on both the size of the subgroup and the degree of scepticism expressed. Such an approach thus reduces the possibility that spurious findings are accepted unwittingly. Bayesian approaches to the analysis of RCTs other than two-group parallel designs have also been advocated, including CROSSOVER TRIALS, FACTORIAL DESIGNS and CLUSTER RANDOMISED TRIALS (Spiegelhalter, 2001).

The growth of EVIDENCE-BASED MEDICINE and healthcare is based on the systematic searching for and synthesis of research evidence. Meta-analysis, the quantitative pooling of evidence from 'similar' studies, raises a number of methodological issues for which a Bayesian approach has been advocated. The most fundamental issue in meta-analysis is heterogeneity: statistical, clinical and methodological. Statistical heterogeneity refers to the study-to-study variability in terms of the estimates associated with each study. When excessive statistical heterogeneity exists attempts should be made to explain it in terms of study- and patient-level covariates, but this is not always possible and so random effects models, which allow for such heterogeneity, are often used (Spiegelhalter, Abrams and Myles, 2004). Estimation of the variance components within such models can be problematic, especially when the number of studies is small, and Bayesian methods have the advantage of not only allowing for the uncertainty in variance component estimates but also allowing for the possibility of informative prior distributions on variance components based on other external evidence. Clinical heterogeneity refers to the fact that different studies may have used different doses, may have had different patient populations, e.g. in terms of age, and may have considered different comparators. In particular,
studies that compare different interventions only provide indirect evidence for other comparisons and the use of multiparameter evidence synthesis methods within a Bayesian framework has been advocated in order that the appropriate correlation and uncertainty are taken into account. A specific issue for which Bayesian methods are advantageous is when baseline risk, i.e. the event rate in the control group, is considered as a possible treatment modifier. Clearly regression techniques have to allow for the correlation induced between the control group event rate and treatment effect, which is most easily accomplished by using effectively a multivariate meta-analysis model. Such multivariate models can also be used when multiple or surrogate outcomes are considered. Methodological heterogeneity often refers to study design and Bayesian methods for the synthesis of evidence from a variety of disparate sources have been developed, e.g. randomised and observational studies, epidemiological and toxicological studies, and qualitative and quantitative studies. These methods can allow for heterogeneity between different
sources of evidence and can be extended to allow for differing levels of bias associated with different study designs and quality.

Specific methodological issues in EPIDEMIOLOGY for which Bayesian methods have been advocated are MEASUREMENT ERROR, MISSING DATA and pharmacoepidemiology, when assessing evidence on potentially rare but serious adverse events. One specific area of epidemiology for which Bayesian and empirical Bayes' methods have been used for some considerable time is SPATIAL EPIDEMIOLOGY, in which interrelationships between geographical areas are considered. The comparison of institutions in terms of health outcomes, often referred to as profiling, raises a number of methodological issues, most notably multiplicity and issues concerned with interpreting the outcome in individual 'units' that appear aberrant, for which Bayesian methods have been applied.

In evaluating healthcare interventions interest often focuses not only on clinical effectiveness but also on cost-effectiveness (see COST-EFFECTIVENESS ANALYSIS), with both clinical outcomes and resource use/cost data collected as part of the study. Methodological issues arise when analysing both outcomes simultaneously, most notable of which is the correlation between the two, for which Bayesian methods have been advocated. Although collection of both clinical and cost data within an RCT is highly desirable, such studies are often of relatively short duration and extrapolation to the longer term and to include other outcomes is often required. Such extrapolation is most frequently achieved within a decision-modelling framework, which decomposes the intervention/disease pathway into a finite number of transitions or states between which patients can move (see DECISION THEORY and MARKOV CHAIN MONTE CARLO).
Decision models can assess either clinical or cost-effectiveness of competing interventions or policies, with different parts of the model being populated by either different sources of evidence or the same source, e.g. study, by using a common metric for different health states, usually a utility or quality of life outcome (see QUALITY OF LIFE MEASUREMENT). The key advantages that a Bayesian approach confers on such models are the ability to infer indirectly key model inputs on which there may be no direct evidence and to allow for appropriate sources of uncertainty and correlation in the model inputs. The development of economic decision models can also play an important role in identifying aspects of the model (and therefore intervention/disease process) about which there is considerable uncertainty and on which further research may need to be commissioned. While the areas of application above have concentrated on epidemiological and evaluation studies, Bayesian methods are beginning to be developed in other areas of biomedical research, most notably image analysis, time series and genetics, especially the analysis of gene expression data.
While the use of Bayesian methods in many areas of biomedical research conveys numerous advantages, their use requires careful and conscientious application, which places considerable emphasis on the role of sensitivity analyses with respect to the statistical model, prior distributions and computational methods (see MARKOV CHAIN MONTE CARLO and BUGS AND WINBUGS). In order to improve and harmonise the reporting of analyses using Bayesian methods a checklist, BayesWatch (Spiegelhalter, Abrams and Myles, 2004), has been developed. KRA

Abrams, K. R. 1998: Monitoring randomised controlled trials. Parkinson's disease trial illustrates the dangers of stopping early. British Medical Journal 316, 7139, 1183-4.
Bernardo, J. M. and Smith, A. F. M. 1994: Bayesian theory. Chichester: John Wiley & Sons, Ltd.
Chaloner, K., Church, T., Louis, T. A. and Matts, J. P. 1993: Graphical elicitation of a prior distribution for a clinical trial. Statistician 42, 341-53.
Fayers, P. M., Ashby, D. and Parmar, M. K. B. 1997: Tutorial in biostatistics: Bayesian data monitoring in clinical trials. Statistics in Medicine 16, 1413-30.
Gatsonis, C. and Greenhouse, J. B. 1992: Bayesian methods for Phase I clinical trials. Statistics in Medicine 11, 1377-89.
Genest, C. and Zidek, J. 1986: Combining probability distributions: a critique and an annotated bibliography (with discussion). Statistical Science 1, 114-48.
Gilks, W. R., Richardson, S. and Spiegelhalter, D. J. 1996: Markov chain Monte Carlo methods in practice. New York: Chapman & Hall.
Kass, R. and Raftery, A. 1995: Bayes' factors and model uncertainty. Journal of the American Statistical Association 90, 773-95.
Lindley, D. V. 1985: Making decisions, 2nd edition. Chichester: John Wiley & Sons, Ltd.
Parmar, M. K. B., Spiegelhalter, D. J. and Freedman, L. S. 1994: The CHART trials: Bayesian design and monitoring in practice. Statistics in Medicine 13, 1297-312.
Spiegelhalter, D. J. 2001: Bayesian methods for cluster randomized trials with continuous responses. Statistics in Medicine 20, 435-52.
Spiegelhalter, D. J., Abrams, K. R. and Myles, J. P. 2004: Bayesian approaches to clinical trials and health-care evaluation. Chichester: John Wiley & Sons, Ltd.
Spiegelhalter, D. J., Freedman, L. S. and Parmar, M. K. B. 1994: Bayesian approaches to randomised trials (with discussion). Journal of the Royal Statistical Society, Series A 157, 357-87.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and van der Linde, A. 2002: Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B 64, 583-640.
Stallard, N. 1998: Sample size determination for Phase II clinical trials based on Bayesian decision theory. Biometrics 54, 279-94.
Bayesian networks See GRAPHICAL MODELS

Bayesian persuasion probabilities These are posterior probabilities that a new treatment being tested in a Phase II clinical trial is better than or no better than a standard treatment. In a Phase II trial INTERIM ANALYSES are carried out to determine whether or not to stop the trial early because, on the basis of the data already accrued, the new treatment appears either unlikely to be better than the standard treatment or unlikely not to be better than it.

One method of determining whether or not to stop the trial is that of persuasion probabilities. The persuade-the-pessimist probability (PPP) is defined as the posterior probability that the new treatment is better than the standard treatment. The persuade-the-optimist probability (POP) is the posterior probability that the new treatment is no better than the standard. Prior to commencement of the trial, two pairs of prior distributions for the effectiveness of the standard and new treatments are chosen. One pair is that of an investigator who is optimistic that the new treatment is better than the standard. The other pair is that of someone who is pessimistic (or sceptical) about the effectiveness of the new treatment. Also pre-specified are thresholds PPP_CUT and POP_CUT. At each interim analysis PPP and POP are calculated using the data collected so far. If POP > POP_CUT, the trial is stopped, because even an optimist should be persuaded that the new treatment is no better than the standard. Similarly, if PPP > PPP_CUT, the trial is stopped because even a pessimist should be persuaded that the new treatment is better. For further details see Heitjan (1997). SRS
(See also BAYESIAN METHODS)

Heitjan, D. F. 1997: Bayesian interim analysis of Phase II cancer clinical trials. Statistics in Medicine 16, 1791-802.

benchmarking This is a procedure for adjusting a less reliable series of observations to make it consistent with more reliable measurements known as benchmarks. For example, data on hospital bed occupation collected monthly will not necessarily agree with figures collected annually and the monthly figures (which are likely to be less reliable because the annual figures will probably originate from a census, exhaustive administrative records or a larger sample) may be adjusted at some point to agree with the more reliable annual figures. Benchmarking is often used to adjust time-series data to annual benchmarks while preserving as far as possible the month-to-month movement of the original series (see, for example, Cholette and Dagum, 1994). SSE

Cholette, P. A. and Dagum, E. B. 1994: Benchmarking time series with autocorrelated survey errors. International Statistical Review 62, 365-77.

Berkson's fallacy Sometimes a spurious relationship can be concluded because the data from which the conclusion was derived came from a special source, which is not representative of the general population. Such bias is known as Berkson's fallacy and it can only be avoided by careful study design (Walter, 1980; Feinstein, Walter and Horwitz, 1986; Woodward, 2005).
A classic example of this bias is the study of autopsies by Pearl (1929). Fewer autopsies than expected found both tuberculosis and cancer to occur together: the frequency of cancer was thus lower among tuberculosis victims than others. This led Pearl to the erroneous conclusion that tuberculosis might be offering people some kind of protection against cancer, even leading to the suggestion that cancer patients might be treated with the protein of the tuberculosis bacterium. The problem with this line of thinking is that not every death is autopsied; in this case it turned out that people who died with both diseases were less likely to be autopsied, leading to an artificial lack of numbers with both diseases in Pearl's autopsy series.

Berkson's fallacy is a particular problem with case-control studies. For example, suppose that both the case and control series are derived from hospitals. If it happened that anyone with both the 'case' disease and some other disease were more likely to be hospitalised than someone with only one of the pair, we may well see a relationship between the prevalence of the two diseases in the case-control study, even when there is really no such relationship in the general population. Exactly the same situation may also give rise to spurious relationships between any risk factor for the 'second' disease and the disease that defines cases. For instance, consider a hospital-based case-control study of coffee drinking and angina among the elderly. Suppose that coffee drinking is a risk factor for Parkinson's disease. If someone has Parkinson's disease she or he is unlikely to be hospitalised unless she or he develops a potentially life-threatening condition, such as angina. Most individuals with angina will be treated in the community, the exception, perhaps, being when there is a disabling comorbidity. The result of these hypothetical conditions might be a disproportionately large number with Parkinson's disease (who tend to drink coffee) among the angina cases in hospital compared with the controls (people with other illnesses). The case-control study would thus find coffee drinking to be a risk factor for angina, even if this were not actually so.
(See also BIAS IN OBSERVATIONAL STUDIES)
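The mechanism behind Berkson's fallacy can be checked with a small exact calculation rather than a simulation. The sketch below assumes two diseases, A and B, that are independent in the general population, and selection probabilities (into hospital, or into an autopsy series) that depend on which diseases a person has. All prevalences, selection probabilities and the function name are hypothetical.

```python
def selected_odds_ratio(prev_a, prev_b, p_both, p_a_only, p_b_only, p_neither):
    """Odds ratio between diseases A and B among selected people, when A and B
    are independent in the population (population odds ratio of exactly 1)."""
    both = prev_a * prev_b * p_both
    a_only = prev_a * (1.0 - prev_b) * p_a_only
    b_only = (1.0 - prev_a) * prev_b * p_b_only
    neither = (1.0 - prev_a) * (1.0 - prev_b) * p_neither
    return (both * neither) / (a_only * b_only)


# Selection unrelated to disease status: no spurious association.
no_bias = selected_odds_ratio(0.10, 0.05, 0.2, 0.2, 0.2, 0.2)

# People with both diseases are more likely to be hospitalised:
# a spurious positive association appears.
hospital = selected_odds_ratio(0.10, 0.05, 0.5, 0.2, 0.2, 0.1)

# People with both diseases are less likely to be selected (as in the
# autopsy example): a spurious 'protective' association appears.
autopsy = selected_odds_ratio(0.10, 0.05, 0.1, 0.3, 0.3, 0.3)

print(f"equal selection: OR = {no_bias:.2f}")
print(f"hospital series: OR = {hospital:.2f}")
print(f"autopsy series:  OR = {autopsy:.2f}")
```

Under these assumptions the odds ratio in the selected series reduces to (p_both x p_neither) / (p_a_only x p_b_only), so it does not depend on the prevalences at all; the distortion comes entirely from differential selection.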
Feinstein, A. R., Walter, S. D. and Horwitz, R. I. 1986: An analysis of Berkson's bias in case-control studies. Journal of Chronic Diseases 39, 495-504.
Pearl, R. 1929: Cancer and tuberculosis. American Journal of Hygiene 9, 97-159.
Walter, S. D. 1980: Berkson's bias and its control in epidemiological studies. Journal of Chronic Diseases 33, 721-5.
Woodward, M. 2005: Epidemiology: study design and data analysis, 2nd edition. Boca Raton: Chapman & Hall/CRC Press.
beta distribution This is a flexible PROBABILITY DISTRIBUTION, commonly used to describe a proportion. Whereas many of the distributions we encounter are nonzero over an infinite range of values, the beta distribution is nonzero only in the range 0 to 1. By rescaling, it can be useful any time that a distribution is required over a finite range. The distribution is defined by two parameters, r and s, and has the density function:

$$f(x) = \frac{x^{r-1}(1-x)^{s-1}}{\beta(r, s)}$$

where the $\beta(r, s)$ term can be viewed as a constant to ensure that the total probability is equal to 1. The MEAN of the beta distribution is $r/(r+s)$ and the VARIANCE is $rs/[(r+s)^2(r+s+1)]$. The parameters r and s define the shape of the distribution. This shape can be wide ranging, with u-shaped curves, n-shaped curves, strictly increasing/decreasing curves and triangular distributions all possible. Some of the possible distributions are illustrated in the figure. If r and s are equal then the distribution will be symmetric.

Note the similarities to the BINOMIAL DISTRIBUTION. Whereas the binomial models the distribution of the number of successes, when given the probability of a success, the beta can model the probability of a success given the number of successes. Indeed, in a Bayesian analysis (see BAYESIAN METHODS), the beta distribution is the conjugate prior for the binomial distribution.

The beta distribution is related to a number of other distributions. It contains the uniform distribution over (0, 1) as a special case (when r = 1 and s = 1), it is increasingly well approximated by a NORMAL DISTRIBUTION as r and s increase and it can result from constructions of the form A/(A + B) where A and B are independent random variables with GAMMA DISTRIBUTIONS sharing a common scale parameter. For further details on how the beta distribution relates to other distributions, see Leemis (1986).

The beta distribution is most commonly used to model proportions. Suppose that we wish to estimate the specificity of a test that in trials correctly identifies 50 of the 52 participants that do not have a condition. The usual normal approximation will not suffice since it leads to an interval from 0.91 to 1.01, and a value greater than 1 makes no sense. There are a number of ways to use the beta distribution in estimating the interval (see Brown, Cai and DasGupta, 2001). AGL
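The specificity example can be reproduced with a short sketch. The first function gives the usual normal-approximation interval quoted in the text; the second shows one of several possible beta-based alternatives, namely an equal-tailed 95% interval from the Beta(r + 50, s + 2) posterior obtained with a uniform Beta(1, 1) prior. The choice of prior, the crude numerical quantile routine and all function names are assumptions made for illustration; this is not necessarily the method favoured by Brown, Cai and DasGupta.

```python
from math import exp, lgamma, log, sqrt


def beta_pdf(x, r, s):
    """Density of the beta distribution with parameters r and s at 0 < x < 1."""
    log_norm = lgamma(r) + lgamma(s) - lgamma(r + s)
    return exp((r - 1.0) * log(x) + (s - 1.0) * log(1.0 - x) - log_norm)


def beta_quantile(p, r, s, steps=20000):
    """Approximate quantile: accumulate the density on a fine grid (midpoint
    rule) and return the first grid point at which the CDF reaches p."""
    width = 1.0 / steps
    cumulative = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        cumulative += beta_pdf(x, r, s) * width
        if cumulative >= p:
            return x
    return 1.0


def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def beta_interval(successes, n, prior_r=1.0, prior_s=1.0, level=0.95):
    """Equal-tailed interval from the beta posterior (conjugate to the binomial)."""
    r = prior_r + successes
    s = prior_s + (n - successes)
    tail = (1.0 - level) / 2.0
    return beta_quantile(tail, r, s), beta_quantile(1.0 - tail, r, s)


wald_lo, wald_hi = wald_interval(50, 52)
beta_lo, beta_hi = beta_interval(50, 52)
print(f"normal approximation: {wald_lo:.2f} to {wald_hi:.2f}")
print(f"beta-based interval:  {beta_lo:.2f} to {beta_hi:.2f}")
```

Because the beta distribution is confined to the range 0 to 1, the second interval cannot exceed 1, unlike the normal approximation.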
Brown, L. D., Cai, T. T. and DasGupta, A. 2001: Interval estimation for a binomial proportion. Statistical Science 16, 101-33.
Leemis, L. M. 1986: Relationships among common univariate distributions. The American Statistician 40, 2, 143-6.
Figure: beta distribution. Illustrating the variety of forms that the beta distribution can take: (a) the uniform distribution over (0, 1), (b) a bimodal distribution (this is Jeffreys' prior), (c) a curve with a single mode, (d) a linear function of the proportion, (e) a nonlinear, but still strictly increasing distribution, (f) an example that is well approximated by the normal distribution. Each panel is plotted against the proportion of successes.

bias Any experiment, study or measuring process is said to be biased if it produces an outcome that differs from the 'truth' in a systematic way. Bias can occur at any stage of the research process from the literature review through to the publication of the results (Armitage and Colton, 2005). It is important to distinguish between bias or systematic error, on the one hand, and random error, on the other hand. For example, suppose that we had a population of subjects with a MEAN weight of 80 kg and a STANDARD DEVIATION of 10 kg. If we select a simple random sample of 25 subjects from this population and measure their weights using a well-calibrated set of scales, then it is possible that the mean weight for this sample will be substantially different from 80 kg. In fact, there is about a 1 in 20 chance that the sample mean will be more than 4 kg below or 4 kg above the true mean of 80 kg.

However, simple random sampling produces an unbiased estimate of the true mean weight because, if the process of selecting a simple random sample of 25 subjects and computing the sample mean weight were repeated a large number of times, the distribution of the sample means would be centred around the true mean of 80 kg. The larger the sample size, the closer the sample means will be clustered around
wau"
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ BIAS IN OBSERVATIONAL srUOIES
the true populalion mean. In alher words. the cxpected value of the sample mean equals the populalion mean. In this sccnario, there is no bias and any deviation of Ihc observed sample meaD from the lrUe value can be accounted for by pure chance. known as nmdom variation or random C:II'or. If. howevcr, the weighls of a random sample of subjects were mc:asunxl using a poorly calibrated scl of scales that weighed each subjeci as being 2 kg heavier Iban Ihe actual weighl. this would lead to a biased estimate of Ihc lIUc population mean weighL The six oflhis bias. or syslemalic error. would nol be by incn:asing the sample size and the distribution of the sample mean will be ClCnln:d around 82 q nlher than 80 kg. 11ae systematic error in this example is a mc:asun:menl bias due to a faulty measuring instrument. More genc:rally. measu~mc:al bias could be due to such diverse causes as poor queslionnaire design. faully equipmenl. obsener enor or n:spondenl enor (Silman and Macfarlane. 2(02). Examples of observer error include mi~ading the scale on an inslrUmc:at. bias in reporting results by an unblinded ewJuator in a clinicallrial or bias in eliciting information aboul the exposun: histol)' of c.asc:s and controls in a CA5E-cDNl'ROL SI'UDY. Eumples ofn:spondenl error include biased reporting of symploms by unblinded patienlS in a clinical llial. bias in rc:call of exposwe histol)' by cases and controls in a case-conlrol sludy (sec BIAS IN OBSERVATIONAL S1\JDIES). All types of study ~ susceplible to design bias. This can arise from many sou~es. such as saEcrlON BIAS (when the subjects selected for study ~ not relRscntalive of' lhe target population). NOmlBI'ONSE BIAS (when there is a systematic difference between the chanc:leriSlics of those who choose: to participate and those who do not). IlODComparability bias (when groups of subjects chosen for comparison in. for example. a c.ase-conllOl sludy are not in fact comparable). 
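The weighing example given earlier in this entry can be checked numerically. The following is a minimal sketch, assuming that the sample mean is approximately normally distributed; the function names are illustrative and are not taken from the source. It shows that the random error (the standard error) shrinks as the sample size grows, whereas the 2 kg offset of the faulty scales is unaffected by sample size.

```python
import math

TRUE_MEAN = 80.0     # population mean weight in kg (from the text)
SD = 10.0            # population standard deviation in kg (from the text)
SCALE_OFFSET = 2.0   # systematic error of the faulty scales in kg (from the text)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def standard_error(n: int) -> float:
    """Standard deviation of the mean of a simple random sample of size n."""
    return SD / math.sqrt(n)


def prob_outside(n: int, margin: float) -> float:
    """P(|sample mean - true mean| > margin) with well-calibrated scales.

    Assumes a normal sampling distribution for the mean (an approximation).
    """
    z = margin / standard_error(n)
    return 2.0 * (1.0 - normal_cdf(z))


def expected_sample_mean(offset: float) -> float:
    """Expected value of the sample mean when every reading is shifted by offset."""
    return TRUE_MEAN + offset


for n in (25, 100, 400):
    print(
        f"n={n}: standard error={standard_error(n):.2f} kg, "
        f"P(error > 4 kg)={prob_outside(n, 4.0):.4f}, "
        f"expected mean with faulty scales={expected_sample_mean(SCALE_OFFSET):.1f} kg"
    )
```

For n = 25 the standard error is 2 kg, so a deviation of more than 4 kg corresponds to about two standard errors, which is the source of the "1 in 20" figure. Increasing n reduces the chance of a large random deviation, but the expected value under the faulty scales stays at 82 kg.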
Randomised trials (see CLINICAL TRIALS) are generally regarded as being least susceptible to design biases. The scope for BIAS IN OBSERVATIONAL STUDIES, especially case-control studies, is much greater. Armitage and Colton (2005), Ellenberg (1994), Porta and Last (2008) and Sackett (1979) all provide a comprehensive description of sources of design bias.

Analysis bias arises from errors in the analysis of data. This covers such issues as confounding bias (in which confounding factors have not been appropriately adjusted for in the analysis) and analysis method bias (including inappropriate assumptions about the distribution of variables, faulty strategies for handling MISSING DATA or OUTLIERS, unplanned SUBGROUP ANALYSIS and data dredging) (Armitage and Colton, 2005; Davey Smith and Ebrahim, 2002).

Ensuring that the interpretation of data is unbiased is just as important as ensuring that the processes of design, measurement and analysis are unbiased. Bias in the interpretation of data can be conscious or unconscious and is particularly difficult to address because it involves subjective judgements on the part of the researchers. Kaptchuk (2003) provides an overview of the issues involved.

There is some evidence to suggest that the source of funding for drug studies is related to the outcome. A systematic review by Lexchin et al. (2003) demonstrated a systematic bias in favour of the products made by the company funding the research. The main sources of this bias were thought to be inappropriate selection of treatments to compare against the product being investigated and publication bias. Potential sources of investigator bias are reviewed in detail by Greenland (2009).

Finally, publication bias (see SYSTEMATIC REVIEWS AND META-ANALYSIS) can arise from two main sources. First, researchers are more likely to submit papers for publication if the research produces a statistically and clinically significant result rather than an inconclusive result. Second, journal editors are more likely to publish papers reporting statistically and clinically significant results (Dubben, 2009). WHG
[See also NONRESPONSE BIAS, SELECTION BIAS]

Armitage, P. and Colton, T. (eds) 2005: Encyclopedia of biostatistics, 2nd edition. New York: John Wiley & Sons, Inc.
Davey Smith, G. and Ebrahim, S. 2002: Data dredging, bias or confounding. British Medical Journal 325, 1437-8.
Dubben, H.-H. 2009: New methods to deal with publication bias. British Medical Journal 339, b3272.
Ellenberg, J. H. 1994: Selection bias in observational and experimental studies. Statistics in Medicine 13, 557-67.
Greenland, S. 2009: Accounting for uncertainty about investigator bias: disclosure is informative. Journal of Epidemiology and Community Health 63, 593-8.
Kaptchuk, T. J. 2003: Effect of interpretive bias on research evidence. British Medical Journal 326, 1453-5.
Lexchin, J., Bero, L. A., Djulbegovic, B. and Clark, O. 2003: Pharmaceutical industry sponsorship and research outcome and quality: systematic review. British Medical Journal 326, 1167-76.
Porta, M. and Last, J. M. 2008: A dictionary of epidemiology, 5th edition. Oxford: Oxford University Press.
Sackett, D. L. 1979: Bias in analytic research. Journal of Chronic Diseases 32, 51-63.
Silman, A. J. and Macfarlane, G. J. 2002: Epidemiological studies: a practical guide, 2nd edition. Cambridge: Cambridge University Press.
bias in observational studies In an ideal study, an investigator seeks to estimate the effect of an exposure to a factor on an outcome of interest. We might like to be able to look at what happens to a population when the factor is at one level and then turn back time and rerun things at the second level, but that is impossible, of course. Very often it is not even possible or practical to conduct an experiment in which the levels of exposure are controlled, so that one is left with analysing observational data that occur naturally. Bias is any systematic departure from this idealised construct, which is distinct from purely random error, which is zero on average. The latter can be dealt with by reducing variability in the measure of association, which can be accomplished in a variety of ways, including the increase of the overall sample size. However, bias cannot be reduced by increasing the sample size and it can only be controlled through carefully conducted research by an investigator.

There have been attempts to catalogue the types of bias that can occur and these broadly fall into three sources: the selection of study subjects, errors in the information collected and confounding or entangling the effects with other causes of the outcome (Hill and Kleinbaum, 1998).

In order to discuss the sources of bias in an observational study in more concrete terms, consider a hypothetical epidemiological study in which the results are summarised in a 2 × 2 table (shown in the table). We are interested in studying the association between exposure and disease in a manner that avoids bias. Among the choices of study design from which data for this 2 × 2 table may have arisen are a CROSS-SECTIONAL STUDY, a COHORT STUDY or a CASE-CONTROL STUDY. In a cross-sectional study, N subjects are sampled and the four cell frequencies determined, but in a cohort study, groups of exposed and unexposed subjects are chosen, essentially fixing the row totals, and then the column frequencies are determined by what transpires during the course of follow-up. For a case-control study, the column totals are regarded as fixed and subjects are distributed to each row within a column depending on their exposure history, which would usually be gleaned by interview. Fundamental to each of these study designs is the utilisation of a random sample, either overall or within the rows or columns.

bias in observational studies Tabulated results from an epidemiological study with two levels of exposure and disease status
                     Diseased
Exposed      Yes        No         Total
Yes          a          b          a + b
No           c          d          c + d
Total        a + c      b + d      N
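One common measure of association for a table of this form is the odds ratio, ad/(bc). As a rough sketch using purely hypothetical counts (none of the numbers below come from the source), the following computes the odds ratio from the four cells and shows how strongly it can change when a few subjects are lost from a single small cell, which is the mechanism behind the selection bias discussed in this entry.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio ad/(bc) for a 2 x 2 table laid out as in the table above.

    a: exposed and diseased      b: exposed, not diseased
    c: unexposed and diseased    d: unexposed, not diseased
    """
    return (a * d) / (b * c)


# Hypothetical complete cohort in which exposure has no effect on disease.
a, b = 20, 980   # exposed row
c, d = 20, 980   # unexposed row
full_or = odds_ratio(a, b, c, d)

# Suppose 10 exposed subjects who develop the disease drop out before
# their outcome can be recorded (hypothetical loss to follow-up).
lost = 10
observed_or = odds_ratio(a - lost, b, c, d)
fraction_lost = lost / (a + b + c + d)

print(f"odds ratio with complete follow-up: {full_or:.2f}")
print(f"odds ratio after losing {lost} subjects from one cell: {observed_or:.2f}")
print(f"fraction of the whole cohort lost: {fraction_lost:.3%}")
```

In this illustration the lost subjects are only 0.5% of the cohort but half of one cell, so the estimated odds ratio falls from 1 to 0.5 even though there is no true association.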
Selection bias occurs when the proportion recruited from the target population that is counted in a cell of the 2 × 2 table depends on both the row and the column. One way in which this can occur in a cohort study is if there are differential diagnoses depending on the exposure status. For example, suppose that an exposure of interest occurs in a manufacturing plant that provides health insurance for its employees, but among the unexposed are substantial numbers who are uninsured. If the insured receive regular checkups from their physicians, this may increase the likelihood of a correct diagnosis among those exposed, while similar cases may have been missed for the unexposed that are uninsured.
Clearly, this would bias an estimate of the odds ratio that would be calculated from such a study. Another potential source of such bias in a cohort study may arise from loss to follow-up, e.g. if instead of exposure the investigator is interested in whether a person is using a particular type of treatment. Suppose, however, that the treatment is not only ineffective but also causes unpleasant symptoms in patients that are related to the occurrence of the disease outcome. If the individuals so affected drop out of the study, this would artificially lower the count in this cell of the 2 × 2 table and bias the estimate. Notice that the magnitude of the effect of this selection bias may be substantial, even if the number lost represents a small proportion of the total. This is especially true when the proportion that develops the disease is small, so that the portion lost in a cell of the table is relatively high, even though the number lost represents a small proportion of the overall sample.

In a case-control study, a common source of bias when selecting cases can occur when subjects with a prevalent disease are enrolled into the study, some of whom may have had the disease for some time. Those who have been ill for a long period of time will be more likely to be enrolled if such a study design is used, a phenomenon known as LENGTH-BIASED SAMPLING. If the primary aim of the study is to study the association between exposure and the occurrence of the disease, this will clearly lead to a biased estimate of association, but this could have been avoided by only enrolling newly diagnosed cases instead.

The choice of appropriate controls in a case-control study can be an especially common source of bias. If the cases are selected from among those who are diagnosed at a collaborating set of hospitals, then the controls should ideally be a representative sample of those who are healthy in the catchment areas of those hospitals.
If all hospitals in an area are cooperating with a study, then this could be accomplished by recruiting a random sample of the overall population in the geographic area. Random digit dialling is one approach that has been useful in populations well covered by telephones, but it is becoming more difficult to employ the method with the increasing use of current technologies such as cell phones, caller ID and no-call lists. In some studies, controls are selected using subjects who have been admitted into the same hospital for a disease that is unrelated to the exposure of interest. This would result in a group of subjects from the same catchment area as the cases, thus avoiding one source of potential selection bias. The estimate of association in such a study would be the difference between the effect of exposure on the disease of interest and its effect on the 'control disease' (Breslow, 1978, 1982). If one has chosen a control disease that is not related to exposure, i.e. the effect is zero, then the estimate of association will be an unbiased estimate of the effect on disease risk. However, it is often difficult to be certain that this is the case because the assumption may just be the result of a lack of knowledge about the aetiology of disease affecting the controls.

A cross-sectional study can be a useful way of obtaining a snapshot of the association between two or more variables at a single point in time, especially if the population chosen for study is of broad interest and a carefully planned method for drawing a random sample has been put in place. Some national health surveys are good examples of such studies, such as those conducted by the National Center for Health Statistics. However, if the aim is to study disease aetiology or other outcomes that evolve over time, then the single snapshot in time can be a serious limitation. For example, in an epidemiological study, subjects with a disease who have been identified by a survey conducted at a single point in time would necessarily be prevalent cases, which is a potential source of bias here as it is in a case-control study.

Information bias in an observational study arises from error in the variables that have been collected as part of the data for each subject in a study. Such errors can either be differential or nondifferential, i.e. random. Differential error in reporting values summarised in a 2 × 2 table would arise if the error rate for reporting the variable in the column depended on the row or vice versa. This would obviously be a potentially important source of bias when estimating an association. However, bias can also arise when the error is nondifferential or purely random.

Case-control studies can be prone to information bias because someone with a serious illness may remember their history of exposure to the factor of interest quite differently from a healthy control. This RECALL BIAS can be especially significant when other studies of the exposure of interest have entered the public's consciousness or been reported in the news.
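As a rough numerical illustration of the two kinds of error just described, the sketch below applies assumed reporting accuracies to hypothetical true counts from a case-control study and recomputes the odds ratio from the reported exposure. All counts, sensitivities and specificities are invented for illustration, and the function names are not taken from the source.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected numbers reported as exposed and as unexposed.

    sensitivity: probability that a truly exposed subject reports exposure.
    specificity: probability that a truly unexposed subject reports no exposure.
    """
    reported_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    reported_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return reported_exposed, reported_unexposed


def odds_ratio(a, b, c, d):
    """Odds ratio ad/(bc): a, c are exposed and unexposed cases; b, d controls."""
    return (a * d) / (b * c)


def observed_odds_ratio(cases, controls, case_quality, control_quality):
    """Odds ratio computed from reported rather than true exposure."""
    a, c = misclassify(*cases, *case_quality)
    b, d = misclassify(*controls, *control_quality)
    return odds_ratio(a, b, c, d)


# Hypothetical true counts given as (exposed, unexposed).
cases, controls = (400, 600), (200, 800)
true_or = odds_ratio(cases[0], controls[0], cases[1], controls[1])

# Nondifferential error: the same (sensitivity, specificity) in both groups.
nondifferential_or = observed_odds_ratio(cases, controls, (0.8, 0.9), (0.8, 0.9))

# Differential error (recall bias): no true association, but cases recall
# their exposure more completely than controls do.
null_cases, null_controls = (200, 800), (200, 800)
recall_or = observed_odds_ratio(null_cases, null_controls, (0.9, 1.0), (0.6, 1.0))

print(f"true odds ratio: {true_or:.2f}")
print(f"with nondifferential error: {nondifferential_or:.2f}")
print(f"null association with differential recall: {recall_or:.2f}")
```

With these assumed values the nondifferential error pulls the odds ratio from about 2.67 towards 1, while the differential recall produces an odds ratio of about 1.61 where the true value is 1. The second case mimics recall bias, in which cases report past exposure more completely than controls.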
One technique for minimising the effect of recall bias is to use a well-structured interview in which the questions have been clearly and unambiguously phrased and posed in an identical manner to all subjects in the study. This requires considerable effort on the part of an investigator, in that the questionnaire would need to be pre-tested and the interviewers well trained. Information bias can potentially also affect a study by subconsciously influencing evaluations by interviewers, professional diagnosticians or even laboratory technicians. This could happen if the individual has a preconceived idea of what the results of a study will be or of the way the results are going. Thus, it is generally preferred that the study hypotheses not be known to those responsible for collecting the data or that the status of a subject be masked, a procedure in which the person recording the data is said to be blind with respect to the outcome. These measures should reduce the possibility for differential errors, but not nondifferential errors.

While it is intuitively easy to appreciate that differential error of measurement can bias the results of an observational study, nondifferential error can also have an effect. If only a single variable is affected by nondifferential error, then the result is generally to attenuate the effect, i.e. to bias the estimated association towards the null value of no association. This would tend to make the results of a study with nondifferential error in one of the variables conservative in the sense that it would make it more difficult to establish that an estimated association was not due to chance alone. Contrariwise, it would also result in an underestimate of an effect, which can be important when trying to determine the public health significance of exposure to a particular factor.

It is most desirable to minimise information bias during the design and data collection phase of a study by minimising measurement error, but it is generally not possible to be entirely successful in these efforts. One approach to correcting for bias at the data analysis phase is to introduce a correction factor that takes into account the measurement error. In the case of a 2 × 2 table, formulae have been provided for this (Barron, 1977; Copeland et al., 1977) and similar approaches are also available for use in LOGISTIC REGRESSION (Rosner, Spiegelman and Willett, 1990). There is now a rich variety of statistical techniques for dealing with errors in variables, many of which are described in the text by Carroll, Ruppert and Stefanski (1995).

Confounding arises when the estimated effect for an association of interest is entangled with another factor, perhaps one that is well known to be associated with the outcome. It is conceptually related to aliasing in design of experiments, in which two effects are completely entangled, and collinearity in other contexts. The potential for confounding in an observational study of two variables exists when each is associated with a third variable, the confounder, in the presence of the factor of interest. Precise definitions of confounding go to the heart of the objectives of observational studies and various models have been proposed as a theoretical basis for its effect (Rubin, 1974; Wickramaratne and Holford, 1987). Alternatively, collapsibility is sometimes used as a simple and practical alternative to more formal definitions of confounding (Bishop, Fienberg and Holland, 1973). An association is collapsible with respect to a putative confounder if the estimated association is unchanged when adjusting for the confounder in the analysis.

Approaches for dealing with a potential confounder are in essence to estimate the association holding the value of the confounder constant. In a designed experiment, this would be accomplished by selecting strata or blocks of subjects with identical values of the confounder and only varying the exposure of interest within the strata. One way of accomplishing a similar effect in an observational study is to stratify the data by the potential confounder and then combine information across the strata, if the effect is constant, using the MANTEL-HAENSZEL METHOD or something similar (Mantel and Haenszel, 1959). Alternatively, one can adjust for one or more putative confounders by including them in a model, such as the linear logistic model (Hosmer and Lemeshow,
1989) or an alternative GENERALISED LINEAR MODEL (McCullagh and Nelder, 1989) for a binary response. It is entirely possible that an observational study will not be able to separate out the effect of an exposure of interest from the effect of another exposure that is thought to be a confounder. This is not unlike the more general problem of collinearity that arises in the context of regression analysis. In these situations, it may only be possible to conduct a new study in which the design has been carefully constructed so that one can tease apart the separate contributions of a factor of interest from its confounder. TRH

Barron, B. A. 1977: The effects of misclassification on the estimation of relative risk. Biometrics 33, 414-18.
Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. 1973: Discrete multivariate analysis: theory and practice. Cambridge, MA: MIT Press.
Breslow, N. 1978: The proportional hazards model: applications in epidemiology. Communications in Statistics - Theory and Methods A7, 4, 315-32.
Breslow, N. 1982: Design and analysis of case-control studies. Annual Review of Public Health 3, 29-54.
Carroll, R. J., Ruppert, D. and Stefanski, L. A. 1995: Measurement error in nonlinear models. London: Chapman & Hall.
Copeland, K. T., Checkoway, H., McMichael, A. J. and Holbrook, R. H. 1977: Bias due to misclassification in the estimation of relative risk. American Journal of Epidemiology 105, 488-95.
Hill, H. A. and Kleinbaum, D. G. 1998: Bias in observational studies. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd.
Hosmer, D. W. and Lemeshow, S. 1989: Applied logistic regression. New York: John Wiley & Sons, Inc.
Mantel, N. and Haenszel, W. 1959: Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22, 719-48.
McCullagh, P. and Nelder, J. A. 1989: Generalized linear models. London: Chapman & Hall.
Rosner, B., Spiegelman, D. and Willett, W. C. 1990: Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology 132, 734-45.
Rubin, D. B. 1974: Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688-701.
Wickramaratne, P. J. and Holford, T. R. 1987: Confounding in epidemiologic studies: the adequacy of the control group as a measure of confounding. Biometrics 43, 751-65.

bimodal distribution This is a PROBABILITY DISTRIBUTION or a FREQUENCY DISTRIBUTION with two modes. Often the two modes in the distribution correspond to the data arising from two distinct populations. The first figure shows a bimodal density function arising from a weighted sum of two NORMAL DISTRIBUTIONS (a FINITE MIXTURE DISTRIBUTION). An example of a histogram with two distinct modes is shown in the second figure. The data here correspond to the sizes of myelinated lumbosacral ventral root fibres taken from a kitten of a particular age. The first mode is associated with axons of gamma neurons and the second with alpha neurons. Other examples of medical bimodal distributions are the age of incidence of Hodgkin's lymphoma and the speed of inactivation of the drug isoniazid in US adults. An account of the use of bimodal distributions in a medical setting is given in Hagberg et al. (2001). BSE

[Figure: bimodal distribution. Finite mixture distribution. Horizontal axis: x; vertical axis: density.]

[Figure: bimodal distribution. Histogram with two distinct modes. Horizontal axis: fibre size.]

Hagberg, G. E., Zito, G., Patria, F. and Sanes, J. N. 2001: Improved detection of event-related functional MRI signals using probability functions. NeuroImage 14, 1193-205.
binomial distribution This is the PROBABILITY DISTRIBUTION of the number of 'successes', X, in a series of n independent trials, where the probability of a success is p for each trial. Specifically the distribution is given by:

$$\Pr(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$

where n! (factorial n) is the product of all the integers up to and including n and 0! is defined to be 1. The mean of the distribution is np and its variance np(1 - p). Some binomial distributions with n = 10 and different values of p are shown in the figure. The distribution often occurs in medicine as the basis for testing the hypothesis that the probability of some event of interest takes a particular value. For example, a researcher may postulate that 10% of a population is infected with a virus and, on sampling 20 people at random from the population, finds that 6 people have the virus. Is there any evidence that the infection rate is higher than the hypothesised value of 10%? To answer this question a P-value can be computed from the binomial distribution as the probability that 6 or more people in the 20 sampled have the virus when the probability that a person is infected is 0.1, i.e. the sum:

$$\sum_{x=6}^{20} \frac{20!}{x!\,(20-x)!}\,(0.1)^{x}\,(0.9)^{20-x}$$

The resulting value is 0.01, giving strong evidence that the infection rate is larger than 10%. As well as testing for a specific proportion, the binomial distribution can be used in calculating CONFIDENCE INTERVALS for a proportion. Villanueva et al. (2003) use the binomial distribution to estimate confidence intervals for the proportion of adverts in medical journals with inaccurate claims. More details of the binomial distribution can be found in Altman (1991). BSE/AGL

[Figure: binomial distribution. Binomial distributions for various values of n and p. Panels show n = 10 with different values of p, including p = 0.1. Horizontal axis in each panel: number of successes.]
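The P-value in the virus example can be reproduced directly from the binomial formula. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not taken from the source.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Pr(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def upper_tail(x_min: int, n: int, p: float) -> float:
    """Pr(X >= x_min), obtained by summing the probabilities from x_min to n."""
    return sum(binomial_pmf(x, n, p) for x in range(x_min, n + 1))


n, p = 20, 0.1                      # sample size and hypothesised infection rate
p_value = upper_tail(6, n, p)       # probability of 6 or more infected people

# Numerical check of the stated mean np and variance np(1 - p).
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(f"P-value: {p_value:.4f}")
print(f"mean: {mean:.3f}, variance: {variance:.3f}")
```

The computed tail probability is about 0.011, which rounds to the value of 0.01 quoted in the entry, and the mean and variance agree with np = 2 and np(1 - p) = 1.8.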
Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.
Villanueva, P., Peiró, S., Librero, J. and Pereiró, I. 2003: Accuracy of pharmaceutical advertisements in medical journals. Lancet 361, 27-32.
bioinformatics This is a term given for the coming together of molecular biology, computer science, mathematics and statistics to deal with the ever-expanding genomic and proteomic databases, which are themselves the result of rapid technological advances in DNA sequencing, gene expression measurement and macromolecular structure determination. In many cases such techniques give rise to HIGH DIMENSIONAL DATA. A comprehensive account of bioinformatics is given in Zvelebil and Baum (2007). BSE

Zvelebil, M. and Baum, J. 2007: Understanding bioinformatics. Garland Science.
birth cohort studies These are studies established to examine growth, development and health of children from birth. Given sufficient follow-up, they also provide insights into influences on adult disease that operate throughout the life course. In principle, a birth cohort study is one in which all study participants are recruited at birth and then followed over time. The cohort is defined by the location and the time period in which the participants were born, which may be those born in one week or over a period of a year or more. The members of the cohort are then followed up at various time points to ascertain risk factors and health outcomes. As the cohort ages the focus of the research tends to shift. In the early years, the emphasis tends to be on childhood growth and development and risk of childhood illness, but as the cohort matures adult risk factors such as smoking and obesity and health measures such as blood pressure start to be of greater interest. Outcome variables in childhood such as height can later be considered as risk factors when assessing chronic disease later in life. Intergenerational and genetic factors are of interest and information on family members is included in many birth cohorts (Lawlor, Andersen and Batty, 2009).

The first cohort to be established using recruitment at birth was the National Survey of Health and Development, which studied babies born in Britain in the first week of March 1946 (Wadsworth et al., 2006). The cohort has been followed up on more than 20 occasions since birth, the latest being at age ~ years. Contacts with the participants have been by postal questionnaire, home and clinic visits, and through schools and links with health and educational professionals. Britain has two other birth cohort studies conducted on similar lines. They comprise those born between 3 and 9 March 1958 (Power and Elliott, 1992) and between 5 and 11 April 1970 (Elliott and Shepherd, 2006) respectively. Both studies have included a number of follow-ups that have given insights into the growth and development of these cohorts through childhood, adolescence and into adulthood.
Cross-cohort comparisons have also been possible and have allowed examination of secular trends, e.g. into Crohn's disease, ulcerative colitis and irritable bowel syndrome (Ehlin et al., 2003). The Millennium Cohort was recruited in a different way (Smith and Joshi, 2002), and a further cohort study is planned for births in 2012.

Birth cohort studies are not of course confined to Britain, though until recently comprehensive national coverage has rarely been attempted elsewhere. The Scandinavian countries have well-developed linkage systems and in Norway and Denmark studies of over 100 000 births have been launched (Olsen et al., 2001; Magnus et al., 2006), and the United States has embarked on a study of a similar scale (Branum et al., 2003, and see http://www.nationalchildrensstudy.gov).

Frequently, birth cohorts are located in one town or city. For example, the Pelotas Birth Cohort Study in Brazil recruited all births in the city of Pelotas during 1982. It represents a good example of a birth cohort study with long-term follow-up in a developing country (Victora and Barros, 2006).

Many birth cohorts have been defined retrospectively. Thus births in a defined geographical area during a specified time period are identified from established records. The data can then be linked to other standard records such as death indices, or the study population can be traced and those still alive can be assessed by post or by interview. An example of this is the birth records of the 1920s and 1930s from the English county of Hertfordshire that were extracted in the 1980s. The population was traced through the National Health Service Central Register and details of deaths and current general practitioner addresses obtained. This not only allowed an analysis of mortality in relation to birth and infant weight but also enabled follow-up of the survivors to examine them for risk factors for chronic conditions such as cardiovascular disease (Syddall et al., 2005).

Some retrospectively defined birth cohorts have focused on particular events that gave rise to extreme living conditions. For example, those born in Amsterdam in 1944-1945 around the time of the famine imposed by the German occupation have been followed up to assess the impact of famine at key stages of pregnancy and early life (Roseboom et al., 2001). Similarly, a cohort of men born in 1916-1935 was identified from one district in Leningrad, a third of whom had experienced starvation during the siege of Leningrad in 1941-1944 when they were around the age of puberty (Sparén et al., 2004). The whole cohort was followed up and invited to take part in health examinations to assess the long-term effects of the famine.

There is also interest in defining birth cohorts at an earlier time point than birth. A child's growth and development begins before birth and so characterisation of aspects of pregnancy is considered important in determining the long-term influences on the offspring's health.
The Avon Longitudinal Study of Parents and Children (ALSPAC) recruited 14 000 pregnant women resident in the English county of Avon whose expected dates of delivery were between 1 April 1991 and 31 December 1992. The women and their offspring have been followed up by means of postal questionnaires on many occasions and a subsample known as the Children in Focus was seen at clinics 10 times before the age of seven years. From that age onwards clinics began for the entire cohort (Golding et al., 2001).

Taking this one step further, with an increasing focus on the very early origins of life, two cohort studies have recruited women before pregnancy. The first of these recruited some 2500 women in six villages near Pune in India. Of these, over 1000 became pregnant and full data were obtained on nearly 800 births. This cohort has now been followed up into adolescence (Rao et al., 2001). In the UK, the Southampton Women's Survey recruited over 12 500 women aged 20 to 34 years when they were not pregnant and over 3000 of them were studied throughout subsequent pregnancies and the children are being followed up (Inskip et al., 2006). The recently launched US National Children's Study is mainly recruiting women in the first trimester of pregnancy but is also including some women before conception (http://www.nationalchildrensstudy.gov).

Birth cohort studies have many strengths. Usually they capture a cross-section of the population and they have all the advantages of longitudinal studies. However, the weakness is that over the life course a large percentage of prospectively defined birth cohorts tends to drop out. Many dropouts are due to death as the cohort ages or to migration out of the region or country of study. Persistent questioning and requests to attend clinics or be visited at home add to the attrition, as some participants feel that they have contributed enough and their motivation wanes. The remaining cohort may no longer represent the general population. Retrospectively defined cohorts can suffer less from this problem but then they often lack sufficient data on the early years. HI

(See also COHORT STUDIES)
Branum, A. M., Collman, G. W., Correa, A. et al. 2003: National Children's Study of environmental effects on child health and development. Environmental Health Perspectives 111, 642-6. Ehlin, A. G. C., Montgomery, S. M., Ekbom, A., Pounder, R. E. and Wakefield, A. J. 2003: Prevalence of gastrointestinal diseases in two British national birth cohorts. Gut 52, 1117-21. Elliott, J. and Shepherd, P. 2006: Cohort profile: 1970 British Birth Cohort (BCS70). International Journal of Epidemiology 35, 836-43. Golding, J., Pembrey, M., Jones, R. and the ALSPAC Study Team 2001: ALSPAC - The Avon Longitudinal Study of Parents and Children I. Study methodology. Paediatric and Perinatal Epidemiology 15, 74-87. Inskip, H. M., Godfrey, K. M., Robinson, S. M., Law, C. M., Barker, D. J. P. and Cooper, C. 2006: Cohort profile: The Southampton Women's Survey. International Journal of Epidemiology 35, 42-8. Lawlor, D. A., Andersen, A. M. N. and Batty, G. D. 2009: Birth cohort studies: past, present and future. International Journal of Epidemiology 38, 897-902. Magnus, P., Irgens, L. M., Haug, K., Nystad, W., Skjaerven, R. and Stoltenberg, C. 2006: Cohort profile: The Norwegian Mother and Child Cohort Study (MoBa). International Journal of Epidemiology 35, 1146-50. Olsen, J., Melbye, M., Olsen, S. F. et al. 2001: The Danish National Birth Cohort - its background, structure and aim. Scandinavian Journal of Public Health 29, 300-7. Power, C. and Elliott, J. 2006: Cohort profile: 1958 British Birth Cohort (National Child Development Study). International Journal of Epidemiology 35, 34-41. Rao, S., Yajnik, C. S., Kanade, A., Fall, C. H. D., Margetts, B. M., Jackson, A. A., Shier, R., Joshi, S., Rege, S., Lubree, H. and Desai, B. 2001: Intake of micronutrient-rich foods in rural Indian mothers is associated with the size of their babies at birth: Pune Maternal Nutrition Study. Journal of Nutrition 131, 1217-24. Roseboom, T. J., van der Meulen, J. H. P., Osmond, C., Barker, D. J. P., Ravelli, A. C. J. and Bleker, O. P. 2001: Adult survival after prenatal exposure to the Dutch famine 1944-45. Paediatric and Perinatal Epidemiology 15, 220-5. Smith, K. and Joshi, H. 2002: The Millennium Cohort Study. Population Trends 107, 30-4. Sparén, P., Vågerö, D., Shestov, D. B., Plavinskaja, S., Parfenova, N., Hoptiar, V., Paturot, D. and Galanti, M. R. 2004: Long term mortality after severe starvation during the siege of Leningrad: prospective cohort study. British Medical Journal 328, 11. Syddall, H. E., Aihie Sayer, A., Dennison, E. M., Martin, H. J., Barker, D. J. P., Cooper, C. and the Hertfordshire Cohort Study Group 2005: Cohort profile: The Hertfordshire Cohort Study. International Journal of Epidemiology 34, 1234-42. Victora, C. G. and Barros, F. C. 2006: Cohort profile: The 1982 Pelotas (Brazil) Birth Cohort Study. International Journal of Epidemiology 35, 237-42. Wadsworth, M., Kuh, D., Richards, M. and Hardy, R. 2006: Cohort profile: The 1946 National Birth Cohort (MRC National Survey of Health and Development). International Journal of Epidemiology 35, 49-54.

biserial correlation See CORRELATION

bivariate boxplot This is a two-dimensional analogue of the BOXPLOT for univariate data, which is based on calculating 'robust' measures of location, scale and correlation. It consists essentially of a pair of concentric ellipses, one of which (the 'hinge') includes 50% of the data and the other (called the 'fence') which delineates potential troublesome outliers. In addition, resistant regression lines of both y on x and x on y are shown, with their intersection showing the bivariate location estimator. The acute angle between the regression lines will be small for a large absolute value of correlations and large for small absolute values. Details of the construction of the bivariate boxplot are given in Goldberg and Iglewicz (1992). This type of boxplot may be useful in indicating the distributional properties of the data and in identifying possible outliers. An example for a SCATTERPLOT of the number of manufacturing enterprises employing more than 20 people against the pollution level as measured by sulfur dioxide concentration for a number of US cities is shown in the figure. Five cities are indicated as outliers, four that have more than a thousand manufacturing enterprises, but one, Providence, that has only a relatively small number of manufacturing enterprises. BSE

[Figure: scatterplot with the bivariate boxplot superimposed; horizontal axis: number of manufacturing enterprises employing 20 or more workers]
bivariate boxplot Scatterplot of sulfur dioxide concentration against number of manufacturing enterprises for cities in the USA, showing the bivariate boxplot of the data
(See also SCATTERPLOT MATRICES)

Goldberg, K. M. and Iglewicz, B. 1992: Bivariate extensions of the boxplot. Technometrics 34, 307-20.
bivariate distribution For each and every pair of feasible values, the probability that a pair of variables will take those values. This then is a natural extension of the idea of a univariate probability distribution, applicable when we are measuring two paired variables, e.g. a person's height and weight. If the two variables are independent then the bivariate distribution will not be of particular interest. When they are correlated, as with height and weight, however, it becomes important to consider the bivariate distribution. If we were to look at a sample we might see that 20% of the sample are over 6 feet tall and 20% of the sample are under 50 kg in weight, but only by looking at the bivariate distribution would we know that there were no (or few) people who were both over 6 feet tall and under 50 kg in weight.
Whereas univariate distributions can usually be depicted in a BAR CHART or HISTOGRAM, bivariate distributions, due to their additional dimension, cannot. If the two variables are categorical then a simple cross-tabulation will probably be most informative, while if the two variables are continuous a SCATTERPLOT will probably be appropriate.
For example, whereas in previous univariate work McLaren et al. (2000) have separately looked at the distributions of red blood cell volume and haemoglobin levels when identifying anaemia, in a more sophisticated approach, McLaren et al. (2001) employ a bivariate distribution of red cell volume and haemoglobin.
The most commonly encountered bivariate distribution is the BIVARIATE NORMAL DISTRIBUTION, a special case of the MULTIVARIATE NORMAL DISTRIBUTION. AGL
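The height and weight illustration can be made concrete with a small cross-tabulation. The following is only a sketch: the ten records are invented for the purpose (they are not data from any study), and the names `people`, `tall`, `light` and `both` are hypothetical. It shows that two marginal proportions of 20% say nothing about how often both conditions hold together.

```python
from collections import Counter

# Invented (height in feet, weight in kg) pairs, for illustration only.
people = [
    (6.2, 85), (6.1, 90), (5.4, 48), (5.5, 49), (5.8, 70),
    (5.9, 72), (5.6, 60), (5.7, 65), (5.3, 55), (5.8, 68),
]

# Joint (bivariate) counts of the two dichotomised variables.
table = Counter((h > 6.0, w < 50) for h, w in people)

n = len(people)
# Marginal proportions are obtained by summing the joint counts.
tall = sum(c for (is_tall, _), c in table.items() if is_tall) / n
light = sum(c for (_, is_light), c in table.items() if is_light) / n
# The joint proportion is read directly from the cross-tabulation.
both = table[(True, True)] / n

print(f"over 6 feet: {tall:.0%}, under 50 kg: {light:.0%}, both: {both:.0%}")
```

With these made-up records each marginal proportion is 20%, yet no one falls in both categories, which only the cross-tabulation reveals.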
McLaren, C. E., Kambour, E. L., McLachlan, G. J., Lukaski, H. C., Li, X., Brittenham, G. M. and McLaren, G. D. 2000: Patient-specific analysis of sequential haematological data by multiple linear regression and mixture distribution modelling. Statistics in Medicine 19, 83-98. McLaren, C. E., Cadez, I. V., Smyth, P. and McLachlan, G. J. 2001: Classification of disorders of anemia on the basis of mixture model parameters. Technical Report 01-56. Irvine: Information and Computer Science Department, University of California.
bivariate normal distribution This is a special case of the multivariate normal distribution with two variables and the most common example of a bivariate distribution. The bivariate normal distribution is worthy of mention because of all the multivariate normal distributions it is the most commonly used, the easiest to illustrate and the easiest to write out in mathematical notation.
Given two variables, X and Y, the probability density function of the bivariate normal distribution is defined by the means of X and Y (here denoted $\mu_X$ and $\mu_Y$ respectively), the STANDARD DEVIATIONS of X and Y (here denoted $\sigma_X$ and $\sigma_Y$ respectively) and the CORRELATION of X and Y (denoted $\rho$). Given these values, the probability density that X and Y take values x and y respectively, f(x, y), is:

\[
f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}
\exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right] \right\}
\]

This formula may not appear particularly pleasant, but is easier to handle than for higher variate normal distributions because of the single correlation involved. For further details of the distribution see Chatfield and Collins (1980) and Grimmett and Stirzaker (1992).
When graphed as a SCATTERPLOT, data from this distribution will appear as a cluster of points in an approximately elliptical shape with the density of the points being greatest at the centre of the ellipse. The location of the ellipse will be dependent on the two means, while the standard deviations and correlation determine the angle and spread of the ellipse. The ellipse gets 'narrower' as the magnitude of the correlation increases and approaches a straight line at $\rho = 1$ or $-1$.
Suryapranata et al. (2001) use the bivariate normal distribution in order to compare simultaneously the clinical effectiveness and cost-effectiveness of two treatments for patients with acute myocardial infarction. By drawing a graph of the difference in effect against the difference in cost they were able to illustrate a CONFIDENCE INTERVAL for the differences between the two treatments as an ellipse.
A convenient property of the bivariate normal distribution is the fact that the marginal distributions of the two variables are univariate normal: i.e. if Y is ignored, then X by itself has a NORMAL DISTRIBUTION (and vice versa). Also, the conditional distributions of X and Y are normal. To put this another way, if Y is observed to take a particular value, then the unknown value of X still has a normal distribution given this knowledge. AGL
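The density and the marginal and conditional properties described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the function names `bvn_pdf`, `bvn_draw` and `correlation` are hypothetical, and the parameter values (means 170 and 70, standard deviations 10 and 12, correlation 0.6) are invented. Pairs are drawn by taking X from its normal marginal and then Y from its normal conditional distribution given X, whose mean is $\mu_Y + \rho\sigma_Y(x-\mu_X)/\sigma_X$ and whose standard deviation is $\sigma_Y\sqrt{1-\rho^2}$.

```python
import math
import random


def bvn_pdf(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """Bivariate normal density f(x, y); requires -1 < rho < 1."""
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    const = 2.0 * math.pi * sd_x * sd_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * quad) / const


def bvn_draw(rng, mu_x, mu_y, sd_x, sd_y, rho):
    """Draw X from its normal marginal, then Y from its normal conditional given X."""
    x = rng.gauss(mu_x, sd_x)
    cond_mean = mu_y + rho * sd_y * (x - mu_x) / sd_x
    cond_sd = sd_y * math.sqrt(1.0 - rho * rho)
    return x, rng.gauss(cond_mean, cond_sd)


def correlation(xs, ys):
    """Sample product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # arbitrary seed, for reproducibility only
pairs = [bvn_draw(rng, 170.0, 70.0, 10.0, 12.0, 0.6) for _ in range(20000)]
xs, ys = zip(*pairs)
r = correlation(xs, ys)
peak = bvn_pdf(170.0, 70.0, 170.0, 70.0, 10.0, 12.0, 0.6)
print(f"sample correlation = {r:.2f}, density at the means = {peak:.5f}")
```

The sample correlation should come out close to the value of 0.6 that was fed in, and plotting `xs` against `ys` would give the elliptical cloud described in the entry.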
Chatfield, C. and Collins, A. J. 1980: Introduction to multivariate analysis. London: Chapman & Hall. Grimmett, G. R. and Stirzaker, D. R. 1992: Probability and random processes, 2nd edition. Oxford: Clarendon Press. Suryapranata, H., Ottervanger, J. P., Nibbering, E., van't Hof, A. W. J., Hoorntje, J. C. A., de Boer, M. J., Al, M. J. and Zijlstra, F. 2001: Long term outcome and cost-effectiveness of stenting versus balloon angioplasty for acute myocardial infarction. Heart 85, 667-71.
Bland-Altman plot See LIMITS OF AGREEMENT

blinding See CLINICAL TRIALS, CRITICAL APPRAISAL

blocked randomisation See RANDOMISATION
Bonferroni correction This correction is used when performing multiple significance tests in order to avoid an excess of false positives (Shaffer, 1995). Suppose, for example, STUDENT'S t-TEST is to be applied to sample data on six variables to assess mean differences in two populations of interest. If the NULL HYPOTHESIS of no difference in means holds for each of the six variables, and each of the six tests is performed at the 5% SIGNIFICANCE LEVEL, the probability of falsely rejecting the equality of at least one pair of means is 0.26 (this assumes the variables are independent), a fivefold increase over the nominal significance level. The Bonferroni correction approach to this problem involves using a significance level of α/n rather than α for each of the n tests to be performed. For a small number of multiple tests (up to about 5) this method provides a simple and acceptable answer to the problem of inflating the Type I error. The correction is, however, highly conservative and not recommended if large numbers of tests are to be applied, particularly since its use can lead to the rather unsatisfactory situation where many tests are significant at the α level but none at level α/n (Perneger, 1998). In addition, the Bonferroni correction ignores the degree to which the variables may be correlated, which again leads to conservatism when such correlations are substantial. BSE
[See also MULTIPLE COMPARISON PROCEDURES]
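The figure of 0.26 quoted above follows from 1 - (1 - 0.05)^6 when the six tests are independent and every null hypothesis is true. A minimal Python sketch of this arithmetic, with and without the correction, is given below; the function name `familywise_error` is hypothetical and independence of the tests is assumed throughout.

```python
def familywise_error(alpha, n_tests):
    """P(at least one false rejection) for n independent tests when all nulls hold."""
    return 1.0 - (1.0 - alpha) ** n_tests


alpha, n_tests = 0.05, 6

# Each test run at the nominal 5% level.
uncorrected = familywise_error(alpha, n_tests)

# Bonferroni: each test run at level alpha / n instead.
per_test_level = alpha / n_tests
corrected = familywise_error(per_test_level, n_tests)

print(round(uncorrected, 2))  # 0.26, as in the text
print(f"per-test level {per_test_level:.4f}, familywise error {corrected:.3f}")
```

Under these assumptions the corrected familywise error rate stays just below the nominal 5%, which is the sense in which the correction controls the Type I error.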
Perneger, T. V. 1998: What's wrong with Bonferroni adjustments? British Medical Journal 316, 1236-8. Shaffer, J. P. 1995: Multiple hypothesis testing. Annual Review of Psychology 46, 561-84.
boosting This is a class of optimization algorithms that can be applied to fit a number of classical and modern statistical models. Its origins come from machine learning and computer science (Meir and Rätsch, 2003; Schapire, 2003) but it has been adopted in statistics as well. From a statistical point of view, boosting works by iteratively fitting residuals obtained from rather simple regression models (Bühlmann and Hothorn, 2007) (see MULTIPLE LINEAR REGRESSION). These models are called base-learners and determine the structure of the final model which, in essence, is the sum of all base-learners. The method is attractive because it can be applied to multiple linear regression, LOGISTIC REGRESSION, classification, SURVIVAL ANALYSIS, robust regression (see ROBUSTNESS), QUANTILE REGRESSION, etc. Furthermore, the regression relationship can be restricted to linear or additive functions, which facilitates interpretation of the final model. Unlike RANDOM FORESTS, boosting is sensitive to the most important hyperparameter, the number of iterations of the algorithm. Too large values will cause overfitting. Thus, cross-validation techniques have to be applied to determine an appropriate number of iterations. The algorithm is especially useful for model fitting for HIGH-DIMENSIONAL DATA, i.e. when the number of observations is smaller than the number of explanatory variables. Models fitted by boosting algorithms have been successfully applied to weight estimation for foetuses by three-dimensional ultrasound imaging or for predicting cancer subtypes based on gene expression and single nucleotide polymorphisms (SNPs) data. TH
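The idea of repeatedly fitting simple base-learners to the current residuals can be illustrated with a small sketch. What follows is one possible variant, assumed here for illustration and not taken from the entry: squared-error loss, one-variable linear base-learners without intercept, selection of the best-fitting variable at each iteration, and a shrinkage factor `step`. The function name `l2_boost`, the simulated data and all parameter values are hypothetical.

```python
import random


def l2_boost(X, y, n_iter, step=0.1):
    """Component-wise boosting for squared-error loss (illustrative sketch).

    X: list of rows (columns assumed roughly centred); y: list of responses.
    Returns (intercept, coefficients); the fitted model is the sum of all
    the shrunken base-learners selected along the way.
    """
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    coef = [0.0] * p
    resid = [yi - intercept for yi in y]
    col_ss = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_iter):
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            sxy = sum(row[j] * r for row, r in zip(X, resid))
            beta = sxy / col_ss[j]  # least-squares fit to the residuals
            gain = beta * sxy       # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_beta, best_gain = j, beta, gain
        if best_j is None:          # nothing left to explain
            break
        # Add a shrunken copy of the best base-learner and update the residuals.
        coef[best_j] += step * best_beta
        resid = [r - step * best_beta * row[best_j] for row, r in zip(X, resid)]
    return intercept, coef


# Simulated data: only variables 0 and 2 truly matter (invented example).
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
y = [1.0 + 2.0 * row[0] - 1.0 * row[2] + rng.gauss(0.0, 0.5) for row in X]

_, few = l2_boost(X, y, n_iter=5)
_, many = l2_boost(X, y, n_iter=200)
print([round(c, 2) for c in few])
print([round(c, 2) for c in many])
```

Comparing the two printed coefficient vectors shows why the number of iterations matters: after only a few iterations the coefficients are heavily shrunk towards zero, while many iterations bring them close to the least-squares fit and begin to pick up noise variables. In practice the stopping point would be chosen by cross-validation, as the entry notes.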
Bühlmann, P. and Hothorn, T. 2007: Boosting algorithms: regularization, prediction and model fitting. Statistical Science 22(4), 477-505. Meir, R. and Rätsch, G. 2003: An introduction to boosting and leveraging. In Advanced lectures on machine learning (LNAI 2600). Schapire, R. E. 2003: The boosting approach to machine learning: an overview. In Denison, D. D., Hansen, M. H., Holmes, C., Mallick, B. and Yu, B. (eds), Nonlinear estimation and classification. New York: Springer.
bootstrap The bootstrap is a computationally intensive technique for statistical inference, which can be used when the assumptions that underpin much of classical statistical inference are questionable. This may be because the data are not normally distributed or the dataset is small so that theoretical results based on large sample theory are inapplicable. For example, the bootstrap can be used to estimate the BIAS and STANDARD ERROR of parameter estimates together with CONFIDENCE INTERVALS.
In effect, as we illustrate in the figure, the bootstrap is a data resampling technique. It was formally introduced by Efron (see the discussion in Efron and Tibshirani, 1993) and, although it has a sound theoretical basis, the idea that there is something magical about it is reflected in its name. The term bootstrap derives from the phrase to pull oneself up by one's bootstraps, widely thought to be based on one of the 18th century Adventures of Baron Munchausen. The Baron found himself at the bottom of a deep lake and saved himself by hauling himself up by his bootstraps.
[Figure: three rows. First row: the population, with true parameter value $\theta$. Second row: the sample of 12 individuals, with parameter estimate $\hat\theta$. Third row: seven bootstrap samples, with parameter estimates $\hat\theta^*_1, \ldots, \hat\theta^*_7$.]
bootstrap Schematic illustration of bootstrapping

We will describe the idea using the figure. Suppose we have a population in which the true value of a quantity of interest, say adult height, is denoted by $\theta$. We wish to estimate $\theta$ and take a sample of 12 individuals from this population. In the figure the population is denoted by the large rectangle in the first row (note that the numbers identify population members and are not their adult heights). In this population, the 12 individuals to be included in the sample are numbered. They comprise the actual sample, which is shown in the second row. Our estimate of adult height, calculated from this sample, is denoted by $\hat\theta$.
In order to quantify how close $\hat\theta$, the estimate of adult height in our sample, is likely to be to $\theta$, the actual adult height in the population, we need at the very least to estimate the variance of $\hat\theta$. Imagine doing this in the following way. Take a large number, say B, of samples of size 12 from the population. In each of these samples, calculate an estimate of adult height. Call these estimates $\hat\theta_1, \ldots, \hat\theta_B$. Then estimate the variance of $\hat\theta$ by the sample variance of $(\hat\theta_1, \ldots, \hat\theta_B)$.
Of course, this approach is impossible in practice; if we could afford to draw B extra samples of size 12, we would have drawn a much larger sample initially! However, an approximation to it can be achieved as follows. Suppose we sample with replacement from the 12 observations in the data (second row in the figure) to form a 'subsample', also of size 12. Seven possible such 'subsamples' are shown in the third row of the figure. For example, the first subsample, shown in the first rectangle in the third row, consists of the following observations (note some observations will occur more than once, and some not at all): (1, 2, 2, 2, 3, 3, 5, 6, 7, 7, 8, 11). These 'subsamples' are known as bootstrap samples. Using each of these bootstrap samples we calculate an estimate of adult height. By convention, these are denoted with a '*' to indicate they have been calculated from a bootstrap sample. From the seven bootstrap samples in the third row of the figure, we therefore get $\hat\theta^*_1, \ldots, \hat\theta^*_7$. Now we simply estimate the variability of the estimate of adult height calculated from the actual data by the sample variance of the bootstrap estimates $\hat\theta^*_1, \ldots, \hat\theta^*_7$. Of course, in practice we would need many more than seven bootstrap estimates.
Another way of looking at this is as follows. We wish to learn about the relationship between the true population parameter value, $\theta$, and estimates of $\theta$ obtained from samples from the population, denoted $\hat\theta$. To do this, we pretend the observed data are the population and repeatedly sample from the data to learn about the relationship between $\hat\theta$ and estimates obtained from the resampled data, denoted $\hat\theta^*$. In other words, we say:
\[
\text{Distribution of estimates } \hat\theta \text{ given } \theta \quad \text{is approximated by} \quad \text{Distribution of estimates } \hat\theta^* \text{ given } \hat\theta \qquad (1)
\]

This is known as the bootstrap principle. It is important to separate this principle from simulation, which is used to estimate the distribution of estimates $\hat\theta^*$ given $\hat\theta$. In fact, there are two potential sources of error in bootstrap procedures. The first arises because the bootstrap principle does not hold true, i.e. the two distributions in equation (1) are not equal. The second arises because we only use a finite number of bootstrap samples, B, to estimate the distribution of the $\hat\theta^*$s. However, this error can be made as small as we like by simply increasing B, whereas the bootstrap error is fixed. One of the arts of bootstrapping is to consider simple functions of $\hat\theta$, such as $(\hat\theta - \theta)/\hat\sigma$ (where $\hat\sigma$ is the sample standard error of $\hat\theta$), for which the bootstrap principle is more nearly true.
To make things more concrete, we illustrate how to use the bootstrap to estimate VARIANCE. Consider the data in the table. We are interested in estimating the average change in the carbon monoxide transfer factor. The obvious estimate is the mean: (33 + 2 + 24 + 27 + 4 + 1 - 6)/7 = 12.14.

bootstrap Carbon monoxide transfer factor for chickenpox patients on entry to hospital and after a stay of one week (Davison and Hinkley, 1997, p. 67)

Patient   Entry   Week   Change (Week - Entry)
1         40      73     33
2         50      52     2
3         56      80     24
4         58      85     27
5         60      64     4
6         62      63     1
7         66      60     -6

Suppose we were able to draw a large number, B, of samples of the same size as that in the table from the 'population' of patients with chickenpox and calculate the average change on each. Denote the resulting estimates by $\hat\theta_1, \ldots, \hat\theta_B$, and recall that the true value in the population is called $\theta$. Then an estimate of the variance would be:

\[
\frac{1}{B} \sum_{i=1}^{B} \left( \hat\theta_i - \theta \right)^2 \qquad (2)
\]

Using the bootstrap principle (1), we estimate this by (a) replacing $\theta$ by its estimate from the data, $\hat\theta$, and (b) replacing the $\hat\theta_i$ by $\hat\theta^*_i$, where each $\hat\theta^*_i$ is the mean carbon monoxide transfer factor change in the ith bootstrap sample.
The second table shows the bootstrap in action. The first row shows the observed differences, corresponding to the fourth column of the first table. The second row shows the frequency of these observations in the data in the first table; they all occur once. Rows 3-11 show the frequency of the observations in bootstrap samples 1-9. Thus, in the first bootstrap dataset, observation 1 does not appear, observation 2 appears three times, observation 3 twice, observation 4 once, observation 5 does not appear, observation 6 appears once and observation 7 does not appear: the mean is then 11.71.

bootstrap Frequencies with which each observation from the original dataset appears in each of nine bootstrap samples

Observed differences        33   2    24   27   4    1    -6   Statistic (mean)
Frequency in observed data  1    1    1    1    1    1    1    $\hat\theta$ = 12.14
1st bootstrap sample        0    3    2    1    0    1    0    $\hat\theta^*_1$ = 11.71
2nd bootstrap sample        …    …    …    …    …    …    …    $\hat\theta^*_2$ = 7.00
3rd bootstrap sample        …    …    …    …    …    …    …    $\hat\theta^*_3$ = 18.57
4th bootstrap sample        …    …    …    …    …    …    …    $\hat\theta^*_4$ = 15.43
5th bootstrap sample        …    …    …    …    …    …    …    $\hat\theta^*_5$ = 15.71
6th bootstrap sample        …    …    …    …    …    …    …    $\hat\theta^*_6$ = 26.43
7th bootstrap sample        …    …    …    …    …    …    …    $\hat\theta^*_7$ = 13.29
8th bootstrap sample        …    …    …    …    …    …    …    $\hat\theta^*_8$ = 24.29
9th bootstrap sample        …    …    …    …    …    …    …    $\hat\theta^*_9$ = 16.43

The table shows B = 9 bootstrap samples. We thus have $\hat\theta^*_1, \ldots, \hat\theta^*_9$, each of which stands in approximately the same relationship to $\hat\theta$ as $\hat\theta$ does to the true parameter $\theta$. We can use them to learn about the relationship between $\hat\theta$ and $\theta$. Specifically, the bootstrap estimate of variance is:

\[
\frac{1}{B} \sum_{i=1}^{B} \left( \hat\theta^*_i - \hat\theta \right)^2 \qquad (3)
\]

Comparing with (2), we see the bootstrap version (3) is obtained by (a) putting a 'hat' on anything without a 'hat', and (b) putting a '*' on what is left. This rule of thumb is very useful in practice. Substituting the bootstrap estimates from the second table gives:

\[
\frac{1}{9} \left[ (11.71-12.14)^2 + (7.00-12.14)^2 + (18.57-12.14)^2 + (15.43-12.14)^2 + (15.71-12.14)^2 + (26.43-12.14)^2 + (13.29-12.14)^2 + (24.29-12.14)^2 + (16.43-12.14)^2 \right] = 7.17^2
\]
However, B = 9 is not nearly enough. Typically we may need around B = 800 bootstrap samples to estimate the variance accurately (Booth and Sarkar, 1998). Taking B = 1000, we find that the bootstrap variance of the mean is $5.39^2$, which compares with the maximum likelihood estimate of $5.38^2$. The bootstrap estimate of the standard error of the mean, 12.14, is thus 5.39.
Of course, this example is only illustrative; we know the answer anyway. However, in many circumstances we may not, e.g. if the data are not normally distributed and we want the standard error of the median or some other nonstandard measure of the data's 'centre'.
The bootstrap principle can clearly be applied much more widely. It is probably most often used to calculate confidence intervals (Carpenter and Bithell, 2000), where it avoids the need to rely on large sample theory or assumptions concerning the distribution of the data. For example, the distribution of individual patients' hospital costs is usually very skew and the bootstrap has been applied to calculate confidence intervals for the average cost of hospitalisation. Other applications include hypothesis tests, power calculations and estimating the predictive performance a statistical model will have when applied to a new dataset that was not used in formulating or estimating the model.
In order for the bootstrap principle (1) to hold, it is necessary for the bootstrap sampling to mimic the actual data sampling. Therefore, if we are bootstrapping a clinical trial with two treatments, we should sample with replacement within each treatment group, to preserve the randomisation. Other situations require different approaches.
The bootstrap resampling illustrated here does not depend on any statistical model and is an example of the non-parametric bootstrap. An alternative, the parametric bootstrap, is less widely used. This samples data from a parametric statistical model, such as a regression model, rather than with replacement from the observed data.
Lastly, note that the bootstrap, although it uses simulation, is characterised by the bootstrap principle (1). It is thus quite distinct from two other common uses of simulation, randomisation tests and MARKOV CHAIN MONTE CARLO (for fitting BAYESIAN MODELS). JRC

[Acknowledgements James R. Carpenter was supported by ESRC Research Methods Programme grant H333250047, titled 'Missing data in multilevel models'.]
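The worked example can be reproduced with a short non-parametric bootstrap in Python. This is a sketch under stated assumptions: the function name `bootstrap_se`, the seed and the choice of B = 2000 are arbitrary and not taken from the entry; the seven changes in carbon monoxide transfer factor are those quoted in the text. The variance is computed as in equation (3), i.e. as the average squared deviation of the bootstrap estimates from the original estimate.

```python
import random
import statistics

# Changes in carbon monoxide transfer factor quoted in the text.
changes = [33, 2, 24, 27, 4, 1, -6]


def bootstrap_se(data, stat, n_boot, rng):
    """Non-parametric bootstrap standard error of stat(data)."""
    n = len(data)
    theta_hat = stat(data)
    # Resample n observations with replacement, n_boot times.
    estimates = [stat(rng.choices(data, k=n)) for _ in range(n_boot)]
    # Average squared deviation of the bootstrap estimates from theta_hat.
    variance = sum((e - theta_hat) ** 2 for e in estimates) / n_boot
    return theta_hat, variance ** 0.5


rng = random.Random(2011)  # arbitrary seed, for reproducibility only
theta_hat, se_mean = bootstrap_se(changes, statistics.mean, 2000, rng)
_, se_median = bootstrap_se(changes, statistics.median, 2000, rng)

print(f"mean = {theta_hat:.2f}, bootstrap SE of mean = {se_mean:.2f}")
print(f"bootstrap SE of median = {se_median:.2f}")
```

The standard error of the mean obtained this way should lie close to the value of about 5.4 reported in the text, with some Monte Carlo variation from run to run; the same function applied to the median illustrates a case where no simple formula is at hand.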
Booth, J. G. and Sarkar, S. 1998: Monte-Carlo approximation of bootstrap variances. The American Statistician 52, 354-7. Carpenter, J. and Bithell, J. 2000: Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statistics in Medicine 19, 1141-64. Davison, A. C. and Hinkley, D. V. 1997: Bootstrap methods and their application. Cambridge: Cambridge University Press. Efron, B. and Tibshirani, R. 1993: An introduction to the bootstrap. New York: Chapman & Hall.
boxplot This is a graphical display useful for highlighting important distributional features of a variable. The diagram is based on the five-number summary of a dataset, the numbers being the minimum, the lower quartile, the median, the upper quartile and the maximum. The boxplot is constructed by first drawing a 'box' with ends at the lower and upper quartiles of the data; next a horizontal line (or some other feature) is used to indicate the position of the median within the box and then lines are drawn from each end of the box to the most remote observations. One convention modifies this last step by truncating the lines to within (unmarked) points given by the upper quartile plus 1.5 times the interquartile range (the difference between the upper and lower quartiles) and the lower quartile minus 1.5 times the interquartile range. In this case, any observations outside these limits are represented individually by some means in the finished graphic. Different computer packages may employ slightly different conventions for displaying extreme or outlying values. The resulting diagram schematically represents the body of the data minus the extreme observations. The boxplot is particularly useful for comparing the distributional features of a variable in different groups as illustrated in the figure, which shows the birthweights of infants with severe idiopathic respiratory disorder, classified by whether or not the infant survived. For other examples see Altman (1991). BSE
[Figure: boxplots of birthweight (kg), on a vertical axis running from 1.0 to 3.5, for two groups: baby died, baby survived]
boxplot Birthweights (kg) of infants with severe idiopathic respiratory disease syndrome
(See also HISTOGRAM, STEM-AND-LEAF PLOT)
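The quantities behind a boxplot, including the 1.5 times interquartile range convention for truncating the lines, can be computed directly. The sketch below is illustrative only: the function name `boxplot_summary` is hypothetical, the ten birthweights are invented, and the quartiles use the 'inclusive' interpolation rule of Python's `statistics.quantiles`, which is just one of the several conventions that packages employ.

```python
import statistics


def boxplot_summary(values):
    """Five-number summary plus whisker ends and outliers (1.5 x IQR rule)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # lower quartile MINUS 1.5 x IQR
    high_fence = q3 + 1.5 * iqr  # upper quartile PLUS 1.5 x IQR
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        # Whiskers stop at the most remote observations within the fences.
        "whisker_low": inside[0], "whisker_high": inside[-1],
        # Observations beyond the fences are plotted individually.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Invented birthweights in kg, for illustration only.
weights_kg = [1.1, 1.8, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.9]
summary = boxplot_summary(weights_kg)
print(summary)
```

For these made-up values the smallest and largest observations fall outside the fences and would be drawn as individual points, while the lines would end at the next most extreme values.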
Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.
Box-Cox transformation See TRANSFORMATIONS
Bradford Hill criteria Guidelines for drawing conclusions about causal relationships were proposed by Sir Austin Bradford Hill, Professor Emeritus of Medical Statistics at the London School of Hygiene and Tropical Medicine, in his address to the Section of Occupational Medicine of the Royal Society of Medicine in 1965 (Bradford Hill, 1965). Bradford Hill's guidelines drew on his many contributions to chronic disease research in the post-war era, including the groundbreaking work with Richard Doll on the link between smoking and lung cancer.
The following nine aspects were proposed for deciding whether a statistical association might be causal: strength - magnitude of the association, as observed by measures such as the ratio of incidence rates; consistency - repeated observation of the association in different populations and circumstances and by different researchers; specificity - whether a cause leads to a single effect in a given population; temporality - whether the cause precedes the effect in time; biological gradient - existence of a trend or dose-response curve between the cause and effect; plausibility - whether the association is consistent with current biological knowledge; coherence - ensuring that the interpretation of cause and effect does not conflict with what is known of the natural history of the disease; experiment - existence of experimental rather than observational evidence, such as through conducting a randomised trial or by introduction of a preventive measure; analogy - comparison with previous research that identified similar effect mechanisms.
Bradford Hill did not intend these guidelines to be philosophically rigorous 'criteria' for causal inference, rather a basis for decision making that could lead to timely action for the good of public health. With further consideration, some of the guidelines (e.g. 'specificity') are less than universal in their utility and some commentators have proposed alternative criteria more firmly rooted in deductive logic (Weed, 1986). However, in proposing these guidelines, Bradford Hill advocated an approach of making the best use of the totality of available evidence: 'All scientific work is incomplete - that does not confer upon us the freedom to ignore the knowledge we already have, or to postpone the action it demands.' JGW
Bradford Hill, A. 1965: The environment and disease: association or causation? Proceedings of the Royal Society of Medicine 58, 295. Weed, D. L. 1986: On the logic of causal inference. American Journal of Epidemiology 123, 965-79.
bubbleplot This is a graphical display for three variables in which two variables are used to form a SCATTERPLOT and then the values of the third variable are represented by circles with radii proportional to these values and centred on the appropriate point in the scatterplot. An example is shown in the figure; here the data are for 41 cities in the USA and the two variables forming the scatterplot are average annual temperature and average annual wind speed, with the 'bubbles' representing the pollution level as measured by the concentration of sulfur dioxide in the air. The plot suggests that higher pollution levels are associated with a combination of lower annual temperature and higher average wind speed. More details of bubbleplots can be found in Everitt (2003). BSE
[Figure: bubbleplot; horizontal axis: average annual temperature (Fahrenheit), from 45 to 75]
bubbleplot Bubbleplot of annual temperature and wind speed against pollution level as measured by sulphur dioxide concentration in the air for 41 cities in the USA
(See also BIVARIATE BOXPLOT and SCATTERPLOT MATRICES)
Everitt, B. S. 2003: Modern medical statistics. London: Arnold.
BUGS and WinBUGS The use of BAYESIAN METHODS in practical problems in medical statistics and other substantive areas of application has been hindered until relatively recently by computational aspects. In particular, the evaluation of integrals in order to obtain posterior marginal, conditional and predictive distributions in many multiparameter problems is not usually analytically tractable, and asymptotic, numerical integration techniques or simulation-based methods are required (Bernardo and Smith, 1994). In many practical problems in medical statistics the structure and nature of the models used have made parameter estimation particularly amenable to the use of MARKOV CHAIN MONTE CARLO (MCMC) simulation methods and it is these that the software packages Bayesian inference Using Gibbs Sampling (BUGS) and WinBUGS (Windows version of BUGS) implement. (Latest versions at the time of writing are BUGS 0.5 and WinBUGS 1.4; both are freely available from www.mrc-bsu.cam.ac.uk.)
BUGS and WinBUGS use the BUGS syntax, which is similar to that of S-PLUS or R, to specify the likelihood and prior distributions for the statistical model in question, together with initial starting values for the sampler (Gilks, Thomas and Spiegelhalter, 1994). Within WinBUGS the specification of models may also be in terms of directed acyclic graphs (DAGs) using the Doodle feature (see GRAPHICAL MODELS), with the appropriate code being produced automatically. Additions to the most recent version of WinBUGS (1.4) are the ability to use scripts so that WinBUGS may be used in 'batch mode' and improved graphics capabilities, together with calculation of the deviance information criterion (DIC) to assess model complexity and fit (see BAYESIAN METHODS). In addition, the suite of S-PLUS functions CODA (Best, Cowles and Vines, 1995) can be used to explore convergence issues with output from BUGS and WinBUGS. Specific developments of WinBUGS are PKBUGS, which allows MCMC methods to be used for complex population pharmacokinetic/pharmacodynamic (PK/PD) models, and GeoBUGS, which is an add-on to WinBUGS that fits spatial models and produces a range of maps as output.
Since BUGS and WinBUGS require the user to specify statistical models in terms of the LIKELIHOOD and PRIOR DISTRIBUTIONS (see BAYESIAN METHODS), using MCMC methods in order to evaluate the model is only recommended for users skilled at undertaking Bayesian analyses, and the packages must therefore be used with considerable care; the manual even comes with a 'health warning'! KRA
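To make the idea of Gibbs sampling concrete, the following is a minimal sketch in plain Python of the kind of sampler that BUGS automates. It is not BUGS code and makes no claim about how BUGS is implemented. The model (normal data with unknown mean `mu` and precision `tau`), the vague priors, the initial values and the data are all assumptions chosen for illustration.

```python
import random


def gibbs_normal(y, n_iter=5000, burn_in=1000, seed=1):
    """Gibbs sampler for y_i ~ Normal(mu, 1/tau), with assumed priors
    mu ~ Normal(0, 1/prior_prec) and tau ~ Gamma(a, rate=b)."""
    rng = random.Random(seed)
    n, total = len(y), sum(y)
    prior_prec, a, b = 1e-4, 0.01, 0.01  # vague priors (illustrative choice)
    mu, tau = 0.0, 1.0                   # initial values for the sampler
    draws = []
    for it in range(n_iter):
        # Full conditional of mu given tau: normal with precision n*tau + prior_prec.
        prec = n * tau + prior_prec
        mu = rng.gauss(tau * total / prec, prec ** -0.5)
        # Full conditional of tau given mu: Gamma(a + n/2, rate = b + SS/2).
        ss = sum((yi - mu) ** 2 for yi in y)
        tau = rng.gammavariate(a + n / 2, 1.0 / (b + ss / 2))  # 2nd argument is a scale
        if it >= burn_in:
            draws.append((mu, tau))
    return draws


# Hypothetical measurements, used only to exercise the sampler.
data = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8]
samples = gibbs_normal(data)
posterior_mean_mu = sum(m for m, _ in samples) / len(samples)
print(f"posterior mean of mu is approximately {posterior_mean_mu:.2f}")
```

Each iteration draws one parameter from its full conditional distribution given the current value of the other, which is the step that the likelihood and prior specification determine. Convergence checking of the kind CODA provides is deliberately left out of this sketch.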
Bernardo, J. M. and Smith, A. F. M. 1994: Bayesian theory. Chichester: John Wiley & Sons, Ltd. Best, N. G., Cowles, M. K. and Vines, S. K. 1995: CODA: convergence diagnosis and output analysis software for Gibbs sampler output: Version 0.1. Cambridge: MRC Biostatistics Unit. Gilks, W. R., Thomas, A. and Spiegelhalter, D. J. 1994: A language and program for complex Bayesian modelling. The Statistician 43, 169-78. Spiegelhalter, D. J., Thomas, A. and Best, N. G. 2001: WinBUGS version 1.4 user manual. Cambridge: MRC Biostatistics Unit.
C
calibration
Consider a situation in which we wish to measure serum concentrations of hormones, enzymes and other proteins, for example, using such methods as radioimmunoassays (RIA) and enzyme-linked immunosorbent assays (ELISA). Three key questions in the development of such assays are (a) how does the expected value (average) of the assay response change as a function of the true amount of the target material in the serum samples, (b) how does the VARIANCE (or STANDARD DEVIATION) of the assay results change with the average assay result and, subsequently, (c) how might we use a particular assay result to determine the amount of the target material in a new sample of serum? We leave question (c) for the time being and concentrate on questions (a) and (b). Let the assay response be Y and let the true level of the target material be X. We wish to determine the form of the functions F and G in the following two equations:
$E(Y \mid X) = F(X)$    (1)
and
$\mathrm{Var}(Y \mid X) = G(E(Y \mid X))$    (2)
Here we assume that the values of X are known without MEASUREMENT ERROR. We are concerned with what is often referred to as absolute calibration. If we do not have access to the truth, but only have measurements using alternative assays, $Y_1$ and $Y_2$, say, then we are concerned with the problem of comparative calibration (for the latter see METHOD COMPARISON STUDIES). Typically, such a univariate calibration study involves performing the assay procedure (ideally with full, independent replications) on each of N training samples or specimens with known values of X, and then using various data analytic and modelling procedures to evaluate the form of F and G. The statistical methods might be fully parametric (fitting linear or nonlinear models, for example, with an assumed parametric model for the variance) or nonparametric (essentially fitting an arbitrarily shaped smooth dose-response curve). Suppose an analytical chemist wishes to use some form of absorption spectroscopy to study the composition of, say, certain body fluids. He or she is likely to use measurements of many peak heights from such spectra to measure several substances simultaneously. This activity is the multivariate analogue of the univariate case, i.e. multivariate calibration. Technically, multivariate calibration is much more difficult than the simpler univariate problem, but the ultimate aims and logic are similar. We start with the latter and then briefly discuss the former.
Instead of dealing with the technical complexities of fitting nonlinear models with heterogeneous error distributions, we will consider an example that, by comparison, appears to be quite simple. Suppose we have a simple colorimetric assay for urinary glucose. We obtain a series of specimens with known glucose concentrations (X) and then measure the absorbance using the relevant assay procedure. We assume that the calibration function F is a straight line and that the variance of the Y measurements is independent of X (i.e. the 'error' variance is constant). Fitting a simple linear regression model for Y using ordinary least squares gives us estimates of the intercept ($\alpha$) and slope ($\beta$) of the straight line relating X to Y. Having answered questions (a) and (b) using the simple regression analysis, we now move on to question (c). Suppose we are presented with a new urine specimen and are asked to determine its glucose content. The classical method of estimating the unknown X from our measurement, Y, involves using information from the above regression of Y on X. The required estimate is given by:
$\hat{X} = (Y - \alpha)/\beta$    (3)
An alternative is the so-called inverse estimator suggested by Krutchkoff (1967). This involves using the original X, Y data to regress X on Y to obtain estimates of the intercept ($\gamma$) and slope ($\lambda$), and then simply using these parameter estimates to predict X given a new Y, i.e.:
$\hat{X} = \gamma + \lambda Y$    (4)
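The two estimators in equations (3) and (4) differ only in which variable is regressed on which, as the following Python sketch shows. The helper names (`ols`, `classical_estimate`, `inverse_estimate`) and the glucose and absorbance values are hypothetical and serve only to illustrate the arithmetic.

```python
def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def classical_estimate(x, y, y_new):
    """Equation (3): regress Y on X, then invert the fitted line."""
    alpha, beta = ols(x, y)
    return (y_new - alpha) / beta


def inverse_estimate(x, y, y_new):
    """Equation (4): regress X on Y and predict directly."""
    gamma, lam = ols(y, x)
    return gamma + lam * y_new


# Hypothetical training data: known glucose concentrations and absorbances.
glucose = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
absorbance = [0.05, 0.21, 0.38, 0.62, 0.79, 1.01]
new_reading = 0.50
print("classical:", round(classical_estimate(glucose, absorbance, new_reading), 3))
print("inverse:  ", round(inverse_estimate(glucose, absorbance, new_reading), 3))
```

When the training data lie exactly on a straight line the two estimates coincide; with noisy data they generally differ, which is the situation in which their relative properties become relevant.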
For details of the properties of these two estimators, see the review by Osborne (1991). To illustrate the ideas of multivariate calibration, consider a relatively simple example. Suppose we wish to measure the concentration of a particular metabolite in the blood (X) but we are now able to use, say, three different colorimetric assay procedures to obtain values $Y_1$, $Y_2$ and $Y_3$. Assuming that the three corresponding calibration curves ($F_1$, $F_2$ and $F_3$), as before, are all straight lines (but with different intercepts, slopes and 'error' variances) we can use MULTIVARIATE LINEAR REGRESSION (or three separate regressions) in order to estimate the parameters of the three calibration curves. The classical approach to the use of a new set of three measurements ($Y_1$, $Y_2$ and $Y_3$) on a new specimen to predict an unknown X is the multivariate generalisation of the univariate problem. Details of multivariate calibration are well beyond the scope of the present article, however, and readers are referred to Thomas (1994) and Naes et al. (2002) for further information. Considering our present example, one simple approach (particularly if we are prepared to assume conditional independence of the Y values) might involve estimating the unknown X using each of the $Y_1$, $Y_2$ and $Y_3$ values separately (in each case using equation (3) above) and then producing a weighted average of these three estimates, with weights proportional to their estimated precision. An example of the inverse approach would be to produce a multiple regression to predict the unknown X from the three Y measurements. This has obvious technical drawbacks, however, because of MULTICOLLINEARITY (high correlations between the three Y values). One possible solution involves the use of principal components regression. A PRINCIPAL COMPONENTS ANALYSIS is carried out on the Y values and then one or more of the resulting components are used to predict the unknown X. Further details of principal components regression and alternative analytical strategies can be found in Thomas (1994) and Naes et al. (2002). Whatever method of prediction is used, however, it is important in both univariate and multivariate calibration problems that the performance of the predictions is adequately evaluated. This might involve validation using a test set of new X, Y values or internal cross-validation (use of the LEAVE-ONE-OUT CROSS-VALIDATION approach, for example) using the original training set. GD
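The precision-weighted average mentioned above for combining the three separate estimates of X can be sketched as follows. The function name and the numerical estimates and variances are hypothetical. The variance returned for the combined estimate relies on the conditional independence assumption stated in the text.

```python
def precision_weighted_mean(estimates, variances):
    """Combine estimates using weights proportional to 1/variance (precision)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    # Under independence, the variance of the combined estimate is 1 / sum of precisions.
    return combined, 1.0 / total


# Hypothetical estimates of X from three assays, with their estimated variances.
x_hats = [5.1, 4.8, 5.4]
x_vars = [0.04, 0.09, 0.16]
combined, combined_var = precision_weighted_mean(x_hats, x_vars)
print(f"combined estimate = {combined:.3f}, variance = {combined_var:.4f}")
```

The most precise assay dominates the result, so a poorly performing assay adds little harm but also little information.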
Krutchkoff, R. G. 1967: Classical and inverse regression methods of calibration. Technometrics 9, 425-39. Naes, T., Isaksson, T., Fearn, T. and Davies, T. 2002: A user-friendly guide to multivariate calibration and classification. Chichester, UK: NIR Publications. Osborne, C. 1991: Statistical calibration: a review. International Statistical Review 59, 309-36. Thomas, E. V. 1994: A primer on multivariate calibration. Analytical Chemistry 66, 795A-804A.
caliper matching See MATCHING
canonical correlation analysis This technique establishes whether relationships exist between a priori groups of variables in a study. For example, in a study of heart disease, we might ask if there is a connection between personal physical characteristics such as age, weight and height, on the one hand, and the systolic and diastolic blood pressures of the individuals, on the other. Alternatively, in chronic depression, a study might be aimed at uncovering relationships between personal social and financial variables such as gender, age, educational level, income and a range of health variables including various indicators of depression. In another example, a public health survey might be conducted to explore connections between housing quality variables and indicators of different illnesses. A first attempt at analysing the strength of association between two groups of variables (e.g. between housing quality and illness) might involve examination of all correlations between pairs of variables, one from each group. However, if each group contains more than just a few variables, such an approach is bound to lead to confusion.
Ideally, one would like to replace each set of original variables by a new set, in such a way that the new variables were mutually uncorrelated within sets and just a few of them exhibited correlation between sets. Canonical correlation analysis takes just such an approach, and finds optimal sets of linear transformations of the original variables, one for each original group of variables. Suppose that $u_1, u_2, \ldots, u_k$ are the transformed variables for one set (say, the housing quality variables), while $v_1, v_2, \ldots, v_k$ are the transformed variables for the other set (say, the illness variables). 'Optimality' is defined by requiring the correlation between $u_1$ and $v_1$ to be as large as possible among all linear combinations of the original variables, that between $u_2$ and $v_2$ to be the next largest, that between $u_3$ and $v_3$ the third largest and so on, subject to the following constraints: $u_1, u_2, \ldots, u_k$ are mutually uncorrelated; $v_1, v_2, \ldots, v_k$ are mutually uncorrelated; and any $u_i$, $v_j$ pair is uncorrelated when $i \neq j$. It is clearly not possible to have more (uncorrelated) transformed variables than there were original variables in a set, so the number s of pairs that can be derived is equal to the smaller of the numbers of original variables in the two groups. The effect of canonical correlation analysis is thus to channel all the association between the two groups of variables through the resulting pairs of linear combinations $(u_1, v_1), (u_2, v_2), \ldots$. These derived variables are known as canonical variates. The only nonzero correlations remaining in the correlation matrix of the new variables are those between corresponding pairs of canonical variates, i.e. between $u_i$ and $v_i$ for $i = 1, \ldots, s$; they are known as the canonical correlations of the system.
Most computer software packages that contain multivariate statistical procedures will conduct such an analysis. They will also quote a significance level against each canonical correlation, appropriate for testing the NULL HYPOTHESIS that all succeeding population canonical correlations are zero. Such significance levels should be treated with some caution, as they rely on the assumption that the data follow a MULTIVARIATE NORMAL DISTRIBUTION. Nonetheless, the number of significant canonical correlations is usually taken to indicate the number of (independent) connections that exist between the two groups of variables. Inspection of the coefficients of each original variable in each canonical variate may also provide an interpretation of the canonical variate in the same manner as interpretation of principal components, which may help to identify the nature of the connection between the groups (see PRINCIPAL COMPONENT ANALYSIS). However, again a cautionary note is in order, because such interpretation is not quite as straightforward as for principal components. The reason for the complication is that there may be very diverse VARIANCES and covariances (see COVARIANCE MATRICES) among the original variables in the two groups, which affects the sizes of the coefficients in the canonical variates, and there is no convenient normalisation to place all coefficients on an equal footing. This drawback can be alleviated to some extent by restricting interpretation to the standardised coefficients, i.e. the coefficients that are appropriate when the original variables have been standardised, but nevertheless the problem still remains.
To illustrate the technique, consider a canonical correlation analysis between the 'health' variables and the 'personal' variables in the Los Angeles depression study of 294 respondents presented by Afifi and Clark (1984, Chapter 15). The four 'personal' variables were: gender, age, income, education level (numerically coded from the lowest, 'less than high school', to the highest, 'finished doctorate'), while the two 'health' variables were: CESD (the sum of 20 separate numerical scales measuring different aspects of depression) and health (a numerical score measuring 'general health'). The correlation matrix between these variables for the sample is shown in the table.

canonical correlation analysis Correlation matrix for two health and four personal variables in the LA depression study

            CESD          Health        Gender        Age      Education  Income
CESD        1.0
Health      0.212         1.0
Gender      [illegible]   0.091         1.0
Age         -0.161        [illegible]   [illegible]   1.0
Education   -0.101        -0.210        -0.106        -0.208   1.0
Income      -0.151        -0.183        -0.110        -0.192   0.491      1.0

Here the maximum number of canonical variate pairs is s = min(2, 4) = 2. The first canonical correlation turns out to be 0.405 and this has a significance level P < 0.00001. It might be argued that gender and education are unlikely to have normal distributions, so this significance level should not be taken too literally. Nevertheless, there does seem to be strong evidence that the first canonical correlation is significant. The corresponding canonical variates, in terms of standardised original variables, are:
$u_1 = -0.490\,\mathrm{CESD} + 0.912\,\mathrm{Health}$
$v_1 = 0.025\,\mathrm{Gender} + 0.871\,\mathrm{Age} - 0.383\,\mathrm{Education} + 0.082\,\mathrm{Income}$
High coefficients correspond to CESD (negatively) and health (positively) for the perceived health variables and to age (positively) and education (negatively) for the personal variables. Thus relatively older and uneducated people tend to score low in terms of depression, but perceive their health as relatively poor, while relatively younger but educated people have the opposite health perception. The second canonical correlation is 0.266, which has a significance level P < 0.001 so also carries interpretative worth. The corresponding canonical variates are:
$u_2 = 0.199\,\mathrm{CESD} + 0.288\,\mathrm{Health}$
$v_2 = 0.396\,\mathrm{Gender} - 0.443\,\mathrm{Age} - 0.448\,\mathrm{Education} - 0.555\,\mathrm{Income}$
Since the higher value of the gender variable is for females, the interpretation here is that relatively young, poor and uneducated females are associated with higher depression scores and, to a lesser extent, with poor perceived health. Thus there are two interpretable 'dimensions' of connection between the two sets of variables. A SCATTERPLOT of the scores of respondents against each pair of canonical variates would help to identify any anomalous individuals in the sample.
A further interesting application of canonical correlation analysis occurs when there is just a single set of variables, but they are measured on individuals in a number of a priori distinct groups or populations. For example, a set of signs or symptoms $x_1, x_2, \ldots, x_p$ is observed on a sample of patients suffering from jaundice and each patient is classified into one of g illnesses that have the external manifestation of jaundice. We can thus define a set of indicator variables $y_1, y_2, \ldots, y_g$ that specify a patient's illness, by setting the values $y_i = 1$, $y_j = 0$ ($j \neq i$) for a patient suffering from illness i. A canonical correlation analysis with the x values as one set of variables and the y values as the other set of variables will then produce the linear combinations of the x values that are most highly correlated with linear combinations of the group indicator variables. Since the latter define the best way to view group differences, the former are just the canonical variables that best discriminate between the g groups of individuals (see DISCRIMINANT FUNCTION ANALYSIS).
Finally, the variables in a study may fall into more than two a priori sets and some general between-set measure of association is required. Various possible definitions of such association may be made and, consequently, the ideas of canonical correlation analysis may be generalised in various ways. However, such generalisation is quite complicated and interpretation of the results becomes much more problematic. Gnanadesikan (1997) provides a brief overview and further references. WK
Afifi, A. A. and Clark, V. 1984: Computer-aided multivariate analysis. California: Wadsworth. Gnanadesikan, R. 1997: Methods for statistical data analysis of multivariate observations, 2nd edition. New York: John Wiley & Sons, Inc.
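As a rough computational illustration of the entry above, the canonical correlations can be obtained from the blocks of a correlation matrix: their squares are the eigenvalues of $R_{11}^{-1} R_{12} R_{22}^{-1} R_{21}$. The Python sketch below handles only the smallest interesting case, two variables in each set, so that everything reduces to 2 x 2 matrices. The helper names and the correlation values are hypothetical and are not taken from the depression study.

```python
import math


def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(p, q):
    """Product of two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def canonical_correlations(r11, r12, r22):
    """Canonical correlations for two sets of two variables each.

    r11, r22: within-set correlation matrices; r12: between-set correlations.
    """
    m = mul2(mul2(inv2(r11), r12), mul2(inv2(r22), transpose2(r12)))
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # Eigenvalues of a 2 x 2 matrix from its trace and determinant.
    disc = math.sqrt(max(trace * trace / 4 - det, 0.0))
    eigenvalues = [trace / 2 + disc, trace / 2 - disc]
    return [math.sqrt(max(ev, 0.0)) for ev in eigenvalues]


# Hypothetical correlation blocks for two sets of two variables.
r11 = [[1.0, 0.3], [0.3, 1.0]]
r22 = [[1.0, 0.4], [0.4, 1.0]]
r12 = [[0.5, 0.2], [0.1, 0.3]]
print([round(r, 3) for r in canonical_correlations(r11, r12, r22)])
```

The number of values returned equals the smaller of the two set sizes, matching s in the text. For larger sets a general eigenvalue routine from a numerical library would be needed in place of the 2 x 2 formulae.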
canonical variates See CANONICAL CORRELATION ANALYSIS
capture-recapture methods This is an alternative approach to a census for estimating population size that operates by sampling the population several times, identifying individuals who appear more than once. Capture-recapture methods have a long history dating back to 1786, when Laplace used such a technique to estimate the size of the total population of France. Traditionally, capture-recapture methodology was primarily focused on wildlife populations, but has increasingly been applied to human populations, particularly within epidemiological situations.
Within the ecological field, capture-recapture experiments involve observers going into the field and recording all animals that are observed (either visual sightings or trappings) at a sequence of capture events. On the initial capture event, all animals that are observed are recorded, uniquely marked and released back into the population. At each subsequent capture event, all unmarked animals are recorded and uniquely marked, all marked animals are recorded and all animals released. The data from such an experiment are simply the set of capture histories for each individual animal observed within the study. Each individual capture history is typically represented by a series of zeros and ones, where the 0 and the 1 denote the absence or presence, respectively, of the individual at each capture event.
There are generally two forms of models for capture-recapture data: closed and open, for which there have been a series of models proposed. Closed models assume that the population is constant throughout the study period, with no births, deaths or migrations, whereas open models allow for these transitions in the population. Generally speaking, the parameters of interest differ between the two models. For example, within closed populations, the total population size is generally of particular interest; conversely, for open populations, parameters of interest may include birth rates, death rates, migration rates and/or productivity rates. We initially, briefly, consider the capture-recapture methods often considered for wildlife data before considering in further detail epidemiological models.
For closed populations, Otis et al. (1978) described a series of different capture-recapture models, relating to possible heterogeneity in the capture rates as a result of time, trap response or individual effects. King and Brooks (2008) have incorporated these models into a Bayesian framework. Mixture models have become increasingly popular to model individual heterogeneity (Pledger, 2000; Morgan and Ridout, 2008). Additionally, there have been a series of models proposed for open populations, dependent on the parameters of interest, with perhaps the most widely used being the Cormack-Jolly-Seber model, where the survival rates are of primary interest, and the Arnason-Schwarz model, which incorporates multistrata data. Recent advances include the generalisation to multievent models (Pradel, 2005), where the state of an individual may only be partially observed. Within the epidemiological literature, closed populations are usually modelled, with the total population size of particular interest, and this is what we shall focus on here.
capture-recapture methods Example of an incomplete contingency table, with three sources: A, B and C. The entries $n_{ijk}$ denote the number of individuals observed in the given cell, where 0/1 represents absence/presence on the given list. The cell $n_{000}$ is unobserved and hence unknown

         A=1, B=1    A=1, B=0    A=0, B=1    A=0, B=0
C=1      n111        n101        n011        n001
C=0      n110        n100        n010        n000
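The cells of a table like the one above are simply counts of identical capture histories. A minimal Python sketch of that tabulation follows; the function name and the ten records are hypothetical, with each history written as a string giving presence (1) or absence (0) on sources A, B and C in that order.

```python
from collections import Counter


def tabulate_histories(histories):
    """Count individuals per capture history, e.g. '101' = seen by A and C only."""
    counts = Counter(histories)
    if "000" in counts:
        # An individual missed by every source cannot appear in the data.
        raise ValueError("history '000' cannot be observed")
    return counts


# Hypothetical records: one string per individual, sources in the order A, B, C.
observed = ["111", "110", "110", "101", "011", "100", "100", "010", "001", "001"]
table = tabulate_histories(observed)
for cell in ["111", "101", "011", "001", "110", "100", "010"]:
    print(f"n{cell} = {table[cell]}")
```

The seven printed counts fill the observable cells; the eighth cell, n000, is the quantity that the modelling has to estimate.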
For example, many areas of scientific research focus on the estimation of population size: from the number of susceptibles to a given disease, to the number of drug addicts in a particular area or the number of injuries sustained in the workplace. However, it is usually impossible to enumerate each member of a population, possibly due to their number (e.g. the number of web pages on the internet) or when the population is 'hidden' (such as the number of injecting drug users). Thus, data are often collected in the form of a series of incomplete population counts using a variety of sources or lists. Each source corresponds to a capture event and an individual being recorded by a given source corresponds to being observed at that capture event. It is assumed that each individual is uniquely identifiable by each source. Then, the data are simply the capture histories of all individuals observed. The data are usually summarised in the form of a $2^k$ contingency table, where k is the number of sources, and the cell entries correspond to the number of individuals that are observed by each combination of sources (i.e. the number of individuals observed with the same capture history). Clearly, the contingency table is incomplete since the number of individuals belonging to the population but not observed within the study is unknown (see the table, for example). Unlike the ecological application, the sources do not usually have a temporal sequencing as for the capture events and so different models have been developed within the epidemiological application.
Within the epidemiological field the capture-recapture approach is sometimes called 'multiple record systems' and the corresponding estimate of the total population referred to as 'Bernoulli census estimates' or 'ascertainment corrected rates'. The most common approach to analysing epidemiological data of this sort is via the use of LOG-LINEAR MODELS, introduced by Fienberg (1972). In these models, the logarithm of the expected cell count is expressed as a linear function of parameters. These parameters represent main effect terms for individual sources and associations between two or more sources. Thus, these models allow for interactions between different sources. The model assuming independence between each source is simply a special case with no interactions present. There are usually a number of possible log-linear models that can be fitted to the data, each specifying different sets of interactions between the sources.
Traditionally, classical analyses consist of initially finding the model which provides the best fit to the data using, for example, LIKELIHOOD RATIO TESTS and/or information criteria, such as AKAIKE'S INFORMATION CRITERION (AIC) or Bayesian information criterion (BIC). Once the given model has been selected, the total population is estimated using MAXIMUM LIKELIHOOD ESTIMATION for the missing cell, combined with the observed number of individuals (see, for example, Hook and Regal, 1995). Recently, Bayesian approaches have also been developed for fitting log-linear models to the data, and in particular, the issue of model choice (King and Brooks, 2001; King et al., 2005). This approach also allows the calculation of a model-averaged estimate of the total population, removing the model-dependence problem that may arise when only a single model is chosen on which to base inference. Alternative approaches to using log-linear models include use of the Rasch model (see Carriquiry and Fienberg, 1998) in order to model possible heterogeneity in the population and also latent class models when the individuals can be categorised into different subpopulations. RK
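For three sources, one of the log-linear models has a convenient closed-form estimate of the missing cell, which makes the 'estimate the missing cell, then add the observed count' step easy to see. The Python sketch below assumes the model with all main effects and all two-source interactions but no three-source interaction; this choice of model, the function name and the counts are assumptions made for illustration, and in practice the model would be chosen by the selection methods described in the entry.

```python
def estimate_total(n):
    """Estimate n000 and the total population size from three-source counts.

    n maps capture histories 'abc' (presence on sources A, B, C) to counts.
    Assumes a log-linear model with no three-source interaction.
    """
    odd = n["111"] * n["100"] * n["010"] * n["001"]
    even = n["110"] * n["101"] * n["011"]
    if even == 0:
        raise ValueError("estimate undefined when a two-source cell is empty")
    n000 = odd / even
    return n000, sum(n.values()) + n000


# Hypothetical observed counts for the seven observable cells.
counts = {"111": 10, "110": 20, "101": 30, "011": 40,
          "100": 60, "010": 80, "001": 120}
missing, total = estimate_total(counts)
print(f"estimated unobserved = {missing:.0f}, estimated total = {total:.0f}")
```

With these particular counts, which were constructed to be consistent with independent sources, the sketch estimates 240 unobserved individuals on top of the 360 observed.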
Carriquiry, A. L. and Fienberg, S. E. 1998: Rasch models. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd. Fienberg, S. E. 1972: The multiple recapture census for closed populations and incomplete $2^k$ contingency tables. Biometrika 59, 591-603. Hook, E. B. and Regal, R. R. 1995: Capture-recapture methods in epidemiology: methods and limitations. Epidemiologic Reviews 17, 243-64. King, R. and Brooks, S. P. 2001: On the Bayesian analysis of population size. Biometrika 88, 317-36. King, R. and Brooks, S. P. 2008: On the Bayesian estimation of a closed population size in the presence of heterogeneity and model uncertainty. Biometrics 64, 816-24. King, R., Bird, S. M., Brooks, S. P., Hay, G. and Hutchinson, S. 2005: Prior information in behavioural capture-recapture methods: demography influences injectors' propensity to be listed on data sources and their drugs-related mortality. American Journal of Epidemiology 162, 694-703. Otis, D. L., Burnham, K. P., White, G. C. and Anderson, D. R. 1978: Statistical inference from capture data on closed animal populations. Wildlife Monographs 62, 1-135. Morgan, B. J. T. and Ridout, M. S. 2008: A new mixture model for capture heterogeneity. Journal of the Royal Statistical Society: Series C 57, 433-46. Pledger, S. 2000: Unified maximum likelihood estimates for closed capture-recapture models using mixtures. Biometrics 56, 434-42. Pradel, R. 2005: Multievent: an extension of multistate capture-recapture models to uncertain states. Biometrics 61, 442-7.
carry-over See CROSSOVER TRIALS
CART This is an acronym for classification and regression tree. See TREE-STRUCTURED METHODS
cartogram This is a diagram in which descriptive statistical information is displayed on a geographical map by means of shading, by using a variety of different symbols or by some more involved procedure. Two examples are given in the figures. A description of how cartograms may be constructed is given in Gusein-Zade and Tikunov (1993). BSE [See also DISEASE CLUSTERING]
[Figure: cartogram. Cartogram of life expectancy in the USA by state. <70 - 70 years or less, >70 - more than 70 years]
[Figure: cartogram. 1996 US population cartogram (all states are resized relative to their population)]
Gusein-Zade, S. and Tikunov, V. 1993: A new method for constructing continuous cartograms. Geography and Geographic Information Systems 20, 167-73.
case-cohort studies See CASE
case-control studies These are studies in which a group of people with the disease of interest (the cases) is compared with a group without the disease (the controls). Exposure to the risk factor of interest is then ascertained in all those recruited to the study and the exposure levels of the cases are compared to those of the controls. Differing levels of exposure between the cases and controls indicate that the exposure is associated with the disease. Such studies are particularly appropriate for rare diseases for which follow-up studies are inappropriate, as insufficient cases of the disease would arise to provide sufficient statistical POWER. It is important at the outset to have a clear definition of the type of person who is eligible to be a case. Issues that need to be considered include:
1. The definition of the disease. Can this be broadly defined or is a specific subtype of the disease the focus of interest? For example, in a study of leukaemia a decision needs to be made as to whether all cases of leukaemia will be included or only a specific subtype such as chronic myeloid.
2. The age range of the cases. Some diseases are likely to have different causes at different ages.
3. The sex of the cases. For some disorders, it would be appropriate to restrict to one or other sex. For example, while cases of breast cancer can arise in men, the aetiology of these is likely to be very different from that of the disease in women.
4. Incident, prevalent or deceased cases. It is usually preferable to recruit cases when they are diagnosed (incident cases). However, for diseases that are very rare it can take too long to recruit sufficient cases. They can therefore be supplemented by existing (prevalent) cases. However, for diseases that can lead to death, the prevalent cases will only represent the survivors. To avoid survivor bias, deceased cases can be included if the next-of-kin can provide the appropriate information on the exposure of interest.
5. Hospital or community cases. A decision needs to be taken as to where the cases are to be found. If the disease is of such a nature that all cases are likely to come to hospital (e.g. breast cancer), then the hospital may be an appropriate recruitment location. For diseases that are often treated in the community, such as back pain, recruitment from hospital will only include cases at the most severe end of the disease spectrum.
Equally, besides the choice of cases, there are important considerations for the selection of controls, who should be drawn from the population at risk of becoming cases. In theory, the cases and controls can be considered as being part of a large hypothetical COHORT. Those who develop the disease are the cases, and those who have not acquired it form the pool from which the controls are drawn. Thus controls should be drawn carefully from the group of people who would be classified as cases if they happened to develop the disease.
Controls should be within the same age range as those chosen for the cases, and if only one sex is being considered for the cases the same restriction should apply to the controls. If the cases are recruited in hospital then the controls may also be recruited from the hospital or from the community defined by the catchment area of the hospital. Any exclusion criteria applied to the cases must also be applied to the controls, and the controls must be at risk of developing the disease. For example, in a case-control study of endometrial cancer, it was important that women who had had a hysterectomy, and thus no longer had an endometrium, were not included as controls (Barbone, Austin and Partridge, 1993). Increasingly, matched case-control studies are being conducted (see MATCHED SAMPLES). One or more controls are chosen for each case, matched as closely as possible to the case for various factors that are not of intrinsic interest to the study. Common matching factors are age and sex. Thus for each case, a control of the same age (to perhaps within one year) and the same sex would be chosen. One-to-one matching gives rise to a matched pair study. To increase the statistical power of the study, more than one control can be chosen for each case. This is particularly useful if the disease is rare and it is hard to find sufficient cases. However, it is rarely worth studying more than four controls per case, as the effort spent in collecting the data on the extra controls tends to outweigh the minimal increase in power. Frequently, the exposures will be assessed by QUESTIONNAIRES. This can present logistical difficulties if the cases are very sick and sometimes (as with deceased cases) the next-of-kin need to be questioned instead. For example, in a case-control study to examine the risk of sudden infant death syndrome in relation to used infant mattresses, all the exposure data were collected by questionnaire (Tappin et al., 2002).
Exposure to a used infant mattress was assessed by asking parents about routine night and day sleeping places and ascertaining the state of the mattress and whether it was new for this baby. Other studies have been able to link records with data held on the individuals. For example, a study of Alzheimer's disease in relation to levels of aluminium in drinking water was able to utilise records from the water companies to ascertain the aluminium levels in water piped to each address at which the cases and controls had lived during their lives (Martyn et al., 1997). A particular concern in case-control studies is the possibility of RECALL BIAS, particularly when obtaining information by questionnaire from the cases and controls. The cases are likely to have thought about the possible causes of their disease, whereas the controls will not. Thus the level of recall of past exposures may differ between cases and controls, which may lead to spurious differences in exposure between the two groups. One of the most difficult aspects of a case-control study is ensuring that the choice of controls is appropriate. Controls
that are not representative of the population at risk of the disease will lead to biased findings (see BIAS IN OBSERVATIONAL STUDIES). An example is when the cases are asked to choose a friend to act as the control. Such an approach tends to maximise the response rate in the controls (as the controls are usually willing to help their sick friends). However, if the exposure of interest is related to work or leisure activities or lifestyle, then the friends will be more similar to the cases than the average person in the population at risk. Thus overmatching of controls to cases takes place and little difference may be found between the exposures of the cases and the controls. Any association between the disease and the exposure can therefore be missed. Issues of BIAS need to be addressed at the design stage of a case-control study as no adjustment can be made in the analysis to take account of it. In common with all epidemiological studies, confounding is an issue that has to be considered in case-control studies. A confounding factor is one that is related both to the disease under study and the exposure of interest. Account can be taken of confounding factors at the analysis stage, but it is important at the design stage to identify and collect as much information as possible on all putative confounding factors. Common confounding factors are age and sex as these two factors are invariably related to any disease and to most exposures. Matching at the design stage on one or more confounders provides a way of removing the effect of these confounders and so adjustment for them is not needed in the analysis. The appropriate method of analysis depends on the type of case-control study employed. The analytical methods for matched case-control studies differ from those for unmatched studies, and it is important that the appropriate methods are employed to avoid bias in the results. Basic methods of analysis in unmatched studies are described next.
Estimates of risk of disease in the unexposed and exposed groups cannot be obtained from case-control studies, because the case
case-control studies A 2 x 2 table for comparing a + c cases with b + d controls

              Cases   Controls
Exposed         a        b
Unexposed       c        d
The ODDS RATIO is the ratio of the odds of exposure in the cases (a/c) to the odds of exposure among the controls (b/d) and thus is calculated as ad/bc. Odds ratios above 1 imply that the exposure is associated with an increased risk of the
disease whereas a value below 1 indicates that the exposure may be protective. As an example, consider a study of cod liver oil in infancy in relation to child-onset type 1 diabetes (Stene and Joner, 2003). The results are given in the second table. The odds ratio is 197 x 834/(777 x 318) = 0.66, indicating that cod liver oil appears to protect against child-onset diabetes.

Cod liver oil in 1st year of life   Cases   Controls
Yes                                  197      777
No                                   318      834

Standard analysis of a 2 x 2 table using a CHI-SQUARE TEST can be performed to test whether the odds ratio differs significantly from 1. The 95% CONFIDENCE INTERVALS can be derived using a variety of approximate methods. The most sophisticated methods require iterative solutions but simple methods such as that proposed by Woolf in 1955 (see Breslow and Day, 1980; Schlesselman, 1982) provide reasonable approximations. His method involves calculating the approximate VARIANCE of the natural logarithm of the odds ratio as:

\[ \operatorname{var}[\log(\mathrm{OR})] = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d} \]

and then obtaining the 95% confidence interval of the natural logarithm of the odds ratio from:

\[ \log(\mathrm{OR}) \pm 1.96 \times \sqrt{\operatorname{var}[\log(\mathrm{OR})]} \]

Taking exponentials of the two resulting values gives the 95% confidence interval for the odds ratio. In the example above the chi-square test gives a value of 15.3, indicating p < 0.0001 and a 95% confidence interval for the odds ratio of 0.54 to 0.81. Thus this simple analysis indicates a strong association between cod liver oil and diabetes. However, odds ratios and their confidence intervals are usually derived using computer packages. One that is freely available and provides ready access to these forms of analysis is EPIINFO (http://www.cdc.gov/epiinfo/). This can be downloaded via the internet and is in the public domain.

More often we are interested in examining different levels of exposure. A trend in odds ratios across different levels of exposure provides more convincing evidence of a relationship between the exposure and the disease than results from the simple dichotomy of exposed versus unexposed. Odds ratios can be calculated at each level of exposure compared with the baseline exposure level (usually the unexposed). A chi-square test for trend can be performed to test for linear trend in the odds ratios. In the third table there is an apparent trend with more frequent consumption of cod liver oil leading to lower odds ratios. The chi-square test for trend leads to a significance level of p < 0.001.

case-control studies Case-control study data according to an ordinal exposure

Cod liver oil in 1st year of life   Cases   Controls   Odds ratio
No                                   318      834      1 (reference)
Yes, 1-4 times per week               60      224      0.70
Yes, >=5 times per week              137      553      0.65

More sophisticated analyses are possible using LOGISTIC REGRESSION. This allows assessment of exposure on a continuous scale (rather than requiring grouping into levels) as well as allowing for the effects of confounding factors. The coefficients obtained in the logistic regression model are the logarithms of the odds ratios, and the actual odds ratios and their confidence intervals can be readily obtained.

In a matched study, the controls have deliberately been chosen to be more similar to the cases than those generally at risk of the disease. This has to be recognised in the analysis. The analysis of matched studies is more complex than for unmatched studies and in the discussion of the basic analysis, only matched pairs analyses will be considered. The analysis focuses on the pairs rather than the individuals contributing to the pairs and the standard presentation of the data is as follows:

                 Control
Case             Exposed   Unexposed
Exposed             r          s
Unexposed           t          u
Thus each pair contributes to one of the cells and the total (r + s + t + u) is the total number of pairs rather than the total number of individual cases and controls in the study. Interest focuses on the pairs that are discordant for exposure. Thus a comparison is made between those pairs where the case is exposed but the control is not and the pairs where the control is exposed and the case is not. The odds ratio is calculated as the ratio of the discordant pairs: s/t. The odds ratio will therefore be high if the pairs with case exposed and control unexposed greatly outnumber the pairs where the control is exposed but the case is not. As an example, consider a study on hip osteoarthritis by Cooper et al. (1998), which examined the risk associated with previous hip injury to the affected hip (or to the hip on the same side, right or left, for the matched control). The data from the study are given in the fourth table. The odds ratio is 46/11 = 4.2.
case-control studies Data from a matched case-control study in hip osteoarthritis

                        Control
Case                    Previous hip injury   No previous injury
Previous hip injury              1                    46
No previous injury              11                   553
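The matched-pairs calculation for the hip injury example can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `matched_pairs_or` is hypothetical, and the confidence interval relies on a large-sample approximation, var[log(OR)] ≈ 1/s + 1/t, which is an assumption added here and is not given in the entry.

```python
import math

def matched_pairs_or(s, t, z=1.96):
    """Odds ratio and approximate CI from discordant pair counts.

    s: pairs with the case exposed and the control unexposed
    t: pairs with the control exposed and the case unexposed
    """
    if s <= 0 or t <= 0:
        raise ValueError("both discordant counts must be positive")
    odds_ratio = s / t
    # Assumed large-sample standard error of log(OR) for matched pairs.
    se_log_or = math.sqrt(1 / s + 1 / t)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hip injury example: 46 and 11 discordant pairs.
odds_ratio, lower, upper = matched_pairs_or(46, 11)
print(f"OR = {odds_ratio:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

Only the discordant pairs enter the calculation; the 1 + 553 concordant pairs carry no information about the odds ratio in this analysis.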
Basic analysis of matched studies becomes difficult in anything other than the matched pairs analysis of a simple dichotomous exposure. Nowadays, matched case-control studies are usually analysed using CONDITIONAL LOGISTIC REGRESSION. This is not the same as the unconditional logistic regression mentioned above, as it takes account of the matching structure in the design of the study. Conditional logistic regression enables us to test the odds ratio calculated above, giving p < 0.0001 and a 95% confidence interval around the odds ratio of 2.2 to 8.1. This indicates that hip injury is associated with an increased risk of hip osteoarthritis later in life. Unmatched case-control studies can be analysed in most standard statistical computing packages. Logistic regression is used in many forms of analysis not confined to case-control studies. Conditional logistic regression is somewhat more specialised and is not available in all packages. It is most readily available in STATA, for which there is a specific routine for this. It can be performed in SAS, S-Plus and SPSS (see STATISTICAL PACKAGES) but these packages require use of a PROPORTIONAL HAZARDS model for which the likelihood function is identical, so cannot be used quite so readily. EPIINFO, mentioned above, does enable much of the basic analyses and is also convenient to use if one is examining data that are already tabulated (such as in a published paper), rather than the raw data on individuals. HI
[See also NESTED CASE-CONTROL STUDIES, OBSERVATIONAL STUDIES, SAMPLE SIZE DETERMINATION IN OBSERVATIONAL STUDIES]
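For data that are already tabulated, the basic unmatched calculations described in this entry can also be reproduced directly. The following is a minimal Python sketch rather than a substitute for a statistical package. The function names are hypothetical; the 2 x 2 chi-square statistic is assumed to include Yates' continuity correction (the entry does not say which version was used); and the trend test is assumed to be a Cochran-Armitage-type statistic with equally spaced scores 0, 1, 2. The counts are those of the cod liver oil example.

```python
import math

def woolf_odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with Woolf's approximate confidence interval.

    a, c: exposed and unexposed cases; b, d: exposed and unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    # Woolf's approximate variance of the natural log of the odds ratio.
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d
    half_width = z * math.sqrt(var_log_or)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - half_width), math.exp(log_or + half_width)

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic (1 df) and p-value for a 2 x 2 table."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction (an assumption, see the lead-in).
        diff = max(diff - n / 2, 0.0)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))

def chi_square_trend(cases, controls, scores):
    """Chi-square test for linear trend (1 df) over ordered exposure levels."""
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    p_bar = sum(cases) / n
    t_stat = sum(w * (r - m * p_bar) for w, r, m in zip(scores, cases, totals))
    sum_w = sum(w * m for w, m in zip(scores, totals))
    sum_w2 = sum(w * w * m for w, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_w2 - sum_w ** 2 / n)
    stat = t_stat ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2))

# Dichotomous exposure: exposed cases, exposed controls,
# unexposed cases, unexposed controls.
a, b, c, d = 197, 777, 318, 834
odds_ratio, lower, upper = woolf_odds_ratio(a, b, c, d)
stat, p_value = chi_square_2x2(a, b, c, d)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print(f"chi-square = {stat:.1f}, p = {p_value:.5f}")

# Ordinal exposure: none, 1-4 times per week, 5 or more times per week.
trend_stat, trend_p = chi_square_trend([318, 60, 137], [834, 224, 553], [0, 1, 2])
print(f"trend chi-square = {trend_stat:.1f}, p = {trend_p:.5f}")
```

Under these assumptions the sketch gives an odds ratio of 0.66 with an interval of 0.54 to 0.81 and a chi-square value of 15.3, in line with the figures quoted in the entry.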
Barbone, F., Austin, H. and Partridge, E. E. 1993: Diet and endometrial cancer: a case-control study. American Journal of Epidemiology 137, 393-403. Breslow, N. E. and Day, N. E. 1980: Statistical methods in cancer research, Vol. I: The analysis of case-control studies. Lyon: International Agency for Research on Cancer. Cooper, C., Inskip, H., Croft, P., Campbell, L., Smith, G., McLaren, M. and Coggon, D. 1998: Individual risk factors for hip osteoarthritis: obesity, hip injury, and physical activity. American Journal of Epidemiology 147, 516-22. Martyn, C., Coggon, D., Inskip, H., Lacey, R. and Young, W. 1997: Aluminium concentrations in drinking water and risk of Alzheimer's disease. Epidemiology 8, 281-6. Schlesselman, J. J. 1982: Case-control studies. Design, conduct and analysis. Oxford: Oxford University Press. Stene, L. C. and Joner, G. 2003: Norwegian Childhood Diabetes Study Group. Use of cod liver oil during the first year of life is associated with lower risk of childhood-onset type 1 diabetes: a large, population-based, case-control study. American Journal of Clinical Nutrition 78, 1128-34. Tappin, D., Brooke, H., Ecob, R. and Gibson, A. 2002: Used infant mattresses and sudden infant death syndrome in Scotland: case-control study. British Medical Journal 325, 1007-12.
categorising continuous variables This is the process of converting a continuous variable such as age into a categorical variable with a number of categories, e.g. 'young' (<40 years), 'middle aged' (40-60 years) and 'old' (>60). The practice is very common in medical research where clinicians appear to have a general preference for categorising individuals (see Altman, 1991). When used to simplify or improve the presentation and description of the data, grouping individuals into categories may often be useful, although the choice of category boundaries and number of categories may not be easy. When, however, the categorical variables created by the process are used in data analysis, rather than the original continuous variables, problems arise (see, for example, Hunter and Schmidt, 1990; Streiner, 2002). Categorisation introduces an extreme form of measurement error: splitting a continuous variable into categories results in lost information and an inevitable loss of power in analysis. The apparent simplicity of the categorical variables and the ability to use proportions and odds ratios, which are more familiar to many clinicians, are unlikely to compensate for the lost power. Retaining the continuous variable and analysing the data using the appropriate statistical methodology will always be a far better strategy. SSE
Altman, D. G. 1991: Categorising continuous variables. British Journal of Cancer 64, 975. Hunter, J. E. and Schmidt, F. L. 1990: Dichotomisation of continuous variables: the implications for meta-analysis. Journal of Applied Psychology 75, 334-49. Streiner, D. L. 2002: Breaking up is hard to do: the heartbreak of dichotomizing continuous data. Canadian Journal of Psychiatry 47, 262-6.
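The loss of information described in this entry can be illustrated with a small simulation. The sketch below is illustrative only: the data are simulated, the sample size, slope and seed are arbitrary assumptions, and the helper `pearson_r` is a hypothetical name. It compares the correlation between an outcome and a continuous predictor with the correlation obtained after the predictor has been split at its median.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2011)  # arbitrary seed, for reproducibility only
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

# Dichotomise the predictor at its sample median.
median_x = sorted(x)[n // 2]
x_split = [1.0 if xi > median_x else 0.0 for xi in x]

r_continuous = pearson_r(x, y)
r_dichotomised = pearson_r(x_split, y)
print(f"continuous predictor:   r = {r_continuous:.3f}")
print(f"median-split predictor: r = {r_dichotomised:.3f}")
```

For a normally distributed predictor, a median split is expected to shrink the correlation by a factor of roughly 0.8, so a larger sample is needed to detect the same relationship.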
causal diagram
See CAUSAL MODELS
causal effect (direct and indirect) This refers to the change in outcome distribution, or some feature thereof, that would arise under a specific intervention. It is commonly formalised using so-called counterfactual or potential outcomes $Y(a)$, which represent the outcome that one would, possibly contrary to fact, have observed for a given subject had the exposure $A$ been set to the value $a$ through some intervention or manipulation. The (average) causal effect of exposure $a$ on the outcome can then be defined as the population-averaged difference $E\{Y(a) - Y(0)\}$ between the counterfactual outcomes under exposure level $a$ and some reference exposure level 0. This is to be contrasted with the more usual expected difference $E(Y \mid A = a) - E(Y \mid A = 0)$, where the expected observed outcome $Y$ is considered over different subpopulations defined by levels of observed exposure $A = a$ and $A = 0$. This may not carry the interpretation of a causal effect when the subgroups of exposed and unexposed subjects are not inherently comparable. When the outcome is dichotomous (with $Y = 1$ encoding 'disease' and $A = 0$ encoding 'no exposure'), a causal effect can alternatively be quantified in terms of the causal relative risk (see RELATIVE RISK AND ODDS RATIO), $P\{Y(a) = 1\}/P\{Y(0) = 1\}$, the causal odds ratio (see RELATIVE RISK AND ODDS RATIO), $\mathrm{odds}\{Y(a) = 1\}/\mathrm{odds}\{Y(0) = 1\}$, or the causal population attributable fraction, $[P(Y = 1) - P\{Y(0) = 1\}]/P(Y = 1)$. The latter expresses by what percentage of diseased subjects the disease could have been prevented/avoided by taking away the exposure (assuming that the exposure can only cause, but never prevent, disease). Causal effects are sometimes defined within strata of given values of pre-exposure covariates $C$, e.g. $E\{Y(a) - Y(0) \mid C\}$, within strata defined by the observed exposure, e.g. $E\{Y - Y(0) \mid A\}$ (in which case one refers to them as 'causal effects in the exposed' or 'treatment effects among the treated'), or within so-called principal strata defined by joint counterfactual outcomes (Robins, 1986, Section 12.2; Frangakis and Rubin, 2002). For instance, in the context of randomised experiments with treatment noncompliance, Frangakis and Rubin (2002) focus on the local average treatment effect (LATE) $E\{Y(1) - Y(0) \mid A(1) = 1, A(0) = 0\}$, where $Y(r)$ and $A(r)$ denote the counterfactual outcome and received treatment under randomised assignment to treatment $r = 0$ or 1. The LATE thus encodes the effect of assignment to treatment 0 versus 1 within the principal stratum $\{A(1) = 1, A(0) = 0\}$ composed of subjects who would take the treatment if assigned to it (i.e. $A(1) = 1$), but not otherwise (i.e. $A(0) = 0$). By extending the previous concepts to a joint exposure $(A, M)$, where $M$ is a mediator or intermediate variable on the causal path from exposure to outcome, definitions of direct and indirect causal effects can be constructed. These express the causal effect that manifests besides or through the change caused in that intermediate variable. Consider, for instance, the Methods for Improving Reproductive Health in Africa trial, which investigated the effect of diaphragm and lubricant gel use ($A$) in reducing infection by HIV ($Y$) among susceptible women (Rosenblum et al., 2009).
Women received intensive condom counselling and provision, and were then randomly assigned to either the active treatment arm ($A = 1$) or not ($A = 0$). Because there was much lower reported use of condoms ($M$) in the intervention arm than in the control arm, special interest lies in the direct effect of assignment to the diaphragm arm, other than through its effect on condom use (Rosenblum et al., 2009). To formalise the concept of a direct causal effect, define for each subject $Y(a, m)$ to be the counterfactual outcome under exposure level $a$ and mediator level $m$, and $M(a)$ to be the counterfactual mediator under exposure level $a$. The controlled direct effect of exposure level $a$ versus reference exposure level 0, controlling for $M$, can then be defined as the expected contrast $E\{Y(a, m) - Y(0, m)\}$ (Robins and Greenland, 1992; Pearl, 2001). It expresses the exposure effect that would be realised if the mediator were controlled at level $m$ uniformly in the population. In the example, with $M$ expressing the percentage of sexual acts where the male condom was used, $E\{Y(1, 0) - Y(0, 0)\}$ expresses the effect of assignment to the diaphragm arm if in truth no male condoms were used. There are a number of limitations to the concept of a controlled direct effect, in view of which alternative definitions have been proposed. First, it is often not realistic to imagine forcing the mediator to be the same for all subjects in the population. Second, indirect effects cannot be defined in a similar manner as controlled direct effects because it is impossible to hold a set of variables fixed, such that the effect of exposure on outcome would circumvent the direct pathway. In particular, the total causal effect, say $E\{Y(a) - Y(0)\}$, minus the controlled direct effect, say $E\{Y(a, m) - Y(0, m)\}$, may not represent an indirect effect, unless the exposure and mediator have linear effects on the outcome (VanderWeele and Vansteelandt, 2009). Both limitations can be overcome by considering so-called natural or pure direct effects (Robins and Greenland, 1992; Pearl, 2001), which are defined as the expected contrast $E\{Y(a, M(0)) - Y(0, M(0))\}$, or total direct effects (Robins and Greenland, 1992), which are defined as $E\{Y(a, M(a)) - Y(0, M(a))\}$. The natural direct effect essentially expresses what would be realised if the exposure was administered, but its effect on the mediator was somehow blocked. In the example, $E\{Y(1, M(0)) - Y(0, M(0))\}$ expresses the effect of assignment to the diaphragm arm as it would have been observed if women had not changed their frequency of male condom use following randomised assignment. The difference between the total causal effect and a pure natural direct effect, $E\{Y(a) - Y(0)\} - E\{Y(a, M(0)) - Y(0)\} = E\{Y(a, M(a)) - Y(a, M(0))\}$, measures an indirect effect as it expresses how much the outcome would change on average if the exposure were controlled at level $a$ but the mediator were changed from level $M(0)$ to $M(a)$. It is termed the total indirect effect (Robins and Greenland, 1992). In the example, $E\{Y(1, M(1)) - Y(1, M(0))\}$ would express the change in HIV incidence that would have been observed for women on the diaphragm arm if they had gone back to their original use of the male condom. Likewise, the difference between the total effect and the total direct effect gives the natural indirect effect $E\{Y(a) - Y(0)\} - E\{Y(a) - Y(0, M(a))\} = E\{Y(0, M(a)) - Y(0, M(0))\}$. This expresses how much the outcome would change on average if the exposure were controlled at level 0 but the mediator were changed from its natural level $M(0)$ to the level $M(a)$, which it would have taken at exposure level $a$. Direct effects are sometimes defined using concepts of principal stratification (Frangakis and Rubin, 2002); for instance, as the exposure effect within the principal stratum of individuals whose mediator level was not affected by the exposure, i.e. $E\{Y(a) - Y(0) \mid M(0) = M(a)\}$. Unlike the foregoing definitions of direct effect, principal stratum direct effects do not conceptualise the possibility of manipulating the mediator. However, they have a more limited utility because of the inability to identify which individuals fall into which principal strata, because the principal strata are sparsely populated in many realistic applications and because they do not correspond to a natural definition of indirect effect (VanderWeele and Vansteelandt, 2009). All foregoing causal effect definitions require the same population to be evaluated under different interventions/exposures. Because of this, they require specialised estimation techniques (see MARGINAL STRUCTURAL MODELS, INVERSE PROBABILITY WEIGHTING) that work under specific untestable assumptions, such as about the exchangeability of subjects under different exposure levels (e.g. assumptions that all confounders of the association between exposure and outcome have been measured). Current estimation techniques for the causal effect $E\{Y(a) - Y(0)\}$ (and likewise all other considered causal effect measures) infer the effect of 'noninvasive' interventions or manipulations that allow for setting the exposure at level $a$ versus 0. Here, 'noninvasive' interventions that set $A$ to the value $a$ are those that have no effect among those for whom exposure level $A = a$ was naturally observed (VanderWeele and Vansteelandt, 2009). The use of counterfactual outcomes makes clear the precise meaning of vague statements such as 'causal effect', 'direct effect' and 'indirect effect', and makes clear, in particular, that there are different definitions of direct effect and indirect effect. Whether estimates of the considered effect measures can effectively be interpreted as 'causal' effects in a specific application depends on whether the required untestable exchangeability assumptions are met. SV/EG
Frangakis, C. and Rubin, D. 2002: Principal stratification in causal inference. Biometrics 58, 21-9. Pearl, J. 2001: Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann, pp. 411-20. Robins, J. M. 1986: A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modelling 7, 1393-512. Robins, J. M. and Greenland, S. 1992: Identifiability and exchangeability for direct and indirect effects. Epidemiology 3, 143-55. Rosenblum, M., Jewell, N. P., van der Laan, M., Shiboski, S., van der Straten, A. and Padian, N. 2009: Analysing direct effects in randomized trials with secondary interventions: an application to human immunodeficiency virus prevention trials. Journal of the Royal Statistical Society, Series A 172, 443-65. VanderWeele, T. J. and Vansteelandt, S. 2009: Conceptual issues concerning mediation, interventions and composition. Statistics and Its Interface 2, 457-68.
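The decompositions of the total effect described in this entry can be checked numerically on a toy population in which every counterfactual is written down explicitly. The sketch below is purely illustrative: the four subjects and all their values are invented, a binary exposure and a binary mediator are assumed, and the function names are hypothetical. In real data only one counterfactual per subject is ever observed, so these quantities cannot be computed this way and must be estimated under the assumptions discussed in the entry.

```python
# Each hypothetical subject: mediator under each exposure, M(0) and M(1),
# and outcome Y(a, m) for every combination of exposure a and mediator m.
subjects = [
    {"M": {0: 0, 1: 1}, "Y": {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 3}},
    {"M": {0: 0, 1: 0}, "Y": {(0, 0): 0, (0, 1): 0, (1, 0): 2, (1, 1): 2}},
    {"M": {0: 1, 1: 1}, "Y": {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 2}},
    {"M": {0: 0, 1: 1}, "Y": {(0, 0): 0, (0, 1): 2, (1, 0): 0, (1, 1): 2}},
]

def mean_outcome(a, mediator_from):
    """E{Y(a, M(mediator_from))}: exposure set to a, with the mediator at
    the level it would naturally take under exposure `mediator_from`."""
    values = [s["Y"][(a, s["M"][mediator_from])] for s in subjects]
    return sum(values) / len(values)

def controlled_direct_effect(m):
    """E{Y(1, m) - Y(0, m)}: mediator fixed at m for every subject."""
    values = [s["Y"][(1, m)] - s["Y"][(0, m)] for s in subjects]
    return sum(values) / len(values)

total_effect = mean_outcome(1, 1) - mean_outcome(0, 0)
natural_direct = mean_outcome(1, 0) - mean_outcome(0, 0)
total_indirect = mean_outcome(1, 1) - mean_outcome(1, 0)
total_direct = mean_outcome(1, 1) - mean_outcome(0, 1)
natural_indirect = mean_outcome(0, 1) - mean_outcome(0, 0)

print("total effect:", total_effect)
print("natural direct + total indirect:", natural_direct + total_indirect)
print("total direct + natural indirect:", total_direct + natural_indirect)
print("controlled direct effects:",
      controlled_direct_effect(0), controlled_direct_effect(1))
```

In this invented population both decompositions add up to the total effect, whereas the controlled direct effect depends on the level at which the mediator is fixed, so subtracting it from the total effect does not give a single well-defined indirect effect.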
causality The two fundamental principles for establishing evidence for a cause and effect relationship are deduction and induction. The first, with its roots in early Greek philosophy, encompasses mathematical reasoning: starting with a premise formulated without reference to the outside world, a general theory is developed through logical reasoning and subsequently confirmed by statistical observation. Induction, contrariwise, aims to establish a general principle by observation of the natural world and seeks to confirm this principle through prediction and further observation. In contrast to deductive reasoning, inductive reasoning proceeds from the particular to the general. While there can be no doubt that much of modern science strives to make progress through inductive methods, the deductive principle has a firmer grounding in logic. As Hume pointed out, simply observing that one event follows another, no matter how many times it occurs, does not establish that one event caused the other. The concept of what a cause is needs to be set in context with the knowledge of the time. In 1855 John Snow established the link between a 'morbid matter' carried in water and acute, often fatal, diarrhoea in London (Snow, 1855). Snow's evidence consisted of temporal statistical associations, geographical associations in relation to the water companies supplying the city and consideration of a plausible route of infection through ingestion of drinking water. It was the assembly of these various strands of evidence that led to the premise that the water source caused the diarrhoea and a subsequent outbreak in Central London gave Snow the chance to establish deductive proof, by removing the Broad Street water pump, with the immediate effect of stopping the epidemic (see HISTORY OF MEDICAL STATISTICS). Snow postulated a cause of cholera 30 years before the bacterium was isolated. As science delves deeper into genetic and molecular mechanisms, we alter our level of definition of causal factors. In reality, few cause-effect associations are simple two-factor problems.
Many of our prominent chronic diseases are multifactorial in nature and many of the factors that contribute to causation are yet to be discovered. Despite Snow's example, much scientific work progresses without deductive proof: empirical knowledge seems to accumulate through methods that may lend support to but do not actually guarantee correct conclusions. Statistical procedures are employed in observational sciences, where a host of unknown sources of error, including MEASUREMENT ERROR and random fluctuations, must be considered before even progressing towards consideration of causality. Formal 'proof' is generally not attainable in observational sciences and it may be fruitless to try to demonstrate causality in a formal way. More often, a number of associations must be considered together to assemble the totality of evidence. The BRADFORD HILL CRITERIA offer guidelines for considering the totality of evidence. Some of these criteria, such as establishing temporality and demonstrating reversibility, are powerful tools in determining causality, whereas others may depend on specific circumstances. Bradford Hill's aim was primarily to provide a more structured approach to informing decisions for preventive action. Other writers have advocated decisions based on Popper's concept of refutation. Refutationists would suggest that science can advance more rapidly by formulating radical hypotheses and discarding erroneous theories than it can by fruitless accumulation of supporting evidence. We could make many statements in favour of a hypothesis and still fail to establish it, but we could take just one example that disproves the hypothesis to conclude it is false (e.g. one black swan is proof against the assertion that all swans are white). However, even the process of refutation depends on observation and is therefore itself not devoid of uncertainty. There remains a crucial role for statistics in any science based on observation: to set probabilistic limits on whether an observed association exists or should be ignored. Many widely accepted statistical practices appear to adhere to deductive reasoning. The NULL HYPOTHESIS is a statement formulated prior to any knowledge of the data and therefore a test of the null is an inference from the hypothetical general population to the sample: 'If there is no association in the general population a result like the one seen in the sample would have probability P.' In the LIKELIHOOD approach to estimation, computations proceed from consideration of the hypothetical value for the parameter to support for that value in the particular data observed. However,
frequentist theory would appear to be embedded in inductive reasoning: we imagine an experiment repeating an infinite number of times, under the assumption that given enough repetitions we will ultimately arrive at the truth. While this reasoning seems logical if conditions remain unaltered, some would argue that if conditions are likely to change in different populations or under the influence of additional variables, then a general principle cannot be asserted on this basis. As with any discipline, our ability to determine meaningful results in statistical analysis depends on the approach we adopt. The discovery of a statistically significant result is grounds for rejecting a null hypothesis, but is not evidence for a previously unformulated hypothesis on the strength of the current data. Hence, dredging data for significant results takes us no nearer to establishing causality and will at best raise possibilities for future investigation. Likewise, if we are seeking 'explanation' in a statistical model, the decision to include a term in the model, guided only by considerations of statistical significance (refutation of the null), would not respect deductive principles (Maclure, 1985). If scientific explanation relies on testing specified prior hypotheses, a more consistent modelling approach would be to include model terms, even if they are not significant in the current data. For further details see Weed (1986). JGW
Maclure, M. 1985: Popperian refutation in epidemiology. American Journal of Epidemiology 121, 343. Snow, J. 1855: On the mode of communication of cholera, 2nd edition. London: Churchill. Weed, D. L. 1986: On the logic of causal inference. American Journal of Epidemiology 123, 965-79.
causal models These models attempt to discover whether an observed association between an exposure (e.g. cigarette smoking) and an outcome (e.g. high blood pressure) arises because the exposure causes the disease or whether it is a spurious association. A spurious association can arise if both the exposure and the disease have one (or more) common cause, e.g. if socioeconomic status is a cause of both high blood pressure and cigarette smoking. There are four major types of causal model (see Greenland and Brumback, 2002) and many definitions of 'causal' in an epidemiological context (see Parascandola and Weed, 2001). One type of causal model is the causal diagram (Greenland and Brumback, 2002; Hernán et al., 2002). Such diagrams link exposure, outcome and other variables by arrows representing direct causal effects. A hypothetical example for smoking and high blood pressure is shown in the first figure.
cauaa. models causal diag18m showing a possible relationship between smoking, high bloodpressure and
socioeconomic status Such diagrams can be drawn rnm prior knowledge of causal mechanisms. Uses include deciding whether adjusting for CXJIIfounding wriables would incRasc or dccn:ase bias. For example. from the filOn:. one would adjUSI for socioeconomic status in analysing the relationship between smoking and hilh blood JRSsun: (as it is a common cause). (For IlIOn: details. sec Heman. el tiL. 2OD2.) A second type 01 causal mocIcl.1he collnlerftldUtlI model. arises from a definilion of CAUSALITY: "Exposure makes a difference in joulComc (or Ihe probabilily of an outcome) when it is pn:scnl. c:ompan:d with when it is absent" (PamscandoJa and Weed. 2(01). Thus the ·coUDterfactuaJ· arises because the dcfinition hypothesises about what ·mia;ht have been· if conditions had been other than those actually o~ served (Maldonado and Gn:cnland. 20(2). 'The deftnition also specifics that all oIher conditions should ha~ Rmained
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ CAUSE-8PECIFICDEATH RATE
i.e. it is only the cxposlR that WII5 (hypothetically) changed. For example. wc c:auld say that smokilll causes high blood pn::ssun:: if blood )R5MR is higherwllc:n a penon smalcs comparailo what it would be if that same penon we~ a DOIISmoker. nte causal effect of smaldDl on blood prasun: for an individual c:auld then be estimated by the diffcn:ac:c between their blood pn:ssun: when they s....cd ancItheirbloocl pJaSlIM iflhey did not smoke. However. only one or Ihese outcomes for eacb person is actually observed. 1'1Ic unobserved quantities an: estimlllCd using subslilUlcs (Maldonado and Greenland, 2002). This could be by using uaexpased individuals 10 estimlllC the oulc:omes if exposed individuals had in fact been unexposed (OR:enland and Brumback. 2(02) ar by imputiDlIhe outcome from observed covarialcs. In a nmdomised mal, we assume that In:alment ancIcontrol pvups an: balancc:cl in all n:spec:ts. cXccpllhlllthe cxpasun: and then:fom the outcome in the expasc:d poup can be used as a substitute far the: outcome in the cannl gmup wen:: it to ha~ bcc:n exposed (Mandolado and Gn=enland,2OO2). 'l11c third type of causal macIcJ elisaaccl in the 2Q02 n:view. the: sujJicienl-colllponenl t:tlUN macIcJ. assumes that the: autcane is aused by sevcnd causes. none of which is ncccssauy and l1IfIk:icnl alone to cause the disease (Gla:nlancl and Brumbac:~ 20(2). Such maclcls me often IIIown as PIE CIWO'S. with cac:b slice IqRSI:nting one of the components or the overall cause. Forcxamplc. high blood pn:ssun: couldbec:aused by the piaCDCe of any 1\\'0 of smokilil. low soc:iocconomic sIatus and beilll CM:IWCight (see the second Iipn:). One elisadvan. .c or such models is thai in cxdcr to explain n:lationsbips such as Ihat of smoking with lung cancer. it is necessary to assume that ";smoking is one elemenl in a sufficient cause and thalthe GIber clemen.. simply ha~ not been identified ycf (Parascandola and Weed. 2001). 
This is clearly a strong assumption to make in many areas of medical statistics. The fourth type of causal model, the STRUCTURAL EQUATION MODEL, may be thought of as a parameterisation of the causal graph (Greenland and Brumback, 2002). Here, each relationship on the graph is expressed as an equation, with each variable occurring as an outcome in only one of the equations. All parameters are estimated simultaneously and can include error terms and underlying LATENT VARIABLES (as unobserved variables) and correlation between any of the variables. Singh-Manoux, Richards and Marmot (2003) used causal diagrams and structural equation models to analyse the causal relationship between leisure activities and cognitive function. The causal diagram showed eight measured variables (including the outcome) and seven unobserved variables, and included correlation between leisure activities entailing low and high cognitive effort. Participation in leisure activities, particularly social or high cognitive effort activities, was positively associated with cognitive function. KT
causal models Sufficient-component cause model showing hypothetical causes of high blood pressure
Greenland, S. and Brumback, B. 2002: An overview of relations among causal modelling methods. International Journal of Epidemiology 31, 1030-7. Hernán, M. A., Hernández-Díaz, S., Werler, M. M. and Mitchell, A. A. 2002: Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. American Journal of Epidemiology 155, 176-84. Maldonado, G. and Greenland, S. 2002: Estimating causal effects. International Journal of Epidemiology 31, 422-9. Parascandola, M. and Weed, D. L. 2001: Causation in epidemiology. Journal of Epidemiology and Community Health 55, 905-12. Singh-Manoux, A., Richards, M. and Marmot, M. 2003: Leisure activities and cognitive function in middle age: evidence from the Whitehall II study. Journal of Epidemiology and Community Health 57, 907-13.
cause-specific death rate
This is a death rate calculated for people dying from a particular disease. All-cause mortality rates provide a summary of the overall mortality of a population, but the distribution of causes of death can vary considerably between populations. Classically, mortality from infectious diseases is higher in countries that are less developed, whereas diseases that are chronic, and primarily affect the elderly, predominate in the developed world. In studying a population's mortality experience, it is therefore necessary to consider the specific causes of death rather than simply the all-cause death rate. The table gives the cause-specific rates for a variety of diseases for different countries. The rates are calculated by dividing the number of deaths from the cause in question during the year by an estimate of the mid-year population. Thus, the Romanian rate for tuberculosis is derived from 2130 (the deaths from tuberculosis occurring in the year 2000), divided by 22 435 000 (the number in the population for that year). The result is then multiplied by 100 000 to give the rate of 9.5 per 100 000 population.
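The calculation described above can be sketched in a few lines of Python. The function name `cause_specific_rate` and its parameter names are illustrative assumptions, not part of the original entry; the figures are the Romanian tuberculosis numbers quoted in the text.

```python
def cause_specific_rate(deaths, mid_year_population, per=100_000):
    """Crude cause-specific death rate per `per` population (illustrative helper)."""
    return deaths / mid_year_population * per


# Romanian tuberculosis deaths in 2000 and the population estimate quoted in the text
romania_tb = cause_specific_rate(deaths=2130, mid_year_population=22_435_000)
print(f"{romania_tb:.1f} per 100 000")
```

Because this is a crude rate, it takes no account of the age structure of the population.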
cause-specific death rate Cause-specific death rates per 100 000 from selected causes from various countries (United Nations, 2002)
Countries: Argentina (1996); Australia (1999); Bahamas (1997); Japan (1999); Panama (1997); Romania (2000); South Africa (1995); United Kingdom (1999); United States (1998)
Columns: Tuberculosis; HIV/AIDS; Lung cancer; Cardiovascular disease; Accidents and violence; All causes
Tuberculosis: 29.5 0.2 0.7 2.1 4.1 9.5 32.2 0.9 0.5
Accidents and violence: 52.2 21.9 28.8 29.7 56.9 64.2 119.4 33.4 55.6
All causes: 863.9 337.9 26t.1 740.1 418.7 1140.3 Ci06.2 1062.3 163.9
HIV/AIDS: 23.2 0.4 50.2 0.0 15.6 2.2 13.2 0.3 5.0
The table shows that accidents and violence are high in South Africa, whereas cardiovascular disease is of concern in Romania. It must be noted that these are crude rates and the different age structures of the populations have not been considered; the comparisons can therefore be misleading (see AGE-SPECIFIC RATES).
Cause-specific death rates are published by many countries and often are presented as age- and sex-specific rates for each cause. The rates are presented by cause of death as classified by the International Classification of Diseases (World Health Organisation, 1992). Regular updates to the classification are required to account for the changing definitions of disease and, in particular, to recognise new diseases. For example, acquired immunodeficiency syndrome (AIDS) did not feature in the 9th revision but has been included in the 10th. For analysis of disease time trends, 'bridging' across the revisions can present some difficulties due to the changing definitions. Cause-specific rates can also be obtained for new cases of specific diseases. Many countries now have cancer registries that publish national and/or regional cancer incidence rates. These are collated internationally by the International Agency for Research on Cancer (Parkin et al., 2003), which publishes age- and sex-specific incidence rates for various types of cancer from many countries across the world. Similarly, rates for over 200 conditions have been compiled by the World Health Organisation (Murray and Lopez, 1996). HI
Murray, C. J. L. and Lopez, A. D. 1996: Global health statistics: a compendium of incidence, prevalence and mortality estimates for over 200 conditions. Harvard: World Health Organisation. Parkin, D. M., Whelan, S., Ferlay, J., Teppo, L. and Thomas, D. B. 2003: Cancer incidence in five continents, Vol. VIII. Lyon: IARC Scientific Publications. United Nations 2002: 2000 Demographic yearbook. New York: United Nations. World Health Organisation 1992: International statistical classification of diseases and related health problems, 10th revision. Geneva: World Health Organisation.
Lung cancer: 17.9 3.6 20.6 6.0 37.9 9.7 58.3 51.1
Cardiovascular disease: 297.3 135.2 11.9 122.2 118.3 701.9 103.6 426.2 349.3
censored observations This is a distinguishing characteristic of time-to-event data (see SURVIVAL ANALYSIS). Censored observations contain only partially observed information about the time to the event of interest; i.e. the exact time of the event is unknown as it may not yet have occurred or be known to have occurred. For example, in a study of time to recurrence of a particular medical condition, say leukaemia after 'successful' bone marrow transplantation, some patients may not experience a recurrence by the end of the study, some may drop out or be lost to follow-up, some may experience the event of interest during successive medical visits, while yet others may experience a 'competing' event, say death, which prevents further follow-up of these subjects. There are different forms of censoring. The most common in medical studies is right censoring. This occurs when by the end of a subject's follow-up the event of interest has not been observed. In this situation all that is known is that the true unobserved 'survival' time exceeds the observed censored time. Most of the examples already seen are of this form.
Left censoring occurs when the true survival time of a subject is less than the actual time observed. For example, in the leukaemia study just examined, subjects may relapse before their first medical visit. Thus, only the incomplete information that the true recurrence times are less than the times to their first medical visit is available. Both forms of censoring are special cases of interval censoring, where a subject is known only to have experienced the event within a specific time interval. Furthermore, the censoring mechanism can be independent or dependent. In the former, the true survival time of a subject, whose observation has been censored, is independent of the mechanism that brought about this censoring. Alternatively, the survival prospects for a homogeneous group of subjects are the same in those censored and in those continued to be followed up. Dependent censoring can occur when subjects are withdrawn from a study because of their apparent high or low risk of experiencing the event. This type of censoring makes standard survival analysis techniques invalid. Thus it is important when censoring occurs to collect as much information as possible on those subjects with censored observations, in order to decide whether the mechanisms behind the censoring types encountered are independent or dependent. Censoring should not be 'ignored' as valuable information is contained in those subjects with censored times. Therefore, any analysis performed must take account of censoring to make valid inferences. For further details see Collett (2003).
BT
Collett, D. 2003: Modelling survival data in medical research, 2nd edition. London: Chapman & Hall/CRC.
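As a minimal sketch of how an analysis can take account of right censoring, the Python below computes a product-limit (Kaplan-Meier) estimate of the survival function. The Kaplan-Meier estimator is not named in this entry and is used here only as a familiar illustration, assuming independent censoring; the function name `kaplan_meier` and the five follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if right censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Censored subjects still count as "at risk" up to their censoring time.
        at_risk = sum(1 for u in times if u >= t)
        observed = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - observed / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times (months); 0 marks a right-censored observation
times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.2f}")
```

The censored subjects contribute to the number at risk until they leave the study, which is how the partial information they carry enters the estimate rather than being discarded.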
central limit theorem See NORMAL DISTRIBUTION
Central Office for Research Ethics Committees (COREC) See ETHICAL REVIEW COMMITTEES
chi-square distribution This is the PROBABILITY DISTRIBUTION of the sum of the squares of independent normally distributed random variables (denoted $\chi^2$ and sometimes referred to as chi-squared). If we have a variable, $X_1$, that has a NORMAL DISTRIBUTION with MEAN 0 and VARIANCE 1, then the square of $X_1$ will have a chi-square distribution with one DEGREE OF FREEDOM, often denoted $\chi^2(1)$ or $\chi^2_1$ depending on the text. Similarly, if we have n independent observations, each from a normal distribution with mean 0 and variance 1, say $X_1, X_2, \ldots, X_n$, then the sum $X_1^2 + X_2^2 + \cdots + X_n^2$ will have a chi-square distribution with n degrees of freedom, here denoted $\chi^2_n$. The chi-square distribution can arise from many other circumstances, but this definition makes it clear that since it is the distribution of a sum of squared numbers, the distribution's density function can only be positive for nonnegative numbers. It can also be seen that a variable taking a chi-square distribution with, for example, seven degrees of freedom is the sum of seven independent variables each with a $\chi^2_1$ distribution. Since the $\chi^2_1$ distribution has a mean of 1 and variance of 2, this result tells us that the $\chi^2_n$ distribution will have a mean of n and variance of 2n. In addition to these relationships, we note that the $\chi^2_2$ distribution is identical to the EXPONENTIAL DISTRIBUTION with parameter 0.5 and that, in general, the $\chi^2_n$ distribution is identical to the GAMMA DISTRIBUTION, with parameters n/2 and 2. As n becomes large, the $\chi^2_n$ distribution is better approximated by a normal distribution (with mean n and variance 2n). It has a slightly more complicated relationship with the F-DISTRIBUTION, for which it is the limiting distribution after scaling.
The shapes that the distribution can take and some of these relationships are illustrated in the figure. For further details of the relationships with other distributions, see Leemis (1986). As can be seen, with one or two degrees of freedom the mode of the density function is at zero and the density is strictly decreasing with the value of the random variable. With more degrees of freedom, the density function takes the 'whale' shape, which is more usually associated with the chi-square distribution. The chi-square distribution is most obviously used in the CHI-SQUARE TEST, but also appears when testing hypotheses about variances, when performing LIKELIHOOD RATIO tests and in some NONPARAMETRIC METHODS (e.g. the KRUSKAL-WALLIS TEST).
Here the central chi-square distribution has been discussed. There is also a non-central chi-square distribution that arises when the normal random variables that define the distribution have a nonzero mean. For further details see Altman (1991). AGL
Altman, D. G. 1991: Practical statistics for medical research. Boca Raton: CRC Press/Chapman & Hall. Leemis, L. M. 1986: Relationships among common univariate distributions. The American Statistician 40, 2, 143-6.
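Two of the relationships stated in this entry can be checked numerically with a short Python sketch: that the chi-square density with two degrees of freedom coincides with the exponential density with parameter 0.5, and that a chi-square variable with n degrees of freedom has mean n and variance 2n. The density used is the standard gamma form with parameters n/2 and 2; the helper names and the crude midpoint integration are illustrative assumptions, not a recommended numerical method.

```python
import math


def chi2_pdf(x, n):
    """Chi-square density with n degrees of freedom, for x > 0."""
    k = n / 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))


def exponential_pdf(x, rate):
    """Exponential density with the given rate parameter."""
    return rate * math.exp(-rate * x)


def chi2_moments(n, upper=200.0, step=0.01):
    """Mean and variance by midpoint integration (adequate for n >= 2 only)."""
    mean = second = 0.0
    for i in range(int(upper / step)):
        x = (i + 0.5) * step
        weight = chi2_pdf(x, n) * step
        mean += x * weight
        second += x * x * weight
    return mean, second - mean ** 2


# Two degrees of freedom: same density as the exponential with parameter 0.5
largest_gap = max(abs(chi2_pdf(x, 2) - exponential_pdf(x, 0.5))
                  for x in (0.1, 1.0, 2.5, 7.0))

# Six degrees of freedom: mean should be close to 6 and variance close to 12
mean6, var6 = chi2_moments(6)
print(largest_gap, mean6, var6)
```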
chi-square test This is a statistical significance test used to assess a variety of hypotheses of categorical data, particularly when interest centres on the distribution of observations across categories. For example, a study might record the ethnic group of individuals and interest might centre on the proportion of individuals in each ethnic group. Categorical data are often presented in FREQUENCY TABLES (or CONTINGENCY TABLES), where each cell of the table shows the number of observations (counts) for a particular combination of categories for the variables of interest. Chi-square tests are a form of HYPOTHESIS TEST where the hypothesis specifies the expected distribution of observations across the cells of the frequency table (the expected number of individuals in each cell). The chi-square test statistic, $\chi^2$, provides a measure of how much the counts recorded in the study (the 'observed counts') deviate from counts predicted by the hypothesis (the 'expected counts'). A small P-VALUE is evidence that the hypothesis is false. (The formula for the chi-square test statistic is given in the entry for CONTINGENCY TABLES.)
Freeman et al. (2002) provide an example of the use of chi-square tests. This re-audit of hip fracture in East Anglia involved seven hospitals and analyses were performed to establish whether there were any differences between hospitals in terms of patient demographics. One of the variables tested was patient gender. In this case, the hypothesis to test was that the distribution of patients across male and female categories was the same at all of the hospitals. In other words, the hypothesis was that the proportion of hip fracture patients who were male was the same at all of the hospitals. The result of a chi-square test was $\chi^2 = 1.50$ and P = 0.48, indicating that there was no evidence of a difference between hospitals in the proportion of hip fracture patients who were male.
chi-square distribution Illustrating the form of the chi-square probability density function and its relationships with other distributions: (a) chi-square distribution with one degree of freedom; (b) the F-distribution with one and one thousand degrees of freedom; (c) chi-square distribution with two degrees of freedom (equivalently the exponential distribution with a parameter of 0.5); (d) chi-square distribution with six degrees of freedom; (e) chi-square distribution with ten thousand degrees of freedom; (f) normal distribution with mean of ten thousand and variance of twenty thousand (horizontal axis in each panel: Value)
There are three common types of chi-square test: the goodness-of-fit test, the test for independence and the test for homogeneity. The chi-square goodness-of-fit test is appropriate when there is a single group of subjects and a single variable of
interest. The hypothesis in this sort of test specifies a precise distribution of subjects across categories. For example, this test could be used to determine whether the ethnic mix of a patient group from a study is different from the (known) ethnic mix in the general population. The chi-square test for independence is appropriate when there is a single group of subjects and two variables of interest. In this case, the hypothesis is that the distribution of subjects across categories in one variable is the same for (i.e. independent of) the categories of the other variable. For example, this test could be used to determine whether the incidence of a disease is related to the ethnic mix of a patient group - in other words, whether the proportion of patients with a disease is the same for all ethnic groups. The chi-square test for homogeneity is appropriate when there are several groups of subjects and a single variable of interest. This test is used to determine whether the distribution of a variable across categories is the same for all groups. The study by Freeman et al. (2002) provides an example of this sort of test where the groups are hospitals and interest centres on whether the distribution of male and female patients is the same at all hospitals. Each of these tests is only appropriate when there is no natural ordering to the categories (i.e. the data are nominal). If the categories are ordered (i.e. the data are ordinal), e.g. when the variable of interest is age group, then other analyses should be used, such as the chi-square test for trend (see, for example, Altman, 1991). The chi-square test is a parametric test: it is only a valid test when the expected number of observations in each cell is not too small. (A rough rule of thumb is that the expected number of observations in each cell should be at least 5.
However, a less stringent criterion is commonly used and has been shown to yield satisfactory results: namely no more than 25 % of cells should have an expected cell count of less than 5, provided none is less than 1.) The test can become invalid when the expected number of observations in a cell is very small and/or when the total number of observations is small. In such cases it can be prudent to use a continuity correction (see YATES' CORRECTION) or, even better, employ a NONPARAMETRIC or EXACT METHOD alternative (e.g. FISHER'S EXACT TEST). The sampling method should be considered when determining what sort of chi-square test to use. It is only valid to test a hypothesis that makes sense in relation to the way in which the data were gathered. In the hospital example, there were several independent groups of patients (from the seven hospitals) and a single variable of interest (gender). In this case, it is valid to perform a test for homogeneity, but it is not valid to perform a test for independence. Chi-square tests only provide an overall test of a hypothesis. If a chi-square test is significant, it may be necessary to perform further post hoc tests in order to determine in more detail where significant deviations from the hypothesis exist.
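A test for homogeneity of the kind described above can be sketched in Python as follows. The counts are hypothetical (three hospitals, not the Freeman et al. data), the statistic is the usual Pearson form, the sum over cells of (observed - expected)^2 / expected, and the function names are illustrative assumptions. The P-value line relies on the fact that a chi-square variable with two degrees of freedom is exponential with parameter 0.5, so it applies only to this 3 x 2 layout.

```python
import math


def expected_counts(table):
    """Expected cell counts under homogeneity: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def expected_counts_adequate(table):
    """Less stringent rule quoted in the text: at most 25 % of expected counts
    below 5, and none below 1."""
    cells = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in cells if e < 5)
    return min(cells) >= 1 and small / len(cells) <= 0.25


# Hypothetical counts: one row per hospital, columns are (male, female)
counts = [[20, 80], [25, 75], [30, 70]]
statistic = chi_square_statistic(counts)
degrees_of_freedom = (len(counts) - 1) * (len(counts[0]) - 1)
# With 2 degrees of freedom the upper tail probability is exp(-x / 2)
p_value = math.exp(-statistic / 2)
print(statistic, degrees_of_freedom, p_value, expected_counts_adequate(counts))
```

For tables with other degrees of freedom the tail probability would need a general chi-square distribution function, which is left out of this sketch.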
In the example from Freeman et al. (2002), had the test been significant, this would have indicated that the proportion of hip fracture patients who were male was not the same at all of the hospitals in the audit. This does not specify precisely those hospitals that were significantly different from each other. In this case, a t-test of the difference between proportions for each pair of hospitals could be used, but it should be noted that such post hoc tests would be subject to the usual problems of multiple comparisons (see MULTIPLE COMPARISON PROCEDURES).
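To give a sense of the scale of the multiple comparisons problem in this example, the Python sketch below counts the pairwise comparisons among seven hospitals and applies a Bonferroni adjustment. The Bonferroni method is not mentioned in this entry and is used here only as one familiar illustration; the hospital labels are placeholders, and the familywise error figure assumes independent tests, which pairwise comparisons sharing hospitals are not.

```python
from itertools import combinations

# Placeholder labels for the seven hospitals in the audit
hospitals = [f"hospital_{i}" for i in range(1, 8)]
pairs = list(combinations(hospitals, 2))

alpha = 0.05
# Bonferroni: test each pair at alpha divided by the number of comparisons
bonferroni_alpha = alpha / len(pairs)
# Chance of at least one false positive if every pair were tested at alpha
# and the tests were independent (an assumption made for illustration only)
familywise_error = 1 - (1 - alpha) ** len(pairs)

print(len(pairs), bonferroni_alpha, familywise_error)
```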
When a variable has only two categories, e.g. gender, the data may be analysed in terms of proportions rather than frequencies. This situation may be analysed using logistic regression, which allows more complex hypotheses to be tested. An alternative and more flexible analysis for any number of variables with any number of categories is POISSON REGRESSION.
Chi-square tests are described in most introductory statistics texts (e.g. Altman, 1991; Wild and Seber, 2000; Agresti, 2002). PM
Agresti, A. 2002: Categorical data analysis, 2nd edition. New York: John Wiley & Sons, Inc. Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall. Freeman, C., Todd, C., Camilleri-Ferrante, C., Laxton, C., Murrell, P., Palmer, C. R., Parker, M., Payne, B. and Rushton, N. 2002: Quality improvement for patients with hip fracture: experience from a multi-site audit. Quality and Safety in Health Care 11, 3, 239-45. Wild, C. J. and Seber, G. A. 2000: Chance encounters: a first course in data analysis and inference. New York: John Wiley & Sons, Inc.
chi-square test for trend See CASE-CONTROL STUDIES
Christmas tree adjustment See INTERIM ANALYSIS
classification and regression trees (CART) See TREE-STRUCTURED METHODS
classification function See DISCRIMINANT FUNCTION ANALYSIS
clinical equipoise See ETHICS AND CLINICAL TRIALS
clinical trial protocols See PROTOCOLS FOR CLINICAL TRIALS
clinical trials Also known as randomised controlled trials (RCTs), these are studies of a medical intervention in which the allocation of patients to the various experimental groups, at least one of which is a control group, occurs by an aleatory, or chance, mechanism. Such a study has a number of essential features in addition to randomisation. It is hypothesis-driven with an unambiguous ENDPOINT assessed in a way that assures unbiased measurement; it has secondary
endpoints that add credibility and interpretability to the primary outcome; it defines its study group in a way that allows logical inference to some definable population; it uses an ethically and scientifically defensible control group; and a clearly written protocol governs its procedures. Because the experimental units are humans, these trials require a formal process of informed consent as well as assurance that the safety of the participants is monitored during the course of the study. Randomised controlled trials may enrol tens, hundreds or even thousands of participants. While the general structure of the design of such a trial is independent of its size, the number of participants enrolled affects many aspects of the conduct of a trial. Small single-centre trials tend to collect a lot of data. Multicentre trials typically collect somewhat fewer data on each patient, while the methods of collection tend to be highly structured to ensure comparability across centres. A so-called 'large simple trial' recruits thousands of people and collects a parsimonious amount of data directed at a few pointed questions (Yusuf, Collins and Peto, 1984). A classification of trials of drugs or biologics relevant in the regulatory setting categorises them according to their phase. PHASE I TRIALS, the first trials of the product in humans, aim to gain a preliminary understanding of the safety of the product and to select doses for further study. PHASE II TRIALS define the dose more precisely and perhaps collect data germane to a preliminary glimpse at efficacy, often through the use of surrogate endpoints or biologic markers. In many therapeutic areas, Phase I and II trials are not randomised, and sometimes not controlled, because their goals can be achieved with simpler designs. Confirmatory, or PHASE III TRIALS, aim to test the efficacy of the product. Typically, a Phase III trial is randomised. Trials performed after the product has been approved for licensing,
whether they are randomised or not, are sometimes called PHASE IV TRIALS. This section touches on the early history of the modern randomised controlled trial, discusses its salient features and describes some of its limitations. Several textbooks provide a thorough introduction to these trials (Pocock, 1997; Friedman, Furberg and DeMets, 1991; Meinert, 1998). The earliest modern randomised clinical trial studied streptomycin, the first truly active drug for the treatment of tuberculosis. Streptomycin, discovered in 1944, had shown very promising results in uncontrolled pilot studies in the USA. At the end of the Second World War, limited supplies of the drug were made available in Great Britain for clinical use. Because of the belief that the drug had promising therapeutic activity, a portion of the available supply was reserved for patients with the two most lethal forms of tuberculosis: meningeal and miliary disease. The rest could be used for the large majority of patients, who had pulmonary disease. Since it was manifestly impossible to treat everyone with pulmonary tuberculosis, the Medical Research Council decided that
the best use of the small amount available would be to study its effects in a controlled setting. The study that followed (Marshall et al., 1948) was a multicentre controlled trial comparing streptomycin to standard bed rest. Random numbers placed in sealed envelopes governed the treatment assignment of the 109 patients entered. Patients were eligible if they had 'acute progressive bilateral pulmonary tuberculosis of presumably recent origin, bacteriologically proved, unsuitable for collapse therapy, age group 15 to 25 (later extended to 30)'. The narrow age group was chosen to limit the number of eligible patients. Physicians responsible for evaluating radiological change with treatment were unaware of the treatment assignment. Results showed a clear benefit of streptomycin; the published report refers to a few of the observed treatment differences as 'statistically significant'. Two anonymous editorials accompanied the publication of this trial. One dealt with the implications of the results for tuberculosis therapy. The second, entitled 'The controlled therapeutic trial' (Anon, 1948), commented on the noteworthy aspects of this study, including the precise definition of the patient population, the advantages of multicentre collaboration to assure adequate numbers of study subjects, the need for screening potential subjects so that they conformed to the eligibility requirements and the use of the 'ingenious system of sealed envelopes' to ensure that advance knowledge of the treatment assignment would not influence the decision about patient eligibility. Within six years of publication of the streptomycin trial, another trial dramatically exhibited the unprecedented power of the new methodology. The field trial of the newly developed Salk vaccine for the prevention of poliomyelitis was an enormous effort. Centrally coordinated at the University of Michigan by a group convened for that purpose,
it involved public health departments in 44 of the 48 states of the USA. Within about a 6-week period, from 26 April to 15 June 1954, 402 000 children received either the Salk vaccine or placebo by random assignment. The very large numbers were required because of the relatively low attack rates of the poliovirus in normal populations of children. The trial showed that the vaccine effectively reduced the incidence of polio and, by providing rough estimates of effectiveness for certain subgroups, suggested that the effectiveness of vaccination varied somewhat according to the primary manifestations of the disease (bulbospinal versus spinal) and according to the type of virus recovered. The comprehensive report describing this heroic effort and its results (Francis et al., 1955) was published less than 2 years after the National Foundation for Infantile Paralysis first announced its decision to sponsor a formal trial of the vaccine. Since then, use of the randomised prospective methodology has grown. Bolstered by subsequent developments in statistical methodology, the randomised clinical trial is
widely acknowledged as the 'gold standard' of evidence for evaluating therapies. However, because the randomised trial can only be employed to address a small fraction of the unanswered questions relating to medical interventions, a variety of methodologies - experimental and observational, prospective and retrospective - have been and continue to be widely used. The statistical framework of a randomised clinical trial involves two sets of basic ideas: TYPE I ERROR rate, P-VALUE and validity; POWER and precision. The Type I error rate is the probability that if the treatments under study do not differ (i.e. the NULL HYPOTHESIS is true), the study will show a statistically significant difference between them. The P-value is the probability under the null hypothesis that data would show by chance a difference at least as large as the difference observed. Validity in clinical trials refers not to the correctness of the answer in any particular trial but to the expectation that, under the null hypothesis, the data would behave in a way consistent with the prespecified Type I error rate and that if the sample size were large enough, the estimated treatment effect would be the true effect. Power, by the same token, is the probability that the study will show a statistically significant effect of treatment if the true effect of treatment is not zero. Power is then a function of the true effect: the larger the effect, the higher the power. Precision is a measure of the variability of the estimated effect of treatment. The higher the sample size and the lower the underlying variability of the measurements, the higher the precision. Results from studies of interventions may help physicians decide on the best therapeutic option for particular patients. They permit regulatory agencies to approve products for marketing and widespread commercial distribution. Increasingly, governments, insurance companies and managed care organisations use such studies to help decide which therapies to reimburse.
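The dependence of power on the true effect, the variability and the sample size can be illustrated with a small Python sketch. It assumes a two-sided z-test comparing the means of two equal-sized groups with a known common standard deviation, which is a simplification chosen for illustration rather than a design taken from this entry; the function name and the numerical inputs are hypothetical.

```python
from statistics import NormalDist


def power_two_sample_z(effect, sd, n_per_group, alpha=0.05):
    """Power of a two-sided z-test comparing two group means (known common SD)."""
    standard_normal = NormalDist()
    critical = standard_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in means: smaller SE means higher precision
    standard_error = sd * (2 / n_per_group) ** 0.5
    shift = effect / standard_error
    # Probability that the test statistic falls in either rejection region
    return standard_normal.cdf(shift - critical) + standard_normal.cdf(-shift - critical)


# With no true effect the rejection probability is the Type I error rate
type_one_error = power_two_sample_z(effect=0, sd=10, n_per_group=64)
# Larger true effects give higher power for the same design
powers = [power_two_sample_z(effect, sd=10, n_per_group=64) for effect in (2, 5, 8)]
print(type_one_error, powers)
```

Setting the effect to zero recovers the prespecified Type I error rate, which is the sense in which power and the Type I error rate are two points on the same curve.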
The use of formal clinical data as the basis for these decisions rests on two obvious assumptions. The first is that the studies that form the database have in fact yielded correct results; i.e. any differences in outcome can with confidence be attributed to differences in the therapy administered. To the extent that studies possess this quality, they are said to have internal validity. The second is that the study results are, to some degree at least, generalisable to a population of subjects that is broader than the study population from which the data are derived. Studies having this property are said to have external validity and the broadness of a study's external validity is one critical measure of the study's overall importance. A study lacking internal validity has no redeeming value. The mere presence of internal validity, however, does not guarantee its general usefulness if the study has very limited external validity. For example, selecting a very tightly defined study population that does not represent the larger
universe of patients with the same condition may render the results at best narrowly applicable. Alternatively, perhaps the intervention under study is of such technical complexity that it is only available in a few selected centres and cannot be applied broadly. Examples of such problems have occurred in virtually every field of medicine. The major barriers to internal validity relate mainly to the extent to which three major sources of confusion are permitted to interfere with unambiguous inference: bias, confounding and chance (Hennekens and Buring, 1987). Used in the context of analytic clinical studies, 'bias' does not connote moral opprobrium but simply refers to any systematic error in the design or execution of a study that distorts the true relationship between intervention and outcome. Bias can originate with either the investigators or the study subjects and is of several types. Sometimes the very manner of selecting a study's participants introduces bias (selection bias): this is a particular problem, for example, in CASE-CONTROL STUDIES when the selection of cases, or agreement of the patient to participate, is not independent of the chance that the patient will have been exposed to the intervention of interest. In retrospective studies, the ability of patients to recall interventions in the past may be influenced by whether or not they have experienced certain medical outcomes (RECALL BIAS). The history obtained from a patient in any study may be skewed by knowledge on the part of either the interviewer (observer bias) or the study subject (subject bias) of the nature of the study or what interventions have occurred or what outcomes they have experienced. In prospective studies, losses to follow-up often occur; nonrandom losses that occur differentially in one group or the other may introduce serious bias. By far the best insurance against the various types of bias in clinical studies is the use of a prospective,
randomised study design that keeps both the investigators and the study subjects unaware of which intervention the individual subjects are receiving (the so-called double-blind or double-masked design). Certain study designs, such as those that compare a surgical to a medical intervention or those where the side-effects of treatment frequently reveal the nature of the therapy, preclude blinding of either subjects or investigators. These cases require the most objective endpoints available (e.g. death from any cause, if this is an appropriate measure of treatment effectiveness). Where less objective endpoints are the most meaningful medically, evaluators who are blind to the therapy received should assess them. In designs that are not prospective and randomised, minimisation of bias poses a major challenge because there are no good analytic tools to correct for it. Once bias is present, the extent to which it has been eliminated by sound design and conduct is always open to question. The primary advantage of a well-conducted randomised clinical trial is that it removes many sources of bias.
Confounding refers to the distortion of the association between intervention and outcome by the association of another factor (the confounder) with both. A factor A is a confounder of an intervention I and an effect E if A has an effect on E independently of I and the use of I depends in some way on A. For example, if one were to try to compare two regimens for leukaemia (say, a vigorous and a more gentle one) by reviewing past series of patients treated with each, one might find that the aggressive regimen performed much better than the other. A more detailed analysis, however, might reveal that the aggressive regimen was used preferentially in younger patients, while the gentle treatment was reserved for older patients. Age would confound any attempt to compare the two regimens in this simple manner, since it is well known that younger leukaemia patients have a better prognosis than elderly ones when treated with virtually any regimen. Note that, in contrast to bias, confounding is nobody's fault. It does not represent errors of commission or omission, but rather is a natural consequence of the often complex relationships among the many factors that determine clinical outcome following an intervention (or, more generally, following exposures of any kind). In OBSERVATIONAL STUDIES, when a variable is suspected of being a source of confounding, one attempts to correct for the confounding in either the design or the analysis. A prospective randomised design provides the strongest protection against confounding because randomisation tends to equalise the distribution of potential or actual confounders among the various treatment arms. Randomisation does this with both known and unknown confounders. While statistical methods are available for adjusting for known confounders, nothing other than randomisation can even approach the control of unknown confounders.
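The leukaemia example can be made concrete with a small calculation. The sketch below uses entirely hypothetical counts (the names `Stratum` and `series` and every number are assumptions chosen for illustration, not data from any study) in which the two regimens perform identically within each age group, yet the crude comparison favours the aggressive regimen because it was given mostly to younger patients.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    age_group: str
    regimen: str
    patients: int
    remissions: int


# Hypothetical retrospective series: the aggressive regimen was used
# preferentially in younger patients, the gentle one in older patients.
series = [
    Stratum("young", "aggressive", 80, 60),
    Stratum("old", "aggressive", 20, 8),
    Stratum("young", "gentle", 20, 15),
    Stratum("old", "gentle", 80, 32),
]


def remission_rate(rows):
    """Pooled remission rate over a list of strata."""
    return sum(r.remissions for r in rows) / sum(r.patients for r in rows)


def crude_rate(regimen):
    """Remission rate for a regimen, ignoring age."""
    return remission_rate([r for r in series if r.regimen == regimen])


def stratum_rate(regimen, age_group):
    """Remission rate for a regimen within one age group."""
    return remission_rate(
        [r for r in series if r.regimen == regimen and r.age_group == age_group]
    )


# Crude comparison: the aggressive regimen looks better.
crude_difference = crude_rate("aggressive") - crude_rate("gentle")

# Age-specific comparisons: no difference within either age group.
young_difference = stratum_rate("aggressive", "young") - stratum_rate("gentle", "young")
old_difference = stratum_rate("aggressive", "old") - stratum_rate("gentle", "old")

print(f"crude difference in remission rate: {crude_difference:.2f}")
print(f"difference among young patients:    {young_difference:.2f}")
print(f"difference among old patients:      {old_difference:.2f}")
```

Stratifying by a known confounder in this way is one of the analytic adjustments mentioned in the text; it offers no help with confounders that were never measured.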
The very nature of clinical investigation permits the play of chance to deal the investigator a misleading answer. The reliability of a result is in part a function of the number of patients studied: increasing the sample size of any randomised study, no matter what the design, will decrease the probability that the patients studied are peculiar in some identified or unidentified manner. Moreover, an inadequate sample size clearly increases the likelihood that chance alone can deal a false or misleading result. A trial that is too small may end up showing a difference where none really is present or, more commonly, may fail to show a difference where a medically important one really does exist. The statistical procedure that evaluates the degree to which the observed result is consistent with chance is called a test of statistical significance. The great virtue of randomisation is that it removes selection bias from the allocation of patients to therapy and that it tends to equalise baseline patient characteristics (actual
or potential confounders) in the various arms of the study whether these characteristics are known or unknown. Thus, if the sample size is sufficiently large, any statistically significant differences in outcome can be attributed to the differences in therapy with much greater confidence than with other methodologies. Finally, randomised treatment assignment provides theoretical support for the inference that permits calculation of the probability that the observed differences might have arisen by chance (the P-value). The conclusions of a clinical trial are usually stated in probabilistic terms; therefore, even when a trial has shown a statistically significant difference between two treatments, there remains the possibility that the results might have been due to chance. If, however, several independent trials show a consistent effect, the probability that the result can be ascribed to chance is enormously reduced. In such a case, chance plays a far smaller role in interfering with a strong conclusion than it does in most other aspects of daily life.

Various features of the randomised controlled trial, when taken together, form the most reliable route towards the documentation of a causal relation between an intervention and a clinical outcome. The prospective collection of data, with suitable measures to ensure correctness and completeness, serves to minimise MISSING DATA, misclassified data or erroneous data elements. The use of blinding, whenever feasible, minimises the presence of observer or subject bias. The use of a PLACEBO, when medically appropriate, allows isolation of the true effects of the test therapy from those attendant on the therapeutic setting in general. The contemporaneous relationship of test and control groups eliminates secular changes in patient selection or ancillary therapy that might affect results. Finally, as already described, the use of randomisation tends to equalise the distribution of confounding variables, known and unknown,
among the intervention options and provides a formal basis for the application of statistical inference to the data.

Of course, a randomised controlled trial is not the only route to truth. Few would quibble with the claim that appendectomy cures acute appendicitis, that penicillin cures pneumococcal pneumonia, that radiotherapy cures early Hodgkin's disease or that the use of parachutes saves people who jump from planes (Smith and Pell, 2003), although no randomised data exist to support any one of these claims. In the presence of striking clinical effects, formal testing of a clinical hypothesis with a randomised trial may be unnecessary. Neither does a randomised trial necessarily always give the right answer. Like any other scientific experiment, a clinical trial is fallible. Careless data collection, inaccurate observations, inadequate sample size, faulty techniques of inference or simply the play of chance can result in misleading or frankly erroneous conclusions.
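To give a rough sense of how an inadequate sample size lets chance obscure a real difference, the following sketch approximates the power of a two-arm trial comparing two event proportions. The event rates (30% versus 20%), the sample sizes and the function name `approximate_power` are illustrative assumptions, and the formula is the usual normal approximation rather than an exact calculation.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(p_control, p_test, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the normal approximation and ignores the negligible chance of
    rejecting in the wrong direction.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_test) / 2
    # Standard error under the null hypothesis of equal proportions.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    # Standard error under the assumed true proportions.
    se_alt = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_test * (1 - p_test) / n_per_arm
    )
    delta = abs(p_control - p_test)
    return standard_normal.cdf((delta - z_alpha * se_null) / se_alt)


# Assumed scenario: a true reduction in event rate from 30% to 20%.
power_by_size = {n: approximate_power(0.30, 0.20, n) for n in (50, 200, 500)}

for n, power in power_by_size.items():
    print(f"{n:>4} patients per arm: approximate power {power:.2f}")
```

Under these assumptions the smallest trial would more often than not fail to detect a difference that really exists, while the largest would detect it in the great majority of repetitions.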
Adherence to high standards of design, conduct and analysis, as best one can judge from the published report, tends to bolster the credibility of the trial. The larger the difference in outcomes between control and test therapies, the larger the sample size (or in trials that count events, the more events), the smaller the P-value of the comparison and the more precisely estimated the differences, the more credible the results.

In randomised controlled trials, as in other experiments, the design should address the question at hand. A clear medical question leads to a crisp design, while a fuzzy purpose may lead to an inadequate amount of data, concentration on irrelevant details and failure to answer useful questions. In practice, many trials have foundered because their vaguely articulated hypotheses have spawned inadequate designs. One should think the scientific hypothesis underlying the clinical experiment makes sense, for the more biologically plausible the hypotheses the more likely the results will be believable. There is often much subjectivity here, however, and what makes eminent sense to some may be implausible to others. One may, from time to time, be faced with results in clinical trials that seem to have no plausible scientific support. Many of the most important clinical trials have been founded on hypotheses about the putative mechanism of action of the intervention. It is gratifying when such trials turn out positive, since then the result is consistent with the proposed mechanism. In the real world, however, it is often extraordinarily difficult to use clinical trials productively as probes for the underlying mechanism.

A randomised controlled clinical trial has a protocol that discusses the purpose of the trial, describes the procedures to be used and defines the endpoints as well as the criteria used to define 'success'. It justifies the sample size, the study group,
the strata (if any) and plan for follow-up, and the method to be used for monitoring safety during the course of the trial. Regulatory bodies often require that they review protocols for trials performed for the purpose of licensing a product or extending the indication in the label. The complexity of the study and of the protocol depends on a number of factors. A small single-centre trial that is studying a short-term intervention for symptomatic relief may have a brief protocol, while a large, multicentre study measuring a complicated endpoint in subjects followed for several years may require a much more detailed protocol. Whether the protocol is simple or complicated, it should be written clearly and essential parts should be easily identified. Other documents may provide a further description of the study. For example, a data analysis plan may present the details of the statistical methods to be used. If the study has a data safety monitoring committee or an endpoint committee, their charters will describe them and their roles. The International Conference on Harmonization, in its guidelines for Good Clinical Practice, describes elements of a well-constructed protocol.

While randomisation produces study groups with equal expected distributions of both measured and unmeasured characteristics at baseline, randomisation by itself cannot ensure that the results of a study will be unbiased. The unbiasedness conferred by randomisation requires a statistical analysis that classifies people according to the treatment to which they were randomised, not the treatment they actually received. Analysis that preserves the randomisation is called intent-to-treat analysis. A perfectly conducted clinical trial would have no problem analysing the groups as they were randomised, for each person would receive the assigned therapy, each would adhere to the protocol and each would provide a measurement for the primary endpoint. Even rigorously executed clinical trials, however, rarely meet this ideal. Many participants in trials adhere incompletely or not at all to their assigned regimen. The primary endpoint may be missing for some participants. Thus, while the principle of performing statistical analysis according to the randomised assignment is central to producing an unbiased result, in practice one is often forced to violate that principle. One needs reasonable approaches to assigning outcomes when actual observations are unavailable. Often investigators are tempted to analyse data from the subset of participants who complete the study according to the protocol. Such analyses can be subject to severe selection bias (Lamm et al., 1981). Sometimes an analysis other than the intent-to-treat analysis is appropriate. For example, in studies of infectious disease where treatment is given presumptively before determination of the infecting organism,
the primary analysis may include only those patients who have an organism against which the agent being studied is likely to be effective. Such an analysis leads to an unbiased assessment of the effect of the intervention on people infected with the target organism.

A clear objective along with focused hypotheses will drive the choice of endpoints. When the endpoint is unambiguous, its definition and measurement are simple. The more subjective the endpoint, the more need for independent assessment. For example, in unblinded trials of cancer therapies where the endpoint is a measurement of change in tumour size as assessed by a CT scan, independent readers who do not know whether the subjects are in the treated or control group may be required to ensure unbiased measurement of size of the tumours. If the endpoint is a report by the subject (e.g. a score on a scale measuring pain), the trial may use several outcomes to support and corroborate the primary endpoint.

Control groups are essential to the design of a randomised controlled trial and the inferences that can be drawn from them.
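The contrast between an intent-to-treat analysis and an analysis restricted to those who adhered to their assigned regimen can be sketched numerically. The counts below are hypothetical and deliberately constructed so that the test therapy has no effect at all, while the participants who stop taking it are sicker than those who continue; the names `Group`, `trial` and `arm` are assumptions made for this illustration.

```python
from collections import namedtuple

Group = namedtuple("Group", ["arm", "adhered", "patients", "deaths"])

# Hypothetical trial of an ineffective therapy. In the test arm the 20
# non-adherers are sicker (40% mortality) than the 80 adherers (10%).
trial = [
    Group("test", True, 80, 8),
    Group("test", False, 20, 8),
    Group("control", True, 100, 16),
]


def death_rate(groups):
    """Pooled death rate over a list of groups."""
    return sum(g.deaths for g in groups) / sum(g.patients for g in groups)


def arm(name, adherers_only=False):
    """Select the groups in one arm, optionally keeping adherers only."""
    return [g for g in trial if g.arm == name and (g.adhered or not adherers_only)]


# Intent-to-treat: everyone is analysed in the arm to which they were randomised.
itt_difference = death_rate(arm("test")) - death_rate(arm("control"))

# Analysis of adherers only: non-adherers are dropped from the test arm.
per_protocol_difference = death_rate(arm("test", True)) - death_rate(arm("control", True))

print(f"intent-to-treat difference in mortality: {itt_difference:+.2f}")
print(f"adherers-only difference in mortality:   {per_protocol_difference:+.2f}")
```

In this constructed example the intent-to-treat comparison correctly shows no difference, whereas discarding the non-adherers makes an ineffective therapy appear to reduce mortality.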
The control can be a placebo, usual care, a different therapy or usual care plus a placebo. In selecting a control therapy, the investigators should choose one that is relevant to the question at hand and should attempt, insofar as possible, to choose a control that allows blinding.

The study group in a randomised controlled trial is necessarily heterogeneous. Sometimes, investigators want to ensure equal allocation of treatment and control in specific subgroups. In such cases, they define strata and randomise within strata. For example, in a study of prevention of heart attack, one might stratify by occurrence of a prior heart attack. The choice of whether or not to stratify should depend on the size of the study and the relationship of the particular variable to the outcome of interest. If a variable has the potential to be a strong confounder and the study is small, then stratification may be useful. At the end of a study, one might want to analyse the data by subgroup, either those that defined the randomisation strata or others. The purpose of such analyses is sometimes to assess whether the treatment is effective within specific subgroups of the population and sometimes to assess whether the effect of treatment varies by subgroup. Such analyses should be undertaken and interpreted with great caution because trials are rarely large enough to support reliable inference within subgroups (Yusuf et al., 1991).

While randomised controlled trials have great advantages over other designs in terms of producing valid inference about the effect of treatment, they have severe limitations in the actual setting of clinical investigation. A prospective trial is often large, complex and expensive, requiring the coordination of a small army of participants and many sites spread over counties, countries or continents. There is no realistic prospect, therefore,
of employing the randomised clinical trial in more than a small proportion of the unresolved questions in therapy. Within specific disease categories, the restrictive ELIGIBILITY CRITERIA of many trials mean that the results of the study will be directly applicable only to a narrow segment of the total patient population with that disease. Sometimes, a trial would need to be very long to yield clinically meaningful results. The polio vaccine trial realised its scientific goals quickly because the effect of the vaccine could be determined over the few months following immunisation. It is another matter entirely to assess the impact of a screening intervention or preventive drug on the incidence of heart attacks or cancer; here the necessary follow-up time may be measured in decades. Even for trials that determine the endpoint of individual treatments quickly, slow accrual may make the study take so long to complete that the therapeutic question it poses is no longer of interest by the time the answer is available. Some subspecialties have particular problems with the use of randomised controlled trials; for surgery, in particular, the acceptance
and application of the randomised trial has been slow. Finally, a randomised controlled trial may not be usable at all if the competing interventions have been in use for enough time that the attitudes of physicians are fixed: physicians are understandably reluctant to randomise patients if they think they already know the answer.

The decision to perform a randomised controlled trial requires a sizeable commitment in time, money and effort; the decision not to perform one entails some sacrifice in the quality of evidence provided to physicians when they are making choices of therapies for their patients. A combination of medical, societal and financial forces determines those questions that become the subject of randomised controlled trials. JW

Anon. 1948: The controlled therapeutic trial. British Medical Journal 2, 791-2.
Francis Jr, T. et al. 1955: An evaluation of the 1954 poliomyelitis vaccine trials. Summary report. American Journal of Public Health 45, 1-51.
Friedman, L. M., Furberg, C. D. and DeMets, D. L. 1998: Fundamentals of clinical trials, 3rd edition. Heidelberg: Springer Verlag.
Hennekens, C. H. and Buring, J. E. 1987: Epidemiology in medicine. Boston and Toronto: Little, Brown and Co.
Lamm, G. et al. 1981: Influence of treatment adherence in the coronary drug project. New England Journal of Medicine 304, 612-13.
Marshall, G., Blacklock, J. W. S., Cameron, C. et al. 1948: Streptomycin treatment of pulmonary tuberculosis: a Medical Research Council investigation. British Medical Journal 2, 769-82.
Meinert, C. 1998: Clinical trials. New York: Oxford University Press.
Pocock, S. 1996: Clinical trials: a practical approach. New York: John Wiley & Sons, Inc.
Smith, G. C. and Pell, J. P. 2003: Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials. British Medical Journal 327, 1459-61.
Yusuf, S., Collins, R. and Peto, R. 1984: Why do we need some large, simple randomized trials? Statistics in Medicine 3, 409-20.
Yusuf, S., Wittes, J., Probstfield, J. and Tyroler, H. 1991: Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. Journal of the American Medical Association 266, 93-8.
clinical versus statistical significance
This pair of terms is often confused, being mistakenly considered as interchangeable, when, in reality, neither necessarily implies the other. One of the unnecessary difficulties of statistics is that we are using ordinary English words, such as 'normal', 'confidence' and 'population', in a technical way. If only our founders had followed the example of the anatomists and named everything in Latin. It is too late now and we are stuck with our English terminology. Of all the words our predecessors appropriated, the one that must cause the greatest confusion is 'significant'. The Shorter Oxford English Dictionary gives two definitions of 'significance': 'the meaning or import of something, meaning, suggestiveness' and 'importance, consequence'. The statistical usage relates to the first interpretation. If a difference is significant in a sample, there is evidence that the
difference exists in the population that the sample represents. Hence the difference has meaning beyond the individuals who make up the sample. By clinical significance, we mean that the difference we have observed is important: that, for example, it implies that we should change clinical practice. Thus, this usage relates to the second interpretation, meaning that the significant difference is important.

If a difference or relationship is statistically significant, this implies that we have evidence that it is real, existing in the larger population, but not that it is important, having implications for clinical practice. Concluding that a difference or relationship is important depends on its magnitude, together with nonstatistical factors, so that we can decide whether it is big enough to influence clinical decisions. For example, in a large clinical trial with 2000 subjects in each arm, a difference of 1 mmHg in mean diastolic blood pressure would be statistically significant. As this was a trial, it would be reasonable to conclude that the difference was real and that the treatments had slightly different effects on blood pressure. Yet it is unlikely that such a difference would influence treatment decisions. It would not be important and so not clinically significant. Contrariwise, a small study might produce a nonsignificant difference that is quite large. We could not conclude that the difference was real, but we might think it important enough to carry out another trial.

Statisticians cannot expect to appropriate ordinary English words and then demand that their use be restricted. However, the use of 'significant' in research reports to mean something other than statistical significance can be potentially misleading and it makes sense to avoid it. In its instructions to authors, the Lancet asks that authors 'avoid non-technical uses of technical terms in statistics, such as ... "significant" ...'. JMB
[See also CRITICAL APPRAISAL]
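The blood pressure example can be checked with a short calculation. The sketch below assumes a between-patient standard deviation of 10 mmHg for diastolic blood pressure (an assumption made here for illustration; the entry does not state one) and uses a simple two-sample z-test. The small-study figures (20 subjects per arm, a 5 mmHg difference) are likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(mean_difference, sd, n_per_arm):
    """Two-sided P-value from a z-test for a difference between two means.

    Assumes equal group sizes and a common, known standard deviation; for
    small samples a t-test would be more appropriate.
    """
    standard_error = sd * sqrt(2 / n_per_arm)
    z = abs(mean_difference) / standard_error
    return 2 * (1 - NormalDist().cdf(z))


ASSUMED_SD = 10.0  # mmHg; illustrative assumption, not taken from the entry

# Large trial, tiny difference: statistically but not clinically significant.
p_large_trial = two_sided_p(1.0, ASSUMED_SD, 2000)

# Small study, larger difference: possibly important, but not significant.
p_small_study = two_sided_p(5.0, ASSUMED_SD, 20)

print(f"2000 per arm, 1 mmHg difference: P = {p_large_trial:.4f}")
print(f"  20 per arm, 5 mmHg difference: P = {p_small_study:.4f}")
```

Under these assumptions the 1 mmHg difference is comfortably significant at the conventional 5% level while the 5 mmHg difference is not, which is exactly the dissociation between statistical and clinical significance described above.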
cluster analysis in medicine

The term cluster analysis covers a very wide range of methods for discovering groups in multivariate data. It is distinct from classification techniques such as DISCRIMINANT FUNCTION ANALYSIS or classification and regression tree (CART) (see TREE-STRUCTURED METHODS) analysis. These classify individuals into groups that have already been identified, whereas cluster analysis looks for groups within the data. General texts on cluster analysis are by Everitt, Landau and Leese (2001) and Gordon (1999), and technical developments are often published in the Journal of Classification. However, there is a vast literature dealing with cluster analysis in various guises and this is necessarily a very brief review of the most widely used methods. Many specialised methods dealing with particular subject matters have become separated from mainstream cluster analysis, often using their own terminology, and they may be classified under other headings such as pattern recognition, ARTIFICIAL INTELLIGENCE or DATA MINING IN MEDICINE.
In medicine, the cases to be clustered are generally people and the multivariate data describe various aspects of their clinical, psychological or service use status. However, other units of analysis can of course be clustered, e.g. hospitals or health authorities, and cluster analysis can also be used to group variables, although this is less common. The data may be in the form of attributes, such as ethnic group, or continuous measurements such as blood pressure, or a mixture of both types may be analysed (mixed mode data). The objective of the analysis is to find subgroups of people who are relatively homogeneous with respect to these characteristics. The reason for performing the analysis might be administrative (e.g. to define strata in a survey sample) or, more usually, it might be related to a research question (e.g. to identify groups of people with a common gene structure). (A review of methods used in medicine is given by McLachlan, 1992.)

In the most widely used methods, one typically proceeds by choosing a measure of the proximity between cases in terms of the multivariate data (the term 'proximity' covers both similarity and distance, either of which may be calculated). Next, an algorithm for forming clusters is applied to the matrix of proximities. The investigator usually has to decide on the number of clusters forming the 'solution', i.e. a partition of the data based on a particular choice of proximity measure, algorithm and number of clusters. The choice of number of clusters is particularly difficult, since there are few reliable formal tests and the investigator may have to consider a range of solutions that seems reasonable based on the subject matter. The robustness of the final solution can be tested in a number of ways. These include using alternative clustering methods, using split-sample methods on the data, the detection and exclusion of outliers or influential cases and validation against external data.
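The three choices just described (proximity measure, clustering algorithm and number of clusters) can be sketched in a few lines. The example below is a minimal illustration only: the five cases and their blood pressure values are hypothetical, Euclidean distance and single linkage are assumed as the proximity measure and algorithm, and the function names are invented for the sketch. In practice the variables would usually be scaled first and a statistical package would be used.

```python
from itertools import combinations
from math import dist

# Hypothetical cases: (systolic, diastolic) blood pressure in mmHg.
cases = {
    "A": (118.0, 76.0),
    "B": (121.0, 79.0),
    "C": (150.0, 95.0),
    "D": (154.0, 98.0),
    "E": (190.0, 115.0),
}


def distance_matrix(data):
    """Step 1: Euclidean distance between every pair of cases."""
    return {frozenset((a, b)): dist(data[a], data[b]) for a, b in combinations(data, 2)}


def single_linkage_gap(first, second, proximities):
    """Distance between two clusters: that of their closest pair of members."""
    return min(proximities[frozenset((a, b))] for a in first for b in second)


def single_linkage(data, n_clusters):
    """Steps 2 and 3: merge the closest clusters until n_clusters remain."""
    proximities = distance_matrix(data)
    clusters = [frozenset([name]) for name in data]
    while len(clusters) > n_clusters:
        first, second = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage_gap(pair[0], pair[1], proximities),
        )
        clusters = [c for c in clusters if c not in (first, second)]
        clusters.append(first | second)
    return sorted(sorted(c) for c in clusters)


# A range of candidate solutions, as the investigator might consider.
solutions = {k: single_linkage(cases, k) for k in (2, 3)}

for k, partition in solutions.items():
    print(f"{k}-cluster solution: {partition}")
```

Comparing the two- and three-cluster solutions side by side mirrors the advice in the text to consider a range of solutions rather than rely on a single formal criterion.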
Formal hypothesis tests of the NULL HYPOTHESIS of absence of cluster structure are theoretically possible but are rarely applied.

The definition of proximity between individuals depends on the type of data and the relative weight to be placed on different variables (Everitt, Landau and Leese, 2001). For example, binary attribute data may be coded as series of ones and zeros, denoting presence or absence of an attribute. In the case where each category is of equal weight, such as gender or white/nonwhite ethnic group, a simple matching coefficient (the proportion of matches between two individuals) could be used. However, if the attributes were the presence of various symptoms, proximity might be more appropriately measured using the asymmetric Jaccard coefficient, based on the proportion of matches where there is a positive match (i.e. ignoring joint negative matches). For continuous data, the Euclidean distance between individuals i and j,
\[ d_{ij} = \left[ \sum_{k=1}^{p} (x_{ik} - x_{jk})^2 \right]^{1/2}, \]

where p is the number of variables, is often used (applied to binary data it is equivalent to the simple matching coefficient). For mixed mode data, Gower's coefficient can be used to combine components of distance from either attribute or continuous data, after first scaling the continuous data to a 0-1 range. Many alternative proximity measures have been proposed to deal with specialised types of data; e.g. in genetics binary matches may be assigned different weights depending on the part of the genetic sequence from which they arise.

Hierarchical algorithms are possibly the most widely used of general purpose clustering methods and are included in most STATISTICAL PACKAGES. They use a heuristic algorithm successively to join or divide clusters on the basis of their proximity (thus being referred to as agglomerative or divisive methods respectively). The methods differ in the way in which the intercluster proximities are calculated from the interindividual distances. Single linkage, for example, uses the proximity between the closest individuals in two clusters to be joined, whereas complete linkage uses the most distant. Ward's method is another popular agglomerative method that joins clusters on the basis of an error sum-of-squares criterion; unlike the two linkage methods mentioned it requires the raw data (rather than just the proximity matrix) to be available during the clustering process. Divisive methods are less commonly used than agglomerative methods, except perhaps those specifically designed for attribute data. An example of the latter is the monothetic divisive method, which divides the sample according to the value of a single attribute at each stage, the attribute being chosen so as to cause the most homogeneous groups at that stage. Polythetic divisive methods divide according to a number of variables considered together. Both agglomerative and polythetic divisive methods generally produce a tree diagram, or dendrogram, which shows the process by which cases have been joined or divided,
and this can be used to suggest the number of clusters present (by examining the jumps in the proximity at which clusters are joined or divided). Here we can see that a small dendrogram is formed using single linkage from the following matrix of proximities (the proximity between case 1 and itself is, of course, 0.0, and between cases 1 and 2 it is 2.0, etc.) (the figure shows the relationship in diagrammatic fashion):

Case      1      2      3      4      5
1       0.0
2       2.0    0.0
3       6.0    5.0    0.0
4      10.0    9.0    4.0    0.0
5       9.0    8.0    5.0    3.0    0.0

[Figure: dendrogram for cases 1-5; the vertical axis shows the distance at which clusters are joined.]

Distance   Partition   Members
5.0        P5          [1 2 3 4 5]
4.0        P4          [1 2], [3 4 5]
3.0        P3          [1 2], [3], [4 5]
2.0        P2          [1 2], [3], [4], [5]
0.0        P1          [1], [2], [3], [4], [5]

cluster analysis in medicine A dendrogram produced using single linkage applied to a matrix of pairwise distances. A sequence of partitions P1-P5 is produced according to the minimum distance between cases or clusters to be joined. Cases 1 and 2 join first, then 4 and 5; case 3 joins the 4-5 cluster and finally all cases are joined

A study of people with eating disorders made by Hay, Fairburn and Doll (1996) illustrates the use of two of the standard hierarchical methods mentioned and some of the robustness checks that can be made. The rationale for their study was that: 'Clinical experience, however, indicates that a substantial number of those who present for treatment of an eating disorder do not fulfil diagnostic criteria for either of these disorders. ... The aim of the present study was to derive an empirically based scheme for classifying those with recurrent binge eating.' The data were the first seven principal components based on 22 items from a QUESTIONNAIRE measuring eating disorder behaviours and attitudes. Ward's method was used, with complete linkage as a check. Clinical judgement, inspection of the dendrogram and formal tests were used to determine the number of clusters. The analysis was repeated without two OUTLIERS and robustness was examined using a 75% subsample. Tests of construct validity were performed using variables external to clustering and predictive validity was assessed in terms of its success in predicting the time course of the illness compared to using standard diagnostic criteria.

Partitioning methods divide the data into a single partition (rather than a series as in hierarchical methods), the number of clusters being specified in advance. They iteratively reassign cases to clusters and recompute cluster centres, so as to optimise an objective function such as within-cluster variance. One popular method is k-means; a non-parametric method is partitioning around medoids (PAM). These methods usually need to have an initial partition from which to
start the process and require the whole dataset to be available during the process. The Kohonen or self-organising map (SOM), a type of NEURAL NETWORK that successively assigns cases to clusters, is an example of an 'online' method, i.e. one where cases are taken one at a time and presented to cluster centres. The 'winning' (closest) centre is moved towards the case and the process continues with the next case, recycling cases until the system is stable. This method is quite similar to k-means, but does not need all cases to be available and can therefore cope with much larger datasets.

In addition to methods using heuristic algorithms, a number of so-called model-based methods has been developed. The model that underlies such methods is usually that the data are a sample from a FINITE MIXTURE DISTRIBUTION (see McLachlan and Peel, 2000). For categorical data, the populations could be multinomial; a method based on this assumption is latent class analysis (see Everitt, Landau and Leese, 2001). For continuous data, a mixture of MULTIVARIATE NORMAL DISTRIBUTIONS may be assumed. Estimating the multivariate normal parameters and the mixing proportions by MAXIMUM LIKELIHOOD ESTIMATION can be a difficult computational problem, especially with small samples. However, the use of classification likelihood methods (which involve estimating
clustc:rmembc:rships llaleciasindicalorvariablc:sndhc:r Ihan. i:sIimating mixiD& pmporii~s) has simplifted this Iask IUId made clustering basc:d _ multi_ _ nonnaI modcJllIIIR wielely ~ailable in standanl softwan: paclcaps. 1mpIemc:nlalion ~this mc:Ihod n:quires iI spc:cific:alion of bow lim and sIIape (spherical or ellipsoidal) an: IlllUmc:cI 10 varY. The sc:cond ftpnshows a tbrec>coaapoacnt mixtlft idc:atilc:d by fitting a mix~ or multivariale DCIIID8Is using classillc:alion lilcelihaad and iHustraIcs Ihc: icsuJts of makin& thc:se choices far a padic:uIar datasd. 1'III=~ID8DY meahads thai cIoDOt fall iDto Ihccatcgories menliaaecL Porexample. injiu=j mc:thacIs indiviclual."ve a grade orweiPt orcluster mc:mbenhip fcirclifl'cnntclusters· u apposed to crisp meIhods. ~ cues an: deftailcly IIIIIigncd to one duster. Modd-basc:d melhads thai produce prvbabdities of mc:mbenhip oflhe clustas fai'c:ach case can be IqIIIdcd as fuzzy, but Ihe~an: also fuzzy mdhods that do DOl ~Iy on a prababilillic ~I. Some methods CD allow far orerltlppilrg clusters (ovcrl., cliffiRnl concept 10 iD that easeS can beIon& 10 man: ...... anc: cluslc:r
bela,.
fuzziness.
limultaaeausly). .Other inedlads can clUSlcr cues and v8nab1c:s sunullancously..- AD example is IW,Q,t:/Iiclll cilu:ws (nol to be
' 00 ,.
,
[
-
100
2(1 1~1)
I'D
100 ' 20 140
PO
cI..ter ....,... fn medlcfne Thtee IIfOIlP soIuIIons .from applying .-doIIs fotms 01 clastilllclJllon I1JIJKmum IiIfBHhood to 8 sample d 500 VOKeIs from tMRllmaginfI data: (8) an 8 pIiod clllsslllcation; (b) dustenJ IJI!SUIIIinII samtl-size spheticIII cIuIIIets; (e) cIusfets assuming samtHizfI ~ cIustets; (d) cIustets IIIIowtJtI ~ cIusIfJts 01 tIIIetenI sizes
confused with the more general term 'hierarchical methods' as described earlier). This is a method appropriate for attribute data and relatively often used in psychological or psychiatric applications. Another method for clustering both cases and variables is direct data clustering, which involves rearranging the rows and columns of the data matrix so that cases that are similar in terms of variables (and vice versa) appear next to each other. Arabie, Hubert and De Soete (1996) give a general review of standard and nonstandard methods and individual entries in this volume describe some of these in more detail.

Two types of medical data that may need special treatment, because of the size and complexity of their typical datasets, are genetic and imaging data. Gene expression data produced by microarrays (solid surfaces containing many, often thousands of, target genes against which genetic samples are compared) are characterised by very large datasets and also include a temporal dimension if the samples from the same person are taken at different time points (see MICROARRAY EXPERIMENTS). The correlation that this induces is sometimes modelled by including an autoregressive component in the analysis. Medical images, e.g. from functional magnetic resonance imaging, are also often characterised by large datasets. Furthermore, in addition to a temporal dimension, they may also exhibit spatial autocorrelation due to the physical contiguity of the measurements (see STATISTICS IN IMAGING). Genetic and imaging datasets often require methods not available in standard software and a number of websites are devoted to this type of specialised analysis. ML

Arabie, P., Hubert, L. J. and De Soete, G. (eds) 1996: Clustering and classification. Singapore: World Scientific.
Everitt, B. S., Landau, S., Leese, M. and Stahl, D. 2011: Cluster analysis, 5th edition. Chichester: Wiley.
Gordon, A. D. 1999: Classification, 2nd edition. New York: Chapman and Hall.
Hay, P. J., Fairburn, C. G. and Doll, H. A. 1996: The classification of bulimic eating disorders: a community-based cluster analysis study. Psychological Medicine 26, 801-12.
McLachlan, G. J. 1992: Cluster analysis and related techniques in medical research. Statistical Methods in Medical Research 1, 27-48.
McLachlan, G. J. and Peel, D. 2000: Finite mixture models. New York: John Wiley & Sons, Inc.
cluster randomised trials These are CLINICAL TRIALS in which groups or clusters of individuals are randomly allocated to treatments. The difference between cluster randomised trials and individually randomised trials is that in a cluster trial the main unit of randomisation is not the same as the unit on which the analysis is carried out. Thus the unit of randomisation may be a group of people such as in a town but the outcome will be the behaviour of people in the town. The intervention is often aimed at and delivered by healthcare professionals, such as education to modify their treatment of patients, but the effectiveness of the intervention is assessed in terms of the outcome for the patient. In contrast, in individually randomised trials, both intervention and outcome are aimed at the same person. Useful references are Donner and Klar (2000), Campbell, Donner and Klar (2007) and Hayes and Moulton (2009).

The main reason for using a cluster trial is fear of contamination. This occurs because subjects in the same unit or treated by the same healthcare professional are likely to receive the same intervention. Thus it can be very difficult to ensure that subjects in a control group do not receive at least some of the intervention if they are physically in the same unit as the treated subjects. It may be difficult for healthcare professionals to switch from one style of treatment to another or subjects may compare notes on the treatments they have received.

There are many different features associated with cluster randomised trials and some of the statistical aspects were first discussed by Cornfield (1978). The main feature is that patients treated by one healthcare professional tend to be more similar than those treated by different healthcare professionals. If we know by which doctor a patient is being treated, we can predict slightly better than chance the performance of the patient and thus the observations for one doctor are not completely independent, which is the usual assumption for analysis. What is surprising is how even a small correlation can greatly affect the design and analysis of such studies.

Cluster trials can be divided into those with a cohort design, in which patients are followed up over time, and those with a cross-sectional design, in which the patients at baseline are not the same as those in the follow-up. A cohort design would follow up patients after treatment, but a public health campaign might adopt a cross-sectional design in which different individuals are questioned before and after the intervention,
say a local radio campaign to reduce drink driving.

It is helpful, when planning and analysing a study, to have a model in mind from the start. We will consider a cohort design in general practice in which the same patients are followed up over time. For continuous outcomes $y_{ij}$ for an individual $j$ in practice $i$ we assume that:

$$y_{ij} = \mu + z_i + \tau d_i + \beta x_{ij} + \varepsilon_{ij} \qquad (1)$$

where $j = 1, \ldots, n_i$, with $n_i$ the number of patients in practice $i$, and $i = 1, \ldots, N$, with $N$ the total number of practices. Here $z_i$ is assumed to be a random variable with $E(z_i) = 0$ and $\mathrm{Var}(z_i) = \sigma_z^2$, and reflects the overall effect of being in practice $i$. $\tau$ is the additional effect of being in one of the treatment arms relative to the other, where $d_i$ takes the value 1 for one treatment arm and zero otherwise, and $x_{ij}$ is a vector of the individual level (or practice level) covariates with regression coefficients $\beta$. We assume $\mathrm{Var}(\varepsilon_{ij}) = \sigma^2$ and that $z_i$ and $\varepsilon_{ij}$ are independent and thus $\mathrm{Var}(y_{ij}) = \sigma^2 + \sigma_z^2$. It can be shown that when a model is fitted that ignores the $z_i$, the STANDARD ERROR of the estimate of $\tau$
is too small and thus in general one is likely to increase the TYPE I ERROR RATE. One feature of the model as written is that both $\sigma^2$ and $\sigma_z^2$ are assumed constant and independent of the treatment effect, but clearly this can be investigated and the model modified if necessary. For some models, we need to assume also that $z_i$ and $\varepsilon_{ij}$ are normally distributed. The model can be extended to counts or binary dependent variables using appropriate generalized linear models. The INTRACLUSTER CORRELATION (ICC) is given by:

$$\rho = \frac{\sigma_z^2}{\sigma^2 + \sigma_z^2} \qquad (2)$$

With cluster trials there are two sample size issues: how many clusters and how many patients per cluster. The basic principles for a completely randomised design have been discussed by Donner and Klar (2000). The idea is to obtain the sample size for an individually randomised trial and inflate the sample size by the design effect (DE), where $\mathrm{DE} = 1 + (\bar{n} - 1)\rho$ and $\bar{n}$ is a measure of the average cluster size. Values of the ICC up to about 0.05 are found in practice in primary care. Even with such a small ICC, with 20 patients per practice the design effect is $1 + 19 \times 0.05 = 1.95$, so the sample size has to be roughly doubled for a fixed POWER compared to an individually randomised trial.

Cornfield (1978) states that one should 'analyse as you randomise'. Since randomisation is at the level of the practice, a simple analysis would be to calculate 'summary measures', such as the mean value for each practice, and analyse these as the primary outcome variable. Omitting the covariates from the model for simplicity it is easy to show that:

$$\bar{y}_i = \mu + \tau d_i + \varepsilon_i \qquad (3)$$

where $\bar{y}_i$ is the mean value of $y_{ij}$ for practice $i$ (the error term $\varepsilon_i$ collects $z_i$ and the practice mean of the $\varepsilon_{ij}$) and:

$$\mathrm{Var}(\bar{y}_i) = \sigma_z^2 + \frac{\sigma^2}{n_i} \qquad (4)$$

Equation (3) is a simple model with independent errors, which are homogeneous if the $n_i$ are always of similar size. An ORDINARY LEAST SQUARES estimate at practice level of $\tau$ is unbiased and the standard error of the estimate is valid provided the error term is independent of the treatment effect. Thus a simple analysis at the cluster level would be the following: if the $n_i$ are the same or not too different, then carry out a two-sample t-test on the practice level means; if the $n_i$ are different, then carry out a weighted two-sample t-test using the estimated inverse of the variance as the weight. It is worth noting that if $\sigma^2$ is zero (all values from a practice are the same) then practice size does not matter in the analysis and if $\sigma_z^2$ is zero, then the weight is equivalent to the number of patients per practice.

The advantage of a practice-level approach is that it is simple and intuitive. It works for both continuous and binary data; for the latter, one analyses the proportions in each cluster and treats the proportions as continuous measures (as, for example, using a paired t-test). However, there are a number of problems with a cluster-level approach. The main one is that it does not properly allow for patient-level covariates. It is unsatisfactory to use cluster averaged values of the patient-level covariates. A two-stage approach would be to use conventional methods to adjust for covariates, ignoring clustering, and then apply the cluster methods to the adjusted outcomes. However, fitting equation (1) does both stages simultaneously. The cluster-level method is also possibly inefficient since the number of DEGREES OF FREEDOM for any practice-level comparison is constrained by the number of practices and the method takes no account of the number of patients per practice.

The model given in equation (1) is termed a RANDOM EFFECTS MODEL, a two-stage multilevel model or a mixed model since it contains both random effects ($z_i$) and fixed effects ($d_i$). The main method for fitting these types of models is by MAXIMUM LIKELIHOOD and this is available in a number of packages such as MLwiN, SAS Proc Mixed, STATA, R and Splus. Some of the methods require distributional assumptions such as normality of the between-cluster random effect $z_i$, which can be difficult to verify empirically, particularly with small numbers of clusters. A further refinement to model fitting is to use a technique known as restricted maximum likelihood (REML). This method is useful for estimating variance components because the usual maximum likelihood estimates are biased (see COMPONENTS OF VARIANCE). This procedure is available in SAS Proc Mixed and MLwiN.

These methods estimate the parameters from a cluster-specific model and try to estimate the effect of the intervention within clusters. A rather different method of estimating the parameters uses GENERALISED ESTIMATING EQUATIONS (GEE), which provide valid estimates of treatment effects even if the intracluster correlation is not precisely specified. Since it is an approximate method it requires more than 20 clusters to give valid estimates. GEE estimates parameters from a population or marginal model that tries to estimate the effects on average over clusters. To explain this it is easier to select an example outside clinical trials. Suppose we had patients, clustered in some way, e.g. in families, and we were interested in high blood pressure as a risk factor for stroke. A marginal model looks at the risk of stroke in people with high blood pressure, compared to low blood pressure, on average. In contrast, a cluster-specific model looks at the average risk of people with high versus low pressure within a cluster such as a family. For a linear model, the marginal and cluster-specific methods are estimating the same population parameter, although different methods of estimation may give differing estimated results. For a nonlinear model such as a logistic regression, the population
parameters are different, to a degree related to the difference in the mean levels of the clusters. (Further details are given by Neuhaus, Kalbfleisch and Hauck, 1991.) Some packages have an option to estimate a 'robust' standard error for a large number of procedures such as multiple and LOGISTIC REGRESSION under clustering, also known as the Huber-White estimate, for which no distributional assumptions are required. The method to avoid is the FIXED EFFECTS approach in which one fits DUMMY VARIABLES to each cluster. This removes the cluster-level variability, but gives estimates that are biased (Murray, 1998). MJC
(See also CLUSTERED BINARY DATA)

Campbell, M. J., Donner, A. and Klar, N. 2007: Developments in cluster randomized trials and Statistics in Medicine. Statistics in Medicine 26, 2-19.
Cornfield, J. 1978: Randomization by group: a formal analysis. American Journal of Epidemiology 108, 100-2.
Donner, A. and Klar, N. 2000: Design and analysis of cluster randomization trials in health research. London: Arnold.
Hayes, R. J. and Moulton, L. H. 2009: Cluster randomised trials. Boca Raton: Chapman & Hall/CRC.
Murray, D. M. 1998: Design and analysis of group randomized trials. Oxford: Oxford University Press.
Neuhaus, J. M., Kalbfleisch, J. D. and Hauck, W. W. 1991: A comparison of cluster-specific and population-averaged approaches for analysing correlated binary data. International Statistical Review 59, 25-35.
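The sample size inflation described in the cluster randomised trials entry can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the individually randomised sample size of 200 are hypothetical, and the calculation assumes a common ICC and roughly equal cluster sizes.

```python
import math


def intracluster_correlation(var_between, var_within):
    """ICC as in equation (2): between-cluster variance over total variance."""
    return var_between / (var_within + var_between)


def design_effect(mean_cluster_size, icc):
    """Design effect DE = 1 + (mean cluster size - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def inflate_sample_size(n_individual, mean_cluster_size, icc):
    """Return (total patients, clusters) after inflating by the design effect.

    n_individual is the total sample size an individually randomised
    trial would need; both results are rounded up.
    """
    inflated = n_individual * design_effect(mean_cluster_size, icc)
    # The small tolerance stops floating-point noise pushing 390.0 up to 391.
    n_total = math.ceil(inflated - 1e-9)
    n_clusters = math.ceil(n_total / mean_cluster_size)
    return n_total, n_clusters


# Hypothetical inputs: 200 patients for an individually randomised trial,
# 20 patients per practice, ICC of 0.05 (the upper value quoted for primary care).
total, clusters = inflate_sample_size(200, 20, 0.05)
print(design_effect(20, 0.05), total, clusters)
```

With these inputs the design effect is 1.95, so the requirement is close to double that of the individually randomised trial, in line with the statement in the entry.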
clustered binary data These are binary responses on units that are nested in clusters. Examples include repeated responses where occasions are nested in subjects, twin data where twins are nested in twin pairs and responses on children nested in doctors. Here the units are said to be at level 1 and the clusters at level 2. In three-level data, the 'level-2' clusters are themselves nested in 'level-3' clusters. For instance, children may be nested in doctors who are nested in hospitals, as shown in the figure.

clustered binary data Three-level clustered data (tree diagram: Level 3, hospitals k; Level 2, doctors j, Dr. 1 to Dr. 3; Level 1, children i, Ch. 1 to Ch. 6, two per doctor)

Units within a cluster are expected to be more similar to each other than to units in different clusters and there is hence between-cluster heterogeneity and within-cluster dependence. Three types of statistical methods, accommodating the dependence induced by clustering, have been suggested for analysing clustered binary data.

1. Cluster-specific models where each cluster has its own effect(s):
(a) RANDOM EFFECTS MODELS, where dependence is explicitly modelled by including cluster-specific random intercepts (and possibly coefficients) that are drawn from a distribution and hence vary over clusters. The random effects are assumed to be uncorrelated with the included covariates. These models are typically estimated using maximum marginal likelihood, where the random effects are 'integrated out' (see RANDOM EFFECTS MODELS FOR DISCRETE LONGITUDINAL DATA). The random effects are sometimes specified as categorical latent classes, leading to mixture regression.
(b) Fixed intercept models, where dependence is explicitly modelled by including fixed intercepts that vary over clusters. In this case it is not necessary to assume that the cluster-specific effects are uncorrelated with the included covariates and the regression parameters can be interpreted as within-cluster effects. These models are typically estimated using maximum conditional likelihood, where the cluster-specific intercepts are 'conditioned out'.
2. Marginal approaches for marginal or population-averaged effects:
(a) GENERALISED ESTIMATING EQUATIONS (GEE), where dependence is treated as a nuisance. GEE is an estimation algorithm that need not correspond to any statistical model.
(b) Marginal statistical models, for instance the Bahadur, Dale and Bowman models. These models are usually estimated using MAXIMUM LIKELIHOOD ESTIMATION.
3. Transition models, where effects are conditional on responses of other units in the cluster. These models require that the units within a cluster are not 'interchangeable', the canonical example being longitudinal data where responses are time ordered. Transition models are sometimes called autoregressive models, lagged response models or dynamic models.

AS/SR-H

Fahrmeir, L. and Tutz, G. 2001: Multivariate statistical modelling based on generalized linear models. New York: Springer.
Molenberghs, G. 2002: Model families. In Aerts, M., Geys, H., Molenberghs, G. and Ryan, L. M. (eds), Topics in modelling of clustered data. Boca Raton: Chapman & Hall/CRC, pp. 47-75.
Skrondal, A. and Rabe-Hesketh, S. 2004: Generalized latent variable modeling: multilevel, longitudinal and structural equation models. Boca Raton: Chapman & Hall/CRC.
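The difference between cluster-specific and marginal (population-averaged) effects listed in the clustered binary data entry can be illustrated numerically. The sketch below is not taken from the entry: it assumes a logistic model with a normally distributed random intercept, and the parameter values, function names and simple midpoint quadrature are all hypothetical choices made for illustration.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds."""
    return math.log(p / (1.0 - p))


def marginal_probability(linear_predictor, sd, n_points=4001):
    """Average P(y = 1) over a N(0, sd^2) random intercept.

    Uses midpoint quadrature over +/- 8 standard deviations, i.e. the
    random effect is 'integrated out' numerically.
    """
    if sd == 0:
        return expit(linear_predictor)
    lower = -8.0 * sd
    width = 16.0 * sd / n_points
    total = 0.0
    for k in range(n_points):
        z = lower + (k + 0.5) * width
        density = math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += expit(linear_predictor + z) * density * width
    return total


def marginal_log_odds_ratio(beta0, beta1, sd):
    """Population-averaged log odds ratio for a binary covariate."""
    p1 = marginal_probability(beta0 + beta1, sd)
    p0 = marginal_probability(beta0, sd)
    return logit(p1) - logit(p0)


# Hypothetical cluster-specific model: intercept -1, log odds ratio 1,
# random-intercept standard deviation 2.
print(marginal_log_odds_ratio(-1.0, 1.0, 2.0))  # smaller than the cluster-specific 1.0
print(marginal_log_odds_ratio(-1.0, 1.0, 0.0))  # no clustering: the two coincide
```

Under these assumptions the marginal log odds ratio is pulled towards zero relative to the cluster-specific value, and the two agree only when the random-intercept variance is zero.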
Cochran Q-test This test is used to see if the proportion of positive dichotomous outcomes varies between sets of matched data. It is used, for example, to test whether there is heterogeneity between people rating subject data or to see if there is a difference between treatments for trials using matched patients. Where McNEMAR'S TEST is applied to two paired groups, the Cochran Q-test seeks to identify whether the proportions of positive responses vary among many matched groups. The Cochran Q-test can be viewed as an extension of McNemar's test and the two are equivalent when the number of groups is two.

To calculate Cochran's Q-statistic, one must identify for each of the $N$ samples or subjects the number of groups in which its response is positive and denote these values $S_1, S_2, \ldots, S_N$. One must also identify for each of the $C$ groups (e.g. raters or time points) the total number of samples or subjects that are given a positive response and denote these values $T_1, T_2, \ldots, T_C$. These two sets of values will both sum to the number of positive responses, which we denote $T$. The Cochran Q-statistic is then calculated as:

$$Q = \frac{(C-1)\left(C\sum_{j=1}^{C} T_j^2 - T^2\right)}{C\,T - \sum_{i=1}^{N} S_i^2}$$

and compared to the CHI-SQUARE DISTRIBUTION with $C - 1$ DEGREES OF FREEDOM.

Cochran (1950) illustrates this with an example where four different media are investigated for effectiveness in growing a bacterium when 69 matched specimens were grown in each medium. Four of the specimens had $S_i$ values of 4 (i.e. bacteria grew in all four media), five had $S_i$ values of 3, one had an $S_i$ value of 2 and the rest $S_i$ values of 0, giving:

$$\sum_{i=1}^{N} S_i^2 = 113, \quad C = 4, \quad T = 33$$

The total numbers of successful specimens by medium (the $T_j$) were 6, 10, 7 and 10, giving:

$$\sum_{j=1}^{C} T_j^2 = 285, \quad Q = 8.05$$

When compared to a chi-square distribution with 3 degrees of freedom this gives a P-value of 0.045, indicating that there is evidence that the media do not all perform alike.

It may be apparent from these calculations that since only the sums of positive responses are used, any specimen (or row of matched data in the more general case) that contributed no positive responses can be ignored. By arguments of symmetry, one can see that specimens that give a positive response in every case are similarly uninformative and can be ignored. Note that this is akin to McNemar's test, where the concordant pairs do not contribute to the test statistic. For further details see Fleiss (1981). AGL

Cochran, W. G. 1950: The comparison of percentages in matched samples. Biometrika 37, 256-66.
Fleiss, J. L. 1981: Statistical methods for rates and proportions. New York: John Wiley & Sons, Inc.
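The worked example in the Cochran Q-test entry can be checked with a short Python sketch. The function names are hypothetical, and the P-value helper uses the closed-form survival function of the chi-square distribution for exactly 3 degrees of freedom, so it is not a general-purpose routine.

```python
import math


def cochran_q(row_totals, column_totals):
    """Cochran's Q from the per-subject totals S_i and per-group totals T_j."""
    c = len(column_totals)
    t = sum(column_totals)
    if sum(row_totals) != t:
        raise ValueError("row and column totals must sum to the same T")
    numerator = (c - 1) * (c * sum(tj * tj for tj in column_totals) - t * t)
    denominator = c * t - sum(si * si for si in row_totals)
    return numerator / denominator


def chi_square_sf_3df(x):
    """Upper tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Data as described in the entry: 69 specimens, four media.
row_totals = [4] * 4 + [3] * 5 + [2] * 1 + [0] * 59
column_totals = [6, 10, 7, 10]

q = cochran_q(row_totals, column_totals)
p = chi_square_sf_3df(q)
print(round(q, 3), round(p, 3))

# Dropping the all-negative and all-positive specimens leaves Q unchanged.
informative_rows = [3] * 5 + [2] * 1
informative_columns = [tj - 4 for tj in column_totals]
print(round(cochran_q(informative_rows, informative_columns), 3))
```

The second call illustrates the remark in the entry that specimens with no positive responses, or with positive responses in every group, do not contribute to the statistic.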
Cochrane Collaboration The Cochrane Collaboration is an international organisation that aims to help people make well-informed decisions about healthcare by preparing, maintaining and promoting the accessibility of systematic reviews of the effects of healthcare interventions. Systematic reviews produced by the Collaboration are published in The Cochrane Database of Systematic Reviews as part of The Cochrane Library, available online and on CD-ROM on a subscription basis (The Cochrane Database of Systematic Reviews, 2004). The Cochrane Collaboration is currently the largest organisation in the world engaged in the production and maintenance of systematic reviews. In 2003 more than 9000 contributors from 50 countries were involved and the second issue of the database in 2004 contained 1999 completed reviews and 1441 protocols for reviews.

The Cochrane Collaboration was named after Archie Cochrane, the British epidemiologist who, in his influential text, Effectiveness and efficiency, promoted the use of evidence from randomised controlled trials to inform the provision of healthcare services (Cochrane, 1972). He went on to emphasise the importance of systematic reviews, when in 1979 he wrote 'It is surely a great criticism of our profession that we have not organised a critical summary, by specialty or subspecialty, adapted periodically, of all relevant randomised controlled trials' (Cochrane, 1979). This challenge led to the establishment during the 1980s of an international collaboration to develop the Oxford database of perinatal trials (Chalmers, 1989-1992). In 1987, the year before Cochrane died, he referred to a systematic review of randomised controlled trials (RCTs) of care during pregnancy and childbirth as 'a real milestone in the history of randomised trials and in the evaluation of care' and suggested that other specialties should copy the methods used (Cochrane, 1989). His encouragement, and the endorsement of his views by others, led to the opening of the first Cochrane Centre by Iain Chalmers (in Oxford, UK) in 1992 and the founding of the Cochrane Collaboration in 1993.

The Collaboration produces reviews through its Collaborative Review Groups, which are supported by fields, Cochrane centres and methods groups (The Cochrane Collaboration website). There are currently 50 Cochrane collaborative review groups, each being responsible for reviews in a particular area of healthcare. The 12 regional Cochrane centres support review activity and dissemination of the library around the world, while the nine fields provide links between the Collaboration and particular areas of healthcare (e.g. primary care), types of consumer (e.g. older people) or types of intervention (e.g. vaccines). The 10 methods groups undertake statistical and methodological research related to systematic reviews, advise the Collaboration on how systematic reviews should be undertaken and reported, monitor the quality of reviews and assist in the development of software and training materials.

Cochrane reviews aim to minimise bias and therefore reviews of healthcare interventions attempt to locate all randomised trials, whether or not they have been published. The Collaboration has worked to improve the identification of randomised controlled trials in the literature by systematically handsearching journals and conference proceedings and by working with the National Library of Medicine to improve indexing of randomised trials on Medline and PubMed. The resulting collection of citations, The Cochrane Central Register of Controlled Trials, is available as a second database on The Cochrane Library. In September 2010 it contained over 400 000 citations.

Publication of Cochrane reviews as electronic rather than paper documents has advantages that include the ability to update reviews when new trials are completed, full reporting of standardised details from all trials, including forest plots and data, and the ability for users of the Cochrane Library to reanalyse reviews using alternative summary statistics and statistical models, as well as viewing the analyses chosen by the author. Comments and criticisms of reviews can also be made online and published alongside the original review.

In its second decade the Cochrane Collaboration is continuing to register and publish new reviews of healthcare interventions, as well as tackling the challenges of how to obtain better systematic evidence of the harmful effects of interventions and how to ensure that systematic reviews are updated in a timely manner. The Collaboration is also now developing plans for the publication of Cochrane Reviews of diagnostic test accuracy. JD

Chalmers, I. (ed.) 1989-1992: The Oxford database of perinatal trials. Oxford: Oxford University Press. (Contents were subsequently transferred to and maintained in the Cochrane Database of Systematic Reviews.)
Cochrane, A. L. 1972: Effectiveness and efficiency. Random reflections on health services. London: Nuffield Provincial Hospitals Trust.
Cochrane, A. L. 1979: 1931-1971: a critical review, with particular reference to the medical profession. In Medicines for the year 2000. London: Office of Health Economics, pp. 1-11.
Cochrane, A. L. 1989: Foreword. In Chalmers, I., Enkin, M. and Keirse, M. J. N. C. (eds), Effective care in pregnancy and childbirth. Oxford: Oxford University Press.
The Cochrane Collaboration website: www.thecochranelibrary.com and www.cochrane.org.
The Cochrane Database of Systematic Reviews 2004: Issue 2. The Cochrane Library. Chichester: John Wiley & Sons, Ltd.
coefficient of determination See CORRELATION
coefficient of variation This is a measure of dispersion defined as the STANDARD DEVIATION divided by the MEAN. Because the standard deviation and the mean share the same units, these units cancel out and leave the coefficient of variation (CV) as a dimensionless number. Because it is independent of measurement units, the coefficient can be used to compare the amount of dispersion for two sets of values, hence its alternative name, relative variability. Such comparison can be useful in some cases but it has to be remembered that the CV can only be used for ratio scale variables that have a true zero point, e.g. height and weight. The CV cannot be used on variables on an interval scale, e.g. temperature measured in degrees Centigrade, because it would have a different value from temperature measured in degrees Fahrenheit owing to the different zero points and hence different mean temperatures. SSE
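The two properties described in the coefficient of variation entry can be seen in a small Python sketch. The data values are invented for illustration, and the sample standard deviation (divisor n - 1) is an assumption, since the entry does not say which form is intended.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumes a non-zero mean)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical heights: a ratio scale, so a change of units leaves the CV unchanged.
heights_cm = [160.0, 172.0, 181.0, 168.0, 175.0]
heights_in = [h / 2.54 for h in heights_cm]

# Hypothetical temperatures: an interval scale, so the CV depends on the zero point.
temps_c = [36.5, 37.0, 38.2, 36.8, 37.4]
temps_f = [c * 9.0 / 5.0 + 32.0 for c in temps_c]

print(coefficient_of_variation(heights_cm), coefficient_of_variation(heights_in))
print(coefficient_of_variation(temps_c), coefficient_of_variation(temps_f))
```

The two height values agree, whereas the two temperature values differ even though they describe the same measurements.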
cohort studies Also called medical follow-up studies, cohort studies are considered to be any epidemiological study in which the study population is identified before the occurrence of the disease event of interest and then followed in time until the first occurrence of the disease event or the end of the study, whichever comes first. These may also be referred to as survival studies, in which the outcome is death. Typically, subjects are classified as exposed or not exposed to one or more putative risk factors at the beginning of the study or, alternatively, they may provide more detailed information on exposure. Because exposure is determined prior to an illness, this study design avoids bias due to selective recall by patients who have been recently diagnosed as in a CASE-CONTROL STUDY, especially when there may be rumours or preconceptions regarding the association between disease and the putative risk factor. Nevertheless, the potential for bias always deserves considerable thought when designing a study, especially for an observational study (Kleinbaum, Kupper and Morgenstern, 1982; Kelsey, Thompson and Evans, 1986; Prentice, 1995; Rothman and Greenland, 1998). The strongest evidence of the effect of an exposure on a disease event or death is provided by a study in which the level of exposure is assigned at random, as in a randomised controlled clinical trial. However, for factors that may be harmful, this would not be feasible in a human population due to ethical concerns.

In a typical cohort study, subjects are recruited for a period of time and then followed until a specified date, when the status of the subject is recorded and the results analysed. The figure presents a diagram showing a chronological representation for four hypothetical subjects. The date of enrolment is represented by a circle (●) and the date at which the disease is diagnosed or the subject dies is represented by a diamond (◆). A complete history from enrolment to the outcome is available for the first two subjects, but for the last two, only incomplete information is available because they are not observed until the outcome. In the third subject, follow-up continues until the study ends, when they are withdrawn alive. However, the fourth subject is lost to follow-up during the period represented by the dotted line and the fact that the outcome had actually occurred before the end of the study was not known to the investigators. The last two subjects are said to be right censored because only partial information on time to the outcome is available, i.e. it is known that the outcome occurred sometime to the right of the last date of observation.
cohort studies Chronological and analytical representations of a cohort study (● = start time, ◆ = time of outcome). Chronological panel: recruitment period, complete follow-up, end of study. Analytical panel: complete and censored follow-up times plotted against time.
For analysis, the time of follow-up and an indicator of whether the outcome was observed are used, as shown in the analytical representation in the figure. The hazard function, which is also called the incidence rate for a disease outcome or a mortality rate for a death outcome, is the basic quantity of interest. A proportional hazards or Cox model is commonly employed:

$$\lambda(X; t) = \lambda_0(t) \exp(X\beta)$$

in which $X$ is a row vector of covariates, $\beta$ a column vector of corresponding parameters to be estimated and $\lambda_0(t)$ the underlying hazard that may depend on time (Prentice, 1995; Holford, 2002). Among the elements in the vector of covariates are indicators of the exposure level for each subject.
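As a rough illustration of the proportional hazards form given above, the following Python sketch evaluates the hazard for an exposed and an unexposed subject. Everything specific here is an assumption made for the example: the Weibull-type baseline hazard, the coefficient values and the covariate layout (exposure indicator, age in years) are hypothetical and are not estimates from any study.

```python
import math


def weibull_baseline(t, scale=0.01, shape=1.5):
    """Hypothetical baseline hazard that increases with time."""
    return scale * shape * t ** (shape - 1.0)


def hazard(t, covariates, coefficients, baseline_hazard=weibull_baseline):
    """Baseline hazard at time t multiplied by exp(X * beta)."""
    linear_predictor = sum(x * b for x, b in zip(covariates, coefficients))
    return baseline_hazard(t) * math.exp(linear_predictor)


# Hypothetical coefficients for (exposure indicator, age in years).
beta = [0.7, 0.03]
exposed = [1, 50]
unexposed = [0, 50]

for t in (1.0, 5.0, 10.0):
    ratio = hazard(t, exposed, beta) / hazard(t, unexposed, beta)
    print(t, round(ratio, 4))
```

Although the hazards themselves change with time through the baseline, their ratio stays at exp(0.7) at every follow-up time, which is what 'proportional hazards' refers to.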
The number of disease cases or deaths usually exerts the greatest impact on the statistical power of a cohort study. Therefore, a rare disease will typically require a huge and expensive effort to accomplish. This may be due to the need to enrol a very large population that will be followed in time or it may be due to the need for a long period of follow-up, especially if there is a long incubation period between the time when the exposure of interest occurs and the disease process begins. For example, in studies of cardiovascular disease, e.g. the Framingham Study and the MRFIT Study, a sample size of 5000 to 20 000 was used to obtain results of interest. However, for studies of diseases such as cancer, e.g. the Nurses' Health Study and the Iowa Women's Study, the outcome is less common, so sample sizes in the range of 50 000 to 100 000 or even larger may be used.

At one time, cohort studies were identified as prospective studies, but the current usage of retrospective and prospective refers to the temporal identification of the study population in relation to the study itself. Thus, a prospective cohort study would start by recruiting a study population that would subsequently be followed in time. However, in some circumstances, a more efficient design strategy would be retrospectively to identify a population in which records are available that will allow an investigator to reconstruct the cohort experience that would have been observed had the study population been enrolled in the study for the entire time period. For example, in a study of factors affecting occupational safety, company records might allow an investigator to go back in time and thus reconstruct the disease history of cohorts exposed to different factors of interest, i.e. a retrospective cohort. TRH

Holford, T. R. 2002: Multivariate methods in epidemiology. New York: Oxford University Press.
Kelsey, J. L., Thompson, W. D. and Evans, A. S. 1986: Methods in observational epidemiology. New York: Oxford University Press.
Kleinbaum, D. G., Kupper, L. L. and Morgenstern, H. 1982: Epidemiologic research: principles and quantitative methods. Belmont: Lifetime Learning Publications.
Prentice, R. L. 1995: Design issues in cohort studies. Statistical Methods in Medical Research 4, 273-92.
Rothman, K. J. and Greenland, S. 1998: Modern epidemiology. Philadelphia: Lippincott-Raven.
coefficient of detennlnatlon See COIlItEUnON coincidences 111ese are surprising conculRncics of events., pem:ived by same as meaningfully related. with no causal conneclion. Carl JUDg was fascinated by coincidences and even introduced the tcnn synchronicily for what he saw as aD QaIWfl/ coanecting principle needed to explain the pheaomcnon. lUluiDl that such events occur far more rn:.. quently tbaa chaacc allows. Howcyer, JUDI gets very little suppon rrom FIShel' who commented Ihus on coincidences:
'The one chance in a million will undoubtedly occur, with no more and no less than its appropriate frequency, however surprised we may be that it should occur to us.'
Most statisticians would agree with Fisher and put down coincidences to the 'law' of truly large numbers: with a large enough sample, any outrageous thing is likely to happen. Some examples are given in Everitt (1999). Those interpreting medical research studies should be aware of this when faced with an extremely small P-value if such did not arrive from a pre-planned analysis, perhaps arriving from a data-dredging exercise (or 'fishing expedition'). SSE
(See also PITFALLS IN MEDICAL RESEARCH, POST HOC ANALYSES)
Everitt, B. S. 2008: Chance rules, 2nd edition. New York: Springer.
collective ethics See ETHICS AND CLINICAL TRIALS
common errors in statistical analyses in medicine Statistical errors can occur at any stage in a study, including the planning, design and execution stages. However, the most common errors typically occur during the analysis, presentation and interpretation of the collected data (see the first figure).
[Figure: common errors in statistical analyses in medicine. Diagram of the most common errors occurring in a statistical analysis]
[Figure: common errors in statistical analyses in medicine. Use of confidence intervals to help distinguish statistical significance from clinical importance (Walters, 2009). Horizontal axis: difference, with 0 marked. Each interval shows the estimated treatment difference with its lower and upper 95% CI limits]
Box 1 Basic errors in statistical analyses in medicine (from Altman, 1991)
1. Using methods of analysis when the assumptions are not met.
2. Analysing paired data ignoring the pairing.
3. Failing to take account of ordered categories.
4. Treating multiple observations on one subject as independent.
5. Using multiple paired comparisons instead of analysis that considers all groups.
6. Performing within group analyses and then comparing groups by comparing P-VALUES or CONFIDENCE INTERVALS.
7. Quoting confidence intervals that include impossible values.
Box 2 More advanced errors in statistical analyses in medicine (from Altman, 1991)
1. Using correlation in method comparison studies.
2. Using correlation to compare two sets of time-related observations.
3. Assessing the comparability of two or more groups by means of hypothesis tests.
4. Evaluating a diagnostic test solely by means of SENSITIVITY and SPECIFICITY.
Altman (1991) describes seven 'basic' errors and four more 'advanced' errors in the statistical analysis of datasets (see Boxes 1 and 2). With the presentation of the results of a statistical analysis there are again several common errors (see Box 3). The common misleading errors in graphical presentation are (Altman, 1991):
1. Lack of true zero on the vertical axis.
2. Change of scale in the middle of an axis.
3. Three-dimensional effects.
4. Failure to show coincident points in a scatter diagram.
5. Showing a fitted regression line without a scatter of the raw data.
6. Superimposing two (or more) graphs with different vertical scales (especially when they do not start at zero).
7. Plotting means without any indication of variability.
Further discussion of these and many other issues can be found in Freeman, Walters and Campbell (2008).
Box 3 Errors in the presentation of results of statistical analyses in medicine (from Altman, 1991)
1. Using standard errors (or confidence intervals) for descriptive information.
2. Presenting means (or medians) of continuous data without any indication of variability.
3. Presenting the results of a statistical analysis solely as a P-value.
4. Presenting results with spurious numerical precision.
5. Misleading graphical presentation (see GRAPHICAL DECEPTION).
Box 4 Errors in the interpretation of the results of statistical analyses in medicine
1. Association/relationship ≠ causality.
2. Nonsignificant ≠ no effect.
3. Statistically significant ≠ important.
4. Extrapolation from sample to population of interest when sample is not representative.
There are several common errors in the interpretation of statistical analyses (see Box 4). However, the majority of errors in the interpretation of statistical analyses relate to understanding the meaning of hypothesis tests and P-values. The P-value from a hypothesis test is frequently interpreted as the probability that the observed effect is due to chance. This is incorrect. The P-value is not the probability that the observed effect is due to chance. It is the probability of obtaining the observed effect (or a more unlikely one) when the NULL HYPOTHESIS is true. The P-value assesses how likely it is to observe such an effect in a sample when there is no such difference in the population. Another false interpretation is the belief that P = 0.001 implies a stronger effect than P = 0.01. This may be so, but the P-values alone do not demonstrate this; we need a confidence interval for the observed effect.
Statistical significance is often used as the sole basis of the interpretation. A common mistake is that any significant effect, however small or implausible, is taken as real, and any nonsignificant effect is taken as indicating no difference. A statistically significant result does not imply that a result is important. Real, nonrandom effects may be very small and unimportant. Conversely, a nonsignificant result does not imply there is no effect. Big effects may not be significant if the sample size is low or the variability of the data is high.
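The contrast between statistical significance and importance can be sketched numerically. The following is a minimal illustration only: the two trials, their effect sizes, the common standard deviation and the use of a normal (z) approximation with known standard deviation are all assumptions made for the example and are not taken from the entry. The function name `two_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(diff, sd, n_per_group, level=0.95):
    """Two-sided P-value and confidence interval for a difference in means.

    Assumes two equal-sized groups, a known common standard deviation and
    a normal approximation (illustrative simplifications).
    """
    se = sd * sqrt(2.0 / n_per_group)            # standard error of the difference
    z = diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return p_value, diff - z_crit * se, diff + z_crit * se


# Hypothetical small trial: large observed effect, few patients.
small_trial = two_sample_z(diff=10.0, sd=20.0, n_per_group=10)
# Hypothetical very large trial: tiny observed effect, many patients.
large_trial = two_sample_z(diff=0.5, sd=20.0, n_per_group=20000)

for name, (p, lo, hi) in [("small", small_trial), ("large", large_trial)]:
    print(f"{name} trial: P = {p:.4f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Under these assumed inputs the small trial is nonsignificant although its interval is wide enough to include large effects, whereas the large trial is significant although its whole interval lies close to zero. The P-values alone would not reveal either point.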
Again the estimation of a confidence interval for the effect may reduce the difficulties and help interpretation (see the second figure) (Walters, 2009). Another frequent error in the interpretation of the results of statistical analysis is to equate a relationship or association and causation. An observed association between variables does not necessarily imply a causal relationship (see CAUSALITY). Most research is based on the principle of extrapolating findings from a sample to the population of interest (Campbell, Machin and Walters, 2007). In order to do this the sample must be representative of the population. This extrapolation can be compromised by studies with high dropout or refusal rates. There is also another error of extrapolation: estimating unknown data values outside the known limits of the data. Further discussion of these common errors in the statistical analysis of medical data can be found in Altman (1991), Campbell, Machin and Walters (2007), Freeman, Walters and Campbell (2008) and Good and Hardin (2009). SJW
Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall. Campbell, M. J., Machin, D. and Walters, S. J. 2007: Medical statistics: a text book for the health sciences, 4th edition. Chichester: John Wiley & Sons, Ltd. Freeman, J. V., Walters, S. J. and Campbell, M. J. 2008: How to display data. Oxford: BMJ Books, Blackwell. Good, P. I. and Hardin, J. W. 2009: Common errors in statistics (and how to avoid them), 3rd edition. Chichester: John Wiley & Sons, Ltd. Walters, S. J. 2009: Consultants' forum: should post hoc sample size calculations be done? Pharmaceutical Statistics 8(2), 163-9.
competing risks This term is used particularly in SURVIVAL ANALYSIS to indicate that the event of interest (e.g. death) may occur from more than one cause. Survival analysis is concerned with the time (T) to the occurrence of some event of interest, such as remission, death due to a specific disease, discontinuation of use of a contraceptive device, etc. For some individuals T may not be observable due to the occurrence of some competing event. For example, if death from prostate cancer is of interest, then death from cardiovascular disease is a competing risk, as is death from old age. There may be multiple competing failure types from which each subject is at risk. We assume that there is only one failure time per study subject. For more than one time observed for each subject, multistate models as discussed by Kalbfleisch and Prentice (2002) are recommended.
The survival function of any specific failure type of interest is typically estimated by the product-limit estimator (KAPLAN-MEIER ESTIMATOR (KM)), treating the observed times of the other failure types as CENSORED OBSERVATIONS. The complement of the Kaplan-Meier estimator (1-KM) is often used to estimate the probability of failure due to a specified failure cause even in the presence of competing risks. The KM approach is only reasonable under the assumption that all competing risks are independent. Independence can be considered to mean that the probability of the occurrence of some competing event is independent of the occurrence of any of the other competing events. In most situations this assumption is not valid. For example, patients with local relapse of breast cancer may have a higher probability of distant recurrence. In fact it has been argued that in the presence of competing risks a cause-specific survival function has no biological meaning since 'elimination' of some of the competing risks must influence the others. For example, Hougaard (2000) points out that treatment of stroke owing to thromboses by dissolving blood clots would increase the probability of haemorrhage.
In order to avoid unrealistic assumptions on the relationships between the various competing risks the KM method should not be used but rather the cumulative incidence function needs to be estimated. The cumulative incidence function for a specific cause of interest (often called the subdistribution function) is the probability of the event of interest occurring before time t for an individual subject to all of the competing causes. To illustrate, suppose that 10 patients, subject to two competing risks (A and B), die at the times shown in the table.
[Figure: competing risks. Cumulative incidences of death for causes A and B for data in the competing risks table. Horizontal axis: time; vertical axis: cumulative incidence]
competing risks Survival experience of 10 patients subject to two competing risks of death (A and B)

Patient   Time to death   Cause of death
1         2               A
2         17              A
3         3               B
4         14              A
5         5               B
6         9               B
7         4               B
8         8               A
9         10              B
10        12              B
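The comparison between the cumulative incidence and 1-KM for these ten patients can be sketched as follows. This is a minimal illustration that assumes the death times and causes as read from the table (every patient dies, so there is no censoring other than the artificial censoring of the competing cause in the 1-KM calculation). The function names are hypothetical and the code is not taken from any survival analysis package.

```python
# Death times and causes for the 10 patients, as read from the table.
times = [2, 17, 3, 14, 5, 9, 4, 8, 10, 12]
causes = ["A", "A", "B", "A", "B", "B", "B", "A", "B", "B"]


def cumulative_incidence(times, causes, cause, t):
    """Proportion of all subjects dying from `cause` at or before time t.

    Valid in this simple form only because no subject is censored.
    """
    n = len(times)
    return sum(1 for ti, ci in zip(times, causes) if ci == cause and ti <= t) / n


def one_minus_km(times, causes, cause, t):
    """1 - Kaplan-Meier for `cause`, treating other causes as censored."""
    survival = 1.0
    for u in sorted(set(times)):
        if u > t:
            break
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, ci in zip(times, causes) if ti == u and ci == cause)
        survival *= 1.0 - deaths / at_risk
    return 1.0 - survival


for t in (5, 8, 14, 17):
    ci_a = cumulative_incidence(times, causes, "A", t)
    km_a = one_minus_km(times, causes, "A", t)
    print(f"t = {t:2d}: cumulative incidence (A) = {ci_a:.3f}, 1-KM (A) = {km_a:.3f}")
```

At time 5 the two estimates for cause A agree, because only one death from A has occurred and no competing death precedes it. At later times 1-KM rises above the cumulative incidence, because patients who have died from cause B are removed from the risk set as if they could still go on to die from cause A.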
Corresponding cumulative incidences for causes A and B are presented in the first figure. In order to illustrate the cumulative incidence computations consider time 5. The probability of death from cause A at time 5 or before is estimated by the number of cause A deaths occurring by time 5 divided by the total number of subjects in the study, or 1/10; for cause B the corresponding probability is 3/10. Note that the sum of these two probabilities, 4/10, is the overall probability of death at time 5 or before and is equivalent to 1-KM computed on all the deaths ignoring cause. However, when computed for a specific cause, 1-KM and the cumulative incidence estimate can differ substantially, as illustrated for cause A in the second figure. This shows that 1-KM tends to overestimate the probability of interest. The presence of censoring and/or explanatory variables complicates the computations (Kalbfleisch and Prentice, 2002). Statistical inference, both for a single cumulative incidence function and for the comparison of several cumulative incidences, can be quite complex. Crowder (2001) discusses in detail parametric modelling for competing risks - an approach not often taken for biomedical research. Green, Benedetti and Crowley (2003) focus on applications in oncology, although their methods are more generally applicable. In many situations it may not be clear which of the possible competing causes resulted in death. Flehinger, Reiser and Yashchin (2001) review the analysis of such masked data. DF/BR
Crowder, M. 2001: Classical competing risks. New York: Chapman & Hall. Flehinger, B. J., Reiser, B. and Yashchin, E. 2001: Statistical analysis for masked data. In Balakrishnan, N. and Rao, C. R. (eds), Handbook of statistics, Vol. 20: Advances in reliability. London: Chapman & Hall, pp. 499-522. Green, S., Benedetti, J. and Crowley, J. 2003: Clinical trials in oncology, 2nd edition. New York: Chapman & Hall. Hougaard, P. 2000: Analysis of multivariate survival data. New York: Springer Verlag. Kalbfleisch, J. D. and Prentice, R. L. 2002: The statistical analysis of failure time data, 2nd edition. New York: John Wiley & Sons, Inc.
[Figure: competing risks. Cumulative incidence and 1-KM estimates for probability of death by cause A. Horizontal axis: time]
complementary log-log model This is another commonly used model, besides the LOGISTIC REGRESSION model and the PROBIT MODEL, for investigating the relationship between a binary (or binomial) response, Y say, and explanatory or predictor variables. It is used in a variety of settings, e.g. in the analysis of data from toxicology studies (dose-response data) where interest lies in determining the effect on subjects' survival (e.g. mice mortality) of the exposure to different doses of a 'toxic' chemical compound. It is also used in serological studies in which serological tests are performed to detect the presence (i.e. the seropositivity) or absence of antibodies produced in response to an infectious disease such as malaria, so as to be able to calculate infection rates. Further examples are in dilution studies, where an estimate of the number of infective organisms present in a solution is required but can only be obtained through applying different dilutions of the solution to a number of plates that contain a growth medium and recording whether any growth has occurred after a fixed incubation period; in ageing studies where interest lies in self-reported mobility disability; and in the analysis of grouped (or interval) survival data (see SURVIVAL ANALYSIS), where the presence or occurrence of an event is known only to within a specific time interval. Mathematically, the model relates the probability, P say, of a 'positive' response (Y = 1) to a linear combination of the explanatory variables through the complementary log-log link function (see GENERALISED LINEAR MODELS); i.e.:
\[ \log\{-\log(1 - P)\} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \]
where $x_1, \ldots, x_k$ are the k explanatory variables and the $\beta$ values are the corresponding regression coefficients.
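A short numerical sketch may help to show how the complementary log-log link behaves relative to the logit and probit links used by the other two models named in this entry. The function names below are illustrative choices, not part of any particular package; only the Python standard library is used.

```python
from math import exp, log
from statistics import NormalDist


def cloglog(p):
    """Complementary log-log transformation of a probability in (0, 1)."""
    return log(-log(1.0 - p))


def inv_cloglog(eta):
    """Inverse link: maps any linear predictor value back to a probability."""
    return 1.0 - exp(-exp(eta))


def logit(p):
    return log(p / (1.0 - p))


def probit(p):
    return NormalDist().inv_cdf(p)


for p in (0.05, 0.2, 0.5, 0.8, 0.95):
    print(f"P = {p:.2f}: cloglog = {cloglog(p):7.3f}, "
          f"logit = {logit(p):7.3f}, probit = {probit(p):7.3f}")
```

The logit and probit values for P and 1 - P are equal in size and opposite in sign, whereas the complementary log-log values are not. For small P the complementary log-log and logit values are close, and they diverge as P approaches 1.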
[Figure: complementary log-log model. Three transformations of probability. Horizontal axis: P, from 0.0 to 1.0]
The figure presents a plot of the complementary log-log transformation of the probability P against the probability P itself. Also included on the graph are plots of the logit and probit transformations of P against P. Each of these three transformations converts a probability in the unit interval (0, 1) to any value whatsoever, thus eliminating the need to impose any restrictions on the regression coefficients. However, unlike the logit and probit transformations that are both (180° rotationally) symmetric about P = 1/2, the complementary log-log transformation is asymmetric (see the figure). Thus this transformation is found to be more suitable where it is appropriate to deal with the probability of a positive response in an asymmetric manner, i.e. when the probability increases from 0 fairly slowly but approaches 1 quite suddenly. Observe also that the complementary log-log transformation does not differ appreciably from the logit transformation when P is small, say less than 0.2.
The justification for using this type of model in the analysis of data from many studies comes from assuming that each subject has an underlying, continuous latent or unobservable tolerance or threshold variable, Y*, which is assumed to come from the Gumbel or extreme value distribution (see Davison, 1998). If a subject's tolerance variable, Y*, exceeds a certain threshold θ (i.e. Y* > θ), then a positive response, Y = 1, is observed. For example, in a toxicological study investigating the effect of different doses of an experimental drug on mice, a mouse may die if the exposure dosage exceeds the underlying tolerance the mouse has for the drug. In studies concerning mobility disability in the elderly, the underlying latent response variable for the self-reported inability to walk a quarter of a mile may be the subject's true mobility level. Hence each individual's response to the question 'Are you able to walk a quarter of a mile?' will depend on his or her cut-off point, which is the threshold level on this latent scale at which he or she will move from Y = 0 to Y = 1. Thus coefficients in the regression model above may be interpreted as the effects of the covariates on the latent variable, Y*.
The complementary log-log model can also be derived from noting the relationship between the probability of a positive response in a time interval of length, T say (or an analogous measure, say volume), and the response rate, λ say, for this time interval under the Poisson assumption (see POISSON DISTRIBUTION). For example, this relationship is utilised in the development of models for dilution and serological studies, where the probabilities of growth occurring on a plate at a particular dilution and of a person living in a particular disease endemic area being infected with this disease in one year, respectively, are of interest. This model also follows naturally from the application of the proportional hazards assumption to grouped survival data. The regression coefficients in this case are interpreted as log hazard ratios (see SURVIVAL ANALYSIS) or log relative risks. However, if P is small (P < 0.2), the regression coefficients can also be interpreted as log odds. For further details see Collett (2002). BT
Collett, D. 2002: Modelling binary data, 2nd edition. London: Chapman & Hall/CRC. Davison, A. C. 1998: Extreme values. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd.
complete case analysis This is an analysis that uses only individuals who have a complete set of the intended measurements included in a study. An individual with a missing value on one or more variables will not be included in the analysis. When there are many individuals with missing values this approach can considerably reduce the effective sample size. In most circumstances complete case analysis is not to be recommended since other approaches such as multiple imputation can be used in order to retain as full a dataset as possible, thereby improving efficiency and reducing bias. SSE (See also DROPOUTS)
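The loss of sample size under a complete case analysis can be illustrated with a small sketch. The records, variable names and the helper `complete_cases` below are entirely hypothetical and serve only to show the selection rule; missing values are represented by `None`.

```python
# Hypothetical records: None marks a missing measurement.
records = [
    {"id": 1, "sbp": 128, "cholesterol": 5.1, "bmi": 24.0},
    {"id": 2, "sbp": None, "cholesterol": 6.0, "bmi": 27.5},
    {"id": 3, "sbp": 141, "cholesterol": None, "bmi": 30.2},
    {"id": 4, "sbp": 119, "cholesterol": 4.7, "bmi": None},
    {"id": 5, "sbp": 135, "cholesterol": 5.8, "bmi": 26.1},
]
variables = ["sbp", "cholesterol", "bmi"]


def complete_cases(rows, variables):
    """Keep only the rows with no missing value on any listed variable."""
    return [row for row in rows if all(row.get(v) is not None for v in variables)]


analysed = complete_cases(records, variables)
print(f"{len(analysed)} of {len(records)} individuals retained")
```

In this made-up example each excluded individual is missing only one of the three measurements, yet all of their observed values are discarded, which is the inefficiency the entry refers to.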
complex interventions These are interventions that contain several interacting components and/or multiple outcomes. Most CLINICAL TRIALS focus on a single intervention with a primary outcome measure (see ENDPOINTS); the standard trial compares a group of patients who receive a new medication with a group who do not, and the allocation to groups is done at random. Increasingly, however, we wish to test more complex interventions. Multifaceted interventions or those that have multiple outcomes are hard to develop and their testing presents a range of challenges. Disciplines such as public health and preventive medicine need to assess complex packages of measures that are tailored to individuals or groups, rather than a single medication given in standard doses to recipients. Interventions targeted at behaviour change can involve various components and be particularly complex. Sometimes the intervention continues to develop after the initial assessment, and ongoing evaluation is needed, though this can be hard to accomplish.
There is no specific definition of what makes an intervention complex: even standard 'simple' interventions may have elements of complexity. CLUSTER RANDOMIZED TRIALS have been developed to deal with the complexity that arises when interventions are applied to groups such as hospital wards, surgeons conducting operations and community-based groups. However, the complexity can go much further. Multifaceted interventions to alter diet or physical activity, for example, can be extremely complex.
The British Medical Research Council has developed guidance for the development and evaluation of complex interventions (Craig et al., 2008). Some of the key points that have emerged over the years relating to these interventions include the need to develop the intervention carefully, drawing on all the evidence to date using systematic reviews, understanding the theory underlying the intervention, modelling the processes and outcomes, and assessing feasibility and piloting the methods, before embarking on a full evaluation of the intervention. A clear description of the intervention is vital, both for those involved to understand it fully and in reporting the findings so that the intervention can be introduced elsewhere, or tested further. Various approaches to developing and evaluating interventions have been developed, such as a multiple optimisation strategy (MOST) (Collins et al., 2005) and the Reach, Effectiveness, Adoption, Implementation and Maintenance (RE-AIM) framework (Glasgow, Vogt and Boles, 1999). The National Institute for Health and Clinical Excellence (2007) has also produced guidance for planning, delivering and evaluating public health activities aimed at behaviour change.
The choice of study design depends on the intervention being assessed. Randomisation should be employed wherever possible and the gold standard is the randomised controlled trial, perhaps with variations on that methodology, such as preference trials, and randomised consent, stepped wedge and N-OF-1 TRIALS. However, sometimes randomised assessment is not possible. An example of this is the introduction of the Sure Start Local Programmes in England, which was a government initiative to provide care and support to families and children in the most deprived areas in the country. The intervention was undoubtedly complex in that services offered in each Sure Start area varied according to locally identified priorities, and no control groups were monitored contemporaneously. The evaluation team used a quasi-experimental approach to compare the data from Sure Start with similar data from children in the Millennium Cohort Study who lived in similarly deprived parts of the country but who did not yet have access to Sure Start programmes (Melhuish et al., 2008). This falls short of the ideal of a randomised assessment but was a serious attempt to evaluate a large, widespread and costly intervention.
Public health assessments often need to maximise the information available from 'natural experiments' (Petticrew et al., 2005). Thus the introduction of a new supermarket, traffic calming measures or the building of a major road may have health effects that may or may not be the intended consequence of the intervention. Randomised assessments are rarely possible and, although less than ideal, a before-and-after comparison, preferably contrasted with one or more comparable areas, may be the best that can be achieved.
Particularly difficult is the assessment of interventions affecting entire populations introduced by national or local government. Examples are the introduction of water fluoridation (see http://www.southamptonhealth.nhs.uk/publichealth/fluoridation) or smoking bans in various countries (Haw et al., 2006; Pell et al., 2008). A prerequisite for such assessments is collection of data before the intervention commences and this can be politically difficult, as it may necessitate delaying the implementation of the new measures.
Complex interventions require imaginative approaches but they also need systematic evaluation and an understanding of the true nature of the intervention under study. All components of the intervention and its consequences, intended and unintended, have to be assessed and researchers need to understand the theoretical and practical background to the intervention that they are investigating. Campbell et al. (2000) offer the sensible advice that a mixture of RCTs and other research designs are needed fully to assess complex interventions. HI
Campbell, M., et al. 2000: Framework for the design and evaluation of complex interventions to improve health. British Medical Journal 321, 694-6. Collins, L. M., Murphy, S. A., Nair, V. N. and Strecher, V. J. 2005: A strategy for optimizing and evaluating behavioral interventions. Annals of Behavioral Medicine 30(1), 65-73. Craig, P., Dieppe, P., Macintyre, S., Michie, S., Nazareth, I. and Petticrew, M. 2008: Developing and evaluating complex interventions: the new Medical Research Council guidance. British Medical Journal 337, a1655. Full guidance available at: www.mrc.ac.uk/complexinterventionsguidance. Glasgow, R. E., Vogt, T. M. and Boles, S. M. 1999: Evaluating the public health impact of health promotion interventions: the RE-AIM framework. American Journal of Public Health 89, 1322-7. Haw, S. J., Gruer, L., Amos, A., Currie, C., Fischbacher, C., Fong, G. T. et al. 2006: Legislation on smoking in enclosed public places in Scotland: how will we evaluate the impact? Journal of Public Health 28, 24-30. Melhuish, E., Belsky, J., Leyland, A. H., Barnes, J. and the National Evaluation of Sure Start Research Team 2008: Effects of fully-established Sure Start Local Programmes on 3-year-old children and their families living in England: a quasi-experimental observational study. Lancet 372, 1641-7. National Institute for Health and Clinical Excellence (NICE) 2007: Behaviour change at population, community and individual levels. In NICE Public Health Guidance. London: NICE. Pell, J. P., Haw, S., Cobbe, S., Newby, D. E., Pell, A. C. R., Fischbacher, C. et al. 2008: Smoke-free legislation and hospitalizations for acute coronary syndrome. New England Journal of Medicine 359, 482-91. Petticrew, M., Cummins, S., Ferrell, C., Findlay, A., Higgins, C., Hoy, C. et al. 2005: Natural experiments: an underused tool for public health? Public Health 119, 751-7.
complier average causal effect (CACE) See ADJUSTMENT FOR NONCOMPLIANCE IN RANDOMISED CONTROLLED TRIALS
component bar chart See BAR CHART
components of variance These are variance parameters that quantify the variation attributable to random effect terms included in a regression model. For example, a simple RANDOM EFFECTS MODEL for diastolic blood pressure measurements on patients recruited from a number of clinics includes random effects to represent the variability between clinics and random residual effects to represent the variability between patients. If no further random effect terms are added to this model, the model is said to include two components of variance. The VARIANCE of the random clinic effects in the model is the between-clinic variance component and the variance of the random residual effects is the between-patient variance component in this example. Under this model, the total variance of the individual patient measurements is assumed to be equal to the sum of the variance components.
Suppose in this example of blood pressure measurements on patients within clinics that the overall mean value is estimated as 80 mmHg, with the between-clinic variance component estimated as 7 and the between-patient variance component estimated as 135. The estimated between-clinic variance component allows construction of a 95% range for the mean blood pressure values at the different clinics, using the approach for calculating a reference interval. Here, the limits lying approximately two (between-clinic) standard deviations either side of the overall mean are $80 - 1.96\sqrt{7} = 74.8$ mmHg and $80 + 1.96\sqrt{7} = 85.2$ mmHg. It is therefore estimated that the majority of mean blood pressure values for different clinics lie between 74.8 mmHg and 85.2 mmHg.
Estimation of variance components is relevant in a number of application areas. In HEALTH SERVICES RESEARCH, variance components can be used to describe the variability between administrative or geographical units such as clinics, hospitals or towns and, separately, the variability between patients within units. In LONGITUDINAL DATA, variance components can be used to describe the variability between patients and, separately, the variability between measurements within patients.
When the data of interest are from a balanced design, there is a standard approach for estimation of variance components that is based on ANALYSIS OF VARIANCE. As an example, consider some data representing six repeated measurements of the peak expiratory flow rate (PEFR) for 10 patients with asthma. A simple random effects model for the PEFR measurements includes a between-patient variance component $\sigma_u^2$ and a within-patient variance component $\sigma_e^2$. Because the same number of observations is available for every patient, the dataset is balanced and the variance components can be estimated using an analysis of variance table for the data. The table presents the observed sums of squares and mean squares, as in a conventional analysis of variance. Under the random effects model assumed here, the expected values for the mean squares can be expressed in terms of the variance components $\sigma_e^2$ and $\sigma_u^2$. By equating the observed mean squares with their expected values, estimates for $\sigma_e^2$ and $\sigma_u^2$ are obtained as 191.41 for $\sigma_e^2$ and (11903.83 - 191.41)/6 = 1952.07 for $\sigma_u^2$.
components of variance Observed sums of squares and mean squares

Source of variation   Degrees of freedom   Sum of squares   Mean square   Expected mean square
Between patients      9                    107134.51        11903.83      $\sigma_e^2 + 6\sigma_u^2$
Within patients       50                   9570.70          191.41        $\sigma_e^2$
Total                 59                   116705.21
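The two calculations in this entry can be checked with a few lines of arithmetic. The sketch below assumes an overall mean of 80 mmHg for the clinic example (the value consistent with the quoted upper limit of 85.2 mmHg) and takes the sums of squares and degrees of freedom from the analysis of variance table. The variable names are illustrative only.

```python
from math import sqrt

# Clinic example: approximate 95% range for clinic mean blood pressures.
overall_mean = 80.0          # mmHg (assumed reading of the garbled source value)
between_clinic_var = 7.0
half_width = 1.96 * sqrt(between_clinic_var)
lower, upper = overall_mean - half_width, overall_mean + half_width
print(f"95% range for clinic means: {lower:.1f} to {upper:.1f} mmHg")

# PEFR example: balanced design, 10 patients with 6 measurements each.
ss_between, df_between = 107134.51, 9
ss_within, df_within = 9570.70, 50
n_per_patient = 6

ms_between = ss_between / df_between      # observed between-patient mean square
ms_within = ss_within / df_within         # observed within-patient mean square

# Equate observed mean squares to their expected values:
#   E(MS within)  = sigma_e^2
#   E(MS between) = sigma_e^2 + 6 * sigma_u^2
sigma2_within = ms_within
sigma2_between = (ms_between - ms_within) / n_per_patient
print(f"within-patient variance:  {sigma2_within:.2f}")
print(f"between-patient variance: {sigma2_between:.2f}")
```

If the between-patient mean square were smaller than the within-patient mean square, the same subtraction would give a negative estimate for the between-patient component, even though a variance cannot be negative.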
Many study designs produce unbalanced data; e.g. health services research studies that include a number of hospitals or clinics commonly recruit varying numbers of patients from these, and longitudinal studies need not collect equal numbers of measurements from all subjects. Several methods are available for estimation of variance components in unbalanced datasets. Extensions of the analysis of variance approach to the unbalanced case have been proposed, but these are not now commonly used. Estimation of variance components using the method of MAXIMUM LIKELIHOOD ESTIMATION can be achieved within many statistical software packages. However, maximum likelihood estimates of variance components are biased downwards in general. The preferred method for estimation of variance components in unbalanced data is RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML), which is also available within many software packages. In balanced datasets, REML estimation gives the same results as the analysis of variance approach as just described, whereas maximum likelihood estimation does not.
By definition, a component of variance is nonnegative, since it corresponds to the variance of a set of random effects. However, the methods for estimation of variance components can produce negative values. Usually, this occurs when the true value of the variance component is small and nonnegative. One approach to proceeding is to set the negative estimate to zero. Estimation and reporting should be handled with care for data in which a negative variance estimate has been obtained (Brown and Prescott, 1999). For further accounts of variance components, see Goldstein (1995), Searle (1971) and Snijders and Bosker (1999). RT
Brown, H. and Prescott, R. 1999: Applied mixed models in medicine. Chichester: John Wiley & Sons, Ltd. Goldstein, H. 1995: Multilevel statistical models. London: Arnold. Searle, S. R. 1971: Linear models. New York: John Wiley & Sons, Inc. Snijders, T. and Bosker, R. 1999: Multilevel analysis. London: Sage.
composite endpoint
See ENDPOINTS
compound symmetry
This term is used to describe the structure of a covariance matrix that has all its diagonal elements equal to the same value (say $\sigma_1^2$) and all its off-diagonal elements equal to another value (say $\sigma_{12}$), i.e. a covariance matrix $\Sigma$ with the form:

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{12} \\ \sigma_{12} & \sigma_1^2 & \cdots & \sigma_{12} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{12} & \sigma_{12} & \cdots & \sigma_1^2 \end{pmatrix}$$

Such a structure is assumed by some approaches to the analysis of longitudinal data, e.g. the random intercept model, although it is generally unrealistic since, in practice, variances often increase with time and covariances frequently decrease with the time interval between two measurements. An account of testing for compound symmetry is given in Votaw (1948). BSE
(See also LINEAR MIXED EFFECTS MODELS)
Votaw, D. F. 1948: Testing compound symmetry in a normal multivariate distribution. Annals of Mathematical Statistics 19, 447-73.

conditional independence graphs
See GRAPHICAL MODELS

conditional logistic regression
This is a form of logistic regression that can be applied to matched datasets, particularly data from matched CASE-CONTROL STUDIES (see MATCHED PAIRS ANALYSIS). For such data the usual logistic regression model cannot be used since the number of parameters increases at the same rate as the sample size, with the consequence that MAXIMUM LIKELIHOOD ESTIMATION is no longer viable. The problem is overcome by regarding particular parameters as a 'nuisance' that do not need to be estimated (see NUISANCE PARAMETERS). A conditional likelihood function can then be created that will yield maximum likelihood estimators of the parameters of most interest, i.e. the regression coefficients of the EXPLANATORY VARIABLES involved. The mathematics of the procedure are described, for example, in Collett (2003). The conditional logistic regression models can be applied using standard logistic regression software as follows: first, set the sample size to the number of matched pairs; next, use as explanatory variables the differences between the values for each case and control; then, set the value of the response variable to one for all observations; and, finally, exclude the constant term from the model. BSE
Collett, D. 2003: Modelling binary data, 2nd edition. Boca Raton: Chapman & Hall/CRC Press.

conditional probability
A conditional probability is the probability of an event given that another event has occurred. For example, consider two events a and b. The probability of both a and b occurring, denoted $P(a \wedge b)$, using the multiplication rule (see PROBABILITY) can be expressed as:

$$P(a \wedge b) = P(a \mid b) \times P(b) \qquad (1)$$

Rearranging equation (1) yields the conditional probability of a given b as:

$$P(a \mid b) = \frac{P(a \wedge b)}{P(b)} \qquad (2)$$

If a and b are independent, then $P(a \wedge b) = P(a)P(b)$ and hence from equation (2) $P(a \mid b) = P(a)$. Frequently, we wish to reverse the conditioning: i.e. rather than $P(a \mid b)$ we want $P(b \mid a)$ and this can be achieved using BAYES' THEOREM.
[Figure: conditional probability. Tree diagram of exposure status (E+, E-) and disease status (D+, D-) with branch probabilities]
Conditional probabilities are frequently used in epidemiology (Clayton and Hills, 1993). The figure shows a typical situation in which individuals can develop a disease or not, denoted D+ and D- respectively, having been exposed or not, denoted E+ and E- respectively. The conditional probability that individuals develop the disease given that they were exposed, i.e. P(D+|E+), is 0.8. KRA
Clayton, D. and Hills, M. 1993: Statistical models in epidemiology. Oxford: Oxford University Press.
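
As a rough illustration of the multiplication rule and of reversing the conditioning, the short Python sketch below works through an exposure and disease tree. The three input probabilities are assumed values chosen for illustration, and the variable names are hypothetical.

```python
# Assumed, illustrative probabilities for an exposure/disease tree.
p_exposed = 0.6                    # P(E+), assumed
p_disease_given_exposed = 0.8      # P(D+|E+), assumed
p_disease_given_unexposed = 0.3    # P(D+|E-), assumed

# Multiplication rule: P(D+ and E+) = P(D+|E+) x P(E+), and likewise for E-.
p_disease_and_exposed = p_disease_given_exposed * p_exposed
p_disease_and_unexposed = p_disease_given_unexposed * (1 - p_exposed)

# Overall probability of disease: sum over the two exposure branches.
p_disease = p_disease_and_exposed + p_disease_and_unexposed

# Dividing the joint probability by P(E+) recovers the conditional probability.
recovered = p_disease_and_exposed / p_exposed

# Reversing the conditioning (Bayes' theorem): P(E+|D+).
p_exposed_given_disease = p_disease_and_exposed / p_disease

print(f"P(D+ and E+) = {p_disease_and_exposed:.2f}")
print(f"P(D+)        = {p_disease:.2f}")
print(f"P(D+|E+)     = {recovered:.2f}")
print(f"P(E+|D+)     = {p_exposed_given_disease:.2f}")
```

With these assumed inputs P(D+|E+) and P(E+|D+) happen to coincide; in general the two conditional probabilities differ, which is why the direction of conditioning has to be stated.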
confidence intervals
This is a range of values calculated from a sample so that a given proportion of intervals thus calculated from such samples would contain the true population value. In research, we collect data on our research subjects so we can draw conclusions about some larger population. For example, in a randomised controlled trial comparing two obstetric regimes, the relative risk of Caesarean section for active management of labour compared to routine management was 0.97, with a confidence interval 0.60 to 1.56 (Sadler, Davison and McCowan, 2000). This trial was carried out in one obstetric unit in New Zealand. We are not specifically interested in this unit or in these patients. We are interested in what they can tell us about what would happen if we treated future patients with active management of labour rather than routine management. We want to know not the relative risk for these particular women but the relative risk for all women. The trial subjects form a sample that we use to draw some conclusions about the population of such patients in other clinical centres in New Zealand and other countries, now and in the future.
The observed relative risk of Caesarean section, 0.97, provides an estimate of the relative risk we would expect to see in this wider population. It is called a point estimate because it is a single number. If we were to repeat the trial, we would not get exactly the same point estimate. Other similar trials cited by Sadler, Davison and McCowan (2000) have reported different relative risks: 0.75, 1.01 and 0.64. Each of these trials represents a different sample of patients and clinicians and there is bound to be some variation between samples. Hence we cannot conclude that the relative risk in the population will be the same as that found in our particular trial sample. The relative risk that we get in any particular sample would be compatible with a range of possible differences in the population. We estimate this range of possibilities in the population with the confidence interval.
A 95% confidence interval is defined in such a way that, if we were to repeat the trial many times and calculate a confidence interval for each, 95% of these intervals would include the relative risk for the population. Thus if we estimate that the population value is within the 95% confidence interval, we will be correct for 95% of samples. This is a pretty difficult concept to get to grips with. The figure shows a computer simulation of relative risks and confidence intervals for 100 studies where the relative risk in the population is 0.90 and the sample size and Caesarean rate are similar to those in the New Zealand study (Sadler, Davison and McCowan, 2000). Of these 100 confidence intervals, 95 include the population value (chosen to be 0.90). Many researchers misunderstand confidence intervals and think that 95% of samples will produce point estimates within this confidence interval. This is simply not true. In the simulation, the first sample confidence interval is 0.46 to 1.15, and only 83% of sample relative risks are within these limits.

[Figure: confidence intervals. Confidence intervals for 100 simulated relative risks; vertical axis: relative risk; each bar is a 95% confidence interval, with intervals not including the population relative risk marked]

Such intervals are not unique and indeed many intervals with this property could be chosen. We usually choose the interval so that, of those intervals that do not include the population value, half will be wholly greater than that value and half wholly less. This often leads to intervals that are symmetrical about the point estimate, although in the case of RELATIVE RISKS AND ODDS RATIOS this symmetry usually occurs on the logarithmic rather than the natural scale.
In principle, a confidence interval can be found for any quantity estimated from a sample. There are several different methods for doing this, some simple and some not. First, we shall show how confidence intervals can be found for two of the simplest statistics, MEAN and proportion for continuous and categorical data respectively, and then see what they show about confidence intervals in general.
In the St George's Birthweight Study (Brooke et al., 1989) data on birth weight and gestational age on 1749 pregnancies were obtained. For the 1603 births at 37 weeks' gestation or more the mean birth weight was 3384 g and the STANDARD DEVIATION was 449 g. This is a large sample and the sample mean will be an observation from a NORMAL DISTRIBUTION whose mean is the unknown mean birth weight in the population and whose standard deviation is well estimated by the standard error $449/\sqrt{1603} = 11.2$. For a normal distribution, 95% of observations are less than 1.96 standard deviations from the mean, so 95% of sample means will be less than 1.96 standard errors from the population mean. The 95% confidence interval has as a lower limit the sample mean minus 1.96 standard errors and as an upper limit the sample mean plus 1.96 standard errors, 3384 - 1.96 × 11.2 to 3384 + 1.96 × 11.2 = 3362 to 3406 g. Similar methods can be used for many large sample estimates. We need the estimate to be from an approximately normal distribution and the standard error to be well estimated.
We can estimate a confidence interval for a proportion p using the standard error formula for a BINOMIAL DISTRIBUTION, $\sqrt{p(1-p)/n}$. For example, in the St George's Birthweight Study 146 of 1749 births occurred at less than 37 completed weeks' gestation. The proportion is thus 146/1749 = 0.08348 or 8.3%. The standard error is estimated by $\sqrt{0.08348(1 - 0.08348)/1749} = 0.006614$. The 95% confidence interval is thus 0.08348 - 1.96 × 0.006614 = 0.07052 to 0.08348 + 1.96 × 0.006614 = 0.09644. Rounding this, we get 0.071 to 0.096, which is from 7.1% to 9.6%.
For small samples things get much more complicated. We cannot assume that the estimate follows a normal distribution or that the standard error is a good estimate of the standard deviation of whatever distribution it does follow. For means, we can use a method based on the standard error if we assume that the data themselves follow a normal distribution. If we make this assumption then for a sample of n observations the difference between the sample mean and the unknown population mean divided by the standard error follows a t-DISTRIBUTION with n - 1 DEGREES OF FREEDOM. Rather than 95% of samples having means within 1.96 standard errors of the population mean, they have means within $t_{0.05}$ standard errors of the population mean, where $t_{0.05}$ is the two-sided 5% point of the t-distribution with n - 1 degrees of freedom. In the birth weight study there were 11 babies born at 34 weeks' gestation. Their mean birth weight was 2477 g with a standard deviation of 531 g, giving the standard error $531/\sqrt{11} = 160.1$ g. There were 11 - 1 = 10 degrees of freedom and the 5% point of the t-distribution is 2.228.
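
The large-sample and t-based calculations in this passage can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `interval` is hypothetical, and the t multiplier 2.228 is simply the value quoted in the text rather than one computed here.

```python
from math import sqrt

def interval(estimate, standard_error, multiplier=1.96):
    """Return (lower, upper) = estimate -/+ multiplier x standard error."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width

# Mean birth weight for the 1603 births at 37 weeks or more (large sample).
se_mean = 449 / sqrt(1603)
mean_ci = interval(3384, se_mean)

# Proportion of births before 37 completed weeks: 146 of 1749.
p = 146 / 1749
se_p = sqrt(p * (1 - p) / 1749)
prop_ci = interval(p, se_p)

# Small sample: 11 babies born at 34 weeks, t multiplier on 10 degrees of freedom.
se_small = 531 / sqrt(11)
t_ci = interval(2477, se_small, multiplier=2.228)

print(f"mean:       {mean_ci[0]:.0f} to {mean_ci[1]:.0f} g")
print(f"proportion: {prop_ci[0]:.3f} to {prop_ci[1]:.3f}")
print(f"small-sample mean: {t_ci[0]:.0f} to {t_ci[1]:.0f} g")
```

The same helper serves all three cases because each interval is an estimate minus or plus a multiple of its standard error; only the multiplier and the standard error formula change.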
The 95% confidence interval for the mean birth weight of babies born at 34 weeks was therefore 2477 - 2.228 × 160.1 to 2477 + 2.228 × 160.1, namely from 2120 to 2834 g.
For a proportion estimated from a small sample or small number of events, things do not work in the same way. The standard error estimate can go disastrously wrong. In a study of isolated intracardiac echogenic foci in foetuses, we found one trisomy-21 abnormality among 177 subjects (Prefumo et al., 2001). The proportion was thus 1/177 = 0.00565, or 5.65 per thousand. The usual 95% confidence interval using the normal approximation to the binomial distribution gives -5.4 to 16.6 per thousand, clearly impossible. The large sample assumption has broken down. Researchers will actually quote such impossible intervals and journals have been known to publish them! Sometimes, realising that the negative limit is impossible, researchers will replace it by zero, but this, too, though better, is still wrong. The lower limit of the confidence interval cannot actually be zero in this example. Since we have found a case in the sample, it is not possible that there are no cases in the population.
There are a number of different methods to improve this interval (Newcombe, 1998). One of these uses a procedure based on the exact individual probabilities of the binomial distribution. The binomial distribution has two parameters, the number of independent observations n we make (e.g. number of patients) and the probability p that any given observation will be a 'yes'. This probability is what we are trying to estimate. We find the lower confidence limit as the value of p so that the probability of obtaining the observed number of 'yes's or more will be 0.025 and the upper limit as the value of p so that the probability of the observed number of 'yes's or fewer will be 0.025. These probabilities are obtained by summing the exact binomial probabilities for all the possible numbers of 'yes's equal to and beyond that observed. The calculations for such methods are extremely tedious, but not to a computer. For the echogenic foci data the 95% confidence interval by this method is 0.00014 to 0.03107, or 0.14 to 31 per thousand. This is an example of an exact method calculation, because it uses the exact probabilities of the distribution (see EXACT METHODS FOR CATEGORICAL DATA). There are several other computer-intensive methods that can be used, such as the BOOTSTRAP and those based on rank tests.
The confidence interval allows for what is called sampling variation. This means that it reflects the difference between estimate and population value likely in random samples from that population. However, it does not take into account other sources of variation, termed nonsampling variation. The sample that we have is from geographical space, in that it contains one hospital, as in the active management trial (Sadler, Davison and McCowan, 2000). Even the largest clinical trial will contain at most only a few hospitals and their patients. The hospitals are not chosen randomly, so the sample will differ from the population in an unknown and inestimable way. It is also a sample in time, in that we want the sample of patients seen in the past to tell us about patients whom we will see in the future. The sample may not be as good at estimating quantities in this wider population as the confidence interval suggests.
The interval quoted in the active management trial was a 95% confidence interval and 95% of such intervals would contain the relative risk for the population. We could also calculate intervals for other percentages, e.g. a 99% interval, calculated so that 99% of possible intervals would contain the population value. For the Caesarean section relative risk the corresponding 99% confidence interval would be 0.52 to 1.81, wider than the 95% interval of 0.60 to 1.56 reported. In compensation, more of these intervals would contain the population value.
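
The exact binomial interval described earlier for the echogenic foci data (one case among 177 subjects) can be sketched in Python as below. The function names are hypothetical and the limits are found by simple bisection, which is one convenient way of solving for p and not necessarily how any particular package does it.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the exact probabilities."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_root(g, lo=0.0, hi=1.0, iterations=100):
    """Root of an increasing function g on [lo, hi], found by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, level=0.95):
    """Exact binomial confidence limits for x 'yes' answers out of n."""
    tail = (1 - level) / 2
    # Lower limit: P(X >= x) = tail, where P(X >= x) = 1 - P(X <= x - 1).
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - tail)
    # Upper limit: P(X <= x) = tail (this probability falls as p rises).
    upper = 1.0 if x == n else bisect_root(
        lambda p: tail - binom_cdf(x, n, p))
    return lower, upper

lower, upper = exact_ci(1, 177)
print(f"{lower:.5f} to {upper:.5f}")
print(f"{1000 * lower:.2f} to {1000 * upper:.0f} per thousand")
```

Unlike the normal approximation, both limits are guaranteed to lie between 0 and 1, and the lower limit is strictly positive whenever at least one case has been observed.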
We could calculate a much narrower interval. A 50% confidence interval is calculated as estimate minus or plus 0.67 standard errors, compared to estimate minus or plus 1.96 standard errors for a 95% confidence interval. The 50% interval based on a large sample normal approximation is only 34% of the width of the 95% interval. This is not very useful as an estimate, as only 50% of such intervals contain the population value they are estimating. However, it shows that if we calculate 95% confidence intervals, we can say that for about 50% of samples the middle third of the 95% confidence interval will contain the population parameter. Thus, 95% is chosen as a standard confidence level as a reasonable compromise between width (or precision) and coverage probability (accuracy).
Significance tests and confidence intervals are closely related. Many null hypotheses are about the value of something we can also estimate, such as the difference in mean between two groups. It will usually be the case that if the NULL HYPOTHESIS value (difference or regression coefficient = 0, odds ratio or relative risk = 1.0) is contained within a 95% confidence interval then the P-value will be greater than 0.05. For example, in the Birthweight Study, we might want to test the null hypothesis that mean birth weight in the population is 3400 g. To test this, we subtract 3400 from the observed mean and divide by the standard error, 11.2. This ratio, (3384 - 3400)/11.2 = -1.43, would be an observation from the standard normal distribution if the null hypothesis were true, giving P = 0.15. Here the 95% confidence interval (3362 to 3406 g) includes the null hypothesis value for the mean, 3400 g, and P > 0.05. Contrariwise, we might want to test the null hypothesis that the population mean birth weight was 3500 g. Now the test statistic is (3384 - 3500)/11.2 = -10.36, giving P < 0.0001.
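
The multipliers for different confidence levels and the two significance tests on the mean birth weight can be reproduced with the Python standard library. This is only a sketch; the function names are hypothetical, and the standard error 11.2 is the rounded value used in the text.

```python
from math import erfc, sqrt
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def multiplier(level):
    """Number of standard errors either side of the estimate for a given level."""
    return standard_normal.inv_cdf(0.5 + level / 2)

m50, m95, m99 = multiplier(0.50), multiplier(0.95), multiplier(0.99)
width_ratio = m50 / m95  # width of a 50% interval relative to a 95% interval

def two_sided_test(estimate, null_value, standard_error):
    """Return the test statistic and its two-sided P-value (normal theory)."""
    statistic = (estimate - null_value) / standard_error
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal.
    return statistic, erfc(abs(statistic) / sqrt(2))

z_3400, p_3400 = two_sided_test(3384, 3400, 11.2)
z_3500, p_3500 = two_sided_test(3384, 3500, 11.2)

print(f"multipliers: 50% {m50:.2f}, 95% {m95:.2f}, 99% {m99:.2f}")
print(f"50% width / 95% width = {width_ratio:.2f}")
print(f"null 3400 g: z = {z_3400:.2f}, P = {p_3400:.2f}")
print(f"null 3500 g: z = {z_3500:.2f}, P = {p_3500:.1e}")
```

For means the same standard error is used in the interval and in the test, so the null value 3400 g, which lies inside 3362 to 3406 g, gives P above 0.05, while 3500 g, which lies outside, gives P below 0.05.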
The null hypothesis value is not included in the confidence interval and the difference is significant. Thus the 95% confidence interval can be used to do a significance test at the 5% level.
For means and their differences there is an exact relationship between the usual confidence interval and the usual significance test, because the standard error is not related to the quantities being compared (means) and thus is not affected by the null hypothesis. It may not work for proportions, relative risks, odds ratios, etc. For example, let us test the null hypothesis that in the population the proportion of births at less than 37 weeks' gestation is 8%. Under the null hypothesis, the proportion is 0.08 and the standard error is $\sqrt{0.08(1 - 0.08)/1749} = 0.006487$, not the same as the 0.006614 used for the confidence interval. The test statistic is (0.08348 - 0.08)/0.006487 = 0.54, P = 0.59. The null hypothesis value of the proportion is within the confidence interval 0.071 to 0.096 and the difference is not significant. Now let us consider a null hypothesis value just outside the confidence interval, 0.097. The standard error, if the null hypothesis were true, would be $\sqrt{0.097 \times (1 - 0.097)/1749} = 0.007077$. The test statistic is (0.08348 - 0.097)/0.007077 = -1.91, P = 0.056, which is not significant at the 5% level. This effect of the null hypothesis on the standard error is why we sometimes see odds ratios, relative risks and standardised mortality ratios where the 95% confidence interval includes 1.0, but the ratio is reported as significant.
Researchers are now encouraged to present results as confidence intervals instead of, or in addition to, P-values (Gardner and Altman, 1986). This approach is more informative than the practice of giving a P-value or stating 'significant' or 'not significant', as it provides an estimate of the size of the possible difference or ratio between the groups in the population. This is particularly useful when differences are not statistically significant, as it enables the reader to judge whether a potentially important difference could have been missed. P-values and confidence intervals both have their role and if possible both should be given. Most major medical journals now include in their recommendations to authors that the main results of studies be presented using confidence intervals (or their equivalent) and that authors should avoid relying solely on hypothesis testing.
Finally, some comments on the Bayesian perspective, there being two differing statistical philosophies, the Bayesian and the frequentist. At present few Bayesian analyses appear in the medical literature, although we may expect to see more of them in future (see BAYESIAN METHODS). People often talk about a 95% confidence interval as including the unknown population value with probability 0.95, saying, for instance, there is a 95% chance that the true value lies within the computed 95% confidence interval. Now, it is true that if we set out to collect a new sample, the probability that its confidence interval will include the population value is 0.95. However, once the sample has been collected and the interval calculated, it either includes the population value or it does not; we just do not know which. In strict frequentist terms, we cannot talk about the probability of the population parameter having any given value or range of values. It has a constant, albeit unknown, value with no probability distribution. A Bayesian is willing to think of the population value as a variable with a distribution, which represents the uncertainty in our estimate of it. Bayesians quote something called a CREDIBLE INTERVAL, which is a range of possible values that has a given probability of including the unknown population value. This probability is often set at 95%. Thus a 95% credible interval is a set of values that is estimated to include the population value with probability 95%, whereas a 95% confidence interval is a set of values chosen so that 95% of such sets would include the population value. For the proportion of births before 37 weeks, a Bayesian credible interval, assuming no prior knowledge, is 7.1% to 9.7%, virtually the same as the confidence interval (7.1% to 9.6%). The difference is academic, which is perhaps why academics have spent so much time arguing about it. JMB
Brooke, O. G., Anderson, H. R., Bland, J. M., Peacock, J. L. and Stewart, C. M. 1989: Effects on birth weight of smoking, alcohol, caffeine, socioeconomic factors, and psychosocial stress. British Medical Journal 298, 795-801. Gardner, M. J. and Altman, D. G. 1986: Confidence intervals rather than P values: estimation rather than hypothesis testing. British Medical Journal 292, 746-50. Newcombe, R. G. 1998: Two-sided confidence intervals for the single proportion: comparison of seven methods. Statistics in Medicine 17, 857-72. Prefumo, F., Presti, F., Mavrides, E., Sanusi, A. F., Bland, J. M., Campbell, S. and Carvalho, J. S. 2001: Isolated echogenic foci in the fetal heart: do they increase the risk of trisomy 21 in a population previously screened by nuchal translucency? Ultrasound in Obstetrics and Gynecology 18, 126-30. Sadler, L. C., Davison, T. and McCowan, L. M. 2000: A randomised controlled trial and meta-analysis of active management of labour. British Journal of Obstetrics and Gynaecology 107, 909-15.
confidence level
See CONFIDENCE INTERVALS
confirmatory factor analysis
This is a procedure for testing a hypothesised factor structure for a set of observed variables. The hypothesised structure will specify both the number of factors and which observed variables are related to which factors (Dunn, Everitt and Pickles, 1993). This contrasts with FACTOR ANALYSIS when used in its exploratory mode, when the number of factors has to be determined in some way from the data and no a priori constraints are placed on the factor structure. Confirmatory factor analysis is a theory-testing model as opposed to a theory-generating method like exploratory factor analysis.
The first step in a confirmatory factor analysis involves the calculation of either a correlation or a COVARIANCE MATRIX for a set of observed variables. Then possibly a number of competing factor models are proposed, derived either from theory or previously performed exploratory factor analyses on other datasets. The models will differ in their specifications of 'free' and 'fixed' parameters. MAXIMUM LIKELIHOOD ESTIMATION is generally used to estimate the free parameters in a model. Confirmatory factor analysis models can be fitted using one of a number of available software packages (LISREL, EQS, MPLUS) and a variety of methods can be used to test the fit of a model and to compare the fit of two competing models.
As an example of where this approach might be applied, consider a psychiatrist who measures a number of variables on a sample of mentally ill patients. The psychiatrist believes that some of the observed variables are related to a patient's depression and others to anxiety, and he or she is particularly interested in estimating the correlation between these two, essentially, LATENT VARIABLES. To make things specific suppose there are six observed variables with the first three indicating depression and the remaining three, anxiety. The correlated two-factor model to be fitted is described graphically by the path diagram shown in the figure. Apart from the error variances, the parameters to be estimated are the loadings of the first three variables on factor one (depression), with variables four, five and six constrained to have zero loadings on this factor, and the loadings of the last three variables on factor two (anxiety), with the first three variables now constrained to zero loadings. The estimated correlation between the latent variables, depression and anxiety, will be a disattenuated correlation, i.e. one in which the effects of measurement errors in the observed variables have been effectively removed.

[Figure: confirmatory factor analysis. Path diagram for depression and anxiety example]

Detailed examples of the application of confirmatory factor analysis are given in Huba, Wingard and Bentler (1981) and Dunn, Everitt and Pickles (1993). BSE
Dunn, G., Everitt, B. S. and Pickles, A. 1993: Modelling covariances and latent variables using EQS. Boca Raton: CRC Press/Chapman & Hall. Huba, G. J., Wingard, J. A. and Bentler, P. M. 1981: A comparison of two latent variable causal models for adolescent drug use. Journal of Personality and Social Psychology 40, 180-93.

confounding
See BIAS IN OBSERVATIONAL STUDIES, CASE-CONTROL STUDIES

Consolidated Standards for Reporting Trials (CONSORT) statement
This research tool was designed to improve the quality of reports of clinical trials (Begg et al., 1996; Moher et al., 2001). The core contribution of the CONSORT statement consists of a flow diagram (see the figure) and a checklist. The flow diagram enables reviewers and readers to grasp quickly how many eligible participants were randomly assigned to each arm of the trial and whether any imbalances are apparent regarding numbers of patients withdrawing from or failing to comply with their assigned treatment (see DROPOUTS). Large discrepancies or imbalances suggest the need for conducting not only INTENTION-TO-TREAT (ITT) analyses but also PER PROTOCOL analyses to seek corroboration. Such information is frequently difficult or impossible to ascertain from trial reports as they were reported in the past. The checklist identifies 21 items that should be incorporated in the title, abstract, introduction, methods, results or conclusion of every randomized clinical trial. More details can be found at www.consort-statement.org. BSE
(See also CRITICAL APPRAISAL, STATISTICAL REFEREEING)

[Figure: Consolidated Standards for Reporting Trials statement. Flow diagram of CONSORT statement. Boxes, from top to bottom: Registered or eligible patients (n = ...); Not randomised (n = ...), with reasons (n = ...); then, for each of the two arms (standard intervention and intervention): Received intervention as allocated (n = ...) and Did not receive intervention as allocated (n = ...); Followed up (n = ...), with timing of primary and secondary outcomes; Withdrawn (n = ...), split into intervention ineffective, lost to follow-up and other; Completed trial (n = ...)]

Begg, C., Cho, M., Eastwood, S., Horton, R., Moher, D., Olkin, I. et al. 1996: Improving the quality of reporting of randomized clinical trials: the CONSORT statement. Journal of the American Medical Association 276, 637-9. Moher, D., Schulz, K. F. and Altman, D. G. 2001: The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Annals of Internal Medicine 134, 657-62.
consulting a statistician
'To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of.' So said R. A. Fisher, later Sir Ronald, widely considered the founding father of modern statistics, and of RANDOMISATION in particular, as long ago as 1938. His tongue-in-cheek message remains sage advice, just as true today, as a reminder of the single most important aspect of seeking statistical advice: to seek it early. Many novice researchers make the mistake of believing the statistician to be the numbers person only to be approached, and then with trepidation, once data have been collected. In actuality, a consultation with a statistician should be a positive experience and opportunity to assist planning all aspects of study design, meaning neither just the subsequent analysis nor the narrow matter of SAMPLE SIZE DETERMINATION.
Naturally, there are important differences in how statistical consulting takes place according to whether the setting is within a university, a hospital, a pharmaceutical company, a government agency and so on, due to the obvious differences between public and private sector employers, not to mention geographical differences from one continent to another. Statistical consulting can also take place in a variety of ways: telephone, email or face-to-face, or a mixture thereof. This entry will focus on the most productive manner, namely face-to-face, since this maximises effective two-way communication. It also concentrates on those aspects of research projects that are reasonably consistent regardless of the particular environment, although the author's perspective is based on experience within academic settings.
The remainder of this entry examines the sort of project-specific advice a statistician can give, notably including general guidance on preparing for a first meeting with a statistician and some observations on the interaction between the statistician and clinical researcher. What cannot be included, necessarily, is local advice on where to find a nearby consulting statistician in the first place. In the event none is available, one should consider using textbooks or WEB RESOURCES IN MEDICAL STATISTICS, or even travelling to attend a short course offering an introduction to the subject. For further details concerning technical content, in addition to the process, of statistical consultations, the reader is referred to the rest of this volume or else to one of several books, such as Hand and Everitt (1998), Derr (2000) or Cabrera and McDougall (2001).
Broadly, research can be subdivided into a number of distinct stages as depicted in the figure. The worst time to approach a statistician is at the post-refereeing stage of a submitted journal article. Consultant statisticians may be able to offer some remedial help at this late stage, but only on matters of analysis, interpretation or presentation. The most common reasons why statistical referees recommend rejection of submitted manuscripts to biomedical journals pertain to design issues, which is hardly surprising when one realises that fundamental flaws in study design simply cannot be retrieved by sophisticated analyses (see STATISTICAL REFEREEING). Thus, if the paper has not been rejected outright on statistical grounds, there may be hope for the manuscript after suitable revision. A statistician approached at such a late stage is likely to drop more than a subtle hint that it would be altogether more satisfactory essentially to heed Fisher's advice and request that the researcher come along sooner in a project's life cycle the next time!
[Figure: consulting a statistician. Schematic diagram of the research process, from initial thoughts through to dissemination of the study results (the final stage shown is 'Respond to referees' comments'). Statistical input should ideally be sought at the study formulation stage]
CONSULTING A STATISTICIAN

There is a temptation to think seeing a statistician is unnecessary if one has confidence in one's own statistical knowledge and ability (and access to relevant SOFTWARE). This can be a dangerous policy for the novice researcher, especially if the confidence turns out to be misplaced or handling the data more complex than envisaged. Even veterans of medical research with substantial statistical skills of their own can find consulting a statistician invaluable, despite the time and effort required in the midst of busy research agendas and clinical commitments. This is by no means just to delegate data-related tasks but to gain additional, independent input about the intended study from an altogether different perspective. Statisticians, after all, do not see the world of medicine and research in the same way as those on the frontline dealing directly with patients, nor for that matter those working with test tubes in the laboratory.

What sort of help can a statistician offer? Clearly this depends on the nature of the project itself and the extent of the statistician's involvement. For example, if in an academic environment a student seeks advice on a research project forming a part of a degree, then involvement will be necessarily less than in a full collaboration. In the former case, the student needs to own the analyses and be able to defend them single-handedly, so that the role of the statistical consultant is to point in the right direction by recommending an appropriate choice of study design and method for data analysis. To help avoid becoming a surrogate supervisor by default it can be helpful to suggest that the student's project supervisor should also attend the consultation. It is important to clarify early on in the consultation process if it is expected to become a full collaboration, for then issues of payment, if indicated, and co-authorship (or acknowledgement for lesser statistical involvement) need to be discussed and agreed.

Payment for statistical advice remains a delicate matter and local rules would dictate. It is sensible, so as not to discourage those who most need statistical help, to have a policy whereby the first meeting (of say about an hour) is provided free of any direct charge to the consultee. Parker and Berman (1998) provide some helpful criteria for suggesting when authorship may or may not be appropriate. As a rule, if the finished piece of work could not have attained its statistical quality without the assistance of the consultant statistician, and more than just elementary descriptive or inferential statistics are involved, then the default ought to be co-authorship for the statistician. There is at least anecdotal evidence that statistical co-authorship enhances chances of publication in first-choice journals. Equally, there is a danger that statisticians' names can be used against their wishes to lend perhaps more credence than is due to some submitted papers or grant applications!

What should be brought to a first meeting? In order to make the most use of the time available it is best for the consultee to
have made some specific preparations. A checklist can assist, perhaps in the format of a QUESTIONNAIRE to be completed in advance of the initial meeting. Useful questions to address, covering both 'housekeeping' matters and more substantive issues concerning the project, include the following:

1. What is the single main aim of the project? (A brief answer to this fundamental question at least ensures that the meeting can be focused.)
2. What stage is the project at right now? (Options can be forming ideas/designing protocol/collecting data/analysis of data/writing up/referee's comments.)
3. What area(s) do you think you need help with? (Some areas are formulating ideas/sample size calculation/designing protocol/making grant application/randomisation practicalities/carrying out the study/collecting data/managing data/analysis you are doing/checking your analysis/checking written report/responding to referee.)
4. What role would you like the statistician to play? (These could be advisor/co-applicant/interpreter of results/co-author, although note that the statistician would reserve the right to decline the latter if authorship was felt to be inappropriate.)
5. Does this work form part of a dissertation or thesis? (See comments above concerning student work.)
6. What is the source of patients or subjects and the criteria for selecting them? (This allows an opportunity to discuss or review appropriate study design.)
7. How many subjects are required or available? (If this is to be a topic for advice, there is a need to know clinically relevant differences in proportions involved and/or standard deviations for continuous outcomes.)
8. What is the main outcome measure? (Again to focus attention on primary as opposed to secondary ENDPOINTS or, in the worst case, to ensure the project does pre-specify at least one endpoint.)
9. What is the main comparison or relationship of interest? (To encourage being as specific as possible and to check for a suitable control group.)
10. What other quantities are being measured and when? (For example, BASELINE MEASUREMENTS, covariates, secondary outcomes.)
11. What problems have been or are anticipated in data collection? (To discuss, for example, accuracy, MISSING DATA, repeated measures, matching but essentially any potential BIAS.)
12. What are expected or hoped-for results at the study's end? (Again to focus on the real reason for performing the research.)
13. Are there any specific approaches to data analysis intended? (For instance, the same method as in a previously published study, preferably with a hard copy to be handed over.)
14. Is there any further information you would like to give regarding the study? (A suitable closing question to allow, one hopes, any pertinent facts to emerge.)

It is best if answers to the above catalogue of questions can be sent in advance of the meeting, along with a brief description of the project and copies of related documents to assist the statistician's understanding (e.g. protocol, grant application, ethics committee submission).
In terms of practicalities, the statistician may have some further expectations of the consultee to bring or transmit in advance of the first meeting at which data are to be analysed (recall, ideally, this is not at the first encounter!). Statisticians do not usually take on the more mundane data entry tasks, so would not be prepared to type in the numbers. They may express preferences for how data are presented electronically in terms of file type (e.g. Excel being a common choice) and media (floppy disks are fine, though somewhat old-fashioned; email attachments generally work better for small-to-moderate sized datasets, or USB pen drives more generally). In any event, it is always important to check for viruses to avoid spreading contaminated files.

The layout of the data should ordinarily be as a spreadsheet, with well-labelled variable names, one column per variable and one row per subject. It is best to ask if there is a data entry preference when handling repeated measures data, but if in doubt the spreadsheet works well. In any case, data provided must be reasonably clean and free from data entry errors, although the odd OUTLIER is excusable. Due to confidentiality issues (e.g. the UK's Data Protection Act 1998) there should not be any uncoded individual patient identifiers, i.e. names and addresses and other information that could be used to trace individuals must have been removed. Obviously, the anonymisation process must generate unique patient IDs in order to be fully reversible so that queries with data can be checked from original records that are stored elsewhere.

While it is not a serious problem, it is better to code data numerically rather than alphanumerically. For example, '1' and '2' for 'male' and 'female' respectively is better than use of 'M', 'm', 'male', 'Male', 'MALE', etc., especially as accidental leading or trailing blanks can add to potential confusion, possibly creating a needless missing data point on subsequent conversion to numeric format. In general, categorical variables should have a different number representing each group, with an accompanying description, or internal labelling, of how the categories are coded. Equally, missing data are better handled by inserting an obviously impossible value (e.g. '-99' when all other values are positive) rather than just leaving a spreadsheet cell blank. The statistician would rather be told about any such embedded code, however, to avoid unnecessary runs of software routines after noticing, for example, strange residuals in regression analyses.

An altogether less tangible item to bring along, but arguably the most important for a successful meeting, can be summarised as the right attitude. Statistical consultation involves a high degree of communication and mutual respect. Since areas of expertise are different, jargon is to be avoided both by the statistician and the consultee. (Medics are not alone in having big words, or abbreviations and acronyms, to describe things that are obvious only to themselves!) Punctuality is important but it is understood that medical emergencies can and do occur that necessitate being late (in which case arranging for a telephone message is a simple courtesy) or cutting short an ongoing meeting at a bleeper's notice. However, there is no such thing as a statistical emergency, so there is little excuse for the consultee who demands an immediate appointment with a statistician or expects results to be turned around within, say, 24 hours to meet his or her deadline for a grant, ethics or conference submission, particularly as such deadlines are typically known months in advance.

Also bear in mind some consulting statisticians are new to their jobs. Just as some training of junior doctors occurs 'on the job', so too do junior statisticians have to learn, ideally under supervision from someone more experienced, by interacting with real clients in real consultations. The transition from a university degree course to a practising statistical consultant is never automatic. An attitude of patience is helpful in these circumstances, much as tolerance by drivers stuck behind a learner struggling with hill starts (all were learner drivers once!).

To close, and in keeping with the spirit of Fisher's advice quoted earlier, it can be instructive to consider ways of having an unhelpful meeting between a medical researcher and statistician. So long as both parties can avoid making these mistakes, there is scope for real progress and genuine collaboration.

First, what are some of the ways a statistician can upset medical colleagues?
1. Being too nit-picky, precise, detail-oriented and failing to see the big picture.
2. Being slow to respond to requests for appointments or to analyse data.
3. Being overly critical of genuine-but-flawed attempts to analyse data themselves.
4. Using unnecessary jargon.
5. Using unnecessarily complicated methods when simpler ones suffice.
6. Spending too much, or too little, time during the consultation.
7. Embarking on a mathematical lecture within a consultation.
8. Only expecting to meet on your home turf (despite owning a laptop).
9. Believing there is such a thing as an average patient.
10. Thinking EVIDENCE-BASED MEDICINE (EBM) means clinical experience counts for nothing compared to having a few well-honed CRITICAL APPRAISAL skills and a recently published META-ANALYSIS to hand.

Finally, how to upset your statistician?
1. Saying 'This will only take 5 minutes of your time', for it will not.
2. Arriving unannounced, late or not at all (notwithstanding genuine emergencies).
3. Waiting until the grant or ethics application deadline is tomorrow and leaving no time for review of statistical input before sending the document off.
4. Drip-feeding data or hypotheses or telling half the story ('Oh, actually it's the same patient seen five times') or shifting between study aims.
5. Taking for granted: not considering acknowledgement or co-authorship or bothering to inform if that application or journal submission was ever successful or not.
6. Saying, in earshot, 'I just need the stato to crunch the numbers' and generally regarding the statistician as a technical service provider.
7. Expecting knowledge of specialist medical terminology.
8. Expecting poorly entered data to be cleaned or forgetting to run a virus check on your data.
9. Demanding 'What's the P-value?' or 'Can't you find one that is significant?'
10. Coming too late in the research process and complaining about a statistical postmortem!
CRP

Cabrera, J. and McDougall, A. 2002: Statistical consulting. New York: Springer. Derr, J. 2000: Statistical consulting: a guide to effective communication. Pacific Grove, CA: Duxbury Press. Hand, D. J. and Everitt, B. S. (eds) 1987: The statistical consultant in action. Cambridge: Cambridge University Press. Parker, R. A. and Berman, N. G. 1998: Criteria for authorship for statisticians in medical papers. Statistics in Medicine 17, 2289-99.
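The data-preparation advice in the entry on consulting a statistician (one row per subject, numeric codes, an agreed missing-value code, no stray blanks, unique patient IDs) lends itself to a quick automated check before a file is handed over. The following is only a minimal sketch in Python, not a prescribed procedure: the column names `patient_id`, `sex` and `age`, the sample rows and the helper `check_rows` are all hypothetical, and `-99` is assumed to be the missing-value code agreed with the statistician.

```python
import csv
import io

MISSING_CODE = "-99"                      # assumed agreed missing-value code
SEX_CODES = {"1": "male", "2": "female"}  # numeric coding used in the entry's example


def check_rows(rows, id_field="patient_id", sex_field="sex"):
    """Return (row_number, problem) pairs for a list of dict rows."""
    problems = []
    seen_ids = set()
    for number, row in enumerate(rows, start=1):
        # Anonymised IDs must be unique so that queries can be traced back.
        if row[id_field] in seen_ids:
            problems.append((number, "duplicate patient ID"))
        seen_ids.add(row[id_field])
        for field, value in row.items():
            if value != value.strip():
                problems.append((number, f"leading/trailing blank in {field}"))
            elif value == "":
                problems.append((number, f"blank cell in {field}"))
        # Alphanumeric codes such as 'Male' are flagged rather than silently recoded.
        sex = row[sex_field].strip()
        if sex not in SEX_CODES and sex != MISSING_CODE:
            problems.append((number, f"unrecognised sex code {sex!r}"))
    return problems


# Hypothetical file contents: one row per subject, one column per variable.
SAMPLE = """patient_id,sex,age
P001,1,54
P002,2,-99
P003,Male,61
P004,2 ,
P001,1,47
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = check_rows(rows)
for number, problem in problems:
    print(f"row {number}: {problem}")
```

A check of this kind only reports problems; deciding how to correct them is left to whoever holds the original records.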
contingency coefficient
This is a measure of the strength of an association between two categorical variables. While the CHI-SQUARE TEST can detect an association between two variables, it is not a good measure of the strength of that association. This is because it is also dependent on the sample size and the number of categories into which the variables are classed. Typically, contingency coefficients are adjustments of the chi-square statistic, intended to remove the dependence on those factors. Because they are based on the chi-square statistic, any attempt to test the contingency coefficient for significance will merely resolve into repeating the chi-square test of independence.

The two most common contingency coefficients are Cramér's contingency coefficient (also known as Cramér's C, Cramér's V and occasionally Cramér's ν) and Pearson's contingency coefficient (often just referred to as the contingency coefficient or as Pearson's coefficient of mean square contingency). For a table with r rows and c columns, with k being set as equal to the smaller of r and c, that produces a chi-square statistic of $X^2$ from n observations, the formulae for Cramér's and Pearson's coefficients are:

$$\text{Cramér's coefficient} = \sqrt{\frac{X^2}{n(k-1)}}$$

and

$$\text{Pearson's coefficient} = \sqrt{\frac{X^2}{X^2+n}}$$

While Cramér's coefficient can take values from zero to one, Pearson's coefficient can never reach one (the denominator is clearly always larger than the numerator). In fact, Pearson's coefficient has a known maximum of $\sqrt{(k-1)/k}$, so it is possible to rescale this coefficient to lie in the range 0 to 1.

While the use of these measures is popular in some fields, more so if we consider that the phi coefficient for a 2 × 2 table (see CORRELATION) is a special case of Cramér's coefficient, interpretation is not straightforward. Clearly, in some sense, the larger the coefficient is, the greater the association. However, the absolute value does not have any clear meaning and comparing coefficients from two tables (especially tables of different dimensions) is not straightforward. Contingency coefficients are widely used as a result of their convenience and in spite of their limitations. For 2 × 2 tables, odds ratios are possibly a better measure as it is easy to produce confidence intervals and they have a familiar interpretation. For larger tables with at least one ordered categorical variable a measure based on the Spearman rank correlation might be more appropriate. For further details see Goodman and Kruskal (1954), Fleiss (1981), Siegel and Castellan (1988) and Conover (1999). AGL

Conover, W. J. 1999: Practical nonparametric statistics. New York: John Wiley & Sons, Inc. Fleiss, J. L. 1981: Statistical methods for rates and proportions, 2nd edition. New York: John Wiley & Sons, Inc. Goodman, L. A. and Kruskal, W. H. 1954: Measures of association for cross-classifications. Journal of the American Statistical Association 49, 732-64. Siegel, S. and Castellan Jr, N. J. 1988: Nonparametric statistics for the behavioural sciences, 2nd edition. New York: McGraw-Hill.

contingency tables
These are cross-tabulations that arise when a sample from some population is classified with respect to two or more qualitative variables. The first table shows a simple example involving two such variables each with three categories. A more complex contingency table that involves a classification with respect to three variables is shown in the second table.

contingency tables  Incidence of cerebral tumours

                 Type
Site         A      B      C    Total
I           23      9      6      38
II          21      4      3      28
III         34     24     17      75
Total       78     37     26     141

I, frontal lobes; II, temporal lobes; III, other cerebral areas. A, benign tumours; B, malignant tumours; C, other cerebral tumours.
contingency tables  Coronary heart disease

[Three-way table of counts for CHD (yes) and CHD (no), cross-classified by blood pressure (categories 1 to 4) and serum cholesterol (categories 1 to 4); the cell counts are not legible in this copy.]

Blood pressure: 1, <127 mmHg; 2, 127-146 mmHg; 3, 147-166 mmHg; 4, >167 mmHg. Serum cholesterol: ... 260 mg/100 cm³.
Contingency tables such as these two can be used to test various hypotheses about the variables from which they are formed. To begin with, we shall illustrate this using tables formed from just two variables (two-dimensional contingency tables), since these are the ones encountered most commonly in practice. The hypothesis of interest for two-dimensional tables is whether or not the two variables are independent. This may be formulated more formally in terms of $p_{ij}$, the probability of an observation being in the ijth cell of the table, $p_{i\cdot}$, the probability of being in the ith row of the table, and $p_{\cdot j}$, the probability of being in the jth column of the table. The hypothesis of independence can now be written:

$$H_0: p_{ij} = p_{i\cdot} \times p_{\cdot j}$$

Estimated values of $p_{i\cdot}$ and $p_{\cdot j}$ can be found from the relevant marginal totals ($n_{i\cdot}$, $n_{\cdot j}$) and overall sample size (n) as:

$$\hat{p}_{i\cdot} = \frac{n_{i\cdot}}{n}, \qquad \hat{p}_{\cdot j} = \frac{n_{\cdot j}}{n}$$

These can then be combined to give the estimated probability of being in the ijth cell of the table under independence, $\hat{p}_{i\cdot} \times \hat{p}_{\cdot j}$. The frequencies to be expected under independence, $E_{ij}$, can then be obtained simply as:

$$E_{ij} = n \times \hat{p}_{i\cdot} \times \hat{p}_{\cdot j} = \frac{n_{i\cdot} \times n_{\cdot j}}{n}$$

The hypothesis of independence can now be assessed by comparing the observed ($O_{ij}$) and estimated expected ($E_{ij}$) frequencies using the familiar CHI-SQUARE TEST statistic:

$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where r is the number of rows and c the number of columns in the table. If $H_0$ is true $X^2$ has, asymptotically, a CHI-SQUARE DISTRIBUTION with $(r-1)(c-1)$ degrees of freedom. For our first table, the estimated expected values under independence are given in the third table. Here $X^2 = 7.84$, which with 4 degrees of freedom gives an associated P-VALUE of 0.098. There is no evidence against the independence of site and type of tumour.

contingency tables  Estimated expected values under the hypothesis of independence for data in the cerebral tumour table

                 Type
Site          A       B       C    Total
I          21.02    9.97    7.01      38
II         15.49    7.35    5.16      28
III        41.49   19.68   13.83      75
Total         78      37      26     141

Since the distribution of the test statistic is only chi-square asymptotically there has been considerable work on trying to find when the sample size is sufficient for the P-values derived in this way to be valid and alternative procedures have been derived that do not rely on the asymptotic assumption (see EXACT METHODS FOR CATEGORICAL DATA).

When the contingency table is formed from more than two variables, more than a single hypothesis may be of interest. We may, for example, wish to test the mutual independence of the variables forming the table or the conditional independence of two of the variables given a third and so on. For some hypotheses, estimated expected values can be found from particular marginal totals but for others an iterative scheme is needed (see Everitt, 1992, for details). In general the analysis of three-dimensional and higher contingency tables is best undertaken with the use of log-linear models. BSE

(See also CORRESPONDENCE ANALYSIS)

Everitt, B. S. 1992: The analysis of contingency tables, 2nd edition. Boca Raton: Chapman & Hall/CRC.
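The test of independence described in this entry can be reproduced for the cerebral tumour table with a few lines of Python. This is a sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, only the standard library is used, and the P-value helper relies on a closed form for the chi-square upper tail that holds only when the degrees of freedom are even (as they are here, with 4). For comparison with the entry on CONTINGENCY COEFFICIENT, Cramér's and Pearson's coefficients are computed from the same statistic.

```python
import math


def expected_counts(table):
    """Expected frequencies under independence: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def chi_square_p_value_even_df(x2, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x2 / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


# Incidence of cerebral tumours: rows are sites I-III, columns are types A-C.
tumours = [[23, 9, 6],
           [21, 4, 3],
           [34, 24, 17]]

x2 = chi_square_statistic(tumours)
df = (len(tumours) - 1) * (len(tumours[0]) - 1)
p = chi_square_p_value_even_df(x2, df)

n = sum(map(sum, tumours))
k = min(len(tumours), len(tumours[0]))
cramers_v = math.sqrt(x2 / (n * (k - 1)))
pearsons_c = math.sqrt(x2 / (x2 + n))

print(f"X2 = {x2:.2f}, df = {df}, P = {p:.3f}")
print(f"Cramer's coefficient = {cramers_v:.3f}, Pearson's coefficient = {pearsons_c:.3f}")
```

For tables with odd degrees of freedom a statistical library would be needed for the P-value; the statistic and the expected counts are computed in the same way regardless.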
convenience sample
A convenience sample is a nonrandom sample chosen due to its easy access. A convenience sample is unlikely to be representative of the population, the main disadvantage being that it is unclear how representative the sample is of the population of interest. One example is surveying people who walk by on the street. Another would be selecting patients who attend a clinic or doctors in a particular hospital. The main advantage is that the sample is simple to obtain and may save money.

A classic example is the use of medical students as study subjects when conducting medical research. If the study involves seeking opinions or, for that matter, measuring certain characteristics, such as height, one needs to bear in mind the fact that the sample is atypical of the population as a whole and, hence, perhaps limit conclusions to the population of all medical students to ensure valid inference. For further details see Crawshaw and Chambers (1994). SLY

Crawshaw, J. and Chambers, J. A. 1994: A concise course in A level statistics, 3rd edition. Cheltenham: Stanley Thornes Publishers Ltd.
Cook's distance
Cook's distance (Cook, 1977; Cook and Weisberg, 1999) is a measure of the influence of a case on the estimated parameters $\hat{\beta}$ of a linear regression. It measures the global impact of deleting the case on all the parameter estimates taken together and is the distance from $\hat{\beta}$ to $\hat{\beta}_{(i)}$, expressed in terms of confidence ellipsoids about $\hat{\beta}$, where $\hat{\beta}_{(i)}$ is the vector of parameters estimated with the ith case omitted. For a dependent variable $y_i$, $D_i$ is given by:

$$D_i = \frac{(y_i - \hat{y}_i)^2}{p\,s^2} \times \frac{h_i}{(1-h_i)^2}$$

where p is the number of independent variables, $s^2$ is the variance of the estimate and $h_i$ is the leverage of the ith observation, given by the ith diagonal element of the so-called 'hat' matrix $H = X(X'X)^{-1}X'$, where X is the data matrix of independent values (see Cook and Weisberg, 1999). For a point to be influential it must be both an outlier, i.e. have a high residual, and it must also have high leverage, i.e. be far from the centre of gravity of the points (see the figure). A general rule of thumb is to examine cases for which $D_i > 1$; alternatively, Hamilton (1992) has suggested examining cases for which $D_i > 4/n$. An informal approach is to sort the distances in order and examine the few cases with the highest distances. A large jump between these and the rest can suggest points worth investigating. Cases thus identified might be considered for removal or at least further investigation (subject to the caution that removal of OUTLIERS always requires). Analogous quantities for other models are available, e.g. for LOGISTIC REGRESSION (see Pregibon, 1981). If the interest is in one or more particular parameters in the regression, rather than the complete set taken as whole, 'dfbetas' can be computed: these estimate the changes in the individual parameters after deleting each case. ML

Cook, R. D. 1977: Detection of influential observations in linear regression. Technometrics 19, 15-18. Cook, R. D. and Weisberg, S. 1999: Applied regression including computing and graphics. New York: John Wiley & Sons, Inc. Hamilton, L. C. 1992: Regression with graphics. Belmont: Duxbury. Pregibon, D. 1981: Logistic regression diagnostics. Annals of Statistics 9, 705-24.
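To make the formula for Cook's distance concrete, the sketch below computes $D_i$ for a simple (one-predictor) linear regression using only the Python standard library. Several things here are assumptions rather than part of the entry: the data are invented, the function names are hypothetical, p is taken as the number of fitted parameters (intercept and slope, so p = 2), $s^2$ is the residual mean square, and the leverage uses the simple-regression shortcut $h_i = 1/n + (x_i - \bar{x})^2 / \sum (x_j - \bar{x})^2$ in place of the full hat matrix.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def cooks_distances(x, y):
    """Cook's distance for every case in a simple linear regression."""
    n = len(x)
    p = 2                                   # intercept and slope (assumed convention)
    intercept, slope = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)         # residual mean square
    leverages = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e * e / (p * s2) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverages)]


# Invented data: the last case is far from the other x values and off their line.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 7.9, 9.2, 12.0]

distances = cooks_distances(x, y)
n = len(x)
for i, d in enumerate(distances, start=1):
    flags = []
    if d > 1:
        flags.append("D > 1")
    if d > 4 / n:
        flags.append("D > 4/n")
    print(f"case {i}: D = {d:.3f} {' '.join(flags)}")
```

Sorting the printed distances and looking for a jump between the largest few and the rest corresponds to the informal approach mentioned in the entry.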
coplots  See TRELLIS GRAPHS

COREC  See ETHICAL REVIEW COMMITTEES
Cook's distance  Three points in a sample of 50, only one of which (C) has a high Cook's distance. Point A has a high residual but low leverage and B has a low residual but high leverage

correlation
Correlation is used to measure the strength of the linear relationship between two random variables. If we plot two variables on a SCATTERPLOT, their correlation is a measure of how closely the points lie to a straight line. We measure correlation by a correlation coefficient. The simplest of these is PEARSON'S CORRELATION COEFFICIENT, also known as the product-moment correlation coefficient or simply as the correlation coefficient. This is the sum of products of differences from the MEAN divided by the square roots of the two sums of squares about the mean and is usually denoted by r:

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$

The confusing symbol 'r' (rather than 'c') is for historical reasons: it appears to have indicated 'regression' originally. It is now well established and if a medical paper uses 'r = ...' without explanation, it usually means the correlation coefficient. When we want to distinguish between the correlation coefficient in a sample, r, and the correlation coefficient in the population from which the sample was drawn, we use 'ρ', the Greek letter 'rho', to denote the latter. The figure (Nine correlation coefficients) shows some sample correlation coefficients. The coefficient is positive when large values of y are associated with large values of x, the variables being said to be positively correlated, as in (a), (b) and (c) in the figure.
The majority of observations will have either both observations greater than the mean or both less than the mean. In either case, observation minus mean will have the same sign, either positive or negative, for both variables and the product of these differences will be positive. Hence the sum of products will be positive and the correlation coefficient will be positive. The correlation coefficient is negative when small values of y are associated with large values of x, the variables being negatively correlated as in (g), (h) and (i) in the figure. The majority of observations will have one observation greater than the mean and the other less than the mean. Observation minus mean will have different signs for the two variables and the product of these differences will be negative. Hence the sum of products will be negative and the correlation coefficient will be negative.

The correlation coefficient has a maximum value of +1 when the points all lie exactly on a straight line and the variables are positively correlated and a minimum of -1 when the points all lie exactly on a straight line and the variables are negatively correlated. When there is no linear relationship at all the coefficient is zero and the variables are said to be uncorrelated, as in (d) in the figure. Correlation only measures the strength of the linear (i.e. straight line) relationship. Nonlinear relationships may be missed or underestimated by it. In the figure, (e) shows a strong relationship yet the correlation coefficient is zero and (f) shows an exact mathematical relationship, without any random variation, yet the correlation coefficient is less than one because the relationship is not a straight line.

correlation  Nine correlation coefficients [nine scatterplots of Y against Variable X, panels (a) to (i)]

We can test the NULL HYPOTHESIS that the population correlation is zero, i.e. that there is no linear relationship between the two variables, using a simple t-test. At least one of the two variables must follow a normal distribution and the observations must be independent. If we can assume this, we require only the value of r and the sample size n. Then, if the null hypothesis were true:

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

would follow a t-distribution with n - 2 degrees of freedom. There are tables of this test in many books and almost all programs that calculate r also give the P-VALUE. As a result, correlation coefficients in medical papers are almost invariably followed by P-values, e.g. 'r = 0.57, P ...'.
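The product-moment correlation coefficient and the t statistic used to test for zero correlation can be computed directly from their definitions. The following is a minimal sketch, assuming invented data and hypothetical function names; it stops at the t statistic and its degrees of freedom, because the Python standard library has no t-distribution with which to obtain the P-value.

```python
import math


def pearson_r(x, y):
    """Sum of products about the means over the root of the two sums of squares."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), referred to a t-distribution on n - 2 df."""
    if abs(r) >= 1:
        raise ValueError("t is undefined when the points lie exactly on a line")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented paired observations, one pair per independent subject.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(f"r = {r:.4f}, t = {t:.4f} on {len(x) - 2} degrees of freedom")
```

The statistic would then be looked up in t tables, or passed to a statistical package, to obtain the P-value.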
The distribution of the sample correlation coefficient is not a simple one, but it can be converted to the normal using a TRANSFORMATION known as Fisher's z-transformation:

$$z = \frac{1}{2}\log_e\left(\frac{1+r}{1-r}\right)$$

This works provided observations are independent and both variables follow normal distributions, a stronger assumption than that required for the test of significance. Provided it is met, the z-transformation can be used to find a confidence interval for r, the standard error on the transformed scale being $1/\sqrt{n-3}$. We can calculate the confidence interval for z and transform back. Curiously, many programs do not perform this. The standard error can also be used in a power calculation to estimate the sample size required to detect a relationship between two continuous variables. Bland (2000) and Machin et al. (1998) give formulae and tables.

We can still calculate correlation coefficients when normal assumptions are not met, but cannot use P-values or confidence intervals found by these methods. The assumption of independence is very important. It could be seriously misleading to take several observations from each subject and treat them as a simple sample for the calculation of correlation coefficients and their P-values and confidence intervals (Bland and Altman, 1994).

The correlation coefficient and regression equations between two variables are closely related. The proportion of variation explained by the regression is $r^2$, whether we have the regression of Y on X or of X on Y. There is only one correlation coefficient although there are two possible regression lines: correlation has no choice of dependent and independent (or outcome and predictor or explanatory) variables. The product of the two regression slopes will also be equal to $r^2$, which is sometimes called the coefficient of determination. The tests of the null hypotheses of zero correlation and zero slope for the regression line give the same P-value. The two methods provide the same test for a linear relationship.

The product-moment correlation is only one of several correlation coefficients in use. There are two nonparametric rank correlation coefficients, SPEARMAN'S RHO and Kendall's tau, useful when the assumptions of normal distribution necessary for confidence intervals and significance tests are not tenable. The intraclass or INTRACLUSTER CORRELATION COEFFICIENT is used when, rather than two variables, we have two or more observations of the same variable on each subject. The tetrachoric correlation coefficient, seldom seen in practice in the modern literature, can be used when we have two underlying continuous variables but can only observe whether the subject is above or below some cut-off value for each, making both dichotomous. The biserial correlation coefficient can be used when one variable is continuous and the other dichotomous. These are not the same as the correlation coefficients found by simply making the dichotomous variable zero or one and calculating r, called the phi coefficient and point-biserial correlation coefficient respectively.

We can adjust the correlation between two variables for their mutual relationship with a third variable using a partial correlation coefficient. This is an estimate of the correlation between the two variables of interest for subjects who all have the same value of the third variable. Partial correlation is seldom seen now, multiple regression being preferred. There is also a partial rank correlation, using Kendall's approach. We can calculate a multiple correlation coefficient, usually denoted by R, which expresses the strength of the relationship between a chosen variable and several others. Time series (see TIME SERIES IN MEDICINE), where observations are measured successively over time, may show serial correlation or autocorrelation, where adjoining observations are correlated. A correlation matrix is the set of all the correlations between each pair of a set of variables and is the starting point for several multivariate techniques (see, for example, PRINCIPAL COMPONENT ANALYSIS).
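Since the entry notes that many programs do not report a confidence interval for r, the calculation via Fisher's z-transformation is sketched below. It is an illustration under stated assumptions: the function name is hypothetical, the sample size of 30 paired with r = 0.57 is invented for the example, and the interval is only appropriate when the observations are independent and both variables are normally distributed.

```python
import math
from statistics import NormalDist


def fisher_z_interval(r, n, level=0.95):
    """Confidence interval for a correlation via Fisher's z-transformation."""
    if n <= 3:
        raise ValueError("the standard error 1/sqrt(n - 3) needs n > 3")
    z = math.atanh(r)                 # same as 0.5 * log((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)         # standard error on the transformed scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Build the interval for z, then transform both limits back to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lower, upper = fisher_z_interval(0.57, 30)   # n = 30 is an assumed sample size
print(f"95% confidence interval for r: {lower:.3f} to {upper:.3f}")
```

Because the back-transformation is nonlinear, the interval is not symmetric about r, which is the expected behaviour rather than an error.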
JMB
(See also SPEARMAN'S RHO (ρ))

Bland, M. 2000: An introduction to medical statistics, 3rd edition. Oxford: Oxford University Press. Bland, J. M. and Altman, D. G. 1994: Correlation, regression and repeated data. British Medical Journal 308, 896. Machin, D., Campbell, M. J., Fayers, P. and Pinol, A. 1998: Statistical tables for the design of clinical studies, 2nd edition. Oxford: Blackwell.
correspondence analysis This is a technique for graphically displaying the associations among the categorical variables forming a CONTINGENCY TABLE in the form of a SCATTERPLOT. A correspondence analysis should ideally be seen as an extremely useful supplement to, rather than a replacement for, more formal inferential analysis such as a CHI-SQUARE TEST of the independence of the variables. Correspondence analysis provides a 'window' on to the data that may allow researchers easier access to the associated numerical results, facilitate discussion of the data and possibly generate interesting hypotheses about the data. The mathematics behind correspondence analysis (Greenacre, 1992; Everitt, 1997) leads to two sets of multidimensional coordinate values, one of which represents the categories of the row variable and the other the categories of the column variable. In general, the first two coordinate values representing each row and column category are used to provide a single scatterplot of the data. In the resulting diagram, the distance between a plotted row point and column point represents as accurately as possible how the corresponding cell of the contingency table departs from independence, as we shall illustrate using the data on age and boyfriends in the first table. The coordinates resulting from a correspondence analysis of these data are shown in the second table.

correspondence analysis Age and boyfriends contingency table

                                   Age group*
                                   1     2     3     4     5
No boyfriend                       21    21    14    13    8
Boyfriend/no sexual intercourse    8     9     6     8     2
Boyfriend/sexual intercourse       2     3     4     10    10
Totals                             31    33    24    31    20

*Age groups: 1, <16 years; 2, 16-17 years; 3, 17-18 years; 4, 18-19 years; 5, 19-20 years.
χ² = 20.6, DF = 8, p = 0.008.

correspondence analysis Two-dimensional correspondence analysis coordinates for the row and column categories in the first table

Category                           x         y
No boyfriend                       0.193     0.061
Boyfriend/no sexual intercourse    0.192    -0.143
Boyfriend/sexual intercourse      -0.732     0.000
Age group 1                        0.355     0.055
Age group 2                        0.290     0.000
Age group 3                        0.103     0.000
Age group 4                       -0.281    -0.134
Age group 5                       -0.717     0.123

[Figure: correspondence analysis. Two-dimensional correspondence analysis solution for age and boyfriends data. Horizontal axis: Dimension 1. Plotted points: the five age groups and the three boyfriend categories.]

These coordinates can be plotted to give the scatterplot shown in the figure. Of most interest in correspondence analysis solutions is the joint interpretation of the points representing the row and column categories. It can be shown that row and column coordinates that are large and of the same sign correspond to a cell with considerably more observations than if independence held. Row and column coordinates that are large, but of opposite signs, imply a cell in the table with far fewer observations than required under the assumption of independence. Finally, small coordinate values close to the origin correspond to cells where the observed frequency is close to the expected frequency under independence. In the figure, for example, age group 5 and boyfriend/sexual intercourse both have large negative coordinate values on the first dimension; consequently the corresponding cell in the table contains more observations than would be the case under independence. Again, age group 5 and boyfriend/no sexual intercourse have coordinate values with opposite signs on both dimensions, implying that the corresponding cell in the table has fewer observations than expected under independence. BSE
Everitt, B. S. 1997: Annotation: correspondence analysis. Journal of Child Psychology and Psychiatry 38, 737-45. Greenacre, M. 1992: Correspondence analysis in medical research. Statistical Methods in Medical Research 1, 97-117.
cost-benefit analysis (CBA) See COST-EFFECTIVENESS ANALYSIS
cost-effectiveness analysis Cost-effectiveness analysis (CEA) is a tool for comparing costs and benefits in terms of patient outcomes (changes in health and welfare), so that the value for money of a proposed healthcare intervention can be judged. Cost-utility analysis (CUA) is a specific form of CEA, the main difference being that benefit is expressed in the form of patient preferences and converted to a measure such as the number of quality-of-life adjusted life-years (QALYs). CUA thus allows the comparison of competing health programmes that have very different sorts of outcome. CEA, and to a lesser extent CUA, now appear in many health services research studies, unlike cost-benefit analysis (CBA), in which benefits are expressed in purely monetary terms. Because of time limitations on clinical and health service trials, cost-effectiveness can often only be established for intermediate outcomes and it may be necessary to apply techniques such as SURVIVAL ANALYSIS to extrapolate values into the future (e.g. to predict mortality
from risk factors). Drummond and McGuire (2001) discuss this point and other key methodological principles involved in CEA. The incremental cost-effectiveness ratio (ICER) is a key measure in cost-effectiveness analysis and is defined (for a comparison of two treatments) as ΔC/ΔE, where ΔC is the mean difference in the costs of the treatments and ΔE is the mean difference in effectiveness. A treatment is considered cost-effective if ΔC/ΔE < λ, where λ is the maximum amount that the decision maker is willing to pay, or the 'ceiling ratio'. Note that the term 'incremental' stresses the comparison of the treatments with each other: the less useful 'average' cost-effectiveness ratio is often estimated separately for each treatment under study and compared without testing the differences statistically. Many trials of new therapies collect data on the costs for individual patients, as well as the effectiveness in terms of changes in clinical status or quality of life, and can thus calculate ICERs as part of an economic evaluation. Indeed, regulatory bodies in most countries insist on such evaluation before considering drug licensing and provide guidelines as to acceptable methodology. Kobelt (2002) describes the workflow of an economic evaluation in relation to the typical drug development process, from pre-clinical studies to PHASE III TRIALS and marketing, and discusses the typical resource items that might be included and the economic perspectives from which such studies are performed. In an economic evaluation, the term 'perspective' refers to the level at which the costs are to be considered. For example, the societal perspective would treat all costs as relevant, including loss to production due to illness, whereas a health service perspective might consider only direct treatment costs. As well as influencing the type of data collected, the perspective has a bearing on the summary statistics and type of analysis that might be considered appropriate.
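The ICER and its decision rule can be sketched in a few lines. This is an illustration rather than a prescribed implementation: the function names and the input values are hypothetical, and the decision function deliberately handles only the common case of a new treatment that is more effective (ΔE > 0), where the rule ΔC/ΔE < λ can be applied directly.

```python
def icer(delta_cost: float, delta_effect: float) -> float:
    """Incremental cost-effectiveness ratio, delta C / delta E."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the effect difference is zero")
    return delta_cost / delta_effect


def is_cost_effective(delta_cost: float, delta_effect: float,
                      ceiling_ratio: float) -> bool:
    """Apply the rule ICER < ceiling ratio; assumes delta E > 0."""
    if delta_effect <= 0:
        raise ValueError("this sketch only handles a more effective new treatment")
    return icer(delta_cost, delta_effect) < ceiling_ratio


# Hypothetical inputs: an extra cost of 49,115 for 0.23 extra life-years.
ratio = icer(49_115.0, 0.23)
decision_at_100k = is_cost_effective(49_115.0, 0.23, ceiling_ratio=100_000.0)
decision_at_250k = is_cost_effective(49_115.0, 0.23, ceiling_ratio=250_000.0)
```

With these inputs the ratio is a little over 213,500 per life-year, so the treatment fails a ceiling ratio of 100,000 but passes one of 250,000.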
For example, where the perspective is that of a healthcare provider, it can be argued that it is the mean cost and the mean effectiveness that are relevant, rather than some other summary measure such as the MEDIAN. This is because the total cost aggregated over all patients (which is the important quantity for planning or budgeting purposes) is obtained by multiplying the mean cost by the number of patients. Various types of uncertainty apply to economic data and all may need to be considered in CEA. For example, the amount of a service used by a patient will need to be multiplied by a unit cost (e.g. the cost per hour of employing a therapist) to obtain the total cost per patient for that service. The true unit costs may be available only as point estimates, e.g. in the form of approximate published values. Furthermore, costs and benefits may have to be discounted to take account of the preference for earlier benefits and/or later costs and this involves the application of discount rates, which are generally estimated. The impact of such deterministic sources of uncertainty is generally assessed through sensitivity analyses, in which ranges of plausible values are considered. Contrariwise, data that have been obtained from a random sample, e.g. on the service use of individual patients, are subject to stochastic rather than deterministic uncertainty, and this is generally expressed in the form of statistical quantities such as CONFIDENCE INTERVALS and P-VALUES. Costs often have very skewed distributions and for this reason normal distribution theory may not be appropriate. Furthermore, if the emphasis is on the estimation of means as suggested here, simple TRANSFORMATIONS (e.g. log transformation) may not be appropriate since the quantity about which inferences are to be made would no longer be the MEAN (see Thompson and Barber, 2000). The use of generalised linear models is one solution since these can model mean values directly while also allowing for skewed distributions such as the GAMMA DISTRIBUTION. An alternative is to use nonparametric bootstrapping (see BOOTSTRAP), in which a large number of re-samples is generated, with individuals replicated by sampling with replacement. However, while widely used, there is a potential problem with bootstrapping applied to the ICER, as discussed later, and it has been criticised in more general terms by O'Hagan and Stevens (2002), who argue for a Bayesian framework (see BAYESIAN METHODS) for analysing cost-effectiveness data. The ICER and some related quantities are illustrated and discussed by O'Brien and Briggs (2002), using an example relating to the costs of treatment and life-years gained in the Canadian Implantable Defibrillator Study (CIDS). The incremental cost-effectiveness plane for interpreting ICERs is shown in (a) in the first figure.
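The nonparametric bootstrap mentioned above can be sketched as follows. This is one plausible arrangement, not necessarily the procedure used for the CIDS data: it assumes two independent treatment arms with patient-level (cost, effect) records, resamples patients within each arm, and records the resulting pair of mean differences. The function name and all data values are made up for illustration.

```python
import random
from statistics import mean


def bootstrap_increments(new_arm, control_arm, n_replicates=1000, seed=0):
    """Return bootstrapped (delta_cost, delta_effect) pairs.

    Each arm is a list of (cost, effect) tuples, one tuple per patient.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # Resample whole patients so each cost stays paired with its effect.
        new_star = rng.choices(new_arm, k=len(new_arm))
        control_star = rng.choices(control_arm, k=len(control_arm))
        delta_cost = mean(c for c, e in new_star) - mean(c for c, e in control_star)
        delta_effect = mean(e for c, e in new_star) - mean(e for c, e in control_star)
        replicates.append((delta_cost, delta_effect))
    return replicates


# Made-up patient-level data: (cost, life-years) per patient.
new_arm = [(52_000.0, 4.1), (61_000.0, 3.8), (48_500.0, 4.6),
           (75_000.0, 2.9), (58_200.0, 4.0)]
control_arm = [(12_000.0, 3.9), (9_500.0, 3.5), (15_800.0, 4.2),
               (11_200.0, 2.7), (8_900.0, 3.6)]
replicates = bootstrap_increments(new_arm, control_arm, n_replicates=1000, seed=1)
```

Keeping the pairs, rather than only their ratios, allows the replicates to be plotted on the cost-effectiveness plane before any confidence limits are read off.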
Each quadrant can be labelled according to the interpretation of an ICER falling within it and typically new treatments are more effective but also more costly and would appear in the NE quadrant. If the new treatment is both more effective and less costly than the control or comparison treatment (quadrant SE) then it is said to dominate. In this diagram the point denoted 'CIDS data' represents the observed (ΔC, ΔE) pair. The point estimate of the ICER is the slope of the line joining this point to zero (in this case the ICER is $49,115/0.23, or $213.5k per additional life-year gained). The decision as to cost-effectiveness can be made for all ΔC, ΔE pairs consistent with the observed data, as represented by the bootstrapped values, a set of 1000 pairs for the CIDS data being shown in (b) in the first figure. Also shown are 95% confidence limits derived from the lines corresponding to the 26th and 975th replicates (ordered from the smallest to the largest ratio). Ratios, even of normally distributed quantities, tend to be unstable and have discontinuities since both numerator and denominator could be 0. This can give rise to anomalies since the same ratio can arise from two opposing sets of results, one where the new treatment is less effective and more costly
(quadrant NW) and the other more effective and less costly (quadrant SE). When the ratios are ordered and used to estimate confidence limits there is no way to distinguish these cases. Thus it is usually recommended to plot the bootstrapped points in addition to calculating numerical confidence limits so that this situation can be detected.

[Figure: cost-effectiveness analysis. Incremental cost-effectiveness plane for Canadian Implantable Defibrillator Study data showing (a) the four quadrants (NW, NE, SW, SE), point estimates of ΔE and ΔC, from which the ICER is obtained, and a ceiling ratio line (hypothesised ceiling ratio λ, $100k/QALY); (b) bootstrapped values for ΔE and ΔC and confidence limits for the ICER derived from these values. Horizontal axis: effect difference, ΔE.]

One way around difficulties arising from using ratios is to cast the problem in a slightly different way, based on plotting the willingness-to-pay line on the cost-effectiveness plane. The region below this line (also called the ceiling ratio line) is the acceptability region. In this example the willingness-to-pay is $100k per QALY and the point estimate of the ICER is $214k per QALY, so the treatment would not be considered cost-effective. The proportion of the bootstrapped values falling in the acceptability region can then be plotted against a range of hypothetical values for the ceiling ratio to give a cost-effectiveness acceptability curve. This approach is useful where no fixed willingness-to-pay has been established since it indicates how probable it is that a treatment will be cost-effective at any given hypothetical willingness-to-pay. The second figure illustrates an acceptability curve for the CIDS data. For example, given that a healthcare provider considers $214k as an acceptable upper limit, one might be able to claim that a new treatment is, say, 50% likely to be cost-effective. At $17k it would only be about 2.5% likely to be cost-effective. This latter figure can be used as a lower 95% confidence limit, the upper 95% confidence limit being infinity in this case.

[Figure: cost-effectiveness analysis. Cost-effectiveness acceptability curve for the Canadian Implantable Defibrillator Study data. Horizontal axis: value of ceiling ratio (λ), from $0 to $600,000; vertical axis from 0.00 to 1.00.]

Another approach, mathematically equivalent to the cost acceptability curve technique, is to calculate the net monetary benefit (NMB), which re-expresses the ICER decision rule to give the quantity NMB = λΔE - ΔC. This is now expressed in monetary terms and because it is linear in ΔE and ΔC, parametric confidence limits are simple to calculate, given variances for ΔE and ΔC. The value of NMB can be plotted against λ, for a range of hypothetical values, and the value of λ for which the NMB is zero is a breakeven point, at which the likelihood of being cost-effective is 50%; the value $214k is obtained as before. ML
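The net monetary benefit and the acceptability curve described in this entry can be sketched together, since the curve is simply the proportion of bootstrapped (ΔC, ΔE) pairs with positive NMB at each ceiling ratio. The function names and the four replicate pairs below are hypothetical and far too few for real use; they are chosen only so that the arithmetic is easy to follow.

```python
def net_monetary_benefit(delta_cost, delta_effect, ceiling_ratio):
    """NMB = lambda * delta E - delta C."""
    return ceiling_ratio * delta_effect - delta_cost


def acceptability_curve(pairs, ceiling_ratios):
    """Proportion of (delta_cost, delta_effect) pairs with NMB > 0 at each lambda."""
    curve = []
    for lam in ceiling_ratios:
        n_acceptable = sum(
            1 for dc, de in pairs if net_monetary_benefit(dc, de, lam) > 0
        )
        curve.append((lam, n_acceptable / len(pairs)))
    return curve


# Made-up bootstrap replicates of (delta_cost, delta_effect).
pairs = [(1000.0, 0.10), (2000.0, 0.05), (-500.0, 0.02), (1500.0, -0.01)]
curve = acceptability_curve(pairs, [0, 20_000, 50_000])

# The NMB of a point estimate is zero when lambda equals its ICER.
breakeven_nmb = net_monetary_benefit(49_115.0, 0.23, 49_115.0 / 0.23)
```

Working with the sign of the NMB avoids the ambiguity of ratios, because a replicate in the NW quadrant and one in the SE quadrant can share the same ratio but never the same NMB sign for a positive λ.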
Drummond, M. and McGuire, A. 2001: Economic evaluation in health care - merging theory with practice. Oxford: Oxford University Press. Kobelt, G. 2002: Health economics: an introduction to economic evaluation, 2nd edition. London: Office of Health Economics. O'Brien, B. J. and Briggs, A. H. 2002: Analysis of uncertainty in health care cost-effectiveness studies: an introduction to statistical issues and methods. Statistical Methods in Medical Research 11, 455-68. O'Hagan, A. and Stevens, J. W. 2002: Bayesian methods for design and analysis of cost-effectiveness trials in the evaluation of health care technologies. Statistical Methods in Medical Research 11, 469-90. Thompson, S. G. and Barber, J. A. 2000: How should cost data in pragmatic randomised trials be analysed? British Medical Journal 320, 1197-200.
cost-utility analysis (CUA) See COST-EFFECTIVENESS ANALYSIS
counterfactual model See CAUSAL MODELS
covariance See COVARIANCE MATRIX
covariance matrix This is a symmetric matrix in which the off-diagonal elements are the covariances of pairs of variables and the elements on the main diagonal are the variances of the variables. The sample covariance of two variables with sample values (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) is defined as:

\[ \mathrm{Cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]

where x̄ is the arithmetic mean of the x variable and ȳ the arithmetic mean of the y variable. The covariance matrix is often a better basis for the application of STRUCTURAL EQUATION MODELS than the correlation matrix. BSE
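As a small worked illustration of the definition, the sketch below builds the sample covariance matrix for a handful of variables using the n - 1 divisor. The function name and the data are hypothetical, and the code assumes every variable has the same number of observations.

```python
def covariance_matrix(variables):
    """Sample covariance matrix for a list of equal-length variables."""
    n = len(variables[0])
    if n < 2 or any(len(v) != n for v in variables):
        raise ValueError("need at least two observations, same length per variable")
    means = [sum(v) / n for v in variables]
    k = len(variables)
    return [
        [
            sum((a - means[i]) * (b - means[j])
                for a, b in zip(variables[i], variables[j])) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


# Made-up data for two variables.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 9.0]
matrix = covariance_matrix([x, y])
```

The diagonal entries are the sample variances of x and y, and the two off-diagonal entries are equal, which is the symmetry referred to in the definition.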
Cox's proportional hazards model See COX'S REGRESSION MODEL, PROPORTIONAL HAZARDS
Cox's regression model In 1972 David Cox developed the Cox (or proportional hazards) regression model. Since that time it has become probably the most widely used method of analysing time-to-event (survival) data. This model allows us to link the three main components of such data for the first time: (a) an indicator variable reflecting whether the individual has experienced an event or not (i.e. has not/has been censored); (b) the length of time from entry in a study to the event or to the censoring time; (c) one or more explanatory variables, such as age, sex and treatment received, usually collected at the time of entry of an individual into a study. The popularity of the model is due to its relative computational simplicity, its interpretability, its ability/appearance to perform well in many situations and its incorporation into most major STATISTICAL PACKAGES. The approach used was novel because it modelled the hazard function over time (made up from components (a) and (b) above) and related it to the explanatory variables component (c). The hazard function can be thought of as the probability that someone now event-free will experience an event in the next small unit of time. The model makes no assumptions about the underlying distribution of hazards in the different groups and, indeed, this is left unestimated in the process of estimating the parameters in the model. The basic model relies on the assumption that the hazard functions are proportional across the groups being studied, i.e. the relative hazards experienced by any two groups of patients are constant over time. The Cox model can be used to perform a number of different analyses for time-to-event data including: estimating a treatment effect in a study while adjusting for a number of explanatory or baseline variables, such as age and sex; assessing which of a number of explanatory variables are most important and consequently developing a prognostic index; performing stratified analyses; and assessing interactions between variables. It has also been extended to deal with situations where the relative hazard function changes over time, the so-called time-dependent Cox model, and for situations when there are deviations from proportional hazards for the hazard functions in different groups. The Cox model can be written as:

\[ h_1(t) = h_0(t) \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k) \]

where h_1(t) is the hazard function in a given group, h_0(t) is the hazard function in a baseline group (which remains unspecified), β_i are the regression coefficients and x_i are the explanatory variables (from i = 1 to k). Therefore, the hazard ratio (HR) between groups is:

\[ h_1(t)/h_0(t) = \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k) \]

One should note the independence from time (t) of the hazard ratio on the right-hand side of this equation. Hazard ratios are relatively simple to interpret: they are the relative risk of one group experiencing the event to another group experiencing the event. Note that, as the baseline hazard is not estimated, only relative measures such as the hazard ratio are provided by the Cox model and thus no estimates are given in absolute terms using this method. Such estimates have to be calculated indirectly. A variety of data types can be used in the model, including binary, categorical, ordered categorical and continuous variables. The number of variables one may include in a Cox model is theoretically limitless, but in practice it is limited by the number of events in the analysis. One guide is not to use more variables than the fourth root of the number of events. A more lenient guide is to have at least 10-20 events for each category combination. For each variable being considered in a Cox model we test the NULL HYPOTHESIS that the variable is not important to the model, i.e. that the parameter value associated with the variable, β, is zero: this is equivalent to the hazard ratio (HR) for that variable being HR = exp(β) = e^0 = 1. This can be tested with a z-statistic where z = b/SE(b), where b is the estimate of the parameter and SE(b) is its standard error. Under the null hypothesis this should follow a normal distribution and thus P-VALUES can be calculated in the usual way. We may assess models and the addition or removal of variables to models using a variety of different tests including the Wald test, LIKELIHOOD RATIO test and score test. The score test is the most complex and less commonly used test. The Wald test looks at the change in the overall value between two models where the DEGREES OF FREEDOM is the number of different variables between the models. The likelihood ratio test compares the 'likelihoods' of the two models and takes a more general approach than the Wald test: it looks at how the included variables explain the variation in the model. This is, therefore, the preferred method for reasons of consistency and stability. The time of each outcome event (failure time) is not actually relevant in a Cox model, but the ordering of these failures is. Therefore, consideration needs to be given to the order of failures in the event of failures with tied event times. These can be dealt with in a series of methods including marginal calculation, partial calculation, Efron approximation and Breslow approximation (see Kalbfleisch and Prentice, 2002). The last of these is the simplest and is an adequate approximation if there are relatively few tied failures. Care should be taken when using the Cox model if there are too many tied event times.
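The hazard ratio and the z-statistic described in this entry involve only simple arithmetic once a model has been fitted. The sketch below does not fit a Cox model; it assumes the coefficient estimates and standard errors have already been obtained from a statistical package, and the function names and numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def hazard_ratio(coefficients, covariates):
    """Hazard ratio relative to the baseline group: exp(sum of beta_i * x_i)."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return math.exp(linear_predictor)


def wald_z_test(estimate, standard_error):
    """z = b / SE(b) with a two-sided P-value from the standard normal."""
    z = estimate / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical fitted values: a treatment indicator and a second covariate.
hr = hazard_ratio([0.405, -0.2], [1, 0])
z, p = wald_z_test(0.405, 0.18)
```

A coefficient of zero gives a hazard ratio of exactly one, which is the null hypothesis being tested by the z-statistic.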
As for normal linear regression, it is possible to assess the fit of the Cox model by calculating residuals. However, there are no unique residuals for the Cox model. Commonly used residuals are Schoenfeld and Martingale residuals, although it can be difficult to interpret whichever are used. It is also possible to assess whether individual explanatory variables violate the proportional hazards assumption (see PROPORTIONAL HAZARDS) and therefore assess whether a variable should be included in the model. MS/MP
Cleves, M. A., Gould, W. W. and Gutierrez, R. G. 2003: An introduction to survival analysis using Stata, revised edition. Texas: Stata Press. Cox, D. R. 1972: Regression models and life tables (with discussion). Journal of the Royal Statistical Society B34, 187-220. Kalbfleisch, J. D. and Prentice, R. L. 2002: The statistical analysis of failure time data, 2nd edition. New York: John Wiley & Sons, Inc. Machin, D. and Parmar, M. K. B. 1995: Survival analysis: a practical approach. London: John Wiley & Sons, Ltd.
Cramer's contingency coefficient See CONTINGENCY COEFFICIENT
credible interval When the aim of a Bayesian analysis (see BAYESIAN METHODS) is to provide a scientific inference about an unknown parameter, all the required information about the uncertainties involved is contained in the POSTERIOR DISTRIBUTION. There is a sense in which the only truly satisfactory inference summary is the complete 'picture' represented by the posterior distribution. Alternatively, a range of posterior distributions corresponding to a range of prior specifications allows a display of the sensitivity of 'conclusions' to 'assumptions'. Sometimes, however, providing a complete picture of the uncertainty in estimation by a posterior distribution is less convenient than providing a low-level summary of the message contained within it. Credible intervals are to Bayesian statistics as CONFIDENCE INTERVALS are to frequentist statistics; they provide a simple summary of the uncertainty associated with the estimate of an unknown parameter. If the posterior distribution for an unknown parameter θ is denoted by p(θ), then an interval (θ_L, θ_U) is said to form a 100(1 - α)% posterior credible interval for θ if

\[ \int_{\theta_L}^{\theta_U} p(\theta)\, d\theta = 1 - \alpha \]

There are infinitely many ways to determine a credible interval, some of which are illustrated in the first figure. Credible interval (1) is determined so that it excludes regions of equal posterior probability, each tail corresponding to a probability of α/2. Credible interval (2) excludes a region of exactly α on the lower side, extending to infinity on the right. In contrast, credible interval (3) excludes a region of exactly α content on the right, beginning at zero. The final credible interval (4) is termed the highest posterior density (HPD) interval and is constructed so that every parameter value within the interval has a higher density than every value outside the interval and hence they are more likely values. It can be demonstrated that the HPD interval in any particular case determines the shortest credible interval. If the posterior distribution is symmetric and unimodal then the equal-tailed and HPD intervals coincide. The concept of a credible interval generalises to more than a single parameter.
[Figure: credible interval. Credible intervals (1) to (4) for an unknown parameter.]
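When the posterior distribution is available only as a set of simulated draws, the equal-tailed and HPD intervals can be approximated from the ordered draws. The sketch below is an illustration under stated assumptions: the posterior is unimodal, the function names are hypothetical, and gamma-distributed random numbers stand in for posterior draws. The HPD interval is taken as the shortest window of ordered draws holding the same number of draws as the equal-tailed interval.

```python
import random


def equal_tailed_interval(samples, alpha=0.05):
    """Cut the same number of draws (about alpha/2 of them) from each tail."""
    ordered = sorted(samples)
    k = round(alpha / 2 * len(ordered))
    return ordered[k], ordered[len(ordered) - 1 - k]


def hpd_interval(samples, alpha=0.05):
    """Shortest window of ordered draws with the same content as above."""
    ordered = sorted(samples)
    n = len(ordered)
    n_inside = n - 2 * round(alpha / 2 * n)
    if n_inside < 1:
        raise ValueError("too few samples for this alpha")
    starts = range(n - n_inside + 1)
    best = min(starts, key=lambda i: ordered[i + n_inside - 1] - ordered[i])
    return ordered[best], ordered[best + n_inside - 1]


# Stand-in for posterior draws: a right-skewed gamma sample.
rng = random.Random(0)
draws = [rng.gammavariate(2.0, 1.0) for _ in range(5000)]
equal_tailed = equal_tailed_interval(draws)
hpd = hpd_interval(draws)
```

Because the equal-tailed window is one of the candidates searched, the HPD interval found this way can never be longer than the equal-tailed interval; for a skewed posterior it is typically shorter and shifted towards the mode.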
The second figure illustrates a bivariate (two-parameter) HPD region constructed in exactly the same way as the univariate case but with every point within the region having a higher density than every point outside it. The specific example is taken from an extension of the Bradley-Terry paired-comparison model (see, for example, Imrey, 1998), accounting for ties. For a given probability content such a credible region has the smallest area. AG
[Figure: credible interval. Bivariate HPD region. Horizontal axis: probability of treatment preference.]
Imrey, P. B. 1998: Bradley-Terry model. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd.
critical appraisal This is a process that evaluates research reports and assesses their contribution to scientific knowledge and is typically applied to research papers in medical journals. A careful evaluation of the medical literature is important because the quality of research is variable, and often very poor. It is imprudent to assume that a paper is error free just because it has been published: even papers in well-respected journals contain faults that cast doubt on the conclusions. Altman has researched the extent and implications of errors in the medical literature, estimating that reviews have found statistical errors in about half of published papers (Altman, 1991a).
The problem of poor-quality research is set in the context of the increasing use of statistics in the medical literature. Altman (1991a) describes two surveys of research papers published in the New England Journal of Medicine, in 1978-1979 and 1990. In this time the proportion of papers containing nothing more than descriptive statistics fell from 27% to 11%, while the proportion using more complex statistical methods, such as SURVIVAL ANALYSIS, increased dramatically. A good understanding of statistical analysis, alongside an awareness of statistical issues surrounding research design and execution, is therefore essential to effective appraisal of the medical literature.
Altman (1994) argued, in an editorial entitled 'The scandal of poor medical research', that research in the medical arena is often done with the aim of furthering a curriculum vitae, rather than promoting scientific knowledge. He suggests that: 'Much poor research arises because researchers feel compelled for career reasons to carry out research that they are ill equipped to perform, and nobody stops them.' The situation is compounded because the individual is 'expected to carry out some research with the aim of publishing several papers', the number of publications being 'a dubious indicator of ability to do good research; its relevance to the ability to be a good doctor is even more obscure'. This culture, Altman argues, leads to poor-quality research. An additional difficulty arises because junior doctors typically move jobs frequently, but are nevertheless often expected to conduct research during their short tenures. This may lead to small sample sizes as well as inadequate time for training, planning, analysis and formulation of conclusions. Further problems may occur when investigators are expected to complete research initiated by their predecessors. Altman (1991b) suggests that easy access to computers and statistical packages unaccompanied by corresponding technical understanding, as well as inadequate statistical education, also contribute to the errors.
Elsewhere, Altman (1980) describes the grave ethical implications of poor-quality research. He argues that it is unethical to carry out bad scientific experiments since patients may be subject to unnecessary risk, discomfort and inconvenience, while other resources, including the researcher's time, are diverted from more valuable functions. The publication of erroneous results is also unethical since it may lead directly to patients receiving an inferior treatment. More subtle consequences are the encouragement to other researchers to replicate flawed methods or to do further research based on erroneous premises, as well as the difficulty in getting ethics committees to permit further research when it is thought that the 'proper' answer is known.
Many medical journals employ statistical reviewers in an attempt to improve the quality of research design, statistical analysis and presentation of results (see STATISTICAL REFEREEING). Goodman, Altman and George (1998) report that in a 1993-1995 study 37% of journals surveyed had a policy that guaranteed a statistical review before an acceptance decision. Direct evidence of the effect of statistical reviewing is limited. However, Schor and Karten (1966) studied the implementation of a programme of statistical review at a leading medical journal. Of the 514 original contributions considered, 26% were judged statistically acceptable; this increased to 74% once these manuscripts had been published after statistical review. Gardner and Bond (1990) performed a similar study on 45 papers submitted to the British Medical Journal. They found that only 11% were initially considered suitable for publication, but after statistical review 84% were regarded to be of an acceptable statistical standard. However, there is much research that has not undergone statistical review and reviewing itself is a subjective process.
It is thus essential to read papers in the medical literature cautiously: the reputation of a journal is not a guarantee of the quality of research reported. Errors in research vary in their magnitude and impact and a major element of critical appraisal is therefore to evaluate their potential effects on conclusions. The following describes some common errors in the design, analysis, presentation and interpretation of medical research. The comments are not comprehensive, but represent some of the more widespread and important errors made in the research process (see also PITFALLS IN MEDICAL RESEARCH). Andersen's (1990) book, Methodological errors in medical research,
contains a more complete description of errors illustrated by examples from the medical literature, although the author nevertheless describes it as an incomplete catalogue.
Medical research can be broadly divided into CLINICAL TRIALS, COHORT STUDIES, CASE-CONTROL STUDIES and CROSS-SECTIONAL STUDIES. Clinical trials are experimental studies where the investigator assigns participants to different interventions, preferably randomly. The well-conducted randomised controlled double-blind clinical trial comes the closest to establishing cause and effect between intervention and outcome in a single study. Cohort studies, case-control studies and cross-sectional studies are all observational studies. Here the investigator observes participants without making any intervention. Conclusions are not considered as robust as those from experimental studies because factors controlling exposure may also be related to the outcome. However, observational studies are common because it is often impractical or unethical to assign participants to interventions: for example, it would not be possible to assign subjects to be smokers or nonsmokers. Each study design has advantages and disadvantages for specific research questions and an important initial consideration in critical appraisal is whether an appropriate experimental design has been employed. Several of the following criteria relate to clinical trials where rigorous design and conduct of studies is imperative if results are to be conclusive.
The vast majority of research studies cannot consider the whole population of interest and therefore a sample is selected. Results from this sample are then applied to the population of interest. If this inference is to be valid it is vital that the sample is representative of the population. A key concept is that of random selection; if the participants are randomly selected from the population then there is the best chance of the sample being truly representative. The research setting is often a pertinent consideration here: a study of childbirth in a maternity hospital receiving a high proportion of referrals for complications may not be representative of childbirths throughout the country. DROPOUT or refusal rates are also important issues because there is a strong possibility that those who do not take part in a study are systematically different from those who do. Although dropouts and refusals should be minimised,
they are usually inevitable and a good research paper will describe the representative nature of a sample by reporting clearly the number originally selected, as well as the number completing the study. Reasons for dropout should be given, if possible, and any available characteristics of those who do not complete the study compared with those that do. All research papers must describe features of the study sample so that comparisons with the relevant population can be made.

A very common problem in experimental design is the lack of a pre-study sample size calculation. This indicates the number of participants required to be reasonably likely to detect a clinically significant effect. It is considered unethical to undertake a study with insufficient numbers to detect such an effect. Therefore it is important that a pre-study sample size calculation is performed and described in the research report, providing sufficient detail about the assumptions made, so that the calculation can be verified. Sample size considerations should be based on the primary outcome variable in a study, and should also allow for dropout or refusal rates.

In clinical trials the concepts of blinding and random allocation to intervention are essential aspects of experimental design. Blinding is necessary because BIAS may enter a study through a participant or observer knowing the intervention allocation. In a double-blind clinical trial neither the participant nor the observer is aware of the allocation. Blinding is clearly not always feasible, for example, in a trial comparing an intervention of physiotherapy to no physiotherapy among a sample of elderly patients who have had a fall. In some trials, it may only be possible to blind the participant and not the observer (known as single-blind trials), but it is important that the maximum level of blindness possible is used. Random allocation to intervention is a further desirable feature of clinical trials. It is necessary that groups of participants receiving different interventions are as similar as possible so that any effects at the end of the trial are attributable only to differences in the intervention. RANDOMISATION optimises the chance that the groups will be as similar as possible. Unfortunately, many 'trials' are not planned, but instead are based on existing routinely collected data. Allocation to intervention in these instances is never truly random and is often particularly defective. For example, two surgeons in a hospital performed many operations to reduce snoring, each surgeon using a different technique. Analysis of several years of routine outcome data attempted to compare the two techniques. Here the surgeon was a confounder and it is not possible to deduce whether differences in outcome were due to the effects of the surgical technique or of the surgeons who operated.
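As an illustration of the pre-study sample size calculation discussed above, the following minimal sketch uses the standard normal-approximation formula for comparing two means, n = 2(z_{1-alpha/2} + z_{power})^2 (sd/delta)^2 per group, and then inflates the result for an anticipated dropout rate. The function name and the example values (a clinically significant difference of 5 units, a standard deviation of 10, 80% power and 20% dropout) are illustrative assumptions and are not taken from the text.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80, dropout=0.0):
    """Approximate number of participants to recruit per group for a
    two-sided comparison of two means (normal approximation).

    delta   : smallest clinically significant difference to detect (assumed)
    sd      : common standard deviation of the outcome (assumed)
    dropout : anticipated proportion of participants lost (assumed)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile corresponding to the power
    n_complete = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    # Inflate so that enough participants remain after dropout.
    return math.ceil(n_complete / (1 - dropout))


print(n_per_group(delta=5, sd=10))               # no allowance for dropout
print(n_per_group(delta=5, sd=10, dropout=0.2))  # allowing for 20% dropout
```

Reporting the inputs (difference, standard deviation, significance level, power and dropout allowance) is what allows a reader to verify such a calculation.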
The double-blind randomised controlled trial (see CLINICAL TRIALS) is considered the gold standard of medical research. If a research report states that blinding and random allocation have been used then it is important that the procedures employed in implementing each are described; it is insufficient to assume that authors understand the meaning of these terms. Errors in research design would be reduced if statisticians were consulted more often in the early stages of the research process so that statistical issues could be considered throughout (see CONSULTING A STATISTICIAN). Unfortunately, errors in the design of experiments are nearly always impossible to correct and therefore the research may be fatally flawed.

Errors in statistical analysis are also widespread. Many statistical techniques make assumptions about the data to which they are applied, but a mistake often observed in research papers is that these assumptions have not been met. A common assumption is that of data conforming to a NORMAL DISTRIBUTION. It is, unfortunately, not always possible to tell whether a variable is normally distributed when the raw data
cannot be inspected. However, summary statistics may be provided and for measurements that cannot be negative, which is often the case in medical research, it can be inferred that the data have a skewed distribution if the standard deviation is more than half the MEAN, although the converse is not necessarily true. When data do not conform to the assumption of normality they should either be transformed (see TRANSFORMATION) or NONPARAMETRIC METHODS used instead. It may be clear from graphs or ranges that outliers are present in data. These can have a considerable effect on statistical analyses. Generally, however, values should not be altered or deleted if there is no evidence of a mistake. Instead, if OUTLIERS are present a research paper should indicate that steps were taken to investigate their effects. Again, transformations or nonparametric methods may be appropriate. A common assumption of statistical tests is that all the observations are independent. However, multiple observations on one subject are not independent and should therefore not be analysed as such. For example, the results of hearing tests in the right and left ears of a group of study participants should not all be entered into an analysis where observations are assumed to be independent. Instead, the average of the right and left measurements could be taken, or the results from just the left or right ear might be chosen. It is also erroneous to analyse paired data ignoring the pairing. Paired data can arise when a one-to-one matched design has been used or when two measurements are made on the same subject, e.g. before and after treatment. METHOD COMPARISON STUDIES are common in medical research and CORRELATION is often misused to assess agreement between the two methods. Correlation measures linear association, rather than agreement, so if one method always gives a value of exactly twice the other method a perfect correlation would be found although agreement is clearly lacking. Instead,
agreement between two continuous variables should be assessed using the technique described by Bland and Altman (1986). A major problem encountered in the analysis of medical research is that of multiple testing (see MULTIPLE COMPARISON PROCEDURES). Choosing the conventional significance level of 0.05 means that if 20 statistical tests were performed we would expect one to be significant purely by chance, even if no real effects exist. Therefore the conclusions of a paper reporting one significant result among 20 tests performed should be tempered by this fact. It may be that an adjustment for multiple comparisons, such as a BONFERRONI CORRECTION, is appropriate. A distinction should always be made between prior hypotheses and those resulting from exploration of the data, so that the same data are not used for testing a hypothesis as for generating it (see POST HOC ANALYSIS).
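The arithmetic behind the multiple testing problem can be sketched in a few lines. The example below assumes, purely for illustration, that the 20 tests are independent and that every null hypothesis is true; the function names are hypothetical. It computes the expected number of chance findings, the probability of at least one, and the per-test threshold implied by a Bonferroni correction.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when all null hypotheses hold."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the overall level at alpha."""
    return alpha / n_tests


n = 20
print(expected_false_positives(n))     # expected chance findings among 20 tests
print(round(prob_at_least_one(n), 3))  # probability of at least one chance finding
print(bonferroni_threshold(n))         # Bonferroni-adjusted per-test level
```

Under these assumptions a single significant result among 20 tests is more likely than not to arise by chance alone, which is why such a finding should be interpreted cautiously.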
Other errors in analysis abound, including the use of correlation to relate change to initial value, failure to take account of ordered categories or the evaluation of a diagnostic test only by means of SENSITIVITY and SPECIFICITY when the POSITIVE and NEGATIVE PREDICTIVE VALUES would be more informative. Whatever method of analysis is used, an important requirement of the research report is that all techniques employed are clearly specified for each analysis. Unusual or obscure methods should be referenced and methods that exist in more than one form, such as Pearson's or Spearman's correlation coefficient (see CORRELATION), must be identified unambiguously.

Errors can also be made in presentation, although these may have more trivial implications for the conclusions of a paper than the errors described earlier. Nevertheless, good presentation is important to ensure that the reader is not misled or confused. It is worth noting that a poor-quality paper may not necessarily describe poor-quality research, but if insufficient detail is provided in a report it is unsatisfactory to assume that the research has been performed acceptably. P-values are often used in the medical literature to indicate statistical significance. However, it is preferable that CONFIDENCE INTERVALS are used in the presentation of results to give an immediate idea of the clinical significance of an effect. If a 95% confidence interval does not include zero (or, more generally, the value specified in the NULL HYPOTHESIS) then the P-value will be less than 0.05. Thus there is a close relation between confidence intervals and P-values (see TESTS AS CONFIDENCE INTERVALS), but confidence intervals additionally demonstrate the magnitude of the effect of interest. MEASURES OF SPREAD should be quoted alongside MEASURES OF LOCATION to indicate variability around the average measurement. However, the ± notation is discouraged because its use to denote the STANDARD DEVIATION,
the standard error and the half-width of a confidence interval has led to some ambiguity. Thus, rather than describing mothers in a study of pregnancy by saying, 'the mean age of mothers was 28 ± 4.6 years', the data are better summarised as 'the mean age of mothers was 28 years (SD 4.6)'. Since the vast majority of statistical analysis is now performed using computers, research papers should present exact P-values, which are far more informative, such as P = 0.014, rather than ranges, such as P < 0.05. The notation 'NS' for nonsignificant is even less revealing. However, there is no need to be specific below 0.0001. Authors must justify the appropriateness of a one-sided P-value quoted in a research paper. A one-sided P-value should only be used in the very rare situation where an observed difference could only have occurred in one direction. The decision to use a one-sided P-value should be made prior to the data analysis and hence not be dependent on the results. Spurious precision is another common error in the medical literature that impairs the readability and credibility of a paper. When presenting results the precision of the original data must be borne in mind. Altman (1991b) suggests that
means should not be quoted to more than one decimal place more than the original data and standard deviations or standard errors to no more than two. Likewise, percentages need not be given to more than one decimal place and P-values need not have more than two significant figures. Errors also arise in graphical presentation (see GRAPHICAL DECEPTION). Graphs that do not include a true zero on the vertical axis or that change scale in the middle of an axis can be misleading, as can the unnecessary use of three-dimensional effects. Other errors include the plotting of means without any indication of variability and the failure to show coincident points on a SCATTERPLOT.

Misinterpretation is common when P-values are presented. It must be remembered that the conventional cutoff of 0.05 is purely arbitrary. A frequent mistake is to interpret a value of, say, 0.045 as significant, but a value of 0.055 as not significant, when in reality there is very little difference between the two. There is also a prevailing belief that significant P-values are indicative of more successful research than nonsignificant P-values. This attitude is reflected in studies being described as 'positive' or 'negative', depending on the significance of the findings. Results should not be evaluated solely on the statistical significance of the findings, but also on their clinical significance (see CLINICAL VERSUS STATISTICAL SIGNIFICANCE). The use of confidence intervals is a helpful antidote to this problem. A further serious error of interpretation is to interpret ASSOCIATION as causation. The only type of study where CAUSALITY can be inferred is a well-conducted randomised controlled trial. Otherwise, great care should be taken in the interpretation of results; in particular, the likely effect of confounders must be considered. A final area where conclusions are often not treated with sufficient caution is that of inference from a sample to a population.
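The point about the arbitrary 0.05 cutoff, and the value of confidence intervals, can be illustrated with a small sketch. It assumes a normally distributed effect estimate with a known standard error; the two estimates (2.00 and 1.92, each with standard error 1.0) and the function name are invented for illustration only.

```python
from statistics import NormalDist


def summarise(estimate, se, level=0.95):
    """Return (lower, upper, p) for a normally distributed estimate.

    lower, upper : confidence limits at the requested level
    p            : two-sided P-value for the null hypothesis of zero effect
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - (1 - level) / 2)
    lower = estimate - crit * se
    upper = estimate + crit * se
    p = 2 * (1 - z.cdf(abs(estimate / se)))
    return lower, upper, p


# Two hypothetical studies with almost identical results.
for est in (2.00, 1.92):
    lo, hi, p = summarise(est, 1.0)
    print(f"estimate {est:.2f}: 95% CI {lo:.2f} to {hi:.2f}, P = {p:.3f}")
```

The two intervals are nearly the same even though one P-value falls just below 0.05 and the other just above it, and in each case the interval excludes zero exactly when the P-value is below 0.05.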
Although a sample should theoretically be random, in practice this may not be realistic. Therefore a research paper should attempt to report any likely biases in the selection process and implications this may have for the findings reported.

When critically appraising a research paper it is helpful to have a checklist of issues to consider. A checklist is particularly useful because it is easier to spot errors than omissions and, as already noted, it is inappropriate to infer that a correct procedure was employed when the relevant information is not included. The British Medical Journal provides two checklists for use by its statistical reviewers that can be used when critically appraising a paper; these are published in Gardner, Machin and Campbell (1986) or can be found on the British Medical Journal website. One checklist is intended specifically for clinical trials and so includes questions relevant only to this study design; the other is for use with all other study types.
In conclusion, critical appraisal is an essential skill for readers of the medical literature; it is important that readers have the confidence to question conclusions stated by the authors and the statistical knowledge to assess the methods used. The consequences of the range of errors described in this section can vary between reducing the readability of a paper to reversing the direction of the results. An important part of the critical appraisal process, therefore, is to make a judgement about the implications of any issues raised. A study should not be discarded because a single flaw is found but, instead, a subjective assessment of the impact on the findings must be made. SRC
Altman, D. G. 1980: Statistics and ethics in medical research. British Medical Journal 281, 1182-4.
Altman, D. G. 1991a: Statistics in medical journals: developments in the 1980s. Statistics in Medicine 10, 1897-913.
Altman, D. G. 1991b: Practical statistics for medical research. London: Chapman & Hall.
Altman, D. G. 1994: The scandal of poor medical research. British Medical Journal 308, 283-4.
Andersen, B. 1990: Methodological errors in medical research. Oxford: Blackwell.
Bland, J. M. and Altman, D. G. 1986: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet i, 307-10.
Gardner, M. J. and Bond, J. 1990: An exploratory study of statistical assessment of papers published in the British Medical Journal. Journal of the American Medical Association 263, 1355-7.
Gardner, M. J., Machin, D. and Campbell, M. J. 1986: Use of check lists in assessing the statistical content of medical studies. British Medical Journal 292, 810-12.
Goodman, S. N., Altman, D. G. and George, S. L. 1998: Statistical reviewing policies of medical journals: caveat lector? Journal of General Internal Medicine 13, 753-6.
Schor, S. and Karten, I. 1966: Statistical evaluation of medical journal manuscripts. Journal of the American Medical Association 195, 1123-8.
crossover trials
These are trials in which patients are allocated to sequences of treatments with the object of studying differences between individual treatments or subsequences of treatments (Senn, 2002). That is to say, each patient is treated more than once and the responses under different treatments for the same patient can then be compared. This is best explained by considering some examples. Suppose we are interested in general in comparing treatments A, B, C, etc., and that patients will be allocated to sequences of treatment of the form ABC, CBA, etc., where, for example, ABC means that the patient will receive A in a first period, B in a second and C in a third. When only two treatments are being compared, a very popular type of crossover design is one in which patients are allocated at random and usually in equal numbers to one of two sequences AB or BA. Such a trial was run by Graff-Lonnevig and Browaldh (1990) comparing the effects of single doses of inhaled formoterol (12 μg) and salbutamol (200 μg) in 14 moderately or severely asthmatic children. If we give the
label A to formoterol and B to salbutamol, then children were allocated at random to one of the two sequences AB or BA. Where three treatments are being compared, patients may be allocated in equal numbers to one of three sequences forming a Latin square, either ABC, BCA and CAB or ACB, BAC and CBA, or it may be that both Latin squares would be employed, so that patients would be allocated in equal numbers to each of the six possible sequences involving each of the three treatments. For example, Dahlof and Bjorkman (1993) compared two doses of the potassium salt of diclofenac (50 mg or 100 mg) to placebo in the treatment of migraine in 72 patients. If A is placebo, B is the lower dose of diclofenac and C the higher one, then their design involved allocating patients to one of the six sequences ABC, ACB, etc.

More complex designs than this are possible. For example, it may sometimes be the case that the number of treatments that one wishes to study is greater than the number of periods in which it is considered realistic to treat patients. So-called incomplete block designs, in which patients receive suitably chosen subsets of the treatments to be investigated, are popular. At the other extreme, it may be that it is possible to treat patients in more periods than there are treatments, leading to so-called replicate designs. As we shall discuss, these are extremely useful for the purpose of studying an individual response.

Because crossover trials permit comparisons on a within-patient basis, they are efficient compared to parallel group trials and considerable savings in patient numbers are possible. However, crossover trials are clearly unsuitable for any condition in which death or cure is the outcome and their appropriate use is restricted to chronic diseases and treatments whose effects are reversible. Suitable conditions include asthma, rheumatism and migraine. However,
it is not just the condition but the treatment and the ENDPOINT that determine the suitability of crossover trials. For example, they can be used to study blood pressure itself in short-term trials in hypertension but not the long-term sequelae of hypertension, such as, for example, stroke or kidney or eye damage. In asthma they are more suitable for studying the effects of beta-agonists, which are relatively short term and reversible, than those of steroids, which have longer-term effects. In such conditions where crossover trials may be employed, it is nearly always the case that the sample size required to prove efficacy, even if a parallel group trial is used, is considerably less than that required to demonstrate safety of the drug. Hence, in Phase III, where safety considerations are extremely important, there is no point in reducing the sample size by employing crossover trials anyway (see PHASE III TRIALS, PHARMACOVIGILANCE). Consequently, some discussions of the comparative merits of crossover trials and parallel group trials that appear in the
scientific literature are rather misleading. In practice, crossover trials are never an alternative for the major parallel
group trials carried out in Phase III. They can, however, be extremely useful in Phases I and II for pharmacokinetic and pharmacodynamic modelling, for dose finding for tolerability in healthy volunteers and for efficacy using pharmacodynamic outcomes in patients (see PHASE I TRIALS, PHASE II TRIALS). They can also be useful elsewhere for answering certain specialist questions such as, for example, demonstrating equivalence (see EQUIVALENCE STUDIES: DESIGN) of generic and brand name products using so-called bioequivalence studies (Senn and Ezzet, 1999).

Unlike the parallel group trial, the basic unit of replication in a crossover design is not the patient but an episode of treatment. Since a general necessary assumption in standard analyses of experiments is that there is no interference between units, this is clearly potentially more problematic for crossover trials than for parallel group trials. It is inherently more plausible that the treatment given to a patient in an earlier period may affect the response for the same patient when being given a further treatment in a subsequent period, a phenomenon known as carry-over, than that the treatment given to one patient may affect another. (There are some cases where even parallel group trials may suffer from interference between units, in particular if infectious diseases are involved or if group therapy takes place. This may lead to cluster randomisation being necessary, but this is plausibly an infrequent problem.) In fact, carry-over is regarded as being the central (potential) difficulty of crossover trials and much of the considerable literature devoted to the design and analysis of these trials is concerned with matters to do with controlling for carry-over. The phenomenon of carry-over means that it is prudent, indeed necessary, to employ a so-called washout period.
This is a period between the measurement of the effect of one treatment and the next in which the effect of the previous treatment is allowed to dissipate. Washout treatments can be passive, if washout is allowed to occur without any treatment being given (Senn, 2002). This may seem the natural approach from the experimental point of view. It has, however, the disadvantage that the patient is expected to tolerate a period in which no therapy is offered. An alternative strategy is that of employing an active washout period (Senn, 2002). This might involve a near-immediate switch of the patient's therapy but a delay of measurement until a suitable period has taken place during which the effects of the previous treatment have disappeared. (For further details, see Senn, 2002.)

It seems plausible that crossover trials will be more vulnerable to dropouts than parallel group trials because of the greater demands on patient time that the former make, because dropouts in one period will also lead to loss of data from subsequent periods and because incomplete data will
unbalance designs and lead to disproportionate losses in information. It should be noted that in nearly all crossover trials, subjects are not recruited simultaneously. The exception is some designs in which healthy volunteers are used. For designs involving patients they must, of course, be treated when they present. Consequently, 'period' has a relative meaning in the context of crossover trials. For example, in an AB/BA crossover some patients will usually have completed both periods of treatment before others have even started in the trial.

A popular linear model for responses for a crossover trial with $I$ patients in $J$ periods with $T$ treatments giving rise to $L$ forms of carry-over may be expressed as follows (Jones and Kenward, 1989). We let the response in period $j$, $j = 1, \ldots, J$, on patient $i$, $i = 1, \ldots, I$, be $Y_{ij}$, the treatment given to that patient in that period be $t(i,j) = 1, \ldots, T$ and the form of carry-over be $l(i,j)$. Then we write:

$$Y_{ij} = \mu + \phi_i + \pi_j + \tau_{t(i,j)} + \lambda_{l(i,j)} + \varepsilon_{ij}$$

Here $\mu$ is a grand mean, $\phi_i$ is an effect due to patient $i$, $\pi_j$ is an effect due to period $j$, $\tau_{t(i,j)}$ is the effect of treatment $t(i,j)$, $\lambda_{l(i,j)}$ is the carry-over effect of type $l(i,j)$ and the $\varepsilon_{ij}$ are within-patient error terms usually assumed identically and independently distributed with variance $\sigma^2$ (say). The following points may be noted in connection with this model.

The model is severely overparameterised. However, interest centres on contrasts between the various treatment terms and, given various restrictive assumptions about the carry-over terms, these will usually be estimable.

Since there can be no carry-over in period 1, for each patient $\lambda_{l(i,1)} = 0$ for all $i$. In practice, to make progress in estimation, further restrictive assumptions are introduced about the carry-over terms. There are two very popular choices. The first is to assume that any washout strategy has been successful and that all carry-over terms are zero. The second is to assume that 'simple' carry-over applies and that carry-over depends only on the treatment given in the previous period, so that we may write $l(i,j) = t(i,j-1)$, $j \geq 2$. This last assumption may seem more reasonable than the first but in practice there are very few imaginable circumstances under which the second assumption would apply if the first did not, as it seems plausible that if carry-over occurred the effect of the engendering treatment would be modified by the perturbed treatment (Senn, 2002).

In practice, although designs can easily be found in which patient, period and treatment effects are orthogonal to each other, if carry-over effects are included, the design matrix will usually be nonorthogonal and there will be a loss in efficiency.

For certain designs, for most purposes it makes no difference whether the patient effects $\phi_i$ are taken as FIXED EFFECTS or RANDOM EFFECTS. However, for incomplete block designs in
which T > J, interpatient information will usually be recoverable by taking the patient effects as random, and this will also usually be the case if carry-over effects are included in the model (Senn, 2002).

The following are a number of controversies and issues that are relevant to crossover designs. The most notorious controversy concerning carry-over has been in connection with the AB/BA design. For many years a popular approach to dealing with carry-over was the so-called two-stage procedure originally proposed by Grizzle (1965). He noted that in the presence of carry-over the treatment effect was not estimable and hence proposed that a preliminary test of carry-over be made. If carry-over were detected, the second period data should be ignored and a between-patient test using first-period data only should be employed. However, a subsequent paper by Freeman (1989) showed that this strategy was extremely biased as a whole and did not maintain nominal Type I error rates. It is possible to adjust the two-stage procedure so that it maintains the overall Type I error rate (Senn, 1997; Wang and Hung, 1997) but it has less power than the strategy of simply ignoring carry-over and is not recommended (Senn, 1997).

Various extremely complicated designs and analysis strategies have been proposed for dealing with carry-over. They all make restrictive and unrealistic assumptions about the nature of carry-over, however, and they nearly all involve a penalty in terms of increased variances of estimators of the treatment effect (Senn, 2002). This would seem to leave washout as the only reasonable strategy for dealing with carry-over. However, this approach is bound to leave some investigators unhappy because of the reliance it makes on judgement based on biology and pharmacology rather than strategies of design or analysis based on purely statistical principles.

More general error structures than those considered here are possible.
In particular, one could allow for a true random effect, that is to say the possibility that different patients react differently to treatment. From one point of view we would then have the patient effect $\phi_i$ as a random intercept parameter for each patient but then add a slope parameter for a given treatment for a given patient. For a given treatment these would then be assumed to be randomly distributed with unknown variance to be estimated. It is also possible, although this appears to be rarely attempted, to allow for an autocorrelation of the within-patient errors.

Baseline values at the beginning of each treatment period are sometimes collected in crossover trials but extreme care is required in their use. For many designs it is quite plausible that the outcome values may be unaffected by carry-over but that the baseline values would be. In that case, incorporating the baseline values in the analysis might introduce a BIAS due to carry-over that would not otherwise be present.
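To make the linear model described earlier concrete, the following minimal sketch estimates the treatment contrast in an AB/BA design from within-patient period differences, under the assumption that washout has been successful and all carry-over terms are zero. Each patient's difference between period 1 and period 2 removes the patient effect; averaging the differences within each sequence and halving the gap between sequences removes the period effect. The function name and the noise-free data (grand mean 10, period effects 0 and 1, treatment A worth 3 units more than B) are invented for illustration.

```python
from statistics import mean


def ab_ba_effects(ab, ba):
    """Estimate effects in an AB/BA crossover assuming no carry-over.

    ab, ba : lists of (period 1 response, period 2 response) pairs for
             patients allocated to sequence AB and sequence BA.
    Returns (treatment effect A minus B, period effect 1 minus 2).
    """
    d_ab = [y1 - y2 for y1, y2 in ab]  # patient effect cancels in each difference
    d_ba = [y1 - y2 for y1, y2 in ba]
    treatment = (mean(d_ab) - mean(d_ba)) / 2  # period effect cancels
    period = (mean(d_ab) + mean(d_ba)) / 2     # treatment effect cancels
    return treatment, period


# Noise-free illustrative responses with very different patient effects.
ab_patients = [(11, 9), (13, 11), (18, 16)]
ba_patients = [(11, 15), (7, 11)]

treatment, period = ab_ba_effects(ab_patients, ba_patients)
print(treatment, period)
```

Because only within-patient differences are used, the large differences between patients in this example do not affect the estimates, which is the source of the efficiency of crossover trials noted in this entry.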
There have been various attempts to produce Bayesian analyses of crossover trials (Grieve, 1985). In theory this is attractive in that it permits compromise positions to be adopted between that of assuming that carry-over is absent to allowing that it may have any value at all. In practice, it is difficult to capture in the model the dependence that must inevitably exist between belief in the magnitude of the treatment effect and that of the carry-over effect.

Ironically, despite the considerable potential of crossover trials to measure individual response to treatment, especially if replicate designs are employed whereby the number of periods exceeds the number of treatments so that J > T, and hence make a genuine contribution to the currently fashionable field of pharmacogenomics, this possibility has received most attention where it is least important, namely in investigating individual response to different formulations in the context of bioequivalence.

Despite some limitations of application and some difficulties in their use it would be wrong to conclude, however, that crossover trials have no place in drug development. They can be extremely efficient compared to parallel group trials and are far superior for the purpose of investigating true random effects. They are extremely valuable on occasion, in particular in pharmacokinetic studies and in dose finding in Phase II. SS
Dahlof, C. and Bjorkman, R. 1993: Diclofenac-K (50 and 100 mg) and placebo in the acute treatment of migraine. Cephalalgia 13, 2, 117-23.
Freeman, P. 1989: The performance of the two-stage analysis of two-treatment, two-period cross-over trials. Statistics in Medicine 8, 1421-32.
Graff-Lonnevig, V. and Browaldh, L. 1990: Twelve hours bronchodilating effect of inhaled formoterol in children with asthma: a double-blind cross-over study versus salbutamol. Clinical and Experimental Allergy 20, 429-32.
Grieve, A. P. 1985: A Bayesian analysis of the two-period crossover design for clinical trials. Biometrics 41, 4, 979-90.
Grizzle, J. E. 1965: The two-period change over design and its use in clinical trials. Biometrics 21, 467-80.
Jones, B. and Kenward, M. G. 1989: Design and analysis of cross-over trials. London: Chapman & Hall.
Senn, S. J. 1997: The case for cross-over trials in Phase III (letter; comment). Statistics in Medicine 16, 17, 2021-2.
Senn, S. J. 2002: Cross-over trials in clinical research, 2nd edition. Chichester: John Wiley & Sons, Ltd.
Senn, S. J. and Ezzet, F. 1999: Clinical cross-over trials in Phase I. Statistical Methods in Medical Research 8, 3, 263-71.
Wang, S. J. and Hung, H. M. 1997: Use of two-stage test statistic in the two-period crossover trials. Biometrics 53, 3, 1081-91.
cross-sectional studies The objective of a cross-sectional study is to determine the distribution of a variable or the joint distribution of more than one variable in a population. This may be accomplished by obtaining a representative sample of the population of interest through the use of a SIMPLE RANDOM SAMPLE, a stratified random sample or a complex survey design. Such a study is characterised by the fact that subjects are only observed at a single point in
time even though the phenomena associated with the variables of interest may have evolved through a dynamic process that develops over time (Kleinbaum, Kupper and Morgenstern, 1982; Rothman and Greenland, 1998). However, because the study subjects are only observed at a single point in time, essential features in the temporal patterns will be missed, which renders it impossible to conduct a thorough longitudinal analysis of the phenomenon of interest. This approach is also sometimes used when conducting an epidemiologic study in which subjects are recruited without regard to their exposure or disease status, so that the information on each corresponds to the status of subjects at the time of the interview only.

In order to appreciate the inherent limitation that exists when observations are only made at a single point in time, consider the hypothetical example illustrated by the SCATTERPLOT in the figure, part (a). Subjects are observed at different ages and the scatterplot suggests that the outcome tends to decrease as age increases. In contrast, a longitudinal study would observe subjects at multiple time points, thus enabling the investigator to track the development of the outcome over time. It is apparent that the tracking of individual subjects in our hypothetical example may have arisen either from an increase in the response as the subject ages (figure (b)) or a decrease (figure (c)). By only observing subjects at a single point in time, it is impossible to distinguish between the trends associated with enrolment into the study and those that evolve as each individual subject ages (Diggle, Liang and Zeger, 1994). This limitation is not resolved by conducting repeated cross-sectional studies carried out at different points in time, unless the studies are designed so as to obtain repeated assessments of the same individuals.

In epidemiology, a cross-sectional study contrasts with a COHORT STUDY and a CASE-CONTROL STUDY.
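The limitation illustrated by the hypothetical scatterplot can also be sketched numerically. In the invented, noise-free data below, subjects enrolled at older ages start with lower outcome values, yet every individual's outcome rises by one unit per year of follow-up. All names and numbers are assumptions made for illustration; the least-squares slope is computed with a small helper so that the example needs no external libraries.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


enrol_ages = [30, 40, 50, 60]  # age of each subject at enrolment (assumed)
follow_up_years = [0, 5, 10]   # times at which each subject is observed (assumed)


def outcome(enrol_age, years):
    """Hypothetical outcome: lower for older enrollees, rising within subject."""
    return 200 - 3 * enrol_age + 1 * years


# Cross-sectional view: one observation per subject, taken at enrolment.
cross_slope = slope(enrol_ages, [outcome(a, 0) for a in enrol_ages])

# Longitudinal view: the trend in age within each individual subject.
within_slopes = [
    slope([a + t for t in follow_up_years],
          [outcome(a, t) for t in follow_up_years])
    for a in enrol_ages
]

print(cross_slope)    # trend across subjects at a single point in time
print(within_slopes)  # trend within each subject as he or she ages
```

The cross-sectional slope is negative while every within-subject slope is positive, so the single-time-point data alone could not reveal how the outcome changes as an individual ages.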
In a cohort study, subjects are selected on the basis of their exposure status and then followed until the disease develops so that an investigator can directly assess the association with disease development. In a typical case-control study, incident cases are recruited near the time at which the disease is diagnosed and exposure is assessed by recall. In either case, an investigator would be studying incident cases while a cross-sectional study would be studying prevalent cases, i.e. cases that may have occurred at some time in the past. This would confound effects on disease incidence with effects on prognosis or survival. A cross-sectional study is especially prone to LENGTH-BIASED SAMPLING because the prevalent cases with a long period of illness before death would be more likely to enter the study than would a subject who died shortly after diagnosis. Therefore, the design is primarily used in the study of diseases with relatively short-term effects. One example of such a study would be an attempt to discover causes of a food poisoning outbreak in a school, by identifying all
[Figure: cross-sectional studies. Results from a hypothetical cross-sectional study (a) and corresponding longitudinal studies with increasing (b) and decreasing (c) time trends. Each panel plots y against x.]
students and assessing the specific food they had consumed and whether they had become ill. Another concern in a cross-sectional study of disease is whether there is a systematic BIAS or inaccuracy in the reporting of exposure by disease status. In some cases this can be avoided by using exposure measures that are not affected by disease, e.g. the determination of a particular genotype through the use of genomic analyses. However, if this cannot be avoided, then this potential source of bias may limit the strength of the association, as well as the strength of the evidence that the exposure of interest affects the aetiology of disease. TRH

Diggle, P. J., Liang, K.-Y. and Zeger, S. L. 1994: Analysis of longitudinal data. Oxford: Clarendon Press. Kleinbaum, D. G., Kupper, L. L. and Morgenstern, H. 1982: Epidemiologic research: principles and quantitative methods. Belmont: Lifetime Learning Publications. Rothman, K. J. and Greenland, S. 1998: Modern epidemiology. Philadelphia: Lippincott-Raven.
cross-validation See DISCRIMINANT FUNCTION ANALYSIS

crude birth/death rate See DEMOGRAPHY

cubic spline See SCATTERPLOT SMOOTHERS
cure models A cure model can be used in survival analysis when there are 'immunes' or 'long-term survivors' present in the data (Maller and Zhou, 1996). In such a setting, immune or cured subjects are censored since cure can never be observed, while susceptible subjects would eventually develop the event if followed for long enough. A typical example where a cure model might be appropriate would exhibit a Kaplan-Meier estimate of the marginal time-to-event distribution that levelled off at long follow-up times to a nonzero value (see KAPLAN-MEIER ESTIMATOR). An example is in studies of cancer for which a significant proportion of patients may be cured by the treatment.

A mixture model formulation is one approach to analysing such data (see FINITE MIXTURE DISTRIBUTIONS). Assume that a fraction $p$ of the population are susceptibles and the remaining fraction are not; then the survival function $S(t)$ for the population is given by:

$$S(t) = p\,S_1(t) + (1 - p)$$

where $S_1(t)$ is the survival function for the susceptible group and where covariates can affect both $p$ and $S_1(t)$. Let $(t_i, \delta_i, Z_i)$ be the observations, where $Z_i$ is a vector of covariates, $t_i$ the observed or censored time and $\delta_i$ the censoring indicator. Let $D_i$ indicate cure status for each subject, denoted by $D_i = 1$ for a susceptible subject and $D_i = 2$ for cured. Thus each censored subject has either $D_i = 1$ and the event has not yet occurred or has $D_i = 2$. The incidence model is typically given by a logistic form:

$$P(D_i = 1 \mid Z_i) = \frac{\exp(b'Z_i)}{1 + \exp(b'Z_i)}$$

Among susceptible individuals, the time to event has a distribution, such as a Weibull (Farewell, 1982):

$$S_1(t_i \mid Z_i) = \exp[-\exp(\gamma'Z_i)\,t_i^{r}]$$
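The mixture formulation can be evaluated directly. The following is a minimal sketch, assuming a logistic incidence model (the form used in the tonsil cancer example later in this entry) and a Weibull latency model; the function names, the parameter values and the covariate vector are hypothetical and chosen only for illustration.

```python
import math

def incidence(b, z):
    """Logistic probability of being susceptible: exp(b'Z) / (1 + exp(b'Z))."""
    eta = sum(bj * zj for bj, zj in zip(b, z))
    return 1.0 / (1.0 + math.exp(-eta))

def latency_survival(t, gamma, z, shape):
    """Weibull survival for susceptibles: exp(-exp(gamma'Z) * t**shape)."""
    rate = math.exp(sum(gj * zj for gj, zj in zip(gamma, z)))
    return math.exp(-rate * t ** shape)

def population_survival(t, b, gamma, z, shape):
    """Mixture cure survival: S(t) = p * S_1(t) + (1 - p)."""
    p = incidence(b, z)
    return p * latency_survival(t, gamma, z, shape) + (1.0 - p)

# Hypothetical subject: intercept plus one binary covariate.
z = [1.0, 1.0]
b = [0.5, -0.5]        # incidence parameters, giving p = 0.5 here
gamma = [-1.0, 0.3]    # latency parameters
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, round(population_survival(t, b, gamma, z, shape=1.2), 4))
```

The printed curve starts at 1 and levels off at 1 - p, which is the plateau that a Kaplan-Meier estimate would be expected to show when follow-up is long enough.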
An attractive feature of this model is the two separate components. The parameters $b$ measure the effect of covariates on whether the event will occur and the parameters $\gamma$ measure the effect of the covariates on when the event will occur given that the subject is susceptible. These two components are sometimes called incidence and latency and can have nice interpretations in given applications.

Different formulations can be used. Li and Taylor (2002) and Yamaguchi (1992) considered parametric and semi-parametric accelerated failure time models for the latency model. Kuk and Chen (1992), Peng and Dear (2000) and Sy and Taylor (2000) considered a semi-parametric proportional hazards model for the latency model.

One problem associated with the cure model is near nonidentifiability (Farewell, 1986; Li, Taylor and Sy, 2001). This arises due to the lack of information at the end of the follow-up period, resulting in difficulties in distinguishing models with a high incidence of susceptibles and long tails of $S_1(t)$ from models with a low incidence of susceptibles and short tails of $S_1(t)$. The incorporation of longitudinal data into the cure model is one way to reduce the problem (Law, Taylor and Sandler, 2002).

While the parameters in $p$ and $S_1(t)$ have nice interpretations, in some applications the marginal survival distribution $S(t)$ and its dependence on $Z$ may be of most interest. This distribution is easily obtained from the estimates of $p$ and $S_1(t)$. Predicting the cure status of a censored subject may also be of interest. The formula to estimate the probability that a censored subject is in the susceptible group is given by:

$$P(D_i = 1 \mid T_i > t_i) = \frac{p\,S_1(t_i)}{p\,S_1(t_i) + 1 - p}$$
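The probability that a censored subject is still susceptible is a direct application of Bayes' theorem, and a short numerical sketch shows how it shrinks as event-free follow-up lengthens. The function name and the values of p and of the susceptible-group survival are hypothetical.

```python
def prob_susceptible_given_censored(p, s1_t):
    """P(D = 1 | T > t) = p * S_1(t) / (p * S_1(t) + 1 - p).

    p    : probability of being susceptible
    s1_t : survival probability at the censoring time among susceptibles
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= s1_t <= 1.0):
        raise ValueError("p and s1_t must lie in [0, 1]")
    denominator = p * s1_t + 1.0 - p
    if denominator == 0.0:
        raise ValueError("censoring at this time has probability zero")
    return p * s1_t / denominator

# Hypothetical values: 60% susceptible; S_1 falls as follow-up lengthens.
for s1 in (1.0, 0.5, 0.1, 0.01):
    print(s1, round(prob_susceptible_given_censored(0.6, s1), 4))
```

At the start of follow-up the probability equals p itself; a subject censored after most susceptibles would already have failed is very likely to be cured.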
The mixture cure model $S(t)$ does not in general have a proportional hazards structure. In order to keep this, however, nonmixture cure models have been proposed (Tsodikov, 1998; Chen, Ibrahim and Sinha, 1999). In these models, a bounded cumulative hazard is assumed: $\lim_{t \to \infty} \Lambda(t) = \theta$. One way to enforce this property is to write $\Lambda(t) = \theta F(t)$, where $F(t)$ is the distribution function of a nonnegative random variable. Then the survival distribution $S(t)$ for the population can be written as $S(t) = e^{-\theta F(t)}$, which has the cure rate $e^{-\theta}$. Covariates can be incorporated into the nonmixture cure model by assuming $\theta(Z_i) = \exp(\beta'Z_i)$.

Cure models are worthy of consideration for analysing data for which there is a strong scientific rationale for the existence of a cured group and empirical evidence of a nonzero limiting survival fraction, together with a substantial number of censored observations with long follow-up times.
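That the nonmixture formulation retains a proportional hazards structure can be checked in a few lines. The derivation below is a sketch: it writes the bounded cumulative hazard as $\Lambda(t \mid Z) = \theta(Z)F(t)$ with $\theta(Z) = \exp(\beta'Z)$, assumes that $F$ is differentiable with density $f = F'$ (an assumption not stated in the entry) and compares two hypothetical covariate vectors $Z_1$ and $Z_2$.

```latex
\begin{align*}
\lambda(t \mid Z) &= \frac{d}{dt}\,\theta(Z)\,F(t) = \exp(\beta'Z)\,f(t), \\
\frac{\lambda(t \mid Z_1)}{\lambda(t \mid Z_2)} &= \exp\{\beta'(Z_1 - Z_2)\}, \\
\lim_{t \to \infty} S(t \mid Z) &= \exp\{-\theta(Z)\} = \exp\{-\exp(\beta'Z)\}.
\end{align*}
```

The hazard ratio does not depend on $t$, so the hazards are proportional, while the same parameters $\beta$ also determine the covariate-specific cure rate.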
An example where the cure model was applied arose from a study of 672 tonsil cancer patients treated with radiation therapy (Sy and Taylor, 2000). The radiation can eliminate all the cancer cells in the tonsil of some patients and thus the cancer will not reappear in the tonsil and the patient is regarded as being cured. If the radiation is not successful at eliminating all the cancer cells in the tonsil, those that remain will regrow and become detectable, as a local recurrence, within about 3 years. This is a good situation where a cure model would be appropriate because there is a scientific rationale for a cured group and because a Kaplan-Meier estimate of time of local recurrence will exhibit a long plateau region if there is sufficient follow-up in the data. For these data, there were 206 events of local recurrence and most patients had more than 3 years' follow-up. The main interest was in understanding the effect of the total dose of radiation and the overall treatment time between the start and the end of radiation on local recurrence. Other covariates, such as stage of the tumour and age of the patient, were included in the analysis. A mixture cure model was used, with a logistic model for the incidence and a semi-parametric proportional hazards model for the latency.

The results suggested that stage, dose and treatment time were strongly associated with whether the tumour recurred, as indicated by the parameters in the logistic model. Age, however, was not associated with the incidence. The estimates of the relative hazards parameters in the latency part of the model suggested stage, dose and overall treatment time were not associated with when the recurrence would occur. The patient's age, however, was strongly associated with when recurrence would happen, given that the patient was not cured. The direction of the association was that younger patients would recur earlier.

One possible interpretation of this is that young patients tend to have the same susceptibility to treatment as older patients but they tend to have faster growing cancers that will recur earlier if not cured. The initial size or stage of the tumour and how it is treated are important factors in determining whether a patient is cured, but are not important in determining how fast the tumour grows back after treatment if not cured. JMGT
Chen, M. H., Ibrahim, J. G. and Sinha, D. 1999: A new Bayesian model for survival data with a surviving fraction. Journal of the American Statistical Association 94, 909-19. Farewell, V. T. 1982: The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38, 1041-6. Farewell, V. T. 1986: Mixture models in survival analysis: are they worth the risk? The Canadian Journal of Statistics 14, 257-62. Kuk, A. Y. C. and Chen, C. H. 1992: A mixture model combining logistic regression with proportional hazards regression. Biometrika 79, 531-41. Law, N. J., Taylor, J. M. G. and Sandler, H. 2002: The joint modelling of a longitudinal disease progression marker and the failure time process in the presence of cure. Biostatistics 3, 547-63. Li, C. S. and Taylor, J. M. G. 2002: A semi-parametric accelerated failure time cure model. Statistics in Medicine 21, 3235-47. Li, C. S., Taylor, J. M. G. and Sy, J. P. 2001: Identifiability of cure models. Statistics and Probability Letters 54, 389-95. Maller, R. A. and Zhou, X. 1996: Survival analysis with long-term survivors. New York: John Wiley & Sons, Inc. Peng, Y. and Dear, K. B. G. 2000: A nonparametric mixture model for cure rate estimation. Biometrics 56, 237-43. Sy, J. P. and Taylor, J. M. G. 2000: Estimation in a Cox proportional hazards cure model. Biometrics 56, 227-36. Tsodikov, A. 1998: A proportional hazards model taking account of long-term survivors. Biometrics 54, 1508-15. Yamaguchi, K. 1992: Accelerated failure-time regression models with a regression model of surviving fraction: an application to the analysis of 'permanent employment'. Journal of the American Statistical Association 87, 284-92.
curtailment sampling See INTERIM ANALYSIS
D

data and safety monitoring boards These are committees of experts set up to monitor the safety of participants and the validity and integrity of data in CLINICAL TRIALS. Some form of data and safety monitoring is called for in any trial to ensure minimal acceptable risks to trial participants and to assess continually the risks versus benefits of trial interventions during the conduct of the trial, so as to make sure that there is equipoise in continuing the trial. The International Conference on Harmonization defines good clinical practice (GCP) as 'an international ethical and scientific quality standard for designing, conducting, recording and reporting of trials that involve participation of human subjects'. Monitoring of trials for safety of participants, integrity of data leading to valid conclusions, adequate trial conduct and considerations for early termination to avoid unnecessary experimentation on human subjects is thus necessary to meet the stated GCP requirements.

The trial sponsor, the investigators and the institutional review board (IRB), also known as the ethics committee (EC), are at the frontline of safety monitoring for trial participants and they assume and share the responsibilities. However, the sponsor may elect to establish a data and safety monitoring board (DSMB), also known as an independent data monitoring committee (IDMC or DMC), and delegate part of its responsibilities to the DSMB. The establishment of a DSMB is recommended based on the recognition that monitoring of safety at regular intervals is essential to ensure safety of trial participants and that individuals directly involved in the conduct and management of a trial may not be suited for objective review of emerging interim data.

All clinical trials require monitoring of safety and efficacy data, but not all require monitoring by a DSMB independent of the sponsor and investigators. The degree and extent of such monitoring should depend on the potential risks associated with interventions, the severity of disease and ENDPOINTS of the trial, and the method of monitoring on the size, scope and complexity. A DSMB is generally required for large, randomised controlled trials comparing mortality or major irreversible morbidity as a primary endpoint or pivotal trials for regulatory approval of marketing.

A DSMB is a body of experts who review accumulating data, both safety and efficacy, from an ongoing trial at regular intervals and advise the sponsor about the risks versus benefits and the scientific merit of continuing the trial. A typical DSMB is made up of people with pertinent expertise, including clinicians and scientists knowledgeable about the disease and interventions under investigation and a statistician knowledgeable about clinical trials methodology, including methods for INTERIM ANALYSIS, to interpret the emerging data appropriately. A DSMB may also include a patient advocate or an ethicist. A DSMB is a separate entity from an institutional review board and its members should not be involved with the trial they monitor and should have no conflict of interest, either scientific or financial.

A DSMB is primarily responsible for the appropriate oversight and monitoring of the conduct of trials for safety of participants and validity and integrity of the data. More specifically, the primary responsibilities of a DSMB include review of the study protocol and the plans for data and safety monitoring; evaluation of the progress of the study, including recruitment of trial participants, timeliness and completeness of follow-up, compliance with protocol procedures, performance of participating sites and other factors that may affect study outcome; assessments of risks versus benefits; making recommendations to the sponsor and the investigators concerning continuation, modifications to the protocol or termination of the trial; and communicating the findings from data and safety monitoring to the local IRBs.

A DSMB allows difficult mid-study decisions about the trial to be made. DSMB members are provided unblinded data on the important outcome measurements at regular intervals or at intervals specified in the protocol. These unblinded data should be kept confidential from the sponsor and the investigators. A DSMB is responsible for making recommendations to the sponsor as to whether the trial should continue as originally planned or with modifications to the design, be temporarily suspended in enrolment or trial interventions until some uncertainty is adequately addressed or be terminated either because there is no longer equipoise among trial interventions or because it is highly unlikely that the trial can be successfully completed or meet its scientific goals.

The independence of the DSMB is intended to control the sharing of important comparative information and to protect the integrity of the trial from adverse impact resulting from premature knowledge about the emerging data. While small differences may be well accepted as nondefinitive, awareness of such differences may make investigators reluctant to enter patients on the trial, or lead them to limit entry to a certain subset of patients or to encourage patients to withdraw if they are assigned what they perceive as an inferior intervention. Such tendencies will introduce biases and diminish the reliability
of the trial's eventual results or even preclude the completion of the trial. Limiting the access to unblinded interim data to a DSMB relieves the sponsor of the burden of deciding whether it is ethical to continue to randomise patients and helps protect the trial from biases in patient entry or evaluation.

A DSMB should have standard operating procedures and maintain records of all its meetings and deliberations, including interim results, and these should be available for review when the trial is completed. The DSMB standard operating procedures should specify meeting quorum, schedule and procedures, its decision-making rules and meeting follow-up. A DSMB should be consulted about the contents of interim reports that serve as a basis for the DSMB deliberations. A practical perspective on DSMBs and the recommendations for the operation and management of DSMBs can be found in Ellenberg, Fleming and DeMets (2002), reflecting a recent guidance for clinical trial sponsors on the establishment and operation of clinical trial data monitoring committees by the Food and Drug Administration of the US Department of Health and Human Services (www.fda.gov/cber/gdlns/clindatmon.htm).

Interim analyses of comparative trials are necessary to ensure that large differences between interventions do not go unnoticed, as well as to detect excess toxicity or unanticipated flaws in study designs. Routine reporting of toxicities or information about intervention administration helps ensure that interventions are being given safely and properly and improves trial quality. Routine reporting of outcome results, however, can harm study quality.

In general, a DSMB would examine not only the trial data but also relevant external evidence from other sources. Its recommendation to the sponsor should be based on the interpretation of the results of the ongoing trial in the context of existing outside scientific data relevant to such interpretation. A final decision, as to whether or not to continue the trial, should not rely solely on a formal test of statistical significance. The DSMB meetings provide a setting in which the clinical significance of early differences or lack thereof can be discussed openly with interim data and the complex statistical issues involved in sequential monitoring of a trial can be discussed at length. Focused discussions of the progress towards the scientific goals of a study are facilitated by a DSMB with access to unblinded data. Since intervention effects will be examined by a small group, the danger will be reduced that a promising trial will be informally stopped early with reduced accrual because of overinterpretation of interim results by the sponsor or the investigators. In addition, sequential monitoring rules are at best guidelines for complex decisions involving many aspects of a trial.

Deliberations and conclusions of the monitoring should be communicated to the sponsor and the IRBs without compromising the integrity of the trial. Recommendations resulting from monitoring activities should be reviewed by the study team and adequately addressed. Local IRBs should be provided feedback on a regular basis, including findings from adverse events and recommendations derived from data and safety monitoring. KK

Ellenberg, S. S., Fleming, T. R. and DeMets, D. L. 2002: Data monitoring committees. Boca Raton: Chapman & Hall/CRC.
data entry This process puts observations into electronic format for computer analysis. No successful statistical investigation takes place without the reliable and accurate collection of data and its conversion into a suitable electronic form for computerised analysis. While ostensibly a simple clerical process, it often suffers neglect in planning and execution that can jeopardise the smooth running of a research project. The reliability of data collection is not specifically an issue for data entry, but we will see later how technological changes in data entry can encroach on the process.

Most important for the majority of investigators is the accurate entry of data. In a formal research project such as a clinical trial there will be established and inviolable clerical procedures for data collection and entry that will help to ensure accuracy. But in many academic studies the researchers themselves will take responsibility for the complete collection and entry process. With modern statistical packages and moderate information technology (IT) literacy on the part of the user, this is a perfectly feasible and economic process for studies up to a few hundred variables and a few hundred cases. The spreadsheet data entry facilities of SPSS or Excel provide an easy way of entering data and, given that the researcher is entering the data, he or she can make checks during the procedure.

Two problems occur with this approach. One is simple clerical error or absent-mindedness in typing data; the other is a lack of an audit trail for changes in the spreadsheet. Dual entry is typically used to correct the first. Programs such as SPSS Data Entry or a program written in MS-Access or similar permit one user to enter data and then another to re-enter it from scratch. Any inconsistency is flagged and the appropriate variable checked. Such programs can also incorporate range checking. While taking extra effort, it is well worth the initial investment in design.
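The two checks described above, comparison of two independent keyings and range checking, are simple to express in code. The sketch below is not taken from any of the packages named in the entry; the record layout, field names and plausibility limits are hypothetical.

```python
def compare_entries(first, second):
    """Return (record_id, field, value_1, value_2) for every disagreement
    between two independent keyings of the same forms."""
    discrepancies = []
    for rec_id in sorted(set(first) | set(second)):
        a = first.get(rec_id, {})
        b = second.get(rec_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((rec_id, field, a.get(field), b.get(field)))
    return discrepancies

def range_check(records, limits):
    """Flag values outside the plausible [low, high] range for each field."""
    flagged = []
    for rec_id in sorted(records):
        for field, (low, high) in sorted(limits.items()):
            value = records[rec_id].get(field)
            if value is not None and not (low <= value <= high):
                flagged.append((rec_id, field, value))
    return flagged

# Hypothetical data: record 2 has an age mis-keyed at second entry;
# record 3 has the same implausible blood pressure keyed both times.
entry_1 = {1: {"age": 34, "sbp": 120}, 2: {"age": 51, "sbp": 135}, 3: {"age": 47, "sbp": 410}}
entry_2 = {1: {"age": 34, "sbp": 120}, 2: {"age": 15, "sbp": 135}, 3: {"age": 47, "sbp": 410}}
LIMITS = {"age": (18, 100), "sbp": (60, 260)}

mismatches = compare_entries(entry_1, entry_2)
out_of_range = range_check(entry_1, LIMITS)
print(mismatches)     # disagreements to resolve against the paper form
print(out_of_range)   # implausible values that both keyings agree on
```

In this example each check catches an error that the other misses, which is one reason for using the two together.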
Some argue against the administrative burden of double data entry, however, on the grounds that range checks, etc., will detect most clerical errors, yet it remains a sensible precaution, especially if temporary or external staff are to be used for data entry.

The audit of change is important, particularly if several researchers are reviewing the data. An individual correcting a variable may unwittingly invalidate another's previous analysis. It is good practice in these circumstances to set up a core dataset and then use a program to change individual data elements if they need revision. Thus, for example, a file with SPSS syntax language data transformation commands can be used to compute changes to an SPSS data file. It is then available for review by all in the team.

There is much interest in using personal digital assistants (PDAs), or internet browsers, for data entry, sometimes directly by the subjects themselves. The superficial attractiveness of these procedures can be misleading. Transferring and merging data from a PDA is not necessarily simple and will usually require significant manual intervention. This can be a source of error and care needs to be taken in design to prevent this.

The use of a web page for data entry potentially gives access to many thousands of respondents. Setting up a reliable data entry page is not so simple. Browsing sessions often terminate for communications reasons mid-session and therefore program logic needs to identify successful completion. Care needs to be taken to identify unique data entry sessions by, for example, the originating IP address of the client browser. Data security is needed so a user cannot accidentally see other entries. A reasonable amount of programming effort is required to do this and it will certainly require database, programming and HTML design skills to achieve it successfully. This does not mean it is not possible to design a simple web page to acquire data: it is just more complicated to acquire data both reliably and accurately.

Planning is the key to successful data entry. Before any data are collected the process flow for entering data into the computer package and its checking should be described and adhered to rigidly by the research team. Such discipline will pay off in smoothing the path to analysis. CS

(See also DATA MANAGEMENT)
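The practice of keeping a core dataset fixed and applying revisions through a reviewable program, as recommended earlier in this entry, can be sketched in a few lines. This is an illustration in Python rather than SPSS syntax; the data, the field names and the structure of the correction list are hypothetical.

```python
import copy

def apply_corrections(core, corrections):
    """Apply logged corrections to a copy of the core dataset.

    Each correction states the value it expects to find, so a change that
    has already been made, or that targets the wrong record, is rejected
    instead of being applied silently.
    """
    revised = copy.deepcopy(core)   # the core dataset itself is never edited
    audit_log = []
    for number, change in enumerate(corrections, start=1):
        record, field = change["record"], change["field"]
        found = revised[record][field]
        if found != change["old"]:
            raise ValueError(
                f"correction {number}: expected {change['old']!r}, found {found!r}"
            )
        revised[record][field] = change["new"]
        audit_log.append(
            f"{number}: record {record}, {field}: "
            f"{found!r} -> {change['new']!r} ({change['reason']})"
        )
    return revised, audit_log

core = {2: {"age": 15, "sbp": 135}, 3: {"age": 47, "sbp": 410}}
corrections = [
    {"record": 2, "field": "age", "old": 15, "new": 51,
     "reason": "transposed digits, checked against paper form"},
    {"record": 3, "field": "sbp", "old": 410, "new": 140,
     "reason": "keying error, checked against paper form"},
]
revised, audit_log = apply_corrections(core, corrections)
for line in audit_log:
    print(line)
```

Because the correction list is an ordinary file, it can be kept under review by the whole team and rerun from the untouched core dataset whenever an analysis needs to be reproduced.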
data-dependent designs These are methods used for allocating treatments to patients in clinical trials that make constructive use of the emerging responses. Compared with traditional trial designs using predetermined sample sizes, data-dependent designs aim to impart some advantage to trial participants, reaching a conclusion sooner and/or exposing fewer to inferior therapy during the course of the trial. Such designs have been around in theory for at least as long as modern CLINICAL TRIALS, although their practical applications have hitherto been very limited. Other terms in use for similar methodological approaches for such trials include flexible designs, dynamic designs, ADAPTIVE DESIGNS and learn-as-you-go designs, although there is no apparent consensus on the nomenclature.

There are four broad categories of data-dependent designs, each of which shares the same spirit of learning from the accumulating data within the trial, as opposed to ignoring intermediate results until completion of the trial. These categories are: sequential, Bayesian, decision-theoretic and adaptive. Their descriptions given later in this entry deliberately avoid too much mathematical detail. Also, a distinction is drawn here from two other related types of clinical trial design not discussed further. First, there are designs that use MINIMISATION to incorporate knowledge of covariates of patients already entered into a trial (and hence self-modify according to treatment allocation though not to treatment response) and, second, there are trial designs intended to have an internal pilot study (see PILOT STUDIES).

Initially, however, it is worth considering briefly some historical background to help understand why modern clinical trials have emerged in the way they have. Typically they feature fixed sample sizes dictated by error probability considerations (see TYPE I ERROR and TYPE II ERROR rates), treatments being allocated equally (usually, to maximise POWER of a test) and results kept hidden (to all but a DATA AND SAFETY MONITORING BOARD (DSMB), if appointed) until the final analysis, which is conducted well after the final patient has been enrolled.

Interestingly, and tellingly, the roots of today's trials lie not in medicine but in agriculture. In the UK in the 1920s, R. A. Fisher began conducting crop field trials to try to determine which type of fertiliser produced higher yields of wheat. Realising there were more factors than could be listed (soil composition, aspect, slope, water and so on) that might influence total yield, Fisher (1926) pioneered RANDOMISATION to cope with the problem of balancing all the known and unknown variables as far as possible. It was a statistical masterstroke, for such use of an external chance mechanism alone could ensure that the comparison between fertilisers was fair and unbiased. Specifically, any difference observed in crop yield at harvest time could be attributed to the one factor that was known to be different between the groups, namely the fertiliser, all other factors being expected to be equal. Hence, inference from any observed differences between groups would link cause and effect (here, fertiliser and yield) as strongly as possible (see CAUSALITY).

The first medical application of randomisation came in the late 1940s, when A. B. Hill used Fisher's technique in a clinical trial testing streptomycin for the treatment of pulmonary tuberculosis. This was not without some controversy at the time, but Hill convinced sceptics by arguing that randomisation was also a fair way of allocating the scarce resource involved, given that the treatment was in strictly limited supply. This Medical Research Council sponsored trial became the first randomised controlled clinical trial to be published (MRC, 1948).

However, trials of today are fundamentally quite similar to those of 50 or 60 years ago, in that they typically involve equal allocation of treatments to patients, generally after performing a power calculation to determine a target number to be recruited. Thus, in a two-treatment comparative trial, half the patients customarily receive the standard and half the experimental treatment. As already mentioned, with the
possible exception of DSMB committee members and a slalislician eonducting an IN1DIM ANr\LYSls. lID one looks at the raults until all the patients haw been nmdomisal and followed up. At Ihe end of the trial it is possible thai the experimental lMabDent is cleclan:d a statistically signilicanl impl'OVCmc:Dt and hcmIcIed as a clinical SUlXlCSs..1t is an ethical problem. however~ if slalistical ·failure' means die patient died and one can look back with some n:morse wondering 'if anly we had come 10 this conclusion saon&:I' pcIhaps we could ha~saved somelivcs' (seeE11lJCS A.~aJNICAL'IIUALS). Even if the outcome is noI as serious as death.. the !Illume. pelSills: Could fcwcr palic:ats in the study haw suITemI on the way to R:aching a valid conclusion? This last question has motiwtc:cl much rescardI by ethically minded statisticiaM. Ironically, this work dates back at least as far as the lint modem clinical trial. for the whole area of SEQUENJ1AL ANr\LYSIS tlKes its hi5lory 10 the 1940s. World War U and US goyemment-contracted statistician Abraham WaId (see Wald. 1943). His wort. like FlSher·s. was not in the medical arca of application. but in ammunition Ic:sling. an allolelbcrdilTenmt CJUIII1p1e ofseeking toc:ope with )Rcious and limited resoun:es. Medical application of sequential melhods docs seem entin:ly appropriate. after all, as palienlS ani~ to be tn:aIcd sequentially (they arc DOl all wailing in line outside the: doclor's ofIic:e. or ha5pilal clink. at the start of a bial) and. similarly. laults rrom same: arc awilable sooner than from oIhen. The rationale for sequential trials invol vcs looking carefUlly at data QS lbey accrue with a view to slopping just in time. Hence. the number ofexperimental units mauin:d is noI fiKc:cI in advance but is a random variable. 11Ieory shows that the expectccl numbers involved in a sequentially analysed nuacIomised controlled trial is less than the com:sponcIing fixed sample size trial. 
far any given power and level of significance. It is possible. when treatment groups fare broadly equally well. for a sequential trial to nc:cd sliPtly rnon: patients overall compared with a biaI using IJaditional dcsip. but this would be quite unusual. Far better or wane. the clinical llial as canduc:tccI and analysed today is nol in Wald·s style of testillli ammunition but ralhc:r in Fisher's appliclllion of fc:niliser to fields of whc:al. 'I1Iese two metaphon iIIuslnde the fundamental difrCRnce between the slalistics behind c1inicalllials that SIri~ to leam-as-they-go and those thal wait. literally, until harvest time befo~ beginning 10 make scienlilic infenw:es. The R:ader may decide whether it is rilhtthal nonnative pnsclice sees clinical trial volunteers alTanIed the same respect as the rc:n.iliser rather than the ammunition. Following Wald's pioneering reseaueh. sequential designs have evolved as sophisticatc:cllools 10 assi51 those on DSMBs and hence can be considered mainstRam. in conll'asl to the mnaiDing design types discussed below. It should be said. however. that these methods arc noIlUUtinely implemented
as primary anaIyticaitaols ror drivillli llials. Instead at best
they ~ used as "back seat driven' toexert iDdin:ct inftuc:ac:e on trial conduct. How do they do this? Essentially. as data accumulab:. a test staliSlic can be plaited on a graph of tn:almCnt dilTen:ACC VCIIUS lime. and trial m:ruitment can be m:ommended to terminale just as soon as a pmleterminc:cl boundary is crossed. This boundary may take on various shapes. the simplest. being lrianJUlarwith two possibleopliaas; eilhertreatment A or B isdc:claR:d bc:tterclcpendinl011 which side ofthetrilUllle is crossed filSl. 10 allow ror a thinl. nonconclusive. option with a pleddennincd maximum bial sample size. the boundary outline is modiflc:d to include a vc:nicalline at a given paint on Ihe lime (strictly "information') axis. 'I1Ie idea is 10 Slop the lriaI in favouroftreatmenl A. say. if the upper line of the bouadary is avsscd fint; B if the lower line; or else. conclude no clinically relevant dilTerence between A and B if the VClticalline is n:achcd lint. 1hc~ ~ variations on this theme with rules such as those derived by Pococ:k(1983)andbyO'BrienandFleming(1979) beinl popular examples. Thus it is not necessary to update the graph aftcrevay single observation. One can apply rules. called group .sequential melha. that update after small batc:hc:s or raIIlas become: awilable. For man: details refer to Jennison and 'lUmbull (2001). StatiSlicai software for implementing these rulcs is readily available in sevent COl1U11Cl1:iaI paclcqes (e.g. EaSL PEST. S + Sc:qTrial). One disadwntage with sequentially designed experiments is that their usefulness. namely their poICDtiallo learn while in progress. is self-limiting to trials having relatively rapid ENDFOOO'S. 11Ius a sequential trial otTers little benefit oyer a traditional. fixed sample size biaI if the outcome R:mains unknown until years after randomisation. 11Iis may be 10 in bR:8St cancer. for example. but is no limitation. for instance. in emergency medicine or in rapidly fatal diseases. Tumin, 10 Bayesian designs. 
investigators start by eliciting a PRIOR DISTRIBUTION, either from a panel of clinical experts or from a reasonable selection of available theoretical distributions thought to mimic reality in terms of treatment success distributions (see BAYESIAN METHODS). For example, a BETA DISTRIBUTION with suitably chosen parameters can represent initial beliefs about a treatment's efficacy ranging from negatively skewed to uniformly distributed to positively skewed. In practice, there is virtue in choosing a prior that makes the experimental treatment appear initially a weak contender, so that positive results in favour of the treatment are not too dependent on initial choice of the prior. As the patients' results accumulate, the conditional distribution given the data thus far is evaluated: the so-called POSTERIOR DISTRIBUTION, amalgamating the prior and the LIKELIHOOD. Inference is based on the posterior, including the evaluation of CREDIBLE INTERVALS, analogous to CONFIDENCE INTERVALS in the frequentist context.
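The prior-to-posterior step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the outcome is binary, the prior is a Beta distribution (so the posterior is again Beta), and the prior parameters and patient counts below are hypothetical rather than taken from any trial. The credible interval is approximated by simulation, not computed exactly.

```python
import random

def beta_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures

def credible_interval(a, b, level=0.95, draws=20000, seed=1):
    """Approximate equal-tailed credible interval by simulating from Beta(a, b)."""
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = sample[int(tail * draws)]
    upper = sample[int((1.0 - tail) * draws) - 1]
    return lower, upper

# Hypothetical sceptical prior: Beta(1, 3) has mean 0.25, so the new
# treatment starts out looking like a weak contender.
post_a, post_b = beta_posterior(1, 3, successes=14, failures=6)
post_mean = post_a / (post_a + post_b)
low, high = credible_interval(post_a, post_b)
print(post_a, post_b, round(post_mean, 3))
```

With these illustrative numbers the posterior is Beta(15, 9): the data have pulled the mean well above the sceptical prior mean, which is the behaviour the choice of a 'weak contender' prior is meant to allow.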
An advantage is the ease of interpretation of these intervals, for they have more intuitive meaning to clinicians and patients. A disadvantage is the general lack of awareness of Bayesian methods since these are less often encountered than those from the frequentist school. This is reflected in the comparative lack of statistical textbooks, courses and software aligned to the Bayesian paradigm. Spiegelhalter, Freedman and Parmar (1994) provide an excellent overview of Bayesian methodology applied to clinical trials. Some see the subjective or arbitrary nature of the prior distribution involved as a weakness; others regard it as a positive opportunity to incorporate provisional information about the potential new treatment.

The third broad category of data-dependent designs involves the use of DECISION THEORY. Some experimental studies can be conducted with the resulting inference, in terms of how the information will be used to reach a practical decision concerning which treatment to recommend, as the driving force. For example, one can specify a criterion such as minimising expected successes lost, or maximising successes gained, over the course of a predetermined number of future patients, called the horizon, within and outside a comparative trial. Another criterion could be maximising the probability of correct selection of superior treatment. Either way, the focus is on the pragmatic need to make a decision and use one of the treatments or not once the trial is over in a direct attempt to balance the needs of current and future patients. It is possible to discount future patients by putting more weight on present results, although this whole area can become mathematically quite intricate, especially when modelling with unconstrained 'multiarmed bandits' in the context of deciding among several treatments. Nevertheless, practical simplifications can be incorporated,
such as limiting equal allocation among remaining treatments. In the case of just two treatments this amounts to allocating pairs of treatments until such time as it is optimal, by whatever criterion, to cease the comparative stage. After that one can switch all remaining patients within the horizon to the preferred treatment, or maybe enter them into a brand new randomised trial comparing this 'winner' with another novel treatment that is ready for a comparative trial. Thus, one is not constrained in actuality to put all remaining patients on to the indicated treatment, but one can act safely in the knowledge that the selection of the winner is working to the best available information, where 'best' is guaranteed until the original horizon is reached. (Note, in practice, the choice of horizon in absolute terms is not critical, for only an approximate size would need to be specified.) Objections to the subjective nature of prior distributions involved in this type of decision-theoretic framework can be alleviated, for example, by appealing to minimax criteria. This means implementing a design that has good theoretical properties across a broad range of priors. Development of
computer software to allow such designs to be implemented has been slower than for sequential methods, contributing to the current lack of use of decision-theoretic methods in practice.

The fourth category considered here, (response-)ADAPTIVE DESIGNS, is the most extreme type of data-dependent design. It incorporates the accruing information from the data to modify the treatment allocation probabilities away from 50:50 in the case of two treatments. Thus, for example, whereas the trial would start with equal allocation, as the data begin to favour one treatment even slightly, then it affects the odds of the next allocation being accordingly fractionally higher. In practice it works like this. Imagine a bag containing an equal number of red and blue balls. A red ball drawn indicates the next allocation is to treatment A; a blue ball, treatment B. If a success occurs a ball of the appropriate colour is added to the bag before the next drawing, and hence treatment allocation, takes place.

Adaptive designs were set back by a rather poor prototypical example of a mid-1980s trial (Bartlett, Roloff and Cornell, 1985) involving extracorporeal membrane oxygenation (ECMO) therapy, and which has received much attention in the statistical and medical literature. Ethicists, clinicians and statisticians have all contributed to the debate about this particular trial. It involved critically ill newborn babies and the relevant outcome in question really was a matter of life and death. In retrospect, it was clearly a mistake to begin this trial with precisely one ball of each colour in the bag instead of, say, ten of each. What ensued was a highly unbalanced distribution of treatment allocation (for ECMO babies generally lived, unlike many of those not on ECMO therapy) rendering sensible inference difficult, if not impossible. On the other hand, it can be said that since the ECMO trial, computing power and mobile technology,
two prerequisites for successfully conducting an adaptive design, have taken huge leaps forward, making this design far more feasible to implement successfully than ever before. These adaptive designs are the most controversial in the family of data-dependent designs. This is because they appear to react too quickly to early data, which may be subject to systematic bias, or time trends, and if not careful can begin to adapt too swiftly to chance results. There is also the criticism that if one treatment happens to be a PLACEBO, why should anything change after a success or a failure on such an inert substance? Nevertheless, with suitable cautions and awareness of the issues involved, adaptive designs can be a highly effective and ethically appealing design, despite once again the relative dearth of positive examples of their use so far. A growing number of statisticians believe the 21st century will be characterised by more use of computer-intensive, data-dependent methods, so long as those responsible for
conducting clinical trials are open to receiving suggestions on how to advance trial methodology. For further details, including when data-dependent methods are considered most suitable and a proposed strategy for their introduction, see Palmer (2002).

In closing, it is worth remembering why one should consider using data-dependent designs. The primary reason is for their ethical advantage in terms of how patients within trials are regarded, without compromising the scientific rigour or usefulness of studies for the sake of future patients. There are secondary reasons besides, with benefits derived from the side effect of expecting fewer patients to be involved in learn-as-you-go trials compared with traditional trials. These benefits pertain to trial sponsors (notably the pharmaceutical industry), doctors, patients and ultimately the science of medicine itself. CRP

Bartlett, R. H., Roloff, D. W., Cornell, R. G. et al. 1985: Extracorporeal circulation in neonatal respiratory failure: a prospective randomised study. Pediatrics 76, 479-87. Fisher, R. A. 1926: The arrangement of field experiments. Journal of the Ministry of Agriculture, Great Britain 503-13. Jennison, C. and Turnbull, B. W. 2001: Group sequential methods with applications to clinical trials. London: Chapman & Hall/CRC. MRC Streptomycin in Tuberculosis Trials Committee. 1948: Streptomycin treatment for pulmonary tuberculosis. British Medical Journal 769-82. O'Brien, P. C. and Fleming, T. R. 1979: A multiple testing procedure for clinical trials. Biometrics 35, 549-56. Palmer, C. R. 2002: Ethics, data-dependent designs and the strategy of clinical trials: time to start learning-as-we-go? Statistical Methods in Medical Research 11, 381-. Pocock, S. J. 1983: Clinical trials: a practical approach. Chichester: John Wiley & Sons, Ltd. Spiegelhalter, D. J., Freedman, L. S. and Parmar, M. K. B. 1994: Bayesian approaches to randomised trials.
Journal of the Royal Statistical Society, Series A 157, 357-416. Wald, A. 1943: Sequential analysis of statistical data: report submitted to Applied Mathematics Panel, National Defense Research Committee (declassified in Wald, A. 1947: Sequential analysis. New York: John Wiley & Sons, Inc.).
data dredging
See POST HOC ANALYSIS
data management
This is the systematic management of a large structured collection of information. 'Data management' is always a component of data analysis, but is usually a more significant issue in large or multicentre studies where the data management features of software packages such as SPSS or Excel are inadequate. This will also be the case when the 'data model' of the study (the entities for which data are collected and the relationships between them) does not fit the standard rectangular data model of the spreadsheet or the classical statistical package. Thus, for example, a study comparing treatment in hospitals may have three entities (hospital, ward and patient) that need data recording at each level. Longitudinal or repeated measurement studies also generate data that do not so easily fit the rectangular model.
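To make the contrast between a multi-entity data model and a rectangular dataset concrete, here is a minimal Python sketch. The hospital, ward and patient records, their field names and the `flatten` helper are all hypothetical; the point is only that each entity is stored once and a rectangular analysis file is derived by copying higher-level values down to each patient row.

```python
# Hypothetical three-level data: one collection per entity, linked by keys.
hospitals = {"H1": {"region": "North"}, "H2": {"region": "South"}}
wards = {
    "W1": {"hospital": "H1", "beds": 20},
    "W2": {"hospital": "H1", "beds": 12},
    "W3": {"hospital": "H2", "beds": 30},
}
patients = [
    {"id": 1, "ward": "W1", "age": 67},
    {"id": 2, "ward": "W3", "age": 54},
    {"id": 3, "ward": "W2", "age": 71},
]

def flatten(patients, wards, hospitals):
    """Build one rectangular row per patient by copying down ward and hospital data."""
    rows = []
    for patient in patients:
        ward = wards[patient["ward"]]
        hospital = hospitals[ward["hospital"]]
        rows.append({
            "patient_id": patient["id"],
            "age": patient["age"],
            "ward": patient["ward"],
            "beds": ward["beds"],
            "hospital": ward["hospital"],
            "region": hospital["region"],
        })
    return rows

rows = flatten(patients, wards, hospitals)
for row in rows:
    print(row)
```

Because the flattened rows are derived, any change to a ward or hospital record means the rectangular file has to be rebuilt, which is why the derived file should never be treated as the master copy.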
Nonetheless, very many complicated studies are managed and analysed entirely in packages such as SPSS or SAS (see STATISTICAL PACKAGES). SAS in particular has very strong data management features. Several data files for each entity can be created, and the data merge features of these programs used in order to prepare specific analyses. Because this merging of files is manual, more skill and experience on the part of the researcher is needed and thorough documentation and understanding of merge procedures is essential to prevent error. Nonetheless, because only one programming language is used, the procedures are consistent and easier to learn. Although such an approach seems 'low-tech', there is much to recommend it for many studies. The main weakness of this approach is the manual management of transaction updates and production of an audit trail. If, in the example above, more patients are recruited in a particular ward, then derivative files that include hospital or ward variables need to be recreated manually. In a very dynamic data environment this is tedious and also error prone. Equally the correction of values in one file similarly requires the recreation of all the derivative files.

An alternative is to consider using a formal data management tool, and this usually implies a database. With a fast modern PC, desktop database packages such as MS-Access are capable of managing datasets with some tens of thousands of cases and several hundred variables. Only the very largest studies will require a full SQL-compliant database, although there may be sound reasons for using the latter for security and access control. Almost inevitably deployment of a database will require the production of data entry and update screens, a process that requires some programming ability. This is particularly the case if transaction control and an audit trail of changes are needed. Second,
the database query statements needed to provide the appropriate rectangular matrix datasets for analysis can be complicated, and can require subtle understanding of SQL. Such in-depth expertise is not normally easily accessible in a research team and may be expensive to provide. Before deciding to use a database for data storage the research team should plan and budget for such skills to be available throughout the life of the project. Employing a programmer who then gets another job just before the end of the study can leave a research team without the support they need when wanting finally to analyse the data. For this reason alone, it is often sensible to consider the acquisition of specialised clinical data management packages. These often include all the data checks and forms necessary for formal CLINICAL TRIALS. Entering the appropriate terms in any web search engine will bring up several hundred companies offering products that are suitable; the difficulty will be in selecting one. Although there may be a seemingly significant initial cost (perhaps several thousand euros or
dollars) the saving on development time, as well as the predictability and security of software operation, give a rapid payback. At project conception it is usually possible to outline the extent of data management requirements dependent on the complexity of the problem. It is sensible for prior specification and budgeting of the software needed to take place, rather than awaiting project start and then developing ad hoc solutions. This will give the research team the security of control of the data over the project lifetime. CS (See also DATA ENTRY)
data mining in medicine
This is a branch of both computer science and statistics devoted to extracting useful knowledge from databases (also known as KDD, knowledge discovery in databases). In general, such knowledge is obtained by detecting various types of regularity and relation among the data, most often association rules, classification rules (see DISCRIMINANT FUNCTION ANALYSIS), linear and nonlinear dependencies and clusters (see CLUSTER ANALYSIS). Depending on the context in which it is performed, data mining research may emphasise computational scalability of the algorithms or statistical significance of the results. The field benefits from a major injection of ideas and tools from general MACHINE LEARNING and pattern recognition and, as such, it is often considered also as part of ARTIFICIAL INTELLIGENCE.
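As a small illustration of one of these regularities, the following Python sketch scores an association rule by its support and confidence, two commonly used measures. The patient records and symptom names are invented for the example, and the helper functions are not taken from any particular data mining package.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)

def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(records, set(antecedent) | set(consequent))
    return both / support(records, antecedent)

# Hypothetical symptom sets, one per patient.
records = [
    {"fever", "cough"},
    {"fever", "cough", "fatigue"},
    {"fever", "rash"},
    {"cough"},
    {"fever", "cough", "fatigue"},
]

rule_support = support(records, {"fever", "cough"})      # 3 of 5 patients
rule_confidence = confidence(records, {"fever"}, {"cough"})  # 3 of the 4 with fever
print(round(rule_support, 2), round(rule_confidence, 2))
```

A rule is usually reported only if both measures exceed chosen thresholds; the thresholds themselves are a judgement call and are not shown here.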
Data mining (DM) is often described as an interactive process that involves both the computer and the human component. This is also why data visualisation is considered an essential part of the process. DM is more general than traditional statistical analysis, in the type of regularities that can be found (e.g. decision trees), in the size of the datasets (often in the range of millions of data items) and in the strong emphasis on visualisation of the data and automation of the analysis.

The application of DM to medicine has a long tradition. Automatic data collection in modern medicine is increasingly pushing towards the development and deployment of tools able to handle and analyse data in a computer-supported fashion. Being able to detect sets of symptoms that are often simultaneously present (association analysis) can help predict which other symptoms may be observed (association rules). Observing many patient descriptions as well as their diagnoses may help find a rule to predict the diagnosis given a new patient (classification analysis). Spotting groups of similar patients can help customise the therapy (cluster analysis). Finally, being able to predict the expected cost of a patient based on his or her history can help insurance companies optimise their services (regression analysis; see MULTIPLE REGRESSION). An early application
of data mining in medicine is the decision tree learner ASSISTANT (Cestnik, Kononenko and Bratko, 1987; Witten and Frank, 1999). It was developed specifically to deal with the particular characteristics of medical datasets.

A whole new chapter in the application of DM techniques to biomedical data is being written with the introduction of genome-wide datasets. Genomic sequences for several organisms are now available online and the availability of high-throughput gene expression and proteomic data highlights the urgent need for efficient and flexible algorithms to extract the wealth of medical information contained in them. Datasets recording human genetic variability (SNPs) are soon expected and, with them, the possibility of correlating genotypic with phenotypic information (see GENETIC EPIDEMIOLOGY).
Classic examples of modern data mining methods are systems such as BLAST (Altschul et al., 1990), which allows researchers to find related genetic sequences efficiently, together with a statistical assessment of the degree of similarity. Significant biological discoveries are now routinely being made by combining DM methods with traditional laboratory techniques. For example, the discovery of novel regulatory regions for heat shock genes in Caenorhabditis elegans (Thakurta et al., 2002) was made by mining vast amounts of gene expression and sequence data for significant patterns. NC/RDS

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. 1990: Basic local alignment search tool. Journal of Molecular Biology 215, 403-10. Bratko, I. and Kononenko, I. 1987: Learning diagnostic rules from incomplete and noisy data. In Phelps, B. (ed.),
AI methods in statistics. London: Gower Technical Press. Cestnik, B., Kononenko, I. and Bratko, I. 1987: ASSISTANT 86: a knowledge elicitation tool for sophisticated users. In Bratko, I. and Lavrac, N. (eds), Progress in machine learning. Wilmslow: Sigma Press. Hand, D. J., Mannila, H. and Smyth, P. 2001: Principles of data mining (adaptive computation and machine learning). Cambridge, MA: MIT Press. Lavrac, N. 1999: Selected techniques for data mining in medicine. Artificial Intelligence in Medicine 16, 1, 3-23. Thakurta, D. G., Palomar, L., Stormo, G. D., Tedesco, P., Johnson, T. E., Walker, D. W., Lithgow, G., Kim, S. and Link, C. D. 2002: Identification of a novel cis-regulatory element involved in the heat shock response in Caenorhabditis elegans using microarray gene expression and computational methods. Genome Research 12, 5, 701-12. Witten, I. H. and Frank, E. 1999: Data mining: practical machine learning tools and techniques with Java implementations. London: Morgan Kaufmann.
data monitoring committee (DMC)
See DATA AND SAFETY MONITORING BOARDS
decision theory
This is an approach to the analysis of data that leads to choice between a number of alternative actions by consideration of the likely consequences. This is in contrast to the commonest form of analysis of data from a CLINICAL TRIAL that is based on hypothesis testing.
The decision-theoretic approach is most suitable for decisions in which the possible actions are entirely within the control of the decision maker. In a clinical trials setting, most suggestions for the use of decision theory have been in early-phase trials, the final outcome of which is usually a decision as to whether or not to continue with the clinical development programme. In a decision-theoretic framework, the actions that will be taken as a result of the analysis are explicitly identified and a utility, or gain, assigned to each expressing the desirability of the action as a function of some unknown parameter. For example, in an early-phase drug trial, possible actions might be either conducting further clinical trials with the drug or abandoning the clinical development programme. The desirability of each of these actions depends on the true unknown efficacy. Information on the unknown parameter is summarised by a Bayesian posterior distribution (see BAYESIAN METHODS), indicating those values that are plausible given the observed data, and this can be used to obtain an expected value for the utility for each action. The action with the largest posterior expected utility may then be identified. This is the action that will be chosen by a rational decision maker whose preferences are accurately represented by the utility function.

A simple example based on that considered by Sylvester (1988) (see also the correction by Hilden, 1990) illustrates decision making in a PHASE II TRIAL. At the end of the trial a decision will be made as to whether or not to continue with Phase III development. The desirability of each of these two options is summarised by a utility function, which, if the observed data are binary (success/failure), depends on the true success rate, which will be denoted by $p$. Suppose that the success probability for the existing standard treatment is known; e.g. it may be known to be 0.5. Suppose also that if the Phase II trial is successful, some known number, denoted $m$,
of patients will be treated with the new treatment in PHASE III TRIALS and that if it is found to be effective in the Phase III trial, a total of $l$ additional patients will be treated with the new treatment. Patients treated with the standard treatment in the Phase III trial will receive the same treatment, regardless of the outcome of the Phase II trial, so need not be considered. Suppose that the utility of each action can be measured by the number of future successes expected if that action is taken. If the Phase III trial is not conducted, the $m + l$ future patients will all receive the standard treatment. The success rate for these patients will then be 0.5, so that the expected number of future successes is $0.5(m + l)$. This does not depend on the success rate for the new drug since this will not be used for any further patients. If the Phase III trial is conducted, $m$ patients will receive the new treatment in the trial, so that the expected number of successes for the patients in this trial is $pm$. If the Phase III trial is unsuccessful, the $l$
further patients will receive the standard treatment and the expected number of successes for these patients will be $0.5l$. If the Phase III trial is successful, these patients will receive the new treatment so that the expected number of successes will be $pl$. We will assume that the Phase III trial will give the correct answer, so that it will be successful whenever $p > 0.5$, in which case the number of extra successes from treating the $l$ future patients with the new rather than the standard treatment is the difference between $pl$ and $0.5l$, which is $(p - 0.5)l$. The total expected number of successes if the Phase III trial is conducted will be: $mp + 0.5l + (p - 0.5)\,l\,I(p > 0.5)$, where the indicator function $I(p > 0.5)$ is equal to 1 if $p > 0.5$ and 0 if $p \le 0.5$. If $p$ is large, this utility is large, reflecting the fact that continuation to Phase III is desirable, and if $p$ is small, the utility is smaller as continuation to Phase III is undesirable. If the value of $p$ were known, we would take the action corresponding to the larger of the two utilities; i.e. we would continue to Phase III if the expected number of successes from doing so, $mp + 0.5l + (p - 0.5)\,l\,I(p > 0.5)$, was greater than the expected number from abandoning development of the experimental treatment, $0.5(m + l)$. In practice, of course, $p$ is not known, but instead the information on $p$ given by the observed data is summarised by its posterior distribution and the expected number of future successes from each action must be averaged over this distribution. The optimal action corresponding to the larger expected number of successes can then be selected.

In addition to making decisions at the end of a clinical trial, as illustrated here, decision theory can be used to make decisions before the study starts regarding the study design for clinical trials, as discussed by Sylvester (1988), and it is in this context that the approach is most often proposed.
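The averaging step in the example can be sketched numerically. The Python below is an illustration only: it assumes the posterior for p is a Beta distribution, integrates by a simple midpoint rule, and uses hypothetical values for the posterior parameters, m and l. The function names and the `p_std` argument (the known standard success rate, 0.5 in the text) are introduced here for the sketch.

```python
import math

def beta_pdf(p, a, b):
    """Density of a Beta(a, b) distribution at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def utility_continue(p, m, l, p_std=0.5):
    """Expected successes if Phase III is run: mp + 0.5l + (p - 0.5) l I(p > 0.5)."""
    return m * p + p_std * l + (p - p_std) * l * (p > p_std)

def utility_abandon(m, l, p_std=0.5):
    """Expected successes if development is abandoned: 0.5(m + l)."""
    return p_std * (m + l)

def posterior_expected_continue(a, b, m, l, steps=20000):
    """Average utility_continue over a Beta(a, b) posterior using the midpoint rule."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * width
        total += utility_continue(p, m, l) * beta_pdf(p, a, b) * width
    return total

# Hypothetical inputs: posterior Beta(13, 9), m = 100 Phase III patients
# on the new treatment, l = 1000 later patients.
m, l = 100, 1000
eu_continue = posterior_expected_continue(13, 9, m, l)
eu_abandon = utility_abandon(m, l)
decision = "continue" if eu_continue > eu_abandon else "abandon"
print(round(eu_continue, 1), eu_abandon, decision)
```

With a posterior concentrated well below 0.5, the same code favours abandoning development, because the term mp then falls short of 0.5m and the indicator term contributes almost nothing.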
Design decisions considered might be those taken before the study starts regarding the sample size for a fixed sample size study or those taken during a sequential trial (see SEQUENTIAL ANALYSIS) about the future conduct of the trial. In the latter case, a method known as 'dynamic programming' may be used to obtain a sequence of optimal decisions by working backwards through the trial, considering each decision point in turn. Examples are given by Berry and Stangl (1996), who consider the problems of when to stop a sequential trial involving a single experimental treatment and of deciding which treatment to use for each patient in a sequential trial comparing an experimental treatment with a control.

Although the suggestion to use decision theory in clinical trials has a long history (see, for example, Colton, 1963), there has been little practical application (see DATA-DEPENDENT DESIGNS). The use of the approach has probably been limited by the difficulties associated with specification of appropriate utility functions. The detailed specification of the gain function also means that designs must be obtained with a particular type of trial in mind. One possible solution is
to use what has been called a stylised Bayesian approach, as illustrated, for example, by Stallard, Thall and Whitehead (1999), in which parameters in the utility function are selected so as to lead to a design with attractive frequentist properties. NS

Berry, D. A. and Stangl, D. K. 1996: Bayesian methods in health-related research. In Berry, D. A. and Stangl, D. K. (eds), Bayesian biostatistics. New York: Marcel Dekker. Colton, T. 1963: A model for selecting one of two medical treatments. Journal of the American Statistical Association 58, 388-400. Hilden, J. 1990: Corrected loss calculation for Phase II trials. Biometrics 46, 535-8. Stallard, N., Thall, P. F. and Whitehead, J. 1999: Decision theoretic designs for
Phase II clinical trials with multiple outcomes. Biometrics 55, 971-7. Sylvester, R. J. 1988: A Bayesian approach to the design of Phase II clinical trials. Biometrics 44, 823-36.
degrees of freedom
This is an elusive concept that occurs throughout statistics. Essentially, the term degrees of freedom (DoF) means the number of independent units of information in a sample relevant to the estimation of a parameter or the calculation of a statistic. For example, in a 2 × 2 CONTINGENCY TABLE with a given set of marginal totals, only one of the four cell frequencies is free and the table is therefore said to have one degree of freedom. In many cases the term corresponds to the number of parameters in a model and in others to the number of parameters in a statistical distribution such as the t-DISTRIBUTION, the F-DISTRIBUTION and the CHI-SQUARE DISTRIBUTION. BSE
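The 2 × 2 example can be checked directly: once the margins are fixed, choosing a single cell determines the other three. A minimal Python sketch follows, with a hypothetical helper name and made-up totals.

```python
def complete_2x2(row_totals, col_totals, top_left):
    """Fill a 2 x 2 table from its margins and the single free cell."""
    (r1, r2), (c1, c2) = row_totals, col_totals
    if r1 + r2 != c1 + c2:
        raise ValueError("row and column totals must share the same grand total")
    bottom_left = c1 - top_left
    table = [[top_left, r1 - top_left],
             [bottom_left, r2 - bottom_left]]
    if min(min(row) for row in table) < 0:
        raise ValueError("cell value is incompatible with the margins")
    return table

# Hypothetical margins: rows 30 and 20, columns 25 and 25, free cell set to 18.
table = complete_2x2(row_totals=(30, 20), col_totals=(25, 25), top_left=18)
print(table)
```

Changing `top_left` changes every other cell, but no second cell can be chosen independently, which is what 'one degree of freedom' means here.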
demography
The study of population processes (Preston, Heuveline and Guillot, 2001). This entry provides a brief survey of the following topics: measures of fertility, measures of mortality, age standardisation, sources of data, historical demography and the demographic transition, population projection, population ageing and summary measures of population health. Note that many demographic measures that are defined as 'rates' are not true rates in the sense that they are not measures of events per unit of person-time. These measures are identified by placing the 'rate' of their title in quotes.

We begin by discussing measures of fertility. Fertility refers to actual childbearing performance, not childbearing potential, which is called fecundity. The crude birth rate is the number of births per conventional unit of person-time. The person-time denominator is typically based on estimates of population size at mid-period multiplied by the length of the period. Where the period is a single calendar year (the usual circumstance) the length of the period equals 1. For example, in England and Wales in 2001 there were 594 634 live births registered and the mid-year population was estimated at 52 084 500, giving a crude birth rate of 11.4 births per 1000 population per year.
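The arithmetic behind the quoted crude birth rate is easy to reproduce. The short Python sketch below uses the figures given in the text; the function name and its `years` and `per` arguments are illustrative choices, not standard demographic software.

```python
def crude_rate(events, mid_period_population, years=1.0, per=1000):
    """Events per `per` person-years: events / (mid-period population x period length)."""
    person_years = mid_period_population * years
    return events / person_years * per

# England and Wales, 2001: figures as quoted in the text.
birth_rate = crude_rate(594_634, 52_084_500)
print(round(birth_rate, 1))  # 11.4
```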
The analogously calculated crude death rate may be subtracted from it to give the rate of natural increase. More specific measures of fertility are desirable because population age structure affects the childbearing potential of the population and because, at an individual level, births (unlike deaths) can be repeated and the likelihood of this happening depends on reproductive experience to that point. Thus, cumulative measures of individual fertility are also desirable. The general fertility 'rate' (GFR) is calculated as the number of live births per conventional unit of female person-years in the age range 15 to 49, while the total fertility 'rate' (TFR) estimates the average number of babies that would be born per full reproductive lifetime, given current age-specific fertility rates. It equals the sum of the probabilities of giving birth in each of the years of life in which such a birth could occur, conventionally from 15 to 49.

The gross reproduction 'rate' (GRR) is the 'rate' at which mothers are reproducing themselves. It is an estimate of the average number of daughters that would be born to a woman during her lifetime if she passed through the childbearing ages experiencing the age-specific fertility rates of the population of interest. It can be estimated on the assumption that the proportion of female births is (approximately) 100/(100 + 105) = 0.488. The GRR is then 0.488 × TFR. The net reproduction 'rate' (NRR) takes into account the prevailing mortality among women and thus estimates the extent to which each generation of mothers actually reproduce themselves, allowing for the proportion who die before reproducing. It can be calculated from a hypothetical birth cohort (e.g. of 1000 females) who are aged through the reproductive lifespan and exposed to the given death rates using lifetable methods. This yields an expected number of person-years lived by the cohort of potential mothers in each of the age intervals in which births could occur. Thus,
formulaically (with Σ denoting 'sum of') the NRR = [0.488 × Σ (probabilities of giving birth in each of the years of life in which such a birth could occur × the person-years lived by the cohort at each age)]/number in the cohort. By definition, a NRR of 1.0 equals 'replacement level fertility'. 'Zero population growth' will not typically be approached until several decades after the attainment of replacement level fertility because (previously) increasing populations typically have a higher proportion in the reproductive ages than would obtain in the corresponding stationary population. This excess reproductive potential creates substantial momentum that is not slowed until the age structure approaches that of a stationary population. (See the discussion of stable population to follow.)

Turning to measures of mortality, the most basic is the crude death rate, analogous to the crude birth rate, giving the number of deaths per conventional unit of population time. Thus in 1999 in the US, 2 391 399 deaths were reported and
the estimated mid-year population was 272 691 000, giving a crude death rate of 8.77 per 1000 population per year. Death rates may be specific for sex, age group and cause: e.g. 8337 men aged 55-64 had their cause of death entered as heart attack (acute myocardial infarction) on their death certificates in England and Wales in 1990, out of an estimated mean population at risk of $25.26 \times 10^5$, giving a rate specific for age, sex and cause of 330 per $10^5$ per year. Among other specific death 'rates' and ratios, an important one is the infant mortality 'rate', conceptually, the probability of dying between birth and exact age 1 (${}_{1}q_{0}$ in lifetable notation). It is operationally defined, in relation, for example, to events occurring in a given calendar year, as:

\[ \frac{\text{Number of deaths of liveborn infants who have not reached their first birthday}}{\text{Number of live births}} \times 1000 \]

Note that this operational definition requires accurate counts of births and infant deaths and is therefore difficult to implement in the absence of a national vital statistics system with complete (e.g. greater than 95%) coverage. Only 75 of 191 member states of the World Health Organisation (WHO) met this criterion in 2000. Thus the infant mortality rate is most difficult to measure in those populations where infant mortality tends to be highest and of greatest public health importance. In such populations it is usually estimated using model lifetables starting from estimates of the child mortality 'rate', which tend to be more robustly estimated.

Another important 'rate' is the child (or under 5) mortality 'rate', conceptually, the probability of dying between birth and exact age 5 (${}_{5}q_{0}$ in lifetable notation), conventionally multiplied by 1000. It is the most robustly estimated measure of mortality in early life in low and middle income countries without comprehensive vital statistical systems. In such countries,
it can be measured operationally (in demographic and health surveys and in censuses) by asking women of reproductive age about all the babies they have had and which of these have since died. There are standard demographic methods for using answers to these questions to estimate ${}_{5}q_{0}$. If details of dates of birth and death are available then mortality rates can be estimated directly. If only the numbers ever born and numbers alive (or dead) are known then the indirect method (also known as the 'Brass' method) is used. The WHO estimates that each year about 22% of global deaths occur in populations for which estimates of this type provide the only available evidence on mortality levels (at any age).
The adult mortality 'rate' is the probability of dying between exact age 15 and exact age 60 (${}_{45}q_{15}$). It is typically used by international agencies as a summary measure of adult mortality levels in low and middle income countries. High income countries with comprehensive vital statistical systems tend not to use this measure for their own purposes.
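The crude and specific death rates quoted earlier in this entry are simple ratios of events to population-time, and the arithmetic can be checked with a short sketch. The function name `rate_per` is hypothetical and introduced only for illustration; the inputs are the figures given in the text.

```python
def rate_per(events, population, per):
    """Events per `per` units of population over one year (a plain ratio)."""
    return events / population * per

# Crude death rate, US 1999: deaths / mid-year population, per 1000.
us_crude = rate_per(2_391_399, 272_691_000, 1_000)

# Age-, sex- and cause-specific rate: heart attack deaths among men aged
# 55-64 in England and Wales, 1990, per 100 000 population at risk.
ew_specific = rate_per(8_337, 25.26e5, 1e5)

print(f"Crude death rate: {us_crude:.2f} per 1000 per year")
print(f"Specific death rate: {ew_specific:.0f} per 100 000 per year")
```

The same function would serve for the infant mortality 'rate', with deaths under age 1 as the events and live births as the denominator.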
Maternal mortality is a topic of great policy interest globally but its measurement is fraught with difficulty. The WHO defines maternal death as the 'death of a woman while pregnant or within 42 days of termination of pregnancy, irrespective of the duration and the site of the pregnancy, from any cause related to or aggravated by the pregnancy or its management but not from accidental or incidental causes'. The maternal mortality ratio is the ratio of maternal deaths to live births × 100 000. Even in countries with the best vital statistical systems it is estimated that around one-third of maternal deaths are not identified as such by the ICD code assigned for the underlying cause of death. Elsewhere the ratio is subject to even greater uncertainty - making it unsuitable for comparisons between countries or over time. The global maternal mortality ratio for 1995 was estimated at 397/100 000 births with an uncertainty interval extending from 234 to 635 - emphasising the magnitude of the uncertainty associated with this measure.
In order to make fair comparisons, especially internationally, in demography it is necessary to standardise vital (birth and death) rates. Crude (unstandardised) death rates are poor guides for comparing the force of mortality in different populations: a retirement town, with many of its population in the older (dying) age ranges, will, purely as a function of its age structure, tend to have a very high crude death rate. The processes used to age-standardise can be conveniently described by distinguishing between the 'population of interest', i.e. the population whose vital rates are being characterised, and the 'reference' or 'standard' population, i.e. either an artificial or a real population, used for standardisation.
Direct standardisation is one method requiring relatively precise estimates of the age-specific death rates in the population of interest. The standard population provides a standard age structure.
The age-specific death rates of the population of interest are applied to the component age strata (of standardised size) of the standard population. In this way, each age stratum of the standard population is made to experience the same force of mortality as the corresponding age stratum in the population of interest. The resulting sum of deaths (in the standard population) is no longer influenced by the age structure of the population of interest and, when this sum is divided by the appropriate denominator (100 000 in this case), it yields a directly age-standardised death rate. A directly age-standardised death rate is, in effect, a weighted mean of the age-specific rates using a standard set of weights - with the weight for each age stratum being the standardised proportion it comprises of the total standard population.
The second method is called indirect age standardisation. If the population of interest is small or the deaths of interest are from an uncommon cause, the number of deaths occurring in some age strata may be too small to produce the stable estimates of the age-specific death rates that the direct
method requires. In the indirect method, the age-specific death rates of a standard population are projected on to the age strata of the population of interest to give the number of deaths that would be expected in each age stratum on the basis of the standard rates. The ratio of the total observed deaths to the total 'expected deaths' is usually called the standardised mortality ratio (or SMR). Because SMRs are still influenced by the age structure of the populations of interest, each should, strictly, only be compared with the value for the standard population, i.e. with 1.00 (or 100 depending on which base is chosen).
Sources of data used for demographic measures can be illustrated for mortality. Around one-quarter of the 57 million deaths estimated to occur each year occur in countries where the vital statistical system has been judged to be at least 95% complete. Around 13% occur in populations whose vital statistical systems are less than 95% complete. For India, China and several smaller countries, vital rates are estimated using data from sample registration and surveillance systems. In these systems some 1% or so of the national population is covered by intensive surveillance for vital events. For populations in which around 22% of global deaths occur, child mortality can be estimated from survey and census returns on the numbers of children born and numbers still alive, even though there is little or no direct evidence on adult mortality levels. These are typically estimated using model lifetables to match plausible adult mortality levels to estimated child mortality. This leaves around 5% of deaths occurring in populations with no recent data on child or adult mortality. Estimates of mortality in this last category of populations are entirely 'model based'; i.e. they are predicted from other known or estimated characteristics of the population.
The calculation of death rates also requires estimates of populations at risk of dying.
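The two standardisation methods described above can be sketched as follows. All numbers are hypothetical and chosen only to make the arithmetic easy to follow; the function names `direct_rate` and `smr` are likewise illustrative rather than taken from any library.

```python
# Hypothetical data for three age strata (illustration only).
study_deaths = [10, 30, 120]              # observed deaths, population of interest
study_pop = [20_000, 10_000, 5_000]       # person-years at risk, population of interest
std_pop = [50_000, 30_000, 20_000]        # standard population (totals 100 000)
std_rates = [0.0004, 0.0025, 0.0200]      # age-specific death rates, standard population

def direct_rate(deaths, pop, standard_pop, per=100_000):
    """Apply the study population's age-specific rates to the standard strata."""
    expected = sum(d / n * s for d, n, s in zip(deaths, pop, standard_pop))
    return expected / sum(standard_pop) * per

def smr(deaths, pop, standard_rates):
    """Observed deaths divided by deaths expected under the standard rates."""
    expected = sum(r * n for r, n in zip(standard_rates, pop))
    return sum(deaths) / expected

print(f"Directly standardised rate: {direct_rate(study_deaths, study_pop, std_pop):.1f} per 100 000")
print(f"SMR: {smr(study_deaths, study_pop, std_rates):.2f}")
```

Note that the direct method needs the age-specific rates of the population of interest (here `study_deaths / study_pop`), whereas the indirect method needs only its total deaths and its age structure.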
A minority of countries have regular censuses with coverage deemed complete. These countries estimate populations in intercensal years using adjacent censuses. At the other end of the data availability spectrum are populations with no recent censuses. For this group, bodies such as the Population Division of the UN have a long experience in preparing 'model-based' estimates of the size and age and sex distribution of the population, albeit with substantial levels of uncertainty.
Thus, while mortality estimates are now prepared by international bodies such as the WHO for all components of the human population, many of these are subject to substantial uncertainty. The evolving philosophy of the WHO has been to make the best use of all available evidence and then to seek to quantify the level of uncertainty attaching to the resulting estimates. Life expectancy estimates published by WHO are now presented with uncertainty intervals. These intervals aim to quantify all sources of uncertainty, not just that associated with sampling error - hence their description as uncertainty rather than CONFIDENCE INTERVALS.
Historical demography is the branch of demography that studies how and why the force of mortality has changed through historical time, informing our understanding of the main determinants of human health, and is therefore of considerable interest. Historical demographers typically work their way backwards in time from more recent periods, with data that are readily available and of good reliability, to earlier periods where there are problems with either the availability or the quality of the available evidence. Mortality estimates based on a formalised system of data collection by parishes are available for Sweden from the mid-18th century. For England and Wales an official system of vital registration began in 1837. Before such systems were in place, parishes in England, for example, kept records of baptisms, burials and marriages. Historical demographers have used these records to 'reconstitute' families and from these genealogies have obtained both numerators (vital events) and denominators (estimates of person-time lived) for the estimation of vital rates. Family reconstitution has yielded estimates of fertility and mortality levels for England that now extend back to the 16th century. These constitute the longest such series for a North Atlantic society.
There have been two main findings from these data. First, it has been shown that the main means by which the English population adjusted to cyclic variation in economic fortunes, in the early modern period, was via the regulation of marriage (Wrigley et al., 1997). When economic conditions became difficult, age at marriage increased and the proportion never marrying also increased. These departures from the pattern of universal early marriage as seen elsewhere have been characterised as the European marriage pattern. As nuptiality varied, so did fertility and with it the rate of population increase. Second, a high level of adult mortality in England in the early modern period has been observed.
While somewhat more than 50% of those born survived to adulthood, among English males, for example, only around 30% of 15-year-olds could expect to survive to 65. It is of interest to note here that high levels of adult mortality were also typical of the poor agrarian society of India on the eve of its demographic transition. Around 1900, only 1 in 6 of 15-year-old Indian males could expect to survive to 65.
The overall transition from a 'pre-modern' to a 'late modern' pattern of vital rates is described as the demographic transition. It begins with high mortality and fertility rates, followed by a period in which mortality declines in advance of the decline in fertility - a phase of the transition in which population growth accelerates. Fertility then declines - in an idealised form to reach replacement level (NRR = 1.0), with a new equilibrium being finally established with high survivorship. As has already been implied, the starting point for this demographic transition was more favourable in northwest Europe (in, say, the 17th century, when birth and death rates were 'submaximal') than in poor agrarian
societies such as India (around, say, 1900, when birth and death rates were exceptionally high).
Turning from the past to the future, another important aspect of demography is making population projections and forecasts. Population projections, as the name implies, project existing populations forward in time under stated assumptions and in accord with established relationships between demographic parameters. Some projections may be known to be unrealistic but be carried out to explore 'what if' scenarios. Forecasts are those projections that are believed most likely to predict the future. The standard method for projecting populations is known as the cohort component method. Typically, each 5-year age group in the population of interest is projected forward 5 calendar years at a time. It is depleted by expected losses to death and emigration and augmented by expected levels of immigration. At the beginning of life, births (to existing residents and to immigrants) are predicted. For these purposes, attention focuses on females to whom assumed fertility schedules are applied. A parallel exercise for males makes up the numbers. This exercise is repeated, starting again with the expected population in 5 years' time. The migration component usually introduces the largest levels of uncertainty into the calculations. Realistic assumptions entail nonlinear trends in fertility and perhaps also mortality so that the assumed rates need to be adjusted for each 5-year calendar period. Both the United Nations Population Division and the US Bureau of the Census prepare projections on 'high', 'medium' and 'low' assumptions for key inputs. Thus, estimates for the size of the US population in 2050 vary by 102 million between low and high fertility assumptions, by 48 million between low and high mortality assumptions and 87 million between low and high migration assumptions.
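One 5-year step of the cohort component method, for females only, can be sketched as below. This is a deliberately simplified illustration under stated assumptions, not a production projection routine: the function `project_step` and all its parameter names are hypothetical, net migrants are simply added at the end of the step (so their births during the step are ignored), the youngest age group is assumed to have zero fertility, and the last age group is treated as open-ended.

```python
def project_step(pop, survival, fertility, net_migrants,
                 birth_survival, female_share=0.488, years=5):
    """Project female counts by 5-year age group forward one 5-year step.

    pop[i]          females in age group i at the start of the step
    survival[i]     proportion of group i surviving the step
    fertility[i]    annual births per woman in group i (group 0 assumed zero)
    net_migrants[i] net female migrants added to group i at the end of the step
    birth_survival  proportion of births surviving to the end of the step
    female_share    assumed proportion of births that are female
    """
    n = len(pop)
    new = [0.0] * n
    # Age each group forward one step, depleting it by expected deaths.
    for i in range(n - 1):
        new[i + 1] = pop[i] * survival[i]
    # Open-ended last group: its survivors remain in the last group.
    new[n - 1] += pop[n - 1] * survival[n - 1]
    # Births over the step: annual rate x years x mean of start and end exposure.
    births = sum(years * fertility[i] * (pop[i] + new[i]) / 2 for i in range(1, n))
    new[0] = births * female_share * birth_survival
    # Augment by assumed net migration.
    return [x + m for x, m in zip(new, net_migrants)]

# Hypothetical three-group example.
projected = project_step(
    pop=[100, 100, 100], survival=[0.9, 0.8, 0.5], fertility=[0.0, 0.1, 0.0],
    net_migrants=[0, 0, 0], birth_survival=0.95, female_share=0.5,
)
print(projected)
```

Repeating the call with `projected` as the new `pop`, and with schedules adjusted for each period, gives the multi-step projection described in the text; 'high', 'medium' and 'low' variants correspond to different input schedules.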
There is a general recognition that this scenario-based approach needs to be replaced by a more systematic approach to the quantification of uncertainty and its representation in probability distributions.
Understanding of the determination of age structure rests on the theory of stable populations. Stable populations emerge when the growth rate in the number of births is constant (or the schedule of age-specific fertility rates is constant), the schedule of age-specific death rates (i.e. the lifetable) is constant and there is no migration. In such populations, to which many historical populations approximate, various mathematical relationships hold between key parameters. The age distribution, the birth rate, the death rate and the growth rate are entirely determined by the fertility and mortality schedules. Populations that are not themselves currently approximating the stable model can nonetheless be said to have a 'stable equivalent', i.e. the population that would emerge if the birth and death schedules were allowed to act continuously. From this equivalence an 'intrinsic growth rate' may be determined.
One of the most striking and counterintuitive findings from stable population theory is that population age structure is very much more sensitive to changes in the fertility schedule than to changes in the mortality schedule (Coale, 1955). Thus with a gross reproductive rate of 2, increases in life expectancy from 40 to 60 years are associated with reductions in the mean age of the population. This is because increases in survival are proportionally greatest at each end of the life span. The increases in survival in the early years of life lead to larger cohorts of parents who in turn produce more children, keeping the base of the population pyramid extended. However, as fertility falls and life expectancy extends, proportions aged over 65 do increase. Finally, as populations approach stationarity (sustained equality of birth and death rates), increases in survival are reflected in increased proportions of aged persons.
On the way to such equilibrium, substantial perturbations may arise due to the passage of cohorts that are 'large' relative to those that immediately follow. These may have arisen from short periods of increased fertility, e.g. 'post-war baby boomers' in Western countries, or from the last 'large' birth cohorts before subsequent substantial and rapid falls in fertility, e.g. in such countries as Japan, China and Italy. In the next half century these presumptively transitional phenomena will result in periods of marked 'population ageing' when the relevant 'large' cohorts pass age 65. According to the UN Population Division's 'medium' variant projections, proportions aged over 65 will increase over the period 2010 to 2025 from 8.3% to 13.4% in China, from 20.4% to 24.4% in Italy and from 22.6% to an extraordinary 29.7% in Japan. By contrast, increases in the USA are expected to be more modest: from 16.1% to 18.1% (United Nations Population Division, 2009).
As populations approach stationarity,
assumptions about limits to life expectancy become increasingly relevant. Oeppen and Vaupel (2002) have shown how demographers have repeatedly underestimated such limits. Mortality decline at high ages has continued in low mortality countries and has so far shown little evidence of slowing down at the highest ages.
Demography has played an important role in the development of methods for measuring the burden of disease and injury (Murray et al., 2002). For example, the health-adjusted life expectancy (HALE) measure seeks to estimate the expectation of life in 'full health'. Time expected to be spent in less than full health is subtracted from total life expectancy, after weighting by the severity of the departure from full health. 'Health gap' measures, such as the disability-adjusted life-year (DALY) lost, estimate the hypothetical flows of 'lost healthy lifetime' arising from deaths and from onsets of disease and injury during the period of interest. For the 'years of life lost' component (and for long-term nonfatal health losses), gaps are estimated relative to a standard lifetable with a female life expectancy at birth of 82.5 years
and a male life expectancy at birth of 80.0 years. Unlike health expectancy-type measures (such as HALE), health gap measures can be decomposed by allocating DALYs lost to the diseases and injuries responsible and also into the determinants of the diseases and injuries. JP
Coale, A. J. 1955: How the age distribution of a human population is determined. In Proceedings of Cold Spring Harbor Symposia on Quantitative Biology, pp. 83-9. Murray, C. J. L., Salomon, J. A., Mathers, C. D. and Lopez, A. D. 2002: Summary measures of population health: concepts, ethics, measurement and applications. Geneva: World Health Organization (www.who.int/pub/smph/en/index.html). Oeppen, J. and Vaupel, J. W. 2002: Broken limits to life expectancy. Science 296, 1029-31. Preston, S. H., Heuveline, P. and Guillot, M. 2001: Demography: measuring and modeling population processes. Oxford: Blackwell. United Nations Population Division 2009: World population prospects: the 2008 revision. New York: UN Department of Economic and Social Affairs (http://www.un.org/esa/population/). Wrigley, E. A., Davies, R. S., Oeppen, J. E. and Schofield, R. S. 1997: English population history from family reconstitution, 1580-1837. Cambridge: Cambridge University Press.
[Figure: density estimation. Kernel estimate showing individual kernels (Silverman, 1986)]
dendrogram See CLUSTER ANALYSIS IN MEDICINE
density estimation
This is the estimate of a probability distribution from a sample of observations. In many situations in medical research we may wish to use a sample of observations to estimate the frequency distribution or probability density of a variable of interest. Commonly this estimation problem is approached by simply constructing a HISTOGRAM of the data. However, the histogram may not be the most effective way of displaying the distribution of a variable, because of its dependence on the number of classes chosen. The problem becomes even more acute if two-dimensional histograms are used to estimate BIVARIATE DISTRIBUTIONS.
The density estimates provided by one- and two-dimensional histograms can be improved in a number of ways. If, of course, we were willing to assume a particular form for the distribution, e.g. normal, then density estimation would be reduced to estimating the parameters of the chosen density function. More commonly, however, we would like the data to 'speak for themselves' as it were, in which case we might choose one of a variety of the nonparametric density estimation procedures available. Perhaps the most common are the kernel density estimators, which are essentially smoothed estimates of the proportion of observations falling in intervals of some size. The essential components of such estimators are the kernel function and the bandwidth or smoothing parameter. The kernel estimator is a sum of 'bumps' placed at the observations. The kernel function determines the shape of the bumps while the window width (the bandwidth) determines their width. Details of the mathematics involved are given in Silverman (1986) and Wand and Jones (1995), but the essence of the procedure can be gleaned from the first figure. Here the kernel
function is Gaussian, and the diagram shows the individual bumps at each observation as well as the density estimate obtained from adding them up.
[Figure: density estimation. Scatterplot of birth and death rates for 69 countries]
The kernel density estimator considered as a series of bumps centred at the observations has a simple extension to two dimensions as described in, for example, Silverman (1986). Here we content ourselves with an example. The second figure shows a plot of birth and death rates for 69 countries and the third figure (in (a) and (b)) shows perspective plots of density estimates given by using different kernel functions.
[Figure: density estimation. Perspective plots of two density estimates for the birth and death rate data: (a) bivariate normal kernel; (b) Epanechnikov kernel]
Bivariate density estimates can also be useful when applied to the separate panels in a SCATTERPLOT MATRIX of data with more than two variables. As an example the fourth figure shows the scatterplot matrix of data consisting of three body measurements on 20 individuals with a contour plot of the appropriate estimated bivariate density function on each panel. There is clear evidence of two modes in the estimated densities, which is explained by the presence of men and women in the sample. BSE
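The description of the kernel estimator as a sum of 'bumps' placed at the observations can be made concrete with a minimal sketch using a Gaussian kernel. The data values and the bandwidth `h` below are hypothetical, and the function names are introduced only for illustration; Silverman (1986) and Wand and Jones (1995) discuss how the bandwidth might be chosen.

```python
import math

def gaussian_kernel(u):
    """Standard normal density: the shape of each 'bump'."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, data, h):
    """Kernel density estimate at x: the average of bumps of width h
    centred at each observation, scaled so the estimate integrates to 1."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

data = [1.2, 1.9, 2.4, 3.1, 4.7]   # hypothetical observations
h = 0.5                            # hypothetical bandwidth (window width)

for x in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"f_hat({x}) = {kde(x, data, h):.4f}")
```

A larger `h` widens each bump and gives a smoother estimate; a smaller `h` produces a spikier estimate that follows the individual observations more closely.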
Silverman, B. W. 1986: Density estimation for statistics and data analysis. London: CRC/Chapman & Hall. Wand, M. P. and Jones, M. C. 1995: Kernel smoothing. London: CRC/Chapman & Hall.
[Figure: density estimation. Scatterplot matrix of body measurements data showing the estimated bivariate densities on each panel]
dental statistics
Dentistry is concerned with the provision of care for the teeth, supporting tissues and the gums, and the treatment of diseases affecting these areas of the mouth. In the United Kingdom, the Social Survey Division of the Office for National Statistics carries out the Adult Dental Health Survey every 10 years (see, for example, the report on the 1998 survey by Kelly, Walker and Cooper, 2000). Participants are interviewed face to face and in addition those with natural teeth are asked to undergo a home dental examination. Using a random sample of several thousand individuals aged 16 years and over from England, Scotland, Wales and Northern Ireland, this survey yields information by constituent country on issues that include the number and condition of teeth, dental hygiene behaviour, patterns of dental visits and attitudes towards the provision of dental health care. In a similar manner the Child Dental Health Survey, involving schoolchildren aged between 5 and 15 years, has been conducted on a 10-yearly basis since 1973. Statistical analyses are provided in detailed official reports of these surveys.
For all dental specialties, current evidence is advanced through the conduct of suitably designed research studies that use statistical methods for data analysis. Developments in dental public health, especially aspects relevant to children, have attracted the most attention in the media. Worldwide, important public health themes have included the impact of the fluoridation of public water supplies on dental caries, the effect of the introduction of fluoride toothpaste on dental health, the decline in dental caries experienced by schoolchildren since the 1970s and the influence of socioeconomic factors and ethnicity on the provision and uptake of dental services.
In dental studies, possibly more so than for those of other types of health care, data from consecutive patients should not be treated as independent observations. In routine dental appointments,
a significant number of patients are examined as part of a family group, with members being seen consecutively. Correlation between observations occurs because within a household, individuals tend to eat similarly and engage in the same type of routine dental care. For example, Metcalf et al. (2007) reported a within-household INTRACLASS CORRELATION COEFFICIENT (ICC) for the consumption of sugars of 0.518. Other types of cluster encountered include school classes in the study of adolescents and nursing homes in surveys of the elderly. Cluster randomisation (see CLUSTER RANDOMISED TRIALS) can be used to address this problem in randomised controlled CLINICAL TRIALS (Frenkel, Harvey and Newcombe, 2001).
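The practical consequence of such within-cluster correlation can be illustrated with the standard design effect for equal-sized clusters, 1 + (m - 1) × ICC. This formula is not given in the text above and is introduced here as an assumption for illustration; the household size and sample size below are hypothetical, while the ICC is the value quoted from Metcalf et al. (2007).

```python
def design_effect(cluster_size, icc):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n / design_effect(cluster_size, icc)

icc_sugars = 0.518     # within-household ICC quoted in the text
household_size = 3     # hypothetical number of participants per household
n_participants = 600   # hypothetical total sample size

deff = design_effect(household_size, icc_sugars)
n_eff = effective_sample_size(n_participants, household_size, icc_sugars)
print(f"Design effect: {deff:.3f}")
print(f"Effective sample size: {n_eff:.0f} of {n_participants}")
```

Under these assumed values, treating the observations as independent would overstate the information in the sample roughly twofold.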
Similarly, data from an individual's teeth cannot be treated as independent observations; an individual might, for example, have similar patterns of fillings on the left and right sides of the jaw. Consequently, in nearly all studies of teeth the unit of observation is taken as the individual rather than the tooth.
Recently, with the development of more sophisticated statistical methods, studies that analyse individual teeth as correlated observations have started to appear, for instance that by Chuang et al. (2002) into possible factors influencing the survival time of dental implants. This paper also contains a useful review of SURVIVAL ANALYSIS techniques that have been applied in the modelling of dental implant failure. Continuous and near-continuous dental data rarely follow a N
being used as a tool in the review of specific issues in dental research. For example, Ismail and Hasson (2008) described a meta-analysis of the studies published between 1966 and 2006 into the association between fluoride supplements, dental caries and fluorosis, a dental condition that is characterised by staining of the teeth. Brief summaries of critical reviews can be found in the journal Evidence-Based Dentistry. NCS
Bell, G. W., Rodgers, J. M., Grime, R. J., Edwards, K. L., Hahn, M. R., Dorman, M. L., Keen, W. D., Stewart, D. J. C. and Hampton, N. 2003: The accuracy of dental panoramic tomographs in determining the root morphology of mandibular third molar teeth before surgery. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology and Endodontics 95, 119-25. Böhning, D., Dietz, E., Schlattmann, P., Mendonça, L. and Kirchner, U. 1999: The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. Journal of the Royal Statistical Society A 162, 195-209. Chuang, S. K., Wei, L. J., Douglass, C. W. and Dodson, T. B. 2002: Risk factors for dental implant failure: a strategy for the analysis of clustered failure-time observations. Journal of Dental Research 81, 572-7. Cohen, J. 1960: A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37-46. Frenkel, H., Harvey, I. and Newcombe, R. G. 2001: Improving oral health in institutionalised elderly people by educating caregivers: a randomised controlled trial. Community Dentistry and Oral Epidemiology 29, 289-97. Godson, J. H. and Williams, S. A. 1996: Oral health and health related behaviours among three-year-old children born to first and second generation Pakistani mothers in Bradford, UK. Community Dental Health 13, 27-33. Ismail, A. I. and Hasson, H. 2008: Fluoride supplements, dental caries and fluorosis: a systematic review. Journal of the American Dental Association 139, 1457-68. Kelly, M., Walker, A. and Cooper, I. 2000: Adult Dental Health Survey: Oral Health in the United Kingdom 1998: a survey commissioned by the United Kingdom Health Departments carried out by the Social Survey Division of the Office for National Statistics in collaboration with the Dental Schools of Birmingham, Dundee, Newcastle and Wales, Central Survey Unit of the Northern Ireland Statistics and Research Agency. London: Stationery Office. Metcalf, P. A., Scragg, R. K. R., Stewart, A. W. and Scott, A. J. 2007: Design effects associated with dietary nutrient intakes from a clustered design of 1 to 14-year-old children. European Journal of Clinical Nutrition 61, 1064-71. Navazesh, M., Mulligan, R., Barron, Y., Redford, M., Greenspan, D., Alves, M. and Phelan, J. 2003: A 4-year longitudinal evaluation of xerostomia and salivary gland hypofunction in the Women's Interagency HIV Study participants. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology and Endodontics 95, 693-8.
descriptive statistics
These are summaries designed to encapsulate meaningful aspects of datasets. Here we focus on numerical descriptive statistics, GRAPHICAL DISPLAYS being considered separately. Individual observations are the basis of statistical analysis. However, when describing data it is rarely feasible to present all observations, and it is not always possible to illustrate the distribution using a graph. Therefore descriptive statistics are required to provide a numerical summary of the distribution.
MEASURES OF LOCATION are used to describe in a single figure the typical or representative level of all observations. The measures of location most often employed are the MEAN, MEDIAN, GEOMETRIC MEAN and MODE. Variability around this central point is summarised by means of a MEASURE OF SPREAD. The RANGE, STANDARD DEVIATION and INTERQUARTILE RANGE are used as measures of spread. Other aspects of a distribution are encapsulated by SKEWNESS, which measures how asymmetric the distribution is, and KURTOSIS, which quantifies its 'peakedness'.
It is important that descriptive statistics chosen are appropriate to the distribution of the data. Data that have an approximately symmetric distribution are usually summarised using the mean and standard deviation. On the other hand, the presence of skewness or OUTLIERS implies that the median and interquartile range are more appropriate. This is because they are based on ranks, and therefore make no assumptions about the distribution of the data. SRC
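The contrast between moment-based and rank-based summaries can be seen in a small sketch using Python's standard `statistics` module. The data are hypothetical and include one deliberate outlier; the quartiles use the module's default ('exclusive') method, which is only one of several conventions in use.

```python
import statistics

values = [2, 3, 3, 4, 5, 6, 40]   # hypothetical data with one outlier (40)

mean = statistics.mean(values)
sd = statistics.stdev(values)      # sample standard deviation
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                      # interquartile range

print(f"mean = {mean}, SD = {sd:.2f}")
print(f"median = {median}, IQR = {iqr}")
```

Here the mean and standard deviation are pulled upwards by the single large value, whereas the median and interquartile range, being based on ranks, are barely affected by it.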
deviance
Deviance is a measure of the extent to which a particular model differs from the saturated model for a dataset. Defined explicitly in terms of the difference in the LIKELIHOODS of the two models,
\[ \text{deviance} = -2\left[\ln L_c - \ln L_s\right] \]
where $L_c$ and $L_s$ are the likelihoods of the current model and the saturated model respectively. Large values of the deviance are encountered when $L_c$ is small relative to $L_s$, indicating that the current model is poor. Small values of the deviance are obtained in the reverse case. Asymptotically, the deviance has a CHI-SQUARE DISTRIBUTION with DEGREES OF FREEDOM equal to the difference in the number of parameters in the two models. BSE
(See also GENERALISED LINEAR MODELS, LOG-LINEAR MODELS)
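As a worked illustration of the definition, the sketch below computes the deviance of a Poisson model with a single common mean (the 'current' model) against the saturated model in which each fitted mean equals the observed count. The Poisson setting, the counts and the function names are assumptions made for the example and do not come from the entry itself.

```python
import math

def poisson_loglik(counts, means):
    """Poisson log-likelihood: sum of y*ln(mu) - mu - ln(y!)."""
    total = 0.0
    for y, mu in zip(counts, means):
        # 0 * ln(0) is taken as 0, as needed for the saturated model when y = 0.
        term = y * math.log(mu) if y > 0 else 0.0
        total += term - mu - math.lgamma(y + 1)
    return total

def deviance(loglik_current, loglik_saturated):
    """Deviance = -2 [ln L_c - ln L_s]."""
    return -2.0 * (loglik_current - loglik_saturated)

counts = [2, 5, 3, 8, 7]                         # hypothetical counts
common_mean = sum(counts) / len(counts)          # current model: one parameter
ln_Lc = poisson_loglik(counts, [common_mean] * len(counts))
ln_Ls = poisson_loglik(counts, counts)           # saturated: one parameter per count
D = deviance(ln_Lc, ln_Ls)
df = len(counts) - 1                             # difference in number of parameters

print(f"deviance = {D:.3f} on {df} degrees of freedom")
```

The resulting value would be referred to a chi-square distribution with `df` degrees of freedom, bearing in mind that this reference distribution is only an asymptotic approximation.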
digit preference
This is the personal and often subconscious BIAS that frequently occurs in the recording of observations. For example, a person may round the terminal digit of a number (e.g. the 5 in 624.75) systematically to a particular digit or set of digits that they prefer. Most frequently people will want to round to zero or five, although other numbers are of course possible. This is a problem when recording data from an analogue source, e.g. a clock or sphygmomanometer, and when recall or estimation is required, e.g. recalling the age at which menopause began or one's weight in kilos. It can also be a problem when using visual analogue scales or similar devices to gather information. In medicine the digit preference phenomenon is particularly troublesome in the recording of blood pressure, where it has been estimated that clinicians have up to a twelvefold bias in favour of the terminal digit zero (see the figure). This type of bias may have
[Figure: digit preference. Diastolic and systolic end digit preference]
grave implications for diagnosis and trcaIment. but its greatest effect is perhaps in epidemiological and other research studies w~ it can distort frequency dislributions and reduce the power of stalistn lcsts (sec Hessel. 1986). Johnstone (2001) identifies a study in which digit prefen:nce may ha\'C led the authoB to the wrong conclusion about the elrcet of D drug in the treatment of hypcm:nsion (Penson. Vitols and Vue. 2000). Then: arc times wilen it is appropriate to compare the distribution ortcnninal digits lo a discrete uniform distribu-
lion in order to detect digit preference. However. one should not lose sight of the fact that. as dcmorwtratcd in Crawford. Johannes and Stellato (2002). it is possible furtbe dislribution of terminal digits to be nonuniform. but still not tbe result of digit prefcrence. BSEJAGL (See also fRAUD DETEC"J1ON IN BIOMEDICAL RESEARCHJ
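The comparison of terminal digits with a discrete uniform distribution can be sketched as a chi-square goodness-of-fit calculation. This is only one possible way of making the comparison; the function name and the blood pressure readings are hypothetical, and, as the entry cautions, a uniform reference distribution is not always the appropriate one.

```python
from collections import Counter

# Upper 5% point of the chi-square distribution with 9 degrees of freedom.
CHI2_CRITICAL_DF9 = 16.919

def terminal_digit_chi_square(readings):
    """Chi-square statistic comparing terminal digits with a uniform distribution."""
    digits = [abs(int(r)) % 10 for r in readings]
    observed = Counter(digits)
    expected = len(digits) / 10  # uniform: each digit equally likely
    return sum((observed.get(d, 0) - expected) ** 2 / expected for d in range(10))

# Hypothetical systolic readings: 40 end in 0, 20 end in 5, 5 for each other digit.
readings = [120] * 40 + [125] * 20
readings += [120 + d for d in (1, 2, 3, 4, 6, 7, 8, 9) for _ in range(5)]

statistic = terminal_digit_chi_square(readings)
print(statistic, statistic > CHI2_CRITICAL_DF9)
```

A statistic above the critical value indicates non-uniform terminal digits; whether that reflects digit preference still has to be judged in context.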
Crawford, S. L., Johannes, C. B. and Stellato, R. K. 2002: Assessment of digit preference in self-reported year at menopause: choice of an appropriate reference distribution. American Journal of Epidemiology 156, 676.
Hessel, P. A. 1986: Terminal digit preference in blood pressure measurement: effects on epidemiological associations. International Journal of Epidemiology 15, 122-5.
Johnstone, G. D. 2001: Letter. British Medical Journal 322, 110.
Persson, M., Vitols, S. and Yue, Y. 2000: Orlistat associated with hypertension. British Medical Journal 321, 17.
direct standardisation
See DEMOGRAPHY
disability-adjusted life-year
See DEMOGRAPHY
disattenuated correlation
See CONFIRMATORY FACTOR ANALYSIS
discriminant function analysis
This is a collection of methods aimed at optimally distinguishing between a priori groups of individuals, so that future unassigned individuals can be classified to one of the groups. To illustrate the problem in medical contexts, consider the following three situations. First, patients entering hospital with jaundice could be suffering from one of a number of diseases. Some of these diseases require surgery, while others can be treated completely by medical means. Exact determination of disease may itself require surgery, which is to be avoided if at all possible, so it is hoped that diagnosis might be achieved via a battery of observations (signs, symptoms and laboratory measurements) taken on the patient. Such data are available on a sample of patients for whom either a biopsy or a postmortem examination has established the underlying disease, and hence the medical or surgical status, with certainty. Can we use these data to formulate a rule for predicting the status of a future patient from the battery of observations made on him or her? Second, a retrospective study is conducted on patients undergoing surgery and the appearance or otherwise of postoperative pulmonary embolism is recorded for each patient alongside a range of other variables (e.g. age, obesity measure, number of cigarettes smoked per day, nature of disease, etc.). Can the data be used to develop a screening index for predicting patients at high risk of appearance of postoperative pulmonary embolism? Third, consider a prospective study being conducted on patients with thrombophlebitis. Each patient is monitored on a range of laboratory measurements: some patients develop embolic thrombosis, while others do not. Can those who develop it be predicted from the measurements monitored in the study? These situations share some common characteristics and objectives.
In each of them there are two distinct groups of individuals and observations have been recorded on a set of features (variables or attributes) for each individual. The hope is that these features are able to distinguish the two groups from each other well enough for the measurements taken on them to be capable of predicting the group membership of a future individual. The process of distinguishing between groups is known as discrimination, while the prediction of group membership of future individuals is termed classification or allocation. Typically, the best way of utilising the features is to combine them in some way, i.e. to form a function of them. Discriminant function analysis then aims to find the best function for distinguishing between the groups, while formulating a classification rule will provide a means of predicting group membership. Frequently, the best function for discriminating between two groups also directly provides the best classification rule, but this is not always the case. There are many different potential discriminant functions and classification rules in any practical situation, so we need to be able to judge their performances in order to choose the best ones. The worth of a discriminant function can be assessed by any measure that estimates the separation between the groups using this function, while the worth of a classification rule is generally determined by estimating the probabilities of misclassifying future individuals with this rule. The ideas and problems can be readily generalised to situations where there are more than two groups. For example, in the earlier jaundice case, we may be interested in discriminating between the actual diseases causing the condition rather than just between the group that requires surgery and the group that can be cured by medicine. The underlying principles remain the same, but the details become more complicated. For example, in discrimination we have to be able to assess the extent of separation among all groups (or all pairs of groups simultaneously),
while in classification there are now many more potential mistakes that can be made in predicting an individual's group membership. For the purpose here, therefore, we will describe the methodology for the case of two groups and later merely indicate how the methods generalise to multiple group situations. The first attempt at formulating and tackling the problem was made by Fisher (1936), who looked for the linear combination of features that maximally separates the two groups. He tackled this by maximising, over all linear combinations, the ratio of the squared difference between the group means to the pooled within-group VARIANCE of the observations, which is equivalent to maximising the squared t-statistic that tests for a difference between the two groups. The coefficients of the features in this linear combination are given by the elements of the vector formed on multiplying the vector of mean differences of the features by the inverse of their pooled COVARIANCE MATRIX. This very simple function has become known as Fisher's linear discriminant function (LDF).
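The construction of the LDF coefficients described above can be sketched for the special case of two features. The function names and the data are hypothetical, and the classification cut-off at the midpoint of the two group mean scores is an added assumption (equal prior probabilities and equal misclassification costs), not part of Fisher's derivation.

```python
def mean_vector(rows):
    """Mean of each feature over the rows of one group."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter_matrix(rows, centre):
    """Sums of squares and products about the group mean."""
    p = len(centre)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - centre[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s

def fisher_ldf_2d(group1, group2):
    """Coefficients = inverse pooled covariance matrix times mean difference."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = scatter_matrix(group1, m1), scatter_matrix(group2, m2)
    dof = len(group1) + len(group2) - 2
    pooled = [[(s1[i][j] + s2[i][j]) / dof for j in range(2)] for i in range(2)]
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    inv = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    coef = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Assumed cut-off: score of the midpoint between the two group means.
    cutoff = sum(c * (a + b) / 2 for c, a, b in zip(coef, m1, m2))
    return coef, cutoff

def classify(x, coef, cutoff):
    """Allocate to group 1 if the discriminant score exceeds the cut-off."""
    score = sum(c * v for c, v in zip(coef, x))
    return 1 if score > cutoff else 2

# Hypothetical measurements on two features for two groups of patients.
group1 = [(1, 2), (2, 3), (3, 3), (2, 1)]
group2 = [(6, 5), (7, 8), (8, 7), (7, 6)]
coef, cutoff = fisher_ldf_2d(group1, group2)
print(classify((2, 2), coef, cutoff), classify((7, 7), coef, cutoff))
```

With more than two features the same steps apply, but the matrix inversion would normally be delegated to a numerical library.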
Fisher derived his result in a purely practical setting, using actual data rather than statistical or probability models. Welch (1939) took a theoretical approach and showed that the classification rule maximising the a posteriori PROBABILITY of correct group membership is given by comparing the ratio of probability density functions of observations in the two populations against a given threshold value. This threshold, or cut-off, value depends on the prior probabilities of observations in each of the two populations: an individual is allocated to one group if the threshold is exceeded and otherwise to the other group. Welch also showed that if the two populations were characterised by multivariate normal distributions that had a common dispersion matrix then the resultant classification rule was a linear combination of the observed features, while if the dispersion matrices differed then a quadratic function was necessary. Subsequently, the theory was extended to encompass differential costs incurred in making classification errors and applied to practical datasets. It transpires that Fisher's LDF provides the best classification rule in the case of populations having MULTIVARIATE NORMAL DISTRIBUTIONS with a common dispersion matrix and that the cut-off value for classification is a simple ratio of costs of misclassification as well as prior probabilities in the two populations (see McLachlan, 1992, for details). Fisher's derivation shows that his linear discriminant function will provide a good separation between groups for many practical situations, but the subsequent theoretical developments warn that this function may not provide good classification performance if populations are not multivariate normal with equal dispersion matrices. Alternative functions are necessary in such cases. Data arising in different contexts may have aspects that can readily be modelled by parametric distributions other than the normal,
so various functions have been derived for specific types of data. Examples include functions based on the MULTINOMIAL DISTRIBUTION for discrete feature data, on the location model for mixed discrete and continuous feature data and on distributions such as the EXPONENTIAL DISTRIBUTION, the STUDENT'S t-DISTRIBUTION and the inverse normal for continuous nonnormal feature data (see McLachlan, 1992, for details). An extreme assumption is that all features are independent, whence the class-conditional distributions can be simply estimated by products of marginal distributions. Although seemingly a totally inappropriate assumption in most practical situations, this method has had surprisingly good results on occasion (see Hand and Yu, 2001). However, approaches other than the parametric need to be sought if the resultant method is to be widely applicable. Fix and Hodges (see Agrawala, 1977, pp. 261-322) therefore initiated a stream of research into nonparametric methods. Here the data alone determine the classification rule without imposition of any distributional assumptions, so that
the methods can be applied in almost any context. Two of their ideas, which have undergone refinement but which retain their popularity, are nearest-neighbour and kernel methods. The k-nearest-neighbour (kNN) rule simply allocates an individual to the group that is in the majority among that individual's k nearest neighbours. Although simple in concept, this approach raises several obvious questions that do not have easy answers: how do we measure the 'distance' between individuals in order to determine the nearest neighbours and how do we choose an optimal value of k in a given situation? These questions have been addressed in a number of contributions to the literature. The kernel method, by contrast, is a nonparametric method of estimating the probability densities in each group via the average of a so-called kernel function evaluated at each data point (see DENSITY ESTIMATION). Once the density functions have been estimated then computing their ratio at the point to be classified leads directly to an allocation rule. Fix and Hodges proposed a crude 'kernel' based on the empirical histograms of the data. Hand (1997) contains a good overview of both the kNN and the kernel methods, while McLachlan (1992) provides more technical detail. A method intermediate between fully parametric models of the populations, in which the data are required to satisfy fairly strict assumptions, and the nonparametric approaches just described is the idea of logistic discrimination (LD). This was introduced in the early 1960s by Cox, Day, Kerridge and others, but was mainly developed in a series of papers well summarised by Anderson (1982). The idea is to estimate directly the posterior probability of group membership for an individual by a simple function of its observed features. Since this probability has to lie between zero and one, the simplest such function is logistic with argument given by a linear combination of the observed features.
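The kNN rule described above is short enough to sketch directly. The choice of Euclidean distance and of k = 3 are assumptions made purely for illustration, since the entry stresses that both choices are open questions; the function name and the data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(x, training, k):
    """Allocate x to the group in the majority among its k nearest neighbours.

    training is a list of (features, group_label) pairs.
    Distance is Euclidean here, which is an assumption, not a requirement.
    """
    neighbours = sorted(training, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training sample with two features and two groups.
training = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]

print(knn_classify((1.5, 1.5), training, k=3))
print(knn_classify((6.5, 6.5), training, k=3))
```

With two groups, an odd value of k avoids tied votes.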
It can be shown that this form of posterior probability is either exhibited or approximately exhibited by a range of different parametric models including some of those already mentioned, so that the approach potentially has wide applicability. Moreover, it only requires estimation of a small number of parameters, as opposed to some of the parametric methods that involve many parameters, so it should be simpler to apply. Once some technical difficulties had been overcome the method proved both popular and useful and can now be found alongside Fisher's LDF in many software implementations. Dramatic increases in computer power achieved towards the end of the 20th century led to an explosion of interest in computationally intensive methods. One of the first ideas in this vein was regularised discriminant analysis, where an optimal mixture of different types of discriminant function (such as linear and quadratic, for example) is sought by computationally optimising a criterion such as the cross-validated error rate (see later). This was followed by TREE-STRUCTURED METHODS such as classification and
regression trees (CART), multivariate adaptive regression splines (MARS) and flexible discriminant analysis, while overshadowing all these approaches has been the development of NEURAL NETWORKS and SUPPORT VECTOR MACHINES among the computer science community. Unfortunately, the increasing reliance on computational power in these methods has turned each process into something of a 'black box', with results simply being produced at the end of a long series of computer operations and little chance being provided for either intervention in the process or interpretation of the underlying discriminant functions. More recently, therefore, attention has been given to computer-intensive enhancements of traditional methods. One general approach is model averaging, in which rather than seeking a single 'optimal' discriminant function of a given form many different such functions are derived and their average (in some sense) is used for future predictions. MARKOV CHAIN MONTE CARLO methods fall under this heading and much research is currently under way in their applications to discriminant functions. A good account of this work is given by Denison et al. (2002). The second general approach is in the use of local models, where instead of estimating parameters just once for all regions of the sample space, they are estimated separately for many subregions. Thus, for example, the optimal number k of nearest neighbours to use in kNN classification is estimated separately for all potential points to be classified. Whichever method of discrimination or classification is chosen, a paramount consideration in practice is to obtain a reliable assessment of its performance. There are many possible measures that can be used for such assessment (see, for example, Hand, 1997, Chapter 6), but overwhelmingly the most prevalent in practice is the misclassification rate. Again, there are many possible ways of defining such a rate,
but here we will just consider the one relevant to most practitioners: given a particular classification rule formed from some sample data, what is the probability of misclassifying a future individual when using this rule? It is possible to tackle this question theoretically, by postulating a probability model for the data and then following through with a sequence of probability calculations on implied SAMPLING DISTRIBUTIONS to arrive at a final value. Such calculations, however, frequently involve heavy simplifications to achieve tractability and stand or fall according to the appropriateness of the initial assumptions about the data. They have, therefore, long since been abandoned as genuine methods to use in practice and now usually serve only as benchmarks in simulation studies of properties of new methods. Attention instead has focused on purely data-based methods of assessment of performance, using the data from which the classification rule itself is constructed. Given that the aim is to assess how often mistakes will be made in classifying future individuals and that in order to assess the accuracy of the classification we need to know the true group membership of each individual, an obvious method is to split the available sample data randomly into two portions. One portion, the training set, is used to form the classification rule, while the other, the test set, is used to assess its performance. Typically, the two sets are then combined and used to form the classification rule for actual use on future data. Such a process is known as cross-validation. Of course, it assumes that future samples are 'similar' in composition to the present ones and that the classification rule is stable over the different datasets. The former assumption is implicit in the classification procedure itself, but for the latter assumption to hold we really need large datasets. Problems arise when available samples are not large. Either the training set will be very small, so the training set classification rule may differ markedly from the final rule and the wrong rule will be assessed, or the test set will be very small, so the assessment of the rule will be poor, or both drawbacks will occur. An early attempt to solve this problem was by simply forming a classification rule from all available data and then reapplying it to the same data to assess its performance (resubstitution). However, it was soon realised that this method will provide a grossly overoptimistic assessment for small to medium samples. This is because most classification rules operate by optimising the group separation on the given data, so such a 'resubstitution' error rate represents the best achievable for the data and performance on genuinely future data will be much poorer. One possible solution is to conduct n distinct random training/test set divisions and to average the resulting error rates as the final assessment of performance. This is known as n-fold cross-validation. Taking this process to the limit means removing each single observation in turn from the data,
forming the classification rule using all the other observations and classifying the one that has been left out. The proportion of observations misclassified in this way then provides an estimate of the error rate of the rule. This is usually known as the leave-one-out cross-validation estimate. The leave-one-out approach satisfactorily corrects for the known BIAS of the resubstitution error rate, but it has been shown to have the unsatisfactory property of a high variance. Thus, although it will give (approximately) the correct estimate on average over many replicates or equivalent datasets, any single application may yield an estimate far from the true value. An alternative line of attack was therefore developed in the 1980s using the idea of bootstrapping. Here the available data values are sampled with replacement to give a BOOTSTRAP sample, which is intended to mimic the drawing of a future sample from the populations under study, and relevant measures (such as the error rate) of the bootstrap sample are computed. This process is repeated for a large number of bootstrap samples and this enables distributions of the measures to be studied. In the context of classification
error rates, many potential bootstrap corrections to the resubstitution error rate have been considered. The most popular appears to be the '632 bootstrap' estimate. A large number of bootstrap samples are generated, the classification rule is computed for each bootstrap sample and the observations not represented in that bootstrap sample are classified by the rule. If $e_a$ represents the error rate obtained in this way and $e_b$ represents the resubstitution error rate of the original data, then the 632 bootstrap error rate is given by $0.632\,e_a + 0.368\,e_b$. This appears satisfactorily to correct the optimistic bias of $e_b$. These data-based methods are applicable for assessment of any classification rule. However, the derivation of some rules itself requires assessment of error rates as part of the procedure. For example, estimation of the number k of nearest neighbours to use in kNN classification can be effected by trying all possible values of k in a given range and picking the one that produces the fewest misclassification errors. Simply quoting the resultant misclassification error rate again gives an overoptimistic assessment of performance of the method, because a parameter has been chosen to optimise such performance on the given data. The correct procedure here is to randomly divide the data into three portions: a training set, a validation set and a test set. The classification rule is formed from the training set and parameters are optimised by calculating error rates over the validation set. Having thus settled on all parameter values, final assessment of performance is conducted on the (truly independent) test set. The corresponding correction in the leave-one-out process is a nested leave-one-out: one observation is omitted and then the classification rule is formed and optimised using the remaining observations, nesting a second leave-one-out process within the first for the optimisation.
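The resubstitution, leave-one-out and 632 bootstrap estimates described above can be compared in a small sketch. Several things here are assumptions made for illustration: the nearest-group-mean rule on a single feature stands in for whatever classification rule is being assessed, the left-out errors are pooled over all bootstrap samples to give $e_a$, and the data, function names, seed and number of bootstrap samples are hypothetical.

```python
import random

def fit_nearest_mean(sample):
    """Toy rule (an assumption): remember the mean feature value of each group."""
    groups = {}
    for x, label in sample:
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}

def predict(rule, x):
    """Allocate x to the group whose mean is closest."""
    return min(rule, key=lambda label: abs(x - rule[label]))

def error_rate(rule, cases):
    return sum(predict(rule, x) != label for x, label in cases) / len(cases)

def resubstitution(sample):
    """Form the rule from all data and reapply it to the same data."""
    return error_rate(fit_nearest_mean(sample), sample)

def leave_one_out(sample):
    """Omit each observation in turn, refit, and classify the omitted one."""
    errors = 0
    for i, (x, label) in enumerate(sample):
        rule = fit_nearest_mean(sample[:i] + sample[i + 1:])
        errors += predict(rule, x) != label
    return errors / len(sample)

def bootstrap_left_out_error(sample, n_boot=200, seed=0):
    """Error rate on observations absent from each bootstrap sample (pooled)."""
    rng = random.Random(seed)
    n = len(sample)
    wrong = total = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(idx)
        left_out = [sample[i] for i in range(n) if i not in chosen]
        rule = fit_nearest_mean([sample[i] for i in idx])
        wrong += sum(predict(rule, x) != label for x, label in left_out)
        total += len(left_out)
    return wrong / total if total else 0.0

# Hypothetical single-feature data with overlapping groups.
sample = [(v, "A") for v in (1.0, 2.0, 2.5, 3.0, 4.5, 5.0)]
sample += [(v, "B") for v in (3.5, 4.0, 5.5, 6.0, 6.5, 7.0)]

e_b = resubstitution(sample)
e_a = bootstrap_left_out_error(sample)
e_632 = 0.632 * e_a + 0.368 * e_b
print(e_b, leave_one_out(sample), e_632)
```

If a tuning parameter such as k were also chosen from these data, the nested procedure described above would be needed to keep the final assessment honest.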
The omitted observation is thus only classified once all parameters of the rule have been estimated. Many of the above ideas were applied by Asparoukhov and Krzanowski (2001) in an empirical investigation of a range of different discriminant functions on binary data. Five datasets were used, of which the following four were medical: pulmonary data - 15 features to discriminate 144 patients who suffered postoperative pulmonary embolism and 246 who did not; thrombosis data - 15 features to discriminate 34 patients with embolic thrombosis from 68 patients without the condition; epilepsy data - 15 features to discriminate 81 children with craniocerebral trauma epilepsy from 48 without the condition; and aneurysm data - 17 features to discriminate 102 patients with dissecting aneurysm from 140 patients diagnosed with other similar diseases. All features were already either binary in nature or were converted to binary form, the two categories in each case being scored 0 and 1. Each dataset was subjected to a number of different discriminant functions, but the table shows the error rates using those functions mentioned earlier. Each method was assessed exclusively using its leave-one-out error rates, so the best method on each dataset is the one with the lowest error rate.
discriminant function analysis  Error rates from seven methods applied to four datasets

Discriminant procedure   Pulmonary data   Thrombosis data   Epilepsy data   Aneurysm data
Independence             0.146            0.265             0.209           0.021
Fisher LDF               0.159            0.294             0.163           0.041
Logistic                 0.159            0.255             0.217           0.050
kNN                      0.205            0.245             0.202           0.103
Kernel                   0.172            0.265             0.217           0.054
Neural net               0.156            0.245             0.186           0.070
Support vector           0.213            0.245             0.209           0.051
These results demonstrate a typical empirical finding, namely that no single method is dominant in all cases and that each method performs well on at least some if not all datasets. The independence assumption works well on the pulmonary and aneurysm sets, but badly on the epilepsy data. Fisher's LDF is the best method on the epilepsy data, but the worst on the thrombosis data. NEURAL NETWORKS, SUPPORT VECTORS and kNN classifiers are the joint 'winners' on the thrombosis data, but have mixed results on the other datasets. Therefore the message for practical applications is to try a range of potential methods before classifying individuals. Extension of all methods to the situation of multiple groups is straightforward in principle, although it may need careful computational implementation in practice. Fisher's LDF approach extends directly, by seeking the linear combination of features that maximises the ratio of between-group to within-group sums of squares. This is equivalent to maximising the F-ratio in standard ONE-WAY ANALYSIS OF VARIANCE, which is the multigroup extension of the two-group t-test for differences between the groups. The extra facet here is that more than one function results from this process; indeed, the number of functions will be the smaller of two values: the number of features present and one less than the number of groups. The resulting functions are known as canonical variates or discriminant coordinates, and plotting the original data against these functions as axes will highlight group differences pictorially (see CANONICAL CORRELATION ANALYSIS).
Welch's theoretical approach leads to classification being done via a series of pairwise comparisons, where ratios of each pair of population densities are in turn compared against cut-off thresholds until unique classification is achieved. This can be a somewhat protracted process, so recourse is usually made to one of the other methods. Logistic discrimination gives an estimated probability of group membership for each available group and allocation is made to the group with the highest probability. The kNN process gives allocation directly without any change in its definition. Kernel discrimination requires densities to be estimated separately for each group and allocation then follows directly from these densities, while all 'black box' routines deliver allocations for any number of groups. Likewise, all methods for estimating error rates naturally extend to the multigroup case. WK [See also CANONICAL CORRELATION ANALYSIS]
Agrawala, A. K. (ed.) 1977: Machine recognition of patterns. New York: IEEE Press.
Anderson, J. A. 1982: Logistic discrimination. In Krishnaiah, P. R. and Kanal, L. N. (eds), Handbook of statistics, vol. 2. Amsterdam: North Holland, pp. 169-91.
Asparoukhov, O. K. and Krzanowski, W. J. 2001: A comparison of discriminant procedures for binary variables. Computational Statistics and Data Analysis 38, 139-60.
Denison, D. G. T., Holmes, C. C., Mallick, B. K. and Smith, A. F. M. 2002: Bayesian methods for nonlinear classification and regression. Chichester: John Wiley & Sons, Ltd.
Fisher, R. A. 1936: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179-88.
Hand, D. J. 1997: Construction and assessment of classification rules. Chichester: John Wiley & Sons, Ltd.
Hand, D. J. and Yu, K. 2001: Idiot's Bayes - not so stupid after all? International Statistical Review 69, 385-98.
McLachlan, G. J. 1992: Discriminant analysis and statistical pattern recognition. New York: John Wiley & Sons, Inc.
Welch, B. L. 1939: Note on discriminant functions. Biometrika 31, 218-20.
disease clustering
These are unusual aggregations of disease that appear on maps of disease incidence (Lawson, 2006, Chapter 6). They are areas of such elevated risk that they could not have arisen by chance alone. For example, concerns about the influence of industrial installations on the health of surrounding populations have given rise to the development of methods that seek to evaluate clusters of disease around such installations. These clusters are regarded as representing local adverse health risk conditions, possibly ascribable to environmental causes. However, it is also true that for many diseases the geographical incidence of disease will naturally display clustering at some spatial scale, even after the 'at-risk' population effects are taken into account. The reasons for such clustering of disease are various. First, it is possible that for some apparently noninfectious diseases there may be a viral agent, which could induce clustering. This has been hypothesised for childhood leukaemia. Second, other common but unobserved factors/variables could lead to observed clustering in maps. For example, localised pollution sources could produce elevated incidence of disease (e.g. road junctions could yield high carbon monoxide levels and hence elevated respiratory disease incidence) or a
common treatment of diseases could lead to clustering of disease side-effects. The prescription of a drug by a particular medical practice could lead to elevated incidence of side-effects within that practice area. Hence, there are many situations where diseases may be found to cluster, even when the aetiology does not suggest it should be observed. Because of this, it is important to be aware of the role of clustering methods, even when clustering per se is not the main focus of interest. In this case, it may be important to consider clustering as a background effect and to employ appropriate methods to detect such effects. Two extreme forms of clustering can be defined. These two extremes represent the spectrum of modelling from nonparametric to parametric forms and associated with these forms are appropriate statistical models and estimation procedures. First, as many researchers may not wish to specify a priori the exact form/extent of clusters to be studied, a nonparametric definition is often the basis adopted. Without any assumptions about shape or form of the cluster, the most basic definition would be 'any area within the study region of significantly elevated risk' (Lawson, 2006, p. 104). This definition is often referred to as hot spot clustering. In essence, any area of elevated risk, regardless of shape or extent, could qualify as a cluster, provided the area meets some statistical criteria. Note that it is not usual to regard areas of significantly low risk to be of interest, although these may have some importance in further studies of the aetiology of a particular disease. Second, at the other extreme, we can define a parametric cluster form as 'the study region displays a prespecified cluster structure'. This definition describes a parameterised cluster form that would be thought to apply across the study region. Usually,
this implies some stronger restriction on the cluster form and also some region-wide parameters that control the shape and size of clusters. Nonspecific clustering is the analysis of the overall clustering tendency of the disease incidence in a study region. This is also known as general clustering. As such, the assessment of general clustering is closely akin to the assessment of spatial autocorrelation. Hence, any model or test relating to general clustering will assess some overall/global aspect of the clustering tendency of the disease of interest. This could be summarised by a model parameter (e.g. an autocorrelation parameter in an appropriate model) or by a test that assesses the aggregation of cases of disease. An example is the correlated prior distributions used in the Besag, York and Mollié (BYM) model (see SPATIAL EPIDEMIOLOGY). It should be noted at this point that the general clustering methods discussed above can be regarded as nonspecific in that they do not seek to estimate the spatial locations of clusters but simply to assess whether clustering is apparent in the study region. Any method that seeks to assess the locational structure of clusters is defined to be specific.
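To make the link between general clustering and spatial autocorrelation concrete, the sketch below computes Moran's I, one widely used global autocorrelation summary. The entry does not name this statistic, so its use here is an assumption, as are the binary adjacency weights, the function name and the example rates for four areas arranged in a line.

```python
def morans_i(values, neighbours):
    """Moran's I with binary adjacency weights.

    values: disease rate for each area (indexed 0..n-1).
    neighbours: dict mapping each area index to the indices of adjacent areas.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0          # sum of products of deviations over adjacent pairs
    weight_total = 0     # total number of (directed) adjacencies
    for i, adjacent in neighbours.items():
        for j in adjacent:
            cross += dev[i] * dev[j]
            weight_total += 1
    return (n / weight_total) * cross / sum(d * d for d in dev)

# Four hypothetical areas in a row: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1, 1, 5, 5], chain)    # high rates sit next to each other
alternating = morans_i([1, 5, 1, 5], chain)  # high and low rates alternate
print(clustered, alternating)
```

Positive values point to a general tendency for similar rates to lie close together, negative values to the opposite; neither says where any cluster is located, which is what makes the assessment nonspecific.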
An alternative nonspecific effect has also been proposed in models for tract-count or case-event data. This effect is conventionally known as uncorrelated heterogeneity (or OVERDISPERSION, or extra-Poisson variation in the Poisson likelihood case). Specific clustering concerns the analysis of the locations of clusters. This approach seeks to estimate the location, size and shape of clusters within a study region. For example, it is straightforward to formulate a nonspecific Bayesian model (see BAYESIAN METHODS) for case events or tract counts that includes heterogeneity. However, specific models or testing procedures are less often reported. Nevertheless, it is possible to formulate specific clustering models for the case-event and tract-count situation. Another definition of clustering seeks to classify the methods based on whether the location or locations of clusters are known or not. Focused clustering is specific and usually seeks to analyse the clustering tendency of a disease or diseases around a known location. Often this location could be a putative pollution source or health hazard. Nonfocused clustering does not assume knowledge of a location of a cluster but seeks either to assess the locations of clustering within a map or to assess the overall clustering tendency. Hence, nonfocused clustering could be specific or nonspecific. The literature of spatial epidemiology has developed considerably in the area of hypothesis testing and, more specifically, in the sphere of hypothesis testing for clusters (see, for example, Lawson and Kulldorff, 1999). Early developments in this area arose from the application of statistical tests to spatiotemporal clustering, a particularly strong indicator of the importance of a spatial clustering phenomenon. As noted above, distinction should be made between tests for general (nonspecific) clustering, which assess the overall clustering pattern of the disease,
and the specific clustering lests w~ cluster locatiOM an:: estimated. For case events, a few tests have been de\'eloped for nonspecific clustering. Specific nonfocusc:d cluslel' lests addn::ss the issue of the loc&lion of putative dusters. These tests prodUCIC n::sults in the form of loc&lionai probabilities or signillcances associalcd with specific groups of tract counts or eases. Openshaw and co-wcders (Openshaw el 1987) lirst developed a general method that allowed the assessment of the location of clusters ofcases within large disease maps. 111c method was based on ~peated testing of eounts or disease within circular n:gions or diffen:nt sizes. n.c statistical faundalion of this method has bc:cn criticised and an impro\ocmcnt 10 the method was proposed by Sesag and Newell (1991). An alternative statistic has bc:cn proposed by Kulldorff and Naprwalla (1995) (the scan statistic). n.c tell can be appJied to both case events and Inlet eounts. An evaluation of various tests for clustering has bc:cn made by KulldortT. 18ngo and Park (2003). Focused tests have also developed and then:: is now a range of possible tesUng
m..
procedures (for a ~ccnt evaluation see Liu. Lawson and Ma. 20(9). Cluster modelling has seen some development but has
developed as fully as testing procedun::s. Usually Ihe successful models ~ Bayesian with prior distributions describing the clustering behaviour (sec, for example. Lawson and Denison. 2002. and Lawson. 2009. Chapter 6. for m:cnt examples). AL
not
Besag, J. and Newell, J. 1991: The detection of clusters in rare diseases. Journal of the Royal Statistical Society, Series A 154, 143. Kulldorff, M. and Nagarwalla, N. 1995: Spatial disease clusters: detection and inference. Statistics in Medicine 14, 799. Kulldorff, M., Tango, T. and Park, P. J. 2003: Power comparisons for disease clustering tests. Computational Statistics and Data Analysis 42, 665-84. Lawson, A. B. 2006: Statistical methods in spatial epidemiology, 2nd edition. Chichester: John Wiley & Sons, Ltd. Lawson, A. B. 2009: Bayesian disease mapping: hierarchical modeling in spatial epidemiology. New York: CRC Press. Lawson, A. B. and Denison, D. (eds) 2002: Spatial cluster modelling. London: Chapman & Hall. Lawson, A. B. and Kulldorff, M. 1999: A review of cluster detection methods. In Disease mapping and risk assessment for public health. Chichester: John Wiley & Sons, Ltd. Liu, Y., Lawson, A. B. and Ma, B. 2009: Evaluation of putative hazard tests under background risk heterogeneity. Environmetrics 20, 260-74. Openshaw, S. et al. 1987: A mark 1 geographical analysis machine for the automated analysis of point data sets. International Journal on Geographical Information Systems 1, 335.
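The idea of repeatedly testing counts of disease within circular regions of different sizes can be illustrated with a minimal sketch. It is written in the spirit of the circle-based approach described above and is not the published algorithm of Openshaw and co-workers. The grid of people at risk, the case locations, the circle centres and radii, and the function names `poisson_upper_tail` and `circle_test` are all hypothetical and chosen only for illustration. The sketch makes no allowance for the many overlapping tests carried out, which is the kind of statistical criticism noted above.

```python
import math

def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)       # P(X = 0)
    cdf = term
    for k in range(1, observed):     # add P(X = 1) ... P(X = observed - 1)
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)

def circle_test(cases, population, centre, radius, overall_rate):
    """Count cases and people at risk inside one circle and test for an excess."""
    def inside(point):
        return math.hypot(point[0] - centre[0], point[1] - centre[1]) <= radius
    observed = sum(1 for p in cases if inside(p))
    at_risk = sum(1 for p in population if inside(p))
    expected = overall_rate * at_risk    # expected cases if risk were uniform
    return observed, expected, poisson_upper_tail(observed, expected)

# Hypothetical study region: one person at each point of a 10 x 10 grid.
population = [(x, y) for x in range(10) for y in range(10)]
# Hypothetical case locations, four of them close together.
cases = [(2, 2), (2, 3), (3, 2), (3, 3), (8, 8)]
overall_rate = len(cases) / len(population)

# Repeat the test over several (hypothetical) centres and radii.
for centre in [(2.5, 2.5), (6.5, 6.5)]:
    for radius in (1.0, 2.0):
        obs, exp, p = circle_test(cases, population, centre, radius, overall_rate)
        print(f"centre={centre} radius={radius}: observed={obs}, "
              f"expected={exp:.2f}, p={p:.2g}")
```

Because the circles overlap and many are examined, the individual p-values cannot be read as a single overall significance level; the scan statistic mentioned above is one way of dealing with this.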
disease mapping See DISEASE CLUSTERING, SPATIAL EPIDEMIOLOGY
disease surveillance See SPATIOTEMPORAL DISEASE SURVEILLANCE
DMF score See DENTAL STATISTICS
dot plot This is a useful graphical display for data on some continuous variable recorded within the categories of a particular categorical variable. An example is shown in the figure on page 144. A dot plot is generally far more effective in communicating the pattern in the data than either a PIE CHART or a bar chart, particularly if the number of categories is reasonably large. SSE

[Figure] dot plot Dot plot of standardised mortality rates for lung cancer in 25 occupational groups. Horizontal axis: SMR (60 to 140). Groups shown: Furnace, Labourers, Construction, Painters, Tobacco, Communications, Chemical, Service, Engineering, Miners, Warehousemen, Crane drivers, Woodworkers, Clothing, Leather, Electrical, Other, Textile, Printing, Sales, Clerical, Managers, Professional, Glass, Farmers.

dropout at random (DAR) See DROPOUTS
dropout completely at random (DCAR) See DROPOUTS

dropouts These are patients in a study, commonly a clinical trial, who fail to attend protocol-scheduled visits or assessments of a response variable taken after some particular time point in the study. Dropping out of a study implies that once an observation at a particular time point is missing so are all the remaining planned observations. Such missing observations are a nuisance and the very best way to avoid problems with missing values is not to have any! If only a small proportion of the patients in the trial drop out, it is unlikely that these will cause any major difficulties for analysis and so on. If, however, a substantial number of dropouts occur there is potential for making incorrect inferences and/or producing biased estimates if a valid analysis procedure is not used. In such cases consideration needs to be given to the reasons why individuals drop out and how the probability of dropping out depends on the response variable, since this has implications for which forms of analysis are suitable and which are not (Little, 1995). Three dropout mechanisms are usually differentiated based on the classification of missing values originally suggested by Rubin (1976).
Dropout completely at random (DCAR). Here the probability that a patient drops out does not depend on either the observed or missing values of the response. The observed (nonmissing) values effectively constitute a simple random sample of the values for all subjects. Possible examples include missing laboratory measurements because of a dropped test tube (if it was not dropped because of the knowledge of any measurement), the accidental death of a participant in a study or a participant moving to another area. Completely random dropout causes least problem for data analysis, but it is a strong assumption.
Dropout at random (DAR). The dropout at random mechanism occurs when the dropout process depends on the outcome measures that have been observed in the past, but given this information is conditionally independent of all the future (unrecorded) values of the outcome variable following dropout. Here 'missingness' depends only on the observed data, with the distribution of future values for a subject who drops out at time t being the same as the distribution of the future values of a subject who remains in at time t, if they have the same covariates and the same past history of outcome up to and including time t. Murray and Findlay (1988) provide an example of this type of missing value from a study of hypertensive drugs in which the outcome measure was diastolic blood pressure. The protocol of the study specified that the participant should be removed from the study when his or her blood pressure became too high. Here blood pressure at the time of dropout was observed before the participant dropped out, so although the missing value mechanism is not DCAR since it depends on the values of blood pressure, it is DAR, because dropout depends only on the observed part of the data.
Nonignorable (sometimes referred to as informative) dropout. For this final type of dropout mechanism, missingness depends on the unrecorded missing values: observations are likely to be missing when the true values are systematically higher or lower than usual. A nonmedical example is when individuals with lower income levels or very high incomes are less likely to provide their personal income in an interview. In a medical setting, possible examples are a participant dropping out of a longitudinal study when his or her blood pressure became too high and this value was not observed or when their pain became intolerable and the associated pain value was not recorded. Dealing with data containing missing values of this type is not routine.
Simple methods of analysis for longitudinal data, e.g. COMPLETE CASE ANALYSIS or SUMMARY MEASURE ANALYSIS, rely on the DCAR assumption; others, such as LINEAR MIXED EFFECTS MODELS, involve only the weaker DAR restriction. Identifying the type of dropout mechanism for a particular dataset is rarely straightforward, although some useful informal procedures are described in Carpenter, Pocock and Lamm (2002). When informative dropouts are suspected the methods suggested by Rabe-Hesketh, Pickles and Skrondal (2001) may be useful. SSE

Carpenter, J., Pocock, S. and Lamm, C. J. 2002: Coping with missing data in clinical trials: a model based approach applied to asthma trials. Statistics in Medicine 21, 1043-66. Little, R. J. A. 1995: Modeling the dropout mechanism in repeated measures studies. Journal of the American Statistical Association 90, 1112-21. Murray, G. D. and Findlay, J. G. 1988: Correcting for the bias caused by dropouts in hypertension trials. Statistics in Medicine 7, 941-6. Rabe-Hesketh, S., Pickles, A. and Skrondal, A. 2001: GLLAMM: a class of models and a STATA program. Multilevel Modelling Newsletter 13, 17-23. Rubin, D. B. 1976: Inference and missing data. Biometrika 63, 581-92.

dummy variables These comprise a set of variables, each with only two possible outcomes, that are used to represent a categorical explanatory variable in a statistical model that is designed to handle quantitative variables. Usually, the two outcomes allowed for each dummy variable are taken to be zero and unity. Each dummy variable takes the value zero when the parent categorical variable attains its predefined reference, or base, level. When the categorical variable attains any other level, one unique dummy variable takes the value unity; the rest take the value zero. A categorical variable with k outcomes will require k - 1 dummy variables to represent it.
For example, consider a situation where the hypothesis is that blood pressure is different between those who currently smoke, used to smoke (but have since quit) and have never smoked. Assuming that suitable data have been collected, the hypothesis could be tested by fitting an ANALYSIS OF VARIANCE model with smoking group as the explanatory variable and blood pressure as the dependent variable. This explanatory variable would have three levels (current/ex/never) and thus two DEGREES OF FREEDOM. An alternative approach would be to define two dummy variables to represent the smoking group, say:

$$x_1 = \begin{cases} 1 & \text{for ex-smokers} \\ 0 & \text{otherwise} \end{cases} \qquad x_2 = \begin{cases} 1 & \text{for current smokers} \\ 0 & \text{otherwise} \end{cases}$$

(see the table) and then fit a MULTIPLE REGRESSION model with $x_1$ and $x_2$ as the explanatory variables. The regression sum of squares (with two degrees of freedom) in this regression model would be the same as that for the variable 'smoking group' in the earlier analysis of variance model. Both $x_1$ and $x_2$ are contrasts between, respectively, ex- and never and between current and never smokers. In this sense, never smokers are the reference group for analyses of smoking status.

dummy variables Dummy variables ($x_1$ and $x_2$) used to represent smoking status for 14 people

Smoking status    x1   x2
Current smoker    0    1
Ex-smoker         1    0
Ex-smoker         1    0
Current smoker    0    1
Never smoker      0    0
Never smoker      0    0
Never smoker      0    0
Current smoker    0    1
Never smoker      0    0
Current smoker    0    1
Ex-smoker         1    0
Ex-smoker         1    0
Never smoker      0    0
Ex-smoker         1    0

Dummy variables are useful in that they remove the necessity to develop separate statistical models for categorical and continuous explanatory variables. They also allow variables of mixed type to be handled within the same, single methodology. Some computer software requires the user to define dummy variables for themselves, whereas others do the computation automatically once the particular variable for which dummy variables are required has been declared as a categorical variable. MW
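The equivalence between the regression on two dummy variables and the one-way analysis of variance can be checked numerically. The following is a minimal sketch: the 14 smoking statuses follow the table in this entry, but the blood pressure values are invented purely for illustration, and the helper names `dummy_code` and `solve` are hypothetical. Least squares is carried out by solving the normal equations directly so that the sketch needs no external library.

```python
statuses = ["current", "ex", "ex", "current", "never", "never", "never",
            "current", "never", "current", "ex", "ex", "never", "ex"]
# Hypothetical blood pressures (mmHg), invented for illustration only.
bp = [142, 131, 128, 150, 120, 118, 125, 139, 122, 147, 133, 127, 119, 130]

def dummy_code(status, levels=("ex", "current")):
    """Return (x1, x2); the reference level 'never' is coded (0, 0)."""
    return tuple(1 if status == level else 0 for level in levels)

# Design matrix rows: intercept, x1, x2.
X = [(1,) + dummy_code(s) for s in statuses]
x1 = [row[1] for row in X]
x2 = [row[2] for row in X]

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

# Normal equations (X'X) beta = X'y for the multiple regression.
p = 3
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * y for row, y in zip(X, bp)) for i in range(p)]
beta = solve(xtx, xty)

grand_mean = sum(bp) / len(bp)
fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
ss_regression = sum((f - grand_mean) ** 2 for f in fitted)

# Between-groups sum of squares from the one-way analysis of variance.
groups = {}
for s, y in zip(statuses, bp):
    groups.setdefault(s, []).append(y)
ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                 for v in groups.values())

print("intercept (never-smoker mean):", round(beta[0], 3))
print("ex vs never:", round(beta[1], 3), " current vs never:", round(beta[2], 3))
print("regression SS:", round(ss_regression, 3),
      " between-groups SS:", round(ss_between, 3))
```

In this coding the intercept estimates the mean for never smokers and each slope estimates a difference from that reference group, which is why the two sums of squares agree.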
dynamic designs See DATA-DEPENDENT DESIGNS
E

East This software package allows the design and analysis of sequential trials. The name 'East' is derived from 'Early stopping'. As an alternative to standard, fixed-sample CLINICAL TRIALS, flexible clinical trials utilise group-sequential and adaptive methodologies to permit interim looks at accruing data with a view to making early stopping or sample size readjustment decisions, while preserving Type I error and POWER. Jennison and Turnbull (2001) provide a thorough treatment of sequential design and analysis. These methods have been incorporated into the East software package developed by Cytel Software Corporation (www.cytel.com). East has three basic components: a design module, a simulation module and an interim monitoring module.
The design module can be used to design two-arm superiority or noninferiority trials of normal, binomial or time-to-event ENDPOINTS. Extensions to more general endpoints are available through the use of an inflation factor that increases the sample size of a fixed-sample design by the appropriate amount so as to preserve power. A special design worksheet is provided for designing studies on the basis of maximum information rather than maximum sample size. Such designs provide the flexibility to adjust the sample size during the interim monitoring phase, to accommodate adjustments to important design parameters, such as patient-to-patient variability, that might have been misspecified at the design stage. Many families of stopping boundaries and spending functions are available, thus providing great flexibility for making early stopping decisions either for efficacy or futility or both.
Trials created by the design module can be simulated in the simulation module. Since the statistical theory underlying East utilises large-sample assumptions, the simulation module is a useful tool for verifying that the operating characteristics of the design are preserved for small or unbalanced studies. A special feature of the simulation module is the capability to simulate adaptive designs.
The interim monitoring module includes a worksheet that accepts the current value of the test statistic and the current sample size or information. It then recomputes the stopping boundaries based on the specified spending functions, determines if the boundary has been crossed and provides important interim results such as conditional power and repeated CONFIDENCE INTERVALS. East additionally provides tables, graphs and reports that allow investigators to visualise and clearly demonstrate the features and results of the planned design. For example, one can plot stopping boundaries as a function of time or present the power calculations in graphical or tabular form.

Jennison, C. and Turnbull, B. 2001: Group sequential methods with applications to clinical trials. New York: Chapman & Hall/CRC.
EBM See EVIDENCE-BASED MEDICINE
ecological fallacy See EPIDEMIOLOGY
ecological studies See EPIDEMIOLOGY
effectiveness See ADJUSTMENT FOR NONCOMPLIANCE IN CLINICAL TRIALS
efficacy See ADJUSTMENT FOR NONCOMPLIANCE IN CLINICAL TRIALS
eligibility criteria See INCLUSION AND EXCLUSION CRITERIA
EM algorithm This is a general computational method for calculating MAXIMUM LIKELIHOOD ESTIMATIONS with incomplete data, e.g. MISSING DATA or data containing CENSORED OBSERVATIONS. The algorithm is based on the notion that if we had the missing or censored observations we could estimate parameters of interest in the usual way and that if we knew the parameters we could impute the missing observations by setting them to their predicted values under the model. Consequently, given some initial values for the parameters we can proceed iteratively between computing predicted values, filling in the missing observations and then estimating the parameters using new 'complete data'. The algorithm is widely used in statistics and a detailed technical account is given in Laird (1998). BSE

Laird, N. 1998: EM algorithms. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd.
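The alternation between filling in incomplete observations and re-estimating the parameters can be sketched for one simple case: estimating the mean of an exponential survival distribution when some times are right-censored. This model, the toy data and the function name `em_exponential_mean` are assumptions made for illustration and are not taken from Laird (1998). For the exponential distribution, the predicted value of a time censored at c is c plus the current estimate of the mean, so the two steps are particularly simple.

```python
def em_exponential_mean(times, censored, theta=1.0, tol=1e-10, max_iter=1000):
    """Estimate the mean of an exponential distribution from right-censored data.

    times    -- observed or censoring times
    censored -- True where the corresponding time is a censoring time
    theta    -- initial value for the mean
    """
    for _ in range(max_iter):
        # Fill in each censored time with its predicted value under the
        # current parameter: E[T | T > c] = c + theta (memoryless property).
        completed = [t + theta if c else t for t, c in zip(times, censored)]
        # Re-estimate the mean from the 'complete data' in the usual way.
        new_theta = sum(completed) / len(completed)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

# Hypothetical data: six subjects, two of them censored at time 5.0.
times = [2.0, 3.5, 1.2, 5.0, 4.1, 5.0]
censored = [False, False, False, True, False, True]

estimate = em_exponential_mean(times, censored)
# For this model the maximum likelihood estimate has a closed form
# (total time at risk divided by the number of observed events),
# which gives a check on the iterative answer.
closed_form = sum(times) / censored.count(False)
print(estimate, closed_form)
```

In this example the iterations could be avoided because a closed form exists; the value of the algorithm lies in problems where no such direct solution is available.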
endpoints These describe a measurement or discrete event, related to the disease under investigation, which measures the effectiveness of an intervention. The definition of suitable endpoints depends on the disease under study. In serious diseases, such as coronary heart disease, endpoints such as 'mortality' provide a reliable measure of disease progression, while after curative surgery for cancer, 'recurrent cancer' would be an example of such a measure. These examples are 'binary' in nature (i.e. subjects have either had the event or not), but in some diseases it is more appropriate to measure disease progression in terms of a continuous outcome
measure (e.g. blood pressure) or an ordinal scale (e.g. a measure of the quality of life, such as the SF-36).
When designing a randomised trial (see CLINICAL TRIALS), it is usual to define primary and secondary endpoints on which judgements about the overall benefits and harms of treatment are to be made. The 'primary endpoint' (or 'target variable') is chosen as the chief measure of the effects of an intervention, on which analyses are to be conducted in order to assess the primary hypothesis originally stated in the study protocol (see PROTOCOLS IN CLINICAL TRIALS). Generally, the primary endpoint is a measure that is expected to be influenced favourably by the intervention, i.e. it is a measure of clinical efficacy. For example, based on long-term randomised trials of aspirin conducted among patients who had survived a heart attack, it was anticipated that aspirin would reduce the clinically important endpoint 'vascular mortality' in the first few weeks after a suspected acute heart attack, so this was the choice of the primary endpoint in the Second International Study of Infarct Survival (ISIS-2 Collaborative Group, 1988). Occasionally it may be appropriate for a primary endpoint to measure safety; e.g. if we were to compare a higher versus a lower dose of aspirin, the known pharmacology of aspirin might lead us to hypothesise that both regimens would have similar clinical efficacy, but that serious bleeding might be less frequent with the lower aspirin dose (Antithrombotic Trialists' Collaboration (ATC), 2002). In these circumstances, we might be advised to choose 'bleeding requiring transfusion' as a primary safety endpoint.
As well as defining a primary endpoint in the study protocol, together with the planned method of statistical analysis of that primary endpoint, it is usual to define a number of 'secondary endpoints'. These are measures that, when considered with the primary endpoint, provide helpful additional information about the clinical efficacy and safety of the intervention. In the ISIS-2 trial, for example, an assessment of the overall benefits and harms of aspirin (and of streptokinase) was facilitated by assessing the effects of these treatments on secondary endpoints such as re-infarction, haemorrhagic stroke and bleeds requiring transfusion (ISIS-2, 1988). In view of the fact that the aim of assessing effects on secondary endpoints is to make sound clinical judgements about the balance of benefit and harm, it is imperative to limit the number of the secondary endpoints. If too many are chosen, the likelihood that an ineffective treatment might appear to influence the risk of one or more such endpoints (i.e. a TYPE I ERROR) increases and it is difficult to interpret any data that result.
In circumstances where a number of clinically important endpoints are likely to be influenced favourably by a treatment, it may be useful to define a 'composite' (or 'combined') endpoint, where a subject is considered to have reached this endpoint if they experience one (or more) of the component endpoints. When a treatment has similar effects on each of the components of a composite endpoint, use of that composite endpoint increases the statistical power not only of the primary analysis but more especially of subgroup analyses aiming to assess whether the effects of treatment differ importantly among selected categories of patients of clinical interest. For example, to assess antiplatelet therapy for the prevention of occlusive vascular disease, which is a systemic disease affecting both cardiac and cerebral arteries, the composite endpoint 'myocardial infarction, stroke or vascular death' has been found to be useful, particularly when assessing effects in particular groups of patients (such as men and women, young and old, etc.) (ATC, 2002; CAPRIE Steering Committee, 1996). Several problems arise, however, when a composite endpoint is designed to assess a 'global effect' of treatment by grouping together the major anticipated benefits and harms. In these circumstances, an estimated effect on the global composite outcome that is not statistically significant could reflect a worthwhile benefit masked by a harm, and this would be entirely missed unless the benefits and harms are considered separately in a trial or META-ANALYSIS that is large enough to assess the effects on both.
In circumstances when it is not practical to assess the effects of treatment on clinical endpoints, it is possible to consider the use of a SURROGATE ENDPOINT for assessing treatment effects. In the past, however, many promising surrogate endpoints have proved unreliable. For this reason, it is inappropriate to rely on a nonvalidated surrogate endpoint to assess the effects of a drug that is to be used for a common and serious disease.
The choice of endpoints and the method by which they are to be analysed should be set out clearly in the study protocol. It is important that this choice is given careful thought, since it may often determine whether a trial succeeds in answering a clinically useful question, and it is difficult to change the choice of endpoints after the trial has closed without the risk of introducing serious bias. CB

Antithrombotic Trialists' Collaboration (ATC) 2002: Collaborative meta-analysis of randomised trials for prevention of death, myocardial infarction, and stroke in high-risk patients. British Medical Journal 324, 71-86. CAPRIE Steering Committee 1996: A randomised, blinded, trial of clopidogrel versus aspirin in patients at risk of ischaemic events. The Lancet 348, 1329-39. ISIS-2 (Second International Study of Infarct Survival) Collaborative Group 1988: Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17 187 cases of suspected acute myocardial infarction: ISIS-2. The Lancet ii, 349-60.
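The warning in this entry about choosing too many secondary endpoints can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that the treatment is truly ineffective, that each secondary endpoint is tested separately at the same significance level and that the endpoints are statistically independent; the function name `chance_of_false_positive` is hypothetical. Endpoints in a real trial are usually correlated, so the figures are indicative only.

```python
def chance_of_false_positive(n_endpoints, alpha=0.05):
    """Probability that at least one of n independent tests is 'significant'
    at level alpha when the treatment has no effect on any endpoint."""
    return 1.0 - (1.0 - alpha) ** n_endpoints

# How the chance of at least one spurious finding grows with the
# number of secondary endpoints examined.
for k in (1, 3, 5, 10, 20):
    print(k, round(chance_of_false_positive(k), 3))
```

Under these assumptions the chance of at least one spurious 'effect' rises steadily as endpoints are added, which is the reason for keeping the list of secondary endpoints short.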
entry criteria See INCLUSION AND EXCLUSION CRITERIA
EPI INFO See CASE-CONTROL STUDIES, STATISTICAL PACKAGES
epidemiology This is the study of disease and its risk factors (Clayton and Hills, 1993; Ashton, 1994; Stolley and Lasky, 1995; Woodward, 2005; Rothman, Greenland and Lash, 2008). Originally used to describe the study of epidemics, it now encompasses noncommunicable as well as infectious diseases. Simple examples would be an analysis of the trends over time in the number of cases of AIDS recorded in a certain country and a comparison of the age-specific death rates due to AIDS between countries. More complex examples would be an evaluation of the proximity of nuclear power stations to the homes of people diagnosed with leukaemia and an investigation into possible association between regular consumption of fast foods and subsequent development of cardiovascular disease.
Many of the major public health issues of the day have been the subject of scrutiny by epidemiologists, whose careful amassing of facts and figures, and subsequent statistical analyses, have demonstrated a strong case for causality. One of the first examples was that of John Snow, who used 19th century field data to demonstrate that cholera cases were clustered around the Broad Street water pump that was used by households around London's Golden Square (see HISTORY OF MEDICAL STATISTICS). His data fitted in with a theory, not generally accepted at the time, of the disease being carried by polluted water and with the practice of emptying sewerage into water that made its way to the well which the pump served (Chave, 1958).
Whenever there is a new, unexplained, health problem, epidemiologists will usually be charged with plotting the course of the problem, sifting the available information in order to uncover the natural history and likely cause and helping to devise plans for avoidance of the problem in future. Examples would be the so-called 'Gulf War Syndrome' attributed to surviving soldiers from the Gulf War of 1990-1991 and the outbreak of SARS in 2002-2003, which was particularly severe in Asia and Canada (see the first figure). Epidemiologists tend to make considerable use of statistical methodology, but the profession requires a substantial level of medical knowledge also. In recent years, specialist areas of epidemiology have emerged, such as GENETIC EPIDEMIOLOGY, where epidemiological tools are applied to genetic data.
[Figure] epidemiology Number of new SARS cases, by week, in Canada in 2003 (WHO daily reports, with interpolation and extrapolation). Horizontal axis: month, February to July.
Epidemiological studies form one of the two major types of study design in medical research, the other being CLINICAL TRIALS. The distinction between the two is that epidemiological investigations are observational whereas clinical trials involve interventions. For example, a large group of middle-aged women may be monitored for several years and the proportions of women who develop venous thrombosis compared between those women who did and those who did not use hormone replacement therapy (HRT) when the study began. This would be an epidemiological study. A corresponding clinical trial might involve taking a group of women who have never used HRT, but for whom there are no medical reasons why they should not take HRT. Some would be allocated to receive HRT and some to receive a placebo; both groups would be followed over time to make the same comparison as above.
The great advantage of the epidemiological approach would be that results are obtained in real-life circumstances. Furthermore, there are ethical advantages compared with the approach of deciding who receives the factor of interest, even when the decision is made using some chance mechanism, as should be the case, whenever possible, in clinical trials. For example, it is unlikely that a clinical trial would be allowed that sought to study the effects of cigarette smoking, since it would be unethical to ask people to smoke. Yet some of the classic epidemiological studies have compared smokers and nonsmokers for their chance of disease. On the other hand, clinical trials (when feasible) generally offer more reliable results because they can be designed to minimise the effect of confounding, which is a common source of BIAS in epidemiological studies. For example, perhaps women who take HRT tend to be more educated than those who do not and hence the relative effects seen cannot necessarily be attributed only to HRT in an epidemiological study. In the corresponding clinical trial, the two groups could be balanced in terms of the level of education achieved.
In making the distinction between epidemiological studies and clinical trials, it should be understood that the practising epidemiologist will make use of information from clinical trials wherever it contributes to their subject of interest. Indeed, many intervention studies are conducted by people who would consider themselves to be epidemiologists, who are, quite sensibly, using the best tools available for the job at hand. Furthermore, a randomized clinical trial will often be considered as the ultimate test of the epidemiological theory. For example, many epidemiological studies have found high consumption of food that is rich in certain vitamins, specifically those found in fruit and vegetables, to be protective against heart disease. This has given rise to the epidemiological hypothesis that these vitamins protect the heart, and thus to several clinical trials of vitamin supplementation. At the time of writing, the combined evidence from these trials is that such supplementation has no beneficial effect, leaving
open the question of whether the findings in epidemiological studies are simply due to confounding.
Epidemiological investigations might simply be examinations of routinely collected data, such as registrations of death by cause, to search for seasonal patterns (as in the first figure) or differences in disease incidence by regions of the country, or examinations of cases of disease, to look for common factors or clusters of cases in time and space (see also DISEASE CLUSTERING). Such information identifies specific health problems and helps to formulate theories about the potential aetiology of the particular medical condition. The latter issue is sometimes addressed, using routine data, in a formal way through ecological studies. These are studies of data on average values of disease outcome and risk factor status from groups of people, typically those who live in specific regions. For example, St Leger, Cochrane and Moore (1979) plotted mortality from coronary heart disease (CHD) per thousand men aged 55-64 years in 18 countries against wine consumption (from industry sources) in the same countries (see the second figure). The data seemed to suggest an inverse relationship; e.g. France had the highest wine consumption and the lowest CHD rate.
[Figure] epidemiology Death rates for men aged 55-64 years old in 1970 against the logarithm of wine consumption in 18 countries (St Leger, Cochrane and Moore, 1979). Horizontal axis: logarithm of wine consumption (litres per person per year). Labelled points include Finland, USA, Scotland, New Zealand, Australia, Canada, England and Wales, Ireland, Norway, Sweden, Belgium, Austria, West Germany, France, Italy and Switzerland.

Routine, or other types of existing (secondary), data are clearly relatively cheap and easy to collect, and will often be considered authoritative when derived from government sources or from international organisations, such as the World Health Organisation (WHO). However, often they are
iacomplele (such as when death rqiSlntions alone an: used to examine morbidity). inadequate (such 8!i wheD total numben an: n:canlcd but not numbers within impartaDl demographic subgroups) and out of date. Ecological studies. used to investipte associations. oRen suITer from mismatching of the paups in the two data series. those ror disease and risk ractar. For iDsIance. in the study or Sll..eger. Cochrane and Moon: (1979) deaths wen: for a particular agelsex group whc:rals wine consumption was feX' Ihe en~ papulation. In IDDSl cases they olTer little~ ell' 110. oppoltUllity to conllVl rar confounding. Furthermore, Ihe", is no raISOII why ",Ialionships observed ror groups should bold wilen individuals are observed - the ~Ied ecolo,im/tlIItJc,". Thus. il could be that. while France cIoc:s generally have a low nle or heart disc:ase and a mgh consumption or wine, Ihase Fn:nchmen who drink .elatively little an: the ones who suITer II10SI CHD. Ecological slUclies should be seen as hypothcsis-gcru:nling tools rather than ways of deriving definitive information on .elalionships between risk factors and disease. Although manyc:cological SlUcIies may well give fallacious ",suits. they may also be the: lint clue to an lISIOCialion that was not pn:viously discussc:cl. 51 Lqer. Cochraae and Moon: (1979) is a nice example or this: their dc:monslnlliaa that ",I&lively low wine amsUlllption mighl be prok:Clive DJBinst heart auac:k was initially mel willa scorn and was usually assumed to be Ihc: n:suIt of bias orCIOnfounding. Howe~~ subsequent ",sean=h. using mare .eliable epidemiological study designs (sec: below). caafirmc:d their hypothesis. which is now comIIlOnly accepted. Even so. the appaI'I:nt ~laIionship could have bee. spurious - far instance variations in Ihe way heart disease is diagnosed across countries and confounding with other aspc:cts of Ihe diet besides wine: lDight have explained away the invene association between wiDe and heart disease. 
The ecological design generally cannot delve deeply enough to uncover such subtleties: collection of data from individuals is required. There are three main types of epidemiological study that involve collection of new information from individual people: CROSS-SECTIONAL STUDIES (surveys), CASE-CONTROL STUDIES and COHORT STUDIES. Surveys are called cross-sectional because they occur at a single point in time (see the third figure on page 151). Generally, they involve the drawing of a representative sample of the entire population, although very occasionally the entire population is included, in which case a census has occurred. Surveys have the advantage, over using routine data, that the investigator can collect precisely the information required for the subsequent analysis, within practical constraints. They are particularly useful for descriptive purposes. Epidemiological surveys typically include questions about disease states and levels of risk factors for these diseases. The answers can be used to estimate prevalence of disease and the distribution of the risk factors. For instance, a national
population survey in Scotland (Tunstall-Pedoe et al., 1997) included taking blood samples from all participants, from which serum cholesterol was measured. The results gave a picture of the distribution of cholesterol in Scotland at that time, allowing (for instance) an estimate of the number and percentage of Scots whose cholesterol level was above that considered 'safe' and thus were likely to benefit greatly from cholesterol-lowering treatment. Surveys can be made more accurate by using random sampling (to reduce bias), using sensible stratification (to increase precision) and taking a larger sample size (also to increase precision). The latter point needs some qualification: there may be no benefit from increasing sample size if this leads to increased bias error, for instance through taking less time, per subject, to ensure that accurate responses are solicited from questioning. Cluster sampling is often used for convenience, or simply to reduce costs, but does have the unfortunate effect of decreasing precision, measured, for example, by the width of the CONFIDENCE INTERVAL for the estimate obtained from the survey (e.g. the mean cholesterol).

Surveys have limited use in investigating CAUSALITY in associations because they are prone to the 'chicken and egg' effect - it is difficult, often impossible, to ascertain whether the observed value of the risk factor was a precursor, or a consequence, of the observed disease state. For instance, the Scottish survey described above included the question, 'Have you ever been told by a doctor that you suffered from a heart attack?' Comparing average cholesterol levels between those who did and did not report having had a heart attack gives a simple indication of whether cholesterol is associated with having had a heart attack. However, any such conclusion of causality may be spurious.
A high cholesterol reading today could be the consequence of a recent heart attack, instead of the hypothesised effect of relatively high cholesterol increasing the risk of a heart attack. Another example would be a survey where a particular chronic disease is found to be less common among those who smoke. This may not mean that smoking tends to protect people against the disease; instead it may be that smokers, having developed the disease, give up the habit, thus leading to a predominance of the disease among those not smoking at the time of the survey. Although it may be possible to discover when the risk factor was first encountered (e.g. when smoking was taken up), the reliability of such information, often requiring long-term recall, may be poor. In any case, it is rare to be able to fix a time for the onset of the disease. Thus, surveys are not suited for investigations of causality, which are commonly the purpose of epidemiological investigations.

More reliable information on causality can be gleaned from a case-control study. A set of cases, people with the disease of interest, are identified - e.g. through hospital records - and the putative risk factor(s) of interest is (are) recorded for each case. In parallel, a contrasting set of
controls, those without the disease, are selected and submitted to the same investigations as the cases. Comparison of the risk factor levels between cases and controls enables the risk factor-disease association to be assessed, usually measured by the ODDS RATIO. In principle, case-control studies compare disease status now with risk factor levels in the past, which is why they are often called retrospective studies (see the third figure). However, this is only really true if incident (new) cases are used, and even then certain potential risk factors (for instance markers of inflammation) might be elevated so soon after the disease hits that the 'chicken and egg' problem may occur. Even if this type of bias is not an issue, case-control studies are very susceptible to other kinds of bias, such as that caused by differential quality of information from cases compared with controls, the former group perhaps being of greater clinical interest to study investigators and hence observed or researched more thoroughly than controls. Berkson's bias may occur in certain case-control designs (see BERKSON'S FALLACY). Careful matching of controls to cases can increase precision, but still leaves the potential for several important sources of bias.
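As a minimal sketch of how the odds ratio mentioned above is obtained from a case-control study, the function below takes the four cell counts of the exposure-by-disease table. The counts are hypothetical, the function and argument names are assumptions of this example, and the confidence interval uses the common large-sample approximation on the log scale, which is one of several possible choices and is not prescribed by the text.

```python
from math import exp, log, sqrt

def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio with an approximate 95% confidence interval.

    The interval is exp(log(OR) +/- z * SE), where SE is the square root
    of the sum of reciprocal cell counts (a large-sample approximation).
    """
    estimate = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    se_log = sqrt(1 / exposed_cases + 1 / unexposed_cases
                  + 1 / exposed_controls + 1 / unexposed_controls)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed).
estimate, lower, upper = odds_ratio(40, 60, 20, 80)
print(f"OR = {estimate:.2f}, approximate 95% CI {lower:.2f} to {upper:.2f}")
```

An odds ratio above 1 with an interval excluding 1 would suggest an association between exposure and disease, subject to all the sources of bias discussed in this entry.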
[Figure: schematic with panels for the cross-sectional study (survey), case-control study and cohort study; labels: risk factors, disease (yes/no), all data, baseline risk factors, follow-up]
epidemiology Schematic comparison of the three major study designs in epidemiology

NESTED CASE-CONTROL STUDIES, on the other hand, are much less prone to bias error, as is also the case for the related case-cohort design. These are prospective studies based on the retrospective case-control concept. Although case-control studies are not the most reliable sources of information on causality for most epidemiological relationships, the case-control design is the design of choice in two situations. One is where the disease is so rare that any other kind of design is unlikely to produce enough cases of disease to obtain reliable estimates of association. The other is where the risk factor is transient, such as an outbreak of food poisoning. For some transient risk factors, a case-crossover study, where a case serves as his or her own control, might be advantageous. An example of this is a study of drivers involved in road traffic accidents where mobile (cell) telephone use just before the crash was compared with mobile phone use in equivalent
periods when the subject had no crash (McEvoy et al., 2005). Since drivers are their own controls, this design automatically controls for nontransient characteristics of the driver (e.g. age and sex) that may affect the risk of a crash.

In other situations the epidemiological design of choice is the cohort study. In the typical situation, a large group of people are surveyed at a point in time, or at least over a limited number of months, and several putative risk factors recorded. Over succeeding years (the follow-up) instances of disease and death may be recorded and related to the levels of a risk factor at initiation (baseline). Thus, cohort studies are said to be prospective (see the third figure). For example, after the initial cross-sectional study of the Scottish population mentioned above, the investigators arranged for any hospital admissions for coronary disease and deaths experienced by any of their sample (now called the study cohort) to be recorded. Tunstall-Pedoe et al. (1997) describe the relationship between coronary disease and 27 different risk factors for this study over a 7-year period.

The advantage of the cohort study over the case-control study is that the time sequence of risk factor preceding disease can be established, strengthening the argument for causality. For instance, Tunstall-Pedoe and colleagues were able to conclude that high cholesterol levels tended to be followed by heart attacks, rather than the other way round. Another advantage is that the cohort study can be used to investigate several risk factors and several diseases, whereas a case-control study is restricted to a single disease, that which defined the cases. One disadvantage is that they are time consuming, since it may take many years for enough cases of disease or death to occur to enable reliable estimation. Another is the likelihood of withdrawals, a special case of censoring, which is generally dealt with, at the analysis stage, through survival models.
This leads to adoption of the hazard ratio as the measure of relative chance of disease or death in most cohort studies. Provided that pre-existing cases of disease at baseline are excluded, incident disease is measured in a cohort study. Cohort studies can be made more informative by re-measuring the cohort after baseline, in which case they are often called longitudinal studies (see LONGITUDINAL DATA). The repeat measurements may be used to obtain a more accurate picture of the true association between the risk factor (which will often change over time: e.g. smokers may quit during follow-up) and the disease outcome, perhaps through the use of mixed effects models. Sometimes only a subsample of the cohort is re-measured to enable correction of REGRESSION DILUTION BIAS.

A major question in many epidemiological investigations is whether it is reasonable to conclude that the hypothesised risk factor causes the disease in question (see also CAUSALITY). This issue was addressed by one of the pioneers of medical statistics and modern epidemiology, Sir Austin Bradford Hill. In 1965 he proposed a set of principles that describe ideal
conditions for verifying a risk factor hypothesis (see also BRADFORD HILL'S CRITERIA). Some of these principles have been criticised as vague or impractical and often have been misinterpreted as rules, rather than guidelines. However, their use is widespread, if sometimes unconsciously. The principles include there being: a strong association between the putative risk factor and the disease (e.g. a large relative risk); consistency of the association in different settings (e.g. time and place); reversibility of the association (e.g. if the risk factor is removed the disease should have less likelihood of occurring); evidence of the risk factor preceding the disease (rather than vice versa); evidence of a biological gradient (a dose-response effect, meaning the more the risk factor, the more the disease); biological plausibility for the hypothesis (even if the mechanism is not yet understood); and lack of an alternative explanation for the observed association (e.g. confounding). If all these principles hold, most epidemiologists would accept that there is truly good evidence to conclude causality.

In real life, the situation is often less clear-cut and information on certain principles may be lacking or impossible to collect, so that judicious application of Bradford Hill's framework is required. In addition, their use has changed over time. For instance, relative risks of much lower levels than Bradford Hill seems to have envisaged are now routinely interpreted as supporting a causal hypothesis, for instance in debates over the effects of passive smoking. Conversely, the principle of consistency is now given more importance than in Bradford Hill's time due to the growth in popularity of META-ANALYSIS, as well as the developments in communication of research findings. Still the bottom line remains: anyone undertaking an epidemiological investigation would do well to judge their work against Bradford Hill's principles before attempting to ascribe causality.
Two important features of Bradford Hill's principles are the acknowledgment that data alone are insufficient (medical or other biological knowledge is crucial) and that final conclusions can only be drawn with comprehensive meta-analyses. Single epidemiological studies may fail to find a significant relationship due to small numbers, confounding effects or other biases. It is only by considering a range of studies that sensible conclusions may be drawn, not only because of the reliability of estimation afforded by large numbers, but also because variations in results might be explained by comparing results with study characteristics, such as by using META-REGRESSION. When epidemiological results are combined in meta-analysis it is usual to restrict use to case-control and cohort studies, because of the myriad biases inherent in ecological and cross-sectional designs. Even then, when there is a sufficient number of cohort studies, the tendency is to draw final 'best evidence' conclusions from the cohort studies (and, where available, nested case-control studies) alone. MW
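To make concrete how combining studies improves the reliability of estimation, the following sketch pools log relative risks by inverse-variance weighting. This fixed-effect method is an assumption chosen for illustration; the entry does not prescribe a particular pooling method. The three study estimates and standard errors are hypothetical, as is the function name.

```python
from math import exp, log, sqrt

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical cohort studies: relative risks with standard errors of log(RR).
log_rr = [log(1.5), log(1.2), log(1.8)]
se = [0.20, 0.10, 0.30]

pooled_log_rr, pooled_se = pool_fixed_effect(log_rr, se)
pooled_rr = exp(pooled_log_rr)
print(f"pooled RR = {pooled_rr:.2f}, SE of log(RR) = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which large combined numbers give more reliable estimation. A fixed-effect analysis of this kind assumes the studies estimate a common effect and does nothing to remove bias within the individual studies.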
Ashton, J. (ed.) 1994: The epidemiological imagination. Buckingham: Open University Press.
Bradford Hill, A. 1965: The environment and disease: association or causation? Proceedings of the Royal Society of Medicine 58, 295.
Chave, S. P. W. 1958: John Snow, the Broad Street pump and after. The Medical Officer 99, 347-9.
Clayton, D. and Hills, M. 1993: Statistical models in epidemiology. Oxford: Oxford University Press.
McEvoy, S. P., Stevenson, M. R., McCartt, A. T., Woodward, M., Palamara, P. and Cercarelli, R. 2005: Role of mobile phones in motor vehicle crashes resulting in hospital attendance: a case-crossover study. British Medical Journal 331, 428.
Rothman, K. J., Greenland, S. and Lash, T. L. 2008: Modern epidemiology, 3rd edition. Philadelphia, PA: Lippincott Williams and Wilkins.
St Leger, A. S., Cochrane, A. L. and Moore, F. 1979: Factors associated with cardiac mortality in developed countries with particular reference to the consumption of wine. Lancet i, 1017-20.
Stolley, P. D. and Lasky, T. 1995: Investigating disease patterns. The science of epidemiology. New York: W H Freeman.
Tunstall-Pedoe, H., Woodward, M., Tavendale, R., A'Brook, R. and McCluskey, M. K. 1997: Comparison of the prediction by 27 different factors of coronary heart disease and death in men and women of the Scottish Heart Health Study: cohort study. British Medical Journal 315, 722-9.
Woodward, M. 2005: Epidemiology: study design and data analysis, 2nd edition. Boca Raton, FL: Chapman and Hall/CRC Press.
EQS See STRUCTURAL EQUATION MODELLING SOFTWARE

equivalence studies See ACTIVE CONTROL EQUIVALENCE STUDIES

errors in hypothesis tests Neyman and Pearson (1933) proposed that the subjective view of the strength of evidence against the NULL HYPOTHESIS inherent in Fisher's significance tests be replaced with an objective, decision-theoretic approach to the results of experiments. In this approach the investigator decides, in advance, a rule that states when the null hypothesis (e.g. that there is no association between a risk factor and a disease outcome, or no effect of a treatment) will be rejected or not rejected (accepted). The null hypothesis may be rejected when it is in fact true, or alternatively we may fail to reject it when it is false. These are called TYPE I ERRORS and TYPE II ERRORS respectively (see the table). Neyman and Pearson suggested that by fixing, in advance, the Type I (α) and Type II (β) error rates, investigators would limit the number of mistakes made over many different experiments.

errors in hypothesis tests Probabilities of occurrence of Type I and Type II errors, for a test at the 5% level

                                 Null hypothesis is true                                      Null hypothesis is false
Reject null hypothesis           Type I error (probability = significance level)              Correct conclusion (probability = power)
Do not reject null hypothesis    Correct conclusion (probability = 1 - significance level)    Type II error (probability = 1 - power)

The figure shows the probabilities of occurrence of these two types of error, for a test at the 5% level, in the context of a normally distributed statistic such as the difference between two means. The P-VALUE (significance level) equals the probability of occurrence of a result as extreme as or more extreme than that observed if the null hypothesis were true. For example, part (a) shows that there is a 5% probability that sampling variation alone will lead to a result significant at the 5% level (P < 0.05).

The second type of error is that the null hypothesis is not rejected when it is false. This occurs because of overlap between the real sampling distribution of the sample difference about the population difference, d (≠ 0), and the acceptance region for the null hypothesis based on the hypothesised sampling distribution about the incorrect difference, 0. This is illustrated in part (b). The shaded area shows the proportion (β%) of the real sampling distribution that would fall within the acceptance region for the null hypothesis, i.e. that would appear consistent with the null hypothesis at the 5% level. The probability that we do not make a Type II error (100 - β%) equals the POWER of the test.

In the figure the following holds. Part (a) shows a Type I error: null hypothesis (NH) is true and the population difference = 0. The curve shows the sampling distribution of the sample difference. The shaded areas (total 5%) give the probability that the null hypothesis is wrongly rejected. Part (b) shows a Type II error: null hypothesis is false and the population difference = d ≠ 0. The continuous curve shows the real sampling distribution of the sample difference, while the dashed curve shows the sampling distribution under the null hypothesis. The shaded area is the probability (β%) that the null hypothesis fails to be rejected. JS

Neyman, J. and Pearson, E. 1933: On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society, Series A 231, 289-337.
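For the entry on errors in hypothesis tests, the two error probabilities can be sketched numerically for a two-sided test on a normally distributed statistic, as in the figure described there. The function name, the argument names and the numerical values of the true difference and standard error are assumptions made for this illustration only.

```python
from statistics import NormalDist

def error_rates(true_difference, standard_error, alpha=0.05):
    """Return (alpha, beta, power) for a two-sided test of 'difference = 0'.

    The test statistic is assumed to be normally distributed with known
    standard error. beta is the probability that the statistic falls
    inside the acceptance region when the true difference is nonzero.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    shift = true_difference / standard_error      # true difference in SE units
    beta = std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)
    return alpha, beta, 1 - beta

# Hypothetical true difference d = 2.8 standard errors, tested at the 5% level.
alpha, beta, power = error_rates(true_difference=2.8, standard_error=1.0)
print(f"Type I error rate = {alpha:.3f}")
print(f"Type II error rate = {beta:.3f}, power = {power:.3f}")
```

Setting `true_difference` to 0 in this sketch makes the rejection probability collapse to the significance level, which corresponds to part (a) of the figure; any nonzero value corresponds to part (b).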
ethical review committees
These committees are formally constituted and empowered groups charged with vetting and approving research protocols prior to study initiation to ensure sound ethics. The World Medical Association Declaration of Helsinki (2008) requires that experimental protocols be submitted to an 'ethical review committee, which must be independent of the investigator, the sponsor or any other kind of undue influence'. Such committees are variously known as institutional review boards (IRBs), research ethics boards (REBs) and human research ethics committees (HRECs). In the UK, research ethics committees (RECs) fall under the auspices of health authorities, ensuring independence from universities and teaching hospitals. The National Research Ethics Service (NRES) oversees the activities of about 100 authorised committees, a minority of which are 'recognised' to review applications for clinical trials of investigational medicinal products (CTIMPs). Historically, in the UK, a Central Office for Research Ethics Committees (COREC) fulfilled this role of overseeing regionally based Multicentre and (a much larger number of) Local Research Ethics Committees (abbreviated MRECs and LRECs respectively) (see the National Research Ethics Service at http://www.nres.npsa.nhs.uk/).

The perceived role of ethical review committees in practice varies both between and within countries. The Declaration of Helsinki requires that medical research
'conform to generally accepted scientific principles'. It is therefore the responsibility of the committee either to assess the scientific merit of each study or to satisfy itself that sufficient assessment has been undertaken. Altman (1980) argued that poor use of statistics in medical research is unethical. First, if results cannot be trusted then the process is at best a waste of participants' time and may entail risk to participants without any possible benefit. Second, the process has also wasted scarce research resources. Finally, publication of incorrect conclusions may block or mislead future research, resulting in substandard patient care in the long term.

Statisticians on ethical review committees pay particular attention to the design of proposed studies: Is the design appropriate to the aims and are there reasonable safeguards to limit potential confounding and BIAS? Unlike many errors of analysis and interpretation, fundamental errors in design cannot be remedied at a later stage. Vail (1998) described several issues arising in practice for both experimental and observational studies. In therapeutic intervention studies, the unbiased allocation of participants to groups is essential to avoid selection BIAS. The statistician should ensure the use of RANDOMISATION or MINIMISATION and the concealment of the randomisation process so that the next allocation cannot be reliably guessed. BLINDING of clinicians, patients and outcome assessment, including the use of PLACEBO therapies, where appropriate, should be explained or the absence of blinding justified explicitly. It is also the statistician's role to ensure that, where intervention studies differ from the standard two-group parallel trial, justification is given and results will be valid. For example, misuse of CROSS-OVER TRIALS and failure to recognise clustering (see CLUSTER ANALYSIS IN MEDICINE) are common.
In observational studies the definition and selection of study groups is key and must be appropriate to the study aims. Use of individual-level matching should be explained. The statistician should also ensure that common sources of bias, such as REGRESSION TO THE MEAN and the HAWTHORNE EFFECT, have been adequately addressed.

In all quantitative studies the proposed sample size is a statistical issue. An excessively large (overpowered) study would waste resources and unnecessarily delay dissemination of potentially useful findings. In practice, such studies are rare. More commonly, a study that is too small (underpowered) risks patient involvement and resources with little chance of finding useful results. Newell (1978) argues that underpowered studies involving additional discomfort or risk to the patient are unethical. Whereas many statisticians consider all underpowered studies to be unethical, this view is not universally held (Edwards et al., 1998), even for experimental studies. In general, statistical power calculations require several 'guesstimates' and there is no consensus
on appropriate power, although 80% is often considered a minimum (see SAMPLE SIZE DETERMINATION entries).

The ethical review committee's statistical role in assessing analysis of studies is not easily defined. Occasionally it will be clear from a proposal that the planned analysis will be invalid or will be misinterpreted. For example, the investigators may propose to conclude equivalence of treatments from an underpowered, nonsignificant comparison, may confuse association with causation, may not recognise a need for case-mix adjustment or may simply propose a mathematically inappropriate analysis. In such cases, it may be considered unethical to allow the project to proceed without an undertaking to use appropriate statistical analyses. More usually, proposed analyses are sufficiently vague to cover valid as well as inappropriate analyses and individual committees vary in the extent to which they require detailed analysis plans.

Although there are good reasons for including a statistician in the membership of ethical review committees, Williamson et al. (2000) found low representation in a UK survey. Possible reasons for this include a shortage of statisticians who are qualified, available and willing to become involved. It may also reflect a lack of awareness of the benefits of statistical input: only 43 (29%) of 148 respondents without a statistician on their committee considered that they needed one. In the UK, official guidance since this survey requires each National Health Service REC to include expertise such that 'the rationale, aims, objectives and design of the research proposals ... can be effectively reconciled with the dignity, rights, safety and wellbeing' of participants, but falls short of requiring the input of a statistician. AV
(See also ETHICS AND CLINICAL TRIALS, SAMPLE SIZE DETERMINATION IN CLINICAL TRIALS, SAMPLE SIZE DETERMINATION IN CLUSTER RANDOMISED TRIALS, SAMPLE SIZE DETERMINATION IN OBSERVATIONAL STUDIES)
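To illustrate why power calculations rest on 'guesstimates', the sketch below uses the standard normal-approximation formula for the number of participants per group needed to compare two means. The formula choice, the function name and the example values for the target difference and standard deviation are all assumptions made for illustration; a real protocol would justify each input and might use a different method.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(target_difference, sd, power=0.80, alpha=0.05):
    """Approximate sample size per group for a two-sided comparison of means.

    Uses n = 2 * ((z_(1 - alpha/2) + z_power) * sd / difference) ** 2,
    rounded up to the next whole participant.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / target_difference) ** 2)

# Hypothetical inputs: detect a difference of half a standard deviation.
print("80% power:", n_per_group(target_difference=0.5, sd=1.0))
print("90% power:", n_per_group(target_difference=0.5, sd=1.0, power=0.90))
# Halving the guessed difference roughly quadruples the required size.
print("smaller difference:", n_per_group(target_difference=0.25, sd=1.0))
```

The required size is highly sensitive to the assumed difference and standard deviation, which is one reason a committee may wish to see how those inputs were chosen.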
Altman, D. G. 1980: Statistics and ethics in medical research: misuse of statistics is unethical. British Medical Journal 281, 1182-4.
Edwards, S. J. L., Lilford, R. J., Braunholtz, D. A., Jackson, J. C., Hewison, J. and Thornton, J. 1998: Ethical issues in the design and conduct of randomised clinical trials. Health Technology Assessment 2, 15.
Newell, D. J. 1978: Type II errors and ethics. British Medical Journal 2, 1719.
Vail, A. 1998: Experiences of a biostatistician on a UK research ethics committee. Statistics in Medicine 17, 2811-14.
Williamson, P., Hutton, J. L., Bliss, J., Blunt, J., Campbell, M. J. and Nicholson, R. 2000: Statistical review by research ethics committees. Journal of the Royal Statistical Society, Series A 163, 5-11.
World Medical Association Declaration of Helsinki 2008: Ethical principles for medical research involving human subjects. Seoul: 59th WMA General Assembly.
ethics and clinical trials Ethics and statistics meet one another head on, not exclusively but most acutely, in CLINICAL TRIALS. At first glance, one would think that the pair
of disciplines, statistics and ethics, are poles apart, linked at best tenuously by their common, though misplaced, perception as being necessary but peripheral topics within a medical course curriculum. However, while their differences may be obvious, there are surprising similarities linking these diverse subjects. They are concerned, respectively, with the noble pursuits of what is true (at least numeric-based truth) and what is right, both amid uncertainty. For if there were no uncertainty, there would be nothing to pursue. One discipline appeals to PROBABILITY to describe what might, may or could happen; the other appeals to morality to describe what ought, must or should happen.

Clinical trials are experiments incorporating a delicate, three-part mixture of theory, practice and ethics, with the utmost importance attached to matters of ethics, in view of the priceless nature of the experimental units involved. They raise ethical questions before they start, while still in progress, when they end and, often, long after they are finished too, whereas theoretical and practical aspects tend to be more limited in their scopes. In an agricultural trial, it does not really matter if a field of wheat perishes under, say, fertiliser A. One may actually be quite happy to gain a clear-cut result that fertiliser B is superior to A, with happiness inversely proportional to the magnitude of the P-VALUE. However, when comparing drug A and drug B in a clinical setting, one must consider the people involved, not forgetting that statistical 'success' and 'failure' outcomes could be euphemisms for a patient's life and death. One may be happy to demonstrate a statistically significant difference between treatment groups and hence declare a positive result. Then again, one might think, 'Could not this conclusion have been reached sooner, with similar confidence, yet sparing the lives of some of those randomised to the inferior treatment?'
This line of reasoning has motivated many statisticians to conduct research into DATA-DEPENDENT DESIGNS for clinical trials, including ADAPTIVE DESIGNS, sequential methods (see SEQUENTIAL ANALYSIS) and Bayesian and decision-theoretic approaches to the design and analysis of trials (see BAYESIAN METHODS).

Ethical concerns have, of course, been around for a longer time than statistics and the modern controlled clinical trial. A number of attempts to codify ethics for medical research in general (not just trials) have been made. The most famous such code is the World Medical Association (WMA) Declaration of Helsinki, first adopted in 1964 and updated periodically (1975, 1983, 1989, 1996, 2000, 2002, 2004 and 2008). An online version can be found at http://www.wma.net/en/30publications/10policies/b3/index.html. As a forerunner to this Declaration, in the aftermath of wartime atrocities in Nazi-occupied states, another set of international guidelines applicable to all types of medical research, the Nuremberg Code, was established. This has primary focus on the desire to protect defenceless subjects from unwilling
participation in 'research'. See, for example, http://ohsr.od.nih.gov/guidelines/nuremberg.html for details. Clearly, though, no set of regulations can be sufficient to guide researchers conducting any particular clinical trial and to protect its volunteers. Hence, recent decades have seen the emergence and growth of Institutional Review Boards or, more generally, ETHICAL REVIEW COMMITTEES. Their remit is to scrutinise study proposals, on a case-by-case basis, prior to granting study approval and to monitor ongoing research once underway.

The remainder of this entry will describe some general ethical principles, then outline important applications to clinical trials, emphasising statistical aspects, so that aspiring researchers may become familiar with these main areas for consideration when designing future clinical trials. Further details and discussion can be found in Edwards et al. (1998).
Bioethicists have developed various sets of principles by which to evaluate moral aspects of research. One such set, the 'four pillars of morality', are discussed at length by Gillon (1998) and by Beauchamp and Childress (2001). These pillars are autonomy (respecting a patient's right to self-govern), beneficence (doing good), nonmaleficence (avoiding harm) and justice (being fair). An alternative set of three principles seeks to solve ethical decision-making problems by appealing to utilitarianism (a consequentialist approach asking 'What maximises total good minus total harm?') and duty-based and rights-based deontological principles. In brief, duty-based deontology argues for doing that which is intrinsically right (asking 'Is it right or wrong?') whereas a rights-based deontological approach bases decision-making on whether people are treated appropriately (asking 'Is he or she wronged?').

Thoughtful application of these sets of principles can help when faced with ethical quandaries, not limited to the conduct of clinical research. They serve useful purposes for those involved in trials, for they can be appealed to for justifying numerous research-related concepts. Among these are the role of RANDOMISATION, the need for obtaining informed consent, acceptability of blinding and use of PLACEBOS, and indeed the rationale behind trials in the first place.

Statisticians, by contrast, tend to believe a set of two ethical principles to be preferable to three or four, and make the simpler dichotomy into collective ethics and individual ethics (terms coined by Schwartz, Flamant and Lellouch, 1980). Applied to clinical trials, these concepts equate to doing what is right and best for future patients (those who stand to benefit from the results of a trial) and doing what is right and best for current patients (the volunteers in the trial) respectively. Indeed,
a clinical trial can be thought of as a balance in delicate equilibrium between these two types of ethics (Pocock, 1983). Collective ethics, also known as research ethics, and individual ethics tend to be in direct competition
with one: analhc:r. If one IICIheRcI purel)' to collective ethics. there would be Ullllt"CCplable human sambs. but equally. if one adhcnxl purely to individual ethics.. Ihc:rc wauld be liltle scope fOl' makiq medical prop:5L According lo indi'Vidual ethics. each palient in a trial should m:eive the best possible treatment. whereas according to colle:ctive ethics. each trial should )'ield the best scienlilic n:sult possible. The tension is clear. A doctor rightly has lo pay gn:ateslllllenlion 10 the needs of his or her patients. This is the essence of Ihc Hippocratic Oath and full)' suppmtcd by the abcwc-menlioncd Declaration of Helsinki. Amonl its pra:c:plS is 'In resean:h 011 man. Ihc interests or scic:ncc and society should ne~ take pn:cedencc over considerations related to the well-being of the: subject', or, in othcl' wonts. collc:ctive ethics can never be allowed to usurp individual ethics for the sake: of scientific
endeavour. Turning to applications of ethics in clinical trials, it is convenient to categorise according to whether they primarily affect the period before, during or after the trial's recruitment phase. Chronologically, the first ethical consideration, then, is whether a proposed trial should be conducted at all. Quite often there is only a limited window of opportunity (meaning in calendar time) in which to conduct a randomised controlled trial, assuming of course the situation is one wherein randomisation is itself acceptable. Why is there only a limited time window? Ethicists have coined the phrase, and notion, of clinical equipoise. It is a precondition for initiating a clinical trial and refers to the balance of clinical opinion among all doctors that needs to exist before it is ethical for a trial to begin. Thus, while it may not be possible for any one doctor to be perfectly balanced in their own mind concerning the relative merits of two or more treatments, perhaps including investigational new ones, it is quite possible that other doctors have preferences for the different treatments involved. Hence, a planned trial can be considered ethical on the basis of divergence of opinions, even if one has a slight personal preference, though not yet any firm evidence, in favour of one particular treatment. If, however, the weight of clinical opinion is too heavily in favour of a given treatment it can become too late to seize the opportunity to undertake a rigorous clinical trial. In turn, this means that the chance to secure best quality evidence to confirm, or overturn, such clinical (mere) opinion may be lost. EVIDENCE-BASED MEDICINE seeks to convert opinions (experience-based hunches, beliefs or gut feelings) into more objectively held evidence, based on sound data collection, gathered supremely through randomised controlled trials.
One certain point for debate at meetings of ethics committees relates to the process for obtaining each patient's informed consent to participate. This is not just because of the Nuremberg code, but a (happy) consequence of the
composition of ethics committees being mandated to include a number of laypeople. Those nonmedically trained people may or may not understand the intricacies of treatments involved in a research proposal, but they will definitely be able to identify with prospective participants. Hence, committees will include discussion on the all-important patient information sheet, or its equivalent, that outlines in everyday language the risks and benefits involved in trial participation. Informed consent from participants is one of the most important safeguards built into clinical trials. It means that subjects never take part in a trial against their will (and for this reason it can be helpful to refer to participants as 'volunteers'). Ideally consent should mean written, fully informed consent, although there are circumstances, usually emergencies, when witnessed verbal consent has to suffice. There are complications when the subjects of research include children, or those who are mentally ill, or comatose, or otherwise unable to understand the full implications of agreeing to enter a trial. In such cases, a proxy has to be appointed to serve as a spokesperson. Ethical matters in general are heightened when dealing with special populations such as those just mentioned. In addition, one can add prisoners and medical students as special populations when undertaking research. Admittedly, it is unusual to put such groups together in the same breath, but in both cases, though for different reasons, there is a possible sense of coercion involved, meaning that one has to be extra careful when going about obtaining an individual's consent. Similarly, trials sponsored by developed nations that are to be conducted in developing nations are another major source of ethical conflict. This is especially true if wealthier nations stand to benefit more than the participating populations from the results of such foreign-based trials.
Another matter is exactly how much information is necessary to impart to the subjects at the time of trial recruitment. It surely includes relaying the uncertainty about which course of treatment is truly the best (for otherwise why perform a trial?) and that by volunteering they would be helping in the pursuit of medical progress to try and remove some of that uncertainty. The act of giving informed consent is a statement from or on behalf of the trial participant that allows researchers to seek entry into the trial. Being randomised is not yet guaranteed, however, as there are strict eligibility requirements (see INCLUSION AND EXCLUSION CRITERIA) that may need to be checked after obtaining consent. Obviously, no subject should be forced into participating, but from the scientific viewpoint it is preferable to have as high a proportion as possible of those invited go on to participate. This is analogous to seeking a high response rate in a sample survey - it reduces selection bias and enhances the generalisability of
the trial's results (see BIAS, QUESTIONNAIRES). Part of the informed consent process should include a brief justification of the need for randomisation within the trial as a general principle. In addition, patients must be informed of specific details of what will be expected of them, the likely risks involved, together with reassurance that they can withdraw from the trial at any time without compromising their future treatment or care. So far informed consent has been discussed with regard to the prospective patient. It is their opportunity to decline to take part in the research. Next, consider the related decision that an investigator conducting research may have to make prior to involvement in a trial. Sometimes doctors or others are approached to become collaborators with someone else's research project. There is no formal equivalent to securing consent among investigators, but there are ethical considerations to be borne in mind. Chiefly, there is what is amusingly known as the uncle test for randomisation. It calls for the trial's investigators to answer affirmatively the question: 'Would you be willing to randomise either yourself or a close relative of yours (parent, spouse, sibling, child, uncle, etc.) into this trial?' That is, it seeks assurance that one's individual preferences are not so heavily biased towards one of the treatments that there would be a reluctance to take the risk of receiving the least-favoured treatment for themselves or someone close to them. If investigators cannot honestly answer 'yes' to the uncle test then they simply should not enrol any patients into the trial (and certainly not succumb to financial inducements or other temptations to do so if the trial happens to be sponsored by deep-pocketed sources). The use of PLACEBOS (inert substances made to mimic the appearance of active treatments) and of PLACEBO RUN-IN periods (to assess a patient's compliance with treatment schedules,
possibly involving the withholding of their usual medication) within trials are controversial areas, attracting much attention from those with ethical viewpoints. Note that a footnote to the most recent version of the Declaration of Helsinki addresses the use of placebos in clinical trials, arguing in their favour in the right circumstances. The choice of an active treatment control or a placebo control group is an example of how trials are an amalgamation of theory, practice and ethics. Statistically, one needs fewer subjects to demonstrate a difference in efficacy between a new drug and a placebo than between two active drugs. However, the decision whether to use a placebo or not must not be dictated solely by sample size considerations, but primarily by whether it is ethically acceptable to put patients deliberately on to inactive treatment regimens. This further exemplifies the tension between individual and collective ethics. Similarly, the use of placebo run-ins prior to randomisation, while not necessarily completely unjustifiable, is harder to defend on ethical grounds (Senn, 1997).
The actual conduct of a clinical trial must be to the highest scientific standards, for without such rigour the trial is compromised and the results unable to contribute meaningfully to medical progress. More positively, the well-conducted randomised controlled trial is rightly reckoned to be the most reliable source of evidence for or against any treatment. Note this includes adhering to the clinical trial's PROTOCOL, using a proper method for RANDOMISATION and, if blinded, employing suitable means to conceal treatment allocation effectively (see BLINDING). The protocol, among other things, must include the hypotheses being investigated along with the primary and secondary outcomes. It is unethical to look first at one's data and then decide to promote in importance a nonprimary outcome and demote the primary outcome, based on the actual findings. The excuse that unforeseen results may be more interesting does not justify departing from the protocol when publishing the study. See POST HOC ANALYSES for further discussion. A matter not always given full consideration is what to do with samples that are collected during the course of the clinical trial. Ideally, the protocol would specify not only where and for how long sensitive information, or equally blood, or DNA samples, etc., are stored, but who has access to them in the future and under what conditions. For this ethical reason of protecting patient confidentiality, it is important that data and samples are stored securely and suitably coded, or even anonymised altogether. For the benefit of future researchers, however, it is preferable whenever possible to have a system that allows access to individual patient information. This is a matter that should be included within the patient informed consent process if longer term research use of patients' data, or samples, is envisaged. Another issue is whether a trial should be allowed the chance to stop earlier than planned. That is,
can patient recruitment cease before the originally anticipated recruitment levels have been reached? Scientifically, there is a penalty for stopping a trial unexpectedly once initiated, so only in extreme cases are trials interrupted. There can be pressure, e.g. from disease-specific patient support groups, to expedite a drug development process. This arises in part from the long timespan involved before a promising treatment can be safely marketed. The statistical implication of planning possibly to stop early is best described by analogy with making MULTIPLE COMPARISONS. Testing a dataset according to many different subsets will yield many false positive statistically significant findings. Similarly, multiple looks at the data at several INTERIM ANALYSES, each with the opportunity to stop the trial, make it more likely that an apparent treatment difference would emerge when in fact there is none. The penalty for unplanned early stopping is on the collective ethical side of the balance with individual ethics, for it is future patients who are denied the opportunity to learn more about the treatments under investigation. Then again, one has also to consider the needs and rights of patients entered towards the end of the trial, so it is never a
simple choice. The accurate and timely reporting of clinical trials are yet further matters with ethical implications. Choosing not to report a trial because results show sponsors' products unfavourably is without excuse and a clear abuse of ethics. For the vast majority for which publication in journals is sought, conforming to the CONSOLIDATED STANDARDS OF REPORTING TRIALS (CONSORT) STATEMENT guidelines is helpful, if not mandatory, as it is becoming for a growing number of biomedical journals. An online version of the latest statement can be accessed at http://www.consort-statement.org. Authors failing to declare relevant conflicts of interest, financial or otherwise, or bulk-ordering expensive reprints of an article from a journal's office prior to formal acceptance are further examples of inexcusable behaviour at the pre-publication stage. In an editorial in the British Medical Journal, Altman (1994) suggested there was rather too much research happening, not all of the highest quality and not always undertaken for the right reasons. If a clinical trial can be conducted, then it should be, in preference to an observational study, in order to gain the best quality evidence. It must first pass ethical review and be conducted properly, with suitable sample size, randomisation, analysis plans, etc., detailed in the protocol, and it should also be reported according to the highest standards. All this is a tall order, hence another reason for CONSULTING A STATISTICIAN early in the trial's life, for it is notoriously easy to fall into the PITFALLS OF MEDICAL RESEARCH. Finally, recalling the link between those not-so-disparate disciplines, forget not that bad statistics is bad ethics! CRP

Altman, D. G. 1994: The scandal of poor medical research (editorial). British Medical Journal 308, 283-4.
Beauchamp, T. and Childress, J. 2001: Principles of biomedical ethics, 5th edition. New York: Oxford University Press.
Edwards, S. J. L., Lilford, R. J., Braunholtz, D. A., Jackson, J. C., Hewison, J. and Thornton, J. 1998: Ethical issues in the design and conduct of randomised controlled trials. Health Technology Assessment 2, 15. Online at http://www.hta.nhsweb.nhs.uk/execsumm/summ215.htm.
Gillon, R. 1985: Philosophical medical ethics. Chichester: John Wiley & Sons, Ltd.
Pocock, S. J. 1983: Clinical trials: a practical approach. Chichester: John Wiley & Sons, Ltd.
Schwartz, D., Flamant, R. and Lellouch, J. 1980: Clinical trials, translated by Healy, M. J. R. London: Academic Press.
Senn, S. J. 1997: Are placebo run-ins justified? British Medical Journal 314, 1191-3.
World Medical Association Declaration of Helsinki 2008: Ethical principles for medical research involving human subjects. Seoul: 59th WMA General Assembly.
evidence-based medicine (EBM) Evidence-based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients and is the definition of EBM given by one of its foremost proponents (Sackett et al., 1996). Alternative definitions that have appeared are very similar, stressing the aim of assessing and applying relevant evidence for better healthcare decision making and allowing clinicians to practise better medicine by being aware of the evidence in support of clinical practice and the strength of that evidence. The primary tools used to drive evidence-based medicine are RANDOMISED CLINICAL TRIALS and SYSTEMATIC REVIEWS AND META-ANALYSIS. However, as Sackett et al. (1996) make clear, evidence-based medicine is not restricted to these tools and, in particular circumstances, the best external evidence with which to answer a clinical question may involve CROSS-SECTIONAL STUDIES, genetic studies or immunological investigations. SSE

Sackett, D. L., Rosenberg, W. M. C., Gray, J. A. M., Haynes, R. B. and Richardson, W. S. 1996: Evidence-based medicine: what it is and what it isn't. British Medical Journal 312, 71-2.
exact methods for categorical data This is a collective term for analytical methods that require no distributional approximations in order to validate the resulting inference. The 'exact' label applies when the probability distribution of the appropriate test statistic is fully determined, requiring no assumptions about unknown population characteristics and no large-sample distributional justifications (e.g. using approximate normality). For expository reasons we illustrate the exact approach by focusing primarily on the P-VALUE yielded by a hypothesis test. It should be noted, however, that exact methods can also be used for estimation and computing exact CONFIDENCE INTERVALS.
A fundamental problem in statistical inference is summarising observed data in terms of a P-value. The P-value forms part of the theory of hypothesis testing and may be regarded as an index for judging whether to accept or reject the NULL HYPOTHESIS. A very small P-value is indicative of evidence against the null hypothesis, while a large P-value implies that the observed data are compatible with the null hypothesis. There is a long tradition of using the value 0.05 as the cut-off for rejection or acceptance of the null hypothesis. While this may appear arbitrary in some contexts, its almost universal adoption for testing scientific hypotheses has the merit of limiting the number of false-positive conclusions to at most 5%. At any rate, no matter what cut-off one chooses, the P-value provides an important objective input for judging if the observed data are statistically significant. Therefore it is crucial that this number be computed accurately. Since data may be gathered under diverse, often nonverifiable, conditions, it is desirable, for P-value calculations, to make as few assumptions as possible about the underlying
data-generation process. In particular, one wishes to avoid making distributional assumptions, such as that the data came from a NORMAL DISTRIBUTION. This goal has spawned an entire field of statistics known as nonparametric statistics. In the preface to his book, Nonparametrics: statistical methods based on ranks, Lehmann (1975) traces the earliest development of a nonparametric test to Arbuthnot (1710), who came up with the remarkably simple yet popular sign test. In the 20th century, nonparametric methods received a major impetus from a seminal paper by Frank Wilcoxon (1945) in which he developed the now universally adopted WILCOXON SIGNED RANK TEST and the Wilcoxon rank sum test. The contributions of many researchers advanced this field - an excellent survey of these developments is given in Agresti (1992). The research just mentioned, and the numerous papers, monographs and textbooks that followed in its wake, deal primarily with hypothesis tests involving continuous distributions. The data usually consisted of several independent samples of real numbers (possibly containing ties) drawn from different populations, with the objective of making distribution-free one, two or K-sample comparisons, performing goodness-of-fit tests and computing measures of association. Much earlier, Karl Pearson (1900) demonstrated that the large-sample distribution of a test statistic, based on the difference between the observed and expected counts of categorical data generated from multinomial, hypergeometric or POISSON DISTRIBUTIONS, is a CHI-SQUARE DISTRIBUTION. This work was found to be applicable to a whole class of discrete data problems. It was followed by many significant contributions and eventually evolved into the field of categorical data analysis. An excellent up-to-date textbook dealing with this continually growing field is Agresti (2002).
The techniques of nonparametric and categorical data inference are popular mainly because they make only minimal assumptions about how the data were generated: assumptions such as independent sampling or randomised treatment assignment. For continuous data one does not have to know the underlying distribution giving rise to the data. For categorical data mathematical models like the multinomial, Poisson or hypergeometric arise naturally from the independence assumptions of the sampled observations. Nevertheless, for both the continuous and categorical cases, these methods do require one assumption that is sometimes hard to verify. They assume that the dataset is large enough for the test statistic to converge to an appropriate limiting normal or chi-square distribution. P-values are then obtained by evaluating the tail area of the limiting distribution, instead of actually deriving the true distribution of the test statistic and then evaluating its tail area. P-values based on the large-sample assumption are known as asymptotic P-values,
while P-values based on deriving the true distribution of the test statistic are termed exact P-values. While one would prefer to use exact P-values for scientific inference they often pose formidable computational problems and so, as a practical matter, asymptotic P-values are used in their place. For large and well-balanced datasets this makes very little difference since the exact and asymptotic P-values are very similar. However, for small, sparse, unbalanced and heavily tied data, the exact and asymptotic P-values can be quite different and may lead to opposite conclusions concerning the hypothesis of interest. This was a major concern of Fisher, who stated in the preface to the first edition of Statistical methods for research workers (1925): 'The traditional machinery of statistical processes is wholly unsuited to the needs of practical research. Not only does it take a cannon to shoot a sparrow, but it misses the sparrow! The elaborate mechanism built on the theory of infinitely large samples is not accurate enough for simple laboratory data. Only by systematically tackling small problems on their merits does it seem possible to apply accurate tests to practical data.' That Fisher's concern was justified is seen from the following example of a 3 x 9 sparse contingency table:

0 7 0 0 0 0 0 1 1
1 1 1 1 1 1 1 0 0
0 8 0 0 0 0 0 0 0

The Pearson CHI-SQUARE TEST is commonly used to test for row and column interaction. For our contingency table, the observed value of Pearson's statistic is 22.29 and the asymptotic P-value is the tail area to the right of 22.29 from a chi-square distribution with 16 DEGREES OF FREEDOM. This P-value is 0.1342, implying that there is no row and column interaction. However, we can also compute the tail area to the right of 22.29 from the exact distribution of Pearson's statistic. The exact P-value so obtained is 0.0013, implying that there is a strong row and column interaction.
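The asymptotic figures quoted for the 3 x 9 table can be checked with a few lines of Python. What follows is only a minimal sketch: the function names are illustrative and not taken from any package, and the tail-area routine uses the closed form that holds for an even number of degrees of freedom (16 here), an assumption made so that no external library is needed.

```python
from math import exp

def pearson_statistic(table):
    """Return Pearson's chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def chi2_upper_tail_even_df(x, df):
    """Upper tail area of a chi-square distribution (closed form, even df only)."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum (x/2)^i / i! for i = 0 .. df/2 - 1, building each term from the last.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total

# The sparse 3 x 9 table from the text.
sparse_table = [
    [0, 7, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 0],
    [0, 8, 0, 0, 0, 0, 0, 0, 0],
]
stat, df = pearson_statistic(sparse_table)
p_asymptotic = chi2_upper_tail_even_df(stat, df)
print(round(stat, 2), df, round(p_asymptotic, 4))
```

The sketch covers only the asymptotic side of the comparison; the exact P-value of 0.0013 requires working with the exact distribution of the statistic, which this code does not attempt.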
We will conceptually describe further on how to compute an exact P-value for the chi-square statistic. However, even without knowing the technical details behind such a computation, an investigator comparing the asymptotic and exact P-values for this example might wonder at the disparity and not know which result is reliable. This example highlights the need to compute the exact P-value, rather than relying on asymptotic results, whenever the dataset is small, sparse, unbalanced or heavily tied. The trouble is that it is difficult to identify, a priori, that a given dataset suffers from these obstacles to asymptotic inference. The concerns expressed by Fisher and others can be resolved if we directly compute exact P-values instead of replacing them by their asymptotic versions and hoping that these will be accurate. Fisher himself suggested the use of exact P-values for 2 x 2 tables (1925) as well as for
data from randomised experiments (1956) (see FISHER'S EXACT TEST). For the 2 x 2 table, Fisher proposed permuting the observed data in all possible ways and comparing what was actually observed to what might have been observed. Thus exact P-values are also known as permutational P-values. We demonstrate here how this approach can be used to obtain the exact distribution for the commonly used Pearson chi-square test. The table shows results from an entrance examination for fire fighters in a small US town. All five white applicants received a pass result, whereas the results for the other groups are mixed. Is this evidence that entrance exam results are related to race? Note that while there is some evidence of a pattern, the total number of observations is only 20. A statistically inclined researcher might proceed as follows:
Null hypothesis. Exam results and race of examinee are independent.

Alternative hypothesis. Exam results and race of examinee are not independent.
To test the hypothesis of independence, one would ordinarily use the Pearson chi-square test. The test statistic has the form:
\[
\sum_{\text{table cells}} \frac{(\text{Observed count} - \text{Expected count})^2}{\text{Expected count}}
\]

exact methods for categorical data Entrance examination results for fire fighters in a small US town

                            Race
Test results    White   Black   Asian   Hispanic   Row total
Pass              5       2       2        0           9
No show           0       1       0        1           2
Fail              0       2       3        4           9
Column total      5       5       5        5          20
The distribution of this test statistic is asymptotically chi-square. Suppose we would like to conduct the test at the 0.05 level of significance for these data. Running this test, we obtain the following results:

Pearson chi-square     11.55556
Degrees of freedom     6
Significance           0.07265
Because the observed significance of 0.07265 is larger than 0.05, the researcher would conclude that exam results are
independent of race of examinee. However, for this table the minimum expected cell frequency is 0.5, and all 12 cells in this table have expected frequencies under the null hypothesis less than 5. Since all the cells in the table have small expected counts, what does this mean? Does it matter? The term 'asymptotically' means 'given a sufficient sample size', although it is not easy to describe the sample size needed for the chi-square distribution to approximate well the exact distribution of the Pearson statistic. Two widely used rules of thumb are given by:

1. The minimum expected cell count for all cells should be at least 5.
2. For tables larger than 2 x 2, a minimum expected count of 1 is permissible as long as no more than about 20% of the cells have expected values below 5.
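The two rules of thumb are easy to check mechanically. The following Python sketch applies them to the fire fighter table; the function names and the dictionary keys are illustrative assumptions, and the 20% threshold in the second rule is coded as a hard cut-off even though the text says only 'about 20%'.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def rule_of_thumb_report(table):
    """Summarise the expected counts against the two rules of thumb."""
    cells = [e for row in expected_counts(table) for e in row]
    smallest = min(cells)
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return {
        "min_expected": smallest,
        "share_below_5": share_below_5,
        # Rule 1: every expected count is at least 5.
        "rule_1_ok": smallest >= 5,
        # Rule 2 (tables larger than 2 x 2 only): minimum of 1, at most 20% below 5.
        "rule_2_ok": smallest >= 1 and share_below_5 <= 0.20,
    }

# Rows: Pass, No show, Fail; columns: White, Black, Asian, Hispanic.
fire_fighters = [
    [5, 2, 2, 0],
    [0, 1, 0, 1],
    [0, 2, 3, 4],
]
report = rule_of_thumb_report(fire_fighters)
print(report)
```

For these data both rules fail, which agrees with the minimum expected frequency of 0.5 and the 12 small cells noted in the text.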
While these and other rules have been proposed and studied, in the end no simple rule covers all cases (see Agresti, 2002, for further discussion). In our case, in terms of sample size, number of cells relative to sample size or small expected counts, it appears that relying on an asymptotic result to compute a P-value might be problematic. What if, instead of relying on the chi-square distribution to approximate the P-value, it were possible to use the true sampling distribution of the test statistic and thereby produce an exact P-value? Here we explain in an intuitive way how this P-value is computed and why it is exact. (For a more technical discussion, see Mehta, 1994.) The main idea is to evaluate our 3 x 4 cross-tabulation relative to a 'reference set' of other 3 x 4 tables that are like it in every possible respect, except in terms of their reasonableness under the null hypothesis. It is generally accepted that this reference set consists of all 3 x 4 tables in the form of the observed table that have the same row and column margins as the observed table. This is a reasonable choice for a reference set, even when these margins are not naturally fixed in the original dataset, because they do not contain any information about the null hypothesis being tested. We refer to this as a conditional exact approach. The exact P-value is then obtained by identifying all the tables in this reference set whose Pearson statistics equal or exceed 11.55556, the observed statistic, and summing their probabilities. This is an exact P-value because the probability of any table in the reference set of tables with fixed margins can be computed exactly under the null hypothesis. For instance, the table:

5   2   2   0  |  9
0   0   0   2  |  2
0   3   3   3  |  9
5   5   5   5  | 20

is a member of the reference set. The Pearson statistic for this table yields a value of 14.67. Since this value is greater than
the observed value 11.55556, we regard this member of the reference set as 'more extreme' than the observed table. Its exact probability is 0.000108 and will contribute to the exact P-value. In principle we can repeat this analysis for every single table in the reference set, identify all those that are at least as extreme as the original table and sum their exact probabilities. The exact P-value is this sum. In fact, the exact P-value based on Pearson's statistic is 0.0398. At the 0.05 level of significance, a researcher would reject the null hypothesis and conclude that there is evidence that the exam results and race of examinee are related. This conclusion is the opposite of what would be concluded using the asymptotic approach, since the latter produced a P-value of 0.07265. The asymptotic P-value, however, is only an approximate estimate of the exact P-value. As the sample size goes to infinity the exact P-value converges to the chi-square-based P-value. Of course the sample size for the current dataset is not infinite and we observe that this asymptotic result has fared rather poorly. This conditional approach to exact inference is currently the most widely used method for exact inference. However, conditional methods have their drawbacks. They can be computationally intensive. Until the advent of modern computing, exact tests were generally infeasible for any dataset that was not relatively very small. Over the past two or three decades, progress in computing technology along with the development of efficient algorithms have made conditional exact methods available to more practitioners for solving an increasingly larger class of applied problems. Virtually all commercial statistical software packages now contain at least some exact options for analysing categorical data. More fundamentally, conditional exact methods are sometimes criticised for their conservatism. By construction,
exact conditional tests are guaranteed to control the Type 1 error rate at any desired level. (A Type 1 error occurs when we erroneously reject a null hypothesis.) This means, for example, that if you consistently use an exact P-value of 0.05 as your cut-off for deciding whether your results are statistically significant, this decision rule will limit your rate of declaring 'false positives' to at most 5%. However, since the exact distribution of the test statistic is discrete, you may not be able to achieve a Type 1 error rate of exactly 5%. Instead, the actual error rate of your decision rule will typically be smaller. This conservatism, which is entirely attributable to the discreteness of the test statistic, is the price you pay for exactness. The extent of the conservatism is not easy to determine. It does not manifest itself through exact P-values that are always larger than corresponding asymptotic P-values. On the contrary, there are plenty of examples (including the fire fighter data shown earlier) wherein the exact P-values are substantially smaller than the asymptotic P-values. Conservatism is a statement about a long-term error rate, rather than an individual P-value.
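The conditional exact calculation described earlier for the fire fighter data (enumerate every 3 x 4 table with the observed margins, then sum the probabilities of those at least as extreme as the observed one) can be sketched by brute force in Python. This is an illustration only, not the efficient algorithms used in practice: the function names are invented for the example, the probability formula is the usual product of factorials for tables with fixed margins, and the small tolerance used when comparing statistics is an assumption made to guard against floating-point ties.

```python
from math import factorial

def pearson_statistic(table):
    """Pearson's chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )

def table_probability(table):
    """Probability of a table under independence, given both sets of margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    numerator = 1
    for total in row_totals + col_totals:
        numerator *= factorial(total)
    denominator = factorial(sum(row_totals))
    for row in table:
        for cell in row:
            denominator *= factorial(cell)
    return numerator / denominator

def columns_with_sum(total, caps):
    """Yield tuples x with sum(x) == total and 0 <= x[i] <= caps[i]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in columns_with_sum(total - first, caps[1:]):
            yield (first,) + rest

def tables_with_margins(row_totals, col_totals):
    """Yield every table (list of rows) having the given margins."""
    def build(rows_left, cols_left, chosen):
        if not cols_left:
            yield [list(row) for row in zip(*chosen)]
            return
        for col in columns_with_sum(cols_left[0], rows_left):
            remaining = tuple(r - c for r, c in zip(rows_left, col))
            yield from build(remaining, cols_left[1:], chosen + [col])
    yield from build(tuple(row_totals), tuple(col_totals), [])

# Rows: Pass, No show, Fail; columns: White, Black, Asian, Hispanic.
observed = [[5, 2, 2, 0], [0, 1, 0, 1], [0, 2, 3, 4]]
observed_stat = pearson_statistic(observed)

exact_p = 0.0
total_probability = 0.0
for table in tables_with_margins([9, 2, 9], [5, 5, 5, 5]):
    prob = table_probability(table)
    total_probability += prob
    if pearson_statistic(table) >= observed_stat - 1e-9:
        exact_p += prob

print(round(observed_stat, 5), round(exact_p, 4))
```

The text reports an exact P-value of 0.0398 for these data, so the printed value should be close to that figure. Full enumeration is feasible here only because the table is tiny; for larger tables the reference set grows very quickly, which is the computational drawback of conditional methods mentioned above.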
How serious is the conservatism? Not many empirical studies have been conducted to answer this question in general. For 2 x 2 tables, the extent of the conservatism has been investigated extensively and many alternatives have been proposed over the years to counter it. None of these other approaches has so far managed to dislodge conditional exact tests as the methods of choice, for reasons discussed in Yates (1984) and Barnard (1949, 1989). In fact, it has been demonstrated that the conservatism of both types of tests is negligible in this setting, indicating that conservatism is also likely to be a negligible factor in the more general r x c setting. Conservatism is still an issue for the single 2 x 2 contingency table. One way of reducing conservatism that has recently received greater attention is to use an unconditional approach. Much of the blame for conservatism is attributed to the reference set having fixed row and column margins. In the single 2 x 2 table especially, such conditioning makes the distribution of the test statistic rather discrete in small samples. On the other hand, eliminating nuisance parameters (e.g. the odds ratio for a 2 x 2 table) by conditioning on their sufficient statistics, i.e. the row and column margins, is at the heart of exact conditional inference. Unconditional tests usually rely on large sample theory to eliminate nuisance parameters from the distribution of the test statistic. This is acceptable for large datasets, but may not be accurate for small, sparse or unbalanced data. There does exist an alternative unconditional exact approach, however, that is valid in small samples. Barnard (1947) proposed an unconditional exact test based on a minimax elimination of the nuisance parameter. The reference set was defined to be the set of all 2 x 2 tables with fixed row margins and all possible column margins. Since the reference set for Barnard's test does not fix the column margins,
the distribution of the test statistic is less discrete than would be obtained by permuting the conditional refCl'ence SCI in which both margins me fixed. However. BIII'IUIId was DOl satisfted with his test and disavowed itlWo years later (Barnard. 1949). Barnard was invoking Fisher's principle of ancillaril)'. whereby infCR:IIL"C should be based on hypothetical repetitions of the orilinal experiment. fixing Ihose aspects of the experiment that arc unrelated to the hypothesis undCl'test. In more recent publications Bamard (1989) pro\ides additional arguments apinst the test. Some prominent statisticians have expressed rqret at Barnard's disa\·owal. OIhen conlinue to favour inrc:n:nc:e based on the conditional reference set. At any rate most stalislician.'i agree that the case for conditionilll is especially penuasive with RANDOMISED CL~ICAL lRIAtS. for then one can argue that the sum of the responses fiom the two lRabnent groups is fixed in advance undCl'the null hypothesis; i.e.
subjects predisposed 10 respond will respond regardlessorthe lRabnent m:eivcd. since the two In:alments arc identical under the null hypothesis. Pcnauting the ~ference set merely amounts to assigning the tn:atments to the patients in all poaible ways. Even if Barnard's method were accepted without n:scrvadon. it would be hard to implement for general r x c contingency tables. .t calls for enrichilll the reference set by pc:nniUing the column margins to WI)' and then maximising over the unknown marginal probabilities. Such a process is diflicult. computationally. for tables of highc:r dimension than 2 x 2. Anolhc:r way of add~ng conservatism is by using 'Oexible' signiftc.ance levels. Conservatism really hinges on approaching data with a fixed signiflcance level (say S CJ.,) in mind. Not all Slatisttcians believe in fixed significance levels for decision making. howcver. Fisher (1973) staled: "No scientiftc worker has a fbed level of significance at which from year 10 year. and in all cin:umstaDccs~ he rejc:cts hypotheses: he rathc:r gives his mind to each particular case in the light of his evidence and his ideas: Bamard (1989) has formalised this principle in tenns orthe '8exible Fisher exact lest' for Ihe single 2 x 2 table for comparilll two binomials. He proposes that we choose different significance levels for rejecting Ihe null hypothesis depeading on the observed sum of successes in Ihe two binomial populations. The Oexible FISher exact telil can be shown 10 be equivalent 10 the \'DIious alternatives that ha,'e been proposed over the years to counter the alleged conservatism of FishCl"s exactlc:st for 2 x 2 tables. A third way or controlling conservatism is by usilll a continuity correction. such as the mid-P-value. which is obtained by subtractilll half the probability of the observal statistic from the exact P-value. This modified P-value has been m:ommendc:d by many slDlisticians (see. for example. Baman:I. 
1989) as a good compromise between Rporting a poaibly conservative exact P-valuc and relying on a randomised test to eliminate eonsemllism completely. However. the mid-P-value cannot guarantee. theoretically, that the Type 1 error rate will be limilcd to the desired Icvel. By the same Ioken. researchers have shown empirically that midP-valucs do in fact pn:serve the lYpe 1 error rate while reducing the eonliCl'\lBtism ofexact P-values forasingle 2 x 2 contingency lable and in k 2 x 2 contingency bibles. The coverage of the mici-P conftdcnce intervals was not compromised. but they were shortu on average than the c:orrc:sponding exact intervals. CColPSelCMlNP A..-u,A. 1992: A S1U\'CY or euct iafcn:nc:e forcCJlllinlency tables. SlaliJliml Science 7. I, 131-77. ApatI. A. 2002: Cal~gorit:al.'a QII/Ilysis, 2adcdilion. New YOlk: Joba \VaIey & Sons., Inc. ArbatbDat,
J. 1710: AD upuneat for diviae providence. taken from the CCII5Ianl regularity obsem:d in the birth orbada sexes.. PhHosophkal TrtDI.SlKIiDIU 27, 186-90. BaraanI, O. A. 1947: Signific:1IIIOC tesIs for 2 x 2
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ EXPERIMENTAL DESIGN
tables. Bionrelri/co 34. 123-3B. BaraIIrd. O. A. 1949: Stalilltical i_mICe. Journal of 1M ROJQl Sialislim/ Society, Series B 11. 115-39. 8amard. G. A. 1989: On allc;cdpiDsin pRocrfmrn lower P-values. Statislics in Mt!t/icine B. 1469-77. JIbIaer, R. A.. 1925: StalistiC'OI methods for reseorm 'R'Orlcers. EdinburP: Oliver and 80)'<1. Flslm't R. A. 1956: Sta/istical melhodsfor sdenlijk inference. Edinburgh: Oliver and Boyd.11sIIert R. A. 1973: SIOIisliC'Q/ methods ond scientiJlt III/Bente. 3m edition. London: Collier Macmillan. 1At.'na. It. L 1975: Nonparanwtrics: statislKal methods hosed on ranla. S. Francisco: Holden-Day. ~Ieb'" C. R. 1994: 1he exacl analysis of ooatiafImCY tables ia medical R:Stan:h. Stulistkol Metltotis in Medkm Reran:IJ 3. 135-56. Petmaa. K. 1900: On the criterioa that a givea symm of de\·iatiOlls flam abe probable in the cueofacorrela~d system of variables is such thai il can be reasonably supposed to ha\'e arisen from raadom sampling. PhiIOJOplriC'O/ Maga=me Stries j SO. 157-75. WIcoua, F. 1945: (acIi\'idual c:ampari50Q5 by ranking melhods Biometrics 1.80-3. Yates. F. 1984: lest or sigaificance far 2 x 2 contin;ency l8bIes. Journal of tM Ro)vl StatisliC'Q/ Sotiety. smes A 147.426-63.
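The mid-P-value discussed in this entry (the exact P-value minus half the probability of the observed statistic) can be illustrated for a single 2 × 2 table. The following is a minimal sketch, assuming a one-sided conditional (Fisher) test in which both margins are fixed and the top-left cell count is the test statistic; the function names and the example counts are hypothetical and are not taken from the entry.

```python
from math import comb


def hypergeom_pmf(k, row1, row2, col1):
    """P(top-left cell = k) when both margins of a 2 x 2 table are fixed."""
    return comb(row1, k) * comb(row2, col1 - k) / comb(row1 + row2, col1)


def one_sided_exact_and_mid_p(a, b, c, d):
    """Return (exact P, mid-P) for the upper tail of the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    k_max = min(row1, col1)  # largest top-left count the margins allow
    # Exact P-value: probability of a top-left count at least as large as observed.
    exact = sum(hypergeom_pmf(k, row1, row2, col1) for k in range(a, k_max + 1))
    # Mid-P-value: subtract half the probability of the observed table.
    mid = exact - 0.5 * hypergeom_pmf(a, row1, row2, col1)
    return exact, mid


# Hypothetical data: 7/10 responders on one treatment, 2/10 on the other.
exact_p, mid_p = one_sided_exact_and_mid_p(7, 3, 2, 8)
print(f"exact P = {exact_p:.4f}, mid-P = {mid_p:.4f}")
```

Because the mid-P-value is always smaller than the exact P-value, it is less conservative, which is the trade-off against the loss of a guaranteed Type 1 error rate that the entry describes.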
experimental design
An experiment is a planned method of collecting data under controlled conditions, carried out so that the influence on the responses of one or more factors (which could, for example, represent different treatments in a CLINICAL TRIAL) may be assessed. The responses at the various levels of the factors are compared to see whether there are any differences that could indicate that the effect of one level of a factor is different from that of another.
Unlike industrial experimentation, the designs mainly used in medical research tend to be relatively simple in structure. This is partly because experiments involving patients are much more difficult to 'control' than experiments carried out under strict industrial conditions. Patients do not always comply with the medication regime and some may withdraw from treatment because they go on holiday, change GP, move house, change jobs, etc. It could even be the case that the benefits of a sophisticated design could be lost in the complexity of the final analysis. Although in the early phases of the development of a new drug, more involved experiments, such as dose-ranging experiments using animals or small numbers of healthy volunteers, might be conducted, it is usually regarded as safer when dealing with patients to employ designs that are simple and not so sensitive to MISSING DATA or incorrectly applied treatments. For this reason, the vast majority of clinical trials are carried out as comparative randomised controlled experiments using a parallel group structure (a completely randomised design) or as a simple cross-over experiment where each subject is exposed to each treatment using a predetermined specific treatment sequence.
In any design situation, it is important to consider the three basic principles of experimentation (Cox, 1958). These principles underlie all forms of good experimentation and are required if the conclusions are to possess the properties of validity, precision and coverage.
To achieve validity, an experiment should be planned so that the conclusions are free from BIAS, either conscious or subconscious. It is not enough that the experimenter feels sure that they have not introduced personal bias or preferences into the experiment: it is a question of using a suitable experimental design and following the procedure laid down in the protocol. The first principle of experimentation is RANDOMISATION, which is used to avoid bias. There should be an allocation of treatment to individual subjects according to some randomisation procedure, which the experimenter cannot influence. Random number tables or a computer-generated randomisation should be used to allocate the treatments to the subjects. In a parallel group experiment, the subjects are randomly allocated to two or more separate arms of the study and receive one specific treatment throughout the treatment period. In a CROSSOVER TRIAL, patients are randomly allocated to a particular sequence of treatments. The simplest form of crossover experiment is the two-period, two-treatment, two-sequence crossover, also known as an AB/BA crossover. In clinical research, it has become common to use BLINDING as an additional feature to avoid bias. Single-blinding is where a treatment has been allocated to the patient at random, without the patient knowing which treatment he has received. Double-blinding is where neither the patient nor the investigator (or assessor) is aware of the specific treatment received.
When the object of an experiment is to compare the effects of different treatments, there should be a measure of the precision (standard error) of the estimates of the differences between the effects of the treatments. This can only be obtained if there is replication, the second principle of experimentation. To achieve this, the same treatment must be applied to different subjects.
These repeated applications furnish a measure of the variation in the treatment effects that may be compared with the variation due to random error that would arise even if there were no differences between treatments. One of the main requirements of an experiment, particularly in a medical environment, where it could be unethical to carry out an experiment unless it can be shown that the estimates will have sufficient precision, is that the study must be of a sufficient size; i.e. there needs to be enough power in the experiment to detect a clinically important difference, if one exists. A power calculation, determining the sample sizes needed in a planned experiment, is an essential part of a clinical trial protocol (see PROTOCOLS FOR CLINICAL TRIALS).
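As an illustration of such a sample size determination, the sketch below uses the standard normal-approximation formula for comparing two means in a parallel group trial with equal group sizes, n = 2σ²(z₁₋α/₂ + z₁₋β)²/δ² per group. This particular formula is an assumption made for the example (the entry does not prescribe one), and the function name and numerical inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.9):
    """Approximate subjects per arm to detect a mean difference `delta`.

    Assumes two equal-sized parallel groups, a common standard deviation
    `sigma`, a two-sided test at level `alpha` and a normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


# Hypothetical planning values: detect a 5-unit difference when the SD is 10.
print(n_per_group(delta=5, sigma=10, power=0.9))
print(n_per_group(delta=5, sigma=10, power=0.8))
```

Halving the clinically important difference roughly quadruples the required sample size, which is why the choice of that difference needs careful justification in the protocol.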
The precision of an experiment depends not only on the number of replications used in the experiment but also on the inherent variability of the subjects studied. The variability will be smaller if the subjects are more homogeneous. However, in order to achieve wide coverage of the conclusions, the subjects used should be as varied as possible. For example, a trial to compare asthma treatments will tell us very little about the response for elderly patients if it is restricted to a narrow age range of young patients. If the results of the experiment are to apply to all patients, the experiment should include patients from a wide range of age groups. However, the desire to extend the coverage of an experiment may result in systematic errors due to the heterogeneity of the subjects. This would be particularly serious if the randomisation resulted in subjects exposed to one treatment being generally different from (e.g. younger than) those exposed to another treatment. One of the techniques for the control of this nonrandom systematic error is the technique of stratifying the subjects into homogeneous blocks and then randomising the treatments within blocks. Stratification, the third principle of experimentation, leads to experimental designs in which the effects of different blocks may be taken into account in the analysis. Examples of stratified designs are randomised block designs, Latin square designs and incomplete block designs such as Youden squares, balanced incomplete block designs, group divisible designs and cyclic designs (see Cutler, 1998).
Other techniques for allowing for differences in the subjects, and therefore extending the coverage of the conclusions, involve the use of auxiliary information. For example, in a hypertension experiment, patients may be entered into the trial with different initial systolic blood pressures, so a simple comparison of the difference between the average systolic blood pressures for the patient groups after treatment will not give a true comparison of the treatments unless suitable adjustment for their initial (baseline) blood pressures is made. This adjustment can be made using the ANALYSIS OF COVARIANCE.
In addition to the parallel group and crossover designs already mentioned, factorial designs are being used more frequently in medical experimentation. In such experiments, more than one factor is involved, giving rise to treatments formed from different combinations of the factor levels. For example, one factor could be different drug treatments, while a second factor might consist of different levels of patient care. It is important in these designs not only to be able to assess the main effects of the different factors but also any interactions between them. One problem with such designs is that the number of treatment combinations increases rapidly so that it might be necessary to use fractional replications as well as confounding to reduce the size of the experiment and to accommodate stratification.
Response surface designs (Box and Draper, 1987) are used to model a surface representing the responses at different levels of a set of factors. The object of the experiment might be to determine the best levels at which to set a number of factors that affect the responses. Simple factorial designs may be used to fit first-order response surface models, but more complicated designs such as central composite designs are needed if higher order models are required.
Another type of experimental design, increasingly applied in clinical research, is the sequential design (see SEQUENTIAL ANALYSIS) or sequential group design (Whitehead, 1997), in which parallel groups of patients are studied. Such a trial continues until a clear benefit of one treatment is seen or until it is unlikely that any difference between treatments will emerge. The main advantage of these sequential trials is that they will often be shorter, and therefore involve fewer patients, when there is a large difference in the effectiveness of the two treatments. PP
[See also PHASE I TRIALS, PHASE II TRIALS, PHASE III TRIALS, PHASE IV TRIALS]
Box, G. E. P. and Draper, N. R. 1987: Empirical model building and response surfaces. New York: John Wiley & Sons, Inc. Cox, D. R. 1958: Planning of experiments. New York: John Wiley & Sons, Inc. Cutler, D. R. 1998: Incomplete block designs. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd. Whitehead, J. 1997: The design and analysis of sequential clinical trials. Chichester: John Wiley & Sons, Ltd.
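The idea in this entry of stratifying subjects into homogeneous blocks and then randomising treatments within blocks can be sketched as follows. This is a minimal illustration using permuted blocks within each stratum, not a validated trial randomisation system; the function name, the fixed seed and the patient identifiers are all hypothetical.

```python
import random


def randomise_within_blocks(subjects_by_stratum, treatments=("A", "B"), seed=2011):
    """Allocate treatments by permuted blocks, separately within each stratum.

    `subjects_by_stratum` maps a stratum label to a list of subject ids.
    Each block contains every treatment once, in random order, so the arms
    stay balanced within a stratum after every complete block.
    """
    rng = random.Random(seed)  # seeded only so that the sketch is reproducible
    allocation = {}
    for stratum, subjects in subjects_by_stratum.items():
        block = []
        for subject in subjects:
            if not block:  # start a fresh randomly ordered block
                block = list(treatments)
                rng.shuffle(block)
            allocation[subject] = block.pop()
    return allocation


# Hypothetical strata defined by age group.
strata = {
    "under 40": ["p01", "p02", "p03", "p04"],
    "40 and over": ["p05", "p06", "p07", "p08"],
}
for subject, arm in randomise_within_blocks(strata).items():
    print(subject, arm)
```

Because each stratum is balanced across arms, a chance imbalance such as one treatment group being generally younger than the other cannot arise for the stratifying variable.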
expert systems
Also known as knowledge-based systems, expert systems are computer programs that combine knowledge of some specific application domain with the general capability of drawing inferences from it, so as to be able to assist a decision-making process in specialist domains. Research in this field originated in the 1960s, flourishing in the 1970s and for some time dominating the mainstream of ARTIFICIAL INTELLIGENCE (AI). Many limitations to this paradigm are now known, along with its advantages.
One of the original motivations for the development of this line of research was the impossibility of building general-purpose problem solvers, mostly due to the surprisingly extensive use of commonsense knowledge that is required in such systems. In a very delimited area of expertise, however, this is much less of a problem. The area of human intellectual endeavour to be captured in an expert system is called the 'task domain' of the expert system. The primary goal of expert systems research is to make expert knowledge available to decision makers, by incorporating the expertise in an expert system, by means of a process known as 'knowledge extraction'.
Expert systems are often divided into two parts: a task-dependent 'knowledge base' and an 'expert system shell' that is independent of the task domain. In the classic approach the expertise in the knowledge base is represented in a symbolic way (e.g. by means of logic-type rules). The expert system shell, by way of contrast, contains an 'inference engine' that makes use of symbolic manipulations (e.g. logical inference) to reason with the information in the knowledge base. Some expert system shells also contain an explanation system (giving explanations about conclusions drawn and about questions asked) and a knowledge base editor (allowing modifications to be made in the knowledge base in an easy way). One last important part of expert system shells is the user interface, which allows user-friendly interaction with the expert system.
One of the first and most classical expert systems is PROSPECTOR (Duda et al., 1977; Duda, 1980), which was used to evaluate the mineral potential of a geological site or region. In medical applications expert systems first proved their usefulness by the development of MYCIN (Shortliffe et al., 1975; Buchanan and Shortliffe, 1984). This is an interactive program developed at Stanford University that diagnoses certain blood infections, recommends treatment and can explain its reasoning in detail. In a controlled test, its performance equalled that of specialists. Unfortunately, legal and ethical issues related to the use of computers in medicine prevented the system being used in practice. Many other developments resulted from the MYCIN project, however, e.g. EMYCIN, the first expert shell developed from MYCIN. Another early example of a medical expert system is INTERNIST-I (Miller, Pople and Myers, 1982), which later evolved into Iliad.
Today expert systems are used in different applications, including medical diagnosis, design (e.g. of large buildings), planning and scheduling (e.g. in logistics). Recent research in expert systems focuses on the use of statistical reasoning methods such as belief networks, which in turn fall under the more general header of probabilistic GRAPHICAL MODELS. In modern AI, for example, many of the inferential procedures are based on statistical rather than logical inference. Another, related, trend is moving from a knowledge-intensive approach (with emphasis on producing a good knowledge base) to a data-intensive approach (letting the system improve by MACHINE LEARNING and data mining). NC/TDB
Buchanan, B. G. and Shortliffe, E. H. 1984: Uncertainty and evidential support. In Rule-based expert systems: the MYCIN experiments of the Stanford heuristic programming project. Reading, MA: Addison-Wesley. Duda, R. O. 1980: The PROSPECTOR system for mineral exploration. Menlo Park: Stanford Research Institute Final Report, Project 8172. Duda, R. O., Hart, P. E., Nilsson, N. J., Reboh, R., Slocum, J. and Sutherland, G. L. 1977: Development of a computer-based consultant for mineral exploration. Stanford Research Institute Annual Report, Project 5821 and 6415. Miller, R. A., Pople, H. E. and Myers, J. D. 1982: INTERNIST-I, an experimental computer-based diagnostic consultant for general internal medicine. New England Journal of Medicine, 19 August. Shortliffe, E. H., Rhame, F. S., Axline, S. G., Cohen, S. N., Buchanan, B. G., Davis, R., Scott, A. C., Chavez-Pardo, R. and van Melle, W. J. 1975: MYCIN: a computer program providing antimicrobial therapy recommendations. Clinical Medicine 34.
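The separation between a task-dependent knowledge base and a task-independent inference engine, described in this entry, can be sketched in a few lines. The example below is a toy forward-chaining engine over logic-type rules; it is not how MYCIN or PROSPECTOR were implemented, and the rules and facts are invented purely for illustration (they are not clinical advice).

```python
def forward_chain(rules, facts):
    """Task-independent inference engine: apply rules until nothing new follows.

    `rules` is a list of (premises, conclusion) pairs; `facts` is an iterable
    of known statements. Returns the full set of derived facts and the list
    of rules that fired, which can serve as a simple explanation trace.
    """
    known = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                fired.append((premises, conclusion))
                changed = True
    return known, fired


# Hypothetical, task-dependent knowledge base.
knowledge_base = [
    (("fever", "positive blood culture"), "suspected bloodstream infection"),
    (("suspected bloodstream infection", "gram-negative rods"), "consider antibiotic review"),
]
observed = ["fever", "positive blood culture", "gram-negative rods"]

conclusions, explanation = forward_chain(knowledge_base, observed)
for premises, conclusion in explanation:
    print(f"{conclusion} BECAUSE {' AND '.join(premises)}")
```

Swapping in a different `knowledge_base` changes the task domain without touching `forward_chain`, which mirrors the role of an expert system shell, and the list of fired rules is a rudimentary version of the explanation system mentioned above.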
explanatory variables These are variables used as potential explanations of another variable in a statistical model. Thus, when using the simple linear regression model:
\[ y = \alpha + \beta x + \varepsilon \]
we seek to explain the variation in the outcome variable, $y$, according to variation in the explanatory variable, $x$. Here, $\alpha$ and $\beta$ are constants that specify how $y$ is related to $x$; any residual, unexplained, variation is accounted for by the random error term, $\varepsilon$. Due to the widespread use of the symbols in this equation, an explanatory variable is often referred to as an 'x variable'. For example, Woodward and Walker (1994) used sugar consumption per head of population as an explanatory variable in a simple linear regression model to predict the average number of decayed, missing and filled teeth in 90 countries in an ecological study. In the analysis of association between these two variables, sugar consumption was taken as the explanatory variable, because the hypothesis was that consumption of sugar caused dental problems, rather than the other way round.
In general, several explanatory variables may be adopted as potential explanations of the outcome variable. For example, Bolton-Smith et al. (1991) used 15 explanatory variables in a MULTIPLE LINEAR REGRESSION model to predict the level of high-density lipoprotein (HDL)-cholesterol in 5236 women. Explanatory variables may be quantitative or categorical; the complete set of explanatory variables may be a mixture. For example, in the HDL-cholesterol study, the 15 variables included continuous measures, such as blood pressure, and categorical classifications, such as marital status.
Sometimes there may be a single explanatory variable that is the subject of the hypothesis of interest and the remaining explanatory variables in the statistical model are confounding, or prognostic, variables. For example, in a SURVIVAL ANALYSIS of the effects of cigarette smoking on lung cancer the explanatory variables in the fitted model might be age and smoking. Here, age is not of interest as a predictive variable in its own right, but is a potential confounder in the relationship between smoking and lung cancer. Including age as an explanatory variable enables adjustment for the effects of age. Sometimes, a model includes certain explanatory variables that represent interactions; for instance, an age by smoking interaction might be included in the set of explanatory variables in order to see whether the effect of smoking differs by age. MW
Bolton-Smith, C., Woodward, M., Smith, W. C. S. and Tunstall-Pedoe, H. 1991: Dietary and non-dietary predictors of serum total and HDL-cholesterol in men and women: results from the Scottish Heart Health Study. International Journal of Epidemiology 20, 95-104. Woodward, M. and Walker, A. R. P. 1994: Sugar consumption and dental caries: evidence from 90 countries. British Dental Journal 176, 297-302.
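To make the roles of $\alpha$, $\beta$ and the residual term in the simple linear regression model concrete, the sketch below fits the model by ordinary least squares. Least squares is assumed here as the estimation method (the entry does not name one), and the data are invented for illustration; they are not the sugar consumption data of Woodward and Walker (1994).

```python
def fit_simple_linear_regression(x, y):
    """Least squares estimates for y = alpha + beta * x + error.

    Returns (alpha, beta, residuals), where the residuals are the part of
    the outcome left unexplained by the explanatory variable.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx                 # slope: change in y per unit change in x
    alpha = mean_y - beta * mean_x   # intercept: fitted y when x is zero
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    return alpha, beta, residuals


# Hypothetical explanatory variable (x) and outcome variable (y).
x = [10, 20, 30, 40, 50]
y = [1.2, 1.9, 3.1, 3.8, 5.2]
alpha, beta, residuals = fit_simple_linear_regression(x, y)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

With several explanatory variables the same idea extends to multiple linear regression, where each coefficient describes the association with the outcome after adjustment for the other variables in the model.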
exploratory factor analysis See FACTOR ANALYSIS
exponential distribution This is a single-parameter PROBABILITY DISTRIBUTION that often models the length of time to an event. If a random variable, $X$, takes an exponential distribution with parameter $\lambda$ then this is sometimes written $X \sim E(\lambda)$ for short. The exponential distribution has the density function:
\[ f(x) = \lambda e^{-\lambda x}, \quad x \ge 0 \]
which is monotonically decreasing from the MODE at $x = 0$. The distribution has a MEAN of $1/\lambda$ and VARIANCE of $1/\lambda^2$. When $\lambda = 0.5$, the distribution is identical to a CHI-SQUARE DISTRIBUTION with two DEGREES OF FREEDOM. Changing $\lambda$ rescales the density function on both axes, but leaves the shape unchanged. The function always intercepts the $y$ axis at $y = \lambda$ and the 95th percentile (useful for calculating CONFIDENCE LIMITS) is always approximately $x = 3/\lambda$. (The shape can be seen illustrated in the figure accompanying the entry on the chi-square distribution. For further details on how the exponential distribution relates to other distributions, see Leemis (1986).)
The most interesting property of the exponential distribution is the lack of memory (LOM) property. This means that if the lifetime of an instrument is distributed exponentially with parameter $\lambda$, then if after 2 years we note that it is still working, its remaining lifetime will be distributed exponentially with parameter $\lambda$; i.e. it is as if the process has been reset or the first 2 years have been forgotten. To illustrate, Ainsworth et al. (2000) found that the length of time in days until discharge from high-dependency care for babies is exponentially distributed with a MEDIAN value of 6. The lack of memory property tells us that if a baby has already spent 2 days in such care, then its median remaining time in high-dependency care is still 6 days; i.e. the median overall time for babies whose time exceeds 2 days is 8 days.
If one has $n$ random variables that independently have exponential distributions with parameter $\lambda$, then their sum will be a GAMMA DISTRIBUTION with parameters $\lambda$ and $n$. Coupled with the lack of memory property, this tells us that if the time to an event has an exponential distribution, the time to the second of 2 (or the third of 3, etc.) independent such events will have a gamma distribution. AGL
Ainsworth, S. B., Beresford, M. W., Milligan, D. W. A., Shaw, N. J., Matthews, J. N. S., Fenton, A. C. and Ward Platt, M. P. 2000: Pumactant and poractant alfa for treatment of respiratory distress syndrome in neonates born at 25-29 weeks' gestation: a randomised trial. The Lancet 355, 1387-92. Leemis, L. M. 1986: Relationships among common univariate distributions. The American Statistician 40, 2, 143-6.
exponential family This comprises distributions, including many of the common ones, whose distribution or density functions can be partitioned in a particular way. These distributions have some attractive features that enable easy computation and manipulation.
A distribution is defined by its distribution function (if a discrete distribution) or its density function (if a continuous distribution); in either case the defining function is a function of the data, $x$, and some parameters, $\theta$. The distribution is in the exponential family if (1) the function can be written as a function of the parameters (e.g. $a(\theta)$) multiplied by a function of the data (e.g. $b(x)$) multiplied by the exponential of another function of the parameters (e.g. $c(\theta)$) multiplied by another function of the data (e.g. $d(x)$) and (2) the value of the parameters does not alter the range of possible values of the data.
To illustrate condition 1, the EXPONENTIAL DISTRIBUTION is unsurprisingly in the exponential family of distributions. The exponential distribution has the probability density function:
\[ f(x) = \lambda e^{-\lambda x}, \quad x \ge 0 \]
which we need to show can be written in the form:
\[ f(x) = a(\lambda)\, b(x)\, e^{c(\lambda) d(x)} \]
This can be achieved simply by setting $a(\lambda) = \lambda$, $b(x) = 1$, $c(\lambda) = -\lambda$, $d(x) = x$.
To illustrate condition 2, consider the continuous uniform distribution between 0 and $\theta$ that has the probability density function:
\[ f(x) = \frac{1}{\theta}, \quad 0 \le x \le \theta \]
This can be parameterised in the correct manner, but different values of $x$ become possible given different values of $\theta$ (e.g. 7 is possible if $\theta = 8$, but not if $\theta = 6$). Therefore this does not belong to the exponential family of distributions.
Other distributions that are in the exponential family of distributions include the exponential, the NORMAL, the BINOMIAL, the POISSON, the Bernoulli, the UNIFORM (if between fixed points), the BETA and the GAMMA, to name only some of the more common members.
In frequentist statistics, the distributions in the exponential family are useful because they possess properties that allow for both straightforward MAXIMUM LIKELIHOOD ESTIMATION and GENERALISED LINEAR MODELS, among others. For BAYESIAN METHODS, one attractive feature is that they are distributions that have natural conjugate PRIOR DISTRIBUTIONS (see Gelman et al., 1995). For further details see Dobson (1990). AGL
Dobson, A. J. 1990: An introduction to generalized linear models. London: Chapman & Hall. Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. 1995: Bayesian data analysis. Boca Raton: Chapman & Hall/CRC.
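Several numerical statements about the exponential distribution made in the two preceding entries can be checked directly: the 95th percentile of about 3/λ, the lack of memory property with a median of 6 days, and the exponential family factorisation with a(λ) = λ, b(x) = 1, c(λ) = -λ and d(x) = x. The following sketch does this from the closed-form density and survival function; the function names are hypothetical, and the rate is simply chosen so that the median equals 6, as in the discharge example.

```python
from math import exp, log


def density(x, lam):
    """Exponential density f(x) = lam * exp(-lam * x) for x >= 0."""
    return lam * exp(-lam * x)


def survival(x, lam):
    """P(X > x) for an exponential random variable with rate lam."""
    return exp(-lam * x)


def quantile(p, lam):
    """Value x such that P(X <= x) = p."""
    return -log(1 - p) / lam


def conditional_survival(t, already, lam):
    """P(X > already + t | X > already); lack of memory makes this P(X > t)."""
    return survival(already + t, lam) / survival(already, lam)


def factorised_density(x, lam):
    """The same density written as a(lam) * b(x) * exp(c(lam) * d(x))."""
    a, b, c, d = lam, 1.0, -lam, x
    return a * b * exp(c * d)


lam = log(2) / 6  # rate for which the median time to discharge is 6 days
print("median:", quantile(0.5, lam))
print("95th percentile times lambda:", quantile(0.95, lam) * lam)
print("P(more than 6 further days | already 2 days):", conditional_survival(6, 2, lam))
print("factorisation agrees:", abs(density(4.0, lam) - factorised_density(4.0, lam)) < 1e-12)
```

The conditional probability of one half confirms that the median remaining time after 2 days is still 6 days, so the median total time among those exceeding 2 days is 8 days, as stated in the exponential distribution entry.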
F
factor analysis
This is a generic term for procedures that attempt to uncover whether the associations between a set of observed or manifest variables can be explained by the relationships of the variables to a small number of underlying LATENT VARIABLES (more usually referred to as common factors in this context). Factor analysis techniques attempt to discover the number and nature of the latent variables that explain the variation and more specifically covariation in the set of measured variables. The common factors are considered to contain the essential information in the larger set of observed variables.
The factor analysis model postulates that each observed variable can be expressed as a linear function of the common factors plus a residual term, i.e. a MULTIPLE LINEAR REGRESSION model for the observed variables on the common factors. The model implies that the covariances/correlations between the observed variables arise from their mutual relationships to the common factors. The COVARIANCE MATRIX of the observed variables predicted by the factor analysis model is a function of the regression coefficients of the observed variables on the common factors (the factor loadings) and the variances of the residual terms.
Estimation of both factor loadings and residual variances involves making the corresponding elements of the predicted and observed covariance matrices as close as possible in some sense. MAXIMUM LIKELIHOOD ESTIMATION is commonly used and has the advantage of providing a formal test of the number of factors needed adequately to represent the data. In general this test is to be preferred to informal tests such as KAISER'S RULE and the SCREE PLOT (see, for example, Preacher and MacCallum, 2003).
After the initial estimation phase an attempt is made to simplify the often difficult task of interpreting the derived factors using a process known as FACTOR ROTATION. In general the aim is to produce a solution having what is known as simple structure, i.e. each common factor affects only a small number of specific observed variables. Rotated factors can be allowed to be independent or correlated, with the former often being chosen by default since it appears to provide a simpler solution. In particular circumstances, however, correlated factors might be considered to be a more realistic option.
A medical example of the application of factor analysis is provided in Whittick (1989). Attitudes to caregiving were examined in three groups of carers: mothers caring for a mentally handicapped child, mothers caring for a mentally handicapped adult and daughters caring for a parent with dementia. An attitude questionnaire containing 26 variables was developed and administered by post to 145 carers. The correlation matrix of the observed variables was subjected to factor analysis and a particular form of factor rotation giving independent factors. The three-factor solution could be interpreted as follows:
Factor 1. Negative aspects of caregiving with an emphasis on role conflict, family disruption and resentment about the caring role. Labelled as 'conflict'.
Factor 2. Positive aspects of caregiving with the emphasis on love for the dependant and satisfaction gained from the caregiving role. Labelled as 'love'.
Factor 3. Willingness to accept institutional care with an emphasis on its advantages. Labelled as 'institution'.
These three factors provide a concise and convenient description of a relatively complex dataset and were used as the basis of a further investigation of differences between the three groups of carers. Factor scores were calculated on each of these three factors for all 145 carers in the sample and a one-way ANALYSIS OF VARIANCE applied, giving results as shown in the table. The analysis of variance showed that there were significant differences between the groups on the three factors.
factor analysis Calculation of factor scores on caregiving data
Subscale            Mother/child   Mother/adult   Daughter/parent   F
Conflict     Mean   19.6           21.3           27.8              19.5
             SD     6.3            5.1            4.5
Love         Mean   24.3           25.0           19.4              21.3
             SD     3.9            3.1            4.3
Institution  Mean   7.5            8.3            12.9              34.5
             SD     3.5            2.9            2.7
Factor analysis as described in this section is more accurately called exploratory factor analysis, with the 'exploratory' implying that the investigator uses the analysis with no preconceived ideas about the factor structure to be expected (except, of course, that it will be relatively simple and open to clear interpretation). In some situations, however, the researcher may have a theoretical factor structure in mind to be tested on a dataset. In such a case, confirmatory factor analysis may be used. SSE
[Table (entry on factor rotation): Correlation coefficients of six school subjects. Only part of the row labels survives: 2 English, 3 History, 4 Arithmetic, 5 Algebra, 6 Geometry.]
[See also PRINCIPAL COMPONENT ANALYSIS]
Preacher, K. J. and MacCallum, R. C. 2003: Repairing Tom Swift's electric factor analysis machine. Understanding Statistics 2, 13-44. Whittick, J. E. 1989: Dementia and mental handicap: attitudes, emotional distress and caregiving. British Journal of Medical Psychology 62, 181-9.
factor rotation

This is a procedure used in exploratory FACTOR ANALYSIS that aims to allow the factor analysis solution to be described as simply as possible. Such a process is possible because the exploratory factor analysis model does not possess a unique solution. Essentially, factor rotation tries to find an easily interpretable solution from among an infinitely large set of alternatives that each account for the variances and covariances of the observed variables equally well. Factor rotation is a way by which a solution is made more interpretable without changing its underlying mathematical properties.

The numerical techniques that are used for factor rotation aim for solutions in which each variable is highly loaded on at most one factor, with factor loadings being either large and positive or near zero. In essence they try to alter the initial solution by making large loadings larger and small loadings smaller by optimising some suitable numerical criterion. Some methods of rotation give uncorrelated (orthogonal) factors, while others allow correlated (oblique) factors. As a general rule, if a researcher is primarily concerned with getting results that 'best fit' the data, then the factors should be rotated obliquely. If, however, there is more interest in the generalisability of results, then orthogonal rotation is probably to be preferred. For a full discussion of the pros and cons of the two forms of rotation see Preacher and MacCallum (2003).

We can illustrate factor rotation using the correlation matrix shown in the first table. The factor loadings in the initial two-factor solution for these correlations are shown in the second table, as are the rotated factor loadings (an orthogonal rotation was used). The factors in the rotated solution might be labelled as 'verbal' and 'mathematical'.

factor rotation Correlation coefficients of six school subjects

Subject           1       2       3       4       5       6
1 French        1.00
2 English       0.44    1.00
3 History       0.41    0.35    1.00
4 Arithmetic    0.29    0.35    0.16    1.00
5 Algebra       0.33    0.32    0.19    0.59    1.00
6 Geometry      0.25    0.33    0.18    0.47    0.46    1.00

factor rotation Unrotated and rotated factor loadings

                Unrotated loadings     Rotated loadings
Variable           1        2             1        2
1 French         0.55     0.43          0.20     0.62
2 English        0.57     0.29          0.30     0.52
3 History        0.39     0.45          0.05     0.55
4 Arithmetic     0.74    -0.27          0.75     0.15
5 Algebra        0.72    -0.21          0.65     0.18
6 Geometry       0.59    -0.13          0.50     0.20

The lack of uniqueness of the factor loadings from an exploratory factor analysis once caused the technique to be viewed with a certain amount of suspicion (particularly by statisticians!), since, apparently, it allows investigators licence to consider a large number of solutions (each corresponding to a rotation of the factors) and to select the one closest to their a priori expectations (prejudices) about the factor structure of the data. However, such suspicion is largely misplaced because of the essential 'exploratory' nature of the factor analysis solution that is subjected to rotation. (See Everitt and Dunn, 2001, for further discussion.) BSE

Everitt, B. S. and Dunn, G. 2001: Applied multivariate data analysis, 2nd edition. London: Arnold. Preacher, K. J. and MacCallum, R. C. 2003: Repairing Tom Swift's electric factor analysis machine. Understanding Statistics 2, 13-44.
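The idea that an orthogonal rotation changes the loadings but not what they explain can be sketched in a few lines of Python. This is a minimal illustration, not the method used for the entry's table: it assumes the raw varimax criterion (the entry does not say which criterion was used) and finds the rotation angle by a simple grid search, which only works for two factors. The function names are hypothetical, and the result need not reproduce the tabulated rotated loadings exactly.

```python
import math

# Unrotated two-factor loadings for the six school subjects (second table).
UNROTATED = [
    (0.55, 0.43),   # French
    (0.57, 0.29),   # English
    (0.39, 0.45),   # History
    (0.74, -0.27),  # Arithmetic
    (0.72, -0.21),  # Algebra
    (0.59, -0.13),  # Geometry
]


def rotate(loadings, theta):
    """Apply a rigid (orthogonal) rotation by angle theta to two-factor loadings."""
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def varimax_criterion(loadings):
    """Raw varimax criterion: for each factor, the variance of its squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(2):
        squared = [row[j] ** 2 for row in loadings]
        mean_sq = sum(squared) / p
        total += sum((v - mean_sq) ** 2 for v in squared) / p
    return total


def best_rotation(loadings, steps=9000):
    """Grid search over angles in [0, pi/2); the criterion repeats every quarter turn."""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    return max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))


def communalities(loadings):
    """Variance of each variable accounted for by the two common factors."""
    return [a * a + b * b for a, b in loadings]


theta = best_rotation(UNROTATED)
ROTATED = rotate(UNROTATED, theta)
for (u1, u2), (r1, r2) in zip(UNROTATED, ROTATED):
    print(f"{u1:6.2f} {u2:6.2f}   ->  {r1:6.2f} {r2:6.2f}")
```

Whatever angle is chosen, the communalities and the correlations reproduced by the loadings are unchanged, which is the sense in which all rotated solutions fit the data equally well.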
factorial designs

A term used in the context of a randomised trial (see CLINICAL TRIALS) to refer to a particular experimental design that allows two or more interventions to be evaluated in a statistically efficient way. In its simplest form, where treatments A and B are to be compared with their respective placebos (see the table), a 2 × 2 factorial design involves each patient being randomised twice, namely to either active A or placebo A and to either active B or placebo B. This design allows the separate effects of A and B to be assessed in the same sample of patients and, for a given sample size, is more powerful than a trial comparing A versus B versus no treatment.

factorial designs Schematic diagram of a 2 × 2 factorial trial of treatment A versus placebo A and of treatment B versus placebo B

              Active A    Placebo A
Active B      AB          OB           All B
Placebo B     AO          OO           All non-B
              All A       All non-A

[Commentary on the table. The marginal analysis for the effectiveness of treatment A involves comparing the measure of the treatment effect among all those allocated treatment A (the total in the cell labelled 'All A') with the measure of the treatment effect among all those not allocated to A (the total in the cell labelled 'All non-A'). The marginal analysis for the effects of B is analogous ('All B' versus 'All non-B'). The test for interaction between the effects of A and B involves an appropriate measure of the difference between (a) the effects of A among subjects allocated to B and (b) the effects of A among patients not allocated to B. In the special situation where the outcome is binary (e.g. mortality) and treatment effects involve comparisons of proportions, then a ratio can be computed, so that the test in this table would be the ratio of the relative risks (AB/OB) and (AO/OO).]

The analysis of factorial design trials involves a number of steps, which may be illustrated by considering a hypothetical analysis of the trial depicted in the table. The first step is to assess the effects of treatment A among all patients allocated to A or matching placebo A. The most powerful means of assessing the effects of A is to compare all those allocated to A (cells AB and AO in the table) with all those allocated to placebo A (OB and OO). Analogously, the effects of B can then be estimated by comparing all those allocated to B (AB and OB) with all those allocated placebo B (AO and OO). This 'marginal analysis' is the most statistically efficient analysis unless an 'interaction' exists such that the effects of A differ among patients allocated to B and among those allocated to placebo B (or vice versa, namely the effects of B differ among patients allocated to A or placebo A). It is necessary, therefore, to test for such an interaction in the routine analysis of factorial trials, because in situations where the effects of A are smaller among patients allocated to B than among those allocated to placebo B, the marginal analysis will underestimate the effects of B (and similarly the effects of B will be overestimated by the marginal analysis if allocation to B enhances the efficacy of A).

In the example given, let us suppose that the primary outcome is binary (e.g. mortality), so that a test for interaction would test whether the primary comparison (e.g. relative risk) was statistically significantly different for the comparison of A versus placebo A among either patients allocated to B (i.e. AB versus OB) or among those allocated to placebo B (i.e. AO versus OO). In the rare situation where such an interaction is identified, and is clinically significant (i.e. its existence is relevant to drug selection), separate analyses of the effects of a treatment should be performed among all those allocated the interacting drug and all those allocated not to receive that drug.

In the past, some authors have expressed concerns about the potential for misleading estimates of effect arising from important interactions in factorial trials (Lubsen and Pocock, 1994). It should be noted, however, that such interactions appear to be quite rare. In a recent systematic review of factorial trials of treatments for myocardial ischaemia, for example, McAlister et al. (2003) found that only 2 of 31 (6%) comparisons demonstrated a statistically significant interaction between two treatments.

The factorial design is an especially versatile experimental design. For example, if there are good a priori reasons to suspect that two interventions might act synergistically (i.e. their effects in combination may be greater than strictly multiplicative), then a factorial design is the only design that can establish this reliably. Similarly, if the marginal analysis of a factorial trial suggests that two treatments, A and B, are effective, the absence of an interaction between A and B suggests that A will be similarly effective in the presence of B: i.e. the combination of A and B is more effective than B alone. The test for interaction between the effects of two treatments has low POWER and so would only be able to detect large differences in effectiveness when treatments are given alone or in combination. However, provided it can be established reliably that a treatment is effective, the existence of modest variation in the size of that effect (which is, after all, no more than would be expected biologically) may be of less immediate clinical relevance.

While factorial designs might well provide a useful tool for answering clinical questions more efficiently, there may occasionally be circumstances where such a design proves impractical. There are two particular situations to note. First, if two or more interventions are to be assessed in a factorial trial, subjects cannot be randomised if one of the treatments is considered to be definitely indicated (or is contraindicated). This may limit the proportion of a target population that is eligible for a trial. Second, trial participants who believe they have experienced an adverse drug reaction in a factorial trial may simply choose to discontinue both treatments, so if the price of assessing a speculative treatment is a sacrifice in compliance with a more promising treatment, then the price may not be worth paying. CB

Lubsen, J. and Pocock, S. J. 1994: Factorial trials in cardiology: pros and cons. European Heart Journal 15, 585-8. McAlister, F. A., Straus, S. E., Sackett, D. L. and Altman, D. G. 2003: Analysis and reporting of factorial trials: a systematic review. Journal of the American Medical Association 289, 2545-53.
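The marginal analyses and the ratio of relative risks described in the commentary on the table can be sketched as follows. The cell counts are invented purely for illustration, and the large-sample Wald test on the log scale is an assumption made here (the entry does not specify which interaction test to use); all function and variable names are hypothetical.

```python
import math

# Hypothetical (deaths, patients) for the four cells of a 2 x 2 factorial trial.
# AB = active A + active B, AO = active A + placebo B,
# OB = placebo A + active B, OO = placebo A + placebo B.
CELLS = {"AB": (30, 500), "AO": (40, 500), "OB": (50, 500), "OO": (60, 500)}


def risk(*labels):
    """Pooled event proportion over the named cells."""
    events = sum(CELLS[c][0] for c in labels)
    patients = sum(CELLS[c][1] for c in labels)
    return events / patients


# Marginal analyses: 'All A' versus 'All non-A', and 'All B' versus 'All non-B'.
rr_a_marginal = risk("AB", "AO") / risk("OB", "OO")
rr_b_marginal = risk("AB", "OB") / risk("AO", "OO")

# Interaction: effect of A among those allocated B versus among those allocated placebo B.
rr_a_with_b = risk("AB") / risk("OB")
rr_a_without_b = risk("AO") / risk("OO")
ratio_of_rrs = rr_a_with_b / rr_a_without_b

# Wald test on the log scale (an assumed, approximate choice of test):
# the variance of a log relative risk is 1/a - 1/n1 + 1/c - 1/n2, and the two
# relative risks come from disjoint groups of patients, so their variances add.
var_log_ratio = sum(1 / events - 1 / patients for events, patients in CELLS.values())
z = math.log(ratio_of_rrs) / math.sqrt(var_log_ratio)
p_interaction = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area

print(f"marginal RR for A: {rr_a_marginal:.3f}")
print(f"marginal RR for B: {rr_b_marginal:.3f}")
print(f"ratio of relative risks: {ratio_of_rrs:.3f}, P = {p_interaction:.2f}")
```

With these invented counts the ratio of relative risks is close to 1 and the interaction test is far from significant, which is the situation in which the marginal analyses are the natural summary.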
false negative rate

A false negative test result in a diagnostic test study occurs when a person who has the disease when measured by a reference standard has a negative test result. The false negative rate is the proportion of individuals with false negative results out of all of those who have the disease. For example, when babies are screened for hearing loss using traditional 'distraction' tests, a number of babies with negative test results will be found later on to have significant hearing loss. These babies have had false negative results and the false negative rate is the number of babies with false negative results divided by the total number of babies with hearing loss. It can also be expressed as 1 − SENSITIVITY. CLC
[See also FALSE POSITIVE RATE]
false positive rate

A false positive test result in a diagnostic test study occurs when a person who does not have the disease when measured by a reference standard has a positive test result. The false positive rate is the proportion of individuals with false positive results out of all of those who are disease free. For example, when newborn babies are screened for congenital hypothyroidism using blood spot tests, a number of babies with initial positive tests will be found to have normal values of thyroid hormone on repeat testing. These babies have had false positive results and the false positive rate is the number of babies with false positive results divided by the total number of babies who do not have congenital hypothyroidism. The false positive rate can also be expressed as 1 − SPECIFICITY.

While it is always desirable to avoid false positive results, this is particularly important in the context of population screening, where apparently healthy individuals are invited to undergo screening. Patients with false positive tests will be harmed, as they are likely to experience anxiety they otherwise would not have felt and, in order to clarify their disease status, will have to undergo further investigations that may carry some risk. Hence evaluations of screening should consider both the benefits to patients correctly identified as being at risk of the adverse consequences of a disease (as measured by the test SENSITIVITY) and the harm to patients with false positive diagnoses (as measured by the false positive rate). CLC
[See also FALSE NEGATIVE RATE]
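The definitions of the false negative and false positive rates, and their relation to sensitivity and specificity, can be checked with a small sketch. The screening counts below are invented for illustration and the function name is hypothetical.

```python
def error_rates(true_pos, false_neg, false_pos, true_neg):
    """Rates from a diagnostic 2 x 2 table classified against a reference standard."""
    diseased = true_pos + false_neg
    disease_free = false_pos + true_neg
    return {
        "sensitivity": true_pos / diseased,
        "specificity": true_neg / disease_free,
        "false_negative_rate": false_neg / diseased,      # among those with the disease
        "false_positive_rate": false_pos / disease_free,  # among those who are disease free
    }


# Hypothetical screening programme: 50 affected and 9950 unaffected babies.
rates = error_rates(true_pos=45, false_neg=5, false_pos=95, true_neg=9855)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

Note that each rate uses a different denominator: the false negative rate is taken over those with the disease, the false positive rate over those without it.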
FDA
See FOOD AND DRUG ADMINISTRATION
F-distribution

The F-distribution is the PROBABILITY DISTRIBUTION of the ratio of two variables, both of which have a CHI-SQUARE DISTRIBUTION. The F-distribution is defined by two parameters, often denoted m and n, known as the DEGREES OF FREEDOM of the distribution. If A is a random variable with a chi-square distribution with m degrees of freedom and B is a random variable independently distributed as chi-square with n degrees of freedom, then nA/(mB) has an F-distribution with m and n (termed numerator and denominator respectively) degrees of freedom. For further details of the distribution see Grimmett and Stirzaker (1992).

The most common use of the F-distribution is in the ANALYSIS OF VARIANCE, where the ratio of two estimates of the variance, each of which independently has a chi-square distribution, is examined. For example, when looking at the effects of historical milk intake on hip bone density, Murphy et al. (1994) find that their variance ratio is approximately 3.8. Comparing this to an F-distribution with 2 and 245 degrees of freedom, the probability of such an extreme value is 0.0237.

Fortunately, in the medical literature one seldom has to deal with the probability density function (PDF) of the F-distribution. It is useful to know that the mean of the distribution, as defined here, is equal to n/(n − 2) as long as n is greater than 2, and that the distribution will be positively skewed but approaches symmetry as m and n become large. The distribution was named 'F' by Snedecor (1934), a nomenclature later said to be in honour of R. A. Fisher, and so is occasionally referred to as Snedecor's F-distribution or some similar variant. AGL

Grimmett, G. R. and Stirzaker, D. R. 1992: Probability and random processes, 2nd edition. Oxford: Clarendon Press. Murphy, S., Khaw, K.-T., May, H. and Compston, J. E. 1994: Milk consumption and bone mineral density in middle aged and elderly women. British Medical Journal 308, 939-41. Snedecor, G. W. 1934: Calculation and interpretation of analysis of variance and covariance. Ames, Iowa: Collegiate Press.
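When the numerator has 2 degrees of freedom, the upper tail area of the F-distribution has a simple closed form, P(F > x) = (1 + 2x/n)^(−n/2), which follows because a chi-square variable with 2 degrees of freedom is exponentially distributed. The sketch below uses this special case only (it is not a general F routine, and the function names are hypothetical). A variance ratio of about 3.8 on 2 and 245 degrees of freedom reproduces the tail probability of 0.0237 quoted in the entry.

```python
def f_upper_tail_2df(x, n):
    """P(F > x) for an F-distribution with 2 and n degrees of freedom.

    Valid only for 2 numerator degrees of freedom.
    """
    return (1.0 + 2.0 * x / n) ** (-n / 2.0)


def f_mean(n):
    """Mean of the F-distribution; it depends only on the denominator df n > 2."""
    if n <= 2:
        raise ValueError("the mean exists only when n is greater than 2")
    return n / (n - 2)


p_value = f_upper_tail_2df(3.8, 245)
print(round(p_value, 4))      # 0.0237
print(round(f_mean(245), 4))  # 1.0082
```

For other numerator degrees of freedom a statistical library or tables would be needed.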
finite mixture distributions

These are PROBABILITY DISTRIBUTIONS that result from a weighted sum of a number of component distributions. Such distributions have a long history, apparently first being used by Karl Pearson in the 1890s to model a set of data on ratio of forehead to body length for 1000 crabs. These data were skewed and a possible reason suggested for this skewness was that the sample contained representatives of two types of crab but when the data were collected they had not been labelled as such. This led Pearson to propose that the distribution of the measurements on the crabs might be modelled by a weighted sum of two NORMAL DISTRIBUTIONS, with the two weights being the proportions of the crabs of each type (see Pearson, 1894). In mathematical terms, Pearson's suggested distribution for the measurements on the crabs was of the form:

$$f(x) = p\,N(x; \mu_1, \sigma_1) + (1 - p)\,N(x; \mu_2, \sigma_2) \qquad (1)$$

where $p$ is the proportion of a type of crab for which the ratio of forehead to body length has mean $\mu_1$ and standard deviation $\sigma_1$, and $(1 - p)$ is the proportion of a type of crab for which the corresponding values are $\mu_2$ and $\sigma_2$. In equation (1):

$$N(x; \mu_i, \sigma_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{1}{2\sigma_i^2}(x - \mu_i)^2\right] \qquad (2)$$

The distribution in equation (1) will be bimodal if the two component distributions are widely separated or will simply display a degree of skewness when the separation of the components is not so great (see BIMODAL DISTRIBUTION).
Pearson's original estimation procedure for the five parameters in equation (1) was based on the method of moments (see Everitt and Hand, 1981), an approach that is now only really of historical interest. Nowadays, the parameters of a simple finite mixture model such as equation (1), or of more complex examples with more than two components or other than univariate normal components, would generally be estimated using MAXIMUM LIKELIHOOD ESTIMATION, often involving the EM ALGORITHM. (Details are given in McLachlan and Peel, 2000.)

In some applications of finite mixture distributions the number of component distributions in the mixture is known a priori (this was the case for the crab data, where two types of crab were known to exist in the region from which the data were collected). However, finite mixture distributions can also be used as the basis of a cluster analysis of data (see CLUSTER ANALYSIS IN MEDICINE), with each component of the mixture assumed to describe the distribution of the measurement (or measurements) in a particular cluster, and the maximum value of the estimated posterior probabilities of an observation being in a particular cluster being used to determine cluster membership. In such applications, the number of components of the mixture (i.e. the number of clusters in the data) will be unknown and therefore will also need to be estimated in some way. (This, too, is considered in McLachlan and Peel, 2000.)

As an example of the application of finite mixture distributions we will look at the age of onset of mania to investigate the possibility that there is an early onset group and a late onset group in the data. This subtype model implies that the age of onset distribution for mania will be a mixture with two components. To investigate this model, finite mixture distributions with normal components were fitted to the age of onset (determined as age on first admission) of 246 manic patients using maximum likelihood estimation. Histograms of the data showing both the fitted two-component mixture distribution and a single normal fit are shown in the figure. The LIKELIHOOD RATIO test for number of groups (see McLachlan and Peel, 2000) provides strong evidence that a two-component mixture provides a better fit than a single normal. BSE

[Figure: finite mixture distributions Histograms and fitted mixture distributions for age of onset data. Horizontal axis: age at onset of mania (years). Curves: fitted two-component mixture density and fitted single normal density.]

Everitt, B. S. and Hand, D. J. 1981: Finite mixture distributions. London: Chapman & Hall/CRC. McLachlan, G. J. and Peel, D. 2000: Finite mixture models. New York: John Wiley & Sons, Inc. Pearson, K. 1894: Contributions to the mathematical theory of evolution. Philosophical Transactions A 185, 71-110.
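Maximum likelihood fitting of a two-component normal mixture by the EM algorithm can be sketched briefly. This is an illustrative outline only: the data are simulated (they are not the mania or crab data), the component means, standard deviations and mixing proportion used to simulate them are invented, and the starting values and fixed number of iterations are arbitrary choices rather than recommendations.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def em_two_normals(data, iterations=200):
    """Fit p*N(mu1, s1) + (1 - p)*N(mu2, s2) by EM; return estimates and log-likelihoods."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    # Crude starting values (an arbitrary choice): components at the data extremes.
    p, mu1, s1, mu2, s2 = 0.5, min(data), sd, max(data), sd
    trace = []
    for _ in range(iterations):
        # E-step: posterior probability that each observation belongs to component 1.
        resp, loglik = [], 0.0
        for x in data:
            a = p * normal_pdf(x, mu1, s1)
            b = (1.0 - p) * normal_pdf(x, mu2, s2)
            resp.append(a / (a + b))
            loglik += math.log(a + b)
        trace.append(loglik)
        # M-step: posterior-weighted proportion, means and standard deviations.
        n1 = sum(resp)
        n2 = n - n1
        p = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n2
        s1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1)
        s2 = math.sqrt(sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2)
    return p, mu1, s1, mu2, s2, trace


# Simulated data: 40% from N(20, 3) and 60% from N(45, 5); values are hypothetical.
random.seed(2011)
data = [random.gauss(20, 3) for _ in range(400)] + [random.gauss(45, 5) for _ in range(600)]
p_hat, mu1_hat, s1_hat, mu2_hat, s2_hat, loglik_trace = em_two_normals(data)
print(f"p = {p_hat:.3f}, component 1: mean {mu1_hat:.2f}, sd {s1_hat:.2f}")
print(f"           component 2: mean {mu2_hat:.2f}, sd {s2_hat:.2f}")
```

The posterior probabilities computed in the E-step are the quantities that would be used to assign observations to clusters when the mixture is used for cluster analysis. In practice several sets of starting values would be tried, since the likelihood can have more than one local maximum.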
Fisher's exact test

This is a test of the association between the rows and columns of a two-way CONTINGENCY TABLE. The test is 'exact' in the sense that, under the hypothesis of no interaction between the rows and columns, the distribution of the associated test statistic is completely determined. An advantage of such exact methods is that they guarantee preservation of a researcher's pre-specified testing level (in this case, the probability of rejecting the hypothesis of no row and column interaction when the rows and columns are, in fact, not associated). The entry EXACT METHODS FOR CATEGORICAL DATA describes exact tests more generally, but one could say that Sir R. A. Fisher - with his method described here - was the father of exact tests. He developed what is popularly known as Fisher's exact test for a single 2 × 2 contingency table.

He motivated his test through a British ritual, the drinking of tea. When drinking tea one afternoon during the 1920s with Fisher and several other university associates, a British woman claimed to be able to distinguish whether milk or tea was added to the cup first. In order to test this claim, she was given eight cups of tea. In four of the cups, tea was added first and in the other four milk was added first. The order in which the cups were presented to her was randomised. She was told that there were four cups of each type, so that she should make four predictions of each order. This experiment is described by Fisher (1925) and more recently by Salsburg (2001).

One possible result of the experiment is shown in the table. Given this particular performance, could one conclude that she can distinguish whether milk or tea was added to the cup first? The experimental outcome displayed here shows that she guessed correctly more times than not, but, by the same token, the total number of trials is not very large and she might have guessed correctly by chance alone. A statistically inclined researcher might proceed as follows:

Null hypothesis. The order in which milk or tea is poured in a cup and the taster's guess of the order are independent.
Alternative hypothesis. The taster can correctly guess the order in which milk or tea is poured in a cup.

Note that the alternative hypothesis is one-sided. Although there are two possibilities, that the woman guesses better than average or she guesses worse than average, we are only interested in detecting the alternative that she guesses better than average. Suppose that the researcher decides to work at the 0.05 level of significance and decides to use the Pearson CHI-SQUARE TEST of independence. Results are as follows:

Pearson chi-square    2
Degrees of freedom    1
Significance          0.1573
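The Pearson chi-square results quoted above can be reproduced with a short sketch, assuming the tea-tasting outcome in which three of the four milk-first cups and three of the four tea-first cups were identified correctly (cell counts 3, 1, 1, 3). For one degree of freedom the upper tail area of the chi-square distribution can be written with the complementary error function, so only the standard library is needed; the variable names are hypothetical.

```python
import math

# Rows: what was actually poured first (milk, tea); columns: the taster's guess.
table = [[3, 1],
         [1, 3]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected counts under independence: row total x column total / grand total.
expected = [[row_totals[i] * col_totals[j] / n for j in range(2)] for i in range(2)]

chi_square = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

# Upper tail of a chi-square distribution with 1 degree of freedom.
significance = math.erfc(math.sqrt(chi_square / 2.0))

print(expected)                # [[2.0, 2.0], [2.0, 2.0]]
print(chi_square)              # 2.0
print(round(significance, 4))  # 0.1573
```

Every expected count is only 2, which is the reason the asymptotic approximation is open to question for a table this small.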
Fisher's exact test Fisher's tea-tasting experiment

                     Guess
Pour              Milk    Tea    Row total
Milk                3      1        4
Tea                 1      3        4
Column total        4      4        8

Because the alternative hypothesis is one-sided (i.e. we are only interested in evidence in favour of the woman's ability to distinguish between milk first or tea first), one might halve the reported significance, thereby obtaining 0.0786 as the observed P-VALUE. Because the observed P-value is greater than 0.05, the researcher might conclude that there is no evidence that the woman can correctly guess tea-milk order, although the observed level of 0.0786 is only marginally larger than the 0.05 level of significance used for the test.

It is easy to see from inspection of the table that the expected cell count under the NULL HYPOTHESIS of independence is 2 for every cell. Given the popular rules of thumb about expected cell counts (e.g. see EXACT METHODS FOR CATEGORICAL DATA), this raises concern about use of the one DEGREE OF FREEDOM chi-square distribution as an approximation to the distribution of the Pearson chi-square statistic for the table. Rather than rely on an approximation that has an asymptotic justification, suppose one instead uses an exact approach. We demonstrate here how this is accomplished.

For the 2 × 2 table, Fisher noted that under the null hypothesis of independence, if one assumes fixed marginal frequencies for both the row and column counts, then a so-called hypergeometric distribution characterises the distribution of the four cell counts in the 2 × 2 table. A hypergeometric distribution can be thought of as the probability model for selecting a particular number of red balls (say x red balls) out of n total selections, without replacement, from a jar that contains r red balls and N − r black balls. This distribution can be derived using basic probability rules, but it provides a useful tool for a variety of applications that include exact inference for contingency tables. (We suppress the details regarding the actual form of the hypergeometric distribution here, although one may find out more from any book containing a discussion of basic probability.) In the case of our 2 × 2 table, if we fix the marginal counts we can see that, under the hypothesis of independence, the selection of milk first or tea first is like choosing red or black balls from a jar: i.e. the woman knows that four of the eight cups were prepared with milk first and the other four with tea first. If, in fact, she cannot tell the difference, then correctly choosing some number of milk-first cups from the eight-cup total is like randomly selecting red balls from a jar containing four red balls and four black.
This fact enables one to calculate an exact P-value rather than rely on an asymptotic justification. In fact, the P-value for Fisher's exact test of independence in the 2 × 2 table is the sum of hypergeometric probabilities for outcomes at least as favourable to the alternative hypothesis as the observed outcome. Let us apply this line of thought to the tea-drinking problem. In this example the experimental design itself fixes both marginal distributions, since the woman was asked to guess which four cups had the milk added first and therefore which four cups had the tea added first. Note that if we fix the marginal counts we can focus on the cell of the table that corresponds to the number of milk-first cups identified correctly - this value determines the other three cell values. In other words, assuming fixed row and column counts, one could observe the following tables with the indicated probabilities:

Tea-tasting data

Number of milk-first cups     Table                  Pr(Table)    P-value
correctly identified (x)      (row 1 / row 2)
0                             0 4 / 4 0              0.014        1.000
1                             1 3 / 3 1              0.229        0.986
2                             2 2 / 2 2              0.514        0.757
3                             3 1 / 1 3              0.229        0.243
4                             4 0 / 0 4              0.014        0.014

Every table has row totals of 4, column totals of 4 and a grand total of 8.

Note that the probability of each possible table in the reference set of 2 × 2 tables with the observed margins is obtained from the hypergeometric distribution just described, where the value x in this case represents the number of milk-first cups correctly identified by the woman: i.e. she expends four milk-first choices on the eight-cup total, four of which were actually prepared with the milk poured first. The P-values just displayed are the sums of probabilities for all outcomes at least as favourable (in terms of guessing correctly) as the one in question. For example, since the table actually observed has x = 3, the exact P-value is the sum of probabilities of all the tables for which x equals or exceeds 3. This works out to 0.229 + 0.014 = 0.243. Given such a relatively large P-value, one would conclude that the woman's performance does not furnish sufficient evidence that she can correctly guess milk-tea pouring order. Note that the approximate P-value for the Pearson chi-square test of independence was 0.0786, a dramatically different number. The exact test result leads to the same conclusion as the asymptotic test result, but the exact P-value is very different from 0.05 whereas the asymptotic P-value is only marginally bigger than 0.05.

In this example all four margins of the 2 × 2 table were fixed by design. In many cases, however, the margins are not fixed by design. Nevertheless, the reference set when computing Fisher's exact test is constructed using fixed row and column margins. We stress once again that whether or not the margins of the observed contingency table are naturally fixed is irrelevant to the method used to compute the exact test. In either case, one computes an exact P-value by examining the observed table in relation to all other tables in a reference set of contingency tables whose margins are the same as those of the actually observed table. We do not imply that other marginal outcomes are impossible under the conditions of the original experiment. However, since the inference is based on hypothetical repetitions of the original experiment, there is no logical problem with imagining that in these repetitions all outcomes whose row and column margins do not match the ones actually observed will be ignored. There are many compelling reasons for this conditioning, including the ancillarity principle, the sufficiency principle and the notion of eliminating unknown population characteristics (often called NUISANCE PARAMETERS) from the PROBABILITY DISTRIBUTION of the test statistic. Beginning with Fisher, there is a rich body of literature justifying conditional inference along these lines. A more recent treatment of conditioning is provided by Yates (1984). The main advantage of conditioning is that the distribution of the observed table is known, thereby making exact inference possible.

Note further that while Fisher's exact test is traditionally associated with the single 2 × 2 contingency table, its extension to two-way tables with an arbitrary number of rows or columns was first proposed by Freeman and Halton (1951). Thus it is also known as the Freeman-Halton test. It is hence an alternative to the Pearson chi-square and the likelihood ratio tests for testing independence of row and column classifications in an unordered two-way contingency table.

The idea of conditional inference to eliminate nuisance parameters was first proposed by R. A. Fisher (1925). It is the driving force behind much of exact inference for categorical data. However, Barnard (1945) proposed an exact test that eliminates the nuisance parameter from a single 2 × 2 contingency table without conditioning on the marginal counts. This has been shown to be less conservative for 2 × 2 tables than the conditional test. The method is controversial, however (see Yates, 1984), and was subsequently disavowed by Barnard himself. Moreover, it does not readily extend to tables of higher dimension than 2 × 2. For these reasons Fisher's test remains the most widely used exact test of association between two categorical variables. CC/PS/CM/NP
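The hypergeometric probabilities and one-sided exact P-values for the tea-tasting design can be computed directly from binomial coefficients. The sketch below assumes the design described in the entry (eight cups, four prepared milk first, four guesses of 'milk first'); the function and variable names are hypothetical.

```python
from math import comb


def hypergeom_pmf(x, n_red=4, n_total=8, n_draws=4):
    """P(x red balls) when n_draws balls are taken without replacement
    from a jar holding n_red red balls among n_total balls."""
    return comb(n_red, x) * comb(n_total - n_red, n_draws - x) / comb(n_total, n_draws)


# x = number of milk-first cups correctly identified, from 0 to 4.
probabilities = [hypergeom_pmf(x) for x in range(5)]

# One-sided exact P-value: probability of a result at least as favourable as x.
p_values = [sum(probabilities[x:]) for x in range(5)]

for x in range(5):
    print(x, round(probabilities[x], 3), round(p_values[x], 3))
# 0 0.014 1.0
# 1 0.229 0.986
# 2 0.514 0.757
# 3 0.229 0.243
# 4 0.014 0.014
```

The row for x = 3 gives the exact P-value of 0.243 (17 of the 70 equally likely ways of choosing four cups from eight).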
Barnard, G. A. 1945: A new test for 2 × 2 tables. Nature 156, 177. Fisher, R. A. 1925: Statistical methods for research workers. Edinburgh: Oliver and Boyd. Freeman, G. H. and Halton, J. H. 1951: Note on an exact treatment of contingency, goodness of fit and other problems of significance. Biometrika 38, 141-9. Salsburg, D. 2001: The lady tasting tea. New York: W.H. Freeman. Yates, F. 1984: Tests of significance for 2 × 2 contingency tables. Journal of the Royal Statistical Society, Series A 147, 426-63.
Fisher's linear discriminant function (LDF) See DISCRIMINANT FUNCTION ANALYSIS
Fisher's z-transformation See CORRELATION
fishing See POST HOC ANALYSES
five-number summary See BOXPLOT
fixed effect This is one of a set of effects on a response variable corresponding to a finite set of values taken by an EXPLANATORY VARIABLE. Fixed effects are included in a regression model to acknowledge that response tends to differ between the groups defined by the explanatory variable. By including fixed effects, the investigator can estimate the level of response for each separate group or estimate the effect of another variable of interest while controlling for the differences between groups. Typical examples of explanatory variables include indicator variables for gender, drug treatment received in a clinical trial, ethnic background or centre in a multicentre study. If the variable defines k distinct groups in the dataset, e.g. k categories of ethnic background, the fixed group effects are incorporated by adding k − 1 regression parameters to the regression model. Fixed effects are appropriate when the investigator wishes to estimate or control for the effects of the k specific groups defined by the explanatory variable in the dataset of interest. An alternative approach to modelling the differences between groups is to assume RANDOM EFFECTS, by declaring the effects of the grouping variable to be drawn from a distribution of possible effects. This is appropriate when, for example, the 10 hospitals in a study are regarded as a sample drawn from all hospitals and the investigator is interested in the population of hospitals in general rather than the 10 particular hospitals recruited. RT
flexible designs See DATA-DEPENDENT DESIGNS
Food and Drug Administration (FDA) The food and drug regulatory body in the USA is the FDA. Its mission statement is: 'The FDA is responsible for protecting the public health by assuring the safety, efficacy, and security of human and veterinary drugs, biological products, medical devices, our nation's food supply, cosmetics and products that emit radiation. The FDA is also responsible for advancing the public health by helping to speed innovations that make medicines and foods more effective, safer and more affordable; and helping the public to get the accurate science-based information they need to use medicines and food to improve their health.' For more details see www.fda.gov. BSE [See also MEDICINES AND HEALTHCARE PRODUCTS REGULATORY AGENCY (MHRA), REGULATORY STATISTICAL MATTERS]
forest plot This diagram is most commonly used in SYSTEMATIC REVIEWS AND META-ANALYSES of CLINICAL TRIAL data,
but is also used for displaying the results from other types of studies. The plot consists of a diagram that shows both the estimated effect sizes from each study and the corresponding CONFIDENCE INTERVAL. An example from a meta-analysis of 28 CASE-CONTROL STUDIES concerned with the possible association between Chlamydia trachomatis and oral contraceptive use is shown in the first figure. Here the effect size is the logarithm of the odds ratio. (Other examples are given in Sutton et al., 2000.) The point estimates are sometimes marked using square shapes of size proportional to the size of the study represented. This counteracts the viewer's eyes being drawn to the least significant studies, which have the widest confidence intervals and are therefore graphically more imposing. Sometimes, too, the individual lines are ordered by the date of study, by some index of study quality or by the point estimate of effect size. The second figure contains both graphical and tabular elements. Data from each included study are summarised in horizontal rows on the diagram, with estimates of the treatment effect marked by a black block and associated uncertainty depicted by lines extending between the upper and lower confidence limits. The size of the block varies between studies to reflect the weight given to each in the META-ANALYSIS, more influential studies having larger blocks. The overall estimate of effect is marked at the bottom of the plot as a diamond, the central points indicating the point estimate while the outer points mark the confidence limits. A vertical line is often drawn across the diagram at the meta-analytical point estimate. Forest plots of ratio effect measures (such as odds ratios, risk ratios [see RELATIVE RISK AND ODDS RATIO] and hazard ratios [see SURVIVAL ANALYSIS]) are plotted on log scales so that the confidence intervals for individual studies and the overall estimate are symmetrical about their point estimates.
In addition to the graphical display, forest plots may numerically report the data for each trial from which the estimate is calculated, the estimate of effect and confidence interval and the percentage weight that the study contributes to the meta-analysis. The overall estimate may also be reported numerically, stating the point estimate, a confidence interval, a test of statistical significance of the NULL HYPOTHESIS of no treatment effect and a test of homogeneity of no difference in effects between studies. From the plot it is often possible visually to assess the degree of heterogeneity in study results by noting the overlap of confidence intervals of individual studies with the meta-analytical point estimate.
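The reason ratio measures are drawn on a log scale can be illustrated with a minimal sketch. It assumes each study is summarised by a 2 × 2 table and uses the usual large-sample (Woolf) standard error of the log odds ratio; the counts, the function name and the use of inverse-variance weights for block size are illustrative assumptions, not values or methods taken from the figures in this entry.

```python
import math

def log_odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI, built on the log scale.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's standard error
    return {
        "or": math.exp(log_or),
        "lower": math.exp(log_or - z * se),
        "upper": math.exp(log_or + z * se),
        "log_or": log_or,
        "se": se,
        "weight": 1 / se ** 2,  # inverse-variance weight, a common choice for block size
    }

# Hypothetical counts: the same odds ratio observed in a small and a large study.
studies = {"small study": (12, 8, 10, 15), "large study": (120, 80, 100, 150)}
results = {name: log_odds_ratio_ci(*cells) for name, cells in studies.items()}

for name, r in results.items():
    print(f"{name}: OR {r['or']:.2f} ({r['lower']:.2f}, {r['upper']:.2f}), "
          f"weight {r['weight']:.1f}")
```

On the log scale the two arms of each interval have equal length, whereas on the natural scale the upper arm is longer; the small study has the wider interval but the smaller weight, which is why block size is used to redirect the eye.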
[Figure: forest plot. Forest plot of odds ratios from case-control studies of Chlamydia trachomatis and oral contraceptive use. Rows: studies; horizontal axis: confidence interval for effect size (log odds ratio scale).]
Forest plots may be supplemented in systematic reviews by L'Abbé plots (which plot event rates on treatment against event rates on control) and FUNNEL PLOTS. However, neither of these diagrams can replace forest plots in their ability to report both the individual study results, depicting effect estimates and uncertainty, as well as the meta-analytical summary estimates. Lewis and Clarke (2001) reviewed the origins of the name 'forest' and identified that both the occasionally encountered spelling as 'forrest' and capitalisation are inappropriate. JD/BSE
[Figure: forest plot. Forest plot of the nine trials comparing Helicobacter pylori eradication therapy with placebo antibiotics (Moayyedi et al., 2000). Tabular columns report the risk ratio with confidence interval and the percentage weight for each trial; overall test P = 0.0002; test of heterogeneity: Q = 7.1, P = 0.5.]
Lewis, S. and Clarke, M. 2001: Forest plots: trying to see the wood and the trees. British Medical Journal 322, 1479–80. Moayyedi, P., Soo, S., Deeks, J., Forman, D., Mason, J., Innes, M. and Delaney, B. 2000: Systematic review and economic evaluation of Helicobacter pylori eradication treatment for non-ulcer dyspepsia. British Medical Journal 321, 659–64. Sutton, A. J., Abrams, K. R., Jones, D. R., Sheldon, T. A. and Song, F. 2000: Methods for meta-analysis in medical research. Chichester: John Wiley & Sons, Ltd.
forwards regression See LOGISTIC REGRESSION, MULTIPLE LINEAR REGRESSION
frailty A term generally used for unobserved individual heterogeneity, particularly in the analysis of SURVIVAL DATA. Such analysis is usually based on the assumption that the survival times of different individuals are independent and come from the same distribution, i.e. we assume a homogeneous population. In practice, populations are usually heterogeneous. We attempt to explain this heterogeneity by fitting models (such as COX'S REGRESSION MODELS) using explanatory variables. Frequently, even after adjusting for the explanatory variables, substantial heterogeneity remains, either due to lack of knowledge of or inability to measure all explanatory variables. Frailty models were developed by Vaupel, Manton and Stallard (1979) to take into account this 'extra' variability. Individuals who have the same observed explanatory variables may differ in their underlying unobserved health status or frailty. As time progresses and frail individuals tend to die, the composition of the population with respect to frailty changes. Ignoring this can lead to biased estimation. An unmeasured random variable representing the unknown frailty is generally assumed to act multiplicatively on the baseline hazard function. These types of models are essentially MIXED or RANDOM EFFECT MODELS. The LIKELIHOOD functions for frailty models are quite complex and estimation is generally carried out using the EM ALGORITHM. Much research has been devoted to the choice of distributions for the frailty variable. A common choice is the GAMMA DISTRIBUTION. Klein and Moeschberger (1997) provide SAS macros that estimate the gamma frailty Cox regression model on their website. Other choices are discussed in Hougaard (2000). These frailty models have been widely applied. Hougaard (2000) provides a number of examples, including death due to malignant melanoma and times to catheter removal due to infection,
for which the standard Cox regression approach is compared with the use of frailty models. Frailty models are also used in the analysis of multivariate survival data to model dependence between times (Aalen, 1994). Such data can arise in several different ways. For example, times to recurrent events (such as epileptic seizures) on a subject will generally be dependent. Multivariate data also arise in connection with time to failure of similar organs (right eye, left eye) or lifetimes of related people (related genetically, by a common environment, etc.). For simplicity, consider survival data on pairs of twins who are assumed to have a common frailty. This frailty is taken to be a random variable. Given the frailty for a specified pair, the individual hazard function is taken to be the multiple of
frailty with a baseline hazard. The common frailty (of a pair) creates the dependence between the survival times within pairs. Assuming a distributional form for the frailty and averaging the conditional survival functions over the frailty distribution permits the derivation of a joint survival function. The resulting shared frailty models describe the dependence between the times. Details and more complicated multivariate frailty models are discussed by Hougaard (2000) along with various applications such as twin survival data. DF/BR
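How a shared frailty induces dependence within pairs can be seen in a small simulation. The sketch below is illustrative only: it assumes a gamma frailty with mean 1, a constant baseline hazard (so survival times are exponential given the frailty) and hypothetical parameter values; all function names are made up for the example. For a shared gamma frailty with variance θ, Kendall's tau between the paired times is known to be θ/(θ + 2), which gives a rough check on the simulated value.

```python
import random

def simulate_pairs(n_pairs, frailty_variance, base_rate=0.1, shared=True, seed=0):
    """Simulate paired survival times with a multiplicative gamma frailty."""
    rng = random.Random(seed)
    shape = 1.0 / frailty_variance  # gamma with mean 1 and the requested variance
    scale = frailty_variance
    pairs = []
    for _ in range(n_pairs):
        z1 = rng.gammavariate(shape, scale)
        # Shared frailty: both members of the pair carry the same z.
        z2 = z1 if shared else rng.gammavariate(shape, scale)
        # Conditional hazard = frailty x constant baseline hazard.
        pairs.append((rng.expovariate(z1 * base_rate),
                      rng.expovariate(z2 * base_rate)))
    return pairs

def kendall_tau(pairs):
    """Kendall's tau by direct comparison of all pairs of pairs."""
    concordant = discordant = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)

tau_shared = kendall_tau(simulate_pairs(600, frailty_variance=2.0, shared=True))
tau_independent = kendall_tau(simulate_pairs(600, frailty_variance=2.0, shared=False))
print(f"shared frailty: tau = {tau_shared:.2f}; separate frailties: tau = {tau_independent:.2f}")
```

With separate frailties the paired times are independent and tau is near zero; with a shared frailty the times are clearly positively associated even though, given the frailty, they are generated independently.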
Aalen, O. 1994: Effects of frailty in survival analysis. Statistical Methods in Medical Research 3, 227–43. Hougaard, P. 2000: Analysis of multivariate survival data. New York: Springer Verlag. Klein, J. P. and Moeschberger, M. L. 1997: Survival analysis. New York: Springer Verlag. Vaupel, J. W., Manton, K. G. and Stallard, E. 1979: The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16, 439–54.
fraud detection in biomedical research Fraud in biomedical research comes in many guises (Lock and Wells, 1993). The boundary between fraud and simple carelessness is often fuzzy, although fraud is characterised by a deliberate attempt to deceive, which may be very hard to prove in the absence of positive external evidence or confession. Data discrepancies, such as transcription errors between the source documents and the data collection forms, may potentially be regarded as fraud if they occur in some systematic way or with abnormally high frequency, two circumstances that require a statistical assessment. In the USA, the term 'fraud' implies injury or damage to victims; hence the term 'misconduct' might be preferred. However, 'misconduct' also includes practices such as plagiarism, conflicts of interest, misuse of funds and other questionable research practices. In the UK, a Joint Consensus Conference on Misconduct in Biomedical Research held in October 1999 defined research misconduct as 'behaviour by a researcher, intentional or not, that falls short of good ethical and scientific standards'. Here the term 'fraud' refers specifically to data fabrication (making up data values) and data falsification (amending or eliminating data values), a use of the word that is at once more restrictive than is implied in normal conversation and less specific than in legal texts. Scientific fraud (in the limited sense of data fabrication or falsification) is, in all likelihood, a rare phenomenon in biomedical research, although other misconduct may be more common. Some of the most widely publicised cases of fraud have taken place in randomised clinical trials and have created so much media attention that the uncritical observer may have been misled into thinking that the problem was far worse than it actually is.
In all systematic investigations reported by cooperative groups and pharmaceutical companies, the proportion of investigators who were found to have committed fraud was less than 1% (Buyse et al., 1999). In the audits performed by the United States FOOD AND DRUG ADMINISTRATION, the 'for-cause' investigations that followed revealed in most cases sloppiness or incompetence rather than fraud (Buyse et al., 1999). However, there may be substantial bias in estimating the actual prevalence of fraud because of the natural tendency to suppress, conceal or minimise actual cases. In a recent cross-sectional survey of biostatisticians who were members of the International Society for Clinical Biostatistics in 1998, more than half of the respondents stated that they knew of a project in which fraud had occurred in the previous 10 years, while almost one-third of them had been engaged in a project in which fraud took place or was about to take place (Ranstam et al., 2000). All in all, reliable data on the true prevalence of fraud are lacking.
The major difference between fraud and mere error lies in the 'intention to cheat' that defines fraud (Buyse et al., 1999). This difference must, however, be qualified by the nature of the intent. Often investigators fabricate or falsify data to have complete records on all cases, or to eliminate OUTLIERS, not to modify the outcome of the experiment. Such data manipulations, if done independently of treatment assignment (e.g. in a blinded trial), introduce noise but no BIAS in the experiment. More serious cases of fraud involve fabricating complete patients or tampering with the data in order to obtain a desirable result. These are cases where there is an expectation of gain in terms of prestige, advancement or money. These cases may also be the easiest to detect statistically, especially in MULTICENTRE TRIALS.
Some data items collected in clinical trials seem to be more prone to error and/or fraud than others. Eligibility criteria may be 'pushed' a little to make a patient eligible for the trial when in fact that patient does not strictly meet the criteria; many examples of fraud may have occurred because eligibility criteria were excessively restrictive and widening entry standards is often a good solution. Repeated measurements are requested repeatedly over time (such as, for instance, a battery of laboratory examinations), in which case data may be 'propagated' from the previous visit if the measurements are missing for a particular visit; such imputation of missing values is questionable at the time of the analysis, but certainly unacceptable when reporting the observations. Adverse events are likely to be underreported by some investigators (although such underreporting may reveal lack of interest or differences in interpretation rather than fraud). Compliance data are notoriously unreliable if they are based on the number of medications returned ('pill counts'); whenever compliance information is deemed important, it is advisable to use objective measurements based on blood or urine tests. Data fabrication has been detected in patient diaries through the colour and texture of the pen supposedly used on successive days by the patient, the patient's handwriting, etc.; the reliability of information collected in patient diaries can often be called into question, although electronic data capture can provide a more reliable alternative.
Some types of fraud are committed in trials conducted in a fastidious way, with lengthy case report forms, excessive requests for data clarification, etc. Often randomised clinical trials can be drastically simplified without loss of essential information. For instance, eligibility criteria can be simplified and left to the discretion of the investigator; the amount of data collected in the case report forms can be reduced, e.g. by eliminating much of the medical history and prior medications, concomitant medications, laboratory examinations, etc., that are not essential to the interpretation of the trial results; the follow-up of the patients in trials requiring prolonged observation can be as in routine clinical practice.
The traditional approach to fraud detection has involved monitoring visits to the centres or sites participating in the trial (Knatterud et al., 1998). Some such onsite monitoring may be needed and useful, for many types of fraud would remain completely undiscovered were it not for the careful checks carried out during these visits. However, onsite monitoring is labour intensive and expensive and it, too, may fail to pick up fraudulent data. Moreover, the law of diminishing returns suggests that it is not cost effective to demand 100% verification of all source data. Monitoring activities can be limited to some random selection of the data, with the possible exception of data pertaining to the primary ENDPOINT of the trial. The random selection can be done at the level of the centres, the patients or the data items themselves. With such a random sampling scheme, one can estimate the overall data error rate with prespecified precision and increase the amount of onsite monitoring if the observed rate exceeds some upper limit. Another approach consists of visiting only the centres in which problems, errors or fraud are suspected.
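The random sampling scheme described above can be sketched numerically. This is a minimal illustration that assumes simple random sampling of data items and the usual normal approximation for a proportion; the anticipated error rate, the precision, the upper limit and the function names are all hypothetical choices, not values from this entry.

```python
import math
from statistics import NormalDist

def sample_size_for_error_rate(expected_rate, half_width, confidence=0.95):
    """Number of data items to verify so that the error rate is estimated
    to within +/- half_width (normal approximation to the binomial)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * expected_rate * (1 - expected_rate) / half_width ** 2)

def more_monitoring_needed(errors_found, items_checked, upper_limit):
    """Flag the trial (or centre) when the observed error rate exceeds the limit."""
    return errors_found / items_checked > upper_limit

# Hypothetical plan: anticipate a 2% error rate, estimate it to within +/- 1%.
n_items = sample_size_for_error_rate(expected_rate=0.02, half_width=0.01)
print("items to verify:", n_items)
print("escalate:", more_monitoring_needed(errors_found=30, items_checked=n_items,
                                          upper_limit=0.03))
```

If sampling is done at the level of centres or patients rather than individual data items, errors are likely to be clustered and a larger sample than this simple calculation suggests would be needed.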
A more innovative approach to detect fraud relies solely on statistics. The data of randomised CLINICAL TRIALS can be verified using statistical techniques that take advantage of their highly structured nature. Most DATA ENTRY and DATA MANAGEMENT software used for clinical trials performs basic checks, such as RANGE and consistency checks, but more extensive data checks typically occur at the end of the study along with other statistical analyses, far too late for corrective action. Batteries of checks using standard statistical techniques could be used early on in the course of a trial without large increases in costs and could save considerable time if problems were detected and corrected early. The principles involved in uncovering fraud through statistical techniques rest on the difficulty of fabricating plausible data, particularly in high dimensions (Evans, 1998). Univariate observations can always be fabricated to fall close to the MEAN, although preserving their VARIANCE is more of a challenge to the inexperienced. Even the astute
cheater who takes care in preserving both the mean and the variance may be tripped up by examination of the KURTOSIS of the distribution. Multivariate observations must in addition be consistent with the correlation structure of their individual components. In general, when data are fabricated to pass certain statistical tests, they are likely to fail on others (Haldane, 1948).
Another observation that can be used to check fabricated data is that humans are poor random number generators. Even informed people seem unable to generate long sequences of numbers that pass simple tests for randomness. Digit preference, especially terminal digit preference, or an excess of round numbers may easily reveal data fabrication. Benford's law may also be used to check the randomness of the first digit of all real numbers reported by a single individual (or a single centre). This law establishes that the probability of the first significant digit being equal to D (D = 1, ..., 9) is approximately given by a logarithmic distribution (Hill, 1998):
\[ P(D) = \log_{10}(D + 1) - \log_{10}(D) \]
Hence the frequency of 1 as a first digit should be as high as 30%, the frequency of 2 as a first digit should be close to 18%, while that of 9s should be lower than 5%, a result that runs against intuition. More sophisticated techniques are available to check the randomness of digits in a sequence of data values.
fraud detection in biomedical research Some statistical techniques that may be used to uncover fraud
One variable at a time: descriptive statistics; boxplot; frequency histogram; stem-and-leaf plots; tests for slippage
Further columns [column headings illegible]: cross-tabulation/scatterplot; regression; Cook's distance; Mahalanobis' distance; cluster analysis; discriminant analysis; Chernoff faces; star (needle, spike) plots; Hotelling's T²; tests for [illegible] contrasts; autocorrelations; profiles; polynomial contrasts; runs tests; residual plots; cusum; control charts
fraud detection in biomedical research Some patterns that may reveal fraud in clinical trial data
One variable at a time: digit preference; round number preference; too few or too many outliers; too little or too much variance; strange peaks; data too skewed
Further columns [column headings illegible]: multivariate inliers; multivariate outliers; leverage; too weak or too strong correlation; interpolation; duplicates; invented patterns; breach of randomisation; days of week (Sundays or holidays); implausible accrual; time trends
Statistical approaches may also take full advantage of the highly structured nature of clinical trials, which are prospective studies, entirely specified in a written protocol and data collection instrument (the 'case report form'), usually involving several centres and, when comparative, a randomly assigned treatment. Comparing each centre or treatment to the others in terms of the distribution of some variables, either taken in isolation (univariate approach) or jointly (multivariate approach), can detect unusual patterns in the data. Comparisons between centres are particularly informative if there are more than a few observations per centre (in which case fraud in any one centre may have a sizeable impact on the overall result). Such comparisons are useful with different types of fraud; for instance, the presence of outliers or the inconsistency in the effect of treatment may reveal fraud aimed at exaggerating the effect, while the presence of 'inliers' or underdispersion in the data may reveal invented cases. Several univariate statistical techniques may be used to inspect the data (see the first table). Statistical checks may reveal unusual data patterns that are often the mark of fraud (see the second table). Invented or manipulated data tend to have too little variance, no outliers or an abnormally flat distribution. Their distribution may be too close to a simple but implausible model, such as a NORMAL DISTRIBUTION with round numbers for the MEAN and STANDARD DEVIATION. Since fraud usually occurs in a single centre (except in the unlikely situation of a coordinated fraud across several
centres), statistical checks must be performed within each centre as well as overall. A comparison of the results reported by different centres may reveal too little variability in one or more centres as compared to the overall variability. Perfect compliance with the protocol, for instance, may be the mark of fraud. Such a comparison may also reveal 'slippage' of one or more centres, the NULL HYPOTHESIS being that the means of the variable of interest are equal, but for random fluctuations, to the overall mean (Canner, Huang and Meinert, 1981).
Multivariate statistical techniques offer more checking possibilities, but they are seldom used in clinical trials, if at all. Multivariate statistical methods include correlations between several patient-related variables as well as comparisons between the randomised groups. Simple two-way cross-tabulations or SCATTERPLOTS for various pairs of variables can be compared across centres and any unusual patterns investigated further. Outlying observations, or outlying groups of observations coming from the same centre, can be detected more effectively in multidimensional space than in a single dimension. Moreover, in multidimensional space, inliers can be detected through the use of the Mahalanobis' distance just as well as outliers: inliers have an abnormally low Mahalanobis' distance (they fall too close to the multivariate mean), while outliers have an abnormally high Mahalanobis' distance (they fall too far from the multivariate mean). The detection of inliers may be more useful to detect fraud than the detection of outliers, because fabricated data will tend not to contain outliers, which are at higher risk of being detected than are values close to the (multivariate) mean. Robust methods such as using ranks in place of the observations are advisable for the detection of outliers, because these can create severe departures from multivariate normality.
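The inlier idea can be illustrated with a small bivariate sketch. Everything in it is an assumption made for the example: the data are invented (two baseline measurements per patient in three hypothetical centres), the distance uses the ordinary sample mean and covariance rather than a robust or rank-based version, and the per-centre mean distance is just one simple summary that could be compared across centres.

```python
import math

def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the overall bivariate mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2 x 2 covariance matrix
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse covariance matrix, written out by hand.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances

# Hypothetical data: centre C reports values suspiciously close to the overall mean.
centres = {
    "A": [(120, 80), (135, 88), (110, 70), (145, 95), (125, 78)],
    "B": [(118, 76), (140, 90), (105, 72), (150, 92), (130, 85)],
    "C": [(127, 82), (128, 83), (126, 81), (129, 82), (127, 83)],
}
labels = [name for name, pts in centres.items() for _ in pts]
all_points = [p for pts in centres.values() for p in pts]
distances = mahalanobis_2d(all_points)

mean_distance = {
    name: sum(d for d, lab in zip(distances, labels) if lab == name) / len(centres[name])
    for name in centres
}
for name, value in mean_distance.items():
    print(f"centre {name}: mean Mahalanobis distance {value:.2f}")
```

A centre whose observations all sit unusually close to the multivariate mean shows a markedly smaller average distance than the others; in practice such a flag would prompt further inspection rather than establish fraud.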
When, as is often the case, some variables are measured repeatedly over the course of the trial on the same patient, these measures lend themselves well to a variety of checks. Here again, an insufficient variability over time may reveal propagation of previous values rather than genuine observations. Sometimes the fraud involves a mechanism or computer algorithm for making up data. In any trial with prolonged patient entry and follow-up, one can use calendar time to perform additional checks on the data. Simple checks can be performed on a specific day of the week, for instance, since certain events or examinations are unlikely to have taken place on a Sunday. Time intervals between successive visits and the number of visits per unit time provide further opportunities for checking the plausibility of a sequence of events. A comparison of treatment groups by week or month of RANDOMISATION can reveal suspect periods during which all treatments were not allocated with equal probability. Perfect compliance with the protocol in terms of dates may be a marker of fraud. More advanced
checks can be useful, such as the stability of the variance of observations over time. Randomised clinical trials constitute, by design, the most reliable type of medical experiment and their results are generally robust to occasional cases of data falsification and fabrication at some participating centres. The highly publicised case of fraud in the National Surgical Adjuvant Breast and Bowel Project (NSABP) provides a framework to examine the impact of such fraud on the results of clinical trials. Briefly, one of the investigators in breast cancer trials systematically altered some baseline patient data so that these patients became eligible for entry into the trials. The data subject to falsification were the date of surgery, the date of biopsy and estrogen receptor values. For example, in one study, the delay between the surgery and randomisation had been set to a maximum of 30 days by the trial protocol and dates were falsified for a few patients in whom this limit had been exceeded. The fraud was clearly not aimed at distorting the results of the trials one way or another and, indeed, a careful reanalysis of NSABP trial data with and without the fraudulent centre confirmed that the trial outcomes had not been materially affected by the fraud (Fisher et al., 1995). In another large published trial in stroke, all data from one centre suspected of fraud were excluded from the analysis, again with negligible impact on the study results (ESPS2 Group, 1997). Yet this centre had contributed to the study 452 of the 7054 patients involved overall! Fraud is unlikely to affect the results of a trial if any of the following conditions hold: the fraud is limited to one or a few investigators (perhaps one centre in a multicentre setting) and/or to a few data items,
provided that there are many investigators or centres; the fraud bears on secondary variables that have little or no effect on the primary endpoint of the trial; the fraud affects all treatment groups equally, and hence does not bias the results of the trial. Fraud committed without regard to the treatment assignments (e.g. prior to randomisation or in double-blind trials) generates noise but no bias. At least one of these conditions frequently holds and therefore fraud should not be expected to have a major impact on the results of multicentre clinical trials. One caveat is that where an increase in noise occurs, this can make dissimilar treatments appear similar. With a trend towards using equivalence or noninferiority trials for licensing purposes this is of concern and could result in ineffective medicines being licensed. MB
Buyse, M., George, S. L., Evans, S., Geller, N., Ranstam, J., Scherrer, B., Lesaffre, E., Murray, G., Edler, L., Hutton, J., Colton, T., Lachenbruch, P. and Verma, B. for the ISCB Subcommittee on Fraud 1999: The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Statistics in Medicine 18, 3435–52. Canner, P. L., Huang, Y. B. and Meinert, C. L. 1981: On the detection of outlier clinics in medical and surgical trials: I. Practical considerations. Controlled Clinical Trials 2, 231–40. ESPS2 Group 1997: European Stroke Prevention Study 2. Efficacy and safety data. Journal of the Neurological Sciences 151 (Suppl.), S1–S77. Evans, S. 1998: Fraud and misconduct in medical science. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd. Fisher, B., Anderson, S., Redmond, C. K. et al. 1995: Reanalysis and results after 12 years of follow-up in a randomised clinical trial comparing total mastectomy with lumpectomy with or without irradiation in the treatment of breast cancer. New England Journal of Medicine 333, 1456–61. Haldane, J. B. S. 1948: The faking of genetic results. Eureka 6, 21–8. Hill, T. P. 1998: The first-digit phenomenon. American Scientist 86, 358–63. Knatterud, G. L., Rockhold, F. W., George, S. L., Barton, F. B., Davis, C. E., Fairweather, W. R., Honohan, T., Mowery, R. and O'Neill, R. T. 1998: Guidelines for quality assurance procedures for multicenter trials: a position paper. Controlled Clinical Trials 19, 477–93. Lock, S. and Wells, F. (eds) 1993: Fraud and misconduct in medical research. London: BMJ Publishing Group. Ranstam, J., Buyse, M., George, S. L., Evans, S., Geller, N., Scherrer, B., Lesaffre, E., Murray, G., Edler, L., Hutton, J., Colton, T. and Lachenbruch, P. for the ISCB Subcommittee on Fraud 2000: The biostatistician's view of fraud in medical research. Controlled Clinical Trials 21, 415–27.
frequency distribution This describes the division of a sample of observations into a number of classes together with a count of the number of observations falling in each class. It acts as a useful tabular summary of the main features of a dataset, e.g. location, shape and spread. BSE [See also HISTOGRAM, PROBABILITY DISTRIBUTION]
Friedman test This is a nonparametric equivalent to REPEATED MEASURES ANALYSIS OF VARIANCE, being an extension of the WILCOXON SIGNED RANK TEST to more than two groups or time points. It examines the ranks within a group and tests if the underlying continuous distribution is the same for each group. The NULL HYPOTHESIS is that there are no differences in MEDIANS between the groups. The alternative hypothesis is that there is at least one difference in medians. The data should be continuous in nature and should be a randomly selected sample measured at different time points or blocks of matched subjects randomly assigned to a group. Subjects or blocks of subjects should be independent. An extension to the test exists that allows repeated measures on each subject.
The test begins by constructing a two-way table with N (the number of subjects) rows and k (the number of time points) columns. Rank each row from lowest to highest, assigning the average rank to ties in the data. Find the sum of ranks in each of the columns. Calculate:
\[ F_r = \frac{12}{Nk(k+1)} \left[ \sum_{j=1}^{k} R_j^2 \right] - 3N(k+1) \]
where:
R_j = the sum of the ranks for column j
N = the number of subjects
k = the number of periods or conditions
Compare F_r to the critical value in Friedman tables and reject the NULL HYPOTHESIS if F_r is greater than or equal to the critical value. If N and k are sufficiently large then chi-square tables with k − 1 DEGREES OF FREEDOM can be used instead of Friedman tables. As an example, a study measured FEV1 at three different times of day to see if there was a difference. Data can be found in the table.
Friedman test FEV1 (forced expiratory volume in 1 s) data from seven subjects recorded at three times a day
Morning    Rank   Afternoon     Rank   Evening   Rank
0.25       1      0.4           2.5    0.4       2.5
0.56       1      [illegible]   2      1.06      3
0.63       2      1.45          3      0.25      1
0.65       2      3.02          3      0.45      1
0.74       2      [illegible]   3      0.21      1
0.97       1      1.29          2      1.91      3
1.91       3      0.15          1      0.27      2
Rank sum   12                   16.5             13.5
To compute the Friedman test statistic:
\[ F_r = \frac{12}{Nk(k+1)} \left[ \sum_{j=1}^{k} R_j^2 \right] - 3N(k+1) = \frac{12}{7 \times 3 \times (3+1)} \times \left[ 12^2 + 16.5^2 + 13.5^2 \right] - \left[ 3 \times 7 \times (3+1) \right] = 1.5 \]
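The calculation can be reproduced from the within-subject ranks alone. The sketch below is illustrative: the ranks are those shown in the table for the seven subjects (morning, afternoon, evening), and the function name is an assumption made for the example.

```python
def friedman_statistic(rank_rows):
    """Friedman statistic F_r from a table of within-subject ranks.

    rank_rows: one row per subject, one rank per time point or condition.
    """
    n = len(rank_rows)        # N, number of subjects
    k = len(rank_rows[0])     # k, number of periods or conditions
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(k)]
    f_r = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return rank_sums, f_r

# Ranks (morning, afternoon, evening) for the seven subjects; ties share the average rank.
ranks = [
    (1, 2.5, 2.5),
    (1, 2, 3),
    (2, 3, 1),
    (2, 3, 1),
    (2, 3, 1),
    (1, 2, 3),
    (3, 1, 2),
]
rank_sums, f_r = friedman_statistic(ranks)
print("rank sums:", rank_sums)
print("F_r:", round(f_r, 3))
```

Each row of ranks sums to k(k + 1)/2 = 6, so the three column sums must total 42, which is a quick check that the ranking has been done correctly.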
From the tables (N = 7, k = 3, α = 0.05) the critical value is 7.714; as 1.5 is less than 7.714, there is insufficient evidence to reject the null hypothesis of no difference in medians between the three time points, so the medians can be considered unchanging across the time points.
When the Friedman test gives a significant result it is possible to do POST HOC TESTING to see where any differences lie. There are two ways of doing this. Use the Wilcoxon signed rank test pairwise on the groups, applying a correction for multiple testing such as a BONFERRONI CORRECTION. Alternatively, the average ranks in each of the groups can be compared. The null hypothesis is rejected if the absolute difference in mean ranks is greater than or equal to the critical value as shown:
\[ \left| \bar{R}_i - \bar{R}_j \right| \ge z_{\alpha/[k(k-1)]} \sqrt{\frac{k(k+1)}{6N}} \]
where:
\bar{R}_i = the mean rank in period or condition i
\bar{R}_j = the mean rank in period or condition j
z_{\alpha/[k(k-1)]} = the critical z value for α/[k(k − 1)]
k = the number of periods or conditions
N = the number of subjects
Far furtbc:r ddails see Pett (1997) and Caacwcr (1999).
SLV
Conover, W. J. 1999: Practical nonparametric statistics, 3rd edition. Chichester: John Wiley & Sons, Ltd. Pett, M. A. 1997: Nonparametric statistics for health care research. Thousand Oaks: Sage.
funnel plots

Funnel plots are a graphical device used to detect PUBLICATION BIAS in SYSTEMATIC REVIEWS AND META-ANALYSIS. For each study in a review, the estimated treatment effect is plotted against a measure of its precision, such as the VARIANCE or STANDARD ERROR of the treatment effect or the study sample size (Light and Pillemer, 1984). In a departure from standard graphical practice, the plots conventionally depict precision on the vertical axis and treatment effect on the horizontal axis. The meta-analytical summary may be marked by a vertical line. When all study results are published it is expected that the studies will have a symmetrical distribution around the average effect line, the spread of studies with low precision being larger than that of studies with high precision, yielding a funnel-like shape. Some graphs mark the funnel with lines within which 95% of studies would fall were there no between-study heterogeneity. The choice of the measure of treatment effect (Tang and Liu, 2000) and the measure of precision (Sterne and Egger, 2001) makes a difference to the shape of the plot. Plots of treatment effects against standard errors are likely to be preferred, as the funnel will have straight rather than curvilinear sides.

Studies of the causes of publication bias have indicated that nonpublication is often linked to studies with nonsignificant P-VALUES, which tend to be the smaller studies reporting similar small effects (Dickersin, 1997). Suppression of these studies creates a visual hole in the plot. Unless the intervention has no effect, this will bias the META-ANALYSIS and induce asymmetry into the plot. Tests of publication bias, such as the Begg (Begg and Mazumdar, 1994) and Egger tests (Egger et al., 1997), are testing for this asymmetry. The trim-and-fill method of investigating the impact of publication bias (Duval and Tweedie, 2000) imputes missing studies with results designed to remove asymmetry from the plot.

Interpretation of funnel plots is often visually difficult, due to there being inadequate numbers of studies. Assessing the causes of funnel plot asymmetry is also difficult, as between-study heterogeneity, relationships between study quality and sample size, as well as publication bias, can all cause similar patterns in funnel plots (Egger et al., 1997). An example of a funnel plot is given in the SYSTEMATIC REVIEWS AND META-ANALYSIS entry.
JD
(See also FOREST PLOT)
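The guide lines of a funnel plot can be sketched in a few lines of Python. The sketch assumes a fixed-effect (inverse-variance) summary and plots against the standard error, so that the limits are straight lines of the form summary ± z × SE. The study effects and standard errors below are hypothetical values chosen purely for illustration, and the function names are not taken from any package.

```python
from statistics import NormalDist


def pooled_effect(effects, std_errors):
    """Inverse-variance weighted (fixed-effect) summary estimate."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def funnel_limits(summary, std_error, level=0.95):
    """Limits within which `level` of studies with this standard error would
    fall if there were no between-study heterogeneity and no bias."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return summary - z * std_error, summary + z * std_error


# Hypothetical log odds ratios and standard errors, for illustration only.
effects = [0.10, 0.30, 0.25, 0.50, -0.05]
std_errors = [0.05, 0.10, 0.15, 0.25, 0.30]

summary = pooled_effect(effects, std_errors)
for effect, se in zip(effects, std_errors):
    low, high = funnel_limits(summary, se)
    status = "inside" if low <= effect <= high else "outside"
    print(f"effect={effect:+.2f} se={se:.2f} limits=({low:+.3f}, {high:+.3f}) {status}")
```

Because the limits are linear in the standard error, the funnel has straight sides on this scale, which is the reason given above for preferring standard errors on the precision axis.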
Begg, C. B. and Mazumdar, M. 1994: Operating characteristics of a rank correlation test for publication bias. Biometrics 50, 4, 1088-101. Dickersin, K. 1997: How important is publication bias? A synthesis of available data. AIDS Education and Prevention 9 (Suppl.), 15-21. Duval, S. and Tweedie, R. L. 2000: Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56, 2, 455-63. Egger, M., Davey Smith, G., Schneider, M. and Minder, C. 1997: Bias in meta-analysis detected by a simple, graphical test. British Medical Journal 315, 7109, 629-34. Light, R. J. and Pillemer, D. B. 1984: Summing up: the science of reviewing research. Cambridge, MA: Harvard University Press. Sterne, J. A. C. and Egger, M. 2001: Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. Journal of Clinical Epidemiology 54, 1046-55. Tang, J.-L. and Liu, J. L. Y. 2000: Misleading funnel plot for detection of bias in meta-analysis. Journal of Clinical Epidemiology 53, 477-84.
G

gamma distribution

This is a PROBABILITY DISTRIBUTION for nonnegative values defined by two parameters, here denoted $\lambda$ and $t$ (where $t$ must be greater than 0), with the density function:

$$f(x) = \frac{1}{\Gamma(t)} \lambda^t x^{t-1} e^{-\lambda x}$$

Here $t$ defines the shape of the distribution, while $\lambda$ defines the scale on which the distribution is observed. The $\Gamma(t)$ expression is the gamma function that for integer values of $t$ is equal to $(t - 1)!$ (factorial). The mean of the distribution is $t/\lambda$ and the VARIANCE is $t/\lambda^2$. For further details of the distribution see Grimmett and Stirzaker (1992).

If events take place as part of a Poisson process, i.e. the time between successive events can be modelled as taking the EXPONENTIAL DISTRIBUTION with parameter $\lambda$, then the time between any two events will take a gamma distribution. The gamma distribution can then arise as the distribution of the sum of exponentially distributed variables. It is also the case that the exponential distribution is a special case of the gamma distribution. The CHI-SQUARE DISTRIBUTION is also a special case, and given two independent variables A and B that have gamma distributions, it is possible to create a variable A/(A + B) that has a BETA DISTRIBUTION. Gamma distributions with the same scale parameter will sum to another gamma distribution with that scale parameter.

The gamma distribution is always positively skewed (of little surprise since it is constrained at zero on the left-hand side, but the tail extends to infinity on the right-hand side), but the SKEWNESS diminishes as $t$ tends to infinity, where the NORMAL DISTRIBUTION is the limiting distribution. For some illustration of the shapes that the gamma distribution can take see the illustration in the chi-square distribution entry. For further details of the relationships with other distributions see Leemis (1986).

The gamma is sometimes used to model times to event, as has been suggested, for example, in Phillips et al. (1994), where it is used to model the time to develop AIDS. Often, however, it is merely a convenient distribution for a quantity that cannot be negative.
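The density and moments above translate directly into code. The following Python sketch uses the parameterisation of this entry (shape $t$, parameter $\lambda$); the function names are illustrative. It also checks numerically, on a crude grid, that the density integrates to 1 and has mean $t/\lambda$, and that $t = 1$ gives the exponential density.

```python
from math import exp, gamma


def gamma_density(x, t, lam):
    """Gamma density with shape t and parameter lam, evaluated at x > 0."""
    return lam ** t * x ** (t - 1) * exp(-lam * x) / gamma(t)


def gamma_mean(t, lam):
    return t / lam


def gamma_variance(t, lam):
    return t / lam ** 2


# With t = 1 the density reduces to the exponential density lam * exp(-lam * x).
x, lam = 2.0, 0.5
exponential_density = lam * exp(-lam * x)
special_case = gamma_density(x, 1.0, lam)

# Crude midpoint-rule check of the total probability and the mean for t = 3.
t, step, n_steps = 3.0, 0.001, 40000
grid = [step * (i + 0.5) for i in range(n_steps)]
total = sum(gamma_density(g, t, lam) for g in grid) * step
numeric_mean = sum(g * gamma_density(g, t, lam) for g in grid) * step

print(f"integral = {total:.4f}, numeric mean = {numeric_mean:.3f}, t/lam = {gamma_mean(t, lam):.3f}")
```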
In Bayesian analyses (see BAYESIAN METHODS) (Gelman et al., 1995), it is of use as the conjugate prior for the mean of a POISSON DISTRIBUTION, but most commonly as the conjugate prior for the inverse of the variance of a normal distribution.
AGL
Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. 1995: Bayesian data analysis. Boca Raton: Chapman & Hall/CRC. Grimmett, G. R. and Stirzaker, D. R. 1992: Probability and random processes, 2nd edition. Oxford: Clarendon Press. Leemis, L. M. 1986: Relationships among common univariate distributions. The American Statistician 40, 2, 143-6. Phillips, A. N., Sabin, C. A., Elford, J., Bofill, M., Janossy, G. and Lee, C. A. 1994: Use of CD4 lymphocyte count to predict long-term survival free of AIDS after HIV infection. British Medical Journal 309, 309-13.
general fertility rate (GFR)
See DEMOGRAPHY
generalised additive models (GAMs)
These models allow possible nonlinear relationships between a response variable and one or more explanatory variables to be accounted for in a flexible manner. Generalised additive models are most useful in situations where the relationship between the variables is expected to be of a complex form, not easily fitted by standard methods, or where there is no a priori reason for using a particular model. In generalised additive models, the $\beta_i x_i$ term of MULTIPLE LINEAR REGRESSION and LOGISTIC REGRESSION is replaced by a 'smooth' function of the explanatory variable $x_i$, as suggested by the observed data (Everitt, 2002).

The building blocks of generalised additive models are SCATTERPLOT SMOOTHERS such as locally weighted regression fits and spline functions. Generalised additive models work by replacing the regression coefficients found in other regression models by the fit from one or other of these 'smoothers'. In this way, the strong assumptions about the relationships of the response to each explanatory variable implicit in standard regression models are avoided. Details of how such models are fitted to data are given in Hastie and Tibshirani (1990).

Generalised additive models provide a useful addition to the tools available for exploring the relationship between a response variable and a set of explanatory variables. Such models allow possible nonlinear terms in the latter to be discovered and then perhaps to be modelled in terms of a suitable, more familiar, low-degree polynomial. Generalised additive models can deal with nonlinearity in covariates that are not the main interest in a study and adjust for such effects appropriately. An up-to-date technical account of GAMs is given in Wood (2006).
BSE
(See also GENERALISED LINEAR MODELS)
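To give a feel for the scatterplot smoothers that serve as building blocks, the following Python sketch implements a very simple locally weighted linear regression with tricube weights. It is a stripped-down illustration under stated assumptions (a single explanatory variable, a fixed span, no robustness iterations), not the fitting algorithm of any GAM package, and the data are hypothetical.

```python
def tricube(u):
    """Tricube weight: largest at u = 0, zero for u >= 1."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_smoother(x, y, span=0.5):
    """Fit a weighted straight line around each x value; return the fitted values."""
    n = len(x)
    n_near = min(n, max(3, round(span * n)))     # points defining each neighbourhood
    fitted = []
    for x0 in x:
        distances = sorted(abs(xi - x0) for xi in x)
        bandwidth = max(distances[n_near - 1], 1e-12)
        w = [tricube(abs(xi - x0) / bandwidth) for xi in x]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12:                   # too few distinct points: weighted mean
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


# Hypothetical data with a curved relationship, for illustration only.
x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]
smooth = local_linear_smoother(x, y, span=0.5)
print([round(v, 2) for v in smooth])
```

In a generalised additive model a smooth of this kind would take the place of each $\beta_i x_i$ term, with the smooths for several explanatory variables estimated together; that step is not shown here.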
Everitt, B. S. 2002: Modern medical statistics. London: Arnold. Hastie, T. J. and Tibshirani, R. J. 1990: Generalized additive models. Boca Raton: CRC Press/Chapman & Hall. Wood, S. N. 2006: Generalized additive models. London: Chapman & Hall/CRC.
generalised estimating equations (GEE)

A popular method for analysing clustered data such as REPEATED MEASURES DATA.
Let there be $i = 1, \ldots, n_j$ observations for each cluster $j$, $j = 1, \ldots, N$. In a repeated measures setting the clusters $j$ would typically be subjects and the units $i$ would be measurement occasions. The MEAN or expectation $\mu_{ij}$ of the responses $y_{ij}$ given a vector of covariates $\mathbf{x}_{ij}' = (x_{1ij}, \ldots, x_{pij})$ is modelled using a GENERALISED LINEAR MODEL:

$$g(E(y_{ij} \mid \mathbf{x}_{ij})) = g(\mu_{ij}) = \beta_0 + \beta_1 x_{1ij} + \cdots + \beta_p x_{pij}$$

where $g(\cdot)$ is a link function and $\beta_0$ to $\beta_p$ are regression parameters. For example, an identity link gives a LINEAR REGRESSION model and a logit link a LOGISTIC REGRESSION model. The model is exactly the same as a generalised linear model for independent data. The interpretation of the regression parameters is therefore not affected by the nature of the within-cluster dependence between the responses. This is in contrast to RANDOM EFFECTS MODELS, where the regression parameters represent the conditional or subject-specific effects of covariates given the random effects. These effects will often differ from the marginal or population-averaged effects estimated using GEE.

This difference is illustrated for a logit link (logistic regression) in the figure. Here the thin curves represent subject-specific relationships between the probability that the response equals 1 and a covariate x for a random intercept logistic regression model, where the horizontal shifts are due to different values of the random intercept. The thick curve represents the population-averaged relationship, formed by averaging the thin curves for each value of x. The slope of the population-averaged curve is flatter than the slopes of the subject-specific curves. Therefore the population-averaged regression parameters tend to be attenuated (closer to zero) relative to the subject-specific regression parameters. Note that the distinction between subject-specific and population-averaged effects disappears when an identity link is used and, for RANDOM INTERCEPT MODELS, when a log link is used.

[Figure: generalised estimating equations. Conditional and marginal logistic relationships (probability, from 0.0 to 1.0, against the covariate x)]

As in conventional generalised linear models, the VARIANCES of the responses given the covariates are assumed to be $\mathrm{Var}(y_{ij} \mid \mathbf{x}_{ij}) = \phi V(\mu_{ij})$, where the variance function $V(\mu_{ij})$ is determined by the choice of distribution. For instance, for dichotomous responses, the Bernoulli distribution implies that $V(\mu_{ij}) = \mu_{ij}(1 - \mu_{ij})$, whereas for count data the POISSON DISTRIBUTION implies that $V(\mu_{ij}) = \mu_{ij}$. Since overdispersion is common in clustered data the dispersion parameter $\phi$ is typically estimated even if the distribution requires $\phi = 1$.

The feature of GEE that differs from usual generalised linear models is that different responses $y_{ij}$ and $y_{i'j}$ for a cluster $j$ are allowed to be correlated given the covariates. These correlations are typically assumed to have a simple structure parameterised by a small number of parameters. One of the following correlation structures is commonly used:

Independence. The responses are independent given the covariates: $\mathrm{Cor}(y_{ij}, y_{i'j} \mid \mathbf{x}_{ij}, \mathbf{x}_{i'j}) = 0$, $i \neq i'$

Exchangeable. All pairs of responses for the same cluster have the same correlation given the covariates: $\mathrm{Cor}(y_{ij}, y_{i'j} \mid \mathbf{x}_{ij}, \mathbf{x}_{i'j}) = \alpha$, $i \neq i'$

AR(1). The correlations between pairs of responses for the same cluster (given the covariates) fall off exponentially as the time lag between them increases (for longitudinal data only): $\mathrm{Cor}(y_{ij}, y_{i+t,j} \mid \mathbf{x}_{ij}, \mathbf{x}_{i+t,j}) = \alpha^t$, $|\alpha| < 1$, $t = 0, 1, \ldots, n_j - i$

Unstructured. The correlation matrix of the responses given the covariates is estimated freely, without restrictions: $\mathrm{Cor}(y_{ij}, y_{i'j} \mid \mathbf{x}_{ij}, \mathbf{x}_{i'j}) = \alpha_{ii'}$, $i \neq i'$

For given values of the regression parameters $\beta_0$ to $\beta_p$, the $\alpha$-parameters of the working correlation matrix can be estimated along with the dispersion parameter $\phi$ (see Zeger and Liang, 1986, for details). These estimates can then be used in so-called generalised estimating equations to obtain estimates of the regression parameters. Since estimation of the correlation and dispersion parameters requires knowledge of the regression parameters and vice versa, the GEE algorithm proceeds by iterating between (a) estimation of the regression parameters using the correlation and dispersion parameters from the previous iteration and (b) estimation of the correlation and dispersion parameters using the regression parameters from the previous iteration. Eventually, the algorithm converges, producing the same estimates in successive iterations.

It has been demonstrated that the estimated marginal effects $\beta_0$ to $\beta_p$ are consistent; i.e. the estimates approach the true values as the number of clusters increases. Importantly, these estimates are 'robust' in the sense that they are consistent for misspecified correlation structures, assuming that the mean structure is correctly specified. Consistent estimates of the COVARIANCE MATRIX of the estimated marginal effects are next obtained by means of the so-called sandwich estimator.
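The working correlation structures listed in this entry are easy to write down explicitly. The Python sketch below builds the independence, exchangeable and AR(1) matrices for a cluster with n occasions; the function names are illustrative, and the unstructured case is omitted because it has no closed form beyond a freely estimated symmetric matrix. The values n = 6 and $\alpha$ = 0.26 are used only as an example.

```python
def independence(n):
    """Working correlation with all off-diagonal correlations equal to 0."""
    return [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]


def exchangeable(n, alpha):
    """Working correlation with the same correlation alpha for every pair."""
    return [[1.0 if i == k else alpha for k in range(n)] for i in range(n)]


def ar1(n, alpha):
    """Working correlation alpha ** lag, where lag is the number of occasions apart."""
    return [[alpha ** abs(i - k) for k in range(n)] for i in range(n)]


n_occasions, alpha = 6, 0.26
for name, matrix in [("independence", independence(n_occasions)),
                     ("exchangeable", exchangeable(n_occasions, alpha)),
                     ("AR(1)", ar1(n_occasions, alpha))]:
    print(name)
    for row in matrix:
        print("  " + " ".join(f"{value:5.3f}" for value in row))
```

Under AR(1) the correlation between the first and last of six occasions is $\alpha^5$, far smaller than the constant $\alpha$ assumed by the exchangeable structure, which is the practical difference between the two choices for longitudinal data.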
The Madras Longitudinal Schizophrenia Study followed up 44 female patients monthly after their first hospitalisation for schizophrenia (see RANDOM EFFECT MODELS FOR DISCRETE LONGITUDINAL DATA for the full data). We will use GEE to investigate whether the course of illness differs between patients with early and late onset. The three variables considered are: [Month]: number of months since first hospitalisation; [Early]: early onset (1: before age 20, 0: at age 20 or later); [y]: repeated measures of thought disorder (1: present, 0: absent). Here we consider a subset of the data, namely on whether thought disorder [y] was present or not at 0, 2, 4, 6, 8 and 10 months after hospitalisation. Letting $y_{ij}$ be the measurement of thought disorder at occasion $i$ for patient $j$, we consider a dichotomous logistic regression model:

$$\ln\left(\frac{\Pr(y_{ij} = 1 \mid \mathbf{x}_{ij})}{1 - \Pr(y_{ij} = 1 \mid \mathbf{x}_{ij})}\right) = \beta_0 + \beta_1 x_{1ij} + \cdots + \beta_p x_{pij}$$

We use GEE with independence and exchangeable working correlations to estimate a model with explanatory variables [Month], [Early] and the interaction [Early] × [Month], allowing us to investigate the linear trend of time (for the log odds) as well as differences between times of onset, not just in the overall odds of thought disorder but also in the trend over time (see the table).
The estimates assuming an exchangeable correlation structure suggest that there is a decline over time in the odds of thought disorder in the late-onset patients (odds ratio = exp(-0.28) = 0.76 per month). However, the early-onset patients do not appear to have an appreciably greater odds of thought disorder at baseline (odds ratio = exp(0.10) = 1.11) or a greater decline in the odds over time (odds ratio = exp(-0.04) = 0.96). The estimate of the correlation parameter $\alpha$ is 0.26. Estimates of the $\alpha$-parameters are usually not reported because they are regarded as nuisance parameters.

A definite merit of GEE is that valid inferences are produced for population-averaged effects as long as the mean structure is correctly specified, even if the dependence structure is misspecified. However, there are also a number of limitations. The estimates are consistent only under the restrictive assumption that the probability that a response is missing does not depend on other responses for the same cluster, given the covariates (see MISSING DATA). Although GEE is often said to require that responses are missing completely at random (MCAR), missingness may in fact depend on covariates included in the model. Another limitation is that it is in general difficult to assess model adequacy in GEE; likelihood-based diagnostics are, for instance, not available. The use of GEE presupposes that marginal or population-averaged effects are of interest and it can be argued that it should be avoided in analyses of aetiology. This is because causal processes operate at the cluster or individual level, not at the population level. Unlike conditional effects, population-averaged effects also depend on the degree of heterogeneity in the population. Finally, the estimated regression parameters are no longer consistent if 'baseline' (initial) responses are included as covariates for longitudinal data (Crouchley and Davies, 1999). See Diggle et al. (2002) for a thorough treatment of GEE and Lindsey and Lambert (1998) for a critical evaluation.
AS/SRH
generalised estimating equations Estimated regression parameters from GEE with independence and exchangeable correlation structures

                                GEE independent             GEE exchangeable
                                Est     ('Sandwich' SE)     Est     ('Sandwich' SE)
$\beta_0$ [Cons]                0.69    (0.35)              0.71    (0.35)
$\beta_1$ [Month]              -0.27    (0.07)             -0.28    (0.07)
$\beta_2$ [Early]               0.04    (0.67)              0.10    (0.69)
$\beta_3$ [Early] × [Month]    -0.04    (0.11)             -0.04    (0.11)
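The odds ratios quoted for the exchangeable working correlation follow from exponentiating the estimates in the table, as the short Python sketch below shows. The approximate 95% Wald intervals, exp(estimate ± 1.96 × SE) with the sandwich standard errors, are an addition for illustration and are not reported in the entry itself.

```python
from math import exp

# (estimate, sandwich SE) under the exchangeable working correlation, from the table.
estimates = {
    "[Month]": (-0.28, 0.07),
    "[Early]": (0.10, 0.69),
    "[Early] x [Month]": (-0.04, 0.11),
}


def odds_ratio_with_interval(estimate, se, z=1.96):
    """Odds ratio with an approximate Wald interval (an assumption, see lead-in)."""
    return exp(estimate), exp(estimate - z * se), exp(estimate + z * se)


results = {name: odds_ratio_with_interval(est, se)
           for name, (est, se) in estimates.items()}

for name, (odds_ratio, low, high) in results.items():
    print(f"{name:18s} OR = {odds_ratio:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

The interval for [Month] excludes 1 whereas those for [Early] and the interaction include it, which matches the interpretation given in the text.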
Crouchley, R. and Davies, R. B. 1999: A comparison of population average and random effects models for the analysis of longitudinal count data with base-line information. Journal of the Royal Statistical Society, Series A 162, 331-47. Diggle, P. J., Heagerty, P., Liang, K.-Y. and Zeger, S. L. 2002: Analysis of longitudinal data. Oxford: Oxford University Press. Lindsey, J. K. and Lambert, P. 1998: On the appropriateness of marginal models for repeated measurements in clinical trials. Statistics in Medicine 17, 447-69. Zeger, S. L. and Liang, K.-Y. 1986: Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42, 121-30.
generalised linear model (GLM)

This model forms a unified framework for regression models introduced in a landmark paper by Nelder and Wedderburn (1972) over 30 years ago. A wide range of statistical models including ANALYSIS OF VARIANCE, ANALYSIS OF COVARIANCE, MULTIPLE LINEAR REGRESSION and LOGISTIC REGRESSION are included in the GLM framework. A comprehensive technical account of the model is given in McCullagh and Nelder (1989); more concise descriptions appear in Dobson (2001) and Cook (1998).
The term 'regression' was first introduced by Francis Galton in the 19th century to characterise a tendency to mediocrity, i.e. towards the average, observed in the offspring of parent seeds and used by Karl Pearson in a study of the heights of fathers and sons. The sons' heights tended, on average, to be less extreme than the fathers' (see REGRESSION TO THE MEAN). In essence, all forms of regression have as their aim the development and assessment of a mathematical model for the relationship between a response variable, $y$, and a set of $q$ explanatory variables, $x_1, x_2, \ldots, x_q$. Multiple linear regression, for example, involves the following model for $y$:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q + \epsilon$$

where $\beta_0, \beta_1, \ldots, \beta_q$ are regression coefficients that have to be estimated from sample data and $\epsilon$ is an error term assumed to be normally distributed with zero mean and a constant variance $\sigma^2$. An equivalent way of writing the multiple regression model is:

$$y \sim N(\mu, \sigma^2)$$

where $\mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q$. This makes it clear that this model is only suitable for continuous response variables with, conditional on the values of the explanatory variables, a NORMAL DISTRIBUTION with constant VARIANCE. Analysis of variance is essentially exactly the same model with $x_1, x_2, \ldots, x_q$ being dummy variables coding factor levels and interactions between factors; analysis of covariance is also the same model with a mixture of continuous and categorical explanatory variables.
The assumption of the conditional normality of a continuous response variable is one that is probably made more often than it is warranted. There are also many situations where such an assumption is clearly not justified. One example is where the response is a binary variable (e.g. improved/not improved) and another is where it is a count (e.g. number of correct answers in some testing situation). The question then arises as to how the multiple regression model can be modified to allow such responses to be related to the explanatory variables of interest.

In the GLM approach, the generalisation of the multiple regression model consists of allowing the following three assumptions associated with this model to be modified: the response variable is normally distributed with a MEAN determined by the model; the mean can be modelled as a linear function of (possibly nonlinear transformations of) the explanatory variables, i.e. the effects of the explanatory variables on the mean are additive; and the variance of the response variable given the (predicted) mean is constant. In a GLM, some transformation of the mean is modelled by a linear function of the explanatory variables and the distribution of the response around its mean (often referred to as the error distribution) is generalised, usually in a way that fits naturally with a particular transformation. The result is a very wide class of regression models.

The essential components of a GLM are as follows. A linear predictor, $\eta$, is formed from the explanatory variables:

$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q$$

A transformation of the mean, $\mu$, of the response variable is called the link function, $g(\mu)$. In a GLM it is $g(\mu)$ that is modelled by the linear predictor:

$$g(\mu) = \eta$$

In multiple linear regression and analysis of variance, the link function is the identity function. Other link functions include the log, logit, probit, inverse and power transformations, although the log and logit are those most commonly met in practice. The logit link, for example, is the basis of logistic regression.

The distribution of the response variable given its mean $\mu$ is assumed to be a distribution from the EXPONENTIAL FAMILY. Distributions in the exponential family include the normal distribution, the BINOMIAL DISTRIBUTION, POISSON DISTRIBUTION, GAMMA DISTRIBUTION and EXPONENTIAL DISTRIBUTION. Particular link functions in GLMs are naturally associated with particular error distributions, e.g. the identity link with the Gaussian distribution, the logit with the binomial distribution and the log with the Poisson distribution. In these cases, the term canonical link is used.

The choice of PROBABILITY DISTRIBUTION determines the relationships between the variance of the response variable (conditional on the explanatory variables) and its mean. This relationship is known as the variance function, denoted $\phi V(\mu)$.
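The three components just described (linear predictor, link function and variance function) can be summarised in a small Python sketch. The dictionary layout and names are illustrative only; the pairings shown are the canonical links named in the text, and the variance functions are the standard ones for each distribution with the dispersion parameter left out.

```python
from math import exp, log

# Each link is stored as (link function g, inverse link).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: log(mu), lambda eta: exp(eta)),
    "logit": (lambda mu: log(mu / (1.0 - mu)), lambda eta: 1.0 / (1.0 + exp(-eta))),
}

# Variance functions V(mu) for three members of the exponential family.
VARIANCE_FUNCTIONS = {
    "gaussian": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "binomial": lambda mu: mu * (1.0 - mu),
}

CANONICAL_LINK = {"gaussian": "identity", "binomial": "logit", "poisson": "log"}


def linear_predictor(coefficients, covariates):
    """eta = beta_0 + beta_1 x_1 + ... + beta_q x_q."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], covariates))


def mean_from_predictor(eta, family):
    """Invert the canonical link of `family` to obtain the mean mu."""
    _, inverse_link = LINKS[CANONICAL_LINK[family]]
    return inverse_link(eta)


eta = linear_predictor([0.5, -0.25], [2.0])      # hypothetical coefficients: eta = 0
for family in ("gaussian", "binomial", "poisson"):
    mu = mean_from_predictor(eta, family)
    print(f"{family:9s} mu = {mu:.3f}  V(mu) = {VARIANCE_FUNCTIONS[family](mu):.3f}")
```

The same linear predictor thus yields a mean on the whole real line, in (0, 1) or on the positive half-line depending on the link, which is what makes the framework suitable for continuous, binary and count responses respectively.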
[...] values of the response approximate the observed values; the deviance quoted in most examples of GLM fitting is actually -2 times the maximised log-likelihood for a model, so that differences in deviances of competing models give a LIKELIHOOD RATIO for comparing the models. A more detailed account of the assessment of fit for GLMs is given in Dobson (2001).

We can illustrate an application of GLM using the data shown in the first table, which are given in Haberman (1978) and also in Seeber (1998). They arise from asking randomly chosen household members from a probability sample of a town in the USA which stressful events had occurred within the last 18 months and to report the month of occurrence of these events. A SCATTERPLOT of the data (see the first figure) indicates a decline in the number of events as these lay further in the past, the result perhaps of the fallibility of human memory.

generalised linear model Distribution by months prior to interview of stressful events reported from subjects; 147 subjects reporting exactly one stressful event in the period from 1 to 18 months prior to interview

Time   y      Time   y      Time   y
1      15     7      10     13     11
2      11     8      4      14     3
3      14     9      8      15     6
4      17     10     10     16     1
5      5      11     7      17     1
6      11     12     9      18     4

[Figure: generalised linear model. Plot of recalled memories data (number of events against months)]
Since the response variable here is a count that can only take zero or positive values it would not be appropriate to use multiple linear regression here to investigate the relationship of recalls to time. Instead we shall apply a GLM with a log link function so that fitted values are constrained to be positive and as an error distribution use the Poisson distribution, which is suitable for count data. These two assumptions lead to what is usually labelled Poisson regression. Explicitly, the model to be fitted to the mean number of recalls, $\mu$, is:

$$\log(\mu) = \beta_0 + \beta_1 \, \mathrm{time}$$

The results of the fitting procedure are shown in the second table. The estimated regression coefficient for time is -0.084, with an estimated STANDARD ERROR of 0.017. Exponentiating this last equation and inserting the estimated parameter values gives the model in terms of the fitted counts rather than their logs, i.e.:

$$\mu = 16.5 \times 0.92^{\mathrm{time}}$$

The scatterplot of the original data now also showing the fitted model is given in the second figure. The difference in deviance of the null model with no explanatory variables and one including time as an explanatory variable is large, which clearly indicates that the regression coefficient for time is not zero.

[Figure: generalised linear model. Recalled memories data showing the fitted Poisson regression model (number of events against months)]

generalised linear model Results of a Poisson regression

Covariates     Estimated regression coefficient   Standard error   Estimate/SE
(Intercept)     2.803                              0.148            18.920
Time           -0.084                              0.017            -4.987

Dispersion parameter for Poisson family taken to be 1; null deviance: 50.84 on 17 degrees of freedom; residual deviance: 24.57 on 16 degrees of freedom.

BSE
(See also GENERALISED ESTIMATING EQUATIONS)

Cook, R. J. 1998: Generalised linear models. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd. Dobson, A. J. 2001: An introduction to generalized linear models, 2nd edition. Boca Raton: Chapman & Hall/CRC. Greenwood, M. and Yule, G. U. 1920: An inquiry into the nature of frequency distributions of multiple happenings. Journal of the Royal Statistical Society 83, 255. Haberman, S. 1978: Analysis of qualitative data, Vol. I. New York: Academic Press. McCullagh, P. and Nelder, J. A. 1989: Generalised linear models, 2nd edition. London: Chapman & Hall. Nelder, J. A. and Wedderburn, R. W. M. 1972: Generalised linear models. Journal of the Royal Statistical Society, Series A 135, 370-84. Seeber, G. U. H. 1998: Poisson regression. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd.
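As a check on the Poisson regression example in the generalised linear model entry, the model $\log(\mu) = \beta_0 + \beta_1 \, \mathrm{time}$ can be fitted to the recalled memories data with a few lines of Python. The sketch below uses plain Newton-Raphson on the Poisson log-likelihood (equivalent here to iteratively reweighted least squares); it is a minimal illustration with made-up function names, not the routine used to produce the published table.

```python
from math import exp, log, sqrt

months = list(range(1, 19))
events = [15, 11, 14, 17, 5, 11, 10, 4, 8, 10, 7, 9, 11, 3, 6, 1, 1, 4]


def fit_poisson_loglinear(x, y, iterations=25):
    """Newton-Raphson for log(mu) = b0 + b1 * x with Poisson errors."""
    b0, b1 = log(sum(y) / len(y)), 0.0           # start from the null model
    for _ in range(iterations):
        mu = [exp(b0 + b1 * xi) for xi in x]
        u0 = sum(yi - mi for yi, mi in zip(y, mu))                    # score for b0
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))      # score for b1
        i00 = sum(mu)                                                 # information matrix
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 ** 2
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    se0, se1 = sqrt(i11 / det), sqrt(i00 / det)  # from the inverse information
    return b0, b1, se0, se1


def poisson_deviance(y, mu):
    """Deviance 2 * sum[y log(y / mu) - (y - mu)], with 0 log 0 taken as 0."""
    return 2.0 * sum((yi * log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                     for yi, mi in zip(y, mu))


b0, b1, se0, se1 = fit_poisson_loglinear(months, events)
fitted = [exp(b0 + b1 * m) for m in months]
mean_count = sum(events) / len(events)
residual_deviance = poisson_deviance(events, fitted)
null_deviance = poisson_deviance(events, [mean_count] * len(events))

print(f"intercept = {b0:.3f} (SE {se0:.3f}), time = {b1:.3f} (SE {se1:.3f})")
print(f"fitted model: mu = {exp(b0):.1f} x {exp(b1):.2f}^time")
print(f"null deviance = {null_deviance:.2f}, residual deviance = {residual_deviance:.2f}")
# The drop in deviance is on 1 degree of freedom; the 5% chi-square critical value is 3.84.
print(f"difference in deviance = {null_deviance - residual_deviance:.2f}")
```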
genetic epidemiology

This is the study of the genetic aspects of the patterns of disease and other biological traits. Although, conceptually, genetic epidemiology is a branch of epidemiology, it has developed from distinct historical roots and employs different study designs, methods of statistical analysis and terminology. Genetic epidemiology is usually distinguished from population genetics, which emphasises the genetics of populations over time and its relation to factors such as population structure and selection. The statistical component of these and related areas is referred to as statistical genetics.

The central themes in genetic epidemiology are the identification of genes related to disease or other traits and the evaluation of risks associated with different genetic variants. There is a major emphasis in genetic epidemiology on family studies (pedigree analysis). Indeed, some authors have considered that familial factors in disease, whether or not genetic, are an essential component of genetic epidemiology. Here we outline some of the main analytical approaches in genetic epidemiology.

A genetic epidemiological analysis often starts with descriptive studies of familial aggregation. Twin studies are important in this context, because comparison of MZ and DZ twins allows genetic effects to be separated from shared environmental influences on risk. For binary traits such as diseases, familial aggregation is often described in terms of familial relative risks. These are defined as the ratio of the risk of the disease in a relative of an affected individual to the risk of the disease in the general population. The size of the familial relative risk and its variation by type of relative can give clues to the genetic model underlying the disease. For quantitative traits, familial aggregation may be described in terms of correlations in trait values between relatives.

Another important statistical approach in genetic epidemiology is SEGREGATION ANALYSIS. The aim of segregation analysis is to fit different genetic models to diseases in pedigrees. These may include models involving a single genetic locus or multiple loci. The models are parameterised in terms of the frequencies of alleles and the risks of disease associated with each genotype. Measured or unmeasured environmental risk factors might also be included in the model. Models are usually fitted by computing likelihoods and using MAXIMUM LIKELIHOOD ESTIMATION. Efficient algorithms, called peeling algorithms, are available for likelihood computation in large or complex pedigrees, although alternative approaches, such as a MARKOV CHAIN MONTE CARLO algorithm, are sometimes necessary.

An important concept in pedigree analysis is ascertainment. Analysis is usually conducted on a series of pedigrees that have been collected because of the presence of some trait. It is important to define the part of the data leading to the ascertainment of the pedigree and make appropriate adjustment for it in the analysis, by constructing the appropriate conditional likelihood.

There are two main approaches to the mapping of disease genes: GENETIC LINKAGE studies and ASSOCIATION studies. Genetic linkage studies are based on the inheritance of traits within families. They rely on the fact that loci that are close together on the same chromosome tend to be co-inherited (linked), whereas loci far apart will be inherited independently, due to the process of recombination at meiosis. The PROBABILITY of a recombination between two genetic loci is called the recombination fraction (usually represented as $\theta$). In a family with multiple cases of a disease, the diseased individuals will tend to share alleles at loci that are close to the disease gene. In this way, the entire genome can be examined for evidence of linkage using a limited number of genetic markers whose position in the genome is known. The statistical analysis of genetic linkage data aims to determine whether the pattern of co-inheritance of disease and marker genotype is different from what one would expect under the null hypothesis of no linkage.

Statistical analysis of genetic linkage studies can be of two types: parametric and nonparametric (or model free). In parametric linkage analysis, a particular disease model is specified and likelihoods are then constructed for different values of the recombination fraction.
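The idea of evaluating a likelihood over a range of recombination fractions can be illustrated in the simplest possible setting. The Python sketch below assumes fully informative, phase-known meioses, so that the likelihood is proportional to $\theta^r (1 - \theta)^{n - r}$ for $r$ recombinants in $n$ meioses, and compares each value of $\theta$ with $\theta = 0.5$ (free recombination, i.e. no linkage) on a log10 scale. Real pedigree likelihoods depend on the disease model and are far more involved; the counts used here are hypothetical.

```python
from math import log10


def linkage_likelihood(theta, recombinants, meioses):
    """Likelihood kernel theta^r (1 - theta)^(n - r) for phase-known meioses."""
    return theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)


def log10_likelihood_ratio(theta, recombinants, meioses):
    """log10 of the likelihood at theta relative to theta = 0.5 (no linkage)."""
    return log10(linkage_likelihood(theta, recombinants, meioses)
                 / linkage_likelihood(0.5, recombinants, meioses))


# Hypothetical counts for illustration: 2 recombinants in 20 informative meioses.
recombinants, meioses = 2, 20
grid = [i / 100 for i in range(1, 51)]           # theta from 0.01 to 0.50
best_theta = max(grid, key=lambda th: linkage_likelihood(th, recombinants, meioses))
best_score = log10_likelihood_ratio(best_theta, recombinants, meioses)

print(f"likelihood is maximised at theta = {best_theta:.2f}")
print(f"log10 likelihood ratio against theta = 0.5: {best_score:.2f}")
```

In this simplified setting the likelihood is maximised at the observed proportion of recombinants, $r/n$, and the log10 ratio at $\theta = 0.5$ is zero by construction.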
Linkage evidence is often summarised in terms of a LOD score, defined as log10 of the ratio of the likelihood of the data for a particular recombination fraction to the likelihood under no linkage. Model-free linkage analysis is based on the sharing of marker alleles among affected individuals in pedigrees. The aim is to determine whether the number of marker alleles at a given locus shared by affected individuals is greater than would be expected by chance. This approach is popular in the study of genetically complex diseases where the disease model is unknown. It is often used in the study of affected sibling pairs, which is a common study design for linkage in complex traits, but the approach has been generalised to more general pedigrees. Linkage analysis is implemented in several programs, including LINKAGE and GENEHUNTER. Linkage analysis is a good approach for identifying genes that have a large effect on disease risk, but tends to lack power to identify loci that have a moderate effect on risk.

Association studies evaluate directly the association between specific genetic variants and the trait of interest. They are the method of choice for identifying genes of weak effect. For diseases these are usually CASE-CONTROL STUDIES with unrelated cases and controls. Such studies can be analysed using standard case-control approaches. It is also possible to conduct association studies using within-family controls. This approach can eliminate problems of uncontrolled confounding. One commonly used design is to genotype a series of cases of the disease together with their two parents. The case genotypes are then compared with the alleles in the parents that are not transmitted to the affected case. This approach leads to the transmission distortion test (TDT).

Genetic association may arise due to a causal association with the polymorphism of interest, or because of a true association with a neighbouring polymorphism. The latter can arise because genotypes at neighbouring polymorphisms tend to be correlated. Polymorphisms arise by mutation at one point in the history of a population, on a particular chromosome with a particular haplotype of marker alleles. A newly arising allele will therefore be in association with the alleles at neighbouring loci and this association will be maintained in the population if the loci are close together. This phenomenon is known as linkage disequilibrium. To elucidate fully the association at a particular locus, it may be necessary to extend the analysis to the joint effects of multiple markers. DE
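The case-parent comparison described above can be illustrated with a short sketch. The entry does not give a formula, so the McNemar-type statistic used here, (b - c)^2 / (b + c) on one degree of freedom, is an assumption based on the form in which the TDT is usually presented; the function name and the counts are hypothetical.

```python
from math import erfc, sqrt


def tdt_statistic(transmitted: int, not_transmitted: int) -> tuple[float, float]:
    """McNemar-type chi-square (1 df) for a case-parent trio design.

    `transmitted` counts heterozygous parents who passed the allele of
    interest to the affected child; `not_transmitted` counts those who
    passed the other allele. Under no association the two are expected
    to be equal.
    """
    n = transmitted + not_transmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - not_transmitted) ** 2 / n
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p_value = tdt_statistic(60, 40)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Only heterozygous parents contribute, which is why the comparison is made within families and is protected from confounding by population structure.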
Sham, P. 1998: Statistics in human genetics. London: Arnold. Terwilliger, J. D. and Ott, J. 1994: Analysis of human genetic linkage, 3rd edition. Baltimore: Johns Hopkins University Press.
genetic linkage This is the nonindependent segregation of alleles at genetic loci close to one another on the same chromosome. Mendel's law of segregation states that an individual with the heterozygous genotype (Aa) has an equal probability of transmitting either allele (A or a) to an offspring. The same is true of any other locus with alleles B and b. Under Mendel's second law, that of independent assortment, the probabilities of transmitting the four possible combinations of alleles (AB, Ab, aB, ab) are all equal, namely one-quarter. This law is, however, only true for pairs of loci that are on separate chromosomes. For two loci that are on the same chromosome (known technically as syntenic), the probabilities of the four gametic classes (AB, Ab, aB, ab) are not equal, with an excess of the same allelic combinations as those that were transmitted to the individual from his or her parents. In other words, if the individual received the allelic combination AB from one parent and ab from the other, then he or she will transmit these same combinations with greater probability than the others (i.e. Ab and aB). The former allelic combinations are known as parental types and the latter recombinants.

The strength of genetic linkage between two loci is measured by the recombination fraction, defined as the probability that a recombinant of the two loci is transmitted to an offspring. A recombination fraction ranges from 0 (complete linkage) to 0.5 (independent assortment, or the complete absence of linkage). Recombinant gametes of two syntenic loci are generated by the crossing over of homologous chromosomes at certain semi-random locations during meiosis. The smaller the distance between two syntenic loci, the less likely it is that they will be separated by crossing over and therefore the smaller the recombination fraction. A recombination fraction of 0.01 corresponds approximately to a genetic map distance of 1 centiMorgan (cM). The crossing-over rate varies between males and females and for different chromosomal regions, but on average a genetic distance of 1 cM corresponds approximately to a physical distance of one million DNA base pairs. The total genetic length of the human genome is approximately 3500 cM.

For many decades linkage analysis was restricted to Mendelian phenotypes such as the ABO blood groups and HLA antigens. Recent developments in molecular genetics have enabled a variety of naturally occurring polymorphisms (e.g. short sequence repeats, single nucleotide polymorphisms) to be detected and measured. Standard sets of such genetic markers, evenly spaced throughout the entire genome, have been developed for systematic linkage analysis to localise genetic variants that increase the risk of disease. This is a particularly attractive method of mapping the genes for diseases since no knowledge of the pathophysiology is required. For this reason, the use of linkage analysis to map disease genes is also called positional cloning.

Linkage analysis in humans presents interesting statistical challenges. For Mendelian diseases, the challenges are those of variable pedigree structure and size and the common occurrence of MISSING DATA. The standard method of analysis involves calculating the likelihood with respect to the recombination fraction between disease and marker loci or the map position of the disease locus in relation to a set of marker loci,
while the disease model is assumed known (e.g. dominant or recessive). Traditionally, the strength of evidence for linkage is summarised as a lod score, defined as the common (i.e. base 10) logarithm of the ratio of the likelihood for a certain recombination fraction to that under no linkage. A lod score of 3 or more is conventionally regarded as significant evidence of linkage. For Mendelian disorders, 98% of reports of linkage that meet this criterion have been subsequently confirmed. Linkage analysis has successfully localised and identified the genes for hundreds of Mendelian disorders.

Locus heterogeneity in linkage analysis refers to the situation where the mode of inheritance, but not the actual disease locus, is the same across different pedigrees. In other words, there are multiple disease loci that are indistinguishable from each other both in terms of manifestations at the individual level and in the pattern of familial transmission. Under these circumstances, the power to detect linkage is much diminished, even with lod scores modified to take account of locus heterogeneity, especially for samples consisting of small pedigrees.

For common diseases that do not show a simple Mendelian pattern of inheritance and are therefore likely to be the result of multiple genetic and environmental factors, linkage analysis is a more difficult task. For such diseases we typically would have an idea of the overall importance of genetic factors (i.e. HERITABILITY) but no detailed knowledge of genetic architecture in terms of the number of vulnerability genes or the magnitude of their effects. There are two major approaches to the linkage analysis of such complex diseases. The first is to adopt a lod score approach, but modified to allow for a number of more or less realistic models for the GENOTYPE-PHENOTYPE relationship and to adjust the largest lod score over these models for multiple testing. The second approach is 'model free' in the sense that a disease model does not have to be specified for the analysis. Instead, the analysis proceeds by defining some measure of allele sharing between individuals in a pedigree and relating the extent of allele sharing to phenotypic similarity. One popular version of model-free linkage analysis is the affected sib-pair method, which is based on the detection of excessive allele sharing at a marker locus for a sample of sibling pairs where both members are affected by the disorder. The usual definition of allele sharing in model-free linkage analysis is identity-by-descent (IBD), which refers to alleles that are descended from (and are therefore replicates of) a single ancestral allele in a recent common ancestor.
Algorithms for estimating the extent of local IBD from marker genotype data have been developed.

Methods of linkage analysis have been developed also for quantitative traits (e.g. blood pressure, body mass index). A particularly simple method is based on a regression of phenotypic similarity on allele sharing. A more sophisticated approach is based on a VARIANCE components model in which a COMPONENT OF VARIANCE is specified to have COVARIANCE between relatives that is proportional to the extent of allele sharing between the relatives.

Regardless of the statistical method used for the linkage analysis of complex traits, there are two major inherent limitations of the approach. The first is that the sample size required to detect a locus with a small effect size is very large, potentially many thousands of families. The second is the low resolving power, in that the region that shows linkage is typically very broad, containing potentially hundreds of genes. For these reasons linkage is usually combined with an association strategy in the search for the genetic determinants of multigenic diseases. PS
[See also ALLELIC ASSOCIATION, GENETIC EPIDEMIOLOGY, HAPLOTYPE ANALYSIS]
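The lod score defined in this entry can be made concrete under a strong simplifying assumption that the entry itself does not make: every meiosis is fully informative and phase-known, so that each can be classified as recombinant or nonrecombinant. The likelihood is then binomial in the recombination fraction θ. The function name and the counts below are hypothetical.

```python
from math import log10


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses.

    L(theta) is proportional to theta**r * (1 - theta)**(n - r), and
    theta = 0.5 corresponds to no linkage.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    r, n = recombinants, meioses
    return r * log10(theta) + (n - r) * log10(1 - theta) - n * log10(0.5)


# Hypothetical data: 1 recombinant among 10 informative meioses.
thetas = [i / 100 for i in range(1, 51)]
best_theta = max(thetas, key=lambda t: lod_score(1, 10, t))
max_lod = lod_score(1, 10, best_theta)
print(f"maximum lod = {max_lod:.2f} at theta = {best_theta:.2f}")
```

With these counts the maximum falls short of the conventional threshold of 3, which illustrates why evidence is usually accumulated over several pedigrees. Real pedigrees with unknown phase or missing data require the full likelihood calculation described above.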
genome-wide association studies (GWAS) See ALLELIC ASSOCIATION
genotype Genotype describes the genetic make-up of an individual or organism, usually referring to a particular gene or genetic locus. Different genotypes arise at loci in a genome where there are differences in DNA sequence; such loci are said to be polymorphic and the sequence differences are referred to as alleles. Alleles that are strongly associated with a particular disease or trait may be referred to as mutations. The genotype of an individual at a particular locus is then the combination of alleles at that location. In humans and other higher organisms, individuals inherit two copies of each gene, one from each parent (except those on the sex chromosomes in males). A genotype at a single locus therefore refers to a pair of alleles. For example, the gene encoding the ABO blood group has three commonly recognised alleles, known as A, B and O. There are therefore six possible genotypes: (A,A), (A,B), (B,B), (A,O), (B,O) and (O,O). One may also consider the genotypes at several loci together. For example, suppose there are two polymorphic loci in a particular gene or region with alleles A1 and A2 at the first locus and B1 and B2 at the second locus. The multilocus genotype for an individual may consist of A1-B1 on one chromosome and A1-B2 on the other. The combination of alleles at different loci on a given chromosome (e.g. A1-B1) is called a haplotype. DE
[See also GENETIC EPIDEMIOLOGY]
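The counting argument in this entry can be checked directly. The sketch below enumerates unordered pairs of alleles for the ABO example and the possible haplotypes for the two-locus example; the helper `n_genotypes` is a hypothetical name, and the formula k(k + 1)/2 is the standard count of unordered pairs with repetition rather than something stated in the entry.

```python
from itertools import combinations_with_replacement, product

# Single locus: a genotype is an unordered pair of alleles.
abo_alleles = ["A", "B", "O"]
abo_genotypes = list(combinations_with_replacement(abo_alleles, 2))
print(abo_genotypes)


def n_genotypes(n_alleles: int) -> int:
    """Number of unordered allele pairs at a locus with n_alleles alleles."""
    return n_alleles * (n_alleles + 1) // 2


# Two loci: a haplotype takes one allele from each locus on one chromosome.
haplotypes = ["-".join(h) for h in product(["A1", "A2"], ["B1", "B2"])]
print(haplotypes)
```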
Balding, D. J., Bishop, M. and Cannings, C. (eds) 2001: Handbook of statistical genetics. Chichester: John Wiley & Sons, Ltd. Sham, P. 1998: Statistics in human genetics. London: Arnold.
geometric distribution This is the PROBABILITY DISTRIBUTION of the number of events required in order to observe a first 'success'. If we have a sequence of events, each of which independently has a probability of success p, then the probability mass function for the number of events required before observing a success is:

$$\Pr(X = x) = p(1-p)^{x}, \quad x = 0, 1, 2, \ldots$$

and that for the number required to observe the first success is:

$$\Pr(X = x) = p(1-p)^{x-1}, \quad x = 1, 2, 3, \ldots$$

Both are sometimes called the geometric distribution. Here we shall consider the second formula to be the definition. If X has this distribution, the mean of X is $1/p$ and the variance of X is $(1-p)/p^{2}$. The geometric distribution is a special case of the NEGATIVE BINOMIAL DISTRIBUTION. For further details of the distribution see Grimmett and Stirzaker (1992).

The interpretation of the probability mass function is straightforward. In order to observe the first success on the xth event, the xth event must be a success (with probability p) and the x - 1 previous events must be failures (each with probability 1 - p). Since the events are independent, the overall probability is obtained by simply multiplying together these individual probabilities (see PROBABILITY).

The event can be almost anything, from a generation of a family (perhaps for the purposes of modelling the number of generations until extinction of a species), to a nucleotide on a chromosome (for the purpose of modelling the length of chromosome before a mutation). Indeed, it is this last definition of event that Hilliker et al. (1994) use as they look to model the length of conversion tracts in terms of numbers of nucleotides, the length of a conversion tract being the number of nucleotides required before a termination ('success') of the tract is observed. Indeed, the majority of uses of this distribution in the medical sciences appear to be in the field of genetics.

The geometric distribution is a discrete analogue of the EXPONENTIAL DISTRIBUTION and a number of similarities should be apparent, e.g. the function of the distribution in modelling the time to an event. When the probability of a success, p, is small the continuous exponential distribution (with parameter p) can provide a good approximation to the discrete geometric distribution. The construction of the distribution in terms of independent events means that the 'lack of memory' property is also shared: i.e. observing two events to be failures does not alter the distribution of how many events need to be observed to have a success (independent events, by definition, cannot influence one another). AGL
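The properties quoted in this entry (mean 1/p, variance (1 - p)/p^2, lack of memory and the exponential approximation for small p) can be checked numerically. The following is a minimal sketch using the second definition, with X counting events up to and including the first success; the function names and the values of p are illustrative choices only.

```python
from math import exp


def geometric_pmf(x: int, p: float) -> float:
    """Pr(X = x) = p * (1 - p)**(x - 1) for x = 1, 2, ..."""
    return p * (1 - p) ** (x - 1)


def geometric_sf(x: int, p: float) -> float:
    """Pr(X > x): the first x events are all failures."""
    return (1 - p) ** x


p = 0.2
xs = range(1, 2000)  # the tail beyond 2000 is negligible for p = 0.2
mean = sum(x * geometric_pmf(x, p) for x in xs)
variance = sum(x * x * geometric_pmf(x, p) for x in xs) - mean ** 2
print(mean, 1 / p)                  # both close to 5
print(variance, (1 - p) / p ** 2)   # both close to 20

# Lack of memory: given two failures, the number of further events needed
# has the same distribution as at the start.
k = 3
conditional = geometric_sf(2 + k, p) / geometric_sf(2, p)
print(conditional, geometric_sf(k, p))

# Exponential approximation when p is small: (1 - p)**x versus exp(-p * x).
small_p, x = 0.01, 50
print(geometric_sf(x, small_p), exp(-small_p * x))
```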
Grimmett, G. R. and Stirzaker, D. R. 1992: Probability and random processes, 2nd edition. Oxford: Clarendon Press. Hilliker, A. J., Harauz, G., Reaume, A. G., Gray, M., Clark, S. H. and Chovnick, A. 1994: Meiotic gene conversion tract length distribution within the rosy locus of Drosophila melanogaster. Genetics 137, 1019-26.
geometric mean This is a MEASURE OF LOCATION used when data exhibit positive SKEWNESS, for example, such that the log-transformed data have an approximate NORMAL DISTRIBUTION. The (arithmetic) MEAN of the logged data is, like the individual values, on the log scale. The geometric mean is calculated by backtransforming (antilogging) this arithmetic mean.

The histogram in the figure illustrates the position of the arithmetic mean, geometric mean and MEDIAN of red cell folate measurements made on blood samples from a large random sample of women visiting a clinic. Since the data are skewed the median is a better measure of location than the arithmetic mean. However, the logged red cell folate data are normally distributed: in such circumstances the geometric mean is a good measure of location and is closer to the median than to the arithmetic mean.

Note that the geometric mean can only be used when all data points have values above zero, since it is impossible without further TRANSFORMATION (e.g. by the addition of a sufficiently large number to all observations) to take the log of zero or a negative number.

Logs are typically taken to either base 10 or base e: both are equally valid. If the data are logged to base 10 (denoted log on most calculators), then the antilog to calculate the geometric mean must be to base 10 (denoted $10^{x}$). If, however, the data are logged to base e (ln), known as the natural logarithm, then the antilog must be to base e ($e^{x}$). The red cell folate data were logged to base e and the arithmetic mean of these values was calculated as 5.777. The antilog to base e of 5.777 is $e^{5.777} = 323$.

In a RANDOMISED CONTROLLED TRIAL by Ng et al. (2002), the primary outcome was the length of hospital stay following computed tomography (CT) scan in patients with acute abdominal pain of uncertain aetiology. Since the length of hospital stay was expected to have a skewed distribution, the primary comparison was based on geometric mean length of stay. In their results section, the authors quoted the geometric mean for the two groups involved (5.3 and 6.4 days) but also, since the interpretation of geometric mean is more conducive to relative than to absolute differences, this was quoted as a 20% increase in one group's length of stay, with a 95% confidence interval from an 8% shorter to a 56% longer stay in hospital. SRC
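The calculation described in this entry is short enough to sketch. The function name and the four data values are hypothetical; the figures 5.777, 5.3 and 6.4 are those quoted in the entry. The sketch also shows that the choice of base does not matter provided the antilog uses the same base.

```python
from math import exp, log, log10


def geometric_mean(values: list[float]) -> float:
    """Antilog of the arithmetic mean of the natural logs."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be above zero")
    return exp(sum(log(v) for v in values) / len(values))


# Hypothetical positively skewed data.
data = [100.0, 200.0, 400.0, 800.0]
gm_natural = geometric_mean(data)
gm_base10 = 10 ** (sum(log10(v) for v in data) / len(data))
arithmetic = sum(data) / len(data)
print(gm_natural, gm_base10, arithmetic)  # the two geometric means agree

# Backtransforming the mean of the logged red cell folate values.
print(round(exp(5.777)))

# Geometric means are naturally compared as a ratio: 6.4 versus 5.3 days.
relative_increase = 6.4 / 5.3 - 1
print(f"{relative_increase:.0%}")
```

Because the quoted geometric means are rounded to one decimal place, the ratio computed from them only approximately reproduces the reported 20% difference.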
geometric mean Histogram annotations: geometric mean = 323; (arithmetic) mean = 351; median
Ng, C. S., Watson, C. J. E., Palmer, C. R., See, T. C., Beharry, N. A., Housden, B. A., Bradley, J. A. and Dixon, A. K. 2002: Evaluation of early abdominopelvic computed tomography in patients with acute abdominal pain of unknown cause: prospective randomised study. British Medical Journal 325, 1387-9.
G-estimation This refers to an estimation technique designed to extract the causal effect of observed exposures in a randomised or observational setting (see CAUSALITY). It exploits an assumed independence (conditional on covariates) between the potential outcome under a reference treatment and the randomization indicator or the observed exposures. The method typically starts from a structural nested model, which parameterises a causal contrast between the distribution of the observed outcome and a potential outcome under a reference treatment. When an INSTRUMENTAL VARIABLE R is available, the potential treatment-free outcome will be (MEAN) independent of R. In an observational setting the 'no unmeasured confounders assumption' (NUCA) implies (mean) independence between a potential treatment-free outcome and the observed treatment, conditional on the confounders. In each case, G-estimation follows upon backtransforming the observed outcome using the causal parameters to an (expected) treatment-free outcome and then solving an independence equation.

The method has been especially valuable for estimation of the effect of time-varying exposures in the absence of unmeasured confounders when measured time-varying confounders may be intermediate variables on the causal path from exposure to outcome. This happens, for instance, when estimating the effect of AZT on T4-count, if an observed low T4-count leads to an increased dose of AZT (Robins, 1994). Standard regression does not work in this case since one must adjust for confounders to avoid bias, but cannot adjust for intermediate variables if one is to retain the full causal effect. Below, we explain the setup of G-estimation more formally for a structural nested mean model in that setting (see the figure).
G-estimation Structural nested mean model
geometric mean Median, geometric mean and arithmetic mean of red cell folate measurements (nmol/l) on blood samples from 5052 women
Let at time points $(t_0, t_1, t_2, \ldots, t_{K-1})$ a sequence of exposure levels $\bar{A}_{K-1} = (A_0, A_1, A_2, \ldots, A_{K-1})$ be observed, which could have been set to a different sequence $\bar{a}_{K-1} = (a_0, a_1, a_2, \ldots, a_{K-1})$ through some intervention or manipulation. One further observes covariates $\bar{L}_K = (L_0, L_1, \ldots, L_K)$ over those time points and at an additional final time point $t_K$, in response to the previously observed treatments. The ENDPOINT of interest, $Y$, is a well chosen function of the full sequence $\bar{L}_K$, which could be just $L_K$. A dynamic treatment strategy is then a rule $G$ that assigns at each time point $t_k$ a well-defined treatment $a_k(\bar{L}_k, \bar{A}_{k-1})$ as a function of the observed treatment and covariate history up to that point. With each such rule corresponds the potential outcome $Y(G)$ expressing what the outcome would have been had, possibly contrary to fact, treatment strategy $G$ been followed.

To model the causal effect of (dynamic) treatments, the structural nested mean model (SNMM) considers at each time $t_k$ how the expected outcome would change if a reference regime (e.g. placebo or treatment levels $a_j = 0$ for $j \ge k$) would be followed from then onwards. Conditional on observed treatment and covariate history $H_k = (\bar{L}_k, \bar{A}_{k-1})$, the expected mean difference between the observed and the potential outcome is then parameterized. This can happen, for instance, through so-called blip functions, which express the additive effect of a final treatment blip $a_k$ at time $t_k$ on the mean potential outcome; i.e. with $(G_{k-1}, a_k, 0_{k+})$ representing the treatment strategy $G$ up to time $t_{k-1}$, followed by treatment $a_k$ at time $t_k$ and treatment 0 from then onwards, one could model

$$E\{Y(G_{k-1}, a_k, 0_{k+}) \mid H_k\} - E\{Y(G_{k-1}, 0_k, 0_{k+}) \mid H_k\}$$

as a linear function of the history $(\bar{L}_k, \bar{A}_{k-1})$.

The NUCA formally states that at each considered time, $t_k$, conditional on observed treatment and covariate history, $H_k$, the next level of observed treatment, $A_k$, is independent of the future response $Y(G)$ one would obtain under a pre-specified dynamic strategy $G$ such as $(\bar{A}_{k-1}, 0_k, 0_{k+})$. Formally this is written as:

Assumption (1): $Y(G) \perp\!\!\!\perp A_k \mid \bar{L}_k, \bar{A}_{k-1}$

To justify Assumption (1), it is important to plan to measure the needed covariates $L$. As the figure indicates, this allows for direct effects between the measured variables, but not for unmeasured confounders: the dotted arrows should be absent, so we can discard the unmeasured $U$ from the causal graph depicting the necessary direct effects.

An extension of the original approach yields a doubly robust estimator by involving an additional model for the distribution of the next treatment level given the history $H_k$. The corresponding estimator is then consistent when either this propensity model for the next treatment or the structural model is correct.

So far we have discussed parameter estimation for the SNMM. The above assumptions suggest also the G-null test of 'no causal effect of treatment A' on outcome. Indeed, under the strong null, the observed $Y$ and counterfactual $Y(G)$'s coincide for any choice of treatment level $A_k$ at any time point $t_k$. Hence Assumption (1) then implies that the conditional distribution of the next treatment level $A_k$ is independent of the future $Y$ given $H_k$, the past treatment and covariate history.
This can be tested by regressing $Y$ on the past treatment and covariate history as well as $A_k$, and testing for independence of $A_k$ in the context of that regression. Alternatively, one may test whether conditionally on the past $H_k$ the distribution of the next treatment level $A_k$ does not further depend on the observed outcome. An important advantage of this test is its robustness under the null to misspecification of the causal effect model.

To estimate $E(Y(G))$, the expected outcome under any (dynamic) treatment strategy $G$ one wishes to evaluate, one needs the additional assumption of 'no current treatment interaction' (see, for instance, Section 4 in Robins, 1994). The assumption implies that the postulated mean causal effect of $a_k$ is not restricted to the subset of individuals who happened to receive $A_k = a_k$ but holds for all concerned. In that case the so-called G-computation algorithm can be used with the estimated causal effect parameters to derive an estimate for $E(Y(G))$.

The G-estimation approach has generally proved to be quite rich. There is the doubly robust G-estimation which stays valid provided the structural nested causal model holds, when either a model for the conditional distribution of covariates is correct or the model for the conditional distribution of treatments is satisfied (Robins, 2000). With often censored survival outcomes (see SURVIVAL ANALYSIS - AN OVERVIEW), structural accelerated failure time models, allowing for time spans to be shrunk or expanded as a result of treatment received, have led to popular G-estimation approaches. The technique was further developed to estimate optimal dynamic treatment regimes (Robins, Orellana and Rotnitzky, 2008). Goetgeluk, Vansteelandt and Goetghebeur (2008) show how sequential G-estimation allows controlled direct effects quite simply to be estimated under conditional independence assumptions.

Several prominent applications of G-estimation of time-varying exposures have entered the literature. Most notably, Hernán and colleagues (2008) have applied it to analyse the effect of HRT on coronary heart disease, based on the Nurses' Health Study. They found results consistent with those from randomized trials hitherto thought to contradict observational findings. In general implementation can be quite involved, not only because the assumption of no unmeasured confounders is hard to justify, but also due to the computational challenge. An implementation in Stata (see STATISTICAL PACKAGES) is proposed by Sterne and Tilling (2002). EG
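The idea of backtransforming the outcome and solving an independence equation can be illustrated in a much simpler setting than the one treated in this entry. The sketch below assumes a single binary exposure A, one measured confounder L, an additive structural mean model in which the treatment-free outcome is Y - ψA, and a treatment model P(A = 1 | L) that is known because the data are simulated. All names and numerical values are hypothetical, and this is not the time-varying structural nested mean model itself.

```python
import math
import random


def simulate(n: int, psi_true: float = 2.0, seed: int = 1):
    """Simulated data in which L confounds the effect of A on Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = rng.gauss(0.0, 1.0)
        propensity = 1.0 / (1.0 + math.exp(-l))   # P(A = 1 | L), assumed known
        a = 1 if rng.random() < propensity else 0
        y = psi_true * a + l + rng.gauss(0.0, 1.0)
        rows.append((l, a, y, propensity))
    return rows


def g_estimate(rows) -> float:
    """Solve sum((A - P(A=1|L)) * (Y - psi * A)) = 0 for psi.

    Under no unmeasured confounding, Y - psi * A is mean independent of A
    given L at the true psi, so this estimating equation has mean zero there.
    """
    numerator = sum((a - p) * y for _, a, y, p in rows)
    denominator = sum((a - p) * a for _, a, y, p in rows)
    return numerator / denominator


def naive_difference(rows) -> float:
    """Unadjusted difference in mean outcome between exposed and unexposed."""
    exposed = [y for _, a, y, _ in rows if a == 1]
    unexposed = [y for _, a, y, _ in rows if a == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


rows = simulate(20000)
psi_hat = g_estimate(rows)
naive = naive_difference(rows)
print(f"G-estimate: {psi_hat:.2f}, naive difference: {naive:.2f}")
```

In this simulation the G-estimate should lie close to the value of 2 used to generate the data, while the unadjusted comparison is biased upwards by the confounder. In practice the treatment model would have to be estimated, and with time-varying exposures an equation of this kind is solved jointly over the time points.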
Goetgeluk, S., Vansteelandt, S. and Goetghebeur, E. 2008: Estimation of controlled direct effects. Journal of the Royal Statistical Society, Series B 70, 1049-66. Hernán, M. et al. 2008: Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology 19, 766-79. Robins, J. 1994: Correcting for noncompliance in randomized trials using structural nested mean models. Communications in Statistics 23, 2379-412. Robins, J. 2000: Robust estimation in sequentially ignorable missing data and causal inference models. In Proceedings of the American Statistical Association, Section on Bayesian Statistical Science, 1999, pp. 6-10. Robins, J., Orellana, L. and Rotnitzky, A. 2008: Estimation and extrapolation of optimal treatment and testing strategies. Statistics in Medicine 27, 4678-721. Sterne, J. and Tilling, K. 2002: G-estimation of causal effects, allowing for time-varying confounding. The Stata Journal 2, 164-82.
GLM See GENERALISED LINEAR MODEL
global scaling Measuring qualitative variables, such as quality of life, physical function, mental health and other complex variables, is a multiconceptual and multidisciplinary problem, and there are no standardised rules for recording. The theoretical definition is a conceptualisation of the variable, which means identification of the specific concepts and of the hierarchical structure to be studied. The variable might be considered as being unidimensional or composed of dimensions and subdimensions that in turn can be separated into different components. The operational definition defines each component in terms of items; each item comprises a single attribute to be recorded, often by an ordered categorical scale. The first figure shows a general scheme of a variable A that is operationally defined as being composed of the three subvariables, also called dimensions, labelled A1, A2 and A3. The health-related quality of life questionnaire, Short Form (SF) 36 (see QUALITY OF LIFE MEASUREMENT), has this structure with nine dimensions and a total of 36 items (Svensson, 2001).
global scaling An illustration of the possible operational structure of a three-dimensional, multi-item (10 items) variable labelled A

The purpose of using multi-items to measure a certain variable could be to increase the coverage over heterogeneous groups or to reflect various aspects of the same variable. Many items can also be used in order to identify the most significant sign of a certain status or dysfunction.
The Postoperative Recovery Profile (PRP) Questionnaire includes 19 item variables of importance for recovery after surgery (Allvin et al., 2009). Multidimensional multi-item instruments allow for interpretation at three distinct but related levels: the discrete item level, the dimensional, subvariable level and the global level. Data from each of these levels will provide an integrated picture of the individual. As the items cover different aspects of the same variable, a single global scale of the variable is required.

There are various approaches to aggregate multi-item assessments to a global scale. Calculation of sum scores of item scale responses and transformation of this sum to a standardised score ranging from 0 to 100 is a very common approach to global scoring. However, the rank-invariant properties of data from scale assessments imply that adding scores is not appropriate and conclusions drawn from mathematical calculations on ordinal data may not be valid. Therefore, other approaches that take account of the nonmetric properties of ordinal data must be considered (see RANK INVARIANCE).

The rules for a global scaling of the variable should be based on the theoretical and operational frameworks of the variable. When the multi-items are constructed for identification of the most serious sign of a particular state, the maximum categorical level of the items could be an appropriate global score. Another appealing approach, especially for dichotomous data, is to use the number of indicators of the outcome of interest. In the PRP Questionnaire, each of the 19 items represents an attribute, such as pain, gastrointestinal function and personal hygiene, which, when perceived as a problem, is a sign of not full recovery after surgery. Therefore the number of attributes of 'no problem' is the suggested global score of recovery, the global scale ranging from 0 to 19, where 19 is operationally defined as the score of being fully recovered (Allvin et al., 2009).
A single score of multi-items can also be defined by the MEDIAN. When there is an odd number of items the median score is well defined by the ordered item response that comes halfway in the range of ordered item responses. When there is an even number of items and the two central item responses differ, the median cannot be defined as the average of these categorical values because of the nonnumerical properties of the data. Either of the two central categories, and any category lying between them in the ordered set of item responses, will serve as a median. For example, for an ordered set of six item responses, 'none, slight, slight, moderate, moderate, very severe', both 'slight' and 'moderate' will serve as a median. The category that reflects the greater degree of severity can then be suitable as a global score, especially when the instrument is used to identify individuals at risk. In the example 'moderate' will be used as the global score.

[Figure] global scaling The main structure of the global scoring of family function conditional on the categorical levels of the main concepts, here labelled A and B

A global scale with an optional number of ordered categories could also be constructed out of a theoretical framework, professional knowledge and experience of a composite variable. FACES is a multi-item questionnaire of family function, which is a complex variable defined by two main concepts, each having three subdimensions and 10 item scales with four verbal descriptive scale categories, and one of the concepts could be illustrated by variable A in the first figure. The median approach was used for defining the global scale of each subdimension and of the two main concepts, here labelled A and B. In order to get more detailed information from the assessments, the operational definition of an eight-point global scale of family function was suggested by the professional researcher based on the 4 × 4 combinations of median scores from the two main concepts, as shown in the second figure (Starke and Svensson, 2001). It is also possible to use numerical labels of categories, bearing in mind that the data will still be ordinal. A similar approach to global scoring of the variable bodily pain assessed by two items in the SF-36 has been suggested as an alternative to the use of standardised sum scores (Svensson, 2001).

global scaling Short descriptive criteria for the eight categories in the Swedish version of the Glasgow Outcomes Scale (S-GOS) as an example of a single global scale

The S-GOS categories of outcome: Short descriptions of criteria
Dead
  A. Dead
Vegetative state
  B. Vegetative state
Conscious, dependent
  C. Severe disability, low: Minimal communication is possible by emotional response; dependent in all ADL activities
  D. Severe disability, high: Partially independent in ADL; post-traumatic complaints/signs; resumption of previous life and work not possible
Independent, but disabled
  E. Moderate disability, low: Independent in ADL; unable to resume previous social and/or work activities; post-traumatic signs evident
  F. Moderate disability, high: Post-traumatic signs are present, which, however, allow resumption of most former activities, either full-time or part-time
May have mild residual effects
  Capable of resuming normal occupational and social activities; there are minor physical or mental deficits or complaints
  Full recovery without signs or symptoms

A way to avoid aggregation of multi-item scales is to construct a single hierarchical scale with multidimensional conditions for each ordered categorical level, such as in the Glasgow Outcomes Scale (GOS) (see the table) (Svensson and Starmark, 2002). ES
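The median rule for a global score described in this entry can be sketched in a few lines. This is only an illustration: the function names and the five-point scale are assumptions made for the example, since the entry gives the six item responses but not the full set of scale categories.

```python
def ordinal_medians(responses, ordered_categories):
    """Return every category that can serve as a median of the item responses.

    `ordered_categories` lists the scale categories from least to most severe.
    With an even number of items whose two central responses differ, both
    central categories and any category between them are returned.
    """
    rank = {category: i for i, category in enumerate(ordered_categories)}
    ranks = sorted(rank[response] for response in responses)
    n = len(ranks)
    if n == 0:
        raise ValueError("at least one item response is needed")
    low, high = ranks[(n - 1) // 2], ranks[n // 2]
    return [ordered_categories[i] for i in range(low, high + 1)]


def global_score(responses, ordered_categories):
    """Pick the candidate median that reflects the greater degree of severity."""
    return ordinal_medians(responses, ordered_categories)[-1]


# Hypothetical five-point scale; only four of its labels occur in the example.
SCALE = ["none", "slight", "moderate", "severe", "very severe"]
ITEMS = ["none", "slight", "slight", "moderate", "moderate", "very severe"]

print(ordinal_medians(ITEMS, SCALE))  # ['slight', 'moderate']
print(global_score(ITEMS, SCALE))     # moderate
```

No arithmetic is performed on the category labels themselves; only their order is used, which is in keeping with the ordinal nature of the data.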
Allvin, R., Ehnfors, M., Rawal, N., Svensson, E. and Idvall, E. 2009: Development of a questionnaire to measure patient-reported postoperative recovery: content validity and intra-patient reliability. Journal of Evaluation in Clinical Practice 15, 411-19. Starke, M. and Svensson, E. 2001: Construction of a global assessment scale of family function, using a questionnaire. Social Work in Health Care 34, 131-42. Svensson, E. 2001: Construction of a single global scale for multi-item assessments of the same variable. Statistics in Medicine 20, 3831-46. Svensson, E. and Starmark, J. E. 2002: Evaluation of individual and group changes in social outcome after aneurysmal subarachnoid haemorrhage: a long-term follow-up study. Journal of Rehabilitation Medicine 34, 251-9.

graphical deception
Graphical deception involves displays of data that may mislead the unwary either by design or by error. Consider, for example, the plot of the death rate per million from cancer of the breast, for several periods over the last three decades, shown in the first figure. The rate appears to undergo a rather alarming increase. However, when the data are replotted with the vertical scale starting at zero, as shown in the second figure, the increase is altogether less startling. The example illustrates that undue exaggeration or compression of the scales is best avoided when constructing graphs if you want to avoid the charge of graphical deception.
[Figure] graphical deception Death rates from cancer of the breast where the y-axis does not include the origin

[Figure] graphical deception Death rates from cancer of the breast where the y-axis does include the origin

Another distortion made popular by graphics packages is the misuse of three-dimensionality, such as in PIE CHARTS, worsened by the ability to rotate the pie and detach slices at will. This can have the effect of inflating or masking a particular subcategory to suit the point being made. Leading journals prohibit the use of such unscientific devices. In the same way, bar charts and histograms should not have artificial three-dimensionality introduced as it confuses the reader when trying to read off axis values.

A very common form of distortion introduced into graphics, often popular in the media, is one where both dimensions of a two-dimensional figure or icon are varied simultaneously in response to changes in a single observed quantity. An example is shown in the third figure.

[Figure] graphical deception The shrinking family doctor (taken with permission from Tufte, 1983). Panel title: The shrinking family doctor in California; quantity shown: percentage of doctors devoted solely to family practice
Tufte (1983) quantifies the distortion in graphical displays with what he calls the lie factor of the display, defined as follows:

Lie factor = Size of effect in graph / Size of effect in data

The lie factor for the shrinking doctors is 2.8. Some suggested principles for avoiding graphical distortion leading to possible graphical deception, taken from Tufte (1983), are: the representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented; clear, detailed and thorough labelling should be used to defeat graphical distortion and ambiguity; write out explanations of the data on the graphic itself; label important events in the data; to be truthful and revealing, data graphics must bear on the heart of quantitative thinking, 'compared to what?'; graphics must not quote data out of context; above all else, show the data. SSE
(See also GRAPHICAL DISPLAYS)
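As a rough illustration of the lie factor defined in this entry, the following sketch takes the 'size of effect' to be the relative change between two values, which is one common reading of Tufte's definition and is an assumption here. The function names and the numbers in the example are hypothetical; they are not the measurements behind the shrinking doctor figure.

```python
def effect_size(first, second):
    """Relative change from `first` to `second` (assumed definition of effect)."""
    if first == 0:
        raise ValueError("the first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, graphic_first, graphic_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    data_effect = effect_size(data_first, data_second)
    if data_effect == 0:
        raise ValueError("the data show no change, so the ratio is undefined")
    return effect_size(graphic_first, graphic_second) / data_effect


# Hypothetical example: the data rise by 50% (10 to 15) but the drawn
# height of the icon doubles (2.0 cm to 4.0 cm).
print(lie_factor(10, 15, 2.0, 4.0))  # 2.0
```

A value close to 1 indicates that the graphic represents the change in the data proportionally; values well above or below 1 indicate exaggeration or understatement.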
Tufte, E. R. 1983: The visual display of quantitative information. Cheshire, CT: Graphics Press.
graphical displays These are procedures for visually displaying measured quantities by means of the combined use of points, lines, a coordinate system, numbers, symbols, words, shading and colour. It has been estimated that between 900 billion (9 × 10^11) and 2 trillion (2 × 10^12) images of statistical graphics are printed each year. Some of the advantages of graphical methods have been listed by Schmid (1954): in comparison with other types of presentation, well-designed charts are more effective in creating interest and in appealing to the attention of the reader; visual relationships as portrayed by charts and graphs are more easily grasped and more easily remembered; the use of charts and graphs saves time, because the essential meaning of large measures of statistical data can be visualised at a glance; charts and graphs provide a comprehensive picture of a problem that makes for a more complete and better balanced understanding than could be derived from tabular or textual forms of presentation; charts and graphs can bring out hidden facts and relationships and can stimulate, as well as aid, analytical thinking and investigation.

The last point in particular implies that perhaps the greatest value of a picture is when it forces us to notice what we never expected to see, although it should not be forgotten that humans are good at discerning subtle patterns that are really there (but equally good at imagining them when they are altogether absent!) - and graphs are sometimes constructed so as to mislead (see GRAPHICAL DECEPTION).

Many graphical displays used in medical research, e.g. the HISTOGRAM, PIE CHART and SCATTERPLOT, have been around for many years, but during the last two decades a wide variety of new methods have been developed with the aim of making this particular aspect of the examination of data as informative as possible. Graphical techniques have evolved that will provide an overview, hunt for special effects in data, indicate OUTLIERS, identify patterns, diagnose (and criticise) models and generally search for novel and unexpected phenomena. One example of these newer graphical techniques is TRELLIS GRAPHICS.

The current approach to statistical graphics largely arises from the 'visualisation' philosophy expounded by Cleveland (1985, 1993). There are two components to Cleveland's approach to displaying data: (1) graphing: visualisation implies a process in which information is encoded in visual displays and (2) fitting: fitting mathematical functions to data is needed as well as just a graphical display. Just graphing raw data, without fitting them and without graphing the fits and residuals, often leaves important aspects of the data undiscovered. Visualisation is critical to data analysis. It provides a front line of attack, revealing intricate structure in data that cannot be absorbed in any other way and can lead to the discovery of unimagined effects as well as challenging imagined ones. Good graphics will tell a convincing story about the data. In practice, large numbers of graphs may be needed and computers will generally be needed to draw them for the same reason that they are used for numerical analysis: speed and accuracy. SSE
(See also BOXPLOTS, GROWTH CHARTS, RESIDUAL PLOTS, SCATTERPLOT MATRICES)
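The point that fits and residuals should be examined alongside the raw data can be illustrated numerically, before any plotting. The sketch below fits a straight line by ordinary least squares to artificial data that actually follow a curve; the function names and the data are invented for the example and do not come from Cleveland's books.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values; these are what a residual plot displays."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Artificial data lying exactly on the curve y = x squared.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

intercept, slope = fit_line(x, y)
print(intercept, slope)                     # -2.0 4.0
print(residuals(x, y, intercept, slope))    # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle, a systematic pattern that signals curvature the straight-line fit has missed. Plotted against x, this pattern would be far easier to see than in a scatterplot of the raw data with the fitted line.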
Cleveland, W. S. 1985: The elements of graphing data. Summit, NJ: Hobart Press. Cleveland, W. S. 1993: Visualizing data. Summit, NJ: Hobart Press. Schmid, C. F. 1954: Handbook of graphic presentation. New York: Ronald.
graphical models Also known as conditional independence graphs, these models represent interrelationships in multivariate data pictorially. The graphs associated with these models depict the relationships between variables, with nodes representing random variables, and lines between them (edges) representing associations between them. The identification of independence between pairs of variables, conditional on the other variables in the model, enables graphs to be simplified by the omission of unnecessary edges. Models for multivariate normal data are called graphical Gaussian models or covariance selection models. The graphs associated with these models depict partial correlations between variables, conditional on all the other variables. Models for categorical data are based on the multinomial family of distributions and are called graphical log-linear models. Here the associations that are depicted are interaction terms in LOG-LINEAR MODELS. Models that combine categorical and multivariate normal data (mixed models) can also be fitted.

The first figure shows some output from MIM, a software package dedicated to this type of analysis (see Edwards, 2002, for further details of the package and of this example, which relates to mathematics exam marks). The first matrix is a correlation matrix for five variables (v, w, x, y, z) and is followed by the partial correlation matrix corresponding to it. The full model with all pairwise partial correlations can be expressed by a model formula as vwxyz and the associated graph is shown on the left. A model specified as vwx,xyz retains those subsets of variables that are mutually (partially) correlated and sets to zero any insignificant partial correlation. Here, the choice of edges to be omitted (vy, vz, wy, wz) is fairly obvious. However, in a more complex situation the reduced model would typically be found by removing each pair of variables in turn on the basis of their statistical significance. The fitted partial correlation matrix corresponding to this reduced vwx,xyz model is shown as the third matrix, with the graph on the right. The position of the variables in the diagram is not important, only the links between them. In the reduced model, one can conclude, for example, that x is needed to predict any of the other variables and also that y and z are not needed to predict v, so long as x and w are available.
Input correlation matrix
       v       w       x       y       z
v   1.000
w   0.553   1.000
x   0.547   0.610   1.000
y   0.409   0.485   0.711   1.000
z   0.389   0.436   0.665   0.607   1.000

Partial correlation matrix
       v       w       x       y       z
v   1.000
w   0.329   1.000
x   0.230   0.281   1.000
y  -0.002   0.078   0.432   1.000
z   0.025   0.020   0.357   0.253   1.000

Partial correlation matrix, four elements set to 0
       v       w       x       y       z
v   1.000
w   0.332   1.000
x   0.235   0.327   1.000
y   0.000   0.000   0.451   1.000
z  -0.000  -0.000   0.364   0.256   1.000

[Figure] graphical models A full and a reduced graphical model with associated partial correlation matrices and the initial correlation matrix (mathematics marks data from Mardia, Kent and Bibby, 1979, analysed by Edwards, 2002)
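The step from the input correlation matrix to the partial correlation matrix can be sketched as follows. The sketch relies on a standard identity: the partial correlation between two variables given all the others equals minus the corresponding element of the inverse correlation matrix, scaled by the square roots of the two diagonal elements. The function names are illustrative, the matrix inversion is a plain Gauss-Jordan routine written for self-containment, and nothing here reproduces how MIM itself performs the calculation.

```python
from math import sqrt


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of each pair of variables given all the others."""
    precision = invert(corr)
    n = len(corr)
    return [[1.0 if i == j
             else -precision[i][j] / sqrt(precision[i][i] * precision[j][j])
             for j in range(n)]
            for i in range(n)]


# Input correlation matrix for v, w, x, y, z as tabulated in this entry.
MARKS_CORRELATION = [
    [1.000, 0.553, 0.547, 0.409, 0.389],
    [0.553, 1.000, 0.610, 0.485, 0.436],
    [0.547, 0.610, 1.000, 0.711, 0.665],
    [0.409, 0.485, 0.711, 1.000, 0.607],
    [0.389, 0.436, 0.665, 0.607, 1.000],
]

for label, row in zip("vwxyz", partial_correlations(MARKS_CORRELATION)):
    print(label, " ".join(f"{value:7.3f}" for value in row))
```

Because the input correlations are rounded to three decimal places, the printed values should be expected to agree with the tabulated partial correlation matrix only approximately.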
Graphical models are based not only on statistical distribution theory but also on concepts of mathematical graph theory, e.g. cliques and acyclic graphs. In this example, the subset of variables vwx is a clique since (a) all the vertices are joined and (b) any larger subset containing it does not have this property. Acyclic means that there are no paths from a node back to itself. Concepts such as these ensure that the illustrative graphs and their associated models do not contain redundant information and can be interpreted unambiguously. Decomposability is a criterion sometimes sought. This requires that models can be broken down into series of regressions; it can aid interpretation and allow certain exact significance tests to be applied. Other more familiar criteria may be applied to models. For example, only log-linear models that are hierarchical are usually considered, as is the case in standard log-linear analysis.

Graphical chain models, also known as Bayesian networks, represent a series of models that have a directional relationship to one another, i.e. one model precedes another in some sense, either through a natural ordering in time or through some assumed causal relationship. The variables for each component model are considered to be in a block and the blocks are ordered to form a chain. Associations within blocks are considered to be noncausal whereas those between blocks are considered potentially causal. From the second block onwards each model is conditional on the other variables in that block and those in all preceding blocks. The edges linking the components are termed directed and are denoted by arrows. The second figure illustrates a relatively complex graphical chain model concerned with infant mortality in Malaysia (Mohamed, Diamond and Smith, 1998), using categorised data.
Note that the convention is to show discrete variables (as they are in this example) as closed circles whereas continuous variables are depicted as open, as in the first figure. The components of chain models are shown as blocks (often enclosed in rectangles). In the second figure, parts (a) and (c) are simple graphical models showing associations between pairs of variables controlling for the others. The other parts show models that have a temporal or causal relationship with one another. A summary was produced from all the constituent chain models, from which it was concluded that neonatal mortality was directly associated with maternal education, ethnicity, state, year of birth, source of drinking water, birth interval, prematurity and sex. However, neonatal mortality was not associated with birth attendant, birthplace and antenatal care.

[Figure] graphical models Graphical model for each step of an analysis of infant mortality in Malaysia (from Mohamed, Diamond and Smith, 1998): (a) socioeconomic factors - intrablock associations; (b) socioeconomic factors and factors before pregnancy - interblock associations; (c) factors before pregnancy - intrablock associations; (d) direct associations between antenatal care and its determinants; (e) direct associations between factors at birth and their potential determinants; (f) direct associations between neonatal mortality and its determinants. The legend distinguishes links significant at 5% from links significant at 1%

Estimation of models from data is generally performed by MAXIMUM LIKELIHOOD. Automatic selection of variables can be performed on the basis of the change in likelihood resulting from adding or removing edges in turn. This process typically starts either with the full or saturated model, containing all possible edges, followed by successive elimination of nonsignificant links, or with the null model followed by successive inclusion of links. In the case of multivariate normal data, summary statistics such as COVARIANCE MATRICES rather than raw data may be sufficient to fit models. However, diagnostic tests based on individual cases and data transformation are then not possible. Such tests include examination of residuals for OUTLIERS and assessment of normality; Box-Cox tests can be used to decide on the appropriate transformation if this is indicated. Graphical Gaussian models assume that there are no interactions (partial correlations depending on the level of a third variable). In the case of continuous variables it may be advisable to dichotomise the data and refit the model as a log-linear model or a mixed model, so that any interactions can be detected.
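The first step of a backward elimination from the saturated graphical Gaussian model can be sketched using a standard likelihood ratio result: the deviance for excluding a single edge from the saturated model is -n log(1 - r^2), where r is the partial correlation for that edge, and it is referred to a chi-square distribution on one degree of freedom. The sketch below applies this to the partial correlations tabulated in this entry. The sample size of 88 is the figure usually quoted for the mathematics marks data and is an assumption here, as are the function names; a real stepwise procedure would refit the model after each removal rather than test all edges once.

```python
from math import log

# 95th centile of the chi-square distribution on 1 degree of freedom.
CHI_SQUARE_1DF_5PCT = 3.841


def edge_exclusion_deviance(partial_correlation, n):
    """Likelihood ratio deviance for dropping one edge from the saturated model."""
    return -n * log(1.0 - partial_correlation ** 2)


def removable_edges(partial_correlations, n, critical=CHI_SQUARE_1DF_5PCT):
    """Edges whose exclusion deviance falls below the chosen critical value."""
    return sorted(edge for edge, r in partial_correlations.items()
                  if edge_exclusion_deviance(r, n) < critical)


# Partial correlations from the entry's second matrix, keyed by variable pair.
PARTIALS = {
    "vw": 0.329, "vx": 0.230, "wx": 0.281, "vy": -0.002, "wy": 0.078,
    "xy": 0.432, "vz": 0.025, "wz": 0.020, "xz": 0.357, "yz": 0.253,
}
N_STUDENTS = 88  # assumed sample size, not stated in the entry

print(removable_edges(PARTIALS, N_STUDENTS))  # ['vy', 'vz', 'wy', 'wz']
```

Under these assumptions the four edges flagged are the same four that the entry describes as fairly obvious candidates for omission.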
Many standard statistical techniques can be framed as special cases of graphical models, e.g. mixtures of multivariate normals (see FINITE MIXTURE DISTRIBUTIONS) or ANALYSIS OF VARIANCE. However, there are two situations in the analysis of medical data for which graphical models may be particularly useful. The first is when little is known about potential ASSOCIATIONS among a group of variables and when an exploratory approach is therefore called for: here a stepwise automatic selection process would probably be used to seek simple models consistent with the data. The other, as in the earlier infant mortality example, is where one has in mind a complex model of associations and causal links, the overall structure of which can be set out even though the details cannot be specified. The ability to depict the associations visually is particularly helpful in these two contexts. The numerical values of effect sizes, e.g. the partial correlations, are of course important, but it may be the qualitative information as to which pairs of variables are associated at any level (conditional on the rest) that may be of prime interest. This is most easily represented graphically rather than in tables.

A key paper is by Lauritzen and Wermuth (1989) and Edwards (2002) provides an application-oriented introduction making use of dedicated software MIM. Whittaker (1990) is another general text. Closely related techniques are path analysis and structural equation models. Bayesian graphical modelling, in which parameters and latent variables are included in the graph (in addition to the observed quantities), is discussed by Spiegelhalter (1998) and illustrated with an example relating to cancer incidence. ML

Edwards, D. 2002: Introduction to graphical modelling, 2nd edition. New York: Springer. Lauritzen, S. L. and Wermuth, N. 1989: Graphical models for associations between variables, some of which are qualitative and some quantitative. Annals of Statistics 17, 31-57. Mardia, K. V., Kent, J. T. and Bibby, J. M. 1979: Multivariate analysis. London: Academic Press. Mohamed, W. N., Diamond, I. and Smith, P. W. F. 1998: The determinants of infant mortality in Malaysia: a graphical chain modelling approach. Journal of the Royal Statistical Society, Series A 161, 349-66. Spiegelhalter, D. J. 1998: Bayesian graphical modelling: a case-study in monitoring health outcomes. Applied Statistics 47, 115-33. Whittaker, J. 1990: Graphical models in applied multivariate statistics. Chichester: John Wiley & Sons, Ltd.
gross reproduction rate (GRR) See DEMOGRAPHY

group sequential methods See DATA-DEPENDENT DESIGNS, INTERIM ANALYSIS, SEQUENTIAL ANALYSIS
growth charts These are GRAPHICAL DISPLAYS that present AGE-RELATED REFERENCE RANGES for anthropometry such as height and weight in childhood. Growth charts conventionally provide information on three or more QUANTILES of the age-related distribution, including the MEDIAN and other centiles (percentiles) placed symmetrically about the median. The first figure illustrates a growth chart of infant weight in British boys. There are nine centiles on the chart, equally spaced two-thirds of a unit apart on the Z-SCORE scale. There is also the growth curve of an infant followed over a 2-month period, showing marked growth faltering.

[Figure] growth charts Growth chart to assess weight in infancy, British boys 1990 (boy's weight in kg, birth to 1 year). The chart shows the 0.4th, 2nd, 9th, 25th, 50th, 75th, 91st, 98th and 99.6th centiles of the weight distribution by age. Also shown is the growth curve for a child measured at 7, 8 and 9 months (reproduced with permission of the Child Growth Foundation)

The growth chart is used in several distinct ways. First, it is a screening or diagnostic test, corresponding to the way age-related reference ranges are used. Measurements are plotted against age on the chart, and children whose measurement lies outside the reference range, i.e. above the top centile or below the bottom centile for the child's age, are considered to be at risk of a growth disorder and are referred for further investigation. The proportion of children screened depends on the centile used, e.g. 3% below the 3rd centile or 10% above the 90th centile. This assumes that the children are representative of the reference population on which the growth chart is based. The population PREVALENCE of a growth disorder is small, so the screening rate corresponds closely to the FALSE POSITIVE RATE (100% - SPECIFICITY) of the screening test, the vast majority of the selected children being free of the disorder. Note though that the growth chart provides no information on the true positive rate (SENSITIVITY) of the screening test, as this needs to be based on a representative sample of growth-disordered children, which is generally not available.

A second use of the growth chart is to quantify the centile position of the individual child, by seeing what centile curve their measurement is close to. For measurements in the body of the distribution the approximate centile can be obtained by interpolating between adjacent centile curves.

The third and most common use of the chart is an extension of this previous use to measurements on two or more occasions, which are plotted on the chart and the points joined together to make a growth curve. The child's growth velocity over time is assessed on the assumption of tracking, where the centile is expected to be constant and the growth curve parallel to the centile curves. This constant centile over time corresponds to a growth velocity that is close to the population MEDIAN, whereas growth-disordered children show 'centile crossing', i.e. they grow faster or slower than the implied reference median velocity, as seen, for example, in the first figure. It is this use that gives the name to growth charts, quantifying growth in individuals relative to the reference.

It should be recognised, however, that growth charts used in this way lack statistical rigour - the reference data are cross-sectional rather than longitudinal, and contain no information about growth velocity. In addition the chart does not adjust for regression to the median. Child centiles followed over time (assumed to be representative of the reference population) will be subject to regression towards the median, so that, for example, in a group of children on the 9th centile on one occasion, their mean measurement centile will be closer to the median when followed up. The size of this centile crossing effect depends solely on the strength of the correlation between measurements on the two occasions. Height, for example, tracks very strongly in mid-childhood before puberty (the year-on-year correlation exceeds 0.97), so most children show very little height centile crossing over time and regression to the median is hard to detect. For more labile measurements like weight, body mass index or skinfold thickness, and particularly during periods of rapid growth such as early infancy or puberty, the age-on-age correlation may be as low as 0.5 and regression to the median is marked. This emphasises that charts to measure size are not ideal to assess growth, and vice versa.

A more useful tool to assess growth velocity is a dedicated velocity chart. Charts to assess growth velocity can be constructed in the same way as charts for size. Here each child from the reference population provides two measurements taken some pre-specified time interval apart (e.g. a year), which are converted to a velocity and then analysed to construct centile charts of velocity for age. The restricted time interval ensures that the contribution of measurement error to the total variability is fixed, as the amount of MEASUREMENT ERROR varies inversely as the time interval. Velocity charts are less satisfactory than size charts for three
reasons: (a) the time interval requirement is restrictive as most children are not measured that regularly; (b) the chart does not adjust for regression to the median and (c) requiring two charts rather than one to assess each child's growth increases resource costs.

An alternative is to assess the child's growth velocity from the size chart. This exploits the principle that velocity is the rate of change of size, so that the slope of the line joining successive measurements is a measure of velocity. Velocity can be assessed in terms of the rate of change of either the measurement or the measurement centile (more accurately the z-score), and the latter corresponds directly to centile crossing. This is the principle behind 'thrive lines', a set of curves analogous to centile curves that are printed on a transparent plastic overlay and placed on the growth (size) chart to detect failure to thrive (i.e. poor growth). Centile curves represent the pattern (or direction) of growth in children who are tracking perfectly, i.e. whose centile remains constant over time. Such children are growing on the median velocity (ignoring regression to the median). In the same way, thrive lines can be drawn to reflect the pattern (or direction) of growth of children growing at some specified velocity centile other than the median. The second figure shows thrive lines for the 5th velocity centile, superimposed on the infant weight chart of the first figure. The child's growth curve in the second figure tracks along the thrive lines for a period of 4 weeks, and this defines the growth rate as the 5th velocity centile. Only about 1 child in 20 grows more slowly than this. The time period is important, and the growth pattern becomes more extreme the longer it tracks the thrive lines. Therefore, as the child continues to track along the thrive lines for a further 4 weeks, i.e. 2 months altogether, this means that the velocity over the whole period is near the 1st centile, clearly of much greater concern. Note that the first figure highlights the centile crossing, but gives no clue as to how extreme the velocity centile is.

[Figure] growth charts The growth chart of the first figure with thrive lines overlaid. The thrive lines quantify the 5th centile for weight velocity over a 4-week period. The infant's growth curve tracks along the thrive lines, corresponding to the 5th weight velocity centile for the first month, and approximately the 1st centile for the whole 2-month period (reproduced with permission of the Child Growth Foundation)

The main technical concern with growth charts is the representativeness of the underlying reference population. This depends on the nationality, ethnicity and timing of measurement of the child being assessed compared to the reference. For example, the British growth charts (e.g. the first figure) are based on ethnic Caucasian British children measured in 1990. They are therefore less appropriate for assessing Caucasian Dutch or ethnic minority British children, and will become progressively more out-of-date as time passes, due to the secular trend to increasing height and particularly weight. For further details see Tanner (1978), Cole (1998), Cole, Freeman and Preece (1998) and Ulijaszek, Johnston and Preece (1998). TJC
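Two of the numerical ideas in this entry, the spacing of the chart centiles on the z-score scale and regression to the median, can be sketched under the assumption that z-scores follow a standard normal distribution and that measurements on two occasions are bivariate normal. Under that assumption the expected follow-up z-score is the correlation multiplied by the baseline z-score. The function names are illustrative, and the second function returns the centile of the expected z-score, which is a simplification of the 'mean measurement centile' described in the text.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def centile_from_z(z):
    """Centile (0 to 100) corresponding to a z-score."""
    return 100.0 * STANDARD_NORMAL.cdf(z)


def chart_centiles(spacing=2.0 / 3.0, lines_each_side=4):
    """Centiles of lines equally spaced on the z-score scale about the median."""
    return [centile_from_z(k * spacing)
            for k in range(-lines_each_side, lines_each_side + 1)]


def expected_followup_centile(baseline_centile, correlation):
    """Centile of the expected z-score at follow-up (regression to the median)."""
    baseline_z = STANDARD_NORMAL.inv_cdf(baseline_centile / 100.0)
    return centile_from_z(correlation * baseline_z)


# Nine lines two-thirds of a z-score apart give approximately the
# 0.4th, 2nd, 9th, 25th, 50th, 75th, 91st, 98th and 99.6th centiles.
print([round(c, 1) for c in chart_centiles()])

# Children starting on the 9th centile: a strongly tracking measurement
# (correlation 0.97) compared with a labile one (correlation 0.5).
print(round(expected_followup_centile(9, 0.97), 1))
print(round(expected_followup_centile(9, 0.5), 1))
```

With a correlation of 0.5 the expected position moves from the 9th centile to roughly the 25th, that is, about two chart lines closer to the median, whereas with a correlation of 0.97 it hardly moves at all.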
Cole, T. J. 1998: Presenting information on growth distance and conditional velocity in one chart: practical issues of chart design. Statistics in Medicine 17, 2697-707. Cole, T. J., Freeman, J. V. and Preece, M. A. 1998: British 1990 growth reference centiles for weight, height, body mass index and head circumference fitted by maximum penalized likelihood. Statistics in Medicine 17, 407-29. Tanner, J. M. 1978: Foetus into man: physical growth from conception to maturity. London: Open Books. Ulijaszek, S. J., Johnston, F. E. and Preece, M. A. 1998: Cambridge encyclop...
Grubbs' test statistic See OUTLIERS
H

haplotype analysis A haplotype refers to a combination of alleles transmitted from a parent to a child through a haploid nucleus in a gametic cell, although the term is often restricted to a combination of alleles that are in tight linkage on the same chromosome. Humans are diploid: an individual's genotype is derived from the union of a haplotype from the father and a haplotype from the mother. Haplotype analysis includes the estimation of population haplotype frequencies from sample genotype data, the inference of an individual's haplotype from genotype data and the investigation of possible associations between haplotypes and disease or other traits.

One important problem in haplotype analysis is that an individual's genotype may be consistent with multiple pairs of haplotypes. Thus, the genotype AaBb is consistent with haplotype pairs AB/ab and Ab/aB. In general, if there are m heterozygous loci in the genotype, then there are 2^(m-1) consistent haplotype pairs. The availability of genotype but not haplotype data can be regarded as a form of incomplete data, so that the estimation of haplotype frequencies from genotype data can be accomplished by an EM ALGORITHM. Other approaches to haplotype frequency estimation have also been proposed, including Bayesian approaches that take account of the similarities between the haplotypes.

If the frequency of a haplotype deviates from the product of the frequencies of the constituent alleles, then the alleles are not independent but associated with each other, and are said to be in linkage disequilibrium. A number of measures of linkage disequilibrium are available for two diallelic loci, including D (the difference between haplotype frequency and the product of constituent allele frequencies), D' (D divided by the maximum possible D given the allele frequencies of the two loci) and R (the CORRELATION coefficient between numerically coded values for the alleles of the two loci).
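The three measures of linkage disequilibrium just defined can be computed from one haplotype frequency and the two allele frequencies. The sketch below uses the usual bounds for the maximum possible D, which differ according to the sign of D, and keeps the sign of D in D'; some authors report the absolute value instead. The function name and the example frequencies are hypothetical.

```python
from math import sqrt


def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D', R) for two diallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the allele frequencies.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d != 0 else 0.0
    r = d / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r


# Hypothetical frequencies: complete association, then independence.
print(linkage_disequilibrium(0.5, 0.5, 0.5))   # (0.25, 1.0, 1.0)
print(linkage_disequilibrium(0.25, 0.5, 0.5))  # (0.0, 0.0, 0.0)
```

D depends heavily on the allele frequencies, which is why the scaled measures D' and R are usually preferred when comparing pairs of loci.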
The strength of linkage disequilibrium between the markers in a region may reflect the recombination rate in that region (possibly determined by the local chromosome structure) and stochastic variation in the recombination and mutation history of the population. A possible association between haplotype and disease is usually examined by estimating haplotype frequencies in cases and controls and testing whether these can be equated. Sometimes the association between a haplotype and a disease is stronger than the association between any of the constituent alleles and the disease. This happens when the true causal variant had originated in a mutation that occurred on a chromosome containing a particular combination of alleles,
or when there is an interaction between the effects of alleles on the same chromosome (called cis interactions). Knowledge of haplotype structure is also important for the optimal choice of markers in association studies, leaving out any markers that are predictable by the others because of strong linkage disequilibrium. PS [See also ALLELIC ASSOCIATION, GENETIC EPIDEMIOLOGY, GENETIC LINKAGE]
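The three linkage disequilibrium measures named in this entry can be sketched for two diallelic loci as follows. This is a minimal illustration rather than a reference implementation: the function name and arguments are hypothetical, the haplotype frequency is assumed to be known (in practice it would first be estimated, e.g. by an EM algorithm), and both loci are assumed to be polymorphic so that no denominator is zero. D' is returned with the sign of D; some authors report its absolute value instead.

```python
import math


def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r) for two diallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2 (hypothetical input)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # D: haplotype frequency minus the product of allele frequencies
    d = p_ab - p_a * p_b

    # Largest magnitude D can take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r: correlation between 0/1-coded alleles at the two loci
    r = d / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r


if __name__ == "__main__":
    print(ld_measures(0.5, 0.5, 0.5))   # alleles always inherited together
    print(ld_measures(0.25, 0.5, 0.5))  # alleles independent
```

When the alleles are independent all three measures are zero; when one allele perfectly predicts the other, D' and r both reach 1.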
Hardy-Weinberg law This is a result concerning the frequency distribution of genotypes at a polymorphic genetic locus in a population under random mating. The Hardy-Weinberg law is an important result in population genetics that was derived independently by the English mathematician, G. H. Hardy, and the German physician, W. Weinberg, in 1908. For a genetic locus with two alternate sequence variants (alleles), the Hardy-Weinberg law states that half the frequency (expressed as a proportion) of the heterozygote genotype is equal to the square root of the product (i.e. the GEOMETRIC MEAN) of the frequencies of the two homozygous genotypes. An alternative way of stating the Hardy-Weinberg law is that the frequency of a homozygous genotype is equal to the square of the frequency of the constituent allele, while the frequency of a heterozygous genotype is equal to twice the product of the frequencies of the two constituent alleles. If the frequencies of alleles A and B are denoted by $p$ and $q$, then the Hardy-Weinberg law states that the frequencies of the AA, AB and BB genotypes are given by $p^2$, $2pq$ and $q^2$ respectively. The Hardy-Weinberg law is therefore the result of the simple rule that the probability of two independent events is equal to the product of the probabilities of the two events. The Hardy-Weinberg law can be violated in real populations or samples for many reasons. Populations that consist of noninterbreeding (i.e. stratified) subpopulations with different allele frequencies will tend to have an excess of individuals with homozygous genotypes. The characteristic Hardy-Weinberg ratios can be distorted by natural selection, where one or more genotypes confer a survival advantage over the others. The overall population ratios can be distorted at a locus that contains disease-predisposing variants in a sample of patients with the disease.
Finally, the apparent distortions of the Hardy-Weinberg ratios for some loci in a set of genotype data can be the result of genotyping errors in the laboratory. Testing for Hardy-Weinberg proportions is therefore a routine part of data quality checks in genetic studies. For loci with two alleles a Pearson CHI-SQUARE TEST with one degree of freedom comparing observed and predicted counts is standard but for loci with more than two alleles a permutation-based test is preferable. PS [See also ALLELIC ASSOCIATION, GENETIC EPIDEMIOLOGY, HERITABILITY]
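The one-degree-of-freedom Pearson test described in this entry can be sketched as below. It is an illustration under stated assumptions, not a prescribed procedure: the function name and the genotype counts in the usage lines are hypothetical, allele frequencies are estimated from the sample by allele counting, and both alleles are assumed to be present so that every expected count is positive. The P-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom at $x$ equals erfc($\sqrt{x/2}$).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg proportions (two alleles).

    Returns (chi-square statistic, P-value on 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb

    # Allele frequencies estimated by counting alleles in the sample
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p

    # Counts predicted by the Hardy-Weinberg law: n*p^2, n*2pq, n*q^2
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # counts match the predicted ratios
    print(hwe_chi_square(30, 40, 30))  # excess of homozygotes
```

The second call shows the pattern the entry attributes to stratified populations: more homozygotes than predicted, giving a statistic of 4 on one degree of freedom.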
Hawthorne effect This is a possible effect that might be produced in an experiment or study simply from subjects' awareness of participation in some form of scientific investigation. That individual behaviours might be altered because they know they are being studied was first said to have been demonstrated in a research project carried out at the Hawthorne Plant of the Western Electric Company in Cicero, Illinois, in the late 1920s. The major finding of the study was that, almost regardless of the experimental manipulation employed, the production of the workers seemed to improve. The implication of the effect is that people who are singled out for a study of any kind may improve their performance or behaviour, not because of any specific condition being tested but simply because of the attention they receive. A medical example suggested by Gail (1998) involves a study of methods to promote smoking cessation, in which it is necessary to contact study participants each year to determine smoking status. The Hawthorne effect could distort study results if this repeated annual contact affected smoking behaviour or the reporting of smoking behaviour. A further, more recent medical example of the appearance of the Hawthorne effect is given in Fox, Brennan and Chasen (2008). BSE
Fox, N. S., Brennan, J. S. and Chasen, S. T. 2008: Clinical estimation of fetal weight and the Hawthorne effect. European Journal of Obstetrics, Gynecology and Reproductive Biology 141, 111-14. Gail, M. H. 1998: Hawthorne effect. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd.
hazard function See PROPORTIONAL HAZARDS, SURVIVAL ANALYSIS - AN OVERVIEW
health-adjusted life expectancy (HALE) See DEMOGRAPHY
health services research Health services research, according to Bowling (2002), 'is concerned with the relationship between the provision, effectiveness and efficient use of health services and the health needs of the population. It is narrower than health research'. It thus entails measuring and evaluating the inputs, processes and outcomes of healthcare provision. Input and process information that is primarily aimed at assisting healthcare managers and providers, especially when collected on a routine basis, is probably more correctly considered as audit or quality assurance.
Generally speaking, such routine data can rarely be used for research purposes, due to difficulties in maintaining standards in data collection. An exception would be a long-term case register containing data on all patients in a given area gathered in a strictly controlled and objective fashion. While most standard statistical methods are potentially applicable in health services research, some are more useful than others. This is because health services research is often relatively complex, involving as it does the analysis of different interventions, outcomes and levels of data simultaneously. Because of this complexity, and also sometimes because of ethical issues, relatively unusual experimental or pseudoexperimental designs such as stepped wedge designs, preference trials and randomised consent designs are available in addition to the more standard experimental designs, such as individually or cluster randomised CLINICAL TRIALS. Observational studies, such as CROSS-SECTIONAL STUDIES, COHORT STUDIES and CASE-CONTROL STUDIES, may be more appropriate or indeed the only feasible option for studying health services in naturalistic settings. For further information on various approaches, see the MRC guidance on complex interventions, which has been revised and updated from 2000 (Craig et al., 2008). A contrast between the typical healthcare trial and the typical pharmacological trial is that the latter usually focuses on the outcome for the individual patient and assesses some particular therapeutic intervention such as a drug, a surgical procedure or a psychological intervention. The remit of a typical healthcare trial, contrariwise, tends to be broader and more complex since it often involves the evaluation of one or more interventions, the environment in which they take place and the personnel administering them. The outcomes may be measured at the patient level but they may also be measured at other levels, such as the ward or the hospital,
or indeed at several of these levels simultaneously. Three nested levels here might be patient, ward and hospital, and these would all need to be taken into account in a MULTILEVEL MODEL. In a discussion aimed specifically at psychiatrists, but which is nevertheless generally applicable, Dunn (2001) draws attention to some of the problems inherent in health service trials. One of these is the HAWTHORNE EFFECT, in which there is a nonspecific or PLACEBO effect that is not directly associated with the specific content of the intervention but is rather due to the mere fact of participation in the study. Dunn points out that, in health service research, providers as well as patients may be subject to such an effect. Avoiding it is often more difficult in healthcare trials compared to clinical trials because blinding of participants may be impractical or unethical. The definition of outcome is often quite problematic in health services research since interventions may potentially produce multiple and conflicting changes in several
dimensions. Important statistical issues in this area are thus dealing with multiple significance testing and combining outcomes into summary statistics. Economic analysis, aimed at balancing the effectiveness of outcomes against the cost of providing interventions, is commonly performed in health services research (see COST-EFFECTIVENESS ANALYSIS). One issue that has to be addressed in this context is whether service use information, such as number of hospital admissions, should be regarded as an outcome in its own right or whether it should be considered purely on the cost side of the equation. The views of clinicians and health economists may differ on this point. Often outcomes are concerned with such concepts as patient satisfaction or QUALITY OF LIFE (Fayers and Machin, 2007), which may be difficult to define and capture. Even once they have been defined conceptually, outcomes are not always straightforward to measure and often involve the use of QUESTIONNAIRES. The latter may be prone to test-retest imprecision, due to a subject's inconsistency or, indeed, genuine changes from one time point to the next, or disagreement between raters (in cases where the questionnaires are administered and interpreted by someone other than the subject). Methods for assessing MEASUREMENT ERROR are thus important in health services research. The analysis of the psychometric properties of instruments, such as their reliability and validity (see Streiner and Norman, 2008), may be necessary where instruments have been developed especially for a study. The treatment of MISSING DATA, and data quality in general, is also a relatively common issue arising in health services research. This is because there is generally less control over data collection in the community or a hospital, as opposed to an experimental laboratory or dedicated clinic. The standard CONSORT STATEMENT may need to be adapted (see Boutron et al., 2008) for nonclinical outcomes.
Sometimes the focus in health services research is on aggregate data from high-level units such as hospitals or health authorities. For example, methods for comparing the performance of health providers in league tables may be required. Goldstein and Spiegelhalter (1996) discuss some of the issues arising from the comparison of institutional performance. Methods for analysing spatial statistics are used when the geographical location of the units is also important and such methods may be integrated with a geographical information system (GIS): this is a specialised form of database that holds complex geographical data so as to allow it to be visualised. Such methods may be aimed at identifying outlying disease clusters, examining the impact of area-wide interventions or measuring health inequalities and relating them to other area-wide data such as social deprivation. ML
[See also ECOLOGICAL STUDIES]
Boutron, I., Moher, D., Altman, D., Schulz, K. and Ravaud, P. 2008: Extending the CONSORT Statement to randomized trials of non-pharmacologic treatment: explanation and elaboration. Annals of Internal Medicine 148, 295-309. Bowling, A. 2002: Research methods in health, 2nd edition. Buckingham: Open University Press. Craig, P., Dieppe, P., Macintyre, S., Michie, S., Nazareth, I. and Petticrew, M. 2008: Developing and evaluating complex interventions: the new Medical Research Council guidance. British Medical Journal 337, a1655. Dunn, G. 2001: Statistical methods for measuring outcomes. In Thornicroft, G. and Tansella, M. (eds), Mental health outcome measures, 2nd edition. New York: Springer Verlag, pp. 5-18. Fayers, P. M. and Machin, D. 2007: Quality of life: the assessment, analysis and interpretation of patient-reported outcomes, 2nd edition. New York: Wiley & Sons, Inc. Goldstein, H. and Spiegelhalter, D. J. 1996: League tables and their limitations: statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society, Series A 159, 385-443. Streiner, D. L. and Norman, G. R. 2008: Health measurement scales: a practical guide to their development and use, 4th edition. Oxford: Oxford University Press.
heritability In the broad sense, heritability is the proportion of the variance of a given trait that is explained by genetic differences in a population. In the narrow sense, genetic differences are restricted to those due to the additive effects of alleles. Heritability is a key concept in population genetics introduced by Sir R. A. Fisher, in close connection with his work on the ANALYSIS OF VARIANCE. Nonadditive genetic influences, which include interactions between alleles at the same locus (dominance) or at different loci (epistasis), are included in broad but not narrow heritability. In humans, heritability is usually estimated by twin or adoption studies (see TWIN ANALYSIS). The classical twin design relies on the fact that monozygotic (MZ) twins are developed from the same fertilised egg and are therefore genetically identical, whereas dizygotic (DZ) twins are like ordinary brothers and sisters in being developed from two separate fertilised ova and therefore share on average 50 % of their genes. Given this fact, and under some additional assumptions (including the equality of the environmental similarity between MZ and DZ twins and the absence of dominance and epistasis), a simple estimate of heritability is given by twice the difference between the intraclass MZ and DZ correlations for the trait. This simple method of estimation for the heritability is known as Falconer's formula. Adoption studies work under the assumption that any correlation between an adoptee and his or her biological family is entirely genetic in origin. In the absence of epistasis, twice the correlation between adoptee and biological parent provides an estimate of narrow heritability. Similarly, the intraclass correlation for MZ twins reared apart provides an estimate for the broad heritability.
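Falconer's formula and the adoption-study estimate described above reduce to simple arithmetic on correlations. The sketch below only illustrates that arithmetic: the function names and the correlations in the usage lines are hypothetical, the correlations are assumed to have been estimated already, and the assumptions listed in the entry (equal environmental similarity for MZ and DZ twins, no dominance or epistasis) are taken for granted.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's formula: twice the difference between the
    intraclass MZ and DZ twin correlations for the trait."""
    return 2 * (r_mz - r_dz)


def adoption_narrow_heritability(r_adoptee_parent):
    """Adoption-study estimate of narrow heritability: twice the
    correlation between adoptee and biological parent."""
    return 2 * r_adoptee_parent


if __name__ == "__main__":
    # Hypothetical correlations, for illustration only
    print(falconer_heritability(0.8, 0.5))
    print(adoption_narrow_heritability(0.2))
```

Because the estimate is a difference of two correlations scaled by two, sampling error in either correlation is magnified, and estimates outside the range 0 to 1 can occur in small samples.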
A high heritability is sometimes misinterpreted as meaning that the trait is unlikely to respond to environmental changes. Heritability reflects the genetic and environmental differences that exist in a particular population; it cannot be used to predict the consequences of environmental changes outside the normal range for the population. A familiar example is that the mental retardation that is invariably associated with the genetic condition phenylketonuria in a natural population can be prevented by the introduction of a low-phenylalanine diet in early infancy. PS [See also GENETIC EPIDEMIOLOGY, GENETIC LINKAGE, QUANTITATIVE TRAIT LOCI]
hierarchical models See LOG-LINEAR MODELS
high-dimensional data This is a term used for datasets that are characterised by a very large number of variables and a much more modest number of observations. In the 21st century such datasets are collected in many areas, e.g. text/web data mining (see DATA MINING IN MEDICINE) and BIOINFORMATICS. The task of extracting meaningful statistical and biological information from such datasets presents many challenges for which a number of recent methodological developments may be helpful: for details see, for example, Francois (2008). BSE
Francois, D. 2008: High-dimensional data analysis: from optimal metrics to feature selection. VDM Verlag.
histogram This is a graphical representation of a frequency distribution in which each class interval is represented by a vertical bar whose base is the class interval and whose height is the number of observations in the class interval. When the class intervals are unequally spaced the histogram is drawn in such a way that the area of each bar is proportional to the frequency for that class interval. Scott (1979) considers how to choose the optimal number and width of classes in a histogram, since both are matters of choice. Two examples of histograms are shown in the figure. The histogram is generally used for two purposes, counting and displaying the distribution of a variable, although it is relatively ineffective for both, with stem-and-leaf plots being better for counting and boxplots better for assessing distributional properties. BSE
Scott, D. W. 1979: On optimal and data-based histograms. Biometrika 66, 605-10.
histogram (a) Heights (cm) of elderly women; (b) survival times (days) of patients with leukaemia [figure not reproduced: horizontal axes are Height (cm) in panel (a) and Survival time (days) in panel (b)]
historical controls This refers to the use of past data for the purpose of making comparisons with present data in a research context. Unfortunately, despite the appeal of desiring to make efficient use of previously collected resources, with information stored perhaps on a computer database, the use of historical controls is fraught with BIAS (Pocock, 1983). One cannot make reliable inferences in controlled CLINICAL TRIALS by comparing new data with old. The main reason why bias would be introduced is the lack of comparability at baseline between the two groups. Only concurrent RANDOMISATION of eligible participants can bestow such between-group comparability, since randomising alone can seek to ensure treatment groups are balanced with respect to all the known and (innumerable) unknown risk factors. CRP
Pocock, S. J. 1983: Clinical trials: a practical approach. Chichester: John Wiley & Sons, Ltd.
historical demography See DEMOGRAPHY
history of medical statistics
The first attempts at 'medical statistics' might perhaps be considered the early efforts to keep track of births and deaths through church records of weddings, christenings and burials. However, more ambitious statistical procedures than simple counting would have been largely unwelcome to physicians until well into the 17th century simply because they might have raised the unthinkable spectre of questioning the invulnerability most of them still claimed. Medical practices at the time were largely based on uncritical reliance on past experience, post hoc, ergo propter hoc reasoning, and veneration of the 'truth' as proclaimed by authoritative figures such as Galen (130-200), a Greek physician whose influence dominated medicine for many centuries. Such attitudes largely stifled any interest in experimentation or proper scientific investigation or explanation of medical phenomena. Even the few clinicians who did strive to increase their knowledge by close observation or simple experiment often interpreted their findings in the light of the currently accepted dogma. Several authors have pointed out what must qualify as the world's earliest recorded comparative trial. It is described in the biblical book of Daniel, circa 600 BC: Daniel and three colleagues expressed their preference not to be given food that was contrary to their beliefs. Their study involved a prior hypothesis and primary ENDPOINT, albeit rather subjective (facial appearance), and the trial duration was limited to just 10 days. The control group, which received the standard fare, was of unknown size but, clearly, the treatment group, which received vegetables and water only, was small, at just four. The study turned out positively for Daniel. Despite modern-day criticism, notably lack of RANDOMISATION, no one could criticise Daniel for his influential choice of publication (see Daniel 1: 1-16, Holy Bible).
By the late 17th and early 18th centuries, medicine began its slow progress from a sort of mystical certainty to a scientifically more acceptable uncertainty about many of its procedures. The taking of systematic observations and carrying out of experiments became more widespread. John Graunt (1620-1674), son of a London draper, for example, published his Natural and political observations
made upon the bills of mortality in 1662 and derived the first ever life table. Graunt was what might today be termed a vital statistician: he examined the risk inherent in the process of birth, marriage and death and used bills of mortality (weekly reports on the numbers and causes of death in an area) to compare one disease with another and one year with another by calculating mortality statistics. Graunt's work and ideas had considerable influence and bills of mortality were also introduced in Paris and other cities in Europe. Early experimental work in medicine is illustrated by the example that is often quoted of James Lind's (1716-1794) study undertaken on board the ship the Salisbury in 1747. Lind assessed several different possible treatments for scurvy by giving each to a different pair of sailors with the disease. He observed that the two men given oranges and lemons made the most dramatic recovery, although it was to be another 40 years before the Admiralty was convinced enough by Lind's finding to issue lemon juice to members of the British Navy. The 1700s also saw the first appearance of a procedure that looks remarkably similar to a modern-day SIGNIFICANCE TEST, specifically a SIGN TEST. This arose from John Arbuthnot's (1667-1735) endeavours to argue the case for Divine Providence in the stability of the ratio of number of men to women. Arbuthnot maintained that the guiding hand of a divine being was to be discerned in the nearly constant ratio of male to female christenings recorded annually in London over the years 1629-1710. The data presented by Arbuthnot (1710) showed that in each of the 82 years in this period, the annual number of male christenings had been consistently higher than the number of female christenings, but never very much higher. He then essentially tested a null hypothesis of 'chance' determination of sex at birth, against an alternative of Divine Providence, by calculating,
under the assumption that the null hypothesis is true, a PROBABILITY defined by reference to the observed data. Arbuthnot's representation of chance in this context was the toss of a fair two-sided coin, in which case the distribution of births would be $(1/2 + 1/2)^{82}$, so that the observed excess of male christenings on each of 82
occasions had an extremely small probability, thus providing support for the Divine Providence hypothesis. Arbuthnot offered an explanation for the greater supply of males as a wise economy of nature, as the males are more subject to accidents and diseases, having to seek their food with danger. Therefore, provident nature to repair the loss brings forth more males. The near equality of the sexes is designed so that every male may have a female in the same country and of suitable age. Other mathematical developments in the 18th century that were of special relevance for medical statistics included
Daniel Bernoulli's (1700-1782) development of the normal approximation to the BINOMIAL DISTRIBUTION, which was also used in studies of the stability of the sex ratio at birth. The role of medical statistics in pursuing reform is illustrated by the work of Florence Nightingale (1820-1910). In her efforts to improve the squalid hospital conditions in Turkey during the Crimean War, and in her subsequent campaigns to improve the health and living conditions of the British Army, the sanitary conditions and administration of hospitals and the nursing profession, Florence Nightingale was not unlike many other Victorian reformers. However, in one important respect she was very different, since she marshalled massive amounts of data, carefully arranged, tabulated and graphed, and presented this material to ministers, viceroys and others, to convince them of the justice of her case. No other major national cause had previously been championed through the presentation of sound statistical data and those who opposed Florence Nightingale's reforms went down to defeat because her data were unanswerable: their publication led to an outcry. Another telling example of how careful arrangement of data was used in the 19th century to save lives is provided by the work of the epidemiologist John Snow (1813-1858). After an outbreak of cholera in central London in September 1854, Snow used data collected by the General Register Office and plotted the location of deaths on a map of the area and also showed the location of the area's 11 water pumps. The resulting map is shown in the figure. Examining the scatter over the surface of the map, Snow observed that nearly all the cholera deaths were among those who lived near the Broad Street pump. However, before claiming that he had discovered a possible causal connection, Snow made a more detailed investigation of the deaths that had occurred near some other pumps.
He visited the families of 10 of the deceased and found that four of these, because they preferred its taste, regularly sent for water from the Broad Street pump. Three others were children who attended a school near the Broad Street pump. One other finding that initially confused Snow was that there were no deaths among workers in a brewery close to the Broad Street pump, a confusion that was quickly resolved when it became apparent that the workers drank only beer, never water. Snow's findings were sufficiently compelling to persuade the authorities to remove the handle of the Broad Street pump and, in days, the neighbourhood epidemic that had claimed more than 500 lives had ended. Later in the 19th century and in the early 20th century, the work of people such as Sir Francis Galton (1822-1911), Wilhelm Lexis (1837-1914) and, in particular, Karl Pearson (1857-1936) began to change the emphasis in statistics from the descriptive to the mathematical. The concept of CORRELATION and its measurement by a correlation coefficient was introduced. Statistical inference began to develop and enter
history of medical statistics Snow's map of cholera deaths in the Broad Street area
areas of scientific investigation, including medical research. In 1909 Ronald Aylmer Fisher (later Sir Ronald) (1890-1962) entered Cambridge to study mathematics, the first step to becoming the most influential statistician of the 20th century. Fisher developed MAXIMUM LIKELIHOOD ESTIMATION, worked on evolutionary theory, made massive contributions to genetics and invented the ANALYSIS OF VARIANCE. However, Fisher's most important contribution to medical statistics was his introduction of randomisation as a principle in the design of certain experiments. In Fisher's case the experiments were in agriculture and were concerned with which fertilisers led to the greatest crop yields. Fisher divided agricultural areas into plots and randomly assigned the plots to different experimental fertilisers. The principle was soon adopted in medicine in studies to compare competing therapies for a particular condition, leading, of course, to the randomised CLINICAL TRIAL (RCT), described by eminent British statistician Sir David Cox as 'the most important
contribution of 20th-century statistics'. The first properly performed randomised clinical trial is now generally acknowledged to be that published in 1948 by another giant of 20th century medical statistics, Sir Austin Bradford Hill (1897-1991), who investigated the use of streptomycin in the treatment of pulmonary tuberculosis. Nowadays, it is estimated that over 8000 RCTs are undertaken worldwide every year. At about the time that Bradford Hill was busy with the first randomised clinical trial, another development was taking place, which, by revolutionising man's ability to calculate, was to have a dramatic effect on the science of statistics and the work of statisticians. The computer age was about to begin, although it would be some years before statisticians were entirely relieved of the burden of undertaking large amounts of laborious arithmetic on some pre-computer calculator. However, in the 1960s, the first statistical software packages began to appear, which made the application of many complex statistical procedures easy and routine. The influence of increasing, inexpensive computing power on statistics continues to this day and over the last 20 years its almost universal availability has meant that research workers in statistics in general, and medical statistics in particular, no longer have to keep one eye on the computational difficulties when developing new methods of analysis. The result has been the introduction of many exciting and powerful new statistical methods, many of which are of great importance in medical statistics. Notable examples, to name but a few, are BOOTSTRAP, COX'S REGRESSION, GENERALISED ESTIMATING EQUATIONS, LOGISTIC REGRESSION and MULTIPLE IMPUTATION. In addition, BAYESIAN METHODS, at one time little more than
an intellectual curiosity without practical implications because of their associated computational requirements, can now be applied relatively routinely. Many interesting examples are described in Congdon (2001). There seems little doubt that the remarkable success of medical statistics will continue into the 21st century. BSE [See also DEMOGRAPHY, EPIDEMIOLOGY]
Arbuthnot, J. 1710: An argument for Divine Providence, taken from the constant regularity observ'd in the births of both sexes. Philosophical Transactions of the Royal Society 27, 186-90. Congdon, P. 2001: Bayesian statistical modelling. Chichester: John Wiley & Sons, Ltd.
hotspot clustering See DISEASE CLUSTERING
Huber-White estimate See CLUSTER RANDOMISED TRIALS
human research ethics board (HREB) See ETHICAL REVIEW COMMITTEES
hypothesis tests The testing of hypotheses is fundamental to statistics and arguments about appropriate ways to test hypotheses date back to disputes between the founders of statistical inference, during the first half of the 20th century. R. A. Fisher proposed SIGNIFICANCE TESTS as a means of examining the discrepancy between the data and a null hypothesis (e.g. the null hypothesis that there is no association between two variables). The P-VALUE (significance level) is the PROBABILITY that an association as large or larger than that observed in the data would occur if the null hypothesis were true. In Fisher's approach the null hypothesis is never proved or established, but is possibly disproved. Fisher advocated P = 0.05 (5 % significance) as a standard level for concluding that there is evidence against the hypothesis tested, although not as an absolute rule:
If P is between .1 and .9 there is certainly no reason to suspect the hypothesis tested. If it is below .02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at .05 (Fisher, 1950).
In fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas (Fisher, 1973).
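The definition of the P-value given in this entry can be made concrete with a small calculation. The sketch below is an illustration only and makes its own assumptions: it uses a one-sided exact binomial (sign) test with a null probability of one half, and the function name and the counts in the usage lines are hypothetical. It sums the probabilities, under the null hypothesis, of every outcome at least as extreme as the one observed.

```python
from math import comb


def sign_test_p_value(n_positive, n_total):
    """One-sided exact P-value: P(X >= n_positive) when
    X ~ Binomial(n_total, 1/2) under the null hypothesis."""
    # Number of equally likely outcomes at least as extreme as observed
    tail = sum(comb(n_total, k) for k in range(n_positive, n_total + 1))
    # Each of the 2**n_total outcomes has the same null probability
    return tail / 2 ** n_total


if __name__ == "__main__":
    print(sign_test_p_value(9, 10))   # 9 positive signs out of 10
    print(sign_test_p_value(82, 82))  # every one of 82 signs positive
```

For 9 positive signs out of 10 the P-value is 11/1024, which falls below Fisher's conventional line at .05 but not below .01; whether that is convincing is, in Fisher's view, a judgement for the experimenter.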
For Fisher, interpretation of the P-value was ultimately for the experimenter: e.g. a P-value of around 0.05 might lead neither to belief nor disbelief in the null hypothesis, but to a decision to perform another experiment. To some extent, use of thresholds for significance resulted from the reduction in the size of statistical tables when only the quantiles of distributions (such as 0.1, 0.05 and 0.01) were tabulated. Dislike of the subjective interpretation inherent in Fisher's approach led Neyman and Pearson (1933) to propose what they called hypothesis tests, which were designed to provide an objective, decision-theoretic approach to the results of experiments. Instead of focusing on evidence against a null hypothesis, Neyman and Pearson considered how to decide between two competing hypotheses, the null hypothesis and a specified alternative hypothesis. For example, the null hypothesis might state that the difference between the means of two normally distributed variables is zero, while the alternative hypothesis might state that this difference is 10. Based on this paradigm, Neyman and Pearson argued that there are two types of error that could be made in interpreting the results of an experiment (see ERRORS IN HYPOTHESIS TESTS). We make a TYPE I ERROR if we reject the null hypothesis when it is, in fact, true, while we make a TYPE II ERROR if we accept the null hypothesis when it is, in fact, false. Neyman and Pearson then showed how to find optimal rules that would, in the long run, minimise the probabilities (the Type I and Type II error
rates) of making these errors over a series of many experiments. The Type I error rate, usually denoted as $\alpha$, is closely related to the P-value since if, for example, the Type I error rate is fixed at 5%, then we will reject the null hypothesis when P < 0.05. The Type II error rate is usually denoted as $\beta$ and the power of the test (the probability that we do not make a Type II error if the alternative hypothesis is true) is $1 - \beta$. Based on these ideas, Neyman and Pearson were able to derive tests that were 'best' in the sense that they minimised the Type II error rate, given a particular Type I error rate. It is important to realise that in this paradigm we do not attempt to infer whether the null hypothesis is true:
No test based upon a theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis. But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behaviour with regard to them, in following which we insure that, in the long run of experience, we shall not often be wrong (Neyman and Pearson, 1933).

To illustrate the differences between the two approaches, consider the hypothetical controlled trial of a new cholesterol-lowering drug, with results (mean post-treatment cholesterol) summarised in the table.
hypothesis tests Results of a hypothetical controlled trial of a new cholesterol-lowering drug

Group      Number of participants   Mean cholesterol (mg/dl)   Standard deviation
New drug   15                       205                        25
Placebo    15                       220                        25
Mean cholesterol has been reduced by 15 mg/dl; a reduction of this magnitude might lead to a substantial reduction in the risk of heart disease. An unpaired t-test gives P = 0.11. Based on Fisher's approach, the null hypothesis has not been disproved. However, a thoughtful investigator might, rather than discarding the drug, proceed to conduct a larger trial. Application of the Neyman-Pearson approach requires the specification of both Type I and Type II error rates in advance, so we must specify a precise alternative hypothesis, e.g. that the mean reduction is 10 mg/dl. An investigator attempting to follow the Neyman-Pearson approach would need to report not only that the test was not significant at the 5% level (Type I error rate 5%) but also the prespecified Type II error rate. However, the power of a study with 15 patients per group to detect a difference of 10 mg/dl is only 19.5%. For a study that is too small, such as this one, there is no choice of Type I and Type II error rates that is satisfactory. Had we done a POWER calculation on the basis that we wished to detect a difference of 10 mg/dl with 80% power at 5% significance, we would have found that we require a much larger study, with 99 patients in each group. The use of power calculations to ensure that studies are large enough to detect associations of interest is an enduring legacy of Neyman and Pearson's work.
Now that most statistical computer packages report precise P-values, there seems little justification in reporting the results of our drug trial as P > 0.05, P > 0.1 or 'NS' (nonsignificant) unless one is following a pre-specified choice of both Type I and Type II error rates. This is rarely the case: even in randomised trials we will usually investigate a number of hypotheses beyond the primary one for which the trial was designed. Therefore, in modern medical statistics, it is usual to report the precise P-value, together with the estimated difference and the CONFIDENCE INTERVAL for the difference. For example, for our hypothetical trial we could report that the MEAN reduction in cholesterol was 15 mg/dl (95% CI -3.7 mg/dl to 33.7 mg/dl, P = 0.11). When we examine the confidence interval we see that the results are consistent either with a substantial and clinically important reduction in mean cholesterol or with a modest increase. Examining the confidence interval should help us avoid the common error of equating 'nonsignificance' with acceptance of the null hypothesis that the drug has no effect, regardless of the power of the study to detect differences of interest.
A number of books and articles discuss in more detail the testing of hypotheses, the arguments between the Fisher and Neyman-Pearson schools of inference and the case for Bayesian reasoning as an alternative (e.g. Cox, 1982; Oakes, 1986; Lehmann, 1993; Goodman, 1999a, 1999b; Sterne and Davey Smith, 2001). JS
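The power and sample size figures quoted for the hypothetical cholesterol trial can be approximated with a few lines of code. The sketch below is illustrative only: it assumes a normal approximation to the two-sample comparison of means (an assumption of this example, not necessarily the method used for the figures in the text), and the function names are hypothetical. With a difference of 10 mg/dl and a standard deviation of 25 mg/dl it should give a power of a little under 20% for 15 patients per group and a requirement of 99 patients per group for 80% power.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_sided(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two means.

    Uses a normal approximation with a common, known standard deviation.
    """
    se = sd * sqrt(2.0 / n_per_group)               # SE of the difference in means
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 when alpha = 0.05
    shift = delta / se
    # Probability of falling in either rejection region under the alternative
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def sample_size_per_group(delta, sd, power=0.80, alpha=0.05):
    """Approximate group size needed to reach the requested power."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2.0 * (sd * (z_alpha + z_beta) / delta) ** 2)


if __name__ == "__main__":
    print(f"Power with 15 per group: {power_two_sided(10, 25, 15):.3f}")
    print(f"Group size for 80% power: {sample_size_per_group(10, 25)}")
```

A calculation based on the noncentral t-distribution would give a slightly lower power for so small a study, so the normal approximation should be read as a rough guide.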
Cox, D. R. 1982: Statistical significance tests. British Journal of Clinical Pharmacology 14, 325-31. Fisher, R. A. 1950: Statistical methods for research workers. London: Oliver and Boyd. Fisher, R. A. 1973: Statistical methods and scientific inference. London: Collins Macmillan. Goodman, S. N. 1999a: Toward evidence-based medical statistics. 1: the P-value fallacy. Annals of Internal Medicine 130, 995-1004. Goodman, S. N. 1999b: Toward evidence-based medical statistics. 2: the Bayes factor. Annals of Internal Medicine 130, 1005-13. Lehmann, E. L. 1993: The Fisher, Neyman-Pearson theories of testing hypotheses: one theory or two? Journal of the American Statistical Association 88, 1242-9. Neyman, J. and Pearson, E. 1933: On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society, Series A 231, 289-337. Oakes, M. 1986: Statistical inference. Chichester: John Wiley & Sons, Ltd. Sterne, J. A. and Davey Smith, G. 2001: Sifting the evidence - what's wrong with significance tests? British Medical Journal 322, 226-31.
I
ICC Abbreviation for INTRACLUSTER CORRELATION COEFFICIENT
ICER Abbreviation for INCREMENTAL COST-EFFECTIVENESS RATIO. See COST-EFFECTIVENESS ANALYSIS
Immune proportion This proportion indicates individuals who may not be subject to death, failure, relapse, etc., in a sample of censored survival times. The presence of such individuals may be indicated by a relatively high number of individuals with large censored survival times. Finite mixture distributions can be used to investigate such data. Specifically, the population is assumed to consist of two components. The first, which is present in proportion p, say, contains those individuals who are susceptible to some event of interest (death, relapse, etc.) and have, say, an exponential distribution for the time to the occurrence of the event. These individuals are subject to right censoring. The remaining proportion, 1 - p, of the population is assumed to be immune to, or cured of, the disease and for these individuals the event never happens. Consequently, observations on their survival times are always censored at the limit of follow-up. An important aspect of such analysis is to consider whether or not an immune proportion does in fact exist in the population (see, for example, Maller and Zhou, 1995). BSE [See also CURE MODELS]
Maller, R. A. and Zhou, S. 1995: Testing for the presence of immune or cured individuals in censored survival data. Biometrics 51, 1197-205.
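The two-component mixture described in this entry implies a population survival curve that levels off at the immune proportion instead of falling to zero. The sketch below illustrates this under the entry's own example of an exponential time to event among susceptibles; the function name and the numerical values of p and the hazard rate are hypothetical and chosen only for illustration.

```python
from math import exp


def population_survival(t, p, rate):
    """Survival function of a two-component mixture with an immune fraction.

    t    : time since entry
    p    : proportion of the population susceptible to the event
    rate : hazard of the exponential time to event among susceptibles
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    susceptible_survival = exp(-rate * t)  # exponential survival for susceptibles
    immune_survival = 1.0                  # immune individuals never have the event
    return p * susceptible_survival + (1.0 - p) * immune_survival


if __name__ == "__main__":
    # With p = 0.7 the curve should flatten out near 1 - p = 0.3.
    for t in (0, 1, 5, 20, 100):
        print(t, round(population_survival(t, p=0.7, rate=0.5), 4))
```

A plateau of this kind in an estimated survival curve, supported by many large censored times, is the feature that suggests an immune proportion may be present.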
Imputation See MULTIPLE IMPUTATION
Incidence The incidence of a disease is the number of new cases of the disease occurring within a specified period of time in a defined population. A time period of 1 year is most commonly used, but any appropriate length of time can be substituted. It is generally presented as a rate. Thus:

\[ \text{Incidence rate} = \frac{\text{Number of new cases of the disease in one year}}{\text{Number in the population at risk}} \]
This assumes that the size of the study population remains constant over the time period for which the rate is calculated. Small increases or decreases in population size over a year, for example, can be dealt with by using the mid-year population as the denominator for the incidence rate. This results in a number between 0 and 1, but for ease of presentation it is often expressed as a rate per 1000, per 100 000 or per 1 000 000 depending on the disease rarity. As an example, the incidence rate of colorectal cancer in males aged 60-64 in Scotland was 159 per 100 000 in the year 2006 compared to 206 per 100 000 in the year 2000 (NHS National Services Scotland: Information Services Division, www.isdscotland.org). Thus incidence rates can be used to measure risk and compare risks across time or between different populations.
This definition is rather simplistic because it ignores the fact that when new cases of the disease occur, the subject is no longer at risk and should ideally be removed from the denominator. It is also unsatisfactory for dealing with data from LONGITUDINAL STUDIES in which subjects may be followed up for varying lengths of time. For these studies the incidence rate can be defined as:

\[ \text{Incidence rate} = \frac{\text{Number of new cases of the disease in the defined population}}{\text{Total length of time for which individuals have been followed up}} \]
The denominator gives the number of person-years of observation. Incidence rates defined in this way are often expressed as rates per 100 or per 1000 person-years of observation. (A more detailed discussion of incidence and incidence rates is given in Rothman, Greenland and Lash, 2008.) Care should be taken to distinguish between incidence and PREVALENCE. Although the definitions appear similar at first sight, they are used for different purposes and it is essential to distinguish between them correctly. Further details can be found in Woodward (2004). WHG
Rothman, K. J., Greenland, S. and Lash, T. L. 2008: Modern epidemiology, 3rd edition. Philadelphia: Lippincott Williams and Wilkins. Woodward, M. 2004: Epidemiology: study design and data analysis, 2nd edition. Boca Raton: Chapman & Hall.
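The two definitions of the incidence rate given in this entry differ only in their denominators, which a short calculation may make concrete. The sketch below is illustrative: the function names are hypothetical, and the small cohort in the usage example is invented purely to show the person-years arithmetic.

```python
def incidence_rate(new_cases, population_at_risk, per=100_000):
    """New cases in the period divided by the population at risk, scaled by `per`."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return per * new_cases / population_at_risk


def person_time_incidence_rate(follow_up, per=1_000):
    """Incidence rate per `per` person-years of observation.

    follow_up : sequence of (years_observed, became_case) pairs, one per subject;
                follow-up stops when a subject becomes a case or is lost.
    """
    follow_up = list(follow_up)
    cases = sum(1 for _, became_case in follow_up if became_case)
    person_years = sum(years for years, _ in follow_up)
    if person_years <= 0:
        raise ValueError("total follow-up time must be positive")
    return per * cases / person_years


if __name__ == "__main__":
    # 159 new cases in a population of 100 000 over one year
    print(incidence_rate(159, 100_000))
    # Invented cohort: two cases observed over 10 person-years in total
    cohort = [(2.0, False), (0.5, True), (3.0, False), (1.5, True), (3.0, False)]
    print(person_time_incidence_rate(cohort))
```

In the second function, subjects who become cases contribute follow-up time only until the event, which is how the person-years definition removes them from the population at risk.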
Inclusion and exclusion criteria These criteria operationalise the choice of study group, a choice that lies at the heart of the design of, and inference from, CLINICAL TRIALS. 'Inclusion' criteria define the population of interest; 'exclusion' criteria remove people for whom the study treatment is contraindicated or unlikely to be effective. Collectively, inclusion criteria and exclusion criteria comprise the entry criteria or eligibility criteria. Biological plausibility, the internal validity of the study, the epidemiological basis for generalisability and statistical power all play parts in selecting entry criteria and in making recommendations from the results of the trial. The selection of those to be enrolled in a trial often reflects a deliberate attempt to
select a study population homogeneous enough to allow a true treatment effect to become manifest, yet heterogeneous enough to permit reliable generalisation to a broader population. Clinical trials necessarily study people with more homogeneous characteristics than the patients to whom clinicians will apply the results. Strict representativeness is relevant to the generalisability of clinical trials but is not essential to inference from them. In randomised studies, the logical basis for drawing conclusions lies in the act of RANDOMISATION. The process of concluding that the effect seen in a clinical trial will apply to another population is informal and subjective (Cowan and Wittes, 1994).
Homogeneity of the study population differs from homogeneity of the treatment effect. The former refers to a study group's sharing similar characteristics; the latter refers to an effect of treatment whose expected magnitude and direction would lead to the same recommendation for use or nonuse in identifiable subgroups. If a therapy affects a wide group of people quite similarly, then either a homogeneous or heterogeneous study group will provide similar answers regarding the magnitude of treatment effect. An ideal study group would consist of a cohort for whom the treatment is effective and corresponding to whom is an identifiable population that will be treated. Defining such a study group before the trial is usually difficult. Available data are rarely sufficiently reliable to provide serious guidance about whom to include.
Early-phase studies typically define narrow entry criteria to establish preliminary safety or to demonstrate proof of concept (see PHASE I TRIALS, PHASE II TRIALS). Such trials often exclude children, pregnant and nursing women, the frail elderly and other vulnerable populations. Later phase trials with narrow entry criteria specify the type of patient likely to benefit most and then test whether the treatment works for them (see PHASE III TRIALS, PHASE IV TRIALS). A study showing benefit in this narrow group of participants may lead to future trials with wider entry criteria. A treatment with important heterogeneity of effect requires a homogeneous study population.
Trials with wide entry criteria address whether the treatment under study works on average when applied to potential users. Wide entry criteria simplify screening and recruitment. Enrolling a wide range of people is consistent with assuming homogeneity of effect while affording the investigator a tentative glimpse at the likelihood of the truth of that assumption. Biological plausibility should play a decisive role in selecting the range of people to enrol in a trial. Study entry criteria should aim to achieve heterogeneity when no convincing information at the start of the study suggests that sizeable differential effects are likely. As a heterogeneous study group leads to variation in the incidence of ENDPOINTS, increasing heterogeneity generally requires an increased sample size.
Defining entry criteria requires an operational definition of the disease in a treatment trial or a specification of who is at risk in a prevention trial. Allowing people with questionable diagnoses to enter a trial tends to attenuate the estimated treatment effect and hence decreases statistical power. Yet often the insistence on unequivocal documentation of diagnosis excludes many people who in fact would receive the treatment if the trial shows benefit (Yusuf, Held and Teo, 1994).
Trials must exclude people known to have contraindications to the treatments under study or those who are particularly vulnerable. Similarly, trials of therapies already known to be effective or ineffective in certain groups should exclude those groups of patients. Some randomised trials use an 'uncertainty principle' to guide entry (see MEGA-TRIAL). 'A patient can be entered if, and only if, the responsible clinician is substantially uncertain which of the trial treatments would be most appropriate for that particular patient' (Peto and Baigent, 1998). Typical PROTOCOLS FOR CLINICAL TRIALS exclude people unlikely to finish a study or to adhere to the protocol.
Many clinical trials have very few participants with some specific characteristics. A trial may exclude racial or ethnic groups, not because the entry criteria preclude their participation but because the clinics involved in the study do not have access to them.
In summary, trial designers should ensure that each entry criterion represents a defensible limitation on the study group; however, the fact of inclusion does not usually provide much information about the effect of treatment in specific groups of people. The argument that only by including, say, women and minorities, can one legitimately apply the results of a trial needs to be tempered with the fact that a trial rarely gives enough information about specific groups to learn much about the effect of treatment for them.
When the trial is over, the results should usually be applied quite broadly, both to people whose demographic characteristics are similar and dissimilar to those in the trial; however, the medical community should maintain an intellectual stance open to suggestive data indicating differences. The situation is more complicated for groups of people defined by such medical or physiologic variables as diagnosis, severity, prognostic features, prior history or concomitant medications, for often apparently cogent biological reasons justify exclusions. Here too a critical questioning of the reasons for exclusion is warranted: in many cases very few data are available to support even strongly held views. Designers of clinical trials should construct entry criteria bearing in mind the purpose of the current trial, the available knowledge of the study treatments being tested, the likely studies that will follow the trial and how investigators, practising clinicians, patients and regulatory agencies will interpret the results in light of the entry criteria. JW
Cowan, C. D. and Wittes, J. 1994: Intercept studies, clinical trials, and cluster experiments: to whom can we extrapolate? Controlled Clinical Trials 15, 24-9. Peto, R. and Baigent, C. 1998: Trials: the next 50 years. British Medical Journal 317, 1170-1. Yusuf, S., Held, P. and Teo, K. K. 1994: Selection of patients for randomised controlled trials: implications of wide or narrow eligibility criteria. Statistics in Medicine 9, 73-86.
Incomplete block designs See CROSSOVER TRIALS
Incremental cost-effectiveness ratio (ICER) See COST-EFFECTIVENESS ANALYSIS
Incubation period This is the time interval between the acquisition of infection and the appearance of symptomatic disease. Examples include the time between exposure to radiation or to a chemical carcinogen and the occurrence of cancer, and the time between infection with HIV and the onset of AIDS.
The length of the incubation period depends on the disease, ranging from days, for instance, in the case of malaria to a number of years for HIV. The incubation period typically varies from individual to individual and may depend on the dose of the disease-causing agent received. Given this variability, it makes sense to talk about the incubation period distribution. The incubation period distribution F(t) represents the probability that the length of the incubation period is less than or equal to t time units.
Estimation and characterisation of F(t) is important for a number of reasons. For diseases with short incubation periods, such as those seen in outbreaks, knowledge of the incubation period is essential to the investigation of the circumstances in which the disease has spread. In the case of diseases with long incubation periods, such as HIV or Creutzfeldt-Jakob disease, information on F(t) is a necessary input to the estimation and projection of the evolution of the epidemic (see BACK-CALCULATION). Finally, it is very important to identify covariates that might affect the length of the incubation period for an effective clinical management of the patient.
The ideal setup to estimate the incubation period distribution is a COHORT STUDY where individuals are uninfected at enrolment and are followed up to observe both the occurrence of infection and the appearance of symptomatic disease. The resulting observations will be right censored as every individual will have either developed the disease or been censored by the end of the follow-up period (see CENSORED OBSERVATIONS). Classical survival analysis can be used to estimate F(t) both nonparametrically, via KAPLAN-MEIER PLOTS, and parametrically, by fitting parametric models to the right-censored data.
Usually, especially for diseases with a long incubation time, such cohort studies are difficult to set up. Estimation of the incubation period distribution is then
carried out either using information on individuals who have already developed symptoms or following up cohorts of individuals who are already infected, but have not yet developed the disease. In either case, biased results can be obtained if estimation does not properly account for the sampling criteria by which individuals are included in the study. DDA
Brookmeyer, R. 1998: Incubation period of infectious diseases. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics, Vol. 3, pp. 2011-16. Chichester: John Wiley & Sons, Ltd. Brookmeyer, R. and Gail, M. H. 1994: AIDS epidemiology: a quantitative approach. New York: Oxford University Press.
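For the ideal cohort design described in this entry, the nonparametric estimate of F(t) is one minus the Kaplan-Meier survival estimate. The sketch below shows the product-limit calculation for right-censored incubation times; the function name and the seven observations in the usage example are hypothetical, invented only to illustrate the arithmetic.

```python
def kaplan_meier_cdf(times, events):
    """Kaplan-Meier estimate of F(t) = 1 - S(t) at each distinct event time.

    times  : observed incubation time or censoring time for each individual
    events : True if symptomatic disease was observed, False if censored
    Returns a list of (t, estimated F(t)) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    estimate = []
    i = 0
    while i < len(data):
        t = data[i][0]
        onsets = censored = 0
        # Count everything that happens at this time point
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                onsets += 1
            else:
                censored += 1
            i += 1
        if onsets:
            survival *= 1.0 - onsets / at_risk  # product-limit step
            estimate.append((t, 1.0 - survival))
        at_risk -= onsets + censored
    return estimate


if __name__ == "__main__":
    # Invented data: incubation times in years, False marks a censored time
    times = [2, 3, 3, 5, 8, 8, 12]
    events = [True, True, False, True, False, True, False]
    for t, f in kaplan_meier_cdf(times, events):
        print(t, round(f, 4))
```

This applies only when individuals are followed from infection; as the entry notes, samples of people who are already infected or already symptomatic need methods that account for how they were selected.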
Indirect standardisation See DEMOGRAPHY
Individual ethics See ETHICS AND CLINICAL TRIALS
Infant mortality rate See DEMOGRAPHY
Informative censoring/dropout See CENSORED OBSERVATIONS, DROPOUT, MISSING DATA
Informative dropout Synonym for NONIGNORABLE DROPOUT
Informed consent See ETHICS AND CLINICAL TRIALS
Instantaneous death rate See SURVIVAL ANALYSIS - AN OVERVIEW
Institutional review board (IRB) See ETHICAL REVIEW COMMITTEES
Instrumental variables A variable that is highly correlated with an explanatory variable but has no direct influence on the response variable (i.e. its effect is mediated by the explanatory variable). Consider a situation in which we can assume that a response variable, Y, is linearly related to an explanatory variable, X, as follows:

\[ Y = \alpha + \beta X + \varepsilon \qquad (1) \]

where $\varepsilon$ is a random deviation of a particular value of Y from that expected from its relationship with X. Typically, we wish to use a sample of (x, y) pairs of measurements in order to estimate the unknown values of the parameters, $\alpha$ and $\beta$. The familiar ORDINARY LEAST SQUARES (OLS) estimator of $\beta$ is equivalent to the ratio of the estimated COVARIANCE of X and Y to the estimated variance of X (this ratio is usually calculated by dividing the sum of the cross-products of the deviations of the X and Y values from their respective means by the sum of squares of the deviations of the X values). It is possible to demonstrate that such an estimate is unbiased for $\beta$ provided certain assumptions hold - the key one being that X and $\varepsilon$ are uncorrelated.
Now, if we have an omitted variable, C, correlated with X, and such that the true model is, in fact:

\[ Y = \alpha + \beta X + \gamma C + \eta \qquad (2) \]

where $\eta$ is the random deviation of Y from that explained by the model. If we still proceed with our naive OLS estimator as for equation (1) then we will obtain a biased estimate of $\beta$. This is a result of the fact that the correlation between X and $\varepsilon$ is no longer zero. This is an example of what econometricians call endogeneity (see Wooldridge, 2003). In epidemiology, the variable C is known as a confounder (in this case a hidden confounder). In such circumstances, how might we obtain a valid estimate of $\beta$? The obvious answer is to measure C and fit equation (2). Another approach (much more common in economics than in medical applications) is to find a variable that is strongly correlated with X, but uncorrelated with the residual, $\varepsilon$. Such a variable is called an instrumental variable (IV) or instrument, for short.
Now let us consider a different circumstance. Suppose that the values of X are measured subject to error such that:

\[ X = \tau + \delta \qquad (3) \]

in which the $\delta$ values are random measurement errors with zero mean and assumed to be uncorrelated both with each other and with the true values, $\tau$. The relationship we are really interested in is the following:

\[ Y = \alpha + \beta\tau + \varepsilon \qquad (4) \]

How do we estimate $\beta$? Again, using OLS in a regression of Y against X would produce a biased result (see ATTENUATION DUE TO MEASUREMENT ERROR). This is another example of the endogeneity problem. A similar situation holds when we attempt the comparative calibration of two measurement methods, both subject to measurement errors (see METHOD COMPARISON STUDIES). If we were in the fortuitous position of knowing the VARIANCE of the measurement errors in X, or the reliability of X, we would be able to make appropriate corrections. Another approach is again to find an instrumental variable - a variable that is strongly correlated with X but conditionally independent of Y given X.
Consider an instrumental variable, Z. The instrumental variable (IV) estimator of $\beta$ in equation (2) or (4) is:

\[ \hat{\beta}_{IV} = \frac{\sum (Z - \bar{Z})(Y - \bar{Y})}{\sum (Z - \bar{Z})(X - \bar{X})} \qquad (5) \]

which is equivalent to the ratio of the estimated covariance of Z and Y to the estimated covariance of Z and X. Typically this estimate is obtained through the use of a two-stage least squares (2SLS or TSLS) algorithm (see Wooldridge, 2003, for further details of the method, including the sampling distribution of the IV estimate). This algorithm is available in most large general-purpose software packages. Note that its validity is not dependent on any distributional assumptions concerning either Z or X. Both could be binary (yes/no) indicators, for example. For linear models, IV estimates can also be obtained with ease using structural equation modelling (see STRUCTURAL EQUATION MODELS and STRUCTURAL EQUATION MODELLING SOFTWARE).
As an example, an early medical application of instrumental variable methods was provided by Permutt and Hebel (1989). They describe a trial in which pregnant women were randomly allocated to receive encouragement to reduce or stop their cigarette smoking during pregnancy (the treatment group), or not (the control group) - indicated by the binary variable, Z. An intermediate outcome variable (X) was the amount of cigarette smoking recorded during pregnancy. The ultimate outcome (Y) was the birth weight of the newborn child. Readers will be familiar with evaluating the effect of RANDOMIZATION on the child's birth weight. However, what about the effect of smoking (X) on birth weight? Smoking is likely to have been reduced in the group subject to encouragement, but also in the control group (but, presumably, to a lesser extent). There are also likely to be hidden confounders (e.g. other health promoting behaviours) that are associated with both the mother's smoking during pregnancy and the child's birth weight. Smoking (X) is an endogenous treatment variable. The problem is solved by noting that randomization (Z) is an obvious candidate for the instrumental variable. If the intervention (i.e. encouragement to reduce smoking) works then randomization should be correlated with smoking during pregnancy. It is also reasonable to assume that the effect of randomization is completely mediated by its effect on smoking (that there is no direct effect of randomization on outcome, the birth weight of the child).
Randomization (Z), in fact, is increasingly being used as an instrumental variable in the estimation of the effect of treatment received (X) on outcome (Y) in randomized controlled trials subject to nonadherence or noncompliance with the allocated treatment (see ADJUSTMENT FOR NONCOMPLIANCE IN RANDOMIZED CONTROLLED TRIALS). The potential for the use of instrumental variables in epidemiological investigations is illustrated by Greenland (2000) (see also MENDELIAN RANDOMISATION). Health economic applications are reviewed by Newhouse and McClellan (1998).
What about MEASUREMENT ERROR problems? Well, first note that for the example provided by Permutt and Hebel (1989) the IV estimate of the effect of mother's smoking on her child's birth weight is not attenuated by the inevitable measurement error in the number of cigarettes smoked by the mother. The IV estimator effectively copes with the simultaneous problems of confounding and measurement error. What about the problem solely due to measurement error? Here an obvious choice for an instrument is a measurement of the characteristic measured as X using a different procedure. Smoking (X) could be measured by self-report (in
a diary, for example) and a suitable instrument (Z) might be a measurement of a biomarker of nicotine consumption (cotinine levels in the blood, for example). The key here is to be able to convince oneself of the conditional independence of Z (biomarker measurement) and outcome, Y (health status), given the fallible indicator of exposure, X (self-reported cigarette smoking). Dunn (2004) provides detailed descriptions of the use of instrumental variable methodology in the evaluation of measurement errors, mainly in the context of linear models, but also in latent class modelling of binary diagnostic test results. Nonlinear models are much more difficult to deal with and are well beyond the scope of this article (but see Stefanski and Buzas, 1995). GD
Dunn, G. 2004: Statistical evaluation of measurement errors. London: Arnold. Greenland, S. 2000: An introduction to instrumental variables for epidemiologists. International Journal of Epidemiology 29, 722-9 (Erratum, p. 1102). Newhouse, J. P. and McClellan, M. 1998: Econometrics in outcomes research: the use of instrumental variables. Annual Review of Public Health 19, 17-34. Permutt, T. and Hebel, J. R. 1989: Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. Biometrics 45, 619-22. Stefanski, L. A. and Buzas, J. S. 1995: Instrumental variable estimation in binary regression measurement error models. Journal of the American Statistical Association 90, 541-9. Wooldridge, J. M. 2003: Introductory econometrics: a modern approach, 2nd edition. Mason, Ohio: South-Western.
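The contrast between the OLS and IV estimators in this entry can be illustrated by simulation. The sketch below is a toy example under stated assumptions, not a reanalysis of any study cited here: it generates data in which a hidden confounder C affects both X and Y, a randomised binary instrument Z affects Y only through X, and the true slope is 0.5. All function names, the seed and the parameter values are hypothetical. The OLS slope should come out well above 0.5, while the ratio of covariances should be close to it.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """cov(X, Y) / var(X): the ordinary least squares slope."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """cov(Z, Y) / cov(Z, X): the instrumental variable estimator."""
    return covariance(z, y) / covariance(z, x)


def simulate(n=20_000, beta=0.5, seed=1):
    """Simulated data with a hidden confounder C and a randomised instrument Z."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.randint(0, 1)                    # randomised allocation (instrument)
        c = rng.gauss(0.0, 1.0)                   # unmeasured confounder
        xi = 1.0 * zi + c + rng.gauss(0.0, 1.0)   # exposure depends on Z and C
        yi = 2.0 + beta * xi + c + rng.gauss(0.0, 1.0)  # outcome depends on X and C only
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate()
    print(f"OLS slope: {ols_slope(x, y):.3f}")
    print(f"IV slope:  {iv_slope(z, x, y):.3f}")
```

The IV estimate is much less precise than the OLS estimate for the same sample size, which is why the simulated sample is large; with a weak instrument (a small covariance between Z and X) the ratio becomes unstable.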
Integrated hazard function See SURVIVAL ANALYSIS - AN OVERVIEW
Intentlaft.to-lNaI Cotonaty . , . , bypass surgery In sIBbIe anginapedotls trill. MottdIy at,., years afIer TIIIJdotnisatio by allocated IIIId acfuBI.i'JIeMnim (Eufope8n Cotonaty SUqJety SIr4'~, 1979) Allocated
MediCllI
Medical
SUrgictJ'
SrlrgiCll'
Medical
SIqical
Surgical
Medical
inlerve"tion Actual iDtaYmlian SuniYOrs
296
41
353
20
DeadB MartaIity
21
2
IS
6
8.4~
4.fi
4.llJt
23.1'1
died berOlC surgery could be done. aad exclusion of such participants from one: arm only introduces BIAS. the In analysis of these data would c~a lIIOItaIity rate on.8 CJt (291373) in those allocated 10 medicaillealmcnl willi a mte of S.3" (211394) in thase alloca1c:d1o sulJCI)".1f the six dealhs that occurred in panicipanls allocated tosurpcal interYenlion who died befon: receiving suqery (identified by "Actual intervention = Medical" in the lable) arc not attributed 10 surpcal intervention using an intcntian-lo-Ral analysis. surgery would appear to have a falsely low mortality ndc. Since pralacoJ deviations aad naacampliaace arc likely to occur in raulinc usc: of an inICn"ntion. rrr analysis can provide an c:sIimalc of the tn:aIment ell'ce," which n:asonably n:ftects what mipt happen in clinical practice. II is thc:rdan: Ihe most suitable approach ror pnlgmatic IriaIs that aim to measun: the o\ocnaU f!jfecli,'me$S of an inla"Vcnliaa policy in
Intentlon-ta-I...I (ITT)
11Iil is a principle used iD the design. analysis and conduct of randomised CLDIICAL 1RL\I.S (Heriliel". Ocbski and l
[Figure: diagram in which RANDOMISE splits participants into group A and CONTROL GROUP, B, with the analysis group (A, B or 0) assigned to each participant under 'as treated', 'per protocol' and 'ITT'.]
Intention-to-treat Graphical representation of group membership for how individuals following randomisation are considered for analysis purposes according to the principles of 'ITT', 'as treated' and 'per protocol'. Those, usually relatively few, individuals allocated to one group (A or B) but in actuality in receipt of alternative treatment (B or A, respectively) are handled differently, or if indicated as 0 are dropped altogether, in analysis groups.
routine practice. It is less suitable for explanatory trials, or explanatory analyses of pragmatic trials, which aim to measure the efficacy of an intervention under equalised conditions, but even here it may still be preferable to the alternatives. Alternatives to ITT include PER PROTOCOL analysis, where only participants who comply with the allocated intervention are included, and AS TREATED analysis, where participants are analysed according to the intervention received rather than the randomised allocation (see the figure). Each of these analyses aims to estimate efficacy, rather than effectiveness as estimated in an ITT analysis.

In the trial comparing medical and surgical therapy in stable angina pectoris, the intention-to-treat analysis gives an estimate of 2.5% lower mortality with surgery (95% confidence interval of -1.5% to +5.5%). Per protocol and as treated analyses are severely biased by their handling of the six deaths in patients randomised to surgery who were too sick or died too soon to receive surgery, giving statistically significant estimated decreases in mortality with surgery of 4.3% and 5.4% respectively (see the second table). Ways to estimate efficacy while avoiding this bias are discussed in ADJUSTMENT FOR NONCOMPLIANCE IN RCTs.

If some participants lack outcome data, then a full ITT approach is not possible. Last observation carried forward is often used as a way to include all randomised participants in the analysis, but this introduces further assumptions that are rarely plausible. Instead, analysis should be based on plausible assumptions (see MISSING DATA and DROPOUTS), and sensitivity analysis should examine the potential impact of departures from these assumptions (Hollis and Campbell, 1999). All participants with post-randomisation data should be included in the analysis, even if they lack observations of the particular outcome variable of interest. This can be achieved by using MULTIPLE IMPUTATION or RANDOM EFFECTS MODELLING.
Intention-to-treat Different methods of analysis illustrated using mortality at two years after randomisation in the coronary artery bypass surgery in stable angina pectoris trial (European Coronary Surgery Study Group, 1979)

Analysis                      Medical % (n/N)   Surgical % (n/N)   Medical vs Surgical difference (95% CI)
Intention-to-treat analysis   7.8% (29/373)     5.3% (21/394)      2.5% (-1.5%, 5.5%)
Per-protocol analysis         8.4% (27/323)     4.1% (15/368)      4.3% (0.7%, 8.2%)
As treated analysis           9.5% (33/349)     4.1% (17/418)      5.4% (1.9%, 9.3%)
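The three sets of mortality rates can be reproduced from the four allocated-by-actual cells of the trial. The sketch below is illustrative only: the cell counts are those implied by the totals quoted in the text (29/373, 21/394, 27/323, 15/368, 33/349 and 17/418), and the names `counts`, `mortality` and `analyses` are hypothetical.

```python
# (survivors, deaths) for each (allocated, actual) combination, as implied by
# the totals quoted for the European Coronary Surgery Study Group (1979) trial.
counts = {
    ("medical", "medical"): (296, 27),
    ("medical", "surgical"): (48, 2),
    ("surgical", "surgical"): (353, 15),
    ("surgical", "medical"): (20, 6),
}


def mortality(cells):
    """Pool the given cells and return (deaths, total, percentage mortality)."""
    deaths = sum(counts[c][1] for c in cells)
    total = sum(counts[c][0] + counts[c][1] for c in cells)
    return deaths, total, 100.0 * deaths / total


# Each analysis lists the cells counted as 'medical' and as 'surgical'.
analyses = {
    # ITT: group by allocated intervention, whatever was actually received.
    "ITT": ([("medical", "medical"), ("medical", "surgical")],
            [("surgical", "surgical"), ("surgical", "medical")]),
    # Per protocol: keep only participants who received what they were allocated.
    "per protocol": ([("medical", "medical")],
                     [("surgical", "surgical")]),
    # As treated: group by intervention actually received.
    "as treated": ([("medical", "medical"), ("surgical", "medical")],
                   [("surgical", "surgical"), ("medical", "surgical")]),
}

results = {name: (mortality(med), mortality(surg))
           for name, (med, surg) in analyses.items()}

for name, (med, surg) in results.items():
    print(f"{name:13s} medical {med[0]}/{med[1]} = {med[2]:.1f}%   "
          f"surgical {surg[0]}/{surg[1]} = {surg[2]:.1f}%")
```

The six deaths among participants allocated to surgery who never received it move from the surgical arm (ITT) to nowhere (per protocol) or to the medical arm (as treated), which is what drives the larger apparent benefit of surgery in the latter two analyses.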
It is often argued that ITT provides a conservative estimate of treatment effectiveness, which is a smaller effect than the true potential effectiveness of an intervention, since the estimated treatment effect is likely to be reduced by the inclusion of protocol deviations and noncompliance. This may be generally true for comparisons with PLACEBO, because any switching between groups will tend to dilute the estimated treatment effect. However, in comparisons between active treatments or when an effective rescue medication is available, an ITT analysis may not be conservative. For example, two equally good treatments will appear different on ITT analysis if clinicians are more likely to supplement one of the treatments with a more powerful agent. Particular care should be taken when using the ITT approach for adverse effects or safety data and for noninferiority or equivalence trials. In these situations the generally conservative answers provided by ITT may lead to inappropriate conclusions. Other analyses such as per protocol analysis are commonly carried out in these situations, although it may be preferable to avoid selection bias by using randomisation-based methods (see ADJUSTMENT FOR NONCOMPLIANCE IN RCTs).
Non-ITT analyses that exclude some randomised individuals are sometimes justified, provided that these exclusions are not associated with treatment allocation or outcome (Fergusson et al., 2002). Ineligible participants who are randomised in error, or whose eligibility cannot be established before randomisation, could be excluded provided that the judgement of eligibility is based on information established before randomisation, and not influenced by the allocated intervention or outcome. Such exclusions should not be made if clinical practice requires treatment to be started before eligibility can be determined, since the most clinically relevant comparison is usually between all those randomised to treatment and all those randomised to control. Failure to start the allocated treatment in a double-blind trial is also sometimes a justified basis for exclusion. Whenever randomised participants are excluded from the analysis, it should be demonstrated that steps have been taken to avoid bias, such as the use of independent blinded assessment of eligibility. Even so, such analyses should not be described as ITT analyses. Finally, every effort should be made to avoid post-randomisation exclusions through appropriate design and execution of trials. SH/IW
(See also AVAILABLE CASE ANALYSIS, COMPLETE CASE ANALYSIS)
European Coronary Surgery Study Group 1979: Coronary-artery bypass surgery in stable angina pectoris: survival at two years. Lancet i, 889-93.
Fergusson, D., Aaron, S. D., Guyatt, G. and Hebert, P. 2002: Post-randomisation exclusions: the intention to treat principle and excluding patients from analysis. British Medical Journal 325, 652-4.
Heritier, S. R., Gebski, V. J. and Keech, A. C. 2003: Inclusion of patients in clinical trial analysis: the intention-to-treat principle. Medical Journal of Australia 179, 438-40.
Hollis, S. and Campbell, F. 1999: What is meant by 'intention to treat' analysis? British Medical Journal 319, 670-4.
International Conference on Harmonisation E9 Expert Working Group (ICH E9) 1999: ICH harmonised tripartite guideline. Statistical principles for clinical trials. Statistics in Medicine 18, 15, 1905-42.
Interim analysis This is performed at regular intervals for monitoring data and safety in clinical trials. An interim analysis refers to any analysis performed during the course of a trial and is often intended to compare intervention effects with respect to efficacy and safety prior to the formal completion of a trial. Because the number, methods and consequences of these comparisons affect the interpretation of the trial, all interim analyses should be carefully planned in advance and described in the protocol explicitly. When an interim analysis is planned with the intention of deciding whether or not to terminate a trial early, this is usually accomplished by one of three general methods known as group sequential methods, triangular tests and stochastic curtailment procedures.

The goal of such an interim analysis is to stop the trial early if the superiority of an intervention under study is clearly established, if the demonstration of a relevant difference in intervention effects becomes unlikely or if unacceptable adverse effects are apparent. Also, as a result of interim analyses, trial interventions may be modified or an experimental design, such as the enrolment inclusion and exclusion criteria or sample size requirement, changed.

An ethical obligation to the study participants and even beyond the study demands that results be monitored during the study to protect study participants. If one intervention is substantially superior to the other, if there are unexpected adverse effects on either of the interventions or if the study is unlikely to give definitive answers to the study questions, continuing RANDOMISATION means that participants can be assigned to and subsequently treated with an inferior intervention or put to an unnecessary and unjustifiable experiment. The issues of early stopping due to unexpected adverse effects are less statistical in nature, unless safety is the primary outcome of interest to the investigators.
Suppose the response to intervention is normally distributed with means $\mu_A$ and $\mu_B$ for intervention arms A and B and known VARIANCE $\sigma^2$. We want to test the null hypothesis $H_0: \mu_A = \mu_B$ against the alternative hypothesis $H_1: \mu_A \ne \mu_B$ or, equivalently, $H_0: \delta_\mu = 0$ against $H_1: \delta_\mu \ne 0$, where $\delta_\mu = \mu_A - \mu_B$. Let $\bar{X}_A$ and $\bar{X}_B$ be the sample means respectively for interventions A and B and let $n$ denote the number of participants per intervention per analysis. In a fixed sample study, one may use the test statistic:

$$Z = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{2\sigma^2/n}}$$

to test the null hypothesis $H_0$. For a significance level $\alpha$, one would reject $H_0$ if $|Z| \ge z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard NORMAL DISTRIBUTION.

GROUP SEQUENTIAL METHODS call for monitoring of the accumulating data periodically after groups of observations. One simple-minded approach is to reject the null hypothesis whenever the P-value is less than 0.05, say. The problem with this approach is that multiple looks at the 0.05 level lead to an overall level of significance greater than 0.05. More specifically, the actual TYPE I ERROR probability becomes 0.083 with two looks, 0.142 with five looks and becomes closer and closer to 1 with more and more looks. This phenomenon was aptly described as 'sampling to reach a foregone conclusion' by Anscombe (1954).

Suppose we plan to conduct interim analyses of the accumulating data up to $K$ times after a pre-specified number of participants $n$ on each intervention. The difference in intervention effects is measured at the $k$th interim analysis by:

$$\bar{X}_{Ak} - \bar{X}_{Bk} \sim N(\delta_\mu, 2\sigma^2/n)$$

where $\bar{X}_{Ak}$ and $\bar{X}_{Bk}$ are the sample means of $n$ observations accumulated between the $(k-1)$th and the $k$th interim analyses on interventions A and B respectively. It could also be summarised by the standardised difference:

$$Y_k = \frac{\bar{X}_{Ak} - \bar{X}_{Bk}}{\sqrt{2\sigma^2/n}} \sim N(\delta^*, 1)$$

where $\delta^* = \delta_\mu / \sqrt{2\sigma^2/n}$. For the $k$th interim analysis, we consider the partial sum of independently and identically distributed normal random variables $Y_1, \ldots, Y_k$:

$$S_k = \sum_{i=1}^{k} Y_i \sim N(\delta^* k, k)$$

or equivalently the standardised test statistic:

$$Z_k = S_k / \sqrt{k} \sim N(\delta^* \sqrt{k}, 1)$$

and decide to reject $H_0$ or to continue to the next group, up to a maximum of $K$ interim analyses.

The objective of a group sequential design is to derive a group sequential test that has desired operating characteristics, i.e. pre-specified Type I and Type II error probabilities. Thus a group sequential design for a trial requires choosing group sequential critical values, $c_1, \ldots, c_K$, such that one rejects $H_0$ after the $k$th interim analysis if the statistic $|Z_k|$ exceeds $c_k$ for the first time. We do not reject the null hypothesis if $|Z_1| < c_1, \ldots, |Z_K| < c_K$. There are many different designs, i.e. many different choices, for the group sequential critical values. However, there are a few with known statistical justifications.
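The inflation of the overall Type I error under repeated testing at the 0.05 level can be checked by simulation. The following is a minimal sketch, not a definitive implementation: it assumes equally sized groups, simulates the standardised statistics $Z_k = S_k/\sqrt{k}$ under $H_0$, and uses a hypothetical function name `overall_type_one_error`.

```python
import numpy as np


def overall_type_one_error(num_looks, crit=1.96, num_trials=200_000, seed=0):
    """Monte Carlo estimate of Pr(|Z_k| >= crit for some k <= num_looks) under H0."""
    rng = np.random.default_rng(seed)
    # Under H0 the standardised group differences Y_k are independent N(0, 1).
    y = rng.standard_normal((num_trials, num_looks))
    s = np.cumsum(y, axis=1)              # partial sums S_k
    k = np.arange(1, num_looks + 1)
    z = s / np.sqrt(k)                    # Z_k = S_k / sqrt(k)
    # A simulated trial 'rejects' if any look crosses the fixed critical value.
    return float(np.mean(np.any(np.abs(z) >= crit, axis=1)))


for looks in (1, 2, 5, 20):
    print(looks, round(overall_type_one_error(looks), 3))
```

The estimates for two and five looks should lie close to the values 0.083 and 0.142 quoted above, up to simulation error.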
The group sequential test by Pocock (1977) uses the same critical value at each interim analysis. Specifically, the Pocock group sequential test rejects $H_0$ the first time when:

$$|Z_k| \ge c_k = c_P \quad \text{or equivalently} \quad |S_k| \ge b_k = c_P \sqrt{k}$$

Hence one has only to determine $c_P$ as a function of the overall Type I error probability $\alpha$ and the maximum number of interim analyses $K$.

The group sequential test by O'Brien and Fleming (1979) uses larger critical values at earlier interim analyses so that it is difficult to reject $H_0$ early in the study and relaxes the criteria until, at the end, the critical value is close to the fixed sample critical value. Specifically, the O'Brien-Fleming group sequential test rejects $H_0$ the first time when:

$$|Z_k| \ge c_k = c_O \sqrt{K/k} \quad \text{or equivalently} \quad |S_k| \ge b_k = c_O \sqrt{K}$$

Again, one has only to determine $c_O$ as a function of $\alpha$ and $K$.

The standard group sequential method has some limitations because of the requirements in the pre-specified maximum number of interim analyses and the equal increment in statistical information between interim analyses. There are, however, flexible group sequential procedures that make these requirements unnecessary based on the notion of an error spending function as proposed by Lan and DeMets (1983). Especially for trials with censored survival or repeated measures data, it is necessary to be flexible in the group sequential test since typical interim analyses take place after unequal increments in the information fraction. Also recent developments in group sequential methods allow early stopping in order to reject the alternative hypothesis just as the triangular tests do with the required flexibility, as proposed by Chang, Hwang and Shih (1998) and Pampallona, Tsiatis and Kim (2001).

Wald's sequential probability ratio test (1947) for $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$ is optimal in the sense that, among all tests with Type I and II error probabilities $\alpha$ and $\beta$, it minimises the expected sample sizes $E(N|\theta_0)$ and $E(N|\theta_1)$, where $N$ is a random variable for the sequential sample size. However, $E(N|\theta)$ can be worse than the corresponding fixed sample size at or near $\theta = (\theta_0 + \theta_1)/2$. Moreover, the sample size is unbounded and, in particular, $\Pr(N \ge n) > 0$ for any given $n$. Thus the motivation in Anderson (1960) was to find a sequential test that would minimise $E(N|\theta)$ at $\theta = (\theta_0 + \theta_1)/2$, leading to the so-called TRIANGULAR TEST. Triangular tests have been further developed in Whitehead (1997), as described below.

In a general problem, the efficient score $Z$ for the parameter of interest $\theta$, which typically measures the difference, has the following asymptotic distribution according to LIKELIHOOD theory:

$$Z \sim N(\theta V, V)$$

where $V$ denotes Fisher information. For a fixed sample test, the critical value $c$ and Fisher information required for the study are determined to have Type I and II error probabilities $\alpha$ and $\beta$ respectively, such that:

$$\Pr(|Z| \ge c;\, 0) = \alpha \quad \text{and} \quad \Pr(Z \ge c;\, \theta_1) = 1 - \beta$$

where $\theta_1$ is the hypothesised difference of interest. These two requirements lead to:

$$V = \left( \frac{z_{1-\alpha/2} + z_{1-\beta}}{\theta_1} \right)^2 \quad \text{and} \quad c = \frac{(z_{1-\alpha/2} + z_{1-\beta})\, z_{1-\alpha/2}}{\theta_1}$$

According to Whitehead (1997), a triangular test is defined by the upper and lower boundaries of the form:

$$Z = a + cV \quad \text{and} \quad Z = -a + 3cV$$

respectively, with the apex of the triangle at $Z = 2a$ and $V = a/c$. In the special case where $\alpha = \beta$:

$$a = -\frac{2 \log \alpha}{\theta_1} \quad \text{and} \quad c = \frac{\theta_1}{4}$$

A solution is possible for the general case as well when $\alpha < \beta$, which is typically the case.

Since interim analyses are performed only a limited number of times, some adjustments need to be made in order to maintain the operating characteristics. This is accomplished by the so-called 'Christmas tree adjustment', described as follows. Suppose that $(Z^*, V^*)$ denotes the value of sequential statistics at the time an upper boundary is crossed. The overshoot $R$ is the vertical distance between the final point of the sample path and the continuous boundary defined as:

$$R = Z^* - (a + cV^*)$$

In order to account for the discreteness of the interim analyses, the continuous stopping criterion:

$$Z \ge a + cV$$

is replaced by:

$$Z \ge a + cV - A$$

where:

$$A = E(R;\, \theta)$$
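The constants $c_P$ and $c_O$ of the Pocock and O'Brien-Fleming tests are usually taken from published tables or computed by numerical integration. As a rough illustration only, they can also be approximated by simulation, assuming the usual forms of the two rules ($|Z_k| \ge c_P$ and $|Z_k| \ge c_O\sqrt{K/k}$) and equal group sizes; the function name `boundary_constants` is hypothetical.

```python
import numpy as np


def boundary_constants(num_looks, alpha=0.05, num_trials=200_000, seed=1):
    """Approximate (c_P, c_O) for a two-sided overall Type I error alpha."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, num_looks + 1)
    # Standardised statistics Z_k = S_k / sqrt(k) simulated under H0.
    z = np.cumsum(rng.standard_normal((num_trials, num_looks)), axis=1) / np.sqrt(k)
    abs_z = np.abs(z)
    # Pocock: H0 is rejected iff max_k |Z_k| >= c_P, so c_P is the
    # (1 - alpha) quantile of that maximum under H0.
    c_pocock = np.quantile(abs_z.max(axis=1), 1 - alpha)
    # O'Brien-Fleming: H0 is rejected iff max_k |Z_k| * sqrt(k / K) >= c_O.
    c_obf = np.quantile((abs_z * np.sqrt(k / num_looks)).max(axis=1), 1 - alpha)
    return float(c_pocock), float(c_obf)


c_p, c_o = boundary_constants(num_looks=5)
looks = np.arange(1, 6)
print("Pocock critical values for Z_k:", np.round(np.full(5, c_p), 2))
print("O'Brien-Fleming critical values for Z_k:", np.round(c_o * np.sqrt(5 / looks), 2))
```

With five looks, the O'Brien-Fleming critical values start well above the Pocock value and fall to roughly the fixed sample value at the final analysis, which is the behaviour described in the text.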
In developing triangular tests, two different power requirements are specified. Traditional accounts of testing the null hypothesis $H_0: \theta = 0$ allow two outcomes in which '$H_0$ is accepted' or '$H_0$ is rejected'. However, three outcomes are possible in practice. The power function $C(\theta)$ is the probability of rejecting $H_0$ at the parameter value $\theta$ defined as:

$$C(\theta) = C^+(\theta) + C^-(\theta)$$

where $C^+(\theta)$ denotes the probability that $H_0$ is rejected and it is concluded that the experimental intervention is superior and $C^-(\theta)$ denotes the probability that $H_0$ is rejected and it is concluded that the experimental intervention is inferior. Obviously, for TWO-SIDED TESTS:

$$C(0) = \alpha \quad \text{with} \quad C^+(0) = C^-(0) = \alpha/2$$

Two specific power requirements are either $C^+(\theta_1) = 1 - \beta$ or $C^+(\theta_1) = 1 - \beta = C^-(-\theta_1)$. These give rise to asymmetric or symmetric triangular tests for a two-sided test of the null hypothesis.

In curtailment sampling, one is interested in assessing the likelihood of a trend reversal. There are two possible ways: deterministic and stochastic. This notion has been found to be useful in consideration of early stopping for futility. In contrast, early acceptance of the null hypothesis is possible based on group sequential methods and triangular tests discussed earlier.

An example of deterministic curtailment is curtailed sampling in sampling inspection in which trend reversal is impossible. Let $S$ denote a test statistic that measures the difference in intervention effects and let the sample space $\Omega$ of $S$ consist of disjoint regions, $A$ and $R$, such that:

$$\Pr(S \in R \mid H_0) = \alpha \quad \text{and} \quad \Pr(S \in A \mid H_1) = \beta$$

Let $t$ denote the time of an interim analysis and let $D(t)$ denote the accumulated data up to time $t$. A deterministic curtailment test rejects or accepts the null hypothesis $H_0$ if:

$$\Pr(S \in R \mid D(t)) = 1 \quad \text{or} \quad \Pr(S \in A \mid D(t)) = 1$$

respectively, regardless of whether $H_0$ or $H_1$ is true. Note that this procedure does not affect the Type I and II error probabilities.

As an example, consider testing the fairness of a coin, $H_0: \pi = 0.5$ versus $H_1: \pi \ne 0.5$. After tossing a coin 400 times, one will consider the total number of heads $S$ and reject $H_0$ if $|Z| > 1.96$ at a significance level 0.05 where:

$$Z = \frac{S - 200}{\sqrt{400 \times 0.5 \times 0.5}}$$

or equivalently if $|S - 200| \ge 20$. After 350 tosses, we will reject $H_0$ for sure with 220 heads. With 210 heads, however, it depends on the future outcomes.

Consider a fixed sample test of $H_0: \theta = 0$ at a significance level $\alpha$ with power $1 - \beta$ to detect the difference $\theta_1$. The conditional probability of rejection of $H_0$, i.e. conditional power, at $\theta$ is defined as:

$$P_C(\theta) = \Pr(S \in R \mid D(t);\, \theta)$$

For some $\gamma_0, \gamma_1 > 1/2$, a stochastic curtailment test rejects the null hypothesis if:

$$P_C(\theta_1) \approx 1 \quad \text{and} \quad P_C(0) > \gamma_0$$

or accepts the null hypothesis (rejects the alternative hypothesis) if:

$$P_C(0) \approx 0 \quad \text{and} \quad P_C(\theta_1) < 1 - \gamma_1$$

According to Lan, Simon and Halperin (1982), the Type I and II error probabilities are inflated but remain bounded from above by:

$$\alpha' = \alpha/\gamma_0 \quad \text{and} \quad \beta' = \beta/\gamma_1$$

Generally stochastic curtailment is very conservative and if $\gamma_0 = 1 = \gamma_1$, it becomes deterministic curtailment.

A formal significance test is only one factor in the complex decision process of whether to continue, modify or stop a trial. Interim analyses based on group sequential methods, triangular tests or stochastic curtailment procedures provide objective guidelines to the DATA AND SAFETY MONITORING BOARDS.

The choice for the method of interim analyses should depend on the desired operating characteristics of the study in terms of the early stopping property, the maximum sample size requirement and the expected sample size. For example, if the study continues through all $K$ analyses, the group sequential design will accrue more participants than the fixed sample design, which is likely to occur if $H_0$ is true. However, if the study stops at an earlier interim analysis, the group sequential design will accrue fewer participants on average than the fixed sample design, which is likely to occur if $H_1$ is true.

A randomised Phase III trial should never be terminated in the early stages of recruitment merely because it is failing to reach the anticipated 'minimal' benefit envisaged at the design stage of the study. This is because early termination of a study in these circumstances will leave the associated CONFIDENCE INTERVAL unacceptably wide, thereby indicating the possibility of a plausible, and maybe worthwhile, advantage to one therapy even when there is no true difference in intervention effects. In such circumstances the level of uncertainty remains unacceptably high. KK
(See also DATA AND SAFETY MONITORING BOARDS)
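The coin-tossing example can be made concrete by computing the conditional power exactly from the binomial distribution of the remaining tosses. This is a minimal sketch under the rejection rule $|S - 200| \ge 20$ stated above; the function name `conditional_power` and its arguments are illustrative assumptions.

```python
from math import comb


def conditional_power(heads_so_far, tosses_so_far, pi, total=400, centre=200, margin=20):
    """Pr(|S - centre| >= margin | data so far), when each remaining toss is a head with probability pi."""
    remaining = total - tosses_so_far
    prob = 0.0
    for extra_heads in range(remaining + 1):
        final_heads = heads_so_far + extra_heads
        if abs(final_heads - centre) >= margin:
            # Binomial probability of exactly extra_heads heads in the remaining tosses.
            prob += comb(remaining, extra_heads) * pi ** extra_heads * (1 - pi) ** (remaining - extra_heads)
    return prob


# 220 heads after 350 tosses: rejection is already certain (deterministic curtailment).
print(conditional_power(220, 350, pi=0.5))
# 210 heads after 350 tosses: rejection depends on the 50 tosses still to come.
print(conditional_power(210, 350, pi=0.5))
# 190 heads after 350 tosses: 30 or more heads are still needed in 50 tosses.
print(conditional_power(190, 350, pi=0.5))
```

A stochastic curtailment rule would compare such conditional probabilities, evaluated at $\theta = 0$ and at $\theta = \theta_1$, with the thresholds $\gamma_0$ and $1 - \gamma_1$.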
Anderson, T. W. 1960: A modification of the sequential probability ratio test to reduce the sample size. Annals of Mathematical Statistics 31, 165-97.
Anscombe, F. J. 1954: Fixed-sample-size analysis of sequential observations. Biometrics 10, 89-100.
Chang, M. N., Hwang, I. K. and Shih, W. J. 1998: Group sequential designs using both Type I and Type II error probability spending functions. Communications in Statistics, Part A - Theory and Methods 27, 1323-39.
Lan, K. K. G. and DeMets, D. L. 1983: Discrete sequential boundaries for clinical trials. Biometrika 70, 659-63.
Lan, K. K. G., Simon, R. and Halperin, M. 1982: Stochastically curtailed testing in long-term clinical trials. Sequential Analysis 1, 207-19.
O'Brien, P. C. and Fleming, T. R. 1979: A multiple testing procedure for clinical trials. Biometrics 35, 549-56.
Pampallona, S., Tsiatis, A. A. and Kim, K. 2001: Spending functions for Type I and Type II error probabilities of group sequential trials. Drug Information Journal 72, 247-60.
Pocock, S. J. 1977: Group sequential methods in the design and analysis of clinical trials. Biometrika 64, 191-9.
Wald, A. 1947: Sequential analysis. New York: John Wiley & Sons, Inc.
Whitehead, J. 1997: The design and analysis of sequential clinical trials, 2nd revised edition. Chichester: John Wiley & Sons, Ltd.
Internal pilot study
See PILOT STUDIES, SAMPLE SIZE DETERMINATION IN CLINICAL TRIALS
Interquartile range This range is a MEASURE OF SPREAD defined as the interval between the values that are located one-quarter and three-quarters of the way through the sample when the observations are ordered. Thus, it encloses the middle 50% of the data points. For example, suppose the weights in kilograms of 11 elderly men from a community sample attending a clinic were:

51 60 61 63 65 66 70 72 77 85 95
(median: 66; interquartile range: 61 to 77)

Then, the interquartile range is the interval between the third and ninth values, i.e. 61 kg to 77 kg, a difference of 16 kg. The interquartile range is most informative if the upper and lower values are both quoted, rather than simply the interval between them. The lower value is known as the lower quartile or 25th percentile and the upper value as the upper quartile or 75th percentile.

If the number of observations + 1 is divisible by 4 then the interquartile range is simple to calculate. If this is not the case then the values for the interquartile range need to be interpolated. In general, the position of the lower quartile is calculated by multiplying the sample size plus one by 0.25, and by 0.75 in the case of the upper quartile. Therefore, if another man attends the clinic with a weight of 100 kg then the lower quartile is now at position (12 + 1)/4 = 3 1/4 and the upper quartile at position 9 3/4:

51 60 61 63 65 66 70 72 77 85 95 100
(lower quartile: between 61 and 63; upper quartile: between 77 and 85)

Thus the lower quartile lies a quarter of the way between 61 and 63, interpolated as 61.5 kg. Similarly, the upper quartile lies three-quarters of the way between 77 and 85, interpolated as 83 kg.

The interquartile range is typically used as a measure of spread around the median. Like the median, it is useful when the data are not symmetrically distributed because it is not unduly affected by the presence of SKEWNESS or OUTLIERS. SRC
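The position rule described in this entry, (n + 1) multiplied by 0.25 or 0.75 with linear interpolation between neighbouring ordered values, can be sketched as follows. The function name `quartile` is hypothetical, and statistical packages may use other interpolation conventions that give slightly different answers.

```python
def quartile(values, p):
    """Quantile at 1-based position (n + 1) * p, interpolating linearly between ordered values."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) * p
    lower = int(position)          # ordered value just below the position
    fraction = position - lower    # how far to move towards the next value
    if lower < 1:
        return float(ordered[0])
    if lower >= n:
        return float(ordered[-1])
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


weights_11 = [51, 60, 61, 63, 65, 66, 70, 72, 77, 85, 95]
weights_12 = weights_11 + [100]

for weights in (weights_11, weights_12):
    q1, q3 = quartile(weights, 0.25), quartile(weights, 0.75)
    print(f"n = {len(weights)}: lower quartile {q1} kg, upper quartile {q3} kg, difference {q3 - q1} kg")
```

For the 11 original weights the positions are whole numbers (3 and 9), so no interpolation is needed; adding the twelfth weight moves them to 3.25 and 9.75.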
Intraclass correlation coefficient See INTRACLUSTER CORRELATION COEFFICIENT
Intracluster correlation coefficient (ICC)
This is a measure that quantifies the extent of similarity among individual observations within clusters. For example, when a study collects data on patients from a number of different clinics, the intracluster correlation coefficient (ICC) represents the degree to which patients attending the same clinic are more similar than the patients attending different clinics. Also known as the intraclass correlation coefficient, the ICC takes values between 0 and 1, where the value 0 corresponds to the situation where individuals from the same cluster are no more alike than individuals from different clusters and higher values indicate greater similarity within clusters.

The ICC has been used extensively in several application areas. In health services research, it is used to measure the extent of similarity of patients within administrative units such as hospitals or geographical units such as towns. In family studies, it measures the degree of resemblance among members of the same family. In psychological research, it is used when examining reliability (see MEASUREMENT PRECISION AND RELIABILITY), where the same measurements are taken on subjects by different assessors.

When individuals are sampled within clusters such as hospitals or families, the ICC representing within-cluster similarity is defined as the proportion of the variation between individuals that is explained by the variation between clusters. Formally, this definition assumes a simple random effects model for the ENDPOINT of interest, which includes random cluster effects with VARIANCE $\sigma_b^2$ and individual residual effects with variance $\sigma^2$, and the ICC is defined as $\sigma_b^2/(\sigma_b^2 + \sigma^2)$.

The most common approach for estimating the ICC is to obtain estimates for $\sigma_b^2$ and $\sigma^2$ by fitting the RANDOM EFFECTS MODEL (Donner and Wells, 1986) to the data and to substitute these into the formula. CONFIDENCE INTERVALS for reporting alongside the ICC estimate are also obtained using the assumptions of the random effects model. In most settings, negative ICC values are regarded as implausible, so negative values obtained for ICC estimates are set to 0 and lower limits of confidence intervals are truncated at 0. When considering the ICC for a binary outcome, e.g. the presence or absence of a disease in family members, an ICC estimate can be obtained as described above, but the methods for constructing confidence intervals are based on different assumptions.

The simple model outlined here is not appropriate for all types of design. When measuring reliability, the definition of the ICC differs according to whether the focus is on 'consistency' or 'absolute agreement', as described by McGraw and Wong (1996). Some study designs require more complex models: e.g. when repeated measurements are available on patients within clusters or when wishing to estimate multiple ICC values simultaneously (Donner, 1986). RT
Donner, A. 1986: A review of inference procedures for the intraclass correlation coefficient in the one-way random effects model. International Statistical Review 54, 67-82.
Donner, A. and Wells, G. 1986: A comparison of confidence interval methods for the intraclass correlation coefficient. Biometrics 42, 401-12.
McGraw, K. O. and Wong, S. P. 1996: Forming inferences about some intraclass correlation coefficients. Psychological Methods 1, 30-46.
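As an illustration of the definition $\sigma_b^2/(\sigma_b^2 + \sigma^2)$ in the intracluster correlation coefficient entry, the sketch below estimates the ICC from simulated clustered data. It is not the fitted random effects model referred to in that entry: it assumes equal cluster sizes and swaps in the simpler one-way analysis-of-variance (method of moments) estimator, and the function name `anova_icc` and the simulation settings are hypothetical.

```python
import numpy as np


def anova_icc(y):
    """One-way ANOVA estimate of the ICC; y has one row per cluster and equal cluster sizes."""
    num_clusters, cluster_size = y.shape
    cluster_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Between-cluster and within-cluster mean squares.
    msb = cluster_size * np.sum((cluster_means - grand_mean) ** 2) / (num_clusters - 1)
    msw = np.sum((y - cluster_means[:, None]) ** 2) / (num_clusters * (cluster_size - 1))
    sigma_b2 = (msb - msw) / cluster_size     # estimated between-cluster variance
    icc = sigma_b2 / (sigma_b2 + msw)         # msw estimates the residual variance
    return max(0.0, float(icc))               # negative estimates are set to 0


rng = np.random.default_rng(7)
num_clusters, cluster_size = 2000, 10
# Simulated data: cluster effects with variance 1, residuals with variance 4, so the true ICC is 0.2.
cluster_effects = rng.normal(0.0, 1.0, size=(num_clusters, 1))
residuals = rng.normal(0.0, 2.0, size=(num_clusters, cluster_size))
icc_clustered = anova_icc(cluster_effects + residuals)
icc_unclustered = anova_icc(residuals)        # no cluster effect: true ICC is 0
print(round(icc_clustered, 3), round(icc_unclustered, 3))
```

The truncation at 0 in the last line of `anova_icc` mirrors the convention, described in the entry, of setting negative ICC estimates to 0.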
Inverse probability weighting (IPW) This refers to a general method of adjusting M-estimators (Everitt and Skrondal, 2010) for confounding or selection bias (e.g. due to CENSORED OBSERVATIONS, MISSING DATA or sample selection) when the unverifiable assumption of no unmeasured confounders, noninformative censoring or missingness at random is, respectively, met. The underlying idea is to filter spurious associations away from the data by weighting each subject's data inversely to the magnitude of those associations. In the process, one redresses imbalances so that issues of confounding or selection bias may subsequently be ignored in the analysis.

We will illustrate IPW with two examples. Consider first a setting where the relation between some exposure $A$ and some outcome $Y$ is of interest, but distorted by measured confounders $L$. Then weighting each subject's data by the reciprocal of the conditional density $f(A|L)$ of exposure $A$ given confounders $L$ eliminates the association between $A$ and $L$, and thereby eliminates confounding by $L$. This implies that standard measures of association between $A$ and $Y$, when applied to the inversely weighted data, are no longer distorted by confounding due to $L$. In the special case where $A$ is dichotomous, taking values 0 for the 'unexposed' and 1 for the 'exposed', the adjusted association between $A$ and $Y$ can thus be calculated as:

$$\frac{\sum_{i: A_i = 1} w_{i1} Y_i}{\sum_{i: A_i = 1} w_{i1}} - \frac{\sum_{i: A_i = 0} w_{i0} Y_i}{\sum_{i: A_i = 0} w_{i0}} \qquad (6)$$

where the weights $w_{i1} = 1/\Pr(A_i = 1|L_i)$ and $w_{i0} = 1/\Pr(A_i = 0|L_i)$ can be estimated based on the fitted values of a LOGISTIC REGRESSION model for $A$, given $L$. More generally, adjustment for measured confounding can be accomplished via off-the-shelf software packages (see STATISTICAL PACKAGES) by fitting a regression model for the outcome, involving the exposure of interest only (e.g. $E(Y|A) = \alpha + \beta A$), while assigning each subject's data the given weight (i.e. $w_{i1}$ for those with $A_i = 1$ and $w_{i0}$ for those with $A_i = 0$).

The impact of inverse probability weighting is to standardize (see DEMOGRAPHY) the expected outcome in the exposed and unexposed respectively, with the total group as the reference population (Sato and Matsuyama, 2003). It follows that, when $L$ includes all confounders of the relation between $A$ and $Y$, the IPW estimate (6) can be interpreted as the change in average outcome that would be observed if the total group were exposed versus unexposed. As such, IPW provides an alternative to direct adjustment methods for measured confounders, where one involves models for the regression of exposure rather than outcome on the measured confounders.

Consider next a setting where the interest lies in a linear regression (see MULTIPLE LINEAR REGRESSION) of a completely observed outcome $Y$ on to an incompletely observed covariate $X$. When missingness in $X$ is influenced by the observed outcome and possibly also by extraneous, measured covariates $Z$, but has no residual association with the missing $X$, then a regression analysis of $Y$ on $X$ in the complete cases (i.e. those with complete data in $X$) may give biased results. When the association between missingness and its predictors is filtered away through inverse probability weighting of the complete cases, the missingness becomes completely at random so that an analysis of the reweighted complete cases becomes valid. This can be accomplished via off-the-shelf software packages by fitting a regression model for the outcome to the complete cases, while assigning each subject's data the weight $1/\Pr(R = 1|Y, Z)$, where $R$ is a missingness indicator that assigns 1 to subjects with complete data and 0 to subjects with incomplete data in $X$. The missingness probability appearing in the weights can, for instance, be estimated via logistic regression analysis.

This idea that BIAS due to missing data can be corrected by weighting each of these subjects' observations by the inverse of the probability of observing complete data dates back to at least Horvitz and Thompson (1952). For many years, the IPW method gained little acceptance because of its imprecision relative to more popular missing data methods, such as MULTIPLE IMPUTATION. This has changed drastically over the past decade, since the seminal work of Robins, Rotnitzky and Zhao (1994), who demonstrated how the precision of IPW estimators could be greatly improved to the point where they became competitive with imputation estimators.
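The first example, the weighted difference in mean outcome between exposed and unexposed subjects, can be sketched on simulated data. Everything in the block is an illustrative assumption: a single binary confounder $L$, a true exposure effect of 2, and exposure probabilities estimated by the observed proportions within each level of $L$ (which, for a single binary $L$, coincides with the fitted values of a saturated logistic regression). The function name `ipw_demo` is hypothetical.

```python
import numpy as np


def ipw_demo(n=100_000, seed=42):
    """Return (crude, ipw) estimates of the effect of A on Y in simulated confounded data."""
    rng = np.random.default_rng(seed)
    L = rng.binomial(1, 0.5, n)                          # measured confounder
    A = rng.binomial(1, np.where(L == 1, 0.8, 0.2))      # exposure depends on L
    Y = 1.0 + 2.0 * A + 3.0 * L + rng.standard_normal(n) # true effect of A is 2

    # Estimated Pr(A = 1 | L): the proportion exposed within each level of L.
    prob_exposed = np.array([A[L == level].mean() for level in (0, 1)])[L]
    w1 = 1.0 / prob_exposed            # weights for the exposed
    w0 = 1.0 / (1.0 - prob_exposed)    # weights for the unexposed

    exposed = A == 1
    crude = Y[exposed].mean() - Y[~exposed].mean()
    # Weighted mean outcome in the exposed minus weighted mean outcome in the unexposed.
    ipw = (np.sum(w1[exposed] * Y[exposed]) / np.sum(w1[exposed])
           - np.sum(w0[~exposed] * Y[~exposed]) / np.sum(w0[~exposed]))
    return float(crude), float(ipw)


crude_estimate, ipw_estimate = ipw_demo()
print(f"crude difference: {crude_estimate:.2f}, IPW estimate: {ipw_estimate:.2f}")
```

In this simulation the crude difference is inflated because exposed subjects are more likely to have $L = 1$, whereas the weighted difference should lie close to the simulated effect of 2.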
The recent success of IPW is largely due to its ability to enable adjustment for time-varying confounding where standard regression adjustment fails (see MARGINAL STRUCTURAL MODELS). This ability results from its capacity to filter associations away from the data. IPW methods have the further advantage that they are generically applicable in a wide variety of settings, that they enable relatively simple sensitivity analyses to investigate the impact of violation of unverifiable (missing data) assumptions (Scharfstein, Rotnitzky and Robins, 1999) and that they are less prone to extrapolation than more standard adjustment methods (e.g. regression adjustment for confounders, multiple imputation) (Tan, 2008). The main limitation of IPW estimators is that they can be unstable and imprecise when some subjects receive large weights. This may happen when there is strong confounding, strongly informative censoring or missingness, or when a continuous exposure requires inverse weighting by a density. In that case, one must consider heuristic weight truncation (Cole and Hernán, 2008) or resort to more efficient (doubly robust) IPW estimators (Robins et al., 2007; Goetgeluk, Vansteelandt and Goetghebeur, 2008). The latter typically also bring gains in terms of robustness against model misspecification, but are computationally more demanding. Finally, the IPW method enjoys the unexpected property that using estimated rather than known weights tends to increase efficiency. However, when standard statistical software packages (e.g. SAS, STATA, R statistical packages) are used for IPW, caution is needed in interpreting the standard errors provided by the software, because these ignore the imprecision of the estimated weights. By using routines that report so-called sandwich estimators (Everitt and Skrondal, 2010), conservative standard errors are obtained. SV/EG

Cole, S. R. and Hernán, M. A. 2008: Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology 168, 656-64. Everitt, B. S. and Skrondal, A. 2010: Cambridge dictionary of statistics, 4th edition. Cambridge: Cambridge University Press. Goetgeluk, S., Vansteelandt, S. and Goetghebeur, E. 2008: Estimation of controlled direct effects. Journal of the Royal Statistical Society, Series B 70, 1049-66. Horvitz, D. G. and Thompson, D. J. 1952: A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 663-85. Robins, J. M., Rotnitzky, A. and Zhao, L. P. 1994: Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846-66. Robins, J. M., Sued, M., Lei-Gomez, Q. and Rotnitzky, A. 2007: Performance of double-robust estimators when 'inverse probability' weights are highly variable. Statistical Science 22, 544-59. Sato, T. and Matsuyama, Y. 2003: Marginal structural models as a tool for standardization. Epidemiology 14, 680-6. Scharfstein, D. O., Rotnitzky, A. and Robins, J. M. 1999: Adjusting for non-ignorable drop-out using semiparametric non-response models. Journal of the American Statistical Association 94, 1096-120. Tan, Z. 2008: Understanding OR, PS and DR. Statistical Science 22, 560-8.
J

Jaccard coefficient
See CLUSTER ANALYSIS IN MEDICINE

Jackknife method
The jackknife was originally proposed as a method for estimating biases (Quenouille, 1949); soon afterwards it was applied to estimating standard errors. Subsequently it has been applied more widely. In particular, the 'jackknife after bootstrap', discussed later, is a useful tool for understanding the influence of individual observations on BOOTSTRAP analyses. Here we motivate the jackknife using a hypothetical example and illustrate it using a real dataset. We then briefly discuss its relationship to the bootstrap and 'jackknife after bootstrap' analyses.

Suppose we wish to estimate the average adult height in a population, which we denote by $\theta$. To do this, we take a sample of seven adults from the population and calculate the average height in the sample. Let the first row of the figure represent the population, from which seven adults are sampled. Note that the numbers 1, ..., 7 index the sampled adults; they are not the actual heights. This sample is represented in the box in the second row. The average height of the seven adults in this sample, denoted by $\hat\theta$, is our estimate of $\theta$.

Suppose we now wish to calculate an estimate of the variability of $\hat\theta$ about $\theta$. Hypothetically, we could do this by drawing B further samples, also of size 7, from the population and calculating the average adult height in each of these. Of course, this is impractical. However, it suggests an alternative approach. Suppose we draw B subsamples from the data and calculate the average height in each of these subsamples. The variation of the average height in these subsamples about the value in the observed data might be a reasonable estimate of the variation of $\hat\theta$ about the true adult height, $\theta$.

One method of constructing these subsamples is the jackknife. The jackknife datasets are constructed by omitting one observation in turn from the observed dataset. Thus if the dataset has seven observations, there are seven jackknife datasets. These are shown in the third row in the figure. Having obtained the jackknife datasets, we simply calculate the average height of the six adults in each jackknife dataset. By convention, these are denoted $\hat\theta_{(1)}, \ldots, \hat\theta_{(7)}$, where the subscript number indicates the observation that is omitted. Recall that we are interested in estimating the variability of $\hat\theta$. Let the average of the jackknife estimates be $\hat\theta_{(\cdot)} = (\hat\theta_{(1)} + \hat\theta_{(2)} + \cdots + \hat\theta_{(7)})/7$ and let n equal the sample size (7 in this case). The jackknife estimate of variance is:

$$v_{\mathrm{jack}} = \frac{n-1}{n} \sum_{i=1}^{n} \left(\hat\theta_{(i)} - \hat\theta_{(\cdot)}\right)^2 \qquad (1)$$

[Figure: jackknife method. Three rows: the population (parameter value $\theta$); the sample of seven adults (parameter estimate $\hat\theta$); and the seven jackknife samples, each omitting one observation (parameter estimates $\hat\theta_{(1)}, \ldots, \hat\theta_{(7)}$).]
(Note, in passing, that by convention the jackknife estimate of variance differs from the bootstrap estimate by a factor of (n - 1)/n, but this is negligible unless n is small.) Thus the jackknife is motivated by an analogous version of the bootstrap principle, in which jackknife datasets replace bootstrap datasets. However, in comparison with bootstrap datasets, jackknife datasets are much less variable. Therefore, extra multiplying factors, like n - 1 in the numerator of equation (1), are needed to make the jackknife work.

As an example, consider the data in the first table. We will use the jackknife to estimate the variance of the average change in the carbon monoxide transfer factor, which is

(33 + 2 + 24 + 27 + 4 + 1 - 6)/7 = 12.14.

Jackknife method Data on the carbon monoxide transfer factor for seven smokers with chickenpox, measured on admission to hospital and after a stay of one week (Davison and Hinkley, 1997, p. 67)

Patient   Entry   Week   Change = (Week - Entry)
1         40      73     33
2         50      52      2
3         56      80     24
4         58      85     27
5         60      64      4
6         62      63      1
7         66      60     -6

The seven jackknife samples, and their corresponding means, are shown in the second table. As the statistic we are using is the mean, it turns out that $\hat\theta_{(\cdot)} = (\hat\theta_{(1)} + \hat\theta_{(2)} + \cdots + \hat\theta_{(7)})/7 = \hat\theta = 12.14$. From equation (1), the jackknife estimate of variance is:

$$v_{\mathrm{jack}} = \frac{6}{7}\left\{(8.67-12.14)^2 + (13.83-12.14)^2 + (10.17-12.14)^2 + (9.67-12.14)^2 + (13.50-12.14)^2 + (14.00-12.14)^2 + (15.17-12.14)^2\right\} = 5.81^2$$

Note that the jackknife estimate of variance agrees exactly with the result obtained using the usual formula for the variance of a mean. This is a feature of the statistic used in this example (the mean) and will not occur generally.
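The calculation above can be reproduced with a short sketch. The function name `jackknife_variance` is illustrative only; the data are the seven 'Change' values from the first table.

```python
import math
import statistics

# 'Change' column from the first table (Davison and Hinkley, 1997, p. 67).
change = [33, 2, 24, 27, 4, 1, -6]


def jackknife_variance(data, statistic):
    """Return the jackknife variance estimate and the leave-one-out values."""
    n = len(data)
    # Recompute the statistic with each observation omitted in turn.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)
    return variance, leave_one_out


v_jack, leave_one_out = jackknife_variance(change, statistics.mean)

print([round(t, 2) for t in leave_one_out])
print(round(math.sqrt(v_jack), 2))               # jackknife standard error
print(round(statistics.variance(change) / 7, 4))  # usual variance of a mean
```

Passing a different `statistic` (any function of a list) gives the jackknife variance for that statistic, although the agreement with the textbook formula holds only for the mean.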
Jackknife method Jackknife samples for the 'Change' data from the first table

                         Sample values                  Statistic (mean)
Data from first table    33    2   24   27    4    1   -6    $\hat\theta$ = 12.14
1st jackknife sample      -    2   24   27    4    1   -6    $\hat\theta_{(1)}$ = 8.67
2nd jackknife sample     33    -   24   27    4    1   -6    $\hat\theta_{(2)}$ = 13.83
3rd jackknife sample     33    2    -   27    4    1   -6    $\hat\theta_{(3)}$ = 10.17
4th jackknife sample     33    2   24    -    4    1   -6    $\hat\theta_{(4)}$ = 9.67
5th jackknife sample     33    2   24   27    -    1   -6    $\hat\theta_{(5)}$ = 13.50
6th jackknife sample     33    2   24   27    4    -   -6    $\hat\theta_{(6)}$ = 14.00
7th jackknife sample     33    2   24   27    4    1    -    $\hat\theta_{(7)}$ = 15.17
Apparent similarities of the jackknife and bootstrap are indicative of a deeper relationship. It turns out that, in situations where the jackknife works, it can be viewed as an approximation to the bootstrap (Efron and Tibshirani, 1993, p. 146). The attraction of the jackknife is that it is computationally easier (as there are only a finite number of jackknife samples). The disadvantage is that it only works in more restricted situations. In particular, the jackknife only works for statistics whose value changes smoothly as the data change (Efron and Tibshirani, 1993, p. 141). The mean is an example of a smooth statistic. The MEDIAN is not, however, because as the data values change it does not change smoothly. For example, the median of the change variable in the first table is 4. Suppose we now increase the second observation from its observed value, 2. While the new value for the second observation is less than 4, the median remains unchanged. As soon as it is greater than 4, the median changes. This lack of smoothness makes the jackknife fail for the median. By contrast, the bootstrap works for the median.

A useful application of the jackknife is known as the 'jackknife after bootstrap'. Following a bootstrap analysis, the bootstrap datasets are divided into groups: those that do not contain the first observation, those that do not contain the second and so on. In other words, we form jackknife groups from (i.e. after) generating the bootstrap datasets. The analysis of the bootstrap data can then be performed on each of these jackknife groups in turn. Marked differences in the results between the jackknife groups indicate that a particular observation, or group of observations, is strongly affecting the conclusions. For example, if jackknife group 1 (i.e. the group of bootstrap datasets that do not contain the first observation) results in a P-VALUE above 0.05, but all the other jackknife groups result in P-values below 0.05, this suggests that the results critically depend on the first observation and should be interpreted cautiously. In practice, 'jackknife after bootstrap' results are usually displayed graphically (e.g. Carpenter and Bithell, 2000; Davison and Hinkley, 1997, p. 117). (For further reading, see Efron and Tibshirani, 1993, or Davison and Hinkley, 1997, both of which discuss the jackknife comprehensively.) JRC
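A minimal sketch of the 'jackknife after bootstrap' grouping follows, assuming the seven 'Change' values from the first table and using the bootstrap standard error of the mean as the analysis repeated within each group. The number of bootstrap datasets, the seed and the helper name `bootstrap_se` are arbitrary choices for illustration.

```python
import random
import statistics

change = [33, 2, 24, 27, 4, 1, -6]
n = len(change)

rng = random.Random(1997)   # arbitrary seed, for reproducibility only
B = 2000                    # number of bootstrap datasets (assumption)

# Each bootstrap dataset is stored as the list of sampled indices, so that
# we can tell afterwards which observations it does not contain.
boot = [rng.choices(range(n), k=n) for _ in range(B)]


def bootstrap_se(index_sets):
    """Standard deviation of the bootstrap means over the given datasets."""
    means = [statistics.mean([change[j] for j in idx]) for idx in index_sets]
    return statistics.stdev(means)


overall_se = bootstrap_se(boot)

# Jackknife group i: bootstrap datasets that do not contain observation i.
groups = {i: [idx for idx in boot if i not in idx] for i in range(n)}
se_without = {i + 1: bootstrap_se(g) for i, g in groups.items()}

print(f"all bootstrap datasets: SE = {overall_se:.2f}")
for obs, se in se_without.items():
    print(f"without observation {obs}: SE = {se:.2f} "
          f"({len(groups[obs - 1])} datasets)")
```

A group whose result differs markedly from the others points to an influential observation; no new resampling is needed, because the groups reuse the bootstrap datasets already drawn.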
Jonckheere-Terpstra test Mcm-2 values in a breast cancer study according to increasing histological grade

Histological grade
1         2         3
1.99      4.40      6.94
3.01      9.12      8.01
4.17      10.23     9.12
7.13      11.99     15.75
9.12      11.99     17.30
9.91      12.37     25.01
          13.20     26.40
                    28.17
(Acknowledgement: James R. Carpenter was supported by ESRC Research Methods Programme grant H333 150047, titled 'Missing data in multi-level models'.)

Carpenter, J. and Bithell, J. 2000: Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statistics in Medicine 19, 1141-64. Davison, A. C. and Hinkley, D. V. 1997: Bootstrap methods and their application. Cambridge: Cambridge University Press. Efron, B. and Tibshirani, R. 1993: An introduction to the bootstrap. New York: Chapman & Hall. Quenouille, M. 1949: Approximate tests of correlation in time series. Journal of the Royal Statistical Society, Series B 11, 11-44.
Jonckheere-Terpstra test
This is a nonparametric test for ordered alternatives with the null hypothesis that there is no difference in group MEDIANS and the alternative hypothesis that the group medians increase in a specific predetermined sequence. It is used when the assumption that the independent variable is nominal in the KRUSKAL-WALLIS TEST is violated. As it allows the independent variable to have an order, it is more powerful than the Kruskal-Wallis test when the groups are ordered. The method begins by specifying the order of the groups, which need not be of equal size. Then cast the data into a two-way table with the groups in the prespecified order, with the group with the lowest median first and the data within the groups ordered from the smallest to the largest. Find the total number of times each value in the first group precedes a value in the subsequent groups. Add 1/2 to each precedent count when a tie occurs. Repeat for the remaining groups and sum over the groups to give J, the test statistic. Compare this value to that found in standard tables, for example, in Siegel and Castellan (1998). To illustrate, Mcm-2 values were collected in a breast cancer study. The median Mcm-2 value was expected to increase with histological grade (see the data in the first table). This hypothesis was tested using the Jonckheere-Terpstra test, with intermediate calculations shown in the second table.
Jonckheere-Terpstra test Derivation of the Jonckheere-Terpstra test statistic from data in first table

Precedent counts
         Grades 1 and 2   Grades 1 and 3   Grades 2 and 3
         7                8                8
         7                8                5.5
         7                8                5
         6                7                5
         5.5              5.5              5
         5                5                5
                                           5
Total    37.5             41.5             38.5
Therefore, J = 37.5 + 41.5 + 38.5 = 117.5. From tables ($n_1 = 6$, $n_2 = 7$, $n_3 = 8$, $\alpha = 0.05$) the critical value is 99. As 117.5 > 99, there is sufficient evidence to reject the null hypothesis and conclude that there is a significant increase in the median Mcm-2 value as the histological grade increases. For further details see Conover (1999). SLV

Conover, W. J. 1999: Practical nonparametric statistics, 3rd edition. Chichester: John Wiley & Sons, Ltd. Siegel, S. and Castellan, N. J. 1998: Nonparametric statistics for the behavioral sciences, 2nd edition. Maidenhead: McGraw-Hill.
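The precedent counts in the Jonckheere-Terpstra entry above can be checked with a brief sketch. Two caveats: the function names are illustrative, and two data values (marked ASSUMED in the comments) are not clearly legible in the source table, so they are set to values consistent with the published precedent counts. Because only the ordering of values matters, any values preserving that ordering give the same J.

```python
from itertools import combinations

# Mcm-2 values by histological grade (first table of the entry).
grade1 = [1.99, 3.01, 4.17, 7.13, 9.12, 9.91]
grade2 = [4.40, 9.12, 10.23, 11.99, 11.99,
          12.37,   # ASSUMED: illegible in the source
          13.20]
grade3 = [6.94, 8.01, 9.12, 15.75,
          17.30,   # ASSUMED: illegible in the source
          25.01, 26.40, 28.17]


def precedent_count(first, later):
    """Number of (a, b) pairs with a before b; ties count 1/2."""
    total = 0.0
    for a in first:
        for b in later:
            if a < b:
                total += 1.0
            elif a == b:
                total += 0.5
    return total


def jonckheere_statistic(groups):
    """Sum of precedent counts over all ordered pairs of groups."""
    return sum(precedent_count(groups[i], groups[j])
               for i, j in combinations(range(len(groups)), 2))


J = jonckheere_statistic([grade1, grade2, grade3])
print(precedent_count(grade1, grade2),
      precedent_count(grade1, grade3),
      precedent_count(grade2, grade3),
      J)
```

The groups must be supplied in the prespecified order of the alternative hypothesis; the critical value still has to come from published tables or a separate approximation.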
Journals in medical statistics
There are thousands of published articles on medical statistics scattered very widely throughout the statistical and the biomedical literature. They reflect both the diversity and the complexity of this discipline and range from highly theoretical to the most mundane practical applications. This section provides a brief overview of this literature, together with an historical perspective of the rise of journals in medical statistics. Papers on aspects of medical statistics have been published in the biomedical literature since the late 1920s. Their purpose is to explain and illustrate specific techniques, to
~1cIII"'" .kJumIIJs,.,.,.", . . . . it i!fedJt:tII statIsIIcs 161 WII.
TIlle
iDOJ
I'IIbIIIII6r Vol....
I'tIga
(~)
"".,..".,
~"'1Ii
1901 1904
Bialnelrib'Dust AM
90(4) 129(6) 157n5l(24)
"...
.
iODI
Vdunw (
'2)56
II 41 214
95(4) 134.(6) 1671168(24)
9M
m
1'tIp8
"..,..
lGOB
75
.,)
964
4~
'2981
354
.A.rn. JIIIIIUI #1/
lui
OUP
~
1936
61(4)
616
31
73(4)
194
50
59(4) 57 (12)
II. 996
121
JDIIntIiJ #1/
1945 1947
~~ .Sacicty lIS BMI
·204
64(4) 62 (12)
1_ il04
138 206
Bril<·1fIIInMII1l/"...
1947
BPS
56(2)
J8I
21
61(2)
m
26
1955
ElICYicr
·56(12)
1-
166
61 (12)
l_
In
1959 I_
W-lley-VCH
45(8) 1Jfl4(12)
1042
'J2
50(6)
ID
~(l3)
Ie.
121
1964
EIsnicr
1.7 1.7
DIA
450 626
46
DuIIri-......
.J7(4) .41 (12)
a6
"".,..,.". JIIIIIIIIIl tI/
1972'
OUP
32(6)
1134
~.""'" ~c. . .
I_
1979
OUP EIInicr
25(1)
24(.8)
91 904 ..
22(24) 13 (II) 14(7) 12(4)
J932 742 1011 221
~.,
1Ip"""""-
,.,.",~
.c~ H«IhIPI
iMiittiJ _ St.tutiall . pgcWtid4i
MItirt4 II/CIiIkIIl .",.".,.".,
...".,.,I"""
a-.w~..,.."b· CIiIIif:tII.'IHtIlr ~, ",." ."",. "".,.".,. JIIIIIIIIIl tJf
.I'" ,,,,.,,,.,.,
~~~)
79
1110
256'
42(6) 46(12)
640 e62
6f 76
140
37(7)
I_
196
9
30(1) 19(6)
1'71 910
10
•
247 100 125 20
'Z1 (30) 18 (12) 19(7) 17(6)
6U4
.4(D
.946 I_
121 125
:JI4
39
~",..",.,.,,-
.,.."a~.
1WGD: Dfo~._. . .
.
-~
SIIIIl6Ib ii
"Mit"
Aaaa&fI/~'" ~
"".,..,.". JtIIInfIIl tI/
'lI"'iI~
1912 1990 1990 1991
w-., EIInicr
Kluwer
WIIIIr(..,)
'"
RamtrIt'
~D/~ktII
1.1
.,..,(nylar &
13(4)
116
54
11(6)
1236
81
1992
Fnacis) AIaDId(_)
12(6)
SS4
30
17(6)
661
41
ICIuwer (SJritpr)
9(4)
412
22
14(4)
S10
32
iloMed CcnImI
650
2 44 'Z1
-(12.) 9(4) 1(1)
786
56 80
324 lOI
17 24
8(4) 7(4)
402 lOI
18 29
.Sltllillia
SltdIIInllI....·• ~ c",~
iI'ri- DtIItI AlMlpu
""""'II/~~1kJ'" .
1"5 1-
1IIoIbIc.tmJ(IIMC) ~ ,."",iltkl
2000 2000
OUP
IlIaMNc..InII fllllCJ
2001
B....,.Ce~
-(2) 4(4) 3'(1)
2001
AnIDId(SRp) Wiky
3(4) 2(4)
75
~1lamrdI
~
SItIIl6lit:tIl . . . . """"-"'IiNl JltlIilIIia
200i
Tille
UntkrJllIIIIliIrg S'a',,'ia
19/1'01.
2002
Publisher
200J
Lawreac:e
2008
Volume (i£SUes,
Pages
Papers
2(4)
280
16
Volume (imle8,
Pages
Papers
642
66
Erllleum CliRittil Tritlb: JOIITfItlI ofthe S«iety/or Clinital TritIls Ep;demiologie Persperlirer aru/ IIUfOI'OIions Emerging 'Thmra ill Epitltmiology 1,,'trlftltiOlltlI JtHmfQl of BioslaJislies
2004
Arnold (Sage)
S (6)
2004
BioMcd CCDIraI
- (5)
B
2004
BioMcd Cenbal
- (8)
2S
20DS
Berkeley Eieo-
4 (I)
21
bailie PEas
Paper counts for some journals are approximate as some Editorials and Commentaries have been excluded. (1) previously American Journal of Hygiene; Journal of Hygiene; (2) previously Biometrics Bulletin; (3) previously British Journal of Social Medicine then British Journal of Preventive and Social Medicine; Journal of Epidemiology and Community Medicine; Epidemiology and Community Health; (4) previously British Journal of Psychology: Statistical Section then British Journal of Statistical Psychology; (5) previously Journal of Chronic Diseases; (6) previously Clinical Trials Journal and from 1995 incorporated within Controlled Clinical Trials; (7) previously Drug Information Bulletin; (8) previously International Journal of Clinical Pharmacology and Biopharmacy; International Journal of Clinical Pharmacology and Therapeutics; International Journal of Clinical Pharmacology, Therapy and Toxicology; (9) previously Controlled Clinical Trials: Design, Methods, and Analysis; (10) formerly Journal of Cancer Epidemiology and Prevention.
point out incorrect applications and to make the readership generally aware of how, and why, statistics can lead to poor experimental designs, incorrect analysis and unjustified conclusions. Publication within the biomedical (instead of statistical) literature enables direct contact with researchers and greater impact through explanation within a specific medical context. These papers can be grouped broadly into seven areas (see Johnson and Altman, 2005, for more details): isolated papers on a particular statistical issue (e.g. the CHI-SQUARE TEST and INTENTION-TO-TREAT analysis); series of thematic papers dedicated to a narrow statistical area (e.g. systematic reviews and CONFIDENCE INTERVALS); series of papers covering broad areas of medical statistics (such as that by Bradford Hill published in The Lancet in 1937); guidelines (recent examples are CONSORT and its extensions for reporting clinical trials and STROBE for observational studies, all being available at www.equator-network.org); surveys of published papers reporting the frequency of usage of statistical techniques (and the statistical knowledge required to understand the research literature); reviews of published papers examining critically aspects of design, analysis, conduct, presentation and summary (all general medical journals, and many of the specialist ones, have been the subject of these); and SYSTEMATIC REVIEWS AND META-ANALYSIS incorporating assessment of methodological quality.

Papers on the general theory of techniques employed within medical statistics, as well as specific applications, have been published for over a century in statistical journals such as Journal of the American Statistical Association and those produced by the Royal Statistical Society, London. However, the origins of medical statistics lie in the older disciplines of biometry, psychometry, epidemiology, demography and actuarial science, and it is the journals in these specialities that published many of the early papers in medical statistics; some of them continue to do so today (see the table for some examples). Biometrika (from the Biometrika Trust) publishes papers 'containing original theoretical contributions of direct or potential value in applications', while Biometrics (the journal of the International Biometric Society) 'promotes and extends the use of mathematical and statistical methods in pure and applied biological sciences'. The Biometrical Journal, also published over many years, aims for papers 'on the development of statistical and related methodology and its applications to problems arising in all areas of the life sciences, in particular medicine'. The three psychology journals, Psychological Bulletin (from the American Psychological Association), Psychometrika (from the Psychometric Society) and the British Journal of Mathematical and Statistical Psychology (from the British Psychological Society), all publish articles on the development of quantitative methods in psychology. The principal applications of medical statistics are within EPIDEMIOLOGY and CLINICAL TRIALS and it was the journals within these specific areas that next took up publication of
papers on the broadest aspects of medical statistics, as it applied within each of them. The table provides a list of the principal (English language) journals in these areas together with some details regarding the size of the volumes published in 2003 (from the 1st edition of this volume) and 2008. All the epidemiological and clinical trials journals publish papers describing methodological developments (roughly 5% of their content). Such papers are also found in the more specialised journals devoted to specific disease areas, e.g. the British Journal of Cancer and the International Journal of Cancer.

It was not until the 1980s that the discipline of medical statistics was finally recognised by publication of its own journals. Statistics in Medicine, which 'presents practical applications of statistics and other quantitative methods to medicine and its applied sciences, together with all aspects of the collection, analysis, presentation, and interpretation of medical data and seeks to enhance communication between statisticians, clinicians and medical researchers', started in 1982 and was soon followed by the review journal, Statistical Methods in Medical Research. The launch of these journals coincided with a period of huge increase in both the demand for medical statisticians and in the development of sophisticated modelling techniques, enabled particularly by increased computing power and fuelled by the need to forecast the requirements for future healthcare and the extent of the AIDS epidemic. By the turn of the century, the capacity of the current journals could not forestall further specialisation, represented by Pharmaceutical Statistics, 'concerned with the application and use of statistics in all stages of drug development', and Statistical Modelling, or the need for another journal of biostatistics and another for clinical trials. This expansion has continued since 2003 with the launch of four new journals as well as expansion in numbers of pages and papers in most of the journals listed in the table.

Medical statistics interacts with all quantitative areas of biomedicine and many qualitative areas as well. It is no surprise that in addition to the publications mentioned already, many papers are also found in established journals such as Medical Decision Making, Journal of Theoretical Biology and Multivariate Behavioral Research, as well as those that lie at the interfaces with computing (Computers in Biology and Medicine, Journal of Biomedical Computing and Statistics and Computing) and mathematics (Computational and Mathematical Methods in Medicine), artificial intelligence, quality of life and health economics, as well as the comparatively recent areas represented by Evidence-Based Medicine and Journal of Bioinformatics. Yet more journals will appear in the future, some as paper, others electronically, and with the welcome introduction of an important policy of open access. TJ

Johnson, T. and Altman, D. G. 2005: Medical journals, statistical articles. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics, 2nd edition. Chichester: John Wiley & Sons, Ltd, pp. 3151-3162.
K

Kaiser's rule
This is a rule for selecting the number of common factors in FACTOR ANALYSIS and the number of components in PRINCIPAL COMPONENTS ANALYSIS. The rule is to retain as many factors or components derived from the sample correlation matrix as have VARIANCES greater than one. The rationale for this rule is that factors (components) with variances greater than one account for at least as much variability as can be explained by a single observed variable. Those factors (components) with variances less than one account for less variability than does a single observed variable and so will usually be of little interest to the investigator. Further discussions of the rule are given in Floyd and Widaman (1995) and Preacher and MacCallum (2003). BSE

Floyd, F. J. and Widaman, K. F. 1995: Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment 7, 286-99. Preacher, K. J. and MacCallum, R. C. 2003: Repairing Tom Swift's electric factor analysis machine. Understanding Statistics 2, 13-44.

Kaplan-Meier estimator
Also known as the product limit estimator, it is one of the most commonly reported methods for describing (estimating) the survival distribution or survivor function (see SURVIVAL ANALYSIS - AN OVERVIEW), S(t), of a homogeneous population in time-to-event (survival) studies with right-CENSORED OBSERVATIONS. Kaplan and Meier introduced it in 1958 as a nonparametric method for using the exact survival (or event) times of those subjects uncensored in the calculation of the survival (or event-free) probabilities. The method takes appropriate account of the information contained in censored observations through the definition of the number at risk. It requires, however, that the assumption of independent censoring hold for it to be applied validly.

The Kaplan-Meier estimator is based on the partitioning of the time during which subjects are observed into a sequence of nonoverlapping time bands (or time intervals), where each time band contains only one distinct (unique) uncensored event time and this event time is taken to occur at the start of the interval. If censored and uncensored times are tied, the convention is to consider the uncensored times to have occurred just before the censored times. The Kaplan-Meier estimate for the probability of surviving up to time t, S(t), is then given by the product of CONDITIONAL PROBABILITIES for surviving each of these time intervals (given survival through the preceding time intervals) before or including t.

More precisely, suppose that there are n subjects drawn randomly from the population of interest, with observed survival times (both uncensored and censored) $t_1, \ldots, t_n$. Assume that there were d events occurring and that the r unique event times corresponding to these events can be arranged in ascending order as $t_{(1)} < t_{(2)} < \cdots < t_{(r)}$. Let the kth time band, starting from $t_{(k)}$ and ending at a time just before $t_{(k+1)}$, be written $I_k = [t_{(k)}, t_{(k+1)})$. Also let $m_k$ and $d_k$ be the number of subjects at risk just before $t_{(k)}$ (i.e. at the beginning of the kth time band) and the number who experience the event of interest at $t_{(k)}$ (i.e. the number of events occurring in the kth time band) respectively. Then, the probability of surviving up to time t is estimated as:

$$\hat{S}(t) = \prod_{j=1}^{k} \frac{m_j - d_j}{m_j}$$

for t lying within the kth time band, $I_k$. $\hat{S}(t)$ is defined to be equal to 1 for any time less than the smallest uncensored time, $t_{(1)}$. If the largest observed survival time is an uncensored observation, then for all t greater than or equal to $t_{(r)}$, $\hat{S}(t) = 0$. However, if the largest observed time is a censored observation, then the Kaplan-Meier estimate is not defined past this censored time. Observe also that the probability of 'surviving' remains constant throughout any given interval. Finally, note that standard errors based on a formula due to Greenwood can be attached to the Kaplan-Meier estimates of the survival probabilities so as to reflect the uncertainty inherent in them.

The information obtained from the Kaplan-Meier estimator can be displayed in tabular format or graphically as a KAPLAN-MEIER PLOT. The second table illustrates the use of the Kaplan-Meier estimator for the data in the first table describing a hypothetical study of 10 patients followed up for 365 days from diagnosis of small-cell lung cancer to death, if it has occurred. The notation $c_k$ denotes the number of censored observations in the kth time band. Observe in the second table that the estimate $\hat{S}(t)$ in the kth interval is obtained by multiplying the estimate of the conditional probability obtained in the kth interval by the estimate of S(t) in the (k-1)th interval. For further details see Collett (2003). BT

Kaplan-Meier estimator Survival times (days) of 10 small-cell lung cancer patients

Survival times (days)              9   35   40   63   110   151   212   284   305   365
Status (1 = dead; 0 = censored)    1    0    0    1     1     1     1     0     1     0

Kaplan-Meier estimator Kaplan-Meier estimate of the survivor function for the data in the first table

Interval   Time band   Number at    Number of    Number         Conditional probability   Probability of      Standard
k          I_k         risk m_k     events d_k   censored c_k   of surviving              surviving S(t)      error
1          0-          10           0            0              1.000                     1.000               0
2          9-          10           1            2              0.900                     0.900               0.0949
3          63-          7           1            0              0.857                     0.771               0.1442
4          110-         6           1            0              0.833                     0.643               0.1679
5          151-         5           1            0              0.800                     0.514               0.1769
6          212-         4           1            1              0.750                     0.386               0.1732
7          305-         2           1            1              0.500                     0.193               0.1615

Collett, D. 2003: Modelling survival data in medical research, 2nd edition. Boca Raton: Chapman & Hall/CRC. Kaplan, E. L. and Meier, P. 1958: Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53, 457-81.

Kaplan-Meier plot
A popular way of displaying the survivor information obtained from the KAPLAN-MEIER ESTIMATOR, the Kaplan-Meier plot or curve, is a graph in which the horizontal axis represents the survival (or event) times and the vertical axis represents the survival (or event-free) PROBABILITIES. The plot starts from 1 (100% of subjects event free) at time 0 and declines towards 0 (all subjects have experienced the event of interest) with increasing time. It is plotted as a step function, since the survival probability remains constant between successive (uncensored) event times and only drops instantaneously at the time an event occurs. The graph only reaches 0 if the subject with the longest observed survival time experiences an event; otherwise, the survival curve is undefined after this time. Note that CENSORED OBSERVATIONS occurring in the interval between two successive event times have no effect on the survival probability calculated for this interval, but will have an impact on the calculation of the survival probability at the start of the next interval through the number at risk. The figure shows an example of a Kaplan-Meier curve.

[Figure: Kaplan-Meier plot. Kaplan-Meier curve with confidence bands; horizontal axis: survival times (days); vertical axis: survival probability; median time = 352 days marked. The tick marks correspond to the occurrence of censoring.]

The Kaplan-Meier curve is the best description of the survival experience of a homogeneous group of subjects in a study. The reading of the curve is straightforward. For example, if the estimated MEDIAN survival time (see SURVIVAL ANALYSIS - AN OVERVIEW) is to be reported (i.e. the time beyond which 50% of the subjects in the study are expected to survive or be event free), this can be found by extending a horizontal line from the survival probability of 0.5 on the vertical axis to the point where the curve and this line intersect, and then dropping a vertical line to the time axis. The estimated median time is the point at which the vertical line cuts the time axis. This is illustrated in the figure. If, however, the survival curve is horizontal at the probability 0.5, then no unique value can be identified for the median survival time. In this situation, a reasonable estimate of the median time is the midpoint of the time interval over which the survival curve has a probability of 0.5.

Note that the survival curve shows the pattern of mortality (if death is the outcome of interest) over time and not the details. Thus any conclusions made based on the finer details of the curve are likely to be inaccurate. In particular, if the 'tail' of the survival curve is 'flat', then this does not mean that the risk to subjects still alive is nonexistent (i.e. evidence of a 'cured fraction' of patients). In fact, this may occur because the number of subjects under observation along the tail is small and therefore reliable estimates of the survival probabilities along it are not obtained. Also, drastic drops and large flat sections in other parts of the curve may be due to a large proportion of censored observations and may indicate inappropriate censoring. Hence the survival curve is unreliable if based on small numbers at risk. There is also a natural tendency for the eye to be drawn to the right-hand end of the curve, where it is least reliable. Therefore it is wise not to place too much confidence in the finer details of the curve, unless there is a valid reason (based possibly on prior knowledge) to do so. The overall picture is more reliable.

Pocock, Clayton and Altman (2002) advocate placing confidence bands (or standard error bars) at a few regularly spaced time points on the curve in order to highlight the uncertainty in the estimated survival probabilities. In addition, they recommend that the number at risk at select time points should also be displayed. They further discuss whether 'survival curves' should go up or down (for it is sensible when events are rare to consider plotting the event rate instead of the event-free rate) and how far in time to extend the plot.

Finally, note that presenting more than one survival curve on a single diagram (e.g. curves for treated and untreated) is a useful way of informally comparing the survival experiences of different groups of homogeneous subjects and can be very informative. However, this diagram alone will not allow us to say with any confidence whether or not there are any real differences between these groups. The observed differences may be true differences, but equally could be due merely to chance variation. Assessing whether or not there are any real differences between groups can only be done, with any degree of confidence, by utilising formal statistical tests, such as the log rank test (Armitage, Berry and Matthews, 2002). For further details see Matthews and Farewell (1996) and Collett (2003). BT

Armitage, P., Berry, G. and Matthews, J. N. S. 2002: Statistical methods in medical research, 4th edition. Oxford: Blackwell Science. Collett, D. 2003: Modelling survival data in medical research, 2nd edition. London: Chapman & Hall/CRC. Matthews, D. E. and Farewell, V. T. 1996: Using and understanding medical statistics, 3rd edition. New York: Karger. Pocock, S. J., Clayton, T. C. and Altman, D. G. 2002: Survival plots of time-to-event outcomes in clinical trials: good practice and pitfalls. The Lancet 359, 1686-9.
kappa and weighted kappa
Rating scales are commonly used for assessment of subjective variables, such as well-being, pain and satisfaction. Experts use scales in diagnostic judgements concerning, for example, the severity of disease and in the classification of physical, mental and social status. How reliable are judgements on scales? Reliability expresses the extent to which repeated assessments yield similar results. Cohen's coefficient kappa ($\kappa$) is an agreement measure that takes into account the fact that two assessments could agree by chance (Cohen, 1960). Kappa is defined as the ratio between the percentage agreement ($p_o$) after excluding the chance expected agreement ($p_e$) and the chance-corrected maximum possible agreement:

$$\kappa = (p_o - p_e)/(1 - p_e)$$
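The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `cohen_kappa` and the 2 x 2 table of counts are hypothetical, and the chance-expected agreement is taken as the sum over categories of the product of the two raters' marginal proportions.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater 1, columns: rater 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed proportion of agreement: the diagonal of the table.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the marginal distributions.
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical counts, for illustration only: p_o = 0.7 and p_e = 0.5.
example = [[20, 5],
           [10, 15]]
kappa = cohen_kappa(example)
```

With these hypothetical counts the two raters agree on 70% of subjects, half of which would be expected by chance, so only the remaining agreement counts towards kappa.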
Possible values range between -1 and 1, where the agreement that could occur by chance is zero kappa and a higher agreement than the chance expected will yield a positive value. Kappa was developed for categorical classifications, where a disagreement to any other category is equally likely, but is commonly used for ordered categorical (ordinal) classifications, where disagreeing classifications concern adjacent rather than extreme categories. In the weighted kappa ($\kappa_w$) all observations are weighted and included in the calculation. Cicchetti (1976) proposed linear agreement weights: maximum weight (= 1) for observations of total agreement and minimum weight (= 0) for the most extreme disagreeing observations. For example, observations representing a disagreement of one, two and three categories between two four-point scale assessments are weighted 2/3, 1/3 and 0, respectively. A set of disagreement weights will have the opposite order. The $\kappa_w$ with quadratic disagreement weights equals the INTRACLASS CORRELATION COEFFICIENT, provided there are equal marginal distributions (unbiased raters).

There are limitations with kappa. The maximum kappa, $\kappa = 1$, is obtainable only when unbiased raters agree completely. Kappa depends on the marginal distributions and on the number of categories; kappa increases when the number of categories decreases. Therefore, kappa values from different studies are not comparable, and using rules for interpretation saying that kappa larger than 0.6 represents good agreement is not meaningful. The weighted kappa also depends on the choice of weights (Maclure and Willett, 1987; Altman, 2000).

Two disagreement patterns inspired by data from a reliability study in neuroradiology illustrate the limitations of a summary measure of reliability. Two experts, X and Y, independently judged 59 objects on a four-point scale, here labelled A, B, C and D. Two hypothetical frequency distributions of the pairs of judgements are given in the figures (parts (a) and (b)). The disagreement patterns differ, but the percentage agreements are similar: 75% (44 of 59) in

[Figure, parts (a) and (b): cross-classification of the assessments by Expert X (columns A, B, C, D, Total) against those of Expert Y; column totals 4, 9, 12 and 34, n = 59]
kappa and weighted kappa Two examples of paired distributions of assessments made by experts X and Y when classifying 59 subjects in the ordered-scale categories A, B, C and D

(See also MATCHED PAIRS ANALYSIS, ORDINAL DATA, RELIABILITY, RATING SCALES)
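The linear agreement weights described in this entry can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the weights are taken as 1 - |i - j|/(k - 1) for a k-point scale (which reproduces the 2/3, 1/3 and 0 quoted for a four-point scale), and the 4 x 4 table of counts is invented for illustration and is not the data of the figure.

```python
def linear_agreement_weights(k):
    """Agreement weights: 1 on the diagonal, 0 for the most extreme disagreement."""
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def weighted_kappa(table, weights):
    """Weighted kappa for a square table of counts and a matrix of agreement weights."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed agreement: every cell contributes according to its weight.
    p_o = sum(weights[i][j] * table[i][j] for i in range(k) for j in range(k)) / n
    # Weighted agreement expected by chance from the marginal totals.
    p_e = sum(weights[i][j] * row_totals[i] * col_totals[j]
              for i in range(k) for j in range(k)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


weights4 = linear_agreement_weights(4)

# Hypothetical four-point-scale counts, for illustration only.
hypothetical = [[5, 1, 0, 0],
                [1, 6, 2, 0],
                [0, 2, 9, 1],
                [0, 0, 1, 12]]
kappa_w = weighted_kappa(hypothetical, weights4)
```

With only two categories the linear weights reduce to 1 on the diagonal and 0 elsewhere, so the weighted and unweighted coefficients coincide.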
Altman, D. G. 2000: Practical statistics for medical research. Boca Raton: Chapman & Hall/CRC.
Cicchetti, D. V. 1976: Assessing inter-rater reliability for rating scales: resolving some basic issues. British Journal of Psychiatry 129, 452-6.
Cohen, J. 1960: A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37-46.
Maclure, M. and Willett, W. C. 1987: Misinterpretation and misuse of the kappa statistic. American Journal of Epidemiology 126, 161-9.
Svensson, E., Starmark, J.-E., Ekholm, S., von Essen, C. and Johansson, A. 1996: Analysis of inter-observer disagreement in the assessment of subarachnoid blood and acute hydrocephalus on CT scans. Neurological Research 18, 487-94.
Kendall's tau
See CORRELATION

kernel density estimation
See DENSITY ESTIMATION

k-nearest neighbour rule (kNN)
See DISCRIMINANT FUNCTION ANALYSIS

knowledge discovery in databases (KDD)
Synonym for data mining. See also DATA MINING IN MEDICINE

knowledge-based systems
Synonym for expert systems. See EXPERT SYSTEMS

Kolmogorov-Smirnov test
This is a nonparametric, goodness-of-fit test that takes two different forms, the one-sample and the two-sample form. The one-sample test compares the observed cumulative distribution function with a theoretical cumulative distribution function, e.g. the NORMAL DISTRIBUTION. The two-sample test compares two observed cumulative distribution functions with each other. The Kolmogorov-Smirnov test assumes that the theoretical distribution is completely specified, i.e. for each observed value, the value of the theoretical distribution can be calculated. The parameters (e.g. the MEAN) of the theoretical distribution should be specified in advance and not derived from the data.

For the one-sample case the NULL HYPOTHESIS is that there is no difference between the observed and the theoretical distribution. For the two-sample case, the null hypothesis is that there is no difference between the two distributions. The alternative hypothesis can be directional or nondirectional.

For the one-sample case, the method begins by ordering the observations from smallest to largest. Then calculate the empirical distribution function S(x), which is the proportion of observations less than or equal to x. Calculate the value of the theoretical cumulative distribution function, F(x). Calculate the test statistic:

$$T = \max |F(x) - S(x)|$$
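The one-sample statistic can be sketched in Python as below. This is a minimal sketch, assuming the 20 Mcm-2 values and the hypothesised normal distribution (mean 3.5, standard deviation 1.75) of the worked example tabulated in this entry; the function name `ks_one_sample` is hypothetical, and S(x) is evaluated only at the observed values, as in that table.

```python
from statistics import NormalDist

# The 20 Mcm-2 values from the worked example in this entry.
mcm2 = [0.54, 0.97, 1.27, 1.99, 2.02, 2.42, 2.89, 3.01, 3.13, 3.26,
        3.59, 4.15, 4.17, 4.27, 4.30, 4.40, 4.68, 5.63, 6.73, 6.94]


def ks_one_sample(data, cdf):
    """Largest absolute difference between F(x) and S(x) at the observed values."""
    xs = sorted(data)
    n = len(xs)
    # S(x) at the i-th smallest observation is i / n (no tied values assumed).
    return max(abs(cdf(x) - i / n) for i, x in enumerate(xs, start=1))


t_one_sample = ks_one_sample(mcm2, NormalDist(mu=3.5, sigma=1.75).cdf)
```

The result agrees with the tabulated maximum of 0.104 to three decimal places, which is then referred to standard tables for n = 20.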
Compare this value with standard tables. This is a two-sided test with the alternative hypothesis:

$$F(x) \neq S(x)$$

There are two alternative one-sided tests and the hypotheses for these are F(x) < S(x) and S(x) < F(x), with test statistics $T_1 = \max[S(x) - F(x)]$ and $T_2 = \max[F(x) - S(x)]$, respectively.

For the two-sample case, calculate S(x) for each group and the test statistic is $T = \max|S_1(x) - S_2(x)|$; this can be compared to the tables and is a two-sided test. For a one-sided test, the test statistics are $T_1 = \max[S_1(x) - S_2(x)]$ and $T_2 = \max[S_2(x) - S_1(x)]$, depending on which distribution is expected to be greater.

As an example, Mcm-2 values are collected in a study. It is thought that they come from a normal distribution with mean 3.5 and STANDARD DEVIATION 1.75. The Kolmogorov-Smirnov one-sample test will be used to test this hypothesis. Therefore, F(x) is calculated from a normal distribution with mean 3.5 and standard deviation 1.75 (see the first table). The test statistic is shown in the last row of the first table as the maximal value of the final column for the two-sided case, with T = max|F(x) - S(x)| = 0.104. Comparing T = 0.104 to tables (n = 20, $\alpha$ = 0.05) gives the value 0.294. As 0.104 < 0.294, there is insufficient evidence to reject the null hypothesis. Therefore it can be concluded that it is plausible for these data to have come from a normal distribution with mean 3.5 and standard deviation 1.75.

Kolmogorov-Smirnov test Raw data, expected cumulative distribution function, F(x), here normal with mean 3.5, standard deviation 1.75, empirical distribution function, S(x), their difference and absolute difference

Mcm-2, x   F(x)    S(x)   (F(x) - S(x))   (S(x) - F(x))   |S(x) - F(x)|
0.54       0.045   0.05    0.045           0.005           0.005
0.97       0.074   0.10    0.024           0.026           0.026
1.27       0.101   0.15    0.001           0.049           0.049
1.99       0.194   0.20    0.044           0.006           0.006
2.02       0.199   0.25   -0.001           0.051           0.051
2.42       0.269   0.30    0.019           0.031           0.031
2.89       0.364   0.35    0.064          -0.014           0.014
3.01       0.390   0.40    0.040           0.010           0.010
3.13       0.416   0.45    0.016           0.034           0.034
3.26       0.445   0.50   -0.005           0.055           0.055
3.59       0.521   0.55    0.021           0.029           0.029
4.15       0.645   0.60    0.095          -0.045           0.045
4.17       0.649   0.65    0.049           0.001           0.001
4.27       0.670   0.70    0.020           0.030           0.030
4.30       0.676   0.75   -0.024           0.074           0.074
4.40       0.696   0.80   -0.054           0.104           0.104
4.68       0.750   0.85   -0.050           0.100           0.100
5.63       0.888   0.90    0.038           0.012           0.012
6.73       0.968   0.95    0.068          -0.018           0.018
6.94       0.975   1.00    0.025           0.025           0.025
Maximum                    0.095           0.104           0.104

In the (F(x) - S(x)) column, S is taken just below each observed value, i.e. the entry in row i is F(x_i) - S(x_(i-1)), which is where F - S is largest.

Data for a two-sample example classified into six categories are shown in the second table.

Kolmogorov-Smirnov test Frequency distributions and empirical distribution functions, their differences and absolute difference

Category   n1    n2    S1(x)   S2(x)   (S1(x) - S2(x))   (S2(x) - S1(x))   |S1(x) - S2(x)|
1          14    35    0.060   0.310   -0.250             0.250             0.250
2          43    45    0.245   0.708   -0.463             0.463             0.463
3          61     5    0.506   0.752   -0.246             0.246             0.246
4          29    10    0.631   0.841   -0.210             0.210             0.210
5          16    14    0.700   0.965   -0.265             0.265             0.265
6          70     4    1.000   1.000    0.000             0.000             0.000
Total      233   113
Maximum                                 0.000             0.463             0.463

For these data:

$$T = \max|S_1(x) - S_2(x)| = 0.463$$

As the sample size is too large for the tables, use the formula for $\alpha$ = 0.05, which is:

$$1.36\sqrt{\frac{n_1 + n_2}{n_1 n_2}}$$
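The two-sample calculation can be checked numerically. The sketch below assumes the category frequencies of the second table (group totals 233 and 113) and the usual large-sample critical value for alpha = 0.05, 1.36 times the square root of (n1 + n2)/(n1 n2); the helper name `ecdf_from_counts` is hypothetical.

```python
from itertools import accumulate
from math import sqrt

# Category frequencies for the two groups (second table of this entry).
n1 = [14, 43, 61, 29, 16, 70]
n2 = [35, 45, 5, 10, 14, 4]


def ecdf_from_counts(counts):
    """Cumulative proportions S(x) across ordered categories."""
    total = sum(counts)
    return [c / total for c in accumulate(counts)]


s1 = ecdf_from_counts(n1)
s2 = ecdf_from_counts(n2)

# Two-sided statistic: largest absolute difference between the two functions.
t_two_sample = max(abs(a - b) for a, b in zip(s1, s2))

# Large-sample critical value for alpha = 0.05.
critical = 1.36 * sqrt((sum(n1) + sum(n2)) / (sum(n1) * sum(n2)))
```

The statistic is attained in the second category, where the cumulative proportions differ most.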
As 0.463 > 0.156, there is sufficient evidence to reject the null hypothesis that the distributions are the same; therefore it is concluded that the distributions are different. Further information, including standard tables, is available in textbooks such as Pett (1997), Siegel and Castellan (1998) and Conover (1999) or in the software manual for StatXact (Mehta and Patel, 1998). SLV

Conover, W. J. 1999: Practical nonparametric statistics, 3rd edition. Chichester: Wiley & Sons.
Mehta, C. and Patel, N. 1998: StatXact 4 for Windows user manual. Cytel Software Corporation.
Pett, M. A. 1997: Nonparametric statistics for health care research. Thousand Oaks: Sage.
Siegel, S. and Castellan, N. J. 1998: Nonparametric statistics for the behavioural sciences, 2nd edition. Maidenhead: McGraw-Hill.

Kruskal-Wallis test
This nonparametric method is the extension of the MANN-WHITNEY RANK SUM TEST to more than two groups. It is more sensitive than the MEDIAN TEST as it uses the magnitude of the differences rather than the direction. It is less sensitive than the JONCKHEERE-TERPSTRA TEST if the groups have an inherent order, as the Kruskal-Wallis test looks for a difference between groups and the Jonckheere-Terpstra test looks for an increase over the groups.

The Kruskal-Wallis test is derived from one-way ANALYSIS OF VARIANCE but uses ranks rather than the actual observations. It tests if k groups are drawn from populations with the same median and assumes that the data are a randomly selected set of observations and that the data are continuous in nature and in more than two groups. It also assumes an independence of the groups and observations within a group and that the groups have similarly shaped distributions.

To use the test, rank the whole sample from the smallest to the largest. Calculate the sum of the ranks in each group, the average rank in each group, $\bar{R}_i$, and the average rank for the whole sample, $\bar{R}$. Calculate the test statistic H:

$$H = \frac{12 \sum_{i=1}^{k} n_i (\bar{R}_i - \bar{R})^2}{N(N + 1)}$$

where $n_i$ = the number of observations in group i and N = the total sample size. Compare H to the critical value of the CHI-SQUARE DISTRIBUTION with k - 1 DEGREES OF FREEDOM. Reject the NULL HYPOTHESIS if H is bigger than the critical value.

To illustrate, suppose that, within a study, age is an important predictor and so whether there is a difference in age between three groups needs to be decided. The ages are shown in the table, as well as the ranks for each observation and the sum of the ranks and the average rank in each group.

Kruskal-Wallis test Ages and their overall ranks within three groups

Group 1   Rank    Group 2   Rank    Group 3   Rank
55        18      59        25.5    53        15.5
51        13      40        3.5     61        29
60        27.5    48        11      55        18
34        1.5     56        20      58        23
45        7.5     44        6       58        23
55        18      42        5       57        21
51        13      58        23      46        9.5
60        27.5    62        30      51        13
34        1.5     59        25.5    53        15.5
45        7.5     40        3.5     46        9.5
Rank sum (average)
          135 (13.5)        153 (15.3)        177 (17.7)

$$\bar{R} = \frac{N + 1}{2} = \frac{30 + 1}{2} = 15.5$$

$$H = \frac{12 \sum n_i (\bar{R}_i - \bar{R})^2}{N(N + 1)} = \frac{12 \times (10 \times (13.5 - 15.5)^2 + 10 \times (15.3 - 15.5)^2 + 10 \times (17.7 - 15.5)^2)}{30 \times (30 + 1)} = 1.15$$

From standard tables, the critical value of the chi-square distribution with 2 degrees of freedom is 5.99, and 1.15 < 5.99. Therefore there is not sufficient evidence to reject the null hypothesis that the median age is the same in the three groups.

For further details, refer to texts by Pett (1997), Siegel and Castellan (1998) or Conover (1999). SLV
(See also MEDIAN TEST)
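The worked example for the Kruskal-Wallis test can be reproduced with a short Python sketch. It assumes the ages of the three groups as tabulated in that entry, uses mid-ranks for tied values and applies the formula for H without a correction for ties, as in the text; the function names are hypothetical.

```python
def midranks(values):
    """Map each distinct value to its rank, averaging the ranks of tied values."""
    first, count = {}, {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)      # rank of the first member of a tie block
        count[v] = count.get(v, 0) + 1
    return {v: first[v] + (count[v] - 1) / 2 for v in first}


def kruskal_wallis_h(groups):
    """Return (H, rank sums) for a list of groups, without a tie correction."""
    pooled = [v for g in groups for v in g]
    rank = midranks(pooled)
    n_total = len(pooled)
    grand_mean_rank = (n_total + 1) / 2
    rank_sums = [sum(rank[v] for v in g) for g in groups]
    between = sum(len(g) * (rs / len(g) - grand_mean_rank) ** 2
                  for g, rs in zip(groups, rank_sums))
    return 12 * between / (n_total * (n_total + 1)), rank_sums


# Ages in the three groups of the worked example.
group1 = [55, 51, 60, 34, 45, 55, 51, 60, 34, 45]
group2 = [59, 40, 48, 56, 44, 42, 58, 62, 59, 40]
group3 = [53, 61, 55, 58, 58, 57, 46, 51, 53, 46]

h, rank_sums = kruskal_wallis_h([group1, group2, group3])
```

The rank sums and the value of H returned here match those quoted in the worked example, and H is then compared with 5.99, the 5% critical value of the chi-square distribution with 2 degrees of freedom.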
Conover, W. J. 1999: Practical nonparametric statistics, 3rd edition. Chichester: John Wiley & Sons, Ltd.
Pett, M. A. 1997: Nonparametric statistics for health care research. Thousand Oaks: Sage.
Siegel, S. and Castellan, N. J. 1998: Nonparametric statistics for the behavioural sciences, 2nd edition. Maidenhead: McGraw-Hill.
kurtosis
This is a term that describes the 'peakedness' of a FREQUENCY DISTRIBUTION or a PROBABILITY DISTRIBUTION. Data that have high kurtosis have a sharp peak and decline rapidly at either side to leave long tails (part (a) in the figure). Data with low kurtosis are flatter (part (b) in the figure). Mathematical measures of kurtosis can be used to describe distributions. The kurtosis of a NORMAL DISTRIBUTION is 3 and of a UNIFORM DISTRIBUTION is 1.8. However, some formulae for kurtosis subtract 3, so that the kurtosis of a normal distribution is 0 and that of a uniform distribution is -1.2. SRC
(See also DESCRIPTIVE STATISTICS)

[Figure: histograms of data with (a) high kurtosis and (b) low kurtosis]
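The two conventions for kurtosis can be illustrated with a small Python sketch. It assumes the moment-based measure, the fourth central moment divided by the squared variance, which the entry does not spell out; the function name `kurtosis` and the evenly spaced grid standing in for a uniform distribution are illustrative choices.

```python
def kurtosis(values, excess=False):
    """Fourth central moment divided by the squared variance (minus 3 if excess)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    k = m4 / m2 ** 2
    return k - 3 if excess else k


# An evenly spaced grid on (0, 1) approximates a uniform distribution.
uniform_grid = [(i + 0.5) / 10000 for i in range(10000)]

k_uniform = kurtosis(uniform_grid)
k_uniform_excess = kurtosis(uniform_grid, excess=True)
```

The two results differ by exactly 3, which is the only difference between the conventions described above.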
L

L'Abbé plots
See FOREST PLOT

large simple trial (LST)
See MEGA-TRIAL

lead time bias
See BIAS, SCREENING STUDIES
"........-yoll1lO deslg..
See DATA-DEPENDENT DESIGNS

latent variables
These are variables that cannot be measured directly but are assumed to be related to a number of observable or manifest variables. Consider, for example, a concept such as 'racial prejudice'. Clearly, direct measurement of this concept is not possible; however, one could, for example, observe whether a person approves or disapproves of a particular piece of government legislation designed to achieve a greater degree of racial equality, whether he or she numbers members of some ethnic minority amongst his or her friends and acquaintances, etc., and assume that these are, in some sense, indicators of the more fundamental variable, racial prejudice.

Although latent variables are the basis of FACTOR ANALYSIS and STRUCTURAL EQUATION MODELS, scepticism about methods based on such variables (particularly factor analysis) has not been uncommon among statisticians over the years. Latent variable modelling has often been viewed as a dubious exercise fraught with unverifiable assumptions and naive inferences regarding causality. For such critics the only thing in favour of latent variable models is that they occupy a rather obscure area of statistics, primarily confined to psychometrics.

There are, however, a number of reasons that such criticisms are mistaken: ignoring latent variables often implies stronger assumptions than including them, latent variable modelling then being viewed as a sensitivity analysis of a simpler analysis excluding latent variables; many of the assumptions in latent variable modelling can be empirically assessed and some can be relaxed; latent variable modelling pervades modern mainstream statistics and latent variables are widely used in different disciplines - not only medicine but also economics, engineering, psychology, geography, marketing and biology (see Skrondal and Rabe-Hesketh, 2004). This 'omnipresence' of latent variables is commonly not recognised, perhaps because latent variables are given different names in different areas of application, e.g. RANDOM EFFECTS, common factors and latent classes.

In fact, latent variables can be used to represent a variety of phenomena, e.g. true variables measured with error, hypothetical constructs, unobserved heterogeneity, latent responses underlying categorical variables and missing data, to name but a few. SSE
(See also STRUCTURAL EQUATION MODELS)

Skrondal, A. and Rabe-Hesketh, S. 2004: Generalized latent variable modeling. Boca Raton: Chapman & Hall/CRC.

least squares estimation
This is a general method for estimating regression parameters in a model for expected outcomes (e.g. a linear regression model). Parameter values are chosen to minimise the sum of squared differences between the observed and expected outcomes. Least squares estimation is, for instance, the standard method of estimation in MULTIPLE LINEAR REGRESSION, where observations $Y_i$ are approximated by a linear function of explanatory variables $X_{ik}$, e.g. $\beta_1 X_{i1} + \cdots + \beta_k X_{ik}$, with $\beta_1, \ldots, \beta_k$ unknown. When the model is linear and outcomes are independent and normally distributed with constant VARIANCE around their expectation, the method coincides with MAXIMUM LIKELIHOOD ESTIMATION and is efficient in large samples, i.e. yields precise estimators.

More generally, least squares estimators (LSEs) can be studied for any type of outcome distribution around the regression curve. They are also useful outside the context of statistical models. Least squares lines are, for instance, typically drawn as the best fitting line through a cloud of points. This is illustrated in the figure (see page 240), where we depict the LSE of a linear regression curve of body weight (in kg) as a function of body height (in cm) based on a random sample of 250 American men (Penrose, Nelson and Fisher, 1985). The curve is obtained by minimising the sum of squared distances over all observations. It contains the point where the sample-averaged body weight is predicted for men of average body height. Within the context of a regression model, predictions on the estimated regression curve are most precise at the centre of the data, but may be imprecise towards the tails. Extrapolations beyond the range of the data often cannot be trusted.

[Figure: scatterplot with fitted line; horizontal axis: body height (in cm), vertical axis: body weight (in kg)]
least squares estimation Scatterplot and least squares regression line of body weight versus body height in a random sample of 250 American men

Least squares estimators are attractive due to their strong intuitive appeal, the stability and efficiency of the algorithm and their sound statistical properties in large samples. Under mild regularity conditions, large samples yield an LSE that is subject to normal variation around the true parameter. A 95% CONFIDENCE INTERVAL then becomes the LSE plus or minus 1.96 times the STANDARD ERROR.

When outcome variation is known to differ between observations, a more precise estimator is obtained by a weighted least squares estimator (WLSE), which minimises a sum of weighted squared differences between observed and expected outcomes. When the model is linear and the outcomes are independent, weights chosen as the inverse of the individual variances yield the most efficient estimator. When, furthermore, the outcomes are normally distributed, this WLSE then also coincides with the maximum likelihood estimate (MLE). In other instances, e.g. with logistic regression and many other generalised linear models, the optimal weights depend on the target parameter. Because the latter is unknown, a method of iteratively reweighted least squares is usually applied, whereby weights in each iteration depend on previous estimates for the unknown parameter.

Even though the LSE may be less precise than the MLE under certain models, least squares estimation is popular because it does not require a complete description of the sampling distribution of the observed data. For example, it is not necessary to assume that outcomes are normally distributed to derive least squares estimates of the unknown regression coefficients in the MULTIPLE LINEAR REGRESSION model. The model of interest for the means is all that is needed. The LSE is therefore immune to misspecification of the sampling distribution, unlike the MLE.

Further modifications of least squares estimation have been devised to achieve additional goals. For instance, to enhance robustness to outlying observations, L1-regression is based on minimising the average absolute deviation between the observations and the regression function. SV/EG
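For a single explanatory variable, the least squares line described in this entry has a closed form, sketched below in Python. The function name `least_squares_line` and the four height and weight pairs are hypothetical and are not taken from the Penrose, Nelson and Fisher data.

```python
def least_squares_line(x, y):
    """Intercept and slope minimising the sum of squared vertical distances."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    # Choosing the intercept this way forces the line through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [160, 170, 180, 190]
weights = [60, 68, 80, 88]

intercept, slope = least_squares_line(heights, weights)
predicted_at_mean_height = intercept + slope * (sum(heights) / len(heights))
```

The prediction at the average height equals the average weight, which is the property of the fitted line noted in the entry.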
Penrose, K. W., Nelson, A. G. and Fisher, A. G. 1985: Generalized body composition prediction equation for men using simple measurement techniques. Medicine and Science in Sports and Exercise 17, 189.
leave-one-out cross-validation
See DISCRIMINANT FUNCTION ANALYSIS

length-biased sampling
This form of sampling arises when items are sampled in proportion to their values on some variable of interest, e.g. a sampling scheme based on the number of patient visits. A BIAS may be introduced because some individuals are more likely to be selected than others simply because they make more frequent visits. The problem arises in SCREENING STUDIES, where the sample of cases detected is likely to contain an excess of slow-growing cancers compared to the sample diagnosed positive because of their symptoms. If length-biased sampling is ignored, the estimate of the true population MEAN can be greatly inflated. An example of length-biased sampling is described in Davidov and Zelen (2001) in the context of the assessment of familial risk of disease based on referent databases in which the larger the family, the greater the probability of finding the family in the database. SSE
(See also BIAS, BIAS IN OBSERVATIONAL STUDIES, SAMPLING METHODS - AN OVERVIEW)

Davidov, O. and Zelen, M. 2001: Referent sampling, family history and relative risk: the role of length-biased sampling. Biostatistics 2, 173-81.
Levene's test
This is used to test whether two or more groups have equal VARIANCE. The NULL HYPOTHESIS states that the variance of all groups is equal; the alternative hypothesis states that the variances are unequal for at least one pair. Equal variance of two or more groups is a frequent assumption for parametric tests, ANALYSIS OF VARIANCE for example, and Levene's test can be used to verify this assumption. Levene's test is relatively simple and robust to departures from normality.

To perform Levene's test we begin by finding, for each group, the absolute differences between the observed values and the MEDIAN, MEAN or trimmed mean. The groups need not be of equal size. Whether to use the median, mean or trimmed mean depends on the underlying distribution of the data. If the data are symmetric and moderate tailed the mean provides the best power, using the trimmed mean performs best if the data are heavy tailed and the median performs best if the data are skewed. However, using the median provides good ROBUSTNESS for many types of nonnormal data while retaining good power (Wilcox, 1998). We complete Levene's test by performing an analysis of variance on these absolute differences (see the table).

Levene's test Data from applying Levene's test on three different treatment groups

Group 1   Absolute differences   Group 2   Absolute differences   Group 3   Absolute differences
24        5.5                    22        0.5                    8         5
23        4.5                    21        0.5                    12        1
19        0.5                    18        3.5                    14        1
18        0.5                    25        3.5                    15        2
14        4.5                    18        3.5                    12        1
6         12.5                   22        0.5                    13        0
5         13.5                   16        5.5                    7         6
21        2.5                    17        4.5                    14        1
23        4.5                    26        4.5                    13        0
17        1.5                    23        1.5                    15        2
Median
18.5                             21.5                             13

For example, for treatment 1 a score of 23 differs by 4.5 points from the median, while a score of 14 also differs by 4.5 points from the median. The idea is that the larger the differences in some groups compared to others, the more the spread and, hence, the more likely it will be that the variance in the populations, from which they arose, is not the same. Therefore, a one-way analysis of variance on these differences will test this. This results in an F-statistic of 4.21 with an associated P-value of 0.026, so we reject the null hypothesis that the variance of the groups is equal. More details can be found in Brown and Forsythe (1974). MMB
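The two steps of the median-based version of Levene's test can be sketched in Python as follows. This is an illustrative sketch with hypothetical function names; it reproduces the absolute differences for treatment group 3 of the table, and the small two-group data set passed to `levene_f` in the usage lines is invented, so it is not the analysis that gave the F-statistic quoted in the entry.

```python
from statistics import mean, median


def abs_dev_from_median(group):
    """Step 1: absolute differences between each value and the group median."""
    m = median(group)
    return [abs(v - m) for v in group]


def one_way_f(groups):
    """F-statistic of a one-way analysis of variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (between / (k - 1)) / (within / (n - k))


def levene_f(groups):
    """Step 2: one-way analysis of variance on the absolute differences."""
    return one_way_f([abs_dev_from_median(g) for g in groups])


# Treatment group 3 from the table; its median is 13.
group3 = [8, 12, 14, 15, 12, 13, 7, 14, 13, 15]
group3_differences = abs_dev_from_median(group3)

# Hypothetical data for two small groups, for illustration only.
f_hypothetical = levene_f([[1, 2, 3], [2, 4, 6]])
```

The resulting F-statistic would be referred to an F-distribution with k - 1 and N - k degrees of freedom, as in any one-way analysis of variance.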
Brown, M. and Forsythe, A. 1974: Robust tests for the equality of variances. Journal of the American Statistical Association 69, 346, 364-7.
Wilcox, R. 1998: Trimming and Winsorization. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd.
Lexis diagram
This is a descriptive tool used in epidemiology and demography, being a plot of individuals in a study on two timescales simultaneously. These timescales are most commonly calendar time and age. Each individual is then represented by a diagonal line at 45° to each axis, which begins at the calendar time and age at enrolment and ends at the calendar time and age at the event of interest (e.g. death) or censoring.

As an example, the table shows the year of birth, age at enrolment and age at death/censoring of four individuals enrolled in a study that began in 1975 and ended with follow-up in 2004. The corresponding Lexis diagram is shown in the figure (see page 242). A death is shown by a filled circle and a censoring event by an empty circle.

Lexis diagram Study data for four individuals

Individual   Birth year   Age at enrolment   Status     Age at death or censoring
A            1940         35                 Died       58
B            1951         24                 Censored   53
C            1954         30                 Died       48
D            1960         21                 Censored   30

[Figure: Lexis diagram for individuals A to D; horizontal axis: calendar year, vertical axis: age]

One use of the Lexis diagram relates to estimating, or adjusting for, the effects of age and calendar time on the mortality (or morbidity) rate. In this application, age is divided into (e.g. 5-year) bands and calendar time is divided into (e.g. 5-year) periods. It is assumed that the mortality rate (or baseline mortality rate) is piecewise constant, i.e. is constant in each combination of age band and calendar period. To estimate the mortality rate in each of these band-period combinations, it is necessary to calculate the number of deaths and total time at risk in each band-period.

For example, if age bands and calendar periods of 10 years' duration are used, individual C joins the cohort in 1984 aged 30. He changes calendar period after 6 years, in 1990. Then he changes age band 4 years later, in 1994. Finally, he changes calendar period 6 years later, in 2000, and dies 2 years later. The Lexis diagram representation makes it easy to see when he changes from one band-period to another: this happens whenever his diagonal line crosses a horizontal or vertical line. Individual C contributes 6, 4, 6 and 2 years at risk, respectively, to four band-periods and a death to the last of these.

Variations on this simple Lexis diagram may be obtained by changing the timescales represented by the axes, by marking other symbols on the diagonal lines to represent other events of interest or by introducing colour to differentiate periods spent by an individual in different states. For further details see Keiding (1990), Goldman (1992) and Clayton and Hills (1993).

Clayton, D. and Hills, M. 1993: Statistical models in epidemiology. Oxford: Blackwell Science Publications.
Goldman, A. I. 1992: Eventcharts: visualizing survival and other timed-events data. The American Statistician 46, 13-18.
Keiding, N. 1990: Statistical inference in the Lexis diagram. Philosophical Transactions of the Royal Society of London, Series A 332, 487-509.
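The bookkeeping for individual C in the Lexis diagram entry, splitting follow-up into age-band by calendar-period cells, can be sketched in Python. The function name `split_follow_up` is hypothetical, and the sketch assumes that age bands and calendar periods both start at multiples of their common width (10 years here), which is consistent with the boundaries at 1990, 2000 and age 40 used in the example.

```python
def split_follow_up(birth_year, age_in, age_out, width=10):
    """Return (age band start, period start, years at risk) for each band-period."""
    pieces = []
    age = float(age_in)
    while age < age_out:
        year = birth_year + age
        band = int(age // width) * width
        period = int(year // width) * width
        # Stay in this cell until the next age-band boundary,
        # the next calendar-period boundary, or exit from the study.
        step = min(band + width - age, period + width - year, age_out - age)
        pieces.append((band, period, step))
        age += step
    return pieces


# Individual C: born 1954, enrolled aged 30 (in 1984), died aged 48.
pieces_c = split_follow_up(birth_year=1954, age_in=30, age_out=48)
years_at_risk_c = [years for _, _, years in pieces_c]
```

Summing such contributions over all individuals, together with the deaths falling in each cell, gives the numerators and denominators of the band-period mortality rates.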
life expectancy
This popular summary measure is used in DEMOGRAPHY and EPIDEMIOLOGY for describing the current health of a population or for health comparisons between populations. For a specific population, the standard definition of life expectancy at a given age is the average number of years an individual of that particular age has remaining if the age-specific mortality rates do not change in the future. These age-specific mortality rates can be obtained from the LIFE TABLES for that population, which may be stratified by variables known to be strongly associated with mortality, such as sex, calendar period and smoking status if available. The life table can be estimated either nonparametrically or parametrically (see, for example, Gaitatzis et al., 2004, and Tom and Farewell, 2009).

In standard (actuarial) life table notation, the life expectancy at age x, $e_x$, is given by $e_x = T_x/l_x$, where $T_x$ is the number of person-years lived between the exact age x and extinction, and $l_x$ is the number of persons still alive at the exact age x. Thus, for example, the life expectancy at birth, $e_0$, in a given birth cohort or life table population is the average person-years lived from birth if the current age-specific mortality rates remain unchanged in the future.

The continuous time mathematical representation of life expectancy (also known as the mean residual lifetime) of an individual known to have survived to an age x in terms of the current force of mortality (i.e. hazard function over age; see SURVIVAL ANALYSIS - AN OVERVIEW), $\lambda(u)$, is given by

$$e(x) = \int_x^\infty \exp\left\{-\int_x^u \lambda(v)\,dv\right\} du$$

and is equivalent to the area under the survival curve, S(u), from age x onwards divided by the probability of surviving up to age x, S(x) (see the figure on page 242). Note that because the age-specific mortality rates are expected to change in the future, the life expectancy is not a measure of how long a specific individual of a given age in the population of interest is actually expected to live further.

[Figure: survival curve against age, with the area to the right of age x shaded]
life expectancy Survival curve. Shaded area corresponds to the area under the survival curve from age x onwards and is equal to e(x)S(x)

Although life expectancy is a long-standing and easily understood indicator of present population health, it has increasingly been seen as too crude for this purpose since it does not take into account the impact of chronic diseases and disability. Extensions of life expectancy to healthy life expectancy, disability-free life expectancy and, more generally, to life expectancies in various health states have been made and can be estimated through the fitting of MULTISTATE MODELS (see, for example, Butler et al., 2008). BT

Butler, T. C., van den Hout, A., Matthews, F. E., Larsen, J. P., Brayne, C. and Aarsland, D. 2008: Dementia and survival in Parkinson disease - a 12-year population study. Neurology 70, 1017-22.
Gaitatzis, A., Johnson, A. L., Chadwick, D. W., Shorvon, S. D. and Sander, J. W. 2004: Life expectancy in people with newly diagnosed epilepsy. Brain 127, 2427-32.
Tom, B. D. M. and Farewell, V. T. 2009: Statistical methods for individual-level data in cohort mortality studies of rheumatic diseases. Communications in Statistics - Theory and Methods 38, 3472-87.
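The area-under-the-curve description of life expectancy given in the entry above can be sketched numerically. This is a minimal sketch, assuming a hypothetical constant force of mortality of 0.05 per year, for which the life expectancy is 1/0.05 = 20 years at every age; the function name, the integration horizon and the step length are illustrative choices.

```python
from math import exp


def life_expectancy(survival, x, horizon=400.0, step=0.01):
    """Area under S(u) from age x onwards, divided by S(x) (trapezium rule)."""
    n_steps = round(horizon / step)
    area = 0.0
    for i in range(n_steps):
        a = x + i * step
        area += 0.5 * (survival(a) + survival(a + step)) * step
    return area / survival(x)


# Hypothetical constant hazard, for illustration only: S(u) = exp(-0.05 u).
hazard = 0.05


def survival_constant_hazard(u):
    return exp(-hazard * u)


e_40 = life_expectancy(survival_constant_hazard, x=40)
e_0 = life_expectancy(survival_constant_hazard, x=0)
```

Under a constant hazard the remaining life expectancy does not depend on the age already reached, which is why the two values agree; with a realistic, age-dependent hazard they would differ.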
life tables Life tables are models that conveniently summarise the level of mortality in a population of interest. Their best-known function, life expectancy, has a ready intuitive meaning. Life table functions are independent of the age structure of the population whose mortality level they are used to summarise. Period life tables are used to summarise the mortality experienced during a given period, e.g. a calendar year. Cohort life tables summarise the experience of a defined cohort as it ages through calendar time. For the necessary mortality observations to be available to construct them, the relevant cohort has to be at least towards the end of its lifespan. Full life tables have one row for each year of life, usually to age 110 (see the figure on page 244). Abridged life tables typically have one row for each 5 years of life except that there are usually separate rows for ages 0 to 1 and 1 to 5. Constructing life tables. There are two main steps to building
a life table. First, there is the 'preliminary computation', in which the observed age and sex-specific death rates during the period of interest are converted into corresponding risks of death between two exact ages. Suppose, for example, that the observed central death rate in the population of interest for persons aged 40 to 44 last birthday is 0.003404. (This is ${}_5M_{40}$ in life table notation when referring to observations made in the population of interest, and is conventionally taken to be an unbiased estimator of the corresponding life table function ${}_5m_{40}$.) The risk of death between exact age 40 and exact age 45 is given by

$${}_5q_{40} = \frac{5 \times 0.003404}{1 + (1 - 0.59) \times 5 \times 0.003404} = 0.01690$$

where ${}_5a_{40}$ (here 0.59) is the fraction of the age interval lived, on average, by those who die during it. The risk of death across the age interval is close, but not equal to, the central death rate. (In this case, the central death rate times 5, to take account of the age interval width, equals 0.017 02, slightly greater than the risk.) Second, there is the computation of the life table proper
(see the figure). In constructing the life table an initial hypothetical cohort of 100 000 ($l_0$, known as the radix) is subjected across each successive age interval to the calculated risks of death. Therefore, starting at birth ($x = 0$), 100 000 are exposed to the risk of death before exact age 1, i.e. ${}_1q_0$, which in this example equals 0.02006. This results in 2006 deaths in the interval (${}_1d_0$). The person-time lived in the interval (${}_1L_0$) is 1 year for all who survived it ($l_1 = 97\,994$) plus the time lived by those who died in the interval, which equals ${}_1d_0 \times {}_1a_0$ (the fraction of the interval lived by those who died within it) or $2006 \times 0.129$. When added to $l_1$ this gives 98 252, as shown. (For economy, ${}_na_x$ is not shown in the table: it is 0.129 for the first year of life and approximately 0.5 thereafter.) The ${}_nL_x$ column is calculated in this way, one row at a time, to the end of the lifespan.

life tables Life table functions and notation
$x$ is exact age $x$, i.e. the $x$th birthday.
$n$ is the width of the age interval being considered. In a full life table, where $n$ is 1, it may be omitted.
$e_x$ is life expectancy at exact age $x$.
${}_nm_x$ is the central death rate for persons aged between $x$ and $x+n$ in the hypothetical life table population. It is estimated by ${}_nM_x$ (below).
${}_nM_x$ is the observed central death rate in the population of interest.
$l_x$ is the number of persons still alive at exact age $x$.
${}_nq_x$ is the risk (probability) of death between ages $x$ and $x+n$.
${}_np_x$ is the probability of surviving from exact age $x$ to $x+n$ (equals $1 - {}_nq_x$).
${}_na_x$ is the average fraction of the interval lived by those who die between $x$ and $x+n$; it is 0.129 for the first year of life and approximately 0.5 for all subsequent years.
${}_nL_x$ is the number of person-years lived between exact ages $x$ and $x+n$.
$T_x$ is the number of person-years lived between exact age $x$ and the extinction of the hypothetical cohort.

life tables Extract of a full life table for US white males in 1970 (first six rows; the last 10 rows, for ages 100 to 109, are not legible in this copy)

Columns: age interval ($x$ to $x+n$); proportion of persons alive at the beginning of the age interval dying during the interval (${}_nq_x$); of 100 000 born alive, the number living at the beginning of the age interval ($l_x$) and the number dying during the age interval (${}_nd_x$); stationary population in the age interval (${}_nL_x$) and in this and all subsequent age intervals ($T_x$); average number of years of life remaining at the beginning of the age interval ($e_x$).

x to x+n    nqx        lx        ndx     nLx      Tx         ex
0-1         0.02006    100000    2006    98252    6793828    67.94
1-2         0.00116     97994     114    97937    6695576    68.33
2-3         0.00083     97880      81    97840    6597639    67.41
3-4         0.00072     97799      71    97763    6499799    66.46
4-5         0.00059     97728      57    97700    6402036    65.51
5-6         0.00054     97671      52    97645    6304336    64.55

A special procedure is then needed for closing the last open-ended interval. In the table shown, this is the person-time lived beyond age 109, ${}_\infty L_{109}$, which is estimated by $l_{109}/{}_\infty M_{109}$. (The justification for this is that the remaining survival time is taken to be distributed exponentially with a mean of $1/{}_\infty M_{109}$.) $T_x$ is then summed back from the end of the lifespan, beginning, in this example, with $T_{109}$, which equals ${}_\infty L_{109}$.
Moving up one row, $T_{108}$ then equals $T_{109} + {}_1L_{108}$ and so on back to $T_0$, where $T_0$ represents all the person-time lived by the 100 000 who set out, so the average person-time lived, or life expectancy $e_0$, is $T_0/l_0$ and more generally, for any age $x$, $e_x = T_x/l_x$. Life table populations can be interpreted in two ways: (a) as fully hypothetical constructs or models in which 100 000 individuals are imagined, as it were, to be born in the same instant and then instantaneously subjected to the relevant risks of death throughout their hypothetical lifespans; or (b) as representing stationary populations experiencing constant, and equal, birth and death rates. In this latter interpretation, ${}_nL_x$ gives the expected number of individuals aged $x$ to $x+n$ and $T_0$ gives the total population size. In such stationary populations there are $l_0$ births each year, so the birth rate $= l_0/T_0$, the inverse of the life expectancy. Thus:

$$\text{Crude birth rate} = \text{Crude death rate} = 1/e_0$$

Uses of the life table. The $l_x$ and $q_x$ columns have many uses in summarising mortality risks in populations of interest. Thus infant mortality, conceptually defined as the risk of death before the first birthday, is ${}_1q_0$. The under 5 mortality 'rate' (actually the risk of death before age 5) is $1 - l_5/l_0$. The adult mortality 'rate' (${}_{45}q_{15}$, or the probability of death between 15 and 60) is $1 - l_{60}/l_{15}$. Similarly, the probability of survival between any two ages $i$ and $j$ (with $i < j$) is given by $l_j/l_i$. Life tables have long been used to provide comparable summaries of mortality risks in populations. They are also serving as the basis of newer 'summary measures of average population health', which combine information on both the risks of premature death and of nonfatal illness and injury. Such summary measures may be either 'health expectancies' (such as 'health adjusted life expectancy') or 'health gaps' (such as DALYs (disability adjusted life years) lost).
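The two construction steps and the first-row arithmetic can be sketched in a few lines of Python. This is a minimal illustration rather than a full life table program: the function names `risk_from_rate` and `life_table_rows` are hypothetical, only the numbers quoted in the entry are used (the central death rate 0.003404 with a fraction lived of 0.59, and the risks 0.02006 and 0.00116 with fractions lived of 0.129 and 0.5), and the closing of the open-ended interval and the $T_x$ and $e_x$ columns are left out.

```python
def risk_from_rate(rate, width, frac_lived):
    """Convert a central death rate nMx into a risk of death nqx."""
    return width * rate / (1.0 + (1.0 - frac_lived) * width * rate)


def life_table_rows(q, a, radix=100_000):
    """Build the l, d and L columns for successive 1-year age intervals.

    q: risks of death in each interval; a: average fraction of the
    interval lived by those who die in it.
    """
    rows = []
    alive = float(radix)
    for qx, ax in zip(q, a):
        deaths = alive * qx
        survivors = alive - deaths
        # Survivors live the whole year; those dying live a fraction of it.
        person_years = survivors + ax * deaths
        rows.append({"l": alive, "d": deaths, "L": person_years})
        alive = survivors
    return rows


# Preliminary computation for ages 40 to 44 last birthday.
q_40_45 = risk_from_rate(0.003404, width=5, frac_lived=0.59)

# First two rows of the full life table.
rows = life_table_rows(q=[0.02006, 0.00116], a=[0.129, 0.5])

print(round(q_40_45, 5))
print(round(rows[0]["d"]), round(rows[1]["l"]), round(rows[0]["L"], 1))
```

Working from the rounded inputs gives a first-row person-time of about 98 252.8, in line with the 98 252 shown in the published table; small differences of this kind are to be expected when the published columns were computed from unrounded values.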
The $d_x$ and $l_x$ functions when plotted for a given population at successive time intervals show how the distribution of age at death changes as life expectancy has risen. One interpretation of recent trends in low mortality countries is that the rise in $e_0$ has been disproportionately due to a reduction in premature adult deaths. As this process continues, a larger and larger proportion of each generation survives until closer to the maximum lifespan. The distribution of deaths by age at death becomes concentrated at a high age, manifest as a reduced dispersion in the distribution of deaths by age ($d_x$) in the life table. The corresponding shift in pattern for the survivorship ($l_x/l_0$) function is for it to remain high until closer to the maximum lifespan and then fall sharply, a process described as the 'rectangularisation of the survival curve'. This 'optimistic' interpretation of recent trends is taken to imply that there has been no extension of the average duration of ill-health in the period immediately before death. For further details see Elandt-Johnson and Johnson (1980), Preston, Heuveline and Guillot (2001), Lopez et al. (2003) and Peeters et al. (2003). JP

Elandt-Johnson, R. C. and Johnson, N. L. 1980: Survival models and data analysis. New York: John Wiley & Sons, Inc.
Lopez, A. D., Ahmad, O. B., Guillot, M., Ferguson, B. D., Salomon, J. A., Murray, C. J. L. and Hill, K. H. 2003: Life tables for 191 countries for 2000: data, methods, results. In Murray, C. J. L. and Evans, D. B. (eds), Health systems performance assessment: debates, methods and empiricism. Geneva: World Health Organization, pp. 335-53.
Peeters, A., Barendregt, J. J., Willekens, F., Mackenbach, J. P., Al Mamun, A. and Bonneux, L. 2003: Obesity in adulthood and its consequences for life expectancy: a life-table analysis. Annals of Internal Medicine 138, 24-32.
Preston, S. H., Heuveline, P. and Guillot, M. 2001: Demography: measuring and modeling population processes. Oxford: Blackwell.
likelihood The likelihood function plays two roles in statistics. First, in its own right it provides a means for estimating unknown parameters by finding the value of the unknown parameter(s) that maximises it (maximum likelihood) as well as for comparing hypotheses (LIKELIHOOD RATIO). Second, it has a role in Bayesian statistics (see BAYESIAN METHODS).
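Both uses of the likelihood in its own right can be sketched with the binomial example that this entry goes on to describe, six positive responses among 10 patients. The Python below is an illustration only: the function name `binomial_likelihood` and the grid search are choices made for the example, not a method prescribed by the entry. The binomial coefficient is a constant multiplier, so it affects neither the location of the maximum nor the ratio of two likelihoods.

```python
from math import comb


def binomial_likelihood(p, successes=6, n=10):
    """Likelihood of `successes` positive responses out of `n` for probability p."""
    return comb(n, successes) * p**successes * (1.0 - p) ** (n - successes)


# Maximum likelihood by a simple grid search over candidate values of p.
grid = [i / 100 for i in range(101)]
mle = max(grid, key=binomial_likelihood)

# Likelihood ratio comparing two competing values of p.
ratio = binomial_likelihood(0.7) / binomial_likelihood(0.3)

print(mle, round(ratio, 2))
```

A grid search is used here only because it mirrors reading the maximum off a plot; for the binomial case the maximising value is simply the observed proportion.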
Suppose interest lies in learning about the response of patients suffering from influenza symptoms to a new treatment. Data are collected from 10 patients, of whom six respond positively. What can be said about the unknown probability of a positive response $\pi$? By definition, the probability of a positive response is $\pi$ and, of a negative response, $1 - \pi$. Suppose that we have observed six positive responses and four negative responses in that order. The likelihood of this happening is $\pi^6(1-\pi)^4$. In practice, the order of observation of the responses is arbitrary and we could account for this by multiplying the likelihood we have calculated by the number of ways six positive responses and four negative responses could occur among 10 patients. If this is done the likelihood becomes:

$$\frac{10!}{6!\,4!}\,\pi^6(1-\pi)^4 = 210\,\pi^6(1-\pi)^4$$

which corresponds to a probability from a BINOMIAL DISTRIBUTION. For different values of $\pi$ we can determine the likelihood and on this basis find the most likely value. For example, if $\pi$ has the value 0.1 the likelihood is $210 \times 0.1^6 \times 0.9^4 = 0.000138$ and for the values of $\pi$ = 0.3, 0.5, 0.7 and 0.9 the corresponding likelihood values are 0.0368, 0.2051, 0.2001 and 0.0112 respectively. Therefore, of these five values, 0.5 is the most likely. In fact, we can plot the likelihood values for all potential values of $\pi$ and
choose the value that gives the maximum, as in the first figure. From the figure, we can conclude that the most likely value for $\pi$ is 0.6, as it gives the largest likelihood value. This value is the maximum likelihood estimate.

likelihood Likelihood function based on six positive responses out of 10 (figure; horizontal axis: probability of positive response, $\pi$)

The same approach can be used for other types of data. For example, Altman (1991) gives the following data on the daily energy intake (kJ) of 11 healthy women: 5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770. Assuming that these data arise from a NORMAL DISTRIBUTION with a common MEAN denoted $\mu$ and known STANDARD DEVIATION 1100 we can determine the likelihood as a function of the unknown $\mu$ and plot it as before. The second figure illustrates this, in which the maximum likelihood occurs at the value 6754.

likelihood Likelihood function for the mean daily energy intake based on a sample of 11 values (figure; horizontal axis: mean daily energy intake, kJ)

In Bayesian statistics, the likelihood works to modify the PRIOR DISTRIBUTION to yield the POSTERIOR DISTRIBUTION and represents the information contained in the experiment about the parameter of interest. In a formal way, Bayesian analysis proceeds by calculating:

$$p(\theta \mid \text{Data}) \propto p(\text{Data} \mid \theta)\,p(\theta)$$

in which $p(\theta)$ is the prior distribution expressing initial beliefs in the parameter of interest, $\theta$, $p(\theta \mid \text{Data})$ is the corresponding posterior distribution of beliefs and $p(\text{Data} \mid \theta)$ is the likelihood. If there is great prior uncertainty about the parameter of interest so that the prior distribution is essentially flat relative to the region in which the likelihood is peaked, then the prior has little impact on the posterior. In such circumstances, the posterior distribution is essentially proportional to the likelihood so that posterior beliefs about the parameter are dictated by the location and shape of the likelihood. In particular, the posterior mode, the value believed to be the most likely after collecting data, is essentially equivalent to the value that maximises the likelihood, i.e. the maximum likelihood estimate (see MAXIMUM LIKELIHOOD ESTIMATION). AG

Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.

likelihood ratio The likelihood ratio provides a method for comparing competing hypotheses based on the LIKELIHOOD calculated from experimental data. It also plays a role in Bayesian hypothesis testing (see BAYESIAN METHODS). Suppose interest lies in learning about the response of patients suffering from influenza symptoms to a new treatment. Data are collected from 10 patients, of whom six respond positively. What can be said about the competing hypotheses $H_1$: $\pi = 0.3$ and $H_2$: $\pi = 0.7$? The likelihood of obtaining six positive results and four negative results is:

$$\frac{10!}{6!\,4!}\,\pi^6(1-\pi)^4 = 210\,\pi^6(1-\pi)^4$$

and this can be determined for the competing values of $\pi$ under the pair of hypotheses. For hypothesis $H_1$ the value is 0.0368, while that for $H_2$ is 0.2001. The ratio of these values is 5.44, the likelihood ratio of $H_2$ against $H_1$, indicating, in this instance, that hypothesis $H_2$ is almost 5.5 times as likely as $H_1$, which is strong evidence in favour of $H_2$ rather than $H_1$. The Bayesian equivalent to this form of hypothesis testing is based on determining the ratio of the posterior probabilities of the hypotheses. Formally, we calculate:

$$p(H_i \mid \text{Data}) \propto p(\text{Data} \mid H_i)\,p(H_i), \quad i = 1, 2$$

in which $p(H_i)$ is the prior probability of hypothesis $H_i$, expressing initial beliefs in its veracity, $p(H_i \mid \text{Data})$ is the corresponding posterior probability and $p(\text{Data} \mid H_i)$ is the likelihood of the hypothesis giving rise to the data. By taking the ratio of the two expressions just given, the ratio of the posterior probabilities of the two hypotheses can be expressed as:

$$\frac{p(H_2 \mid \text{Data})}{p(H_1 \mid \text{Data})} = \frac{p(\text{Data} \mid H_2)}{p(\text{Data} \mid H_1)} \times \frac{p(H_2)}{p(H_1)}$$

The left-hand side of this expression is the posterior odds, the first term on the right-hand side is the likelihood ratio and the second term is the prior odds. This form of Bayesian analysis is familiar in diagnostic testing. In that context the likelihood ratio is expressed as:

$$\text{Likelihood ratio} = \frac{\text{Probability (positive test result} \mid \text{disease)}}{\text{Probability (positive test result} \mid \text{no disease)}} = \frac{\text{sensitivity}}{1 - \text{specificity}}$$

AG

Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.

Likert scales These scales are used to measure the extent to which an individual agrees with a statement. A Likert scale typically has five levels, ranging from 'strongly disagree' to 'strongly agree'. One common alternative is to use an even number of options in order to avoid having a 'neutral' option. A typical Likert scale questionnaire item with five levels is the following: In a proposed study of mild asthma, it is ethically acceptable to give some participants a placebo treatment.

• strongly disagree
• disagree
• neither agree nor disagree
• agree
• strongly agree

The data from a Likert scale are often coded as a number (e.g. 1 to 5) and it is typical for responses from multiple items to be summed or averaged to provide an overall score related to an underlying issue or LATENT VARIABLE. When there are multiple items, it is recommended that the order of responses be reversed for some items, to help prevent subjects falling into a simple pattern of responses (e.g. always select 'strongly agree'). The data from one or more Likert scale items are often analysed as interval data. For this to be a justifiable approach, it is important to plan and develop the items properly, perhaps using a pilot study, and test for validation and reliability (see MEASUREMENT PRECISION AND RELIABILITY). For further details see DeVellis (1991) and Streiner and Norman (1995). PM
[See also FACTOR ANALYSIS]

DeVellis, R. F. 1991: Scale development: theory and applications. London: Sage.
Streiner, D. L. and Norman, G. R. 1995: Health measurement scales: a practical guide to their development and use, 2nd edition. Oxford: Oxford University Press.
limits of agreement
This approach was developed by Bland and Altman (1986) to assess AGREEMENT in method comparison studies. Based on both graphical techniques and straightforward statistical calculations, it is simple to apply and interpret. It quantifies agreement between two methods through the mean difference (i.e. the estimate of the systematic BIAS of one method relative to the other) and the STANDARD DEVIATION of the differences between measurements taken by the methods on the same subjects (i.e. an indication of the variability of these differences across subjects). The information provided by these summary statistics is commonly presented visually in a Bland-Altman plot, where the differences between measurements are plotted against the average of the measurements taken by the methods on the same subjects (see the figure). This summary information is displayed on the plot by three horizontal lines indicating the mean difference and the mean difference ± 1.96 standard deviations of the differences. The latter two lines represent the 95% limits of agreement. That is a range within which one would expect 95% of the differences to lie, under the assumptions of normally distributed differences and the uniformity of systematic bias and standard deviation across the whole range of measurements (as indicated by no evidence in the Bland and Altman plot of a relationship between the difference and the average in measurements taken by the two methods on each subject). The assumption of normality of the differences is generally reasonable, as much of the between-subject variation is removed by the differencing of the measurements between methods; therefore what remains is the measurement error. However, even if this assumption is violated, there may not be any serious consequences resulting from the construction of the limits of agreement as already described. Thus 5% of the differences will still be expected to lie outside the range created by these limits, although most of these differences may be in the same direction. If there is found to be a relationship between the difference and average, e.g. the plot shows a 'fanning out' effect of the differences as the average increases (i.e. the variability of the differences is increasing with the size of the measurement), then application of the limits of agreement method, as described above, would produce limits that would be wider apart than needed for lower values of the average and narrower than expected at higher values. Thus it is better to try to either accommodate this relationship or remove it by suitably transforming the data (e.g. using a logarithmic transformation).
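The calculation behind the three horizontal lines can be sketched as follows. This is a minimal Python illustration, assuming paired measurements by two methods on the same subjects; the function name `limits_of_agreement` and the five pairs of readings are hypothetical and are not data from any of the cited studies.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Return the mean difference (bias) and the 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings on five subjects (illustrative values only).
method_a = [10.2, 11.5, 9.8, 12.1, 10.9]
method_b = [10.0, 11.9, 9.5, 12.4, 10.4]

bias, lower, upper = limits_of_agreement(method_a, method_b)

# Coordinates for a Bland-Altman plot: average on x, difference on y.
averages = [(a + b) / 2 for a, b in zip(method_a, method_b)]
differences = [a - b for a, b in zip(method_a, method_b)]

print(round(bias, 2), round(lower, 2), round(upper, 2))
```

Plotting `differences` against `averages` with horizontal lines at `bias`, `lower` and `upper` gives the display described above; whether the resulting limits are narrow enough remains a clinical judgement.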
limits of agreement Bland and Altman plot of the difference against the average for two methods (figure; horizontal axis: average of two methods; horizontal reference lines at the mean difference, mean + 1.96 SD and mean - 1.96 SD)
Bland and Altman provide comprehensive expositions about the limits of agreement approach, the issues that result when the assumptions of normality and uniformity of the bias and standard deviation are violated, and the ways to overcome these obstacles in their approach. They also discuss repeatability and replication, and further extensions to their approach. The reader is referred to their articles for more information (see the references below). Before concluding this entry, further mention must be made of the purpose behind the limits of agreement approach. In medicine interest often lies in comparing two (or more) different methods or techniques for measuring some clinically important quantity, such as cardiac output or blood pressure. One of the methods may be already routinely established in clinical practice (e.g. intra-arterial digital subtraction angiography (DSA)), while the other may be a new technique (e.g. contrast-enhanced magnetic resonance angiography (CEMRA)) that needs to be evaluated. Both, however, measure the true quantity of interest with error. Thus neither is a true gold standard, and therefore the question of interest is not whether the new method, say CEMRA, accurately measures stenosis of the carotid artery, as assessed by the established method, DSA. Instead, it is 'Do the different methods of measurement agree sufficiently closely to allow either the new method to replace the old or both to be used interchangeably, with little or no differences arrived at in clinical conclusions or decisions?' For example, if CEMRA is shown to give sufficiently close measurements to DSA, then, as the latter is an invasive and expensive technique that carries with its use a small, but significant, risk of stroke or death, justification for using CEMRA, which is a noninvasive technique, over DSA is obtained. Note lastly that the issue of the level of acceptable agreement depends on the clinical context of the study, and should be made based on clinical judgement, not on statistical grounds. Further, this decision should be made, in general, a priori of the commencement of the study. BT

Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.
Altman, D. G. and Bland, J. M. 1983: Measurement in medicine: the analysis of method comparison studies. The Statistician 32, 307-17.
Bland, J. M. and Altman, D. G. 1986: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet i, 307-10.
Bland, J. M. and Altman, D. G. 1999: Measuring agreement in method comparison studies. Statistical Methods in Medical Research 8, 135-60.
linear mixed-effects models Regression models that include both FIXED EFFECTS and RANDOM EFFECTS, which are also known as MULTILEVEL MODELS or hierarchical models. Mixed-effects models are fitted to data that have a hierarchical or clustered structure, in which individual observations are correlated within clusters. Examples of application areas include longitudinal data, where the measurements taken repeatedly on patients over time are correlated within patients, and multicentre studies, where the measurements taken on patients within centres are correlated. Mixed-effects models include random effects to allow for the correlation of observations within clusters. To illustrate the basic structure, we consider a simple example of a linear mixed-effects model. In a CLINICAL TRIAL comparing the effects of active treatment and control over time, suppose that $y_{ij}$ is the response measured on subject $j$ ($j = 1, \ldots, J$) at time $t_{ij}$ ($i = 1, \ldots, n_j$), and let the treatment group allocated to each subject be indicated by $x_{ij} = 0/1$. A simple mixed-effects model for the data $y_{ij}$ includes random effects for patients to acknowledge that the response tends to differ between patients and that repeated measurements taken on the same patient are therefore alike. This model is written as:

$$y_{ij} = (\alpha + u_j) + \beta t_{ij} + \gamma x_{ij} + e_{ij}$$

where the $u_j$ are random patient effects and the $e_{ij}$ are random residual effects, which represent the variability between measurements within patients. The parameters $\alpha$, $\beta$ and $\gamma$ are fixed effects that represent, respectively, the overall mean response in the control group (where $x_{ij} = 0$), the trend in response over time and the treatment effect, which is constant over time. The two sets of random effects $u_j$ and $e_{ij}$ are independent and it is usual to assume these to be normally distributed, $u_j \sim N(0, \sigma_u^2)$ and $e_{ij} \sim N(0, \sigma_e^2)$. This basic mixed-effects model for the data can be extended in a number of interesting ways. For example, we could allow the trend in response over time to vary from one patient to another, we could include additional explanatory variables or we could allow the effect of treatment to vary over time. For a range of extended mixed-effects models and guidance on their interpretation, readers are referred to the full entry on this subject area, titled MULTILEVEL MODELS (see also Everitt and Pickles, 2004; Pinheiro and Bates, 2000). This entry provides details of methods and software for estimation of mixed-effects models and also covers topics such as handling of missing data and complex applications. RT
[See also GENERALISED ESTIMATING EQUATIONS, MULTILEVEL MODELS]

Everitt, B. S. and Pickles, A. 2004: Statistical aspects of the design and analysis of clinical trials, 2nd edition. London: Imperial College Press.
Pinheiro, J. C. and Bates, D. M. 2000: Mixed-effects models in S and S-PLUS. New York: Springer.
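To make the structure of the simple model concrete, the following Python sketch simulates data from it. It is an illustration under stated assumptions, not an estimation method: the function name `simulate_trial`, all parameter values, the alternating treatment allocation and the four equally spaced measurement times are hypothetical. Fitting the model properly requires the specialised software discussed under MULTILEVEL MODELS; here only a crude difference in group means is computed, which recovers the treatment effect roughly because both groups share the same measurement times.

```python
import random
from statistics import mean


def simulate_trial(n_subjects=200, n_times=4, alpha=10.0, beta=0.5,
                   gamma=2.0, sd_u=1.5, sd_e=1.0, seed=1):
    """Simulate y_ij = (alpha + u_j) + beta * t_ij + gamma * x_j + e_ij."""
    rng = random.Random(seed)
    records = []
    for j in range(n_subjects):
        x_j = j % 2                     # alternate control (0) and treatment (1)
        u_j = rng.gauss(0.0, sd_u)      # random patient effect, shared by all
                                        # measurements on subject j
        for i in range(n_times):
            t_ij = float(i)
            e_ij = rng.gauss(0.0, sd_e)  # residual for this measurement
            y_ij = (alpha + u_j) + beta * t_ij + gamma * x_j + e_ij
            records.append((j, t_ij, x_j, y_ij))
    return records


records = simulate_trial()
treated = mean(y for _, _, x, y in records if x == 1)
control = mean(y for _, _, x, y in records if x == 0)
crude_effect = treated - control
print(len(records), round(crude_effect, 2))
```

Because the patient effect `u_j` is shared by every measurement on a subject, repeated measurements within a patient are more alike than measurements on different patients, which is the correlation that the random effects are included to represent.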
linear regression See MULTIPLE LINEAR REGRESSION

linkage disequilibrium See ALLELIC ASSOCIATION

LISREL See STRUCTURAL EQUATION MODELLING SOFTWARE
LMS method
This is a method used for constructing AGE-RELATED REFERENCE RANGES, typically applied to GROWTH CHARTS. The underlying age-specific FREQUENCY DISTRIBUTION of the measurement (typically anthropometry such as height or weight, though it can be applied to any ratio scale measurement) is summarised by three age-varying parameters representing the first three moments of the distribution. The first is the MEDIAN as a MEASURE OF LOCATION, the second is the COEFFICIENT OF VARIATION or CV as a MEASURE OF SPREAD or scale and the third is the Box-Cox power transformation (see TRANSFORMATIONS) needed to adjust for SKEWNESS, as a measure of shape. KURTOSIS is assumed to match that of the NORMAL DISTRIBUTION, and is not estimated as such. Adjusting for skewness ensures a symmetric distribution, so that the MEAN on the transformed scale is also the median on the original scale. The CV is estimated rather than the STANDARD DEVIATION (SD), as the SD often increases with age in proportion to the mean, whereas the CV is relatively uncorrelated with age. The original publication describing the LMS method used the notation $\lambda$ for the Box-Cox POWER, $\mu$ for the median and $\sigma$ for the CV, hence the LMS method. The three age-related curves are referred to as the L curve, M curve and S curve respectively, and together they allow any required QUANTILE of the distribution to be constructed as a smooth function of age. Equally they allow individual measurements to be expressed as a standardised residual or z-SCORE adjusted for skewness. See GROWTH CHARTS for an example of a centile chart constructed using the LMS method. The LMS method is a semi-parametric regression model where the three parameters of the distribution are estimated as generalized additive cubic smoothing spline curves (see SCATTERPLOT SMOOTHERS) using penalized LIKELIHOOD.

The only analytical decision to make when fitting the model is to specify the number of equivalent degrees of freedom (EDF) required for each of the three smoothing spline curves, so that they are neither under- nor oversmoothed. Criteria such as AKAIKE'S INFORMATION CRITERION or the Bayesian information criterion (Everitt and Skrondal, 2010) are useful here to trade off improved fit against increased model complexity. For infant anthropometry the M curve is often steep at birth and progressively shallower with increasing age. Transforming the age scale can help here, e.g. with a square root transformation to stretch younger ages and shrink older ages, as this tends to linearise the M curve, simplify the curve shape and improve the fit. The LMS method is a special case of the family of GENERALIZED ADDITIVE MODELS for location, scale and shape (GAMLSS). These are powerful models that can be applied to many different distributions, where up to four moments of the distribution are estimated as separate generalised additive regression models. For further details see Cole (1988), Cole and Green (1992) and Rigby and Stasinopoulos (2005). TJC

Cole, T. J. 1988: Fitting smoothed centile curves to reference data (with discussion). Journal of the Royal Statistical Society, Series A 151, 385-418.
Cole, T. J. and Green, P. J. 1992: Smoothing reference centile curves: the LMS method and penalized likelihood. Statistics in Medicine 11, 1305-19.
Everitt, B. S. and Skrondal, A. 2010: Cambridge dictionary of statistics, 4th edition. Cambridge: Cambridge University Press.
Rigby, R. A. and Stasinopoulos, D. M. 2005: Generalized additive models for location, scale and shape (with discussion). Applied Statistics 54, 507-54.
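Once the L, M and S values at a given age are available, converting between measurements, z-scores and centiles is a short calculation. The Python sketch below uses the standard Box-Cox relationships associated with the LMS method (as given by Cole and Green, 1992), which are not written out in this entry; the function names `lms_zscore` and `lms_centile` and the L, M and S values are hypothetical, chosen only for illustration.

```python
from math import exp, log


def lms_zscore(y, L, M, S):
    """z-score of measurement y given Box-Cox power L, median M and CV S."""
    if L == 0:
        return log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def lms_centile(z, L, M, S):
    """Measurement corresponding to z-score z (inverse of lms_zscore)."""
    if L == 0:
        return M * exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


# Hypothetical L, M and S values at one age (illustrative only).
L, M, S = 0.5, 16.0, 0.1

z_at_median = lms_zscore(M, L, M, S)    # the median has z-score 0
upper = lms_centile(1.88, L, M, S)      # z = 1.88 is roughly the 97th centile
z_back = lms_zscore(upper, L, M, S)     # should return 1.88

print(z_at_median, round(upper, 2), round(z_back, 2))
```

Evaluating `lms_centile` over a range of ages, with L, M and S read from the fitted curves at each age, traces out one smooth centile curve per chosen z-score.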
local research ethics committee (LREC) See ETHICAL REVIEW COMMITTEES

locally weighted regression See SCATTERPLOT SMOOTHERS

loess See SCATTERPLOT SMOOTHERS
logistic discrimination See DISCRIMINANT FUNCTION ANALYSIS

logistic regression A form of regression analysis to be used when the response is a binary variable. Medical outcomes often have only two possibilities. Whether a patient is dead or alive is the most obvious, but the presence or absence of particular diagnoses, symptoms or signs are also examples of binary or dichotomous variables. Hypertension, obesity and airways obstruction are diagnoses that result from observing that a particular measurement is above or below a particular value, thus creating a binary outcome from a continuous measurement. Methods for the analysis of binary data differ from those for continuous variables. First, the summary statistic to describe the results is a proportion or percentage of individuals who are dead (or alive), have the symptom or in general have the designated outcome. Data from a continuous variable are summarised by the MEAN and STANDARD DEVIATION or MEDIAN and INTERQUARTILE RANGE, as how variable the values are is required as well as a typical value, while for binary data the proportion or percentage tells us everything. Second, when we analyse a binary outcome in relation to EXPLANATORY VARIABLES we cannot use STUDENT'S t-TEST, ANALYSIS OF VARIANCE or MULTIPLE LINEAR REGRESSION, as the data are not normally distributed, do not have the same VARIANCE for groups with different outcome proportions and predictions of proportions must not fall outside the range zero to one, which can happen if multiple linear regression of a proportion is used. Binary data can be analysed in relation to a single categorical explanatory variable using the CHI-SQUARE TEST, but very frequently it is necessary to analyse a binary outcome in relation to several explanatory variables, some or all of which may be continuous. For example, in a study that investigated whether reported wheeze is related to the use of gas for cooking it would be desirable to take age and gender into account, and also conditions in the home, such as an extractor fan, that might affect the concentration of the combustion products thought to be responsible for any increase in symptoms. Alternatively, we might want to analyse wheeze in relation to the use of gas for cooking and passive smoking simultaneously. To analyse binary data and adjust the relation to the factor of primary interest for confounding variables or to determine to which of several potential explanatory variables the outcome is related, an analogue of analysis of variance and multiple regression is required. Logistic regression meets these requirements. An explanation of the method is easiest in relation to an example. Logistic regression has been used to describe the distribution of age at menarche in girls and the factors associated with early or delayed menarche. Roberts, Rozner and Swan (1971) carried out a cross-sectional survey of girls in South Shields, County Durham, in 1967. Data are shown in the first table.

logistic regression Number of girls and number recorded as menstruating, by age group

Age group (years)    No. of girls    No. menstruating    % menstruating
11 - < 12                  82                4                  4.9
12 - < 13                 304               76                 25.0
13 - < 14                 366              171                 46.7
14 - < 15                 351              285                 81.2
15 - < 16                 216              209                 96.8
The percentage of girls who had reached menarche, of course, increased with age, being very low in the youngest age group and very high in the oldest. Had younger age groups been included the percentage menstruating would have been less than 4.9%, and 0% if sufficiently low. Similarly, the percentage would have been close to 100% in older age groups. The relation of proportion or percentage menstruating to age can be described by an S-shaped (or 'sigmoid') curve. The data from the first table are plotted together with a fitted smooth sigmoid curve in the figure.
[Figure: proportion menstruating plotted against age, with the fitted curve]
logistic regression Cumulative logistic curve
The curve shown is a cumulative logistic curve, selected from the family of such curves so that it best describes the data. Its formula is:

$$\pi = \frac{1}{1 + e^{-(\alpha + \beta x)}}$$

where $\pi$ is the proportion menstruating by age $x$ and $\alpha$ and $\beta$ are the parameters that describe the best fitting curve. These parameters were estimated to be -18.40 and 1.37 respectively to fit the curve shown. Median age at menarche is calculated at $\pi = 0.5$, i.e. where $-(\alpha + \beta x) = 0$ or $x = -\alpha/\beta$, so was 13.4 years from these data. As the logistic distribution is symmetric this is an estimate of mean age at menarche. The equation defining $\pi$ can be rewritten:

$$\log_e\left(\frac{\pi}{1 - \pi}\right) = \alpha + \beta x$$

The left-hand side of this equation is known as the logistic transformation of the proportion $\pi$. Its effect is to stretch the scale, so that the transformed variable can take values from minus infinity ($-\infty$), corresponding to $\pi = 0$, to plus infinity ($+\infty$), corresponding to $\pi = 1$, and also to linearise the relation with age. Fitting the logistic curve can therefore be achieved by least squares regression of the logistic transform of $\pi$ on $x$ (see LEAST SQUARES ESTIMATION), except that the transformation does not achieve homogeneity of variance and so iteratively weighted least squares regression is required. However, most modern computer programs use MAXIMUM LIKELIHOOD ESTIMATION, which also requires iteration, and individual rather than grouped data are usually analysed. Full specification of the binary outcome $y$ for individuals requires that $y$ is distributed as a binomial distribution with parameter 1 (here also known as a Bernoulli distribution) and success probability $\pi$.

The logistic curve is not the only curve that could be fitted to describe the data. A fitted cumulative NORMAL DISTRIBUTION would be almost indistinguishable from the logistic curve. Fitting a normal distribution before the days of electronic computing was known as probit analysis (see PROBIT MODELS), a probit being a normal equivalent deviate with 5 added to avoid negative numbers, and was developed for use in pharmacology (Finney, 1971). The distribution of the dose of a toxic substance required to kill a given strain of animal is known as the tolerance distribution. It cannot usually be observed directly, but if groups of animals are given different doses of the drug and the proportions dying are recorded, then a sigmoid curve of proportion with dose is observed that describes the cumulative tolerance distribution. Finney (1964) ascribed the logistic transformation to Berkson and showed the close agreement of the normal and logistic distributions, but favoured the normal distribution to describe the tolerance distribution of drug toxicity. Hence, in general, the normal or probit transformation was used when there was an underlying tolerance distribution. An exception was age at menarche; it became accepted that the logistic transformation should be used (Finney, 1971) as one study apparently found a better fit of the logistic transformation than of the normal distribution.

Just as linear regression can be extended to multiple regression and also incorporate categorical explanatory variables, so can logistic regression. A multiple logistic regression equation can be written:

$$\log_e\left(\frac{\pi}{1 - \pi}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots$$
where the $x_i$ can be continuous or dummy variables to indicate categories of groups. For example, Roberts, Danskin and Chinn (1975) analysed age at menarche in relation to family size, in categories of one, two, three, four and at least five children. In the model shown graphically in the paper, $x_1$ was age, and four dummy variables $x_2$ to $x_5$ were used to describe the differences in median age at menarche between the five family size groups, corresponding to fitting parallel sigmoid curves. The estimates $b_i$ of the $\beta_i$ are known as logistic regression coefficients.

To return to our first example: if the presence or absence of wheeze is the outcome and the presence or absence of a gas cooker the independent or explanatory variable, with no other factors considered for the moment, and the dummy (indicator) variable $x$ is 0 for absence and 1 for presence of a gas cooker, then we have:

$$\log_e\left[\frac{\pi_{\text{gas}}}{1 - \pi_{\text{gas}}}\right] - \log_e\left[\frac{\pi_{\text{no gas}}}{1 - \pi_{\text{no gas}}}\right] = \beta$$

Each of the terms within brackets is an odds and a difference in log odds, when antilogged, is an ODDS RATIO. Antilogging both sides of the equation gives:

$$\frac{\pi_{\text{gas}}(1 - \pi_{\text{no gas}})}{\pi_{\text{no gas}}(1 - \pi_{\text{gas}})} = e^{\beta}$$
The odds ratio on the left-hand side of this equation is the odds of having wheeze in the presence of a gas cooker divided by the odds of having wheeze if no gas cooker is present. It will approximate the relative risk, or risk ratio, of wheeze in the presence of a gas cooker compared to no gas cooker provided that the prevalence of wheeze is low. This relative risk is estimated by $e^{b}$. The differences of this example from the age at menarche example are that $\pi$ for wheeze is unlikely to be greater than, say, 0.2, and no 'tolerance distribution' analogous to that of age at menarche is directly specified, although one can envisage this being the distribution of whatever product of combustion of gas is responsible for increased wheeze in people in homes with gas cookers. Even with such a distribution specified, it is unlikely that exposure would ever be high enough to cause 100% wheeze, so, in practice, only the lower portion of the sigmoid curve is relevant.

As with applications in general, it was the availability of software that led to an expansion in the use of logistic regression, in particular GLIM (generalised linear interactive modelling) in the early 1970s. Although now largely superseded, notably but not exclusively by Stata, GLIM enabled unbalanced analysis of variance, multiple linear regression and multiple logistic regression models to be fitted within the same framework. The application of logistic regression in epidemiology and public health journals showed a steep rise from around 1980 (Hosmer, Taber and Lemeshow, 1991; Chinn, 2001). Odds ratios were used in epidemiology and, in particular, for the results of a CASE-CONTROL STUDY before the widespread availability of computers and statistical software enabled easy fitting of multiple logistic regression models; therefore logistic regression was readily adopted by epidemiologists. It was also established as the appropriate method for the analysis of case-control studies with adjustment for confounding.
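As a rough numerical illustration of the two examples discussed above, the following Python sketch evaluates the cumulative logistic curve for age at menarche using the estimates quoted in this entry (-18.40 and 1.37), and compares an odds ratio with the corresponding risk ratio at a low and at a high prevalence. The function names and the prevalence values are hypothetical, chosen only for illustration; they are not taken from any of the cited studies.

```python
import math

def logistic(alpha, beta, x):
    """Cumulative logistic curve: proportion with the outcome at value x."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def odds(p):
    """Odds corresponding to a proportion p."""
    return p / (1.0 - p)

def odds_ratio(p_exposed, p_unexposed):
    return odds(p_exposed) / odds(p_unexposed)

def risk_ratio(p_exposed, p_unexposed):
    return p_exposed / p_unexposed

# Estimates quoted in the entry for age at menarche.
ALPHA, BETA = -18.40, 1.37

# The fitted proportion is 0.5 where alpha + beta * x = 0.
median_age = -ALPHA / BETA
print("median age at menarche:", round(median_age, 1))

# Fitted proportions at the midpoints of the five age groups.
for age in (11.5, 12.5, 13.5, 14.5, 15.5):
    print(age, round(logistic(ALPHA, BETA, age), 3))

# Hypothetical prevalences (exposed, unexposed): the odds ratio stays
# close to the risk ratio only when the outcome is uncommon.
for p1, p0 in ((0.06, 0.05), (0.60, 0.50)):
    print(p1, p0, round(odds_ratio(p1, p0), 3), round(risk_ratio(p1, p0), 3))
```

In both hypothetical comparisons the risk ratio is 1.2, but the odds ratio moves further from it as the prevalence rises, which is the reason for the caution about low prevalence given above.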
When cases and controls are individually matched the method of analysis is conditional logistic regression. Most statistical software for logistic regression requires the binary outcome to be coded 0 and 1, with 1 for the 'positive' outcome. Like all estimates from a sample, an odds ratio has an associated CONFIDENCE INTERVAL. The NULL HYPOTHESIS of no relation to an explanatory variable is an odds ratio of one or equivalently zero for the corresponding logistic regression coefficient. Older papers gave logistic regression coefficients with standard errors or 95% confidence intervals, but more recent papers give odds ratios with 95% confidence intervals.

For example, Somerville, Rona and Chinn (1988) gave logistic regression coefficients in a study of passive smoking by children in a survey of 5- to 11-year-old children in England and Scotland. One result is shown in the first line of the second table. The logistic regression coefficient of the symptom, reported by a parent in a self-administered questionnaire, 'chest EVER sounds wheezy or whistling', on passive smoking as measured by the total number of cigarettes per day reported to be smoked by the parents, was 0.011 with a standard error of 0.005. By calculating the coefficient ± 1.96 × standard error, a 95% confidence interval for the logistic regression coefficient can be obtained. Antilogging (base e) the coefficient and each limit of its confidence interval gives the odds ratio and its 95% confidence interval in the second line. However, the odds ratio associated with exposure to just one cigarette a day is not very useful; 20 cigarettes a day represents a more common exposure of children who are exposed to passive smoking. To obtain the odds ratio associated with exposure to 20 cigarettes a day, multiply both the logistic regression coefficient and its standard error by 20 and repeat the confidence interval calculation and antilogging to obtain the third line of the table.
logistic regression Alternative presentations of result of logistic regression analysis, illustrated by 'chest EVER sounds wheezy or whistling' in relation to passive smoking for children in the National Study of Health and Growth (Somerville, Rona and Chinn, 1988)

Result
Logistic regression coefficient ± standard error on number
  of cigarettes smoked at home by father and mother              0.011 ± 0.005
Odds ratio per cigarette smoked (95% confidence interval)        1.011 (1.001 to 1.021)
Odds ratio per 20 cigarettes smoked (95% confidence interval)    1.246 (1.024 to 1.516)
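The three lines of the second table can be reproduced from the coefficient and standard error alone. The short Python sketch below follows the steps described in the text (coefficient ± 1.96 × standard error, then antilogging); the function name and its `units` argument are hypothetical conveniences, not part of any package.

```python
import math

def odds_ratio_with_ci(coef, se, units=1.0, z=1.96):
    """Odds ratio and 95% confidence limits for a change of `units`
    in the explanatory variable, from a logistic regression
    coefficient and its standard error."""
    b = coef * units          # rescale the coefficient
    s = se * units            # and its standard error
    lower, upper = b - z * s, b + z * s
    # Antilog (base e) the estimate and each confidence limit.
    return math.exp(b), math.exp(lower), math.exp(upper)

per_cigarette = odds_ratio_with_ci(0.011, 0.005)
per_twenty = odds_ratio_with_ci(0.011, 0.005, units=20)

print([round(v, 3) for v in per_cigarette])  # second line of the table
print([round(v, 3) for v in per_twenty])     # third line of the table
```

Because the interval is constructed on the log odds scale and then antilogged, it is symmetric around the coefficient but not around the odds ratio.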
Although the evidence against the null hypothesis was not strong (P ≈ 0.028) and the 95% confidence interval correspondingly wide, the results in the third line show that the size of the likely effect is not negligible, which could not be easily appreciated from either of the first two rows. Note that the confidence interval for the logistic regression coefficient is symmetric around the estimate, but that for an odds ratio it is not. It is tempting to interpret the third line of the second table as meaning that exposure to 20 cigarettes smoked a day in the home results in an increased risk of wheeze of between 2.4% and 51.6%. This is interpreting an odds ratio as if it were a relative risk, which is only justified if the prevalence of wheeze is low, say less than 10% (Zhang and Yu, 1998). In this case it was 10.9%, so perhaps not too misleading, but it is easy to find examples of incorrect interpretation of odds ratios in the medical literature (Chinn, 2001).

Although the fact that the odds ratio is biased away from the null value of 1 as an estimate of relative risk is well known to statisticians and epidemiologists, it is often conveniently ignored in the medical literature, especially in the reporting of results of CROSS-SECTIONAL STUDIES. In fact, it is possible to estimate relative risk directly, by binomial regression, but at the expense of the iterative model fitting sometimes failing to converge (Chinn, 2001). Logistic regression is essential for the analysis of unmatched case-control studies and is likely to continue to be the most used method for the analysis of binary outcomes in cross-sectional studies. Statistically it cannot be faulted; it is in the reporting, and the fact that an odds ratio does not estimate relative risk directly, that the problem lies. Binary outcomes in COHORT STUDIES should be analysed by SURVIVAL ANALYSIS, unless the follow-up time is constant, which is rarely the case.

The P-VALUE associated with the odds ratio, to test a difference from 1, can be obtained by dividing the logistic regression coefficient by its STANDARD ERROR and comparing the result with the normal distribution, as the null hypothesis value for the logistic regression coefficient is zero. Note that the normal distribution is used rather than the t-DISTRIBUTION, as no residual standard deviation is estimated. This is because a binomial distribution is assumed for the observations, which is specified only by the expected proportion and does not involve a standard deviation. Alternatively, if the model were fitted by MAXIMUM LIKELIHOOD, the LIKELIHOOD RATIO test can be used and will usually give approximately the same answer for a single parameter. If model 1 is the model with the factor of interest included, with likelihood $l_1$, and model 2 that with it omitted, with likelihood $l_2$, then $-2\log(l_2/l_1)$ has a CHI-SQUARE DISTRIBUTION with DEGREES OF FREEDOM equal to the difference in the number of fitted parameters. This can be used to test the equivalence of several parameters, e.g. equal median age at menarche for girls from different sizes of family (Roberts, Danskin and Chinn, 1975).

Related to testing for association of outcome with risk factors is that of goodness of fit of the model. This is more difficult to assess than with a linear regression model, as individual values are each 0 or 1, so a plot of observed against fitted values, or of residuals, is uninformative. For associated reasons the overall likelihood ratio statistic cannot be used. Hosmer, Taber and Lemeshow (1991) give a number of plots that can be used and the necessary calculations are implemented in Stata.
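The two tests for a single parameter described above can be sketched in a few lines of Python using only the standard library. The coefficient and standard error are those quoted for the passive smoking example; the two log-likelihood values are hypothetical, picked so that the likelihood ratio statistic equals the square of the Wald statistic, and the function names are illustrative only.

```python
import math

def wald_p_value(coef, se):
    """Two-sided P-value from comparing coef / se with the
    standard normal distribution."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2.0))

def likelihood_ratio_test_1df(loglik_included, loglik_omitted):
    """-2 log(l2 / l1) for models differing by one parameter, referred
    to the chi-square distribution with 1 degree of freedom."""
    stat = max(-2.0 * (loglik_omitted - loglik_included), 0.0)
    # Upper tail of chi-square(1) is P(|Z| > sqrt(stat)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

# Passive smoking example: 0.011 / 0.005 gives z = 2.2.
p_wald = wald_p_value(0.011, 0.005)

# Hypothetical log-likelihoods with and without the factor of interest.
lr_stat, p_lr = likelihood_ratio_test_1df(-100.0, -102.42)

print(round(p_wald, 3), round(lr_stat, 2), round(p_lr, 3))
```

With real data the two P-values would agree only approximately; they coincide here solely because the hypothetical log-likelihoods were constructed that way.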
Logistic regression as described here for a binary outcome is a special case of the more general multinomial, or polytomous, logistic regression for a categorical outcome with three or more possible values (see LOGIT MODELS FOR ORDINAL RESPONSES). It is also closely related to the LOG-LINEAR MODEL, which assumes a POISSON DISTRIBUTION for the count in each cell of a contingency table. Each is an example of a GENERALISED LINEAR MODEL.

Medical journals now frequently report results from multiple logistic regression, showing odds ratios, P-values and confidence intervals. These need to be read carefully, as seemingly similar tables may be used in different situations. The lines of the table may be for different binary outcomes or independent analyses of the same outcome with different explanatory variables. The odds ratios will often be adjusted for a list of stated variables such as age and sex, although unadjusted odds ratios may also be shown. An example is shown in the second table of Lawlor, Patel and Ebrahim (2003), in which odds ratios of falls in women aged 60 to 79 with drug use are given. Each row of the table gives results for one class of drug, while there are columns for 'crude', i.e. unadjusted, and fully adjusted odds ratios for each of three outcomes: any falls, two or more falls and falls where medical attention was given. The variables used to adjust the fully adjusted odds ratios are listed as a footnote to the table. Other papers give odds ratios that are mutually adjusted for other factors shown in the same table of results, i.e. all the results come from a single multiple logistic regression and full information is given, while Lawlor, Patel and Ebrahim (2003) (described earlier) appear to have carried out 21 adjusted analyses (three outcomes by seven drug classes). (For an example of mutually adjusted odds ratios see Slap et al. (2003), first table in the abridged printed version, second table in the full electronic version.)

Particularly when all results shown are 'statistically significant' (SS), the reader needs to ascertain whether all factors in the model are shown and whether the final model was selected from a set of possible models. This is appropriate if either the question is 'What factors are associated with the outcome?' or a parsimonious model is required for prediction purposes and selected either by forwards or backwards stepwise elimination. However, as with a similar procedure with multiple regression, it must be understood that prediction on a further dataset will not be as good as on the one from which the prediction was derived, and exclusion or inclusion of factors with P-values close to the chosen critical value may not be reproducible. By the same token, however, when there is a stated hypothesis, the odds ratio of interest should ideally be adjusted for all factors determined a priori to be of potential importance. Some of these may not be associated with outcome in the data at the conventional level of statistical significance, but adjustment can still affect the odds ratio of interest. It is useful when there may be controversy over the number of potentially confounding variables to be included to give unadjusted, fully adjusted and, perhaps also, partially adjusted odds ratios. SC

Chinn, S. 2001: The rise and fall of logistic regression. Australasian Epidemiologist 4, 7-10.
Finney, D. J. 1964: Statistical method in biological assay, 2nd edition. London: Griffin.
Finney, D. J. 1971: Probit analysis, 3rd edition. Cambridge: Cambridge University Press.
Hosmer, D. W., Taber, S. and Lemeshow, S. 1991: The importance of assessing the fit of logistic regression models: a case study. American Journal of Public Health 81, 1630-5.
Lawlor, D. A., Patel, R. and Ebrahim, S. 2003: Association between falls in elderly women and chronic diseases and drug use: cross-sectional study. British Medical Journal 327, 712-15.
Roberts, D. F., Rozner, L. M. and Swan, A. V. 1971: Age at menarche, physique and environment in industrial northeast England. Acta Paediatrica Scandinavica 60, 158-64.
Roberts, D. F., Danskin, M. J. and Chinn, S. 1975: Menarcheal age in Northumberland. Acta Paediatrica Scandinavica 64, 845-52.
Slap, G. B., Lot, L., Huang, B., Daniyam, C. A., Zink, T. M. and Succop, P. A. 2003: Sexual behaviour of adolescents in Nigeria: cross-sectional survey of secondary school students. British Medical Journal 326, 15-18.
Somerville, S., Rona, R. J. and Chinn, S. 1988: Passive smoking and respiratory conditions in primary school children. Journal of Epidemiology and Community Health 42, 105-10.
Zhang, J. and Yu, K. F. 1998: What's the relative risk? Journal of the American Medical Association 280, 1690-1.
logit models for ordinal responses A regression model is a statistical model for describing the relationship between one or more explanatory variables and the response (dependent) variable. The purpose of statistical modelling is to fit the best model from a medical and epidemiological point of view that describes this relationship. The statistical modelling of how the relationship between the explanatory variables and the response variable could be described depends on how the response variable is recorded. The linear regression model assumes continuous quantitative response values. When the response variable has only two possible values or is measured on a rating scale, a logit transformation of the response values will meet the assumption of continuity.

A simple linear regression model describes how much a continuous quantitative response variable ($y$) depends on the explanatory ($x$) variable by the expression $y = a + bx$, where $a$ is a constant, the intercept, and $b$ is the regression coefficient, which contains the important information about the dependence of $y$ on $x$. According to the model, $y$ will change $b$ units when $x$ increases 1 unit. In a multiple regression model a linear combination of several explanatory variables is included. The purpose could be to investigate how the response variable depends on all explanatory variables together. Some of the variables could also be included in the model as confounding factors, which means that they would disturb the relationship of interest if not adjusted for.
In the case of only two responses, success/failure or diseased/nondiseased, the range of possible response values is between zero and one: e.g. when the probability of success is $p = 0.8$, then the probability of failure is $(1 - p) = 0.2$. As the modelling assumes unlimited possible continuous values, the explanatory variable will be linked to the response variable by a logit transformation. Then the odds of success is the ratio between the probability of success and the probability of failure: odds $= p/(1 - p)$. The logit of the proportion $p$ is defined as the log odds, $\mathrm{logit}(p) = \ln[p/(1 - p)]$, where ln denotes the logarithm to the base e. The regression model is called a (linear) LOGISTIC REGRESSION model, $\mathrm{logit}(p) = a + bx$, and the multiple logistic regression model is $\mathrm{logit}(p) = a + b_1x_1 + b_2x_2 + \cdots + b_kx_k$, when $k$ explanatory variables are included. The interpretation of the relationship between an explanatory variable $x$ and the probability $p$ of success is that when $x$ increases 1 unit, the odds are multiplied by $e^{b}$. For example, $\mathrm{logit}(p) = 3.2 + 1.3x$ means that the logit($p$), or the log odds, is predicted to change by 1.3 for each unit of increase in $x$ and hence the odds of success will change by a factor of $e^{1.3} = 3.7$.

Logistic regression is commonly used to compare the odds of success between two groups of subjects with and without some prognostic property, such as smoking habits. For illustration, consider a model for having a specific disease, $\mathrm{logit}(p) = 3.2 + 1.3\,\text{age} + 0.4\,\text{smoking}$, where the prognostic variables are age (years) and smoking habits coded as smokers = 1 and nonsmokers = 0. Assuming the same age in the two groups, the logit for smokers is $\mathrm{logit}(p_+) = 3.2 + 1.3\,\text{age} + 0.4$ and for nonsmokers $\mathrm{logit}(p_-) = 3.2 + 1.3\,\text{age} + 0$. The difference between these logits is $\mathrm{logit}(p_+) - \mathrm{logit}(p_-) = 0.4$, which is a difference between the log odds of disease in smokers and nonsmokers. This difference between logarithms is the same as the logarithm of a ratio, in this case the log odds ratio, lnOR. Thus, lnOR = 0.4, which was the regression coefficient associated with the variable smoking; then OR $= e^{0.4} = 1.5$. According to this example, we can predict that the odds of having the disease are related to smoking habits and are predicted to be 1.5 times larger in smokers than in nonsmokers, after adjustment for age. The logit transformation makes it possible to model how a dichotomous response variable depends on the explanatory variables.

The logit transformation is also suitable for ordered categorical (ordinal) responses, provided there is dichotomisation of the response categories. Consider a four-point scale with the categories 'none', 'slight', 'moderate' and 'severe'. Assume that the numbers of observations in the categories are $n_1$, $n_2$, $n_3$ and $n_4$, respectively, and the total number of observations is $n$. The cumulative, continuation-ratio and adjacent-categories logits are three approaches to creating dichotomous datasets considering the ordered structure of the ordinal responses.
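The arithmetic of this entry can be sketched in Python. The first part uses the coefficients of the illustrative disease model quoted above (3.2, 1.3 and 0.4); the second part computes the cumulative, continuation-ratio and adjacent-categories logits for a four-point scale. The function names and the category counts are hypothetical, invented only to show the calculations.

```python
import math

# Coefficients of the illustrative model quoted in the entry.
INTERCEPT, B_AGE, B_SMOKING = 3.2, 1.3, 0.4

def logit(age, smoker):
    """Log odds of disease for a given age and smoking status (1 or 0)."""
    return INTERCEPT + B_AGE * age + B_SMOKING * smoker

def odds_ratio_smoking(age):
    """Odds ratio for smokers versus nonsmokers of the same age."""
    return math.exp(logit(age, 1) - logit(age, 0))

def cumulative_logits(counts):
    """ln(lower categories / higher categories) at each cut-off point."""
    total, lower, out = sum(counts), 0, []
    for n in counts[:-1]:
        lower += n
        out.append(math.log(lower / (total - lower)))
    return out

def continuation_ratio_logits(counts):
    """ln(one category / all categories representing lower levels)."""
    lower, out = counts[0], []
    for n in counts[1:]:
        out.append(math.log(n / lower))
        lower += n
    return out

def adjacent_category_logits(counts):
    """ln(category / the category immediately below it)."""
    return [math.log(counts[k + 1] / counts[k]) for k in range(len(counts) - 1)]

print(round(odds_ratio_smoking(50), 1))   # does not depend on the age chosen

# Hypothetical counts for 'none', 'slight', 'moderate', 'severe'.
counts = [40, 30, 20, 10]
print([round(v, 3) for v in cumulative_logits(counts)])
print([round(v, 3) for v in continuation_ratio_logits(counts)])
print([round(v, 3) for v in adjacent_category_logits(counts)])
```

A scale with m categories yields m - 1 logits under each approach; only the cumulative logits use all of the data in every logit.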
In the cumulative logit, also called the proportional odds model, the probability of being in the lower categories is compared with the probability of being in the higher. Empirically, the number of observations in categories representing lower levels is compared with the number of observations in the higher levels of the scale. There are $(m - 1)$ possible cut-off points between categories in a scale with $m$ categories when creating cumulative logits. In the four-point scale there are three possible cumulative logits; when the cut-off point is the first category, 'none', the cumulative logit is $\ln[n_1/(n_2 + n_3 + n_4)]$; by moving the cut-off point one category at a time the cumulative logits will be $\ln[(n_1 + n_2)/(n_3 + n_4)]$ and $\ln[(n_1 + n_2 + n_3)/n_4]$. In cumulative logits, all data are used in each logit. The first cumulative logit could be interpreted as the log odds of the response 'none', as compared with 'slight', 'moderate' and 'severe'. If the variable is pain this cut-off point seems reasonable. Absence of pain is compared with presence of pain, but the other cumulative logits could also be of interest in a logistic model.

In the continuation-ratio approach, the number of observations in one category is compared with the number of observations in all categories representing lower levels. In the four-point scale the continuation-ratio logits are $\ln(n_2/n_1)$, $\ln[n_3/(n_1 + n_2)]$ and $\ln[n_4/(n_1 + n_2 + n_3)]$.

In the adjacent-categories logit, adjacent categories are compared: in the four-point scale the logits are $\ln(n_2/n_1)$, $\ln(n_3/n_2)$ and $\ln(n_4/n_3)$, and this approach is also applicable to categorical nominal data.

After dichotomisation, the logits for ordinal data can be used in the logistic regression model for dichotomous data and with corresponding interpretation of odds ratios, when evaluating possible relationships between dichotomised ordinal responses and some prognostic variable, when controlling for other prognostic or disturbing background variables. For further details see Agresti (1984), Altman (2000) and Campbell (2001). ES
(See also LINEAR REGRESSION, LOG-LINEAR MODELS, MULTIPLE REGRESSION MODELS)

Agresti, A. 1984: Analysis of ordinal categorical data. New York: John Wiley & Sons, Inc.
Altman, D. G. 2000: Practical statistics for medical research. Boca Raton: Chapman & Hall/CRC.
Campbell, M. J. 2001: Statistics at square two. Bristol: BMJ Books.
log-linear models These are models that serve to describe the relationships between frequencies (counts) and one or more variables that affect their size. In practice, log-linear models are most often used in connection with CONTINGENCY TABLES to describe the nature of associations between multiple nominal categorical variables. The analysis of contingency tables formed from three or more categorical variables will be the primary concern of this entry, since two-way contingency tables are dealt with in the entry mentioned earlier. When a sample from some population is classified with respect to more than two qualitative variables, the resulting data can be displayed as a multiway contingency table. As an example, we consider the three-way contingency table resulting from classifying 1330 patients according to blood pressure, serum cholesterol and coronary heart disease (see the first table). In three-way tables 'layering' is used to accommodate the levels of the third categorical variable (heart disease).

log-linear models Cross-classification of patients with respect to three clinical variables discussed in Ku and Kullback (1974)

Coronary        Blood pressure    Serum cholesterol (mg/100 cc)
heart disease   (mm Hg)           <200    200-219    220-259    >260    Total
Yes             <127                 2          3          8       7       20
                127-146              3          2         11      12       28
                147-166              3          1          6      11       21
                >167                 4          3          6      11       24
                Total               12          9         31      41       93
No              <127               117         85        119      67      388
                127-146            121         98        209      99      527
                147-166             47         43         68      46      204
                >167                22         20         43      33      118
                Total              307        246        439     245     1237
Overall total by blood pressure: 408 (<127), 555 (127-146), 225 (147-166), 142 (>167); overall total 1330

Several independence hypotheses might be of interest in the three-way contingency table. These correspond to different combinations of first-order relationships between pairs of categorical variables:
whereF""are theordical expected frequencies in a threc-way table under a particularhypalhesis. AmlllTtlledmotkl forlhe FtJ/l. i.e. a model that explains all thc varilllion in the data. is givcn by: In(Fjt}
=,,+ 1I1(;} + II11J) + ul(.t) + 1112(,1 + "13(it) + "DUl) + 11123(ft)
(I) mutllal intIeperrt/ence of the tlRC variables. i.e. none of
Ihc: pairs of variables is associated: (2) /HITliol independence, i.e. aD associalion exists belween lwo oflhc: variables. both of whicb an: independent of the thinl; (3) condilional intkpentlmce. i.e. two of the variables are independenl in each level of the third, but each may be associated with Ihc Ihinl variable; (4) nllllllal lU8DCilllion. i.e. each pair of variables is associaIc:d within each level of Ihc Ihinl variable. In addition, the Ihn:c wriables in a Ihrec-way contingency table may display a man: complex fonn of association. namely what is known as a:l«Olld-ortler relllliolUhip. This means that the type and/or degm: of association between two cab:gorical variables is diffemat in same « aU levels of the remaining variable. In thc:ory. in a /c-dimensional table relationships up to (/c - 1)111 order can be invcstipled but the inlc:rpn:talion of hipcrcxderrelatianships bccomesincrcasingly man: dil1k:ulL For some of the hypotheses of interest ill multiwsy tables. the corresponding ellpcclcd values under die: NULL HYPOI'IIESIS can be calculalc:d directly from appmprilllc marginal totals. but ror DIllen same: farm of iterative fttting algorithm is ncccIcd (see Everitt. 1992, ror details). The basic idea of log-linear modelling is to InmslalC the dill'cl'Cftt hypotheses of interest in a multiway table into a &eqUc:DCIe of statistical models so as to proVide a syslcmalic approach to die: analysis of complex multidimensional tables and. in addition. to provide estimates of the magnitudes of effects of intcn:st. The analysis of ~dimcnsional lables poses entirely new eanc:epluaJ problems as compaml with those in two dimensions. Howewr. the extension from tables of thn:c dimensions to four or more. while becoming more complex in analysis and inlclpn:tation, poses no fUrther new problems and here description of Ihc: analysis of higher CJI'der contingCDCy lables will be ill terms of those arising from thn:c categorical variables. 
The nomenclature used for dealing with the r )( c lable is easily CJtlcndc:d to clc:aI with a thn»climensional r)( c X I contingency lable having r rows. ccolumns and IlayelS. The obsem:d frequency in the ij/cth cell is now repn:scnled by,,_ for i= I, 2. .•.• r.j= I, 2, .•.• c. /c = 1.2•...• 1. 11Ic general model is:
In(Fijt) = linear function of paramcten
where II is an unknown parameter refcm:d to as an ·overall mean cffect~ since alllhc: OIhel' model terms are restricted 10 be dc\'iation terms; with L;"I(r1 = 0 is an unknown deviation tenn thai varies with the level of variable I and is called the 'main effecl of variable I ': "wi wilh Ej'lz{J) = 0 is an unknown deviation tenn Ihal varies with the level of variable 2. the so-called 'main effect of variable 2'; US(/cJ with U'lJ{k) = 0 is an unknown deviation tcnn Ihat varies with the level of variable 3, the so-called 'main effect of variable: 3': II 12(11. with L;"12(!1} = 0 ror all j e ( I, ...• c J and Ej "I2(,1 = 0 for all ; ell •.... r} is a funher unknown deviation tenD for the IIh category of variable I and the fth Calqary of variable 2. Ihe so-called "interaction bdween variables I and 2'; IIIJfdo with Eillll(it. = 0 Cor all k e II, ..., It and UIIIl(Ik) = 0 for all; e t I •..., r} is a further unknown dcviation term for the ith catepry of variable I and the leth calcgary of variable 3, the so-called "interaction between variables 1 and 3': ,,~) with Ej"»{jt) = 0 COl' all k ell .... , It and E.t"D(iI;} = 0 Cor all jell..... c} is • further unknown deviation tenn for the fth catc:gary of variable 2 and the kth calcgory or variable 3. the SCH:alled 'interaction between variables 2 and 3'; "11S(o/r) wilh Eilll23(ijk) = 0 for all j and Ej IlI2J(ft) = 0 fOl' all i and k and E"uJ2J(4,!i) = 0 far all i llidj is yet another unknown deviation tcnn for Ihc ith Clllcgory of vSriable I willl.n Ihc: fth category of variable 2 and Ihc: /cth calcgory of variable 3, the so-called ·thn:c-way interaction'. 1bc main effCClICrms in the sec::ond to f'ouIth oflhcsc tcnns serve to model the single variable marginal distribulions. 11H: two-way interaction terms in the ftfth to seventh terms madel the ftnl-orclc:r ~lationships. Different combinations of absc:ncc:lpn:scnce of the tme lwo-way intcraclions correspond to the mutual. partial. 
conditional independence or mutual association hypotheses. The three-way interaction term in the eighth term models the second-order relationship. For example, for the data in our first table we might compare the following sequence of models:

(1) all cell frequencies are the same:
$$\ln(F_{ijk}) = u$$
(2) marginal totals for variable 2 (say cholesterol) and 3 (say heart disease) are equal:
$$\ln(F_{ijk}) = u + u_{1(i)}$$
(3) only marginal totals for variable 3 (heart disease) are equal:
$$\ln(F_{ijk}) = u + u_{1(i)} + u_{2(j)}$$
(4) the variables blood pressure, cholesterol and heart disease are mutually independent:
$$\ln(F_{ijk}) = u + u_{1(i)} + u_{2(j)} + u_{3(k)}$$
(5) variables 1 (blood pressure) and 2 (cholesterol) are associated and both are independent of variable 3 (heart disease):
$$\ln(F_{ijk}) = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)}$$
(6) variables 2 (cholesterol) and 3 (heart disease) are conditionally independent given the level of variable 1 (blood pressure):
$$\ln(F_{ijk}) = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)}$$
(7) all pairs of variables are associated:
$$\ln(F_{ijk}) = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)}$$
(8) saturated model for the three-way table, including the second-order relationship:
$$\ln(F_{ijk}) = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)} + u_{123(ijk)}$$

log-linear models  Identification of an adequate log-linear model for the data in the first table

| Step | Model expansion | Simpler model | More complex model | LR test: deviance change | DF | p-value |
|---|---|---|---|---|---|---|
| 1 | Add interaction between blood pressure and cholesterol | Minimal model (4): mutual independence | Model (5): partial independence of heart disease | 24.45 | 9 | 0.0036 |
| 2 | Add interaction between blood pressure and heart disease | Model (5): partial independence of heart disease | Model (6): conditional independence of heart disease and cholesterol | 30.45 | 3 | <0.0001 |
| 3 | Add interaction between cholesterol and heart disease | Model (6): conditional independence of heart disease and cholesterol | Model (7): mutual association between blood pressure, cholesterol and heart disease | 19.28 | 3 | 0.0002 |
| 4 | Add three-way interaction | Model (7): mutual association between blood pressure, cholesterol and heart disease | Saturated model (8): all first- and second-order relationships | 4.77 | 9 | 0.85 |

The model is analogous to a two-way ANALYSIS OF VARIANCE (ANOVA) - hence the use of the ANOVA terminology - but differs in a number of important respects: first, the data consist of counts rather than a score for each subject on some dependent variable; second, the model does not distinguish between independent and dependent variables. All categorical variables are treated alike as 'response' variables whose mutual associations are to be explored; third, whereas a linear combination of parameters is used in an ANOVA or regression model, in multiway tables the natural model is multiplicative and hence the counts are log-transformed to obtain a model in which parameters are combined additively; lastly, whereas the errors in an ANOVA or regression model are assumed to follow a normal distribution, appropriate distributions to model cell counts are the MULTINOMIAL DISTRIBUTION (for fixed sample size) or POISSON DISTRIBUTION (for random sample size). The purpose of modelling a three-way table is to find the unsaturated model with fewest parameters that adequately predicts the observed cell frequencies. The LIKELIHOOD RATIO (LR) test principle can be employed formally to assess the improvement in model fit of a more complex model against a simpler model. The DEVIANCE of a model is defined as minus twice the log-likelihood ratio between the model fitted and a saturated model and represents a measure of model fit. For cell counts from a contingency table the deviance or log-likelihood ratio statistic for a particular model is calculated as:

$$D = 2 \sum_{i,j,k} n_{ijk} \ln\left(\frac{n_{ijk}}{\hat{E}_{ijk}}\right)$$
where the $\hat{E}_{ijk}$ denote maximum likelihood estimates of the expected cell counts under the model. The likelihood ratio principle then states that an asymptotic test for a null hypothesis, which amounts to zero difference between two competing nested models, can be derived by comparing the difference in deviances with a CHI-SQUARE DISTRIBUTION with DEGREES OF FREEDOM equal to the number of extra parameters in the more complex model. We carry out a series of LR tests to compare the sequence of models shown in the second table, beginning with model (4), which allows the marginal totals of all three variables to vary and is a good starting point since we are interested in the relationships between the variables rather than their marginal distributions. This model is usually referred to as the minimal model for a table. Adding the two-way interaction terms improves the model fit significantly compared to the simpler model in the previous step. Model (7), which includes all first-order relationships but no second-order relationship, provides an adequate fit for the data since the comparison with the saturated model does not indicate any lack of fit (p = 0.85). It is important to note that attention must be restricted to HIERARCHICAL MODELS. These are such that whenever a higher order effect is included in a model, the lower order effects composed from variables in the higher effects are also included. However, in practice, this restriction is of little consequence, since most tables can be described by a series of hierarchical models.
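The LR comparisons in the second table can be checked numerically. The following is a minimal Python sketch under stated assumptions, not the software used by the authors: the function names `deviance` and `chi2_sf` are illustrative, and the chi-square tail probability is obtained from the standard recurrence for the regularised upper incomplete gamma function so that only the standard library is needed. The deviance changes and degrees of freedom are those listed in the second table.

```python
import math

def deviance(observed, expected):
    """Deviance 2 * sum(n * ln(n / E)) over cells; empty cells contribute zero."""
    return 2.0 * sum(n * math.log(n / e)
                     for n, e in zip(observed, expected) if n > 0)

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), starting from
    Q(1/2, z) = erfc(sqrt(z)) for odd df or Q(1, z) = exp(-z) for even df.
    """
    z = x / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

# (step, change in deviance, degrees of freedom) as given in the second table
steps = [(1, 24.45, 9), (2, 30.45, 3), (3, 19.28, 3), (4, 4.77, 9)]
p_values = {step: chi2_sf(dev, df) for step, dev, df in steps}

for step, dev, df in steps:
    print(f"step {step}: deviance change {dev}, DF {df}, p = {p_values[step]:.4f}")
```

The large p-value at step 4 is what supports retaining model (7) rather than the saturated model.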
Identified associations are best understood by constructing tables of estimated cell counts under the final model. The final model for the data in the first table states that the association between blood pressure and cholesterol is the same for patients with or without heart disease, the association between blood pressure and heart disease is the same for each level of cholesterol and the association between cholesterol and heart disease is the same for each level of blood pressure. We can therefore assess three two-way tables of estimated cell counts (third table). For each of these two-way tables the levels of the third variable have been 'averaged out' (on the log scale) to provide a picture of the two-way interaction. To understand the nature of the interactions, odds of the categories of one variable can be calculated (e.g. of coronary heart disease) and compared between the categories of the second variable. In essence the results indicate that there is a positive association between high blood pressure and the occurrence of coronary heart disease and, similarly, a positive association between high serum cholesterol level and coronary heart disease. The odds of coronary heart disease are estimated to more than triple when comparing the highest cholesterol (blood pressure) category with the lowest. The nature of the detected association between blood pressure and cholesterol is less clear. However, looking at the estimated odds of the second lowest blood pressure category to the largest it would appear that the odds of high blood pressure increase with increasing cholesterol level. When it has been decided how best to describe an interaction the relevant ODDS RATIOS and preferably CONFIDENCE INTERVALS should be reported. Associations between categorical variables can also be displayed graphically using CORRESPONDENCE ANALYSIS.

log-linear models  Cell counts estimated by the best-fitting model averaged over the level of the third variable: (a) association between blood pressure and cholesterol; (b) association between heart disease and cholesterol; (c) association between heart disease and blood pressure

(a) Blood pressure (mm Hg) by serum cholesterol (mg/100 cc)

| Serum cholesterol | <127 | 127-146 | 147-166 | >167 | Odds 127-146 vs >167 |
|---|---|---|---|---|---|
| <200 | 20.12 | 14.25 | 27.83 | 22.60 | 0.63 |
| 200-219 | 11.01 | 9.36 | 20.90 | 21.55 | 0.43 |
| 220-259 | 20.59 | 15.90 | 47.37 | [illegible] | 0.41 |
| >260 | 7.51 | 6.40 | [illegible] | 19.74 | 0.32 |

(b) Heart disease by serum cholesterol (mg/100 cc)

| Serum cholesterol | Heart disease: Yes | Heart disease: No | Odds |
|---|---|---|---|
| <200 | 4.5[?] | 94.[?] | 0.04[?] |
| 200-219 | 5.75 | 125.13 | 0.046 |
| 220-259 | 4.31 | 50.15 | 0.086 |
| >260 | 4.53 | 28.46 | 0.159 |

(c) Heart disease by blood pressure (mm Hg)

| Blood pressure | Heart disease: Yes | Heart disease: No | Odds |
|---|---|---|---|
| <127 | 2.95 | 62.12 | 0.047 |
| 127-146 | 2.24 | 52.09 | 0.043 |
| 147-166 | 7.57 | 91.77 | 0.082 |
| >167 | 10.10 | 56.09 | 0.18 |
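As a rough illustration of the odds comparison described above, the Python sketch below recomputes the odds of heart disease from the estimated cell counts in panel (c) of the third table (heart disease by blood pressure). The dictionary layout and variable names are illustrative assumptions; only the counts come from the table.

```python
# Estimated cell counts from panel (c): blood pressure category -> (Yes, No)
counts = {
    "<127": (2.95, 62.12),
    "127-146": (2.24, 52.09),
    "147-166": (7.57, 91.77),
    ">167": (10.10, 56.09),
}

# Odds of heart disease within each blood pressure category
odds = {bp: yes / no for bp, (yes, no) in counts.items()}

# Ratio of the odds in the highest category to the odds in the lowest
highest_vs_lowest = odds[">167"] / odds["<127"]

for bp, value in odds.items():
    print(f"{bp:>8}: odds = {value:.3f}")
print(f"highest vs lowest: {highest_vs_lowest:.2f}")
```

The ratio exceeds three, in line with the statement that the odds of coronary heart disease more than triple between the lowest and highest blood pressure categories.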
Log-linear modelling of cell counts is appropriate when a sample is classified with respect to several categorical variables and associations between their levels are of interest. In other words, all categorical variables are treated as dependent variables and none of the marginal totals is fixed by design. When one variable is viewed as the single dependent variable and the others as explanatory variables, either as a result of a study design that fixed some of the marginal totals (e.g. a COHORT STUDY) or simply because a directional relationship is of interest, models such as LOGISTIC REGRESSION for binary dependent variables and multinomial logistic regression for dependent variables with more than two categories are warranted. (For more details, see Agresti, 1996, and LOGIT MODELS FOR ORDINAL RESPONSES.)
While log-linear modelling of cell counts is most often used to analyse associations between categorical variables, the concept extends to any count data and also allows for effects of continuous variables. The total incidence is usually not fixed by design in such applications and the modelling is more generally referred to as POISSON REGRESSION. Even more generally, all the modelling approaches for counts mentioned so far can be considered special cases of GENERALISED LINEAR MODELS, where a link function (e.g. the logarithm) is used to avoid predictions outside the possible range and the data are modelled by a distribution from a class of distributions (e.g. Poisson or binomial). (For more details, see McCullagh and Nelder, 1989.) SL
Agresti, A. 1996: An introduction to categorical data analysis. New York: John Wiley & Sons, Inc. Everitt, B. S. 1992: The analysis of contingency tables, 2nd edition. Boca Raton: Chapman & Hall. Ku, H. H. and Kullback, S. 1974: Log-linear models in contingency table analysis. American Statistician 28, 115-22. McCullagh, P. and Nelder, J. 1989: Generalized linear models, 2nd edition. London: Chapman & Hall.
lognormal distribution  This is a PROBABILITY DISTRIBUTION such that the natural logarithms of observations from the distribution are normally distributed. As a result, the distribution is always positively skewed (see SKEWNESS) and only produces positive observations. The distribution is usually defined by the standard parameters of the associated NORMAL DISTRIBUTION, so x is lognormally distributed with parameters $\mu$ and $\sigma$ if log(x) is normally distributed with parameters $\mu$ and $\sigma$. The density function of x is then:

$$f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{[\log(x) - \mu]^2}{2\sigma^2}\right)$$

Although the need for an extra x in the leading denominator (compared to the density function of a normal distribution) may not be obvious, it becomes apparent that it is required when one considers the effective change of parameterisation that has taken place and its effect on the integral that defines the cumulative distribution function of the normal distribution. The distribution has MEAN $\exp(\mu + \sigma^2/2)$ and VARIANCE $\exp(2\mu + 2\sigma^2) - \exp(2\mu + \sigma^2)$. If y (= log(x)) has a normal distribution, this is often because y is the result of summing many independent but similarly distributed variables. The lognormal distribution then can arise from the multiplication of many independent but similarly distributed variables. Object sizes may often be lognormally distributed if they are the result of (multiplicative) erosion processes or coagulation processes. Many sources of positively skewed data, e.g. survival times (see SURVIVAL ANALYSIS), may be adequately approximated by a lognormal distribution, although this is not a sufficient criterion for assuming that the lognormal distribution can model any positively skewed data. Muubini (1994), in fact, finds that breast cancer survival times can be modelled as coming from the lognormal distribution. One particularly common use of the lognormal distribution is for modelling ratios. In particular, the CONFIDENCE INTERVALS for ODDS RATIOS and relative risks are often calculated by assuming that the ratio has come from a lognormal distribution. It should be noted that lognormal data are often subjected to a log TRANSFORMATION and then treated as normal (Bland and Altman, 1996) rather than explicitly trying to use the density function given earlier. AGL
Bland, M. J. and Altman, D. G. 1996: Statistics notes: transforming data. British Medical Journal 312, 770. Muubini, [initials illegible] 1994: When patients with breast cancer can be considered to be cured. British Medical Journal 309, 554-5.
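The density, mean and variance formulas for the lognormal distribution can be checked against each other numerically. The following Python sketch is illustrative only: the function names, the parameter values (mu = 0, sigma = 0.5) and the simple midpoint integration rule are assumptions chosen for the example, not part of the entry.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Density of x when log(x) is normal with mean mu and standard deviation sigma."""
    if x <= 0:
        return 0.0
    return (math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2))
            / (x * sigma * math.sqrt(2 * math.pi)))

def lognormal_mean(mu, sigma):
    return math.exp(mu + sigma ** 2 / 2)

def lognormal_variance(mu, sigma):
    return math.exp(2 * mu + 2 * sigma ** 2) - math.exp(2 * mu + sigma ** 2)

def midpoint_integral(f, lower, upper, n=100_000):
    """Crude midpoint rule; adequate here because the integrand is smooth."""
    h = (upper - lower) / n
    return h * sum(f(lower + (i + 0.5) * h) for i in range(n))

mu, sigma = 0.0, 0.5  # illustrative parameter values
total = midpoint_integral(lambda x: lognormal_pdf(x, mu, sigma), 0.0, 50.0)
mean_num = midpoint_integral(lambda x: x * lognormal_pdf(x, mu, sigma), 0.0, 50.0)
var_num = midpoint_integral(
    lambda x: (x - mean_num) ** 2 * lognormal_pdf(x, mu, sigma), 0.0, 50.0)

print(total, mean_num, lognormal_mean(mu, sigma))
print(var_num, lognormal_variance(mu, sigma))
```

The extra x in the denominator is exactly what makes the density integrate to one on the original scale.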
LogXact  LogXact is a companion product to STATXACT, featuring exact inference for binary data in the presence of covariates. An underlying LOGISTIC REGRESSION model is assumed. Both exact and asymptotic solutions are provided. LogXact additionally allows modelling of polychotomous responses (i.e. outcomes with more than two categories). LogXact handles matched case-control data under general M:N matching using conditional LIKELIHOOD inference. Asymptotic inference is based on maximising the unconditional likelihood function for unstratified data and on maximising the conditional likelihood function for stratified data. Exact inference is based on generating the conditional distributions of the sufficient statistics for the regression coefficients of interest, NUISANCE PARAMETERS being eliminated by fixing their respective sufficient statistics at the observed values. For a detailed discussion of the theory underlying exact logistic regression, references to numerical algorithms that perform the computations and several examples involving the analysis of biomedical data by LogXact, refer to Mehta and Patel (1995). LogXact also provides exact and asymptotic inference for POISSON REGRESSION. Reviews of LogXact are given by Lemeshow (1994) and Oster (2002). The current version, LogXact 6, uses powerful Monte Carlo procedures that enable fast exact inference for a much larger class of datasets than those for which exact inference was previously thought feasible. For example, LogXact actually provides two variations of the Monte Carlo procedure. Neither dominates the other in terms of efficiency - for a given data analysis, the choice of model and the available computing memory will determine which computational method yields a solution the fastest. Moreover, LogXact gives the user flexibility in switching from one computational method to another without having to begin the analysis from the beginning. As exact conditional logistic regression can be time consuming, the incorporation of these refinements can allow an investigator to achieve significant time savings. LogXact runs on Microsoft Windows NT/2000/XP as a standalone product. In addition, a special version, PROC-LogXact for SAS Users, is available as an external procedure that can be used with SAS for Microsoft Windows. CCoIPSe/CM/NP
Lemeshow, S. 1994: LogXact-Turbo: logistic regression software featuring exact methods. Epidemiology 5, 2, 259-60. Mehta, C. R. and Patel, N. R. 1995: Exact logistic regression: theory and examples. Statistics in Medicine 14, 2143-60. Oster, R. A. 2002: An examination of statistical software packages for categorical data analysis using exact methods. The American Statistician 56, 3, 235-46.
longitudinal data  These data have the distinguishing feature that the response variable of interest and a set of explanatory variables (factors and/or covariates) are measured repeatedly over time. Such data arise frequently in medical studies, particularly, for example, from CLINICAL TRIALS. The main objective in collecting such data is to characterise change in the response variable over time and to determine the covariates most associated with any change. In many clinical trials, for example, primary interest will centre on the effect of the treatment group on changes in the response. Because observations of the response variable are made on the same individual at different times, it is likely that these measurements will be correlated with each other rather than independent. This correlation must be accounted for adequately in order to draw valid and efficient inferences about how the covariates affect the response. Consequently models for longitudinal data (LINEAR MIXED-EFFECTS MODELS, GENERALISED ESTIMATING EQUATIONS) generally have two components: the first is essentially a regression model linking the average response to the covariates; the second is a model for the assumed covariance structure of the repeated measurements of the response. The estimated regression coefficients in the first part will be the parameters of most interest, with the parameters modelling the covariances being of less concern (they are essentially NUISANCE PARAMETERS). However, selecting an unsuitable model for the COVARIANCE structure of the repeated response values, i.e. one that does not match the observed structure, can adversely affect inferences on those parameters in which the investigator is most interested. Several examples of the analysis of longitudinal data from clinical trials are given in Everitt and Pickles (2004). SSE
[See also DROPOUTS]
Everitt, B. S. and Pickles, A. 2004: Statistical aspects of the design and analysis of clinical trials, 2nd edition. London: ICP.
M

machine learning  This is a branch of ARTIFICIAL INTELLIGENCE (AI) concerned with developing algorithms that can learn and generalise from examples. By 'learning' one means the acquisition of domain-specific knowledge resulting in increased predictive power. The use of learning algorithms for data analysis can generally be divided into two stages. First, a training set of data is provided to the algorithm and used for selecting a 'hypothesis' (the learning phase). Then the selected hypothesis is validated on a set of known data to measure its predictive power (the validation phase) or used to make predictions on unseen data (the test phase). A major problem in this setting is the risk that the selected hypothesis reflects specific features of the particular training set that are present due to chance instead of due to the underlying source generating it. This is called 'overfitting' or 'overtraining' and leads to reduced predictive power or generalisation. This risk is naturally higher with smaller training samples. Motivated by the need to understand overfitting and generalisation, the last few years have seen significant advances in the mathematical theory of learning algorithms that have brought this field very close to certain parts of statistics, and modern machine learning methods tend to be more motivated by theoretical considerations (as is the case for SUPPORT VECTOR MACHINES and GRAPHICAL MODELS) and less by heuristics or analogies with biology (as was the case - at least originally - for NEURAL NETWORKS or genetic algorithms). Modern machine learning is a very theoretical discipline, whose connections with AI are sometimes less obvious than its connections with multivariate statistics. Limited to the setting when the examples are all given together at the start, it is a valuable tool for data analysis. NeRDB
[See also DATA MINING IN MEDICINE]
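The distinction between the learning phase and the validation phase, and the meaning of overfitting, can be made concrete with a deliberately extreme toy example. The Python sketch below is an illustration under stated assumptions rather than a real learning method: the data are synthetic, the labels are coin flips unrelated to the features, and the 'hypothesis' (here called `memoriser`, a made-up name) simply memorises the training set.

```python
import random

random.seed(0)  # fixed seed so that the illustration is reproducible

def make_data(n):
    """Synthetic examples: a random feature and a label unrelated to it."""
    return [(random.random(), random.randint(0, 1)) for _ in range(n)]

train, validation = make_data(200), make_data(200)

# Learning phase: the hypothesis memorises every training example and falls
# back on the majority training label for anything it has not seen.
memory = dict(train)
majority = 1 if 2 * sum(label for _, label in train) > len(train) else 0

def memoriser(x):
    return memory.get(x, majority)

def accuracy(hypothesis, data):
    return sum(hypothesis(x) == y for x, y in data) / len(data)

# Validation phase: predictive power is measured on data not used for learning.
train_accuracy = accuracy(memoriser, train)
validation_accuracy = accuracy(memoriser, validation)

print(f"training accuracy:   {train_accuracy:.2f}")
print(f"validation accuracy: {validation_accuracy:.2f}")
```

Perfect training accuracy combined with roughly chance-level validation accuracy is the signature of a hypothesis that reflects features of the particular training set rather than of the source generating it.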
Mitchell, T. 1995: Machine learning. Maidenhead: McGraw-Hill. Shawe-Taylor, J. and Cristianini, N. 2004: Kernel methods for pattern analysis. Cambridge: Cambridge University Press (www.kernel-methods.net). Witten, I. H. and Frank, E. 1999: Data mining: practical machine learning tools and techniques with Java implementations. San Francisco: Morgan Kaufmann.

Mallows' Cp  This criterion is used for the automated selection of variables in MULTIPLE LINEAR REGRESSION (Gorman and Toman, 1966; Mallows, 1973). Subsets of differing numbers of variables p are considered, from p = 1 to p = k, where k is the maximum number of variables. At each stage the criterion determines the optimal subset and a stopping rule is then used to decide the value of p. Mallows' $C_p$ criterion identifies the best subset of size p, i.e. the one that minimizes the following quantity:

$$C_p = \frac{SS_{res}(p)}{s^2} + 2(p - 1) - n$$

where $SS_{res}(p)$ is the residual sum of squares based on a p-variable regression, $s^2$ is an estimate of $\sigma^2$ (the residual variance based on the full model with all k variables) and n is the sample size. The term 2(p - 1) - n has the effect of penalising more complex models, the aim being to produce the simplest model that fits the data adequately. A plot of $C_p$ versus p allows the various competitor subsets to be judged for different values of p and a formal stopping rule for p may be applied. The rule suggested by Mallows is to stop when $C_p$ is 'small' or close to p - 1. Gilmour (1996) discusses stopping rules and argues that the subset corresponding to the lowest $C_p$ will tend to include at least one unimportant predictor variable; he proposes a modification of $C_p$ to take account of this. An example of the use of the criterion is given by Sutcliffe et al. (2001), who used multiple regression to determine cost predictors for patients with systemic lupus erythematosus. Mallows' $C_p$ was used to determine how many variables were required in the model and the best-fitting model with this number of predictors. The next four best-fitting models with that number of predictors and the four best-fitting models with one more than this number were also found. This provided a set of candidate models for further analysis. There are many alternative methods for subset selection. Hocking (1976) describes the advantages and disadvantages of several of them, including 'best subsets', which is computationally very intensive but globally optimal, and the more usual forward selection or backward elimination (or combination) stepwise methods based on F-tests for individual variables (see AUTOMATIC SELECTION METHODS). The latter are very widely used because of their inclusion in software packages, but they have been criticised by Hocking and many other statisticians as being potentially misleading. The Mallows' $C_p$ criterion, while not globally optimal, is stepwise optimal and may be preferable to such methods. ML
[See also ALL SUBSETS REGRESSION]
Gilmour, S. G. 1996: The interpretation of Mallows' Cp statistic. The Statistician 45, 49-56. Gorman, J. W. and Toman, R. J. 1966: Selection of variables for fitting equations to data. Technometrics 8, 27-51. Hocking, R. R. 1976: The analysis and selection of variables in linear regression. Biometrics 32, 1-49. Mallows, C. L. 1973: Some comments on Cp. Technometrics 15, 661-75. Sutcliffe, N., Clarke, A. E., Taylor, R., Frost, C. and Isenberg, D. A. 2001: Total costs and predictors of costs in patients with systemic lupus erythematosus. Rheumatology 40, 37-47.
Mann-Whitney rank sum test  This is the nonparametric version of the independent samples t-test (see STUDENT'S t-TEST), also known as the Wilcoxon rank sum test and the Mann-Whitney U test. Mann and Whitney, and Wilcoxon independently, derived the test, so the test statistic takes two different forms. The U statistic of Mann and Whitney is usually preferred as it has a useful interpretation. The Mann-Whitney test is applied to two independent samples, testing for a difference in shape and spread of the data between the two groups. With the addition of the assumption that the data from the two groups are similarly distributed, it also tests for a difference in MEDIANS or MEANS between the two groups. The other assumptions are that the data are randomly selected observations and that the data must be either continuous or ordinal in nature. To carry out the test, first rank all the data from the smallest to the largest. Assign the average rank to any ties in the data. Calculate the sum of the ranks in each of the groups. Calculate $U_1$ and $U_2$:

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$

where $n_1$ = the number of observations in group 1, $n_2$ = the number of observations in group 2, $R_1$ = the sum of the ranks assigned to group 1 and $R_2$ = the sum of the ranks assigned to group 2. Calculate $U = \min(U_1, U_2)$. Compare U with the critical value of the Mann-Whitney U tables. The null hypothesis is rejected if the value of U is less than or equal to the critical value in the tables. With $U_1$ and $U_2$ defined as above, the value of $U_1/(n_1 n_2)$ can be interpreted as the probability that a new observation from group 1 is less than a new observation from group 2, and $U_2/(n_1 n_2)$ as the probability that it is greater. As an example, data in the table show Mcm2 levels in two groups of people, seven with and eight without fibrosis of the liver. The groups do not have similar distributions. Note that the assigned rank for the lowest Mcm2 value observed, due to the tie, is midway between 1 and 2, exemplifying the convention mentioned earlier.

Mann-Whitney rank sum test  Mcm2 levels in two groups of people, one with and one without fibrosis of the liver

| With fibrosis (group 1): Mcm2 value | Rank | Without fibrosis (group 2): Mcm2 value | Rank |
|---|---|---|---|
| 3 | 1.5 | 32 | 14 |
| 11 | 7 | 27 | 13 |
| 9 | 5 | 25 | 12 |
| 14 | 9.5 | 33 | 15 |
| 7 | 4 | 14 | 9.5 |
| 11 | 7 | 4 | 3 |
| 11 | 7 | 24 | 11 |
| | | 3 | 1.5 |
| Sum of ranks | 41 | Sum of ranks | 79 |

$$U_1 = 7 \times 8 + \frac{7 \times (7 + 1)}{2} - 41 = 43$$

$$U_2 = 7 \times 8 + \frac{8 \times (8 + 1)}{2} - 79 = 13$$

Hence, U = min(43, 13) = 13. From tables ($n_1$ = 7, $n_2$ = 8, $\alpha$ = 0.05), the critical value is 10. As 13 > 10, there is not sufficient evidence to reject the null hypothesis. Therefore, there is no evidence of a difference in spread or location between the two groups. There is a probability of 0.23 (= 13/56) that a new observation from group 1 will be greater than a new observation from group 2. For further details see Pett (1997), Hart (2001) and Swinscow and Campbell (2002). SLY
[See also MEDIAN TEST]
Hart, A. 2001: Mann-Whitney test is not just a test of medians: differences in spread can be important. British Medical Journal 323, 391-3. Pett, M. A. 1997: Nonparametric statistics for health care research. Thousand Oaks: Sage. Swinscow, T. D. V. and Campbell, M. J. 2002: Statistics at square one, 10th edition. London: BMJ Books.

Mann-Whitney U test  Synonym for MANN-WHITNEY RANK SUM TEST

MANOVA  See ANALYSIS OF VARIANCE
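The worked example in the Mann-Whitney rank sum test entry can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper `average_ranks` is an illustrative name, and the first group 1 value (3) is taken to be the value implied by its listed rank of 1.5, tied with the lowest value in group 2.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i + 1, ..., j + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

group1 = [3, 11, 9, 14, 7, 11, 11]        # with fibrosis
group2 = [32, 27, 25, 33, 14, 4, 24, 3]   # without fibrosis

ranks = average_ranks(group1 + group2)
n1, n2 = len(group1), len(group2)
r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)

print(f"R1 = {r1}, R2 = {r2}, U1 = {u1}, U2 = {u2}, U = {u}")
print(f"U / (n1 * n2) = {u / (n1 * n2):.2f}")
```

A useful arithmetic check is that U1 + U2 always equals n1 × n2 (here 56), and that the two rank sums add up to N(N + 1)/2 for N = n1 + n2 observations (here 120).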
Mantel-Haenszel methods  These are a collection of statistical methods for stratified, categorical data. When analysing data from an epidemiological study, one should be aware of the danger of confounding. For example, in a CASE-CONTROL STUDY of the association between an industrial chemical and a particular cancer, 100 cases and 100 controls are recruited. When the data are analysed, the ODDS RATIO associated with the chemical is 0.91, suggesting no association or a possible protective effect. However, it is noticed that when the data are stratified by sex, the odds ratio in men is 1.29 and in women is 1.38, suggesting a possible harmful effect (data are shown in the table). The reason for this reversal in the odds ratio is that sex is a confounder in the association between exposure and disease. The disease is more common in women, but women are less likely to be exposed, and so exposure appears protective if one does not adjust for sex.
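Mantel-Haenszel methods  Summary data from a case-control study, stratified by sex

| | Men: Case | Men: Control | Women: Case | Women: Control | Total: Case | Total: Control |
|---|---|---|---|---|---|---|
| Exposed | 7 | 11 | 4 | 1 | 11 | 12 |
| Unexposed | 34 | 69 | 55 | 19 | 89 | 88 |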
One method for overcoming confounding is to stratify the data, as in this case where stratification was by sex. The statistic of interest (e.g. the odds ratio) is calculated for each stratum separately. It is then often desirable to combine these stratum-specific statistics into a single overall measure, to calculate a standard error for this and also to test a null hypothesis (e.g. that the odds ratio is one). If the number of subjects in each stratum is large, this may be done using MAXIMUM LIKELIHOOD METHODS (e.g. LOGISTIC REGRESSION), by introducing an additional parameter into the model for each stratum. However, when data are sparse, i.e. when the number of subjects in a stratum may be small, maximum likelihood may give biased estimates. In this situation, it is necessary to use either conditional maximum likelihood methods or Mantel-Haenszel methods. The latter have the advantage of being very straightforward to calculate and, for this reason, are popular. Mantel-Haenszel methods do not require that the numbers of individuals in each stratum be large, only that the total number of subjects be large enough. However, if even the total number of subjects is small it is necessary to use 'exact' methods (see EXACT METHODS FOR CATEGORICAL DATA).
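The odds ratios quoted for the case-control example, and one way of combining the stratum-specific values, can be sketched in Python as follows. The cell counts are those of the table; the pooled estimate uses the commonly quoted Mantel-Haenszel odds ratio formula, sum(a d / n) / sum(b c / n) over strata, which is not spelled out in this entry and is therefore an assumption of the sketch, as are the function and variable names.

```python
# Each stratum: (exposed cases, exposed controls, unexposed cases, unexposed controls)
strata = {
    "men": (7, 11, 34, 69),
    "women": (4, 1, 55, 19),
}

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table with cells labelled as in `strata`."""
    return (a * d) / (b * c)

# Stratum-specific odds ratios
stratum_or = {sex: odds_ratio(*cells) for sex, cells in strata.items()}

# Crude odds ratio from the table collapsed over sex
crude_cells = [sum(cells[i] for cells in strata.values()) for i in range(4)]
crude_or = odds_ratio(*crude_cells)

# Mantel-Haenszel pooled odds ratio (assumed standard formula)
numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = numerator / denominator

print({sex: round(value, 2) for sex, value in stratum_or.items()})
print(f"crude OR = {crude_or:.2f}, Mantel-Haenszel OR = {mh_or:.2f}")
```

The pooled value lies between the two stratum-specific odds ratios, whereas the crude odds ratio falls below one because of confounding by sex.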
Mantel-Haenszel methods are available for estimating odds ratios from case-control data, rate ratios or rate differences from cohort data and odds ratios or risk ratios from case-cohort data. They may also be useful when analysing repeated-measures designs. When the exposure and the outcome (disease) are both binary, the analysis of a case-control study with stratification is an example of the analysis of multiple 2 × 2 tables: one table for each stratum and, in each table, two rows for exposure and two columns for outcome. Mantel-Haenszel methods also exist for the more general situation of multiple I × J tables, e.g. a case-control study with more than two possible exposure (or treatment) levels (I > 2) and/or more than two possible outcomes (J > 2). Both the exposure and outcome variables can be treated as either nominal or ordinal categorical variables. Finally, when combining several stratum-specific estimates to form a single overall estimate, it is important to consider whether this is sensible. If the odds ratio (or other measure) appears to vary greatly from one stratum to another, possibly even being much greater than one in some strata and much less than one in other strata, and this variation is more than would be expected by chance, a single summary measure may not be very meaningful. In this situation it is better to report the odds ratio estimate for each stratum separately. Thus, before calculating the overall odds ratio (or other measure), it is worth testing the null hypothesis of homogeneity, i.e. that the odds ratio does not vary from one stratum to another. The Breslow-Day test is one such test. For further details see Kuritz, Landis and Koch (1988), Clayton and Hills (1993) and Rothman and Greenland (1998). SRS
Clayton, D. and Hills, M. 1993: Statistical models in epidemiology. Oxford: Blackwell Science Publications. Kuritz, S. J., Landis, J. R. and Koch, G. G. 1988: A general overview of Mantel-Haenszel methods: applications and recent developments. Annual Review of Public Health 9, 123-60. Rothman, K. J. and Greenland, S. 1998: Modern epidemiology, 2nd edition. Philadelphia: Lippincott-Raven Publishers.
marginal structural models These are regression models for so-called counterfactual or potential outcomes $Y_a$, which express what the outcome of interest, $Y$, would have looked like if level $a$ of the target exposure $A$ had been received (Robins, Hernán and Brumback, 2000). The models are labelled 'marginal' because they are models for exposure effects at the population level, rather than within strata defined by covariate values. They are labelled 'structural' because, by construction, their parameters carry a causal interpretation. For instance, in the linear marginal structural model:

$$E(Y_a) = \alpha + \beta a,$$

the intercept $\alpha$ expresses what the expected outcome in the population would be if all subjects were unexposed (i.e. $a = 0$). The regression slope:

$$\beta = E(Y_{a+1}) - E(Y_a)$$

encodes the expected change in outcome that would result from a unit increase in the exposure. It compares potential outcomes for the 'same' subjects under different exposure levels and thereby can be interpreted as an average CAUSAL EFFECT of the exposure on the outcome. This contrasts with standard regression models:

$$E(Y \mid A = a) = \alpha' + \beta' a,$$

where the regression slope:

$$\beta' = E(Y \mid A = a + 1) - E(Y \mid A = a)$$

compares the expected outcome between differently exposed subgroups ($A = a + 1$ and $A = a$) of the population. When these subgroups are not inherently comparable, then $\beta'$ (unlike $\beta$) cannot be interpreted as a causal exposure effect. In standard regression models, comparability across treatment levels can be achieved by adjusting for measured
confounders $L$ of the exposure-outcome relationship. In marginal structural models, adjustment for confounding happens through a weighting procedure, called INVERSE PROBABILITY WEIGHTING (IPW), which works in two steps:

(1) A pseudo sample is constructed by weighting each subject by the inverse of the probability of the observed exposure, given the observed confounders: $1/\Pr(A \mid L)$. For instance, a subject with confounder level $L = 1$ who is exposed to level $A = 1$ is weighted with $1/\Pr(A = 1 \mid L = 1)$. Similarly, a subject with $L = 1$ who is exposed to level $A = 0$ is weighted with $1/\Pr(A = 0 \mid L = 1)$. Here, $\Pr(A \mid L)$ can be estimated as the fitted value from a standard regression of $A$ on $L$, e.g. a logistic regression if $A$ is dichotomous.

(2) Because the weighting eliminates confounding, the marginal structural model can be fitted to the pseudo sample as if there were no confounding. This can be accomplished via off-the-shelf software packages by fitting the corresponding regression model $E(Y \mid A) = \alpha + \beta A$ while assigning each subject's data the given weight.

We emphasise that all methods of confounding adjustment, in particular standard regression adjustment and IPW, crucially rely on the assumption of no unmeasured confounding. This assumption holds if $L$ contains all confounders of the exposure-outcome relationship, i.e. all factors that directly affect the exposure and are also associated with the outcome. This is visualised in the causal diagram of the first figure through the absence of an arrow from the unmeasured variables $U$ to $A$. If not all confounders of the exposure-outcome relationship are included in the estimation of the weights $1/\Pr(A \mid L)$, then the assumption of no unmeasured confounding is violated, in which case the estimate of the causal exposure effect may be biased. The assumption of no unmeasured confounding is untestable and has to be defended by subject matter knowledge.
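The two IPW steps for a point exposure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are simulated, the function name `ipw_marginal_effect` is hypothetical, the confounder is binary, and $\Pr(A \mid L)$ is estimated by the observed exposure frequency within each level of $L$ (which coincides with a saturated logistic regression of $A$ on $L$).

```python
import numpy as np


def ipw_marginal_effect(a, l, y):
    """Fit E(Y_a) = alpha + beta * a by inverse probability weighting.

    a: binary exposure (0/1); l: discrete confounder; y: outcome.
    """
    a = np.asarray(a)
    l = np.asarray(l)
    y = np.asarray(y, dtype=float)
    weights = np.empty(len(a), dtype=float)
    for level in np.unique(l):
        in_level = l == level
        p_exposed = a[in_level].mean()  # estimate of Pr(A = 1 | L = level)
        p_observed = np.where(a[in_level] == 1, p_exposed, 1.0 - p_exposed)
        weights[in_level] = 1.0 / p_observed  # step (1): 1 / Pr(A | L)
    # Step (2): weighted least squares of Y on (1, A) in the pseudo sample.
    design = np.column_stack([np.ones(len(a)), a])
    weighted_design = design * weights[:, None]
    alpha, beta = np.linalg.solve(design.T @ weighted_design,
                                  weighted_design.T @ y)
    return alpha, beta, weights


# Simulated data (assumption): L raises both the chance of exposure and the
# outcome, so the unadjusted comparison is confounded. True causal slope: 2.
rng = np.random.default_rng(0)
n = 20000
l = rng.binomial(1, 0.5, n)
a = rng.binomial(1, np.where(l == 1, 0.8, 0.2))
y = 1.0 + 2.0 * a + 3.0 * l + rng.normal(0.0, 1.0, n)

alpha_hat, beta_hat, w = ipw_marginal_effect(a, l, y)
naive_slope = y[a == 1].mean() - y[a == 0].mean()
print(f"IPW slope {beta_hat:.2f}, unadjusted slope {naive_slope:.2f}")
```

In this simulation the unadjusted slope is pulled away from 2 by the confounder, whereas the weighted fit should land close to it. The standard errors of such a fit need the care described later in this entry, since the weights are themselves estimated.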
marginal structural models Causal diagram representing the data generating mechanism, with A, exposure; L, measured confounders; Y, outcome; U, unmeasured variables that affect L and Y

Because marginal structural models encode population-averaged effects, they can conveniently be used for standardisation (see DEMOGRAPHY) with the total group as the standard population (Sato and Matsuyama, 2003). Because the IPW estimation procedure involves the PROPENSITY SCORE, the resulting estimates inherit the properties of propensity score adjusted estimators. Marginal structural models are, however, most commonly adopted for assessing the effect of a time-varying exposure on an outcome in the presence of time-varying confounders. This is because these models avoid regression adjustment for time-varying confounders, which is fallible when the time-varying confounders are both affected by past exposure levels and affecting future exposure levels. For instance, CD4 count confounds the relationship between AZT treatment and survival in HIV infected subjects because it affects the physicians' assignment of particular AZT levels and is associated with survival. At the same time, it is affected by earlier AZT exposure levels. The reason why standard regression adjustment fails in this context is because, on the one hand, it eliminates indirect exposure effects that are mediated through these confounders (e.g. indirect effects of AZT on survival through its effect on the CD4 count) and, on the other hand, it may induce a so-called collider-stratification BIAS by which a spurious ASSOCIATION between exposure and outcome arises, even in the absence of an exposure effect. Inference for marginal structural models suffers neither of these two limitations because it involves no regression adjustment for confounding. The IPW procedure, which is used instead, is now slightly more involved as it must acknowledge the time-varying nature of the exposure. With $A^t$ and $L^t$ denoting the exposure (e.g. the AZT level) and confounders (e.g. the CD4 count), respectively, measured at study cycle $t$, the weights now take the form:
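$$\frac{1}{\prod_{t=1}^{T} \Pr(A^t \mid \bar{A}^{t-1}, \bar{L}^{t})} \qquad (1)$$

where

$$\bar{A}^{t-1} = (A^1, \ldots, A^{t-1}) \quad \text{and} \quad \bar{L}^{t} = (L^1, \ldots, L^t)$$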
refer to exposure and confounder history, respectively, and where $T$ is the end-of-study time. Next, a marginal structural model for time-varying exposures can be fitted. For instance, with $Y$ denoting 1 for subjects who survive the end-of-study time and 0 otherwise, the marginal structural model:

$$\operatorname{logit} \Pr(Y_{\bar{a}} = 1) = \alpha + \beta a^{T} + \gamma \sum_{t=1}^{T-1} a^{t}, \qquad (2)$$

is indexed by parameters:

$$\exp(\beta) = \frac{\operatorname{odds}(Y_{(\bar{a}^{T-1}, 1)} = 1)}{\operatorname{odds}(Y_{(\bar{a}^{T-1}, 0)} = 1)}$$

encoding the short-term effect of AZT on the odds of survival, and $\exp(\gamma)$ capturing a long-term effect. These parameters can be estimated by fitting the corresponding regression model

$$\operatorname{logit} \Pr(Y = 1 \mid \bar{A}^{T}) = \alpha + \beta A^{T} + \gamma \sum_{t=1}^{T-1} A^{t}$$

while assigning each subject's data the given weight.
Just as for point exposures, the IPW procedure for time-varying exposures crucially relies on the untestable assumption of no unmeasured confounding. For time-varying exposures, this assumption holds if all factors besides $(\bar{L}^{t-1}, \bar{A}^{t-1})$ that affect the exposure at time $t$ and are associated with the outcome are contained in $L^t$, for each time $t$. This scenario is visualised in the causal diagram of the second figure, for $T = 2$, through the absence of an arrow from the unmeasured variables $U$ to $A^t$. In this figure the arrows from $(\bar{L}^{t-1}, \bar{A}^{t-1})$ to $A^t$ indicate that the received AZT level at time $t$ may be affected by the subjects' treatment and CD4 count history up to $t$. Similarly, the arrows from $(\bar{L}^{t-1}, \bar{A}^{t-1})$ to $L^t$ indicate that the CD4 count at time $t$ may be affected by the subjects' treatment and CD4 count history up to $t$. $U$ represents all unmeasured variables (e.g. genetics, lifestyle factors) that affect a subject's health status over time.
marginal structural models Causal diagram, with $A^t$, exposure at time t; $L^t$, measured confounders at time t; Y, outcome; U, unmeasured variables that affect $L^t$ and Y
When standard statistical software packages (SAS, Stata, R) are used for marginal structural models through IPW, caution is needed in interpreting the STANDARD ERRORS provided by the software (see STATISTICAL PACKAGES), because these ignore the imprecision of the estimated weights. By using routines that report so-called sandwich estimators, conservative standard errors are obtained. A major appeal of marginal structural models is that the underlying IPW estimation procedure straightforwardly generalises to more complex settings with time-varying outcomes (Hernán, Brumback and Robins, 2002) or survival ENDPOINTS (Hernán, Brumback and Robins, 2000). A drawback is that some subjects may have small probabilities $\Pr(A^t \mid \bar{A}^{t-1}, \bar{L}^{t})$ at certain time points, so that they receive influential weights (1). This can make the IPW estimate unstable and imprecise. To some extent, this problem can be mitigated by using so-called stabilised weights, calculated as:

$$\frac{\prod_{t=1}^{T} \Pr(A^t \mid \bar{A}^{t-1})}{\prod_{t=1}^{T} \Pr(A^t \mid \bar{A}^{t-1}, \bar{L}^{t})}.$$

When this is insufficient, progress can sometimes be made by including baseline covariates in the marginal structural model, i.e. covariates that precede $A^1$ in time (e.g. sex, ethnicity). For example, the model in (2) can be generalised to:
$$\operatorname{logit} \Pr(Y_{\bar{a}} = 1 \mid V = v) = \alpha + \beta a^{T} + \gamma \sum_{t=1}^{T-1} a^{t} + \delta v + \psi a^{T} v, \qquad (3)$$

where $V$ is a baseline covariate that is contained in the measured baseline confounder $L^1$, and $\psi$ is a covariate-exposure interaction. When the model in (3) is fitted through IPW, the (stabilised) weights are modified as:

$$\frac{\prod_{t=1}^{T} \Pr(A^t \mid \bar{A}^{t-1}, V = v)}{\prod_{t=1}^{T} \Pr(A^t \mid \bar{A}^{t-1}, \bar{L}^{t}, V = v)}.$$
These weights are typically less influential. This adaptation may, however, be insufficient when the exposure has strong predictors or when it is measured on a continuous scale, in which case $\Pr(A^t \mid \bar{A}^{t-1}, \bar{L}^{t})$ refers to the density of $A^t$, given $\bar{A}^{t-1}$ and $\bar{L}^{t}$. In that case, one must have recourse to more efficient (doubly robust) estimation strategies or consider the related, but more complex class of structural nested models (Robins, 1997). The latter models have the additional advantage that, unlike marginal structural models, they can allow for modification of the exposure effect by time-varying covariates. AJS/SV

Hernán, M. A., Brumback, B. and Robins, J. M. 2000: Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 11, 561-70. Hernán, M. A., Brumback, B. and Robins, J. M. 2002: Estimating the causal effect of zidovudine on CD4 count with a marginal structural model for repeated measures. Statistics in Medicine 21, 1689-709. Pearl, J. 2000: Causality: models, reasoning, and inference. Cambridge: Cambridge University Press. Robins, J. M. 1997: Causal inference from complex longitudinal data. In Latent variable modeling and applications to causality. New York: Springer Verlag. Robins, J. M., Hernán, M. A. and Brumback, B. 2000: Marginal structural models and causal inference in epidemiology. Epidemiology 11, 550-60. Sato, T. and Matsuyama, Y. 2003: Marginal structural models as a tool for standardization. Epidemiology 14, 680-6.
Markov chain Monte Carlo (MCMC)
BAYES' THEOREM (1) provides a means for combining data, $y$, in the form of the LIKELIHOOD, $p(y \mid \theta)$, with external evidence in the form of a PRIOR DISTRIBUTION for $\theta$, $p(\theta)$, to produce a POSTERIOR DISTRIBUTION, $p(\theta \mid y)$ (see BAYESIAN METHODS). However, in order to make inferences about either the posterior distribution itself or to obtain the posterior expectation of a function of the model parameters, $\theta$, using Bayes' theorem, as in (2), we have to evaluate often high-dimensional integrals, which are only rarely analytically tractable. Consequently, much of Bayesian statistics over the last 30 years has been concerned with either parameterising models such that the integrals simplify or with the use of approximation methods (Bernardo and Smith, 1994). Such approximate methods fall into three broad categories: asymptotic approximations, e.g. Laplace approximations; numerical integration techniques, e.g. Gaussian quadrature; or simulation methods, e.g. Monte Carlo simulation (Bernardo and Smith, 1994):
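$$p(\theta \mid y) = \frac{p(\theta)\,p(y \mid \theta)}{\int p(\theta)\,p(y \mid \theta)\,d\theta} \qquad (1)$$

$$E[f(\theta) \mid y] = \int f(\theta)\,p(\theta \mid y)\,d\theta \qquad (2)$$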
Given a sample of values for $\theta$ from the joint posterior distribution, $p(\theta \mid y)$, $\{\theta^{(m)}, m = 1, \ldots, M\}$, the posterior expectation of $f(\theta)$ can be approximated by:

$$E[f(\theta) \mid y] \approx \frac{1}{M} \sum_{m=1}^{M} f(\theta^{(m)}) \qquad (3)$$
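The approximation in (3) can be illustrated with a case in which the posterior can be sampled directly. The sketch below is an assumption-laden toy example rather than anything taken from the entry: a uniform prior and 7 successes in 20 binomial trials give a Beta(8, 14) posterior, and the function of interest $f(\theta) = \theta/(1-\theta)$ (the odds) is chosen because its exact posterior mean, 8/13, is known and can be compared with the Monte Carlo average.

```python
import numpy as np

# Assumed toy model: uniform Beta(1, 1) prior, 7 successes in 20 trials,
# hence a Beta(8, 14) posterior that can be sampled directly.
rng = np.random.default_rng(42)
M = 200_000
theta = rng.beta(8, 14, size=M)  # theta^(1), ..., theta^(M)


def odds(t):
    """The function f(theta) whose posterior expectation is wanted."""
    return t / (1.0 - t)


estimate = odds(theta).mean()  # right-hand side of (3)
exact = 8 / 13                 # E[theta / (1 - theta)] under Beta(8, 14)
print(f"Monte Carlo {estimate:.4f}, exact {exact:.4f}")
```

With independent draws the error of the average shrinks at rate $1/\sqrt{M}$; the remainder of the entry deals with the usual situation in which such direct sampling is not available.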
While the use of (3) appears appealing, in practice, generation of samples from often high-dimensional joint posterior distributions can be difficult. However, for (3) to hold, the samples generated need not be independent, but rather be from a Markov chain whose stationary distribution is, in fact, the posterior distribution. A Markov chain is a sequence of random variables $\{\theta^{(1)}, \theta^{(2)}, \ldots\}$, such that $\theta^{(i)}$ only depends on $\theta^{(i-1)}$, and not the rest of the random variables. Constructing such a chain then gives rise to Markov chain Monte Carlo simulation (Casella and George, 1992; Brooks, 1998).

The construction of a Markov chain with a stationary distribution that is the posterior distribution is relatively straightforward and was initially proposed by Metropolis et al. (1953) and later generalised by Hastings (1970) and is now referred to as the Metropolis-Hastings algorithm. At the $i$th of $m$ iterations generate a candidate value for $\theta$, $\theta^*$, from a proposal distribution, $q(\theta^* \mid \theta^{(i-1)})$, and then with probability $\alpha(\theta^{(i-1)}, \theta^*)$ accept it, i.e. $\theta^{(i)} = \theta^*$, or reject it, i.e. $\theta^{(i)} = \theta^{(i-1)}$. $\alpha(\theta^{(i-1)}, \theta^*)$ is given by the following equation, which, in practice, is achieved by generating a value $u$ from a uniform [0, 1] distribution and, if $u \le \alpha(\theta^{(i-1)}, \theta^*)$, accepting $\theta^{(i)} = \theta^*$:

$$\alpha(\theta^{(i-1)}, \theta^*) = \min\left[1, \frac{p(\theta^* \mid y)\,q(\theta^{(i-1)} \mid \theta^*)}{p(\theta^{(i-1)} \mid y)\,q(\theta^* \mid \theta^{(i-1)})}\right] \qquad (4)$$

Clearly, if $q(\cdot)$ is still a multivariate distribution, the generation of samples may still be difficult. In practice, most applications of the Metropolis-Hastings algorithm use a single component proposal distribution (Gilks, Richardson and Spiegelhalter, 1996). If the Metropolis-Hastings algorithm is irreducible then regardless of where it starts it will sample from the entire domain of $p(\theta \mid y)$ within a finite number of iterations and produce samples from the stationary distribution, i.e. $p(\theta \mid y)$. Thus, it should not be dependent on the starting values. Clearly, one way in which to verify irreducibility is to use the algorithm a number of times with different starting values and inspect the samples obtained. Even if the algorithm is irreducible it has to be run long enough so that it will 'forget' its starting values and, in practice, this is achieved by running the algorithm for a 'burn-in' period and discarding the first $n$ samples and basing inferences on only the last $m - n$ samples. Of crucial importance, therefore, is the question of how large $m$ and $n$ should be. In practice, a combination of formal methods that have been advocated, together with knowledge of the statistical model and inspection of the samples obtained via sensitivity analyses to choices of $m$, $n$ and the starting values, is the most pragmatic approach (Cowles and Carlin, 1996; Gilks, Richardson and Spiegelhalter, 1996). Examination of the autocorrelation between the samples at various numbers of iterations apart can reveal algorithms that are mixing slowly, i.e. covering the whole of $p(\theta \mid y)$ only slowly, and thus need to be run for considerable numbers of iterations. An alternative, often preferred, option is to consider the parameterisation of the statistical model in order to increase the rate of mixing. In linear regression models, centring of covariates and, in the case of hierarchical models, hierarchical centring have been shown to have dramatic effects on the rate of mixing (Gelfand, Sahu and Carlin, 1995; Gilks, Richardson and Spiegelhalter, 1996).

A special case of the single component Metropolis-Hastings algorithm is the Gibbs sampler in which the proposal distributions are the set of full conditionals and the acceptance probability (4) is always equal to 1 (Geman and Geman, 1984; Gelfand and Smith, 1990). Thus, given a set of initial or starting values for the $p$ parameters in a statistical model, $\{\theta_1^{(0)}, \ldots, \theta_p^{(0)}\}$, the Gibbs sampler at each iteration draws a sample from each of the conditional distributions in turn. Thus:

$$\begin{aligned}
\theta_1^{(i)} &\sim p(\theta_1 \mid \theta_2^{(i-1)}, \ldots, \theta_p^{(i-1)}, y) \\
\theta_2^{(i)} &\sim p(\theta_2 \mid \theta_1^{(i)}, \theta_3^{(i-1)}, \ldots, \theta_p^{(i-1)}, y) \\
&\;\;\vdots \\
\theta_p^{(i)} &\sim p(\theta_p \mid \theta_1^{(i)}, \theta_2^{(i)}, \ldots, \theta_{p-1}^{(i)}, y)
\end{aligned} \qquad (5)$$

Thus, the realisations $\{\theta_1^{(1)}, \ldots, \theta_1^{(m)}\}, \ldots, \{\theta_p^{(1)}, \ldots, \theta_p^{(m)}\}$ after $m$ iterations provide samples from the marginal posterior distributions and on which inferences can be based. Sampling from the conditional distributions in (5) can be difficult unless they are univariate, although for many HIERARCHICAL MODELS they are, or they are log-concave, in which case adaptive rejection sampling may be used (Gilks, Richardson and Spiegelhalter, 1996). One particular appeal
of the Gibbs sampler is that, in essence, it is simple to implement and while it can be programmed in a variety of computer languages and software packages, the development of user-friendly software such as BUGS and WinBUGS has promoted its widespread use in numerous applied settings in biomedical research (Gelman and Rubin, 1996). KRA

Bernardo, J. M. and Smith, A. F. M. 1994: Bayesian theory. Chichester: John Wiley & Sons, Ltd. Brooks, S. P. 1998: Markov chain Monte Carlo method and its applications. Journal of the Royal Statistical Society, Series D 47, 69-100. Casella, G. and George, E. 1992: Explaining the Gibbs sampler. American Statistician 46, 167-74. Cowles, M. K. and Carlin, B. P. 1996: Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association 91, 883-904. Gelfand, A. E. and Smith, A. F. M. 1990: Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398-409. Gelfand, A. E., Sahu, S. K. and Carlin, B. P. 1995: Efficient parameterisations for normal linear mixed models. Biometrika 82, 479-88. Gelman, A. and Rubin, D. B. 1996: Markov chain Monte Carlo methods in biostatistics. Statistical Methods in Medical Research 5, 339-55. Geman, S. and Geman, D. 1984: Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-41. Gilks, W. R., Richardson, S. and Spiegelhalter, D. J. 1996: Markov chain Monte Carlo methods in practice. New York: Chapman & Hall. Hastings, W. K. 1970: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97-109. Metropolis, N., Rosenbluth, A. W., Teller, M. N. and Teller, A. H. 1953: Equations of state calculations by fast computing machine. Journal of Chemical Physics 21, 1087-91.
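The Metropolis-Hastings steps described in the Markov chain Monte Carlo entry above (candidate generation, acceptance with probability (4), burn-in, and a check with different starting values) can be sketched as follows. This is a hedged illustration only: the target is an assumed toy posterior (uniform prior, 7 successes in 20 binomial trials, so the exact posterior is Beta(8, 14) with mean 8/22), the proposal is a normal random walk, and the function names are hypothetical. Because the random-walk proposal is symmetric, the ratio of proposal densities in (4) equals 1 and drops out.

```python
import numpy as np


def log_target(theta, successes=7, failures=13):
    """Log of p(theta) * p(y | theta), up to an additive constant."""
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf  # outside the support: never accepted
    return successes * np.log(theta) + failures * np.log(1.0 - theta)


def metropolis_hastings(m, n_burn, start, step_sd, seed):
    """Run m iterations and return the last m - n_burn samples."""
    rng = np.random.default_rng(seed)
    theta = start
    draws = np.empty(m)
    for i in range(m):
        candidate = theta + rng.normal(0.0, step_sd)  # symmetric proposal
        # Acceptance probability (4); the q terms cancel for this proposal.
        alpha = np.exp(min(0.0, log_target(candidate) - log_target(theta)))
        if rng.uniform() < alpha:  # accept the candidate
            theta = candidate
        draws[i] = theta           # otherwise the chain stays where it was
    return draws[n_burn:]          # discard the burn-in period


# Two chains from deliberately different starting values.
chain_low = metropolis_hastings(20000, 2000, start=0.1, step_sd=0.2, seed=1)
chain_high = metropolis_hastings(20000, 2000, start=0.9, step_sd=0.2, seed=2)
print(chain_low.mean(), chain_high.mean(), 8 / 22)
```

Agreement between the two chain means, and with the known posterior mean, is the kind of informal check on starting values and burn-in that the entry recommends; the choices m = 20000, n = 2000 and a step size of 0.2 are arbitrary values for this toy problem.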
matched pairs analysis Different types of designs may lead to matched pairs analysis. Individually matched subjects in prospective studies, individually matched controls to cases in retrospective studies and pairs of data obtained when the same individual is measured twice are examples of matched pairs (see MATCHED SAMPLES). A sample of matched pairs consists of statistically dependent data and in statistical analysis the pair, not single subjects, should be the unit. Matched pairs analysis may concern concepts like change, difference and odds, but also AGREEMENT and ASSOCIATION. Questions of change could include: Is there a difference in outcome due to different treatments between individually matched subjects? Is there a change in outcome within subjects before and after a treatment? Do patients prefer one treatment to another? Statistical methods for matched pairs analysis of quantitative, ordered categorical and dichotomous data respectively will be presented.

The cholesterol level was measured in 20 students before and after a period of having a diet that was supposed to have a cholesterol-lowering effect. As each student was measured twice, the difference between the two values was the outcome variable. The changes in cholesterol ranged from -1.0 mmol/l (increase) to 0.8 mmol/l (decrease). The table shows three different statistical approaches to matched pairs analysis of quantitative data:

The mean approach. Provided that the dataset of differences is a sample from a NORMAL DISTRIBUTION, the paired STUDENT'S t-TEST of the null hypothesis of zero mean change can be used. The observed mean change was 0.23 mmol/l and according to the test (see the table) one can conclude that the diet will significantly decrease the mean cholesterol level in a representative population by about 0.04-0.42 mmol/l, which is the 95% CONFIDENCE INTERVAL (CI).

The median approach. The WILCOXON SIGNED RANK TEST requires no assumptions about distribution of the differences in quantitative data. The median change was 0.2 mmol/l and according to the test the null hypothesis of no MEDIAN change can be rejected (P = 0.01).

The dichotomisation approach. The cholesterol level decreased in 16, increased in three and was unchanged in one student. If the null hypothesis of unchanged values were true one would expect about the same numbers of positive as negative differences; this comparison is performed by a sign test. Unchanged values provide no information about the direction of change and will be excluded. The BINOMIAL DISTRIBUTION is used for exact calculation of the PROBABILITY of getting the observed or even more extreme unbalance in negative and positive differences when the null hypothesis is true. The table shows that the probability of the observed or more extreme unbalance was 0.004, which is strong evidence that the diet will change the cholesterol level. The large sample approximation of the one-sample sign test (Altman, 2000) can be written as:

$$z_c = \frac{|r - np| - \tfrac{1}{2}}{\sqrt{np(1 - p)}}$$

where $r$ is the number of differences of one sign among $n$ nonzero differences and $p$ is the probability under the null hypothesis of having the actual sign ($p = \tfrac{1}{2}$). In the example, $r = 16$, $n = 19$ and $z_c = 2.75$. The proportion of students with a decrease in cholesterol was 84% and the 95% CI (see Newcombe and Altman, 2000) deviates from that of the null hypothesis (50%) (see the table).

Matched pairs analysis of ordered categorical data is applied to a dataset from a study in diagnostic radiology (Svensson et al., 2002). The patient's perceived difficulty during each of two radiological examinations, here denoted CT and CO, was rated on a scale with the categories 'not at all', 'slightly', 'fairly' and 'very' difficult. Each of the 108 patients underwent both examinations, which means paired data (see the figure).
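The sign test figures quoted for the cholesterol example ($r = 16$ decreases among $n = 19$ nonzero differences) can be checked with a short calculation. The sketch below uses only the Python standard library; the function name `sign_test` is hypothetical, the exact P-value is taken as twice the binomial tail probability under $p = 1/2$, and the approximate P-value is the two-sided normal tail area for $z_c$.

```python
from math import comb, erfc, sqrt


def sign_test(r, n):
    """One-sample sign test under the null hypothesis p = 1/2.

    r: number of differences of one sign; n: number of nonzero differences.
    Returns (exact two-sided P, continuity-corrected z, approximate P).
    """
    k = max(r, n - r)  # the more extreme of the two counts
    tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    p_exact = min(1.0, 2.0 * tail)
    z_c = (abs(r - n * 0.5) - 0.5) / sqrt(n * 0.5 * 0.5)
    p_approx = erfc(z_c / sqrt(2.0))  # equals 2 * (1 - Phi(z_c))
    return p_exact, z_c, p_approx


p_exact, z_c, p_approx = sign_test(16, 19)
print(round(p_exact, 3), round(z_c, 2), round(p_approx, 3))
```

The unchanged student is excluded before the call, which is why $n$ is 19 rather than 20.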
matched pairs analysis Three different approaches to matched pairs analysis of change in S-cholesterol in a sample of 20 students

Mean approach: mean change 0.23 mmol/l; standard deviation 0.41 mmol/l; Student's paired t = 2.527, significance level P = 0.02; the 95% CI of mean change (0.04 to 0.42) mmol/l
Median approach: quartiles (Q1; Q3) (0.1; 0.5) mmol/l; Wilcoxon signed ranks matched pairs test, P = 0.01; the 95% CI of median change (0.1 to 0.5) mmol/l
Dichotomisation approach: negative changes, 3; positive changes, 16; the exact sign test, P = 0.004; the approximate sign test, $z_c$ = 2.75, P = 0.006; the 95% CI for the proportion of students with decrease, 62% to 94%

matched pairs analysis Frequency distribution of paired data from evaluation concerning perceived difficulty concerning two radiological examinations (CT, CO); rating categories: not at all, slightly, fairly, very; 108 patients in total

Data from scale assessments have an ordered structure only, which means that change is not defined by the difference. Therefore, the same statistical methods as for paired dichotomous data will be used. A common expression for the sign test is:

$$z_c = \frac{|b - c| - 1}{\sqrt{b + c}}$$

where $b$ and $c$ denote the number of pairs with different categories. A McNEMAR'S TEST is an equivalent test (Bland, 1996; Altman, 2000). One approach to dichotomise the data is to compare the numbers of pairs below and above the diagonal of unchanged categories. For the data in the figure, the 17 patients who rated the CT a higher level of difficulty than the CO are compared with the 45 pairs above the diagonal. According to the sign test ($z_c$ = 3.43, P = 0.0006), this observed unbalance in changes gives evidence enough to conclude that the CT is perceived as significantly less difficult than the CO examination.

An alternative way of dichotomising is by grouping the data in two categories, 'not at all difficult' (+) and 'difficult' (-), which contains three categories of difficulty. The table is simplified in two concordant and two discordant combinations of categories. The discordant pairs of 32 CT(+), CO(-) and 13 CT(-), CO(+) contain information about the difference in perceived difficulty between the examinations. The SIGN TEST ($z_c$ = 2.68, P = 0.007) confirms that more patients will find the CT examination as 'not at all difficult' when compared with their rating of the CO examination. As is evident from the figure, a larger proportion of the patients (18%) judged the CT as being 'not at all difficult', as 48 patients (44%) rated the CT and 29 (27%) rated the CO 'not at all difficult'. The 95% CI for the difference in the paired proportions, $\Delta p$, was from 5% to 29% according to the expression $\Delta p \pm 1.96 \times SE(\Delta p)$. The STANDARD ERROR (SE) is:

$$SE(\Delta p) = \frac{1}{n} \sqrt{b + c - \frac{(b - c)^2}{n}}$$

where $n$ is the total number of patients ($n = 108$) and $b$ and $c$ are the numbers of discordant pairs (Altman, 2000; Newcombe and Altman, 2000).

In order to use a pair-matched CASE-CONTROL STUDY method we are interested in the exposure to the risk factor. Using retrospective case-control studies, individuals having a specific disease (e.g. lung cancer) are compared with individuals without the disease. Both the outcome variable (diseased, nondiseased) and the exposure to the risk factor (exposed, not exposed) are dichotomous. Within each pair, there are four possible combinations of disease status and exposure. Two sets of pairs are concordant, but information about the relationship between exposure and disease is given by the pairs with different exposure. Denote the number of pairs with only the case exposed $n_{+-}$ and the number of pairs with the case unexposed and the control exposed $n_{-+}$. Provided the numbers of discordant pairs are nonzero, the odds ratio in matched pairs is calculated by $OR = n_{+-}/n_{-+}$. An OR larger than unity indicates a relationship between exposure and disease, i.e. a higher odds of developing disease when exposed (McNeil, 1999). ES

[See also CORRELATION, KAPPA AND WEIGHTED KAPPA, MATCHING, MATCHED SAMPLES]
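The calculations for paired dichotomous data in this entry (the sign test statistic, the difference in paired proportions with its standard error, and the matched-pairs odds ratio) can be sketched as below. The function names are hypothetical and the odds ratio counts in the last line are invented purely for illustration; the radiology counts (45 versus 17, and 32 versus 13 among 108 patients) are those quoted in the text.

```python
from math import erfc, sqrt


def paired_sign_test(b, c):
    """Continuity-corrected sign (McNemar-type) statistic for discordant pairs."""
    z_c = (abs(b - c) - 1) / sqrt(b + c)
    return z_c, erfc(z_c / sqrt(2.0))  # statistic, two-sided normal P-value


def paired_difference(b, c, n, z=1.96):
    """Difference in paired proportions, its SE and a normal-theory CI."""
    diff = (b - c) / n
    se = sqrt(b + c - (b - c) ** 2 / n) / n
    return diff, se, (diff - z * se, diff + z * se)


def matched_odds_ratio(case_only_exposed, control_only_exposed):
    """OR from discordant pairs; requires a nonzero denominator."""
    return case_only_exposed / control_only_exposed


z_diag, p_diag = paired_sign_test(45, 17)   # above versus below the diagonal
z_dich, p_dich = paired_sign_test(32, 13)   # 'not at all' versus 'difficult'
diff, se, ci = paired_difference(32, 13, 108)
odds_ratio = matched_odds_ratio(30, 10)     # hypothetical counts
print(round(z_diag, 2), round(z_dich, 2), round(diff, 2), ci, odds_ratio)
```

The interval returned by `paired_difference` depends on how intermediate values are rounded, so it may differ in the last digit from the rounded limits quoted in the text.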
Altman, D. G. 2000: Practical statistics for medical research. Boca Raton: Chapman & Hall/CRC. Bland, M. 1996: An introduction to medical statistics, 2nd edition. Oxford: Oxford Medical Press. McNeil, D. 1999: Epidemiological research methods. New York: John Wiley & Sons, Inc. Newcombe, R. G. and Altman, D. G. 2000: Proportions and their differences. In Altman, D. G., Machin, D., Bryant, T. N. and
matched samples This is a set of observations in which each observation in one sample is individually matched with one in every other sample. Paired samples consist of individually pair-matched observations. The individually matched observations are statistically dependent and should be regarded as one unit in statistical analysis, which means that the matched samples might be regarded as one group of dependent data. Hence, matched samples have an equal number of observations.

Different types of study, such as CROSS-SECTIONAL, CASE-CONTROL and reliability STUDIES, can be designed as matched samples. Self-pairing studies lead to matched samples, e.g. CROSSOVER TRIALS, test-retest, intraobserver studies and studies on paired organs. The purpose of matching is to create homogeneous pairs of observations with regard to important background properties, so the remaining difference between the observations within a pair could be ascribed the source of interest (effect of treatment, exposure, time, etc.). The choice and the number of matching variables should be carefully considered, as the matching variables cannot be used in the statistical evaluation of a possible relationship or explanation regarding the main variable.

Crossover design is ideal in randomised CLINICAL TRIALS for evaluation of the difference in effect between two treatments; one of the treatments could be a PLACEBO treatment. The variability between individuals in the sample is eliminated as each individual is its own control (self-pairing). Crossover studies are also called changeover, within-subject and AB/BA crossover studies. Each individual will get the two treatments, A and B, in random order, with a wash-out period in between in order to prevent the effect of the first treatment interacting with the second (carryover effect) (see the figure). The important assumption is that the patients will be in the same state when they receive each treatment. Therefore, this design is suitable for chronic states only.

An alternative way of creating homogeneous pairs is by using individually matched pairs. The aim of using matched pairs in cross-sectional studies is to evaluate the difference in effect between two or more treatments. Matching pairs is a preferable alternative method when the self-pairing crossover design is not appropriate to perform. The matched paired sample consists of individually matched pairs of subjects regarding some prognostic variables where each member of a pair is randomly given one of the treatments (see the figure). The comparison of the effects between the two treatments is made within each pair and could be defined as the difference in effect between the two treatments (additive effect) or as the ratio of the two treatment effects (multiplicative effect). Hence, each pair is treated as one unit in the statistical analysis.

matched samples Main differences between matched samples from crossover and matched pairs design (McNeil, 1999; Altman, 2000; Senn, 2000)
Crossover design: n subjects; two treatments each; n observational units in analysis
Matched pairs design: n + n subjects; one treatment each; n observational units in analysis

Matched case-control studies. One aim of epidemiological studies is statistically to evaluate associations between risk factors and disease outcomes among individuals. Patients with a disease (cases) are compared with subjects without the disease (controls) and the question of interest is their past experience of exposure to possible risk factors. Matched case-control studies consist of an individually matched control to each case. In the case of rare events, each case may be individually matched with more than one control. The first table shows the four different combinations of outcome and exposure in a matched pair case-control study, when the total number of pairs is $n$.

matched samples Four different combinations of outcome and exposure in a matched case-control study

                   Control exposed    Control unexposed
Case exposed       $n_{++}$           $n_{+-}$
Case unexposed     $n_{-+}$           $n_{--}$
Total number of pairs: $n$

The distribution of data from unmatched case-control studies is also commonly presented in a 2 x 2 table. However, unmatched case ... (McNeil, 1999; Altman, 2000).

matched samples 2 x 2 frequency table of an unmatched case-control study, showing frequency distribution of the matched case-control data of the first table when treated as unmatched

           Exposed               Unexposed             Total
Case       $n_{++} + n_{+-}$     $n_{-+} + n_{--}$     $n$
Control    $n_{++} + n_{-+}$     $n_{+-} + n_{--}$     $n$

Matched samples of ordered categorical data. Studies that involve rating scales, questionnaires and other types of categorical classifications often produce matched or paired samples of observations. Quality assessments of rating scales concern matched samples of ordered categorical data. In interobserver reliability studies the agreement between observers in their classifications of the same subjects is evaluated. The pairs of classifications of each subject are dependent. AGREEMENT studies could also concern an interscale comparison, which is a comparison between different scales for the same variable. Comparisons between self-rated ability and an expert-rated ability of the patient or between a child's opinion and the parents' judgement of it are also examples of the large variations of paired samples involving ordered categorical data. Matched samples are a natural consequence when multi-item instruments for qualitative variables, such as pain and quality of life, are used (Svensson, 2001). In summary, a large number of different types of study create matched samples. The important feature they all have in common is that matched/paired samples of observations are dependent, a fact that should be taken into account in the statistical analysis. ES

[See also KAPPA AND WEIGHTED KAPPA, MATCHED PAIRS ANALYSIS, MATCHING]

Altman, D. G. 2000: Practical statistics for medical research. Boca Raton: Chapman & Hall/CRC. McNeil, D. 1999: Epidemiological research methods. New York: John Wiley & Sons, Inc. Senn, S. 2000: Cross-over trials in clinical research. Chichester: John Wiley & Sons, Ltd. Svensson, E. 2001: Construction of a global scale for multi-item assessments of the same variable. Statistics in Medicine 20, 3831-46.
matching A study design technique of creating pairs of subjects that are homogeneous with respect to important background variables, which are not interesting for the actual study but could interfere with the variable of interest. Matching normally means that each subject is individually paired with another. In prospective clinical studies where the effects of two treatments are to be compared and the two treatments cannot be given to the same individual, matching means that two subjects with the same background properties (e.g. age, gender and some prognostic variables) are paired, one of whom is randomly given the treatment of interest and the other the PLACEBO or standard treatment. The main aim of the matching is to make the two treatment groups comparable by reducing the variability and possible systematic differences that could occur due to disturbing background variables (confounding bias). This means that the remaining difference of interest between the two members of a pair would be due to the different treatments. In retrospective CASE-CONTROL STUDIES, matching means that each case is individually paired with a control subject with respect to the background variables and their exposure to the risk factor of interest is compared. Matching twins or siblings provides practically similar individuals, which can be important in both clinical and epidemiological studies. Individually matched pairs can be regarded as 'artificial twins'. A continuous matching variable, such as age, can be transformed into categories before pairing individuals, but the matching criterion for one subject to be matched with another can also be expressed in terms of a specified tolerance interval (caliper matching). The two individuals in a matched pair should be regarded as one unit and the data should be treated as dependent (paired) in the statistical analysis. For further details see McNeil (1999) and Altman (2000). ES
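Caliper matching on a single continuous variable can be sketched as follows. This is only an illustration under stated assumptions: the function name `caliper_match`, the greedy nearest-neighbour strategy and the ages are hypothetical and are not taken from the entry.

```python
def caliper_match(cases, controls, caliper):
    """Greedily pair each case with the closest unused control within the caliper.

    cases, controls: dicts mapping a subject id to the matching variable (e.g. age).
    caliper: largest allowed absolute difference in the matching variable.
    Returns a list of (case_id, control_id) pairs; unmatched subjects are left out.
    """
    unused = dict(controls)
    pairs = []
    for case_id, case_value in sorted(cases.items(), key=lambda kv: kv[1]):
        # Candidates are the unused controls inside the tolerance interval.
        candidates = [
            (abs(value - case_value), control_id)
            for control_id, value in unused.items()
            if abs(value - case_value) <= caliper
        ]
        if candidates:
            _, best = min(candidates)
            pairs.append((case_id, best))
            del unused[best]  # each control is used at most once
    return pairs


# Hypothetical ages in years.
cases = {"case1": 34, "case2": 51, "case3": 67}
controls = {"ctrl1": 36, "ctrl2": 49, "ctrl3": 80, "ctrl4": 33}
pairs = caliper_match(cases, controls, caliper=2)
print(pairs)  # [('case1', 'ctrl4'), ('case2', 'ctrl2')]
```

Here the third case stays unmatched because no control lies within two years of age 67. Each resulting pair would then be analysed as one dependent unit.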
[See also MATCHED PAIRS ANALYSIS, MATCHED SAMPLES]
Altman, D. G. 2000: Practical statistics for medical research. Boca Raton: Chapman & Hall/CRC. McNeil, D. 1999: Epidemiological research methods. New York: John Wiley & Sons, Inc.
maximum likelihood estimation This refers to a general method of estimation for unknown parameters in a probabilistic/statistical model for observed data, for instance a LINEAR REGRESSION model. A well-chosen parameterisation of a discrete PROBABILITY or continuous density ensures that different parameter values determine different chances (densities) of observing a given dataset. The parameter value that maximises this chance (density) for an observed dataset is deemed most likely and is known as the maximum likelihood estimate (MLE). The MLE is often derived in closed form by maximising the logarithm of the LIKELIHOOD of the data as a function of the unknown model parameters. The first figure depicts this log-likelihood function for the MEAN body weight based on a random sample of 250 American men (Penrose, Nelson and Fisher, 1985) with weights that are normally distributed with the known STANDARD DEVIATION equal to 12.3 kg. The solid vertical line indicates the MLE at 81.2 kg. With two parameters, the graph of the log-likelihood becomes a surface. In the second figure, we show contour lines of this surface for the intercept and slope in a linear model for the regression of body weight (in kg) on height (in cm). We assume that weights corresponding to fixed height measurements are normally distributed with the known standard deviation equal to 10.5 kg. The fitted regression curve is displayed in the entry on LEAST SQUARES ESTIMATION (and seen in the figure there).
maximum likelihood estimation Contour lines of the surface for the intercept and slope estimation in a linear model for the regression of body weight (in kg) on height (in cm)

To maximise the log-likelihood in practice, one can set its derivative with respect to the target parameter, i.e. the score function, equal to zero and solve this for the unknown parameter. Alternatively, computational algorithms can search numerically for the maximum of the log-likelihood function or for a zero of the score function. Maximum likelihood estimation plays a central role in statistics. It is the standard method of estimation, for instance, in linear, LOG-LINEAR MODELS and LOGISTIC REGRESSION. This is due to its strong intuitive appeal and nice statistical properties in large samples. Under quite general conditions, the MLE becomes normally distributed around the true parameter value as the sample size increases. Hence, 95% CONFIDENCE INTERVALS are obtained as the MLE plus or minus 1.96 times its STANDARD ERROR. For a p-dimensional parameter estimate $\beta$ with COVARIANCE $V$, the Wald confidence region is the ellipsoid consisting of $\beta_0$-values whose distance from the estimator $\beta$, i.e. $(\beta - \beta_0)^{T} V^{-1} (\beta - \beta_0)$, stays within the 95% percentile of the CHI-SQUARE DISTRIBUTION with $p$ DEGREES OF FREEDOM.
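The recipe just described (find a zero of the score function, then form the MLE plus or minus 1.96 standard errors) can be sketched for the simplest case of a normal mean with known standard deviation. This is a minimal illustration, not the computation behind the figures: the weights are hypothetical, the function names are invented for this sketch, and bisection is only one of many possible numerical search methods.

```python
import math

def log_likelihood(mu, data, sigma):
    """Log-likelihood of a normal mean mu when the standard deviation sigma is known."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))

def score(mu, data, sigma):
    """Derivative of the log-likelihood with respect to mu."""
    return sum(x - mu for x in data) / sigma ** 2

def solve_score(data, sigma, tol=1e-10):
    """Find the zero of the score function by bisection between the smallest and largest value."""
    lo, hi = min(data), max(data)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid, data, sigma) > 0:
            lo = mid  # log-likelihood still increasing: the maximum lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical body weights in kg; sigma is treated as known.
weights = [72.5, 80.1, 95.3, 68.4, 84.0, 77.7, 90.2, 81.6]
sigma = 12.3

mle = solve_score(weights, sigma)
standard_error = sigma / math.sqrt(len(weights))
wald_interval = (mle - 1.96 * standard_error, mle + 1.96 * standard_error)
print(mle, wald_interval)
```

In this particular model the numerical answer coincides with the closed-form MLE, the sample mean, which makes the sketch easy to check.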
maximum likelihood estimation Log-likelihood function for the mean body weight estimation based on a random sample of 250 American men (horizontal axis: mean body weight)
More accurate p-dimensional 95% confidence regions are usually obtained as the set of values that are so likely that twice the log-likelihood is no further removed from the maximum achievable than the 95% percentile of the chi-square distribution with p degrees of freedom. In the first figure we depict this maximum deviation by the dashed horizontal line. The dashed vertical lines mark the 95% confidence interval 79.5-82.8 kg for mean body weight in American men. In the second figure, such a 95% confidence region for the intercept and slope contains all values enclosed by the contour at -679.879. A flat log-likelihood in the neighbourhood of the MLE makes it hard to pinpoint the MLE and reveals limited information. Hence, a useful summary of the information about the target parameter is the curvature at the estimated value: minus the second derivative of the log-likelihood function with respect to the unknown parameter, evaluated at the MLE. This is called the 'observed information'. Its expected value under the assumed model is known as the Fisher information. Its inverse approximates the VARIANCE of the MLE. The inverse of the observed information is often thought to yield the better variance estimate as it sits closer to the data.

The MLE is efficient when the assumed model holds and the sample size is large. This means that among all estimators expected to equal the true parameter value in large samples, no estimator will have smaller variance. The bad news is that the MLE relies on the correct specification of the probability model and may be biased otherwise. The MLE is usually also sensitive to outlying observations. To address these problems, alternative estimation techniques have been devised, such as least squares estimation, L1-regression or the generalised method of moments. These avoid specifying the entire sampling distribution of the observed data and trade robustness against outlying observations for efficiency. For further details see Cox and Hinkley (1979), Clayton and Hills (1993) and Jewell (2004). SV/EG

Clayton, D. and Hills, M. 1993: Statistical models in epidemiology. Oxford: Oxford University Press. Cox, D. R. and Hinkley, D. V. 1979: Theoretical statistics. Boca Raton: CRC Press. Jewell, N. P. 2004: Statistics for epidemiology. Boca Raton: CRC Press. Penrose, K. W., Nelson, A. G. and Fisher, A. G. 1985: Generalized body composition prediction for men using simple measurement techniques. Medicine and Science in Sports and Exercise 17, 189.
McNemar's test This is an approximation to the exact binomial test (see BINOMIAL DISTRIBUTION) used in situations where there are two paired outcomes of a binary nature (e.g. yes/no, dead/alive, infected/not infected). The test looks to see if the proportions of the response levels are the same in the two groups being considered. This might occur when we have measurements taken on the same individuals at two different time points (i.e. the status of a recurrent condition), when two treatments are being tested on the same individuals (e.g. in CROSSOVER TRIALS), when paired individuals are entering a trial, when two tests are being performed on individuals (e.g. whether the individual has high blood pressure or is overweight) or when comparing the SENSITIVITY/SPECIFICITY of two classification methods. It should be clear that this is not an exclusive list.

Assuming that we have a set of paired yes/no outcomes, McNemar's test (McNemar, 1947) looks to answer the question as to whether the proportion of 'no's is different across the two groups/treatments/times. It does so by considering the discordant pairs, i.e. those pairs that return a 'yes' and a 'no' (as opposed to two 'no's or two 'yes's). If the proportion of 'no's is the same between two treatments, then for every 'yes-no' we would expect to see a 'no-yes'. Otherwise, there would be an imbalance in the number of 'no's between the two treatments and thus evidence that the proportions are not the same. If we were to have 12 discordant pairs, 6 'yes-no's and 6 'no-yes's would be expected if the proportions were the same between the two groups. Deviation from this even split would be evidence of a difference in proportions and McNemar's test uses a normal approximation to evaluate the statistical significance of that evidence.

For example, Nicholson et al. (1998) perform a crossover trial to see if a vaccine for influenza can trigger asthma exacerbations. Individuals receive both the vaccine and a PLACEBO (in random order) and are observed to see in each case whether an exacerbation occurs. Of 256 people in the trial, 242 had the same reaction after both treatments and so provide no direct information about the difference between the treatments. Eleven have an exacerbation after the vaccine but not after the placebo and three after the placebo but not after the vaccine. Under an assumption that there is no difference in proportions, we would expect these 14 discordant pairs to have a 7-7 split rather than the 11-3 split observed. However, the McNemar test is nonsignificant with a reported P-VALUE of 0.06.

The data for McNemar's test are typically displayed in a CONTINGENCY TABLE, as shown. There are various forms of the test statistic. Essentially, one is looking to compare $(b - c)/\sqrt{b + c}$ to the standard normal distribution (MEAN zero and VARIANCE one) or, equivalently, $(b - c)^2/(b + c)$ to the CHI-SQUARE DISTRIBUTION with one DEGREE OF FREEDOM (DoF). These statistics are usually modified to incorporate a continuity correction, the most commonly used of which is to subtract one from the absolute value of the numerator or the positive square root of the numerator.
McNemar's test Tabulated paired binary outcome data

                               Response from treatment 2
Response from treatment 1      No            Yes
No                             242 (a)       11 (b)
Yes                            3 (c)         0 (d)
For the example of Nicholson et al. (1998), the statistic is $(|11 - 3| - 1)^2/(11 + 3) = 3.5$, which when compared to the chi-square distribution with a DoF of 1, gives a P-value of 0.06, as previously stated. (Note that in order to have attained statistical significance at the usual 0.05 level, the statistic would have to exceed the value 3.84.)
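The arithmetic for the 11-3 split can be reproduced with the Python standard library alone. The sketch below computes the continuity-corrected statistic and, for comparison, the exact binomial test that McNemar's test approximates; the function names are illustrative choices, and doubling the smaller tail is one common convention for a two-sided exact P-value.

```python
import math

def mcnemar_corrected(b, c):
    """Continuity-corrected McNemar statistic and its chi-square (1 DoF) P-value."""
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # For one degree of freedom the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

def exact_binomial_two_sided(b, c):
    """Two-sided exact binomial P-value for the discordant pairs (success probability 0.5)."""
    n = b + c
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

statistic, p_value = mcnemar_corrected(11, 3)
exact_p = exact_binomial_two_sided(11, 3)
print(round(statistic, 2), round(p_value, 3), round(exact_p, 3))  # 3.5 0.061 0.057
```

Both P-values fall just above 0.05 for these data, so the two approaches lead to the same conclusion here.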
As is common when data are well paired, it is the case here that the number of discordant pairs is small. This means that the normal approximation may not perform well. If the only resource is a table of the NORMAL DISTRIBUTION, then McNemar's test is clearly convenient, but today it is unlikely that researchers will not have the capability to perform the exact binomial test. Much literature is devoted to the problem of powering such trials and it is true that it is a simpler task to power for McNemar's test, but again, with modern computing resources, powering the exact binomial test is practicable. It is, therefore, difficult to advocate the use of this test. McNemar's test is used for the study of paired dichotomous outcomes; for other types of paired data, other methods are available (see STUDENT'S t-TEST or the WILCOXON RANK SUM TEST, for example). If the data are matched triples rather than pairs, or in even greater numbers, then the COCHRAN Q-TEST is applicable. For further details see Zar (1999). AGL

McNemar, Q. 1947: Note on the sampling error of the differences between correlated proportions or percentages. Psychometrika 12, 153-7. Nicholson, K. G., Nguyen-Van-Tam, J. S., Ahmed, A. H., Wiselka, M. J., Leese, J., Ayres, J., Campbell, J. H., Ebden, P., Eiser, N. M., Hutchcroft, B. J., Pearson, J. C. G., Willey, R. F., Wolstenholme, R. J. and Woodhead, M. A. 1998: Randomised placebo-controlled crossover trial on effect of inactivated influenza vaccine on pulmonary function in asthma. The Lancet 351, 326-31. Zar, J. H. 1999: Biostatistical analysis, 4th edition. Englewood Cliffs, NJ: Prentice-Hall.
mean The mean is a MEASURE OF LOCATION giving a typical value of a set of observations and what is usually meant when the 'average' is referred to, although other DESCRIPTIVE STATISTICS can also be used as averages. Technically, the term 'mean' is shorthand for the 'arithmetic mean', to differentiate it from the GEOMETRIC MEAN. It is calculated by dividing the sum of all observations by the number of observations. For example, the ages in years of seven students in an undergraduate seminar are recorded as 18, 19, 19, 19, 20, 20 and 21. The mean age of the students is:

$$\frac{18 + 19 + 19 + 19 + 20 + 20 + 21}{7} = 19.4 \text{ years}$$

The mean is a suitable measure of location to use when the variable being summarised has an approximately symmetrical distribution. However, if SKEWNESS or OUTLIERS are present then the use of the mean is inappropriate, since it is unduly affected by a small number of values. For example, if a mature student aged 51 joins the undergraduate seminar just described then the mean age of the students becomes:

$$\frac{18 + 19 + 19 + 19 + 20 + 20 + 21 + 51}{8} = 23.4 \text{ years}$$

Here, a mean of 23.4 is not a suitable summary of the typical age of the students since it is strongly influenced by the age of the mature student. In such cases, measures of location such as the MEDIAN or geometric mean may be more appropriate. Again, the median may be preferred when summarising discrete data, such as family size, for no one has, say, 2.4 children! The mean is usually denoted $\bar{x}$ or $\mu$, although the latter technically refers to the mean of a population, rather than a sample (see SAMPLING DISTRIBUTION). Altman (1991) suggests that a mean should not normally be quoted to more than one decimal place more than the raw data. The STANDARD DEVIATION is typically used as a MEASURE OF SPREAD around the mean. SRC
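The seminar example can be checked with a few lines of Python using the standard `statistics` module. The variable names are illustrative only; the median is printed alongside to show how little it moves when the mature student joins.

```python
from statistics import mean, median

ages = [18, 19, 19, 19, 20, 20, 21]
ages_with_mature_student = ages + [51]

print(round(mean(ages), 1))                              # 19.4
print(round(mean(ages_with_mature_student), 1))          # 23.4
print(median(ages), median(ages_with_mature_student))    # 19 19.5
```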
Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.
measurement error This is a collective term for many different phenomena that arise when a measurement of a particular variable is made, but the measured value fails to match the true value for the subject. When someone's blood pressure is measured, for instance, the equipment for measuring it might be less than perfect or the observer may not be using it correctly. A sufficient indication of measurement error occurs when a repeat reading returns a different value from that obtained on the first occasion, even when the subject's true blood pressure has not changed. The effect of measurement error is wide ranging and is relevant both for research studies and for clinical practice on individual patients.

Measurement error might be systematic or random. Systematic error is where the measurement has a consistent tendency to overestimate (or consistently underestimate) the true value. A clock that is always five minutes fast would be an example of systematic overestimation of time of day. Systematic error may be identified by comparing measured values with known true values. In principle, it could then be removed by recalibrating the measuring instrument. This approach may be feasible if true values could be obtained from an alternative measuring instrument ('gold standard'). In practice, however, gold standards do not often exist and studying the AGREEMENT between two measuring instruments is the most realistic approach to quantifying systematic measurement error. The mean difference is a measure of systematic variation between the measuring instruments.

Random error is by its nature less predictable. In this scenario, the measured value neither consistently overestimates nor underestimates the true value, but may depart from the true value in an unpredictable manner. When two or more readings are taken on each of a number of individuals, random measurement error may be quantified in terms of a within-subject STANDARD DEVIATION (SD). This SD has the advantage of being expressed in the same units as the variable of interest, making it readily interpretable for clinicians with an intuitive grasp of the clinical meaning of the units: it is an absolute measure of measurement error. Its practical usefulness may be less, however, for a score derived from a subject's response to a questionnaire, since the score may not be used in routine clinical practice. Few clinicians would hold an intuitive grasp of the meaning of values for such a score. One alternative approach, then, is to measure the amount of measurement error relative to the variability of subjects' true values, e.g. by use of the intraclass correlation coefficient (see INTRACLUSTER CORRELATION COEFFICIENT). This equals the ratio of between-subject variance to the sum of between- and within-subject variances.

Measurement error, of course, also occurs with categorical variables; the relative lack of agreement is popularly quantified using the kappa statistic (see KAPPA AND WEIGHTED KAPPA). Absolute measures of agreement for categorical variables are less well established. If quantifying systematic error in measurement of a categorical variable representing presence or absence of a condition, SENSITIVITY and SPECIFICITY are required. Again, they depend on a gold standard measure for the presence or absence of the condition.

In clinical medicine, knowledge of the likely size of measurement error for a particular variable will help interpretation of a single reading. An observer may know with 95% confidence that the observed value for the subject is within two standard deviations of that subject's true value. Apparent changes in readings from one occasion to another may occur: if they are larger than twice the standard deviation of differences, they are likely to be true changes. Smaller apparent changes could be attributable to measurement error alone. It may, therefore, be more useful to carry out studies that assess intraobserver variation (within the observer) and interobserver variation (between observers) for this reason.

Many studies of measurement error will encounter both systematic and random error. Comparing two methods of measurement of the same variable may demonstrate a consistent tendency for method A to measure higher values than method B. If $d$ represents the value given by method A minus that given by method B for a given subject, then $\bar{d}$ (the mean of $d$ for all the subjects studied) can be calculated. If it is nonzero, systematic differences between methods A and B are suggested. However, the SD of $d$ is also unlikely to equal zero; in other words, method A will not return exactly the same number of units higher than method B on every subject. This suggests that random differences, as well as systematic differences, exist between methods A and B. Bland and Altman (1986) have, therefore, recommended the calculation of approximate 95% limits of agreement, which are to be represented by $\bar{d} \pm 2 \times \mathrm{SD}(d)$.

Knowledge of size of measurement error is also required for interpretation of relationships between two or more variables, within research studies. Much of epidemiological research involves understanding the relationship between an 'exposure variable' and an outcome. Inappropriate use of a single measure of exposure may lead to REGRESSION DILUTION BIAS if the exposure is not measured precisely. In other words, the estimate of the effect of exposure will be too conservative. However, if the size of the measurement error can be quantified, the true magnitude of the relationship between the exposure and outcome can be properly estimated. If the outcome variable in epidemiological or clinical studies is measured imprecisely, the estimate of the effect of exposure would not necessarily be biased, but the STANDARD ERRORS of estimates of exposure effects would be inflated. Thus CONFIDENCE INTERVALS would be too wide and P-VALUES too conservative. Still further issues arise in epidemiological studies where a potential confounding variable is measured with error and included in an analysis of the relationship between an exposure and an outcome of interest. It is possible that, because of the measurement error in the confounding variable, the analysis fails to adjust for its effect properly, a phenomenon known as 'residual confounding'. RM/JE
[See also ATTENUATION DUE TO MEASUREMENT ERROR, MEASUREMENT PRECISION AND RELIABILITY]
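The limits of agreement described in this entry (mean difference plus or minus twice the SD of the differences) can be sketched in a few lines. The paired readings below are hypothetical and the variable names are illustrative; the sample SD with divisor n - 1 is assumed.

```python
from statistics import mean, stdev

# Hypothetical paired readings on the same eight subjects, e.g. systolic blood pressure in mmHg.
method_a = [122, 135, 118, 141, 150, 128, 133, 146]
method_b = [119, 131, 120, 136, 147, 125, 134, 140]

differences = [a - b for a, b in zip(method_a, method_b)]
mean_difference = mean(differences)    # estimate of the systematic difference (A minus B)
sd_difference = stdev(differences)     # sample SD of the differences (random component)
limits_of_agreement = (mean_difference - 2 * sd_difference,
                       mean_difference + 2 * sd_difference)
print(mean_difference, limits_of_agreement)
```

A mean difference away from zero points to a systematic difference between the methods, while the width of the limits reflects the random component.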
Bland, J. M. and Altman, D. G. 1986: Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet i, 8476, 307-10.
measurement precision and reliability If we repeatedly measure the concentration of glucose in a given blood sample, for example, or measure a patient's blood pressure on each of several successive days, then we are likely to obtain a set of measurements that are similar but not identical. The greater the similarity of the series of measurements then the more precise is the measurement procedure or method producing them. If we measure the variability of the replicated measurements by their variance (or STANDARD DEVIATION) then we can define the method's precision as the reciprocal of that variance (or standard deviation).

In the context of laboratory assays, say, if we hold the measurement conditions (laboratory, batch of reagents, equipment, equipment operator, temperature, etc.) as constant as possible then the measurements are being made under what are usually known as repeatability conditions (International Standards Institute, 1994a, 1994b). The precision of the measurements is then assessed from an estimate of the repeatability VARIANCE or repeatability standard deviation. The latter is often known for short as the repeatability of the process. The repeatability standard deviation is analogous to the psychometricians' standard error of measurement (Dunn, 2004). If the repeated measurements are taken in different laboratories, at different times, with different equipment, reagents and equipment operators, then the resulting variability of the measurements provides an assessment of reproducibility or generalizability of the measurement process (International Standards Institute, 1994a, 1994b). This is provided by an estimate of the reproducibility variance or reproducibility standard deviation (reproducibility, for short).

One key characteristic of a measuring instrument's or method's precision (the reciprocal of the repeatability variance or reproducibility variance, for example) is that it is scale dependent. If we measure weight in kilograms we will get a precision that is 1000 times as great as the precision if it were measured in grams (the standard deviation for measures using the former scale being one-thousandth of that for those using the latter). It is essential, therefore, that if we are interested in the relative precision of alternative methods of measurement then the scale of measurement is taken into account. Often the scale of measurement of one method relative to another is not defined a priori but needs to be established by experiment (see METHOD COMPARISON STUDIES).

Quite often investigators will be interested in the precision of their instrument or method compared to the variability of the characteristic in the population under clinical or scientific investigation. We postulate the following simple model for the observed measurement ($X_i$) on the $i$th individual within a population:
$$X_i = \tau_i + E_i \qquad (1)$$
Here $\tau_i$ is the $i$th individual's true value for the characteristic and $E_i$ is a random measurement error. We might be interested in a measure of precision relative to the variability of $\tau$, i.e. the ratio $\sigma_\tau^2/\sigma_E^2$, where the numerator is the VARIANCE of the true values and the denominator the variance of the MEASUREMENT ERRORS. A more familiar index of relative precision is given by the RELIABILITY ratio or reliability coefficient, defined by:

$$\kappa_X = \sigma_\tau^2/(\sigma_\tau^2 + \sigma_E^2) \qquad (2)$$
If the measurement errors are uncorrelated with each other and with the true values then: (3)
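The display for equation (3) has not survived here. As a hedged sketch only, the uncorrelatedness assumption applied to model (1) gives the standard variance decomposition below, written in the notation of equations (1) and (2), with $\sigma_X^2$ denoting the variance of the observed measurements (a symbol introduced for this sketch). Whether this is the exact form originally printed as equation (3) is an assumption.

```latex
\operatorname{Var}(X_i) = \operatorname{Var}(\tau_i) + \operatorname{Var}(E_i),
\qquad\text{i.e.}\qquad
\sigma_X^2 = \sigma_\tau^2 + \sigma_E^2,
\qquad\text{so that}\qquad
\kappa_X = \frac{\sigma_\tau^2}{\sigma_X^2}.
```

Read this way, the reliability ratio is the share of the observed variance that is attributable to the true values.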
The reliability ratio (or reliability, for short) is the proportion of the variation in the observed measurements that is explained by the variability in the underlying true values. It provides a measure of the attenuation that might be expected in the calculation of correlation between error-prone measurements or the attenuation in a regression coefficient when the independent variable is subject to measurement error (see ATTENUATION DUE TO MEASUREMENT ERROR). Note that reliability is not a fixed characteristic of a method or process (even if $\sigma_E^2$ is assumed to be constant). The reliability ratio is a measure of how well the measurements distinguish between members of the relevant population, and as the heterogeneity of the population goes up (i.e. as $\sigma_\tau^2$ increases) so does $\kappa_X$. As the homogeneity of the population increases (as $\sigma_\tau^2$ decreases towards 0) the reliability tends towards zero. With a fixed $\sigma_E^2$ it is quite straightforward to calculate the change in reliability as one arbitrarily changes the value of $\sigma_\tau^2$. A relatively nontechnical introduction to reliability theory can be found in Carmines and Zeller (1979).

What if the variance of the measurement errors ($\sigma_E^2$) is not independent of the characteristic (i.e. $\tau$) being measured? It is, in fact, quite common to observe that the variance of the errors increases with the value of $\tau$ (i.e. the precision goes down with increasing $\tau$). In this situation we need to be able to design a relatively sophisticated precision study (often an interlaboratory precision study) to provide data that can be used to model the relationship between the two. (This is beyond the scope of this entry, but see Dunn, 2004, for further information.)

Reliability ratios are usually estimated from data involving repeated measurements on each of an appropriate sample of subjects. The analysis usually involves the estimation of the relevant COMPONENTS OF VARIANCE (either using the traditional one-way ANALYSIS OF VARIANCE or RANDOM EFFECTS MODELS) and then using these to calculate the required reliability. The reliability as defined by equation (2) is equivalent to the correlation between repeated measurements, and this can be estimated by calculating an intraclass correlation coefficient (usually via a one-way analysis of variance). Generalizations of reliability and various versions of intraclass correlation (see INTRACLUSTER CORRELATION COEFFICIENT) can be found in Dunn (2004). In the case of binary measurements the intraclass correlation (intraclass kappa, see KAPPA AND WEIGHTED KAPPA) is equivalent to one of the chance-corrected agreement coefficients - Scott's $\pi$-statistic (Kraemer, 1979). GD
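The estimation route described above (variance components from a one-way analysis of variance, then the reliability ratio of equation (2)) can be sketched as follows. This is a minimal illustration that assumes a balanced design with the same number of replicates per subject; the function name and the glucose readings are hypothetical.

```python
def reliability_one_way(measurements):
    """Estimate variance components and the reliability ratio from a balanced design.

    measurements: list of per-subject lists, each with the same number of
    repeated measurements. Uses the one-way analysis of variance estimators.
    """
    n_subjects = len(measurements)
    k = len(measurements[0])  # replicates per subject (balanced design assumed)
    grand_mean = sum(sum(row) for row in measurements) / (n_subjects * k)
    subject_means = [sum(row) / k for row in measurements]

    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n_subjects - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, subject_means) for x in row
    ) / (n_subjects * (k - 1))

    error_variance = ms_within                                 # estimate of sigma_E^2
    true_variance = max(0.0, (ms_between - ms_within) / k)     # estimate of sigma_tau^2
    reliability = true_variance / (true_variance + error_variance)
    return true_variance, error_variance, reliability


# Hypothetical duplicate glucose readings (mmol/l) on five subjects.
readings = [[5.1, 5.3], [6.8, 6.6], [4.9, 5.0], [7.4, 7.1], [5.9, 6.2]]
true_var, error_var, reliability = reliability_one_way(readings)
print(true_var, error_var, reliability)
```

The estimate of the true-value variance is truncated at zero, a common practical convention when the between-subject mean square happens to fall below the within-subject mean square.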
Carmines, E. G. and Zeller, R. A. 1979: Reliability and validity assessment. Thousand Oaks: Sage. Dunn, G. 2004: Statistical evaluation of measurement errors. London: Arnold. International Standards Institute 1994a: International Standard ISO 5725-1. Accuracy (trueness and precision) of measurement methods and results. Part 1: General principles and definitions. International Standards Institute 1994b: International Standard ISO 5725-2. Accuracy (trueness and precision) of measurement methods and results. Part 2: Basic method for the determination of the repeatability and reproducibility of a standard measurement method. Kraemer, H. C. 1979: Ramifications of a population model for $\kappa$ as a coefficient of reliability. Psychometrika 44, 461-71.
measures of central tendency See MEASURES OF LOCATION

measures of dispersion See MEASURES OF SPREAD

measures of fertility See DEMOGRAPHY
measures of location Also known collectively as 'averages', these are single-figure summaries intended to describe the typical or representative level of a set of observations. There are four measures of location commonly encountered. As part of a study of childhood development, the IQ of 210 mothers was measured. The average IQ of the population is 100, but the average IQ in the sample of mothers was 106. This information indicates that on the whole these women had slightly higher IQs than would be expected from a truly representative sample. The MEAN is what is usually being referred to when talking about the 'average': the average mothers' IQ quoted above was technically the mean mothers' IQ. The other averages commonly used are the MEDIAN, GEOMETRIC MEAN and MODE.

The mean is calculated by dividing the sum of all the observations by the number of observations. However, the mean is not an appropriate summary statistic when SKEWNESS or OUTLIERS are present. The median is an alternative measure of location in situations when the mean is not suitable. The median is the value that comes halfway when the observations are ordered. It is based on the ranks of the data and is therefore not dependent on the distribution of observations. In the special case where the data are positively skewed such that the log-transformed data have a NORMAL DISTRIBUTION, the geometric mean is an alternative measure of location. The geometric mean is calculated by backtransforming (antilogging) the (arithmetic) mean of the logged values.

The fourth measure of location commonly used in statistics is the mode. This is simply the value that occurs most often and it is therefore useful in summarising categorical rather than continuous data. Its usefulness can, however, be limited. For instance, we could not say much in a study of smoking if the modal number of cigarettes smoked in a sample happened to be zero! A measure of spread is often quoted alongside a measure of location to give information about the variability of observations around the average. SRC
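The four averages can be compared on one small dataset. The sketch below uses hypothetical positively skewed values and computes the geometric mean exactly as described, by antilogging the arithmetic mean of the logged values; the standard library's `geometric_mean` is used only as a cross-check.

```python
import math
from statistics import mean, median, mode, geometric_mean

# Hypothetical positively skewed measurements (e.g. a serum concentration).
values = [1.2, 1.5, 1.5, 2.1, 2.8, 3.9, 12.4]

# Geometric mean: back-transform (antilog) the arithmetic mean of the logged values.
geometric = math.exp(mean([math.log(v) for v in values]))

print(mean(values), median(values), mode(values), geometric)
print(abs(geometric - geometric_mean(values)) < 1e-9)  # True: both routes agree
```

With a long right tail like this one, the arithmetic mean is pulled upwards while the median and geometric mean stay closer to the bulk of the data.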
measures of mortality See DEMOGRAPHY
measures of spread Once the average of a set of observations has been defined using a MEASURE OF LOCATION, it is helpful to know how widely the data are scattered around this typical value. Measures of spread are used to summarise this information numerically. The most straightforward measure of spread is the RANGE: the interval between the minimum and maximum values in a set of observations. Although the range has the advantage of simplicity, it is only influenced by the most extreme observations in a dataset and is therefore not generally considered a good way of quantifying variability.

Instead, in the case where the data are approximately symmetrically distributed, the STANDARD DEVIATION is often quoted. This measure has the useful property that, especially when the data follow a NORMAL DISTRIBUTION, approximately 95% of the observations lie within two standard deviations of the MEAN. The INTERQUARTILE RANGE is an alternative measure of spread used in situations when the standard deviation is not suitable, due to SKEWNESS or the presence of OUTLIERS. It is the interval between the values that are located a quarter and three-quarters of the way through the sample when the observations are ordered. Since it is based on the ranks of the data it is not dependent on the distribution of observations. The VARIANCE is the square of the standard deviation. Although this quantity is frequently used in statistical analysis, it is not as useful as the standard deviation in describing the spread of observations because it is not in the same units as the original data. Whereas one can have an intuitive feel for, say, cm², it is less obvious how to cope with units such as years² or mmHg². SRC
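The measures of spread above can be computed side by side with Python's `statistics` module. The blood pressure readings are hypothetical, and quartile conventions differ between packages: the sketch relies on the default ('exclusive') method of `statistics.quantiles`, so other software may report slightly different quartiles.

```python
from statistics import stdev, variance, quantiles

# Hypothetical systolic blood pressures (mmHg) with one outlying value.
pressures = [110, 115, 118, 120, 122, 125, 128, 130, 134, 140, 178]

value_range = (min(pressures), max(pressures))
sd = stdev(pressures)                 # in mmHg, the units of the data
var = variance(pressures)             # in mmHg squared
q1, _, q3 = quantiles(pressures, n=4)
interquartile_range = (q1, q3)

print(value_range, round(sd, 1), round(var, 1), interquartile_range)
```

The single reading of 178 stretches the range and inflates the standard deviation, whereas the interquartile range is unaffected by it.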
median The median is a measure of location, being the central or 'halfway' value when the observations are ordered. Thus it is the middle value - in other words, half the data lie below it and half above. The median is also known as the 50th percentile. For example, in the following dataset, the heights of 11 women are recorded in centimetres:

154 157 157 158 159 160 161 162 162 163 169

The median is the 6th value when they are ordered, i.e. 160 cm. If there is an even number of observations then the median is, by convention, the arithmetic mean of the two central values. Without such a convention, any value of an infinite number lying between the two central observations would be a median, but it is preferable to have a definition enforcing a unique value. The MEAN is generally used as a MEASURE OF LOCATION when the data have a symmetrical distribution. If this is not the case then the median is often quoted because it is not unduly affected by the presence of SKEWNESS or OUTLIERS. For instance, a woman who is 185 cm tall is added to the heights dataset in our previous example:

154 157 157 158 159 160 161 162 162 163 169 185 (Median = 160.5)

This changes the median to 160.5 cm, a value that still gives a good indication of the average height of the women. The INTERQUARTILE RANGE is typically used as a MEASURE OF SPREAD around the median. SRC
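The heights example can be checked with a short Python sketch. The values below are transcribed from the example in this entry, and the transcription itself should be treated as an assumption; the variable names are illustrative only. The sketch also compares how far the mean and the median move when the tall woman is added.

```python
import statistics

# Heights (cm) of the 11 women, as transcribed from the example in the entry.
heights_11 = [154, 157, 157, 158, 159, 160, 161, 162, 162, 163, 169]
# The same dataset after adding the one unusually tall woman.
heights_12 = heights_11 + [185]

median_11 = statistics.median(heights_11)  # odd n: the 6th ordered value
median_12 = statistics.median(heights_12)  # even n: mean of the two central values

mean_shift = statistics.mean(heights_12) - statistics.mean(heights_11)
median_shift = median_12 - median_11

print(median_11, median_12)
print(round(mean_shift, 2), median_shift)
```

The single extreme value moves the mean by roughly two centimetres but the median by only half a centimetre, which is the robustness to outliers that the entry describes.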
median regression See QUANTILE REGRESSION

median survival time See SURVIVAL ANALYSIS - AN OVERVIEW

median test Number below/equal or above the overall median in each group

Group number    ≤22    >22    Total
1                 8      3      11
2                 5      6      11
3                 5      6      11
Total            18     15      33
median test The median test is a nonparametric test. It can be used when exact values of some observations are unknown, if they can be classified as above or below the overall MEDIAN. It is a useful alternative for testing differences in medians when the assumption of the similarity of distributions necessary for the MANN-WHITNEY RANK SUM TEST or KRUSKAL-WALLIS TEST is not met, although it is less powerful than these tests. The median test utilises the CHI-SQUARE TEST to determine whether two or more (k) independent groups are drawn from populations with the same median. The median test assumes that the data are randomly selected observations that are ordinal or continuous in nature and that there are at least two independent groups. In addition the assumptions of the chi-square test apply. To perform the median test, first treat the data as a single sample and calculate the overall median. Classify each observation as below/equal or above the overall median. Calculate the total of each type in each group. Arrange these values into a 2 x k CONTINGENCY TABLE, where the two rows are the classification against the overall median and the k columns are the groups. Carry out a chi-square test with k - 1 DEGREES OF FREEDOM (DoF) on this table. Reject the null hypothesis of equal medians if the chi-square test proves significant, in which case post hoc pair-wise median tests can be carried out. Example: data are shown in the first table; they consist of mini mental state examination scores (MMSE) in three groups with different types of dementia. Group 1 has Alzheimer's disease, group 2 has frontotemporal dementia and group 3 has semantic dementia.
median test MMSE scores for the three dementia groups

MMSE score by group number
Group 1    Group 2    Group 3
   19         16          4
    1         22          9
   11         28         21
    6         30         24
   22         22         30
   29         25         22
   21         22         25
   19         28         26
   27          1         29
   29         27         11
   15          0         10
The overall median is 22, and the second table classifies the number of observations in each group against this overall median.
To perform the chi-square test on this second table, first calculate the expected count in each cell by multiplying the row total by the column total and dividing by the overall total. This gives expected counts of 6, 6, 6 for the first column and 5, 5, 5 for the second. The chi-square test is then calculated as the sum, over each cell, of the observed minus the expected counts squared, divided by the expected count, as in the following equation:
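\begin{align*}
\chi^2 &= \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \\
&= \frac{(8-6)^2}{6} + \frac{(5-6)^2}{6} + \frac{(5-6)^2}{6} + \frac{(3-5)^2}{5} + \frac{(6-5)^2}{5} + \frac{(6-5)^2}{5} \\
&= 0.667 + 0.167 + 0.167 + 0.800 + 0.200 + 0.200 = 2.20
\end{align*}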
The critical value from the chi-square tables (α = 0.05, df = 2) is 5.99. The value of 2.20 from the chi-square test is less than the critical value of 5.99; therefore there is insufficient evidence to reject the null hypothesis that the median MMSE scores are the same for all three dementia groups. Hence it is not possible to carry out post hoc pair-wise tests (see POST HOC ANALYSIS) between the groups. SLV
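The steps of the median test can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the entry's own implementation: the function names `median_test_table` and `chi_square_statistic` are hypothetical, the observed counts are those of the classification table in this entry (laid out here as 2 rows by k columns), and the critical value 5.99 is simply the tabulated figure quoted above rather than something the code computes.

```python
import statistics


def median_test_table(groups):
    """Classify raw scores against the overall median.

    Returns the overall median and a 2 x k table:
    row 0 = counts at or below the median, row 1 = counts above it.
    """
    overall = statistics.median([x for g in groups for x in g])
    below_equal = [sum(1 for x in g if x <= overall) for g in groups]
    above = [sum(1 for x in g if x > overall) for g in groups]
    return overall, [below_equal, above]


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed_count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)  # k - 1 for a 2 x k table
    return statistic, dof


# Observed counts from the classification table in the entry.
observed = [
    [8, 5, 5],  # at or below the overall median of 22
    [3, 6, 6],  # above the overall median
]
statistic, dof = chi_square_statistic(observed)

CRITICAL_VALUE = 5.99  # tabulated value quoted in the text (alpha = 0.05, df = 2)
reject_null = statistic > CRITICAL_VALUE
print(round(statistic, 2), dof, reject_null)
```

With raw scores available, `median_test_table` would produce the 2 x k table that `chi_square_statistic` expects; here the published counts are entered directly so that the sketch reproduces the worked calculation.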
medical statistics - an overview Statistics may be defined as the science of collecting, analysing and interpreting numerical data. Medical statistics is the application of this science to medicine. Statistics began as information concerning the state and this aspect is still a central part of medical statistics, as we collect and analyse information on national rates of birth, death and notifiable diseases. The term expanded to include many types of numerical data, collected for the purposes of administration (bed occupancy), research (CLINICAL TRIALS) and pleasure (batting averages) or any combination of the three. I was unable to think of a medical example for pleasure, so I will leave that as an exercise for the reader. To the statistician, statistics is all pleasure anyway. Statistics is a skill as well as a science and a good statistician can take what seems to its owner to be a bewildering mass of data and find structure and meaning within it. What the data owner has to provide is the question that needs to be answered. The statistician can seldom provide that.
Indeed, when the statistician starts to wonder more about the substantive medical question than how to answer it, he or she ceases to be a statistician and becomes an epidemiologist, trialist or health service researcher. The true statistician is much more interested in the process of answering the question than the answer itself, in the journey rather than the destination. Statistics is unusual among academic subjects in that its entire purpose is to solve problems in other disciplines. Medical statistics is a collaboration between statisticians and those whose main object is to increase understanding of disease, health and healthcare. In the history of statistics, many important innovations were made by those wishing to answer a question arising in their own discipline. Most of us cannot aspire to be such polymaths, however, and must content ourselves with providing one aspect of the collaboration. Most people from health fields who wish to use statistics really need to acquire two things: the ability to apply a few day-to-day statistical procedures correctly and a vocabulary with which to communicate with their statistician. We can include in collecting data the design of studies: the decisions as to what data to collect and from where or whom and how they should be collected. A frequent complaint of consulting statisticians is that their clients do not come to them early enough (see CONSULTING A STATISTICIAN). The researcher arriving with the referee's comments on a rejected paper may be too late. If the design were fundamentally flawed, there is probably very little that can be done to rectify matters. There is no substitute for sound design (see EXPERIMENTAL DESIGN) and statistics has a lot to offer. One of the greatest of all contributions to medicine is surely the invention and development of the randomised clinical trial, which, for the first time,
enabled medical researchers to obtain reliable evidence on the relative efficacy of clinical treatments. The basic principles of study design are fairly straightforward and easily understood, unlike other aspects of medical statistics. In experimental biological studies, whether clinical trials on human subjects or laboratory experiments on animals, tissue samples or cell cultures, RANDOMISATION with blind allocation is the key. We neglect it at our peril. Most medical experiments use very simple designs, such as fully randomised two-group comparisons or two-period crossovers (see CROSSOVER TRIALS). At most, we might have a simple factorial design, where two treatments are given in combinations of none, one or both. Much more complex designs are used in other areas of application, such as agriculture or the chemical industry. Randomisation quickly gets complicated, however, as we often want to improve the comparability of groups by stratification or are forced by the nature of the treatment to have patients allocated in clusters rather than individually. We may
want, within small blocks of patients, to allocate equal numbers to each of the treatments, to ensure that numbers in treatment groups are always similar. In a small trial on a variable subject group we may decide to improve comparability by MINIMISATION, ensuring that certain key variables will be balanced between the groups. These modifications in turn require changes to the planned analysis and hence to the sample size estimation. They must not be ignored. In medical observational studies the usual statistical approach of random sampling is seldom possible, but this does not mean we should ignore sampling issues, rather that we should consider very carefully the representativeness of our sample. As in experimental design, the principles of epidemiological designs such as CASE-CONTROL, COHORT, CROSS-SECTIONAL and ecological STUDIES are easy to understand, but the details of their implementation using cluster or STRATIFIED SAMPLES, matched one to one or one to many, often require statistical insight and input. Clinical designs are often similar to those in epidemiology, but used for a different purpose: case series may be used to describe clinical experience, cohort studies may be used to describe the natural history of a disease, case-control designs arise in the evaluation of diagnostic tests, cross-sectional studies are used to investigate the properties of measurement methods. Data collection is very important and here the principle is to ensure that what we collect is accurate, with a minimum of BIAS and error. Bias can be reduced by blind assessment, where the observer is unaware of the subject's status, and by BLINDING the subject to things that may influence response. In an experiment we may achieve this by concealing treatment allocations, e.g. with a PLACEBO, leading to the ideal of the double-blind randomised trial. In an observational study this is more difficult, but we may, for example,
include the questions on our key outcome variable among many others to reduce the apparent emphasis. Statistics offers techniques to ensure that we cannot be certain what individual respondents have told us while still obtaining data for the sample as a whole, such as secret ballots and randomised response, but they are seldom used in healthcare studies. Perhaps the most important thing we must do is to convince our respondents that their answers are absolutely confidential. Training of observers is also very important in maintaining data quality and we may wish to estimate the degree of observer variation and the effects of using different observers. To reduce measurement error we may need to consider the frequency of measurement and we can weigh the relative advantages of increasing the number of subjects and increasing the number of measurements made on each. Sample size is one aspect of study design on which statisticians are asked to advise, but often far too late for them to have any real input. A question about the sample size the day before the deadline for a grant application is typical, when all that can be done is to provide some kind of
justification for the sample size already chosen on feasibility grounds. To change it at that stage is usually quite impractical. To decide on sample size we need to think about the purpose of the study, what outcome variable we intend to use and the analysis that we propose to carry out. This is, in any case, an excellent discipline for any investigation. Too often people collect data without any idea of how they are to be analysed. It can be a shock when they discover that they have collected data for which the possible analysis is very complex, difficult to interpret, time consuming and expensive and cannot answer their most important questions. Statistical analysis begins with simple graphical methods, such as HISTOGRAMS and scatter diagrams (see SCATTERPLOT), and tabulations, which begin to reveal the structure of data. To make this more manageable, we then use summary statistics such as MEANS, STANDARD DEVIATIONS, centiles and proportions. An analytical method of peculiarly medical application is the Kaplan-Meier survival method (see KAPLAN-MEIER ESTIMATOR), which enables us to estimate the cumulative survival of a group of subjects, some of whom are still surviving and who have been observed for varying lengths of time. Kaplan and Meier's paper (1958) has been reported to be the most highly cited statistical paper ever (Ryan and Woodall, 2005). Comparison of these summary statistics between different groups of subjects leads to the use of differences and ratios between groups. Investigating the strength of relationships between variables observed in tabulations can be done using RELATIVE RISKS AND ODDS RATIOS, investigating those observed in scatter diagrams by regression and correlation. A frequent problem in medical data is that the variable of interest may be influenced by another, not of any interest in itself. National mortality data provide a good example,
where the age structure of the population has a profound effect. Special age-standardisation methods have been developed for this, producing age-standardised mortality rates and standardised mortality ratios. Much more generally applicable to deal with such problems are MULTIPLE LINEAR REGRESSION (see LOGIT MODELS FOR ORDINAL RESPONSES) and its many offspring: logistic, ordered logistic, multinomial, Poisson (see GENERALISED LINEAR MODEL), COX'S REGRESSION MODEL, LOGISTIC REGRESSION, etc. Such techniques also allow us to analyse situations where we are interested in several predicting variables and want to look at their relative importance as predictors. A key question in analysis of data is the correct unit of analysis. In a trial, for example, if we randomise individual patients to treatment, the patient will be the unit of analysis. However, if we allocate a group of patients together, forming a cluster, we must take this into account in the analysis. For example, we might allocate all the asthma patients in a primary care centre to receive an educational intervention or all to act as controls; the primary care centre becomes the
unit of analysis. We might calculate the average of our outcome measurement for the cluster of patients in the practice and then compare two groups of clusters. If we want to include information collected at the level of the individual patient in such a study, we may have to use more complex, multilevel techniques. The same problem arises when we have multiple observations on a few patients or several tissue samples taken from each of a few others. Ignoring the correct unit of analysis may be seriously misleading (see MULTILEVEL MODELS). Another area where statistical analysis becomes essential is when we have several outcome variables and no clear primary one. This might happen if we have a battery of psychological tests or a QUESTIONNAIRE with many items. We may want to investigate the structure of these, summarise them into a single scale or analyse them all as a group. Methods developed in psychology, such as FACTOR ANALYSIS, enable us to do this. In medical statistics, we usually have a sample of observations from some larger population, about which we want to draw some conclusion. For example, in a clinical trial the subjects are a sample of all the patients to whom we might wish to give the trial treatments, now and in the future. We use the trial to tell us about what would happen in this larger population. Even when we have data on the whole population, as in the case of national mortality rates, we often think of them as a sample of a hypothetical larger population so that we can investigate the reliability of differences and relationships found. This process of drawing conclusions about the larger population from the sample is called statistical inference. Users of statistics often find the concepts involved in inference quite difficult to master. Most inference in the medical literature takes the form of CONFIDENCE INTERVALS and SIGNIFICANCE TESTS. These are methods from the frequentist formulation of statistics,
one of two conceptual frameworks in which statistical inference is carried out. In this approach, we regard our sample as one of many we could have taken and then make deductions from the sample we do have about the many samples we do not have, and hence about the population from which they would all be drawn. The alternative is the Bayesian conceptual framework (see BAYESIAN METHODS). In this, we try to describe what we already know about the answer to our question in PROBABILITY terms and then see how these probabilities are modified by the data we have collected. Both approaches provide ways to take data from a sample and decide what they can tell us about the population. Both involve difficult concepts that can take years to master. Fortunately, it is possible to analyse data adequately even with a quite poor understanding of the underlying philosophy. In the past, statisticians were divided into two warring camps, the Bayesians claiming that their methods had a securer philosophical foundation and frequentists claiming
that Bayesian methods were impractical for real problems. In recent years the development of powerful computer-intensive Bayesian methods and computers fast enough to carry them out has led many more statisticians to make use of the Bayesian approach and the barriers are coming down. Much statistical methodology consists of techniques to apply these fundamental concepts to inference for different types of design and data. The first methods developed were for large samples, estimating and comparing means, rates and proportions. These were followed by t-DISTRIBUTION-based methods for means of small samples, where the distribution of the observations themselves was important. This led to the use of transformations to allow data to be presented in a form that we knew how to analyse. It also led to the development of alternative methods, such as those based on rank order, which did not require such strong assumptions about the data. Small samples for proportions led to the development of exact probability methods such as FISHER'S EXACT TEST, methods that the advent of powerful and convenient computers has made feasible for large datasets, too. The list of statistical methods is long and growing. Almost all the analysis methods have inference attached to them, so that it is often not explicit which aspect of statistics we are doing. When we have decided what our data can tell us about the wider world, we usually want to understand why the relationships that we have discovered have arisen. This brings us back in a circle to the design. If we have a randomised, double-blind experiment we can usually conclude that the evidence supports the difference in treatment causing the difference in outcome. We do not, of course, conclude unequivocally that there is cause and effect: statistics is a discipline that instils caution and it is always possible, however unlikely,
that we have an extreme sample producing a result atypical of its population. Statistics enables us to assess how cautious we need to be. If we have a study that is not blinded or randomised, we must consider very carefully the possible biases that may have been introduced. If we have an observational study, we must always be very cautious in the interpretation. We must ask how good our sampling is and how comparable our groups are. We must remember that just because there is an ASSOCIATION between two variables we should not conclude that one causes the other. We must consider other factors that may be responsible for the relationship we have observed. If we are studying the aetiology of disease, in particular, we must beware of a rush to judgement. Guidelines for assessing the evidence for causality are available (such as the BRADFORD HILL CRITERIA), but it takes data from several different sources before we can apply them. In the era of evidence-based healthcare, all healthcare professionals, whether doctors, nurses or therapists, must be able to understand and interpret the research evidence on which their practice should be based. Much of this evidence is quantitative and statistics is the key skill needed. Not only has
the number of statistical analyses published and the number of statistical methods employed increased greatly, but also these analyses have achieved much greater prominence. They are no longer confined to the methods and results sections of papers, but now fill the abstracts, too. Familiarity with basic statistical ideas is inescapable for the healthcare professions. Healthcare research, whether medical, nursing or therapy, is unusual in that it is mainly initiated and carried out by healthcare professionals. Medical research is done by doctors, nursing research by nurses. This does not happen in other fields. Educational research is not done by teachers, social research by social workers or agricultural research by farmers, but rather by professional researchers in academia and industry. It is quite an attractive idea that it is part of the role of doctors to add to medical knowledge, but it puts a great responsibility on them to do this to a high standard. Unfortunately, this is often not the case. Statistical analysis requires quite different habits of mind to those required for diagnosis, and the training of doctors, nurses and others is aimed at developing ways of thinking different to those learned by the mathematicians who specialise in statistics. Healthcare research requires many different skills and aptitudes and is much better done by collaboration between people from the disciplines possessing these. Rather than attempt to train a clinician in statistics to the level required for high-quality medical research, we should employ a statistician. Not only is this more effective, it is, regrettably, cheaper - they are not paid as much (or enough). To work together, clinicians require familiarity with statistical ideas and vocabulary and close collaboration with statisticians should enhance this. Statisticians need to be familiar with the problems of carrying out research on particularly vulnerable human beings, whose needs must always be paramount,
and the many special techniques that have been developed to do this. Close collaboration with clinical researchers should also further the education of the statisticians. Working together benefits clinicians, statisticians and the research itself, and hence is to the good of all of us. JMB
Kaplan, E. L. and Meier, P. 1958: Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53, 457-81. Ryan, T. P. and Woodall, W. H. 2005: The most-cited statistical papers. Journal of Applied Statistics.
Medicines and Healthcare products Regulatory Agency (MHRA) The Agency was formed on 1 April 2003 as the amalgamation of the former Medicines Control and Medical Devices Agencies, which were, respectively, the UK government agencies responsible for the assessment and licensing of pharmaceutical and biological human medicines and (human) medical devices. The Agency has a variety of responsibilities including assessment of new medicines/products, assessment of changes to existing medicines, post-marketing surveillance (see PHARMACOVIGILANCE), inspection of CLINICAL TRIAL conduct and manufacturing facilities, enforcement of regulations and so on.
The legal framework for the Agency's work is set out in the Medicines Act 1968: for medicines, an applicant is required to demonstrate adequate evidence of safety, quality and efficacy. To this end, the Agency employs a large number of quality, pre-clinical and clinical assessors. Most of the statistical work is done in conjunction with the clinical assessors, although statistical considerations often come into other areas such as assessment of pre-clinical safety studies, determination of product shelf life and assessment of marketing/advertising claims. Companies apply for a 'marketing authorisation' (formerly called simply a 'licence'). Agency staff assess data and prepare assessment reports, which are considered by the Commission on Human Medicines (CHM) and some of its expert subcommittees. The CHM is a panel of independent experts (including practising doctors, pharmacologists, statisticians and lay members) that meets monthly and advises on the granting, or otherwise, of a marketing authorisation. The final decision on granting is made by the government minister responsible for health. In certain circumstances, companies may appeal against unfavourable decisions. The MHRA works in close collaboration with other European national agencies and the European Medicines Agency (EMA), which is based in London. There are a variety of routes by which companies can apply for a marketing authorisation within Europe - including national licences in as many (or as few) EU member states as they wish or a centralised licence covering all member states. In the latter case, two member states will be allocated to complete a comprehensive assessment of the application but all other member states are given the opportunity to raise concerns. The European counterpart to the CHM is the Committee for Human Medicinal Products (CHMP), which meets monthly. The MHRA and EMA also work with other agencies across the world and, in particular,
contribute to and follow guidelines jointly prepared by the International Conference on Harmonization (a collaborative effort between the major geographical regions of Europe, Japan and the USA). The assessment of safety and efficacy from CLINICAL TRIALS is similar to that for refereeing a paper for a medical journal but explores much more detail (see REGULATORY STATISTICAL MATTERS). The law requires companies to submit details of all trials that have been conducted. Each of these trials will have detailed study reports running to hundreds of pages and further extensive appendices including: a copy of the protocol and any amendments; individual case reports for serious adverse events/reactions; line listings of individual patient data; possibly efficacy results presented separately for each participating centre; copies of investigators' curriculum
vitae; documentation of quality and purity of product used in the trials; and so on. These appendices may run to hundreds of volumes and, hence, the need for a variety of disciplines within the assessment teams. More information about the Agency, its work and regulations pertaining to medicinal products and medical devices is available at the MHRA website: www.mhra.gov.uk. SD
Medicines Control Agency (MCA)
See MEDICINES AND HEALTHCARE PRODUCTS REGULATORY AGENCY
mega-trial This is a large-scale randomised trial (generally involving several thousand subjects) that is designed to detect the effects of one or more treatments on major ENDPOINTS, such as death or disability. The need for mega-trials arises because the vast majority of treatments have only moderate effects on such endpoints, typically producing relative reductions of, at most, a quarter. Any study aiming to detect such a moderate effect needs to be able to guarantee that any BIASES and random errors inherent in its design are substantially smaller than the expected treatment effect (Collins and MacMahon, 2001). This will ensure that the results of the study either confirm the presence of a moderate effect convincingly or, if the treatment is ineffective, provide clear evidence that this is so.

The requirements for reliable assessment of moderate treatment effects are as follows. Negligible biases, i.e. guaranteed avoidance of moderate biases, involve: proper randomisation (nonrandomised methods cannot guarantee the avoidance of moderate biases); analysis by allocated treatments (i.e. an INTENTION-TO-TREAT analysis); chief emphasis on overall results (with no unduly data-derived subgroup analysis); and systematic META-ANALYSIS of all the relevant randomised trials (with no unduly data-dependent emphasis on the results from particular studies). Small random errors, i.e. guaranteed avoidance of moderate random errors, involve: use of large numbers (with minimal data collection since detailed statistical analyses of masses of data on prognostic features generally add little to the effective size of a trial); and systematic meta-analysis of all the relevant randomised trials.

For a study to avoid moderate biases requires RANDOMISATION using a method that precludes knowledge of each successive allocation. Randomisation in CLINICAL TRIALS is intended to maximise the LIKELIHOOD that each type of patient will have been allocated in similar proportions to the different treatment strategies being investigated (Armitage, 1982). Randomisation requires that trial procedures are organised in a way that ensures that the decision to enter a patient is made irreversibly and without knowledge of which trial treatment a patient will be allocated. Even when studies are randomised, however, moderate biases can still be introduced by inappropriate analysis or interpretation. One well-recognised circumstance is when patients are excluded after randomisation, particularly when the
prognosis of the excluded patients in one treatment group differs from that in the other (such as might occur, for example, if noncompliers were excluded after randomisation). While avoidance of moderate biases requires careful attention both to the randomisation process and to the analysis and interpretation of the available trial evidence, a study can only avoid moderate random errors if it accumulates a sufficiently large number of events. When major endpoints such as death affect only a small proportion of those randomised, very large numbers of patients need to be studied before estimates of treatment effect can be guaranteed to be statistically (and hence medically) convincing. In these circumstances, when a treatment has the potential to be used widely and hence confer large benefit (or harm), a mega-trial is the only type of study that is sufficiently reliable. For example, for an event that is expected to occur among 10% of subjects without active treatment, over 20 000 subjects are required to demonstrate a 20% relative risk reduction (i.e. from 10% to 8%) reliably (i.e. with 90% POWER at a TYPE I ERROR rate of 1%). For a mega-trial to randomise large numbers of trial subjects, the main barriers to rapid recruitment need to be removed. To facilitate this, the information recorded at entry should be brief and should concentrate on those few clinical details that are of paramount importance (including at most only a few major prognostic factors and only a few variables that are thought likely to influence substantially the benefits or hazards of treatment). Similarly, the information recorded at follow-up should be limited to serious outcomes, adverse events and to approximate measures of compliance. (Other outcomes, such as surrogate endpoints, that are of interest but do not need to be studied on such a large scale,
may best be assessed in separate smaller studies or in subsets of these large studies when this is practicable.) Keeping a trial as simple as possible increases the likelihood that it will be able to recruit large numbers of patients. For this reason, mega-trials are also known as 'large simple trials'. For ethical reasons, randomisation is appropriate only if both the doctor and the patient feel substantially uncertain as to which trial treatment is best. The 'uncertainty principle' maximises the potential for recruitment within this ethical constraint. This says that a patient can be entered if, and only if, the responsible physician is substantially uncertain as to which of the trial treatments would be most appropriate for that particular patient. A patient should not be entered if the responsible physician or the patient is, for any medical or nonmedical reasons, reasonably certain that one of the treatments that might be allocated would be inappropriate for this particular individual (either in comparison with no treatment or in comparison with some other treatment that could be offered to the patient in or outside the trial). If many hospitals are collaborating in a trial then wholehearted use of the uncertainty principle encourages
heterogeneity in the resulting trial population and this, in mega-trials, may add substantially to the practical value of the results. Among the early trials of fibrinolytic therapy, for example, most of the studies had restrictive trial entry criteria that precluded the randomisation of elderly patients, so those trials contributed nothing of direct relevance to the important clinical question of whether treatment was useful among older patients. Other trials that did not impose an upper age limit, however, did include some elderly patients and were therefore able to show that age alone is not a contraindication to fibrinolytic therapy (Fibrinolytic Therapy Trialists' Collaborative Group, 1994). Mega-trials adopting the uncertainty principle to determine eligibility maximise heterogeneity of the study sample, which in turn ensures that their results are relevant to a very diverse range of future patients. CB
[See also ETHICS]
Armitage, P. 1982: The role of randomisation in clinical trials. Statistics in Medicine 1, 345-52.
Collins, R. and MacMahon, S.
2001: Reliable assessment of the effects of treatment on mortality and major morbidity, I: clinical trials. The Lancet 357, 373-80.
Fibrinolytic Therapy Trialists' Collaborative Group 1994: Indications for fibrinolytic therapy in suspected acute myocardial infarction: collaborative overview of early mortality and major morbidity results from all randomised trials of more than 1000 patients. The Lancet 343, 311-22.
Mendelian randomisation
Mendelian randomisation refers to a method of leveraging improved causal inference (see CAUSALITY) from observational data through utilisation of the random assignment of an individual's genotype from their parental genotypes. It is justified by interpretations of the laws of Mendelian genetics. Assuming that the probability that a postmeiotic germ cell that has received any particular allele at segregation contributes to a viable conceptus is independent of environment (from Mendel's first law) and that genetic variants sort independently (from Mendel's second law), then at a population level these variants will not be associated with the confounding factors that generally distort conventional observational studies. Formally, random allocation of genotype occurs within family groups (from parents to offspring), and interpretation of such studies is closely analogous to that of randomised controlled trials (see CLINICAL TRIALS) (Davey Smith and Ebrahim, 2003). However, it has repeatedly been demonstrated that at a population level genetic variants are generally unrelated to potential confounding factors (Davey Smith et al., 2008). Confounding by other genetic variants will only occur for variants located close together on the same chromosome, when they are said to be in linkage disequilibrium (LD) with each other. The term 'Mendelian randomisation' was first applied in a study using the availability of genetically compatible siblings to evaluate the effectiveness of bone marrow
transplant in haematopoietic cancer (Gray and Wheatley, 1991), an approach that is conceptually similar but distinct in terms of design (Davey Smith, 2006). The concept applied to population-based epidemiological studies was most clearly articulated by Katan (1986), who proposed that since polymorphic forms of the apolipoprotein E (APOE) gene were related to different average levels of serum cholesterol, individuals with the genotype associated with lower average cholesterol should be expected to have a higher cancer risk if low cholesterol levels increased the risk of cancer. If, however, reverse causation or confounding generated the association between low cholesterol and cancer, then no association would be expected. The conditions for, and assumptions underlying, a successful Mendelian randomisation study were elaborated in 2003 (Davey Smith and Ebrahim, 2003), where the general proposition that Mendelian randomisation could be used to make causal inferences about the relationship between modifiable risk factors and disease outcomes was advanced. It was argued that if genetic variants are robustly related to different levels of exposure to a risk factor, then these genetic variants should be related to disease risk to the extent predicted by their influence on the risk factor. Mendelian randomisation implies that genotype-disease ASSOCIATIONS should not be affected either by confounding or by reverse causation, and many biases inherent in conventional observational studies may also be avoided (Davey Smith and Ebrahim, 2003). Such associations may therefore imply a causal effect of the risk factor on the disease outcome. However, causal inferences from these studies may be undermined by issues including pleiotropy (the genotype has direct effects on more than one risk factor for the disease outcome),
population stratification (population subgroups that experience both different disease rates and have different frequencies of genotypes of interest exist, resulting in confounded associations between genotype and disease) and canalisation (buffering of the effect of genotype on disease by compensatory biological mechanisms) (Davey Smith and Ebrahim, 2003). Thomas and Conti (2004) pointed out that the Mendelian randomisation approach involves application of the method of INSTRUMENTAL VARIABLES, which is commonly used in econometrics. An instrumental variable satisfies the following assumptions: (a) it is associated with the exposure of interest, (b) it is independent of confounding factors and (c) it is independent of the outcome given the risk factors and confounding factors. Lawlor et al. (2008) reviewed instrumental variables methodology in the context of Mendelian randomisation studies. Because genetic variants often explain only a small proportion of the variance in the target risk factor, very large sample sizes may be needed to achieve precise estimates of the causal effect of the risk factor on disease outcomes. JS/GDS
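The instrumental variable idea can be illustrated with a small simulation. The sketch below is not taken from the entry: it assumes a single biallelic variant, linear effects, one unmeasured confounder, and the simple ratio (Wald-type) estimator, which divides the genotype-outcome slope by the genotype-exposure slope. All names (`slope`, `wald_ratio`, `simulate`) and all numerical settings are hypothetical choices made for illustration.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def wald_ratio(genotype, exposure, outcome):
    """Ratio estimator: genotype-outcome slope over genotype-exposure slope."""
    return slope(genotype, outcome) / slope(genotype, exposure)


def simulate(n=50_000, causal_effect=0.5, seed=1):
    """Simulated cohort (illustrative assumption, not real data).

    The confounder u raises both exposure and outcome; the genotype affects
    the outcome only through the exposure, as assumptions (a)-(c) require.
    """
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0, 1 or 2
        u = rng.gauss(0, 1)                               # unmeasured confounder
        x = 0.5 * g + u + rng.gauss(0, 1)
        y = causal_effect * x + u + rng.gauss(0, 1)
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    return genotype, exposure, outcome


genotype, exposure, outcome = simulate()
naive_estimate = slope(exposure, outcome)            # distorted by confounding
iv_estimate = wald_ratio(genotype, exposure, outcome)
print(f"naive regression slope: {naive_estimate:.3f}")
print(f"instrumental variable estimate: {iv_estimate:.3f} (true effect 0.5)")
```

In this setup the conventional regression slope is pulled well away from the true effect by the confounder, whereas the ratio estimate is centred on it. Weakening the genotype-exposure coefficient in `simulate` makes the ratio estimate noticeably noisier, which is the sample size issue raised above.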
Davey Smith, G. 2006: Capitalising on Mendelian randomization to assess the effects of treatments. James Lind Library: www.jameslindlibrary.org.
Davey Smith, G. and Ebrahim, S. 2003: 'Mendelian randomisation': can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology 32, 1-22.
Davey Smith, G., Lawlor, D. A., Harbord, R., Timpson, N., Day, I. and Ebrahim, S. 2008: Clustered environments and randomized genes: a fundamental distinction between conventional and genetic epidemiology. PLoS Medicine 4, e352; DOI:10.1371/journal.pmed.0040352.
Gray, R. and Wheatley, K. 1991: How to avoid bias when comparing bone marrow transplantation with chemotherapy. Bone Marrow Transplantation 7 (Suppl. 3), 9-12.
Katan, M. B. 1986: Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet I, 8479, 507-8.
Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N. and Davey Smith, G. 2008: Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Statistics in Medicine 27, 8, 1133-63.
Thomas, D. C. and Conti, D. V. 2004: Commentary: the concept of 'Mendelian randomization'. International Journal of Epidemiology 33, 21-5.
meta-analysis
See SYSTEMATIC REVIEWS AND META-ANALYSIS
meta-regression
This is an analysis of the relationship between study characteristics and study results in the context of a META-ANALYSIS. Independent studies of the same problem, e.g. multiple CLINICAL TRIALS of a particular drug or multiple CASE-CONTROL STUDIES of the same exposure-disease ASSOCIATION, will inevitably differ in many ways. Some of the variation may cause the effects being evaluated to be different in different studies, a situation commonly known as heterogeneity. Meta-regression analyses are similar to traditional LINEAR REGRESSION analyses, a conceptual difference being that entire studies, rather than individuals, are the units of analysis. Characteristics of studies are used as explanatory (independent) variables and estimates of effect are used as outcome (dependent) variables. Regression coefficients describe how the effects across the studies increase per unit increase in the characteristic. Study characteristics might include numerical summaries of types of participants, variation in the implementation of an intervention or different methodological features. Estimates of effect may be, for example, ODDS RATIOS, hazard ratios or differences in mean responses, depending on the type of study and the nature of the outcome data. Ratio measures of effect are usually analysed on the (natural, or base e) log scale. Studies are weighted in the analysis to reflect imprecision in their results, the weights typically involving the inverse variances of the effect estimates. A meta-regression may be a primary reason for assembling multiple studies, although meta-regressions are perhaps most commonly used as secondary analyses to investigate heterogeneity when a traditional meta-analysis was the
primary objective of a review of studies. The study characteristics may be categorical or quantitative and several may be included in the same analysis. For categorical characteristics, meta-regression may be viewed as a generalisation of subgroup analyses, where the subgrouping is by studies rather than by participants. Meta-regression should ideally be conducted only as part of a thorough systematic review to ensure that the studies involved are reliably identified and appraised. Notable examples of meta-regression analyses include an investigation of the dose-response relationship between aspirin and secondary prevention of stroke. Among clinical trials administering aspirin at different doses, no relationship was apparent between aspirin dose and the relative risk of recurrence (Johnson et al., 1999). A second example is provided by Zeegers, Jellema and Ostrer (2003), who present a meta-analysis of observational studies comparing prostate cancer risk between people with and without family history of prostate cancer. They used meta-regression to perform several subgroup analyses to assess the robustness of their finding that a family history of the disease is associated with roughly a doubling of risk. Studies were broken down by study design, year of publication and ethnic group, among other characteristics. A third example that has inspired development of meta-regression methodology is an analysis describing a relationship between the geographical latitude of studies of the BCG vaccine and the relative risk of tuberculosis in those vaccinated versus those not (Berkey et al., 1995). A convenient illustration of a meta-regression analysis for a single characteristic is a simple SCATTERPLOT as in the figure, which shows the result of the BCG vaccine meta-regression. The circles represent studies, with the size of each circle proportional to the precision of the relative risk estimate from that study.
The meta-regression line illustrates that the vaccine was observed to be more effective further away from the equator.
[Figure: scatterplot of study relative risks (vertical axis, logarithmic scale from 0.1 to 2) against distance from equator (degrees of latitude), with fitted meta-regression line]
meta-regression Meta-regression analysis of the relationship between effectiveness of BCG vaccine and latitude of study (data from Berkey et al., 1995)
Meta-regression analyses involve observational comparisons across studies and may suffer from BIAS due to confounding, since studies similar in one characteristic may be similar in others. Causal relationships between characteristics and results can seldom be drawn with confidence. A particular problem is that in most situations the number of studies in a meta-regression is small while the number of potentially important characteristics is large. Thus any meta-regression analyses performed should be driven by a strong scientific rationale and ideally pre-specified and limited in number. It may be necessary to control for the possibility of false-positive findings since the risk of a TYPE I ERROR increases substantially when multiple meta-regression analyses are undertaken. It is possible to summarise participant-level characteristics at the level of a study for use in a meta-regression. Thus the MEAN age of participants, the proportion of females or the average length of follow-up might be used as study-level characteristics. Such analyses should be interpreted carefully, as they may not adequately detect true associations. For example, suppose the effect of an intervention truly depends on a patient's age. If several clinical trials each include a wide range of ages, but the mean age is similar across trials, then a meta-regression relating mean age to size of effect will fail to detect the relationship that would be evident from within-trial analyses. When interest focuses on participant-level characteristics, a meta-analysis of individual participant-level data is the most reliable method of separating within-study from among-study relationships. Potential limitations of meta-regression, including those already mentioned, are discussed by Thompson and Higgins (2002). In common with meta-analysis, meta-regression may be conducted assuming either a FIXED EFFECT model or a RANDOM EFFECTS MODEL. A fixed effect meta-regression assumes that the study characteristic(s) explain all of the interstudy variation in effects.
It may be performed using weighted linear regression of the effect estimates on the study characteristics, weighting by the inverse VARIANCES of the effect estimates. However, the STANDARD ERRORS of the regression coefficients need to be corrected to account for the fact that the variances are known, by dividing them by the square root of the mean square error from the weighted regression. A random effects meta-regression allows for variation in study effects that is not explained by the study characteristics. Such 'residual heterogeneity' of effects is commonly assumed to follow a NORMAL DISTRIBUTION analogous to a random effects meta-analysis. Random effects meta-regression requires more specialised software, although a convenient implementation is available for Stata (Sterne, Bradburn and Egger, 2001 and see also STATISTICAL PACKAGES). Since it is unlikely that heterogeneity can be fully explained by a finite selection of study characteristics, random effects meta-regression has been recommended as the default choice (Thompson and Higgins, 2002).
Meta-regression may be performed using alternative methods specific to the nature of the individual level outcome data when these are available. For example, when the outcome data from individuals in the studies are binary, then meta-regression may be undertaken using logistic regression. Implementations of meta-regression that require special consideration include the relationship between effect estimates and underlying risk (due to correlation between effect estimates and risks), investigations of publication bias (again due to possible correlation between effect estimates and measures of precision) and nonindependent outcomes (e.g. entering subgroups from the same study). JPTH
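The fixed effect analysis with a single study characteristic can be sketched in a few lines. The code below is an illustrative assumption rather than the implementation referred to in the entry: it fits an inverse-variance weighted straight line in closed form and reports both the standard errors that a generic weighted regression routine would print and the corrected ones that treat the within-study variances as known. The function name and the example numbers are hypothetical and are not the data of Berkey et al. (1995).

```python
import math


def fixed_effect_meta_regression(effects, variances, characteristic):
    """Inverse-variance weighted regression of effect estimates on one covariate."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, characteristic))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxx = sum(wi * x * x for wi, x in zip(w, characteristic))
    swxy = sum(wi * x * y for wi, x, y in zip(w, characteristic, effects))
    det = sw * swxx - swx ** 2

    slope = (sw * swxy - swx * swy) / det
    intercept = (swy - slope * swx) / sw

    # Residual mean square from the weighted regression (k studies, 2 coefficients)
    k = len(effects)
    mse = sum(
        wi * (y - intercept - slope * x) ** 2
        for wi, x, y in zip(w, characteristic, effects)
    ) / (k - 2)

    # Corrected standard errors: weights taken as known inverse variances
    se_intercept = math.sqrt(swxx / det)
    se_slope = math.sqrt(sw / det)
    return {
        "intercept": intercept,
        "slope": slope,
        "mse": mse,
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        # What ordinary weighted least squares would report (scaled by sqrt(mse))
        "naive_se_slope": math.sqrt(mse) * se_slope,
    }


# Hypothetical log relative risks, their variances and study latitudes
log_rr = [-0.02, 0.00, -0.30, -0.50, -0.80, -0.70, -1.20, -1.40]
var_log_rr = [0.04, 0.02, 0.09, 0.05, 0.03, 0.10, 0.02, 0.06]
latitude = [13, 19, 27, 33, 42, 44, 52, 55]

fit = fixed_effect_meta_regression(log_rr, var_log_rr, latitude)
print(fit)
```

Dividing `naive_se_slope` by the square root of `mse` recovers `se_slope`, which is the correction described above. A random effects version would add an estimated between-study variance to each within-study variance before forming the weights.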
Berkey, C. S., Hoaglin, D. C., Mosteller, F. and Colditz, G. A. 1995: A random effects regression model for meta-analysis. Statistics in Medicine 14, 395-411.
Johnson, E. S., Lanes, S. F., Wentworth III, C. E., Satterfield, M. H., Abebe, B. L. and Dicker, L. W. 1999: A metaregression analysis of the dose-response effect of aspirin on stroke. Archives of Internal Medicine 159, 1248-53.
Sterne, J. A. C., Bradburn, M. J. and Egger, M. 2001: Meta-analysis in Stata™. In Egger, M., Davey Smith, G. and Altman, D. G. (eds), Systematic reviews in health care: meta-analysis in context, 2nd edition. London: BMJ Publication Group.
Thompson, S. G. and Higgins, J. P. T. 2002: How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine 21, 1559-73.
Zeegers, M. P. A., Jellema, A. and Ostrer, H. 2003: Empiric risk of prostate carcinoma for relatives of patients with prostate carcinoma: a meta-analysis. Cancer 97, 1894-903.
method comparison studies
At its simplest, a method comparison study involves the measurement of a given characteristic on a sample of subjects or specimens by two different methods. To take a simple and familiar example, we could imagine a study in which the body temperature of each of, say, n patients was assessed once using an old mercury thermometer calibrated in degrees Fahrenheit (F) and again using a modern thermometer calibrated in degrees Celsius (C). If the true temperature of the ith individual (in degrees Celsius) is $\tau_i$ then the resulting set of measurements might be represented by the following two equations:
\[
C_i = \tau_i + \delta_i, \qquad F_i = 32 + 1.8\,\tau_i + \varepsilon_i \tag{1}
\]
The numbers 32 and 1.8 follow from the temperatures of freezing and boiling water (i.e. 0°C = 32°F; 100°C = 212°F). The key characteristic of the design is that the resulting data are a series of paired measurements $(C_i, F_i)$. The $\delta$ and $\varepsilon$ values in these two equations correspond to random measurement errors that are assumed to be uncorrelated both with each other and with the patient's true temperature. They are both assumed to have an average value of zero. They cannot be determined individually but statistical methods can be used to assess their variability (their variances, $\sigma_\delta^2$ and $\sigma_\varepsilon^2$, respectively). Now we will complicate matters by choosing to measure a characteristic, such as a tissue enzyme, using two different assay methods with indeterminate scales. Arbitrarily choosing one (X) to be the standard (or, indeed, it may be already recognised as the standard against which a new assay is being compared) and the other (Y) to be the comparator, a realistic statistical model might have the form:
\[
X_i = \tau_i + \delta_i, \qquad Y_i = \alpha + \beta\,\tau_i + \varepsilon_i \tag{2}
\]
Here the values 32 and 1.8 have been replaced by unknown constants $\alpha$ and $\beta$ respectively. Our task is now to take the n pairs of measurements (the $X_i$, $Y_i$) and use statistical methods to estimate $\alpha$ and $\beta$, together with the variances $\sigma_\delta^2$ and $\sigma_\varepsilon^2$. These are the parameters of the MEASUREMENT ERROR model (possibly with the addition of the variance of the true scores, $\sigma_\tau^2$, depending on how the patients or specimens have been selected). Before describing how we might attempt to carry out this estimation, however, it will be useful to discuss briefly what we might wish to learn or decide from the results of a method comparison study. We might wish to estimate the parameters of a relative calibration or measurement error model such as that described by the pair of equations in (2). It would obviously be of interest to know the values of $\alpha$ and $\beta$ so that we might know how to convert the scale of X to that of Y, or vice versa. In particular, we might wish to establish whether $\alpha = 0$ or $\beta = 1$, or both (i.e. are the scales the same?). If the scales of measurement are the same (i.e. $\alpha = 0$ and $\beta = 1$) then the two measurement methods are MEAN or average equivalent. In this case we might also wish to know whether the two methods are equally precise (or whether one is more precise than the other) by comparing the estimates of the VARIANCES of the measurement errors (i.e. comparing estimates of $\sigma_\delta^2$ and $\sigma_\varepsilon^2$). If two methods are mean equivalent and their precisions are the same, then they are fully individually equivalent. In the theory of psychometric tests (applicable to the measurement of depression or anxiety, for example), tests that are mean equivalent or individually equivalent are referred to as being $\tau$-equivalent or parallel respectively. Measurements using alternative methods that are individually equivalent or parallel are fully interchangeable without any loss of information. Suppose, however, that we wish to evaluate and compare the precisions of two methods that are known not to be mean equivalent. How, for example, do we compare the performance (precision) of an old thermometer calibrated in degrees Fahrenheit with a new one in degrees Celsius? We would need first to convert the Fahrenheit measurements to degrees Celsius (or vice versa) and only then compare the variances of the measurement errors. For methods X and Y,
the relevant ratio (i.e. relative precision) for this comparison is $\beta^2\sigma_\delta^2/\sigma_\varepsilon^2$. Here a direct comparison of $\sigma_\delta^2$ and $\sigma_\varepsilon^2$ would provide the answer to the wrong question. A less stringent question might involve asking whether the two measurements on a given patient are close enough. We do not ask whether two methods are exactly equivalent but whether, for all practical purposes, they are interchangeable. In this situation we may abandon the measurement model in the equations in (2) entirely and concentrate on the paired differences $(X_i - Y_i)$ as indicators of agreement between the two methods (Bland and Altman, 1986, 1999). If the agreement is good enough then we can for all practical purposes replace a measurement made using one of the methods by a corresponding measurement using the other one. This is the rationale for the construction of LIMITS OF AGREEMENT (Bland and Altman, 1986, 1999). A very useful graphical summary to accompany these calculations is what is usually known as the Bland-Altman plot: a plot of the difference between the two measurements, $X_i - Y_i$, against their mean, $(X_i + Y_i)/2$ (Bland and Altman, 1986, 1999). In addition, one might wish to produce a simple Y versus X SCATTERPLOT, together with an estimate of their product-moment correlation and concordance CORRELATION (Lin, 1989). Many investigators, however, will wish to go beyond testing for equivalence. They will wish to know, for example, whether Y is better than X. Is the new method an improvement on the old one? Or, contrariwise, is it worse? If this is the aim then we have no option but to collect the relevant data (the design might need to be more informative than those discussed so far), postulate realistic statistical models for the measurements and proceed to test whether the models are appropriate and, if so, to find what the estimates of the model's parameters tell us about the performance of the methods.
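The paired-difference approach is simple to compute. The following sketch assumes the common convention of placing the limits of agreement at the mean difference plus or minus 1.96 standard deviations of the differences; that multiplier, the function name and the readings are illustrative assumptions and are not specified in this entry.

```python
import statistics


def limits_of_agreement(x, y, multiplier=1.96):
    """Mean paired difference (bias) and limits of agreement for two methods.

    Also returns the (mean, difference) pairs that would be plotted in a
    Bland-Altman plot.
    """
    differences = [a - b for a, b in zip(x, y)]
    means = [(a + b) / 2 for a, b in zip(x, y)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation (n - 1)
    lower = bias - multiplier * sd
    upper = bias + multiplier * sd
    return bias, lower, upper, list(zip(means, differences))


# Hypothetical paired readings from two methods on the same six specimens
method_x = [10.2, 11.8, 9.9, 12.4, 10.9, 11.1]
method_y = [10.0, 12.1, 9.5, 12.0, 11.2, 10.8]

bias, lower, upper, plot_points = limits_of_agreement(method_x, method_y)
print(f"bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
```

Whether the resulting interval is narrow enough for the two methods to be used interchangeably is a clinical judgement rather than a statistical one.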
Returning to the statistical model described in (2), how do we estimate the parameters and test hypotheses concerning them, given a set of paired measurements $(X_i, Y_i)$? Well, the simple answer is that we cannot. There is insufficient information provided by the data to enable us to estimate these parameters. The technical phrase for this is the 'problem of model underidentification'. The only way to proceed is by making various assumptions concerning some of the parameters to produce a model that is identified and then to estimate the remaining parameters. Examples of these assumptions include (a) knowing the variance of the measurement errors of the standard (or its RELIABILITY), (b) assuming a common scale of measurement (i.e. that $\beta = 1$) or (c) knowing the relative sizes of the two measurement error variances (i.e. the ratio of $\sigma_\delta^2$ and $\sigma_\varepsilon^2$). The trouble with each of these assumptions is that we are assuming something about the measurement methods that we would ideally have wished to study as part of the method comparison study. The other problem is that if the chosen assumption is not actually valid we are likely to finish up with the wrong conclusions. Consider, for example, the assumption that we know the ratio of the error variances. This leads to the use of a method known as orthogonal or Deming's regression (very popular in clinical chemistry). Typically the measurement error variances are estimated for each of the methods by repeatedly measuring the relevant characteristic on the same individual(s) or specimen(s). This enables us to estimate repeatability variances. These are only valid estimates of the measurement error variances ($\sigma_\delta^2$ and $\sigma_\varepsilon^2$) if the repeated measurements do not have correlated measurement errors. Correlated measurement errors are almost universal and one should be very wary of the use of Deming's regression when they are known to be a possibility (Carroll and Ruppert, 1996). The only really satisfactory way out is to use a more informative design involving one or more of the following features: replication using each of the methods, the use of instrumental variables (see INSTRUMENTAL VARIABLES) and the use of more than three different methods of measurement within the study. The other key feature of these studies should be an adequate sample size. Most method comparison studies are too small. Statistical analyses for the data arising from the more informative designs, with more realistic measurement models, are beyond the scope of this entry, but the methods are described in considerable detail in Dunn (2004). The methods typically involve software developed for STRUCTURAL EQUATION MODELLING (see SOFTWARE FOR STRUCTURAL EQUATIONS MODELS). Methods for the comparison of binary measurements (diagnostic tests) can also be found in Dunn (2004). GD
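For readers who want to see what assumption (c) buys, here is a minimal sketch of Deming's regression using the standard closed-form slope. It assumes the ratio of the error variance of Y to that of X is supplied by the user and is correct, which is exactly the assumption the entry warns about; the function name, argument names and thermometer readings are hypothetical.

```python
import math


def deming_fit(x, y, error_variance_ratio=1.0):
    """Deming regression of y on x.

    error_variance_ratio is var(errors in y) / var(errors in x), taken as
    known. A ratio of 1 on a common scale gives orthogonal regression.
    Requires a non-zero sample covariance between x and y.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

    lam = error_variance_ratio
    diff = syy - lam * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * lam * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Error-free illustration: Celsius readings and their exact Fahrenheit equivalents
celsius = [36.0, 36.5, 37.0, 37.5, 38.0, 39.0]
fahrenheit = [32 + 1.8 * c for c in celsius]

alpha_hat, beta_hat = deming_fit(celsius, fahrenheit, error_variance_ratio=1.0)
print(f"estimated intercept = {alpha_hat:.3f}, estimated slope = {beta_hat:.3f}")
```

With error-free data the fit recovers the calibration constants whatever ratio is supplied. With real data the estimates depend on the chosen ratio, so a ratio derived from repeatability variances inherits any bias caused by correlated measurement errors.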
Bland, J. M. and Altman, D. G. 1986: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet i, 307-10.
Bland, J. M. and Altman, D. G. 1999: Measuring agreement in method comparison studies. Statistical Methods in Medical Research 8, 135-60.
Carroll, R. J. and Ruppert, D. 1996: The use and misuse of orthogonal regression in linear errors-in-variables models. The American Statistician 50, 1-6.
Dunn, G. 2004: Statistical evaluation of measurement errors. London: Arnold.
Lin, L. I.-K. 1989: A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255-68 (see Corrections in Biometrics 56, 324-5).
MHRA
See MEDICINES AND HEALTHCARE PRODUCTS REGULATORY AGENCY

microarray experiments
These are studies in microbiology that are designed to measure the expression levels of genes in a particular organism, generally in response to some stimuli or conditions believed to stimulate the organism's genes, and hence can be used to assess
probabilistically the risk of developing a disease, or assessing environmental sensitivity or adaptability, for which genetic expression has been identified. Presently, three technologies have been developed to measure these expression levels; all are based on the biological concept of hybridisation between matching nucleotides. By comparing the expression levels of the full complement of genes from the organism with those from a 'normal' or 'control' individual, one can identify genes that are 'differentially expressed' (expressed at different levels by the two genomes). To the extent that a gene has been identified as either causative or highly associated with a particular disease (e.g. BRCA1 gene on chromosome 17 for suppression of an early-onset breast cancer tumour or the Rb gene on chromosome 13 for suppression of a retinoblastoma tumour), the risk of disease can be estimated and mechanisms of causation can be elucidated. In some cases, the analysis may permit early intervention to prevent the onset or progression of the disease (e.g. identification of an absent gene may permit early diagnosis and treatment). Similarly, adaptability or sensitivity to environmental contaminants (e.g. metals) can be compared among organisms that manifest differing levels of gene expression. (An example of such an ASSOCIATION has been identified in Daphnia, a freshwater crustacean that has a compact and well-characterised genome sequence, in their acclimation and adaptation to cadmium in lakes: cf. Shaw et al., 2008.) The identification of genes responsible for an organism's health condition and response to the environment provides important clues on potential causes of disease. The measurements in microarray experiments reflect levels of complementary deoxyribonucleic acid (cDNA), reverse-transcribed (as explained below) from cellular messenger ribonucleic acid (mRNA), not the levels of proteins that the organism manufactures in response to measured elevated levels of mRNA.
(While not definitely proven, one assumes that an organism's mRNA levels would increase as a precursor to the manufacture of relevant protein products needed to respond to the stimulus.) Presently, three technologies permit the measurement of gene expression levels on microarrays: treated glass slides with spotted and immobilised cDNA (cDNA slides), chips spotted with manufactured 25 base pair sequences of nucleotides found in genes (oligonucleotide arrays) and high-density chips with synthesised longer sequences of oligonucleotides (high-density chips). All three technologies are based on the biological concept of hybridisation between matching nucleotides, and can contain multiple copies of single-stranded genes or gene fragments, called probes, linked to a substrate or surface for binding with expressed transcripts from target tissues. The genetic code for an organism is contained in organised strings of four nucleotides (A = adenine, C = cytosine, G = guanine, T = thymine), arranged in triplets such that
each triplet codes for one of 20 amino acids. (Multiple triplets may code for the same amino acid.) Strings of amino acids are called peptides. Peptides can act independently in a cell or they can combine with other peptides to form complex proteins used by the organism for cell function. Genetic material known as deoxyribonucleic acid (DNA) is arranged in a double-stranded helical structure, with complementary base pairs on either side of the helix (AT/TA or CG/GC). In response to a stimulus to produce a protein, the coding genes in the DNA are transcribed into messenger RNA (mRNA) for translation into peptides. To test for gene expression, the investigator harvests cells from tissues of the types under study and the mRNA is reverse-transcribed into its more stable form (complementary DNA, or cDNA), split into smaller strands and denatured ('unzipped'), yielding single-stranded cDNA. The various tissue types are labelled with different fluorescing chemicals that can be detected by an instrument. Present instrumentation allows for the detection of two different chemicals that fluoresce at sufficiently different wavelengths that they can be readily distinguished, thus allowing the cohybridisation of treatment and control samples on the same microarray for a direct comparison and minimising technical variability, though the technology is not limited to only two scanning channels. For quantitative measurement, a single or a mixture of cDNA strands is placed on to the slide or chip containing the gene probes, and the strands of nucleotides in the target sample are allowed to bind (hybridise) to their matching partners. Spots on the slide or chip where hybridisation has occurred indicate gene products that are present in larger quantities and may have been expressed in response to the stimulus. With this technology,
lhc repoltcd intensity leyel at a particular location on the slide or chip is a summary of ftuon:scence mcasurc:mcnts detected by an LCD (liquid Cl)"stal display) camera as a series of pixels that comprise the spot on the slide:. n.c thrc:c ~lated technologies for mc:asurin; gCDC expression diffu in the process that is used to manufacture the probes on thc slide or chip. In cDNA slides. the probes typically an: obtained rrom a cDNA library. which has thousandsofbactcrial colonies with clonedcDNA fragments. Once isolated in the bacterial hosts. the: DNA fragments undergo a series of complex processes that amplify and then mechanically deposit them on lhc tn:alc:d glass slide substrate. In oligonucleotide arrays. the gCDC fl'8lments mnsist of spc:c:irac DNA strings rprobes') of 25-70 manufactun:d nudc:olidesthat are placed on the chip robotically. Commercial microanay manuracturers use either photolithography or digital minor devices and photorc:aclant chemistries. In all cases. thc DNA probes an: platlCd in an array of rows and columns. hence the tenn 'microarray·. n.c technologics also differ in the experimental protocols that yield quantitative ,CDC expression data. For glass substnde microarrays. thc hybridisation solution
contains a mixture of two types of cells, control and experimental, whose mRNA is reverse-transcribed into the more stable cDNA and then labelled with two different fluorophores: control cells labelled with Cyanine 3, or Cy3 (green dye), and experimental cells, e.g. cells subjected to stress, heat, radiation or chemicals, or known to originate from diseased tissue, labelled with Cyanine 5, or Cy5 (red dye). When the mRNA concentration is high in these samples, their cDNA will bind to their corresponding probes on the spotted cDNA slide; an optical detector in a laser scanner will measure the fluorescence at wavelengths corresponding to the green and red dyes (532 nm and 635 nm respectively). Good EXPERIMENTAL DESIGN will include technical replicates that interchange the dyes in a separate hybridisation experiment to account for imbalances in the signal intensities from the two types of fluorophores and expected BIASES from gene-dye interactions (e.g. possible degradation in the cDNA samples between the first scan at 532 nm and second scan at 635 nm). The ratio of the relative abundance of red and green dyes at these two wavelengths on a certain spot indicates relative mRNA concentration between the experimental and control samples at those genes. Thus, the gene expression levels in the genes under the experimental condition can be compared directly with those under the control condition. However, other technologies are limited to hybridising a single labelled sample or target at one time, thus requiring the addition of control probes and alternate protocols for normalising the signal intensity data across replicates and across chips for reliable comparisons of gene expression between the experimental and control conditions.

Oligonucleotide arrays circumvent the possible inaccuracies that can arise in the preparation of cDNA probes and the control and experimental samples for spotted array slides, by using predefined and prefabricated sequences of 25-70 nucleotides to characterise each gene. Rather than mechanically depositing DNA, oligonucleotide probes can be synthesised directly on to the substrate. For arrays manufactured via the photolithography-like process, the probe cells measure 24 x 24 or 50 x 50 micrometres square and are divided in 8 x 8 pixels; arrays manufactured using photoreactant technology measure 13 x 13 micrometres and hence arrays can accommodate more probes. As with cDNA slides, cells from the target sample, labelled again with fluorophores, will hybridise to those squares on the chip that correspond to the complementary strands of the target sample's single-stranded cDNA. For these experiments, the target sample contains only one type of cell (e.g. treatment, or control); the assessment of expression is in comparison to the expression level on an adjacent probe, which is exactly the same as the gene probe except for certain nucleotides (e.g. 13th out of 25 nucleotides). This 'mismatch' (MM) for the 'perfect match' (PM) sequence is only a rough guide, since a target sample with elevated mRNA concentration for a
certain gene may hybridise sufficiently to both the PM and MM probes. However, the results are believed to be less variable, since the probes on the chips are manufactured in more carefully controlled concentrations. Gene expression levels are measured again by a laser scanner that detects the optical energy in the pixels at the various probes (PM and MM) on the chip.

The analysis of the data (fluorescence levels at the various locations on the slide or chip) depends upon the technology. For cDNA experiments, the analysis usually involves the logarithm of the ratio of the expression levels between the target and control samples. For oligonucleotide experiments, the analysis involves a weighted linear combination of the logarithm of the PM expression level and the logarithm of the MM expression level (with some authors choosing zero for the weights of the MM values). Microarray analysis involves several considerations, including: the separation of 'spot' pixels from 'background' pixels and the determination of the expression level from the intensities recorded from the data 'spot' pixels; the adjustment of the calculated spot intensity for background ('background correction'); the normalisation of the range of fluorescence values from one experiment to another, particularly with oligonucleotide chips; experimental design of multiple slides or chips (Kerr and Churchill, 2001; Yang and Speed, 2002; Casella, 2001); data transformations (Yang et al., 2002; Kafadar and Phang, 2003); statistical methods of inference and combining information from multiple cDNA experiments (Amaratunga and Cabrera, 2001; Dudoit et al., 2002) and from multiple oligonucleotide arrays (Efron et al., 2001; Irizarry et al., 2002) and adjustments for MULTIPLE COMPARISONS (Reiner, Yekutieli and Benjamini, 2002; Efron, 2004; Benjamini and Yekutieli, 2005).

The 'low-level' analysis consists of the necessary 'preprocessing' steps, including data TRANSFORMATIONS (usually the logarithm) to address partially the nonnormality of the expression levels, and normalisation and background correction methods to adjust for different signal intensities across different microarray experiments and sources of variation arising from the chip manufacturing process and background intensity levels. The 'high-level' analysis usually involves clustering (see CLUSTER ANALYSIS IN MEDICINE) the gene expression levels into groups of genes that are believed to respond similarly, but no consensus has been achieved on the best methods for normalising, clustering and reducing the number of genes to consider as 'significantly differentially expressed' when searching for associations between disease and gene locations on chromosomes.

Microarray analyses have become a standard screening tool in the exploration and elucidation of mechanisms of disease and for studying the interface between the environment and the genome. They have also been used to understand the effect of certain exposures better, such as anthrax or anthrax-like organisms, or metals, on cells from
human and animal populations, and hence to characterise better the risk of such agents to these populations (Human Genome Program, 2002). Examples of useful gene microarray-based investigations that potentially can improve public health practice include discoveries of candidates for biomarkers of disease; measurements of perturbations in the cell cycle under differing conditions; uncovering genetic underpinnings of numerous human, animal and plant diseases; and explorations of the impacts of environmental change on ecosystems and populations of humans, animals, plants and disease-causing organisms. An example of the potential utility of microarrays is in the tracking of antigenic drifts and shifts of influenza viruses and the changes in the virus-host relationship. Through these types of analyses, the fields of genomics, proteomics and statistics will contribute substantially to how we perceive, measure and address disease and environmental change. Genome- and proteome-scale microarrays will be used increasingly as a cost-effective means of quantifying risk and identifying precursors for diseases, for developing vaccines and for other public health and environmental initiatives. KK/KB
[See also ALLELIC ASSOCIATION, GENETIC EPIDEMIOLOGY]

Amaratunga, D. and Cabrera, J. 2001: Analysis of data from viral DNA microchips. Journal of the American Statistical Association 96, 456, 1161-70.
Casella, G. 2001: Statistical design. New York: Springer.
Dudoit, S., Yang, Y., Callow, M. and Speed, T. 2002: Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica 12, 111-39.
Efron, B. 2004: Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association 99, 96-104.
Efron, B., Tibshirani, R., Storey, J. and Tusher, V. 2001: Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association 96, 1151-60.
Human Genome Program 2002: US Dept of Energy Human Genome News V12, N1-2, February 2002: http://www.ornl.gov/sci/techresources/Human_Genome/publicat/hgn/v12n1/HGN121_2.pdf.
Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U. and Speed, T. P. 2002: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 19, 185-93.
Kafadar, K. and Phang, T. 2003: Transformations, background estimation, and process effects in the statistical analysis of microarrays. Computational Statistics and Data Analysis 44, 313-38.
Reiner, A., Yekutieli, D. and Benjamini, Y. 2002: Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19, 3, 368-75.
Shaw, J. R., Pfrender, M. E., Eads, B. D., Klaper, R., Callaghan, A., Colson, I., Jansen, B., Gilbert, D. and Colbourne, J. K. 2008: Daphnia as an emerging model for toxicological genomics. In Hogstrand, C. and Kille, P. (eds), Advances in experimental biology on toxicogenomics. Elsevier, pp. 165-219.
Yang, Y. H. and Speed, T. P. 2002: Design issues for cDNA microarray experiments. Nature Reviews 3, 579-88.
Yang, Y. H., Dudoit, S., Luu, P. and Speed, T. P. 2002: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 30, E15.
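As a minimal sketch of the cDNA calculation described in the microarray entry above (background correction, the logarithm of the red/green ratio, and a dye-swap technical replicate), the following Python fragment may help. The function names, the choice of base-2 logarithms, the assumption that dye bias is additive on the log scale and all intensity values are illustrative assumptions; they are not taken from the entry or from any particular software package.

```python
import math


def log_ratio(red, green, red_bg=0.0, green_bg=0.0):
    """Background-corrected log2(red/green) for one spot.

    Returns None when a corrected intensity is not positive, because the
    logarithm is then undefined (how to treat such spots is a separate
    modelling choice).
    """
    r = red - red_bg
    g = green - green_bg
    if r <= 0 or g <= 0:
        return None
    return math.log2(r / g)


def dye_swap_average(m_forward, m_swapped):
    """Combine forward and dye-swapped log ratios for the same gene.

    Forward slide: experimental sample in red (Cy5), control in green (Cy3).
    Swapped slide: labels interchanged, so the biological signal changes
    sign while an additive dye bias keeps its sign and cancels.
    """
    return (m_forward - m_swapped) / 2.0


# Hypothetical spot: true experimental/control ratio of 2, with the red
# channel reading twice as bright as the green channel (dye bias).
m_fwd = log_ratio(red=900.0, green=300.0, red_bg=100.0, green_bg=100.0)  # log2(800/200)
m_swp = log_ratio(red=500.0, green=450.0, red_bg=100.0, green_bg=50.0)   # log2(400/400)
estimate = dye_swap_average(m_fwd, m_swp)  # estimated log2 fold change
print(m_fwd, m_swp, estimate)
```

In this made-up example the forward slide alone would suggest a four-fold change, whereas averaging with the dye-swapped slide recovers the two-fold change built into the numbers.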
mid-P-value See EXACT METHODS FOR CATEGORICAL DATA

MIM See GRAPHICAL MODELS
minimisation This method is sometimes used to balance RANDOMISATION in a CLINICAL TRIAL when there are several factors on which it is considered necessary to try to force balance across the treatment groups. Simple randomisation will, in theory (or in 'the long run'), ensure that treatment groups are equally represented with respect to all known and unknown prognostic factors but, for any particular trial, this balance may not be as good as we would hope. When there are only a few factors for which balance is necessary (such as gender or stage of disease) then simple stratified randomisation may be sufficient. However, if there are more than two or three factors on which to try to balance, then the number of strata becomes excessive and the logistics of the trial become overwhelming. Minimisation was a method proposed by Taves (1974) and, more extensively, by Pocock and Simon (1975) as a way of balancing simultaneously for several factors (see also Pocock, 1983, pp. 84-6).

It is important to realise that in most trials patients arrive sequentially, rather than all being available as a 'pool' of patients at the beginning. Hence, when a patient of a certain demographic and/or disease state enrols, we do not know when (or even if) a similar patient will enrol subsequently. However, if two similar patients were to be available for a trial, it would be desirable to allocate one to each of the treatment groups (a method easily extendable to more than two treatment groups). If there were only one factor on which to balance the randomisation then for patients within each stratum we would (optimally) allocate them alternately between the treatments. If there is more than one factor, e.g. gender (male/female) and disease stage (early/progressive/advanced), we have to 'trade off' the benefits of allocating to one treatment in order to ensure an equal balance of males/females across the treatment groups - and simultaneously to ensure an equal balance of early/progressive/advanced patients across the treatment groups. Often to balance gender, we might be better off allocating the patient to one treatment but to balance for disease stage we might be better off allocating the patient to the other. Hence we use the term 'minimisation': to try to minimise the degree of imbalance across all the identified factors.

The following example is described by Day (1999) and concerns a trial randomising general practitioners to an
intervention group or to control (see also Steptoe et al., 1999). Three factors were identified on which to balance the groups: the Jarman score (a level of social status), the ratio of the number of patients to practice nurse hours ('low' or 'high') and the fundholding status of the practice (in three categories). Assume we are partway through the trial and the first 18 practices have been allocated as in the table. Balance looks reasonably good. Now assume that the next (the 19th) practice is of type: low Jarman score, high patient-practice nurse hours and is a nonfundholder. We calculate 'scores' for these types of practice, which are 4 + 5 + 3 = 12 for the intervention group and 3 + 4 + 3 = 10 for the control group. Imbalance is 'in favour' of intervention, so by allocating this practice to control, we minimise the imbalance.

minimisation Allocation of first 18 general practices and profile of 19th practice indicated

Prognostic factor                  Intervention group   Control group
Jarman score
  Low *                                    4                  3
  Middle                                   3                  4
  High                                     2                  2
Patient-practice nurse hours
  Low                                      4                  5
  High *                                   5                  4
Fundholding status
  Nonfundholder *                          3                  3
  1st wave entry                           5                  3
  2nd wave entry                           1                  3

* Level to which the 19th practice belongs.
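The scoring step in the worked example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `imbalance_scores` and `allocate`, the dictionary layout and the `p_best` parameter (the probability of following the minimising allocation, introduced here to mimic the random element that many minimisation schemes include) are hypothetical and are not taken from Day (1999) or from any software. The counts are those of the table for the first 18 practices.

```python
import random

# Practices already allocated, by arm, factor and level (first 18 practices).
counts = {
    "intervention": {
        "jarman": {"low": 4, "middle": 3, "high": 2},
        "nurse_hours": {"low": 4, "high": 5},
        "fundholding": {"non": 3, "first_wave": 5, "second_wave": 1},
    },
    "control": {
        "jarman": {"low": 3, "middle": 4, "high": 2},
        "nurse_hours": {"low": 5, "high": 4},
        "fundholding": {"non": 3, "first_wave": 3, "second_wave": 3},
    },
}


def imbalance_scores(counts, profile):
    """For each arm, add up the units already in that arm that share
    each factor level with the new unit."""
    return {
        arm: sum(levels[factor][level] for factor, level in profile.items())
        for arm, levels in counts.items()
    }


def allocate(counts, profile, p_best=1.0, rng=None):
    """Choose the arm with the smallest score with probability p_best,
    otherwise one of the other arms; a complete tie is settled at random."""
    rng = rng or random.Random()
    scores = imbalance_scores(counts, profile)
    lowest = min(scores.values())
    best = [arm for arm, score in scores.items() if score == lowest]
    others = [arm for arm in scores if arm not in best]
    if not others or rng.random() < p_best:
        return rng.choice(best)
    return rng.choice(others)


# Profile of the 19th practice in the worked example.
practice_19 = {"jarman": "low", "nurse_hours": "high", "fundholding": "non"}
scores = imbalance_scores(counts, practice_19)
chosen = allocate(counts, practice_19)  # deterministic because p_best = 1.0
print(scores, chosen)
```

With `p_best` below 1 the allocation is no longer fully predictable from the running totals, which is one hedged way of representing the random component that the entry mentions; the value to use is a design decision for the trial, not something this sketch can settle.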
When Taves and then, the following year, Pocock and Simon published their early papers on this topic, they explained how simple the method is to use and, in particular, how, for a single institution, it is quite possible to 'minimise' on several factors with a simple card index system. In a MULTICENTRE TRIAL this would effectively be 'minimisation, stratified by centre'. With modern telephone and computer systems used for central randomisation it becomes even easier to use minimisation across centres (possibly using 'centre' as one of the minimisation factors).

It was mentioned earlier how an 'optimal' allocation could easily be determined in the case of a single factor but that would only be optimal in the sense of minimising the imbalance. Maintaining BLINDING is also important and most minimisation algorithms - particularly those run on computers - incorporate an element of randomisation within them so that, even with complete knowledge of all the patients in the study so far, it is not possible to guarantee correctly guessing the next patient assignment.

Minimisation is not without controversy. Including such a random component satisfies most critics, but some (such as Rosenberger and Lachin, 2002) still consider that all the theoretical aspects of how the analysis should be done have not been fully worked out. SD

Day, S. 1999: Treatment allocation by the method of minimisation. British Medical Journal 319, 947-8.
Pocock, S. J. 1983: Clinical trials: a practical approach. Chichester: John Wiley & Sons, Ltd.
Pocock, S. J. and Simon, R. 1975: Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics 31, 103-15.
Rosenberger, W. F. and Lachin, J. M. 2002: Randomization in clinical trials. New York: John Wiley & Sons, Inc.
Steptoe, A., Doherty, S., Rink, E., Kerry, S., Kendrick, T. and Hilton, S. 1999: Behavioural counselling in general practice for the promotion of healthy behaviour among adults at increased risk of coronary heart disease: randomised trial. British Medical Journal 319, 943-7.
Taves, D. R. 1974: Minimization: a new method of assigning patients to treatment and control groups. Clinical Pharmacology and Therapeutics 15, 443-53.

missing at random (MAR) See DROPOUTS, MISSING DATA

missing completely at random (MCAR) See DROPOUTS, MISSING DATA

missing data Well-designed statistical studies draw a representative sample from the study population by following a sampling plan and a detailed protocol. Often, some of the planned data are unavailable or otherwise absent from the database, hence the term 'missing data'. The data that would be observed if all intended measurements were obtained will be called 'potential data'. The potential data that are not missing, combined with an indicator of availability of each planned response, form the 'observed data'. The name 'missing data' may suggest that these data can simply be forgotten by the data analyst, but nothing could be further from the truth. Missing data form one of the hardest challenges for data analysts. This is because the missed data can be intrinsically different from observed data in ways that are hard to predict, and thus leave a biased sample. For instance, when studying the evolution of CD4 counts over time, AIDS patients may fail to return for planned clinic visits, not only when they are sick as a result of low CD4 counts but also when they feel good and no longer in need of treatment. In view of this, three types of missing data are typically distinguished (Little and Rubin, 2002): (1) The simplest situation occurs when the risk of missing a certain part of the data is the same for all subjects,
regardless of their potential data values. This process is known as 'missing completely at random' (MCAR). It happens, for instance, in a study where very expensive outcome measures are, by design, only gathered in a random subsample. In that case, missing observations can simply be deleted from the dataset and ignored in further analyses. (2) A more realistic situation occurs when the risk of missing a certain part of the data is constant over the potential outcomes for that part among subjects for whom we observe the same outcomes on a well-chosen subset of variables. This condition is easily interpreted when dealing with baseline covariates that are always observed and a single outcome that can be missing. It becomes complex with nonmonotone missingness patterns. In either case, the data are then called 'missing at random' (MAR). This happens, for instance, in two-stage sampling designs where a subgroup of patients is invited to the second study cycle, depending on their first outcome. A naive data analysis, which ignores missing data, may then be misleading, unless it conditions on the correct subset of observed data. (3) When neither of these two constraints holds, missingness is called informative or nonignorable (NI). In a health survey, for instance, one may lose the uninterested or very busy respondents, who have their own disease profile.

Under each of the above scenarios the popular MAXIMUM LIKELIHOOD ESTIMATION method can be fruitfully employed for unbiased estimation of parameters in the study population. The challenge is then to propose a (parsimonious) model relating the distribution of observed and potential data. Typically, one chooses either a so-called 'selection model' or a 'pattern-mixture' model (Little and Rubin, 2002). The former adds to the usual model for the distribution of the potential data a model for the conditional distribution of being observed, given the potential data. The latter models the conditional distribution of the potential data for each level of the response indicator and adds to it a model for the distribution of the response indicator. In both cases one averages over all possible values of the missing data to find the observed data distribution that enters the maximum likelihood procedure. A very useful property of MAR is that maximum likelihood estimation can avoid the need to model the probability of being observed and still allow for inference on the potential data. This makes maximum likelihood estimation very popular in this setting.

Nonetheless, observed data likelihoods under MAR may have complex forms not covered by standard statistical packages. To help avoid lengthy computations in routine practice, the EM ALGORITHM and imputation techniques (see MULTIPLE IMPUTATION) have been devised. EM is an iterative algorithm that replaces the usual log-likelihood of the potential data by its conditional expectation, given the observed data. Maximum likelihood estimates are then obtained by maximizing it in the usual way and the expected log-likelihood is updated. Imputation methods 'fill in' the missing
data by simulating from their distribution conditional on the available data. The resulting 'completed' dataset is then analysed using standard software as if no data were missing. The loss of information due to the missing data must, however, be recognised when STANDARD ERRORS are derived. Corrected standard errors have therefore been proposed based on the variation in estimates over different random imputations (Little and Rubin, 2002).

One drawback of the maximum likelihood approach is that estimates can be biased when the potential data model is misspecified. One may therefore choose to specify fewer features of the model and follow the Horvitz-Thompson principle, which helps achieve robustness against model misspecification (Preisser, Lohman and Rathouz, 2002). Here, the completely observed data are upweighted by the inverse conditional probability of being observed, given the potential data, to compensate for similar counterparts that are missing. This line of research has seen extensive developments in recent years and is entering statistical practice as software becomes more readily available.

Regardless of the adopted approach, observed data alone seldom contain information that distinguishes MAR from NI. One is, statistically speaking, blind and must rely on guidance from other sources to make progress. This is made abundantly clear by the pattern mixture approach. Indeed, the pattern with unobserved response completely lacks information on the data distribution, and unbiased inference needs unverifiable assumptions regarding the dependence of missingness on the potential data. Reassurance that 'missed' data are comparable to observed ones is found when data are 'missing by design', but is hard to obtain otherwise. It is hence very important at the design stage to plan to gather such information that helps determine the distribution of the missed data. In experimental studies one seeks to gather data over time that can help predict. Furthermore, a sensitivity analysis can be conducted by examining how estimates vary as different choices are postulated for the unknown outcome distribution in nonresponders. This practice of describing how conclusions vary over plausible but untestable assumptions is recommended (Kenward, Goetghebeur and Molenberghs, 2001; Scharfstein, Daniels and Robins, 2003).

While enormous progress has been made in statistical methodology for dealing with missing data, many problems remain in practice and in theory. The term 'missing data' is often misunderstood or the methods are abused. When answers to certain questions are intrinsically meaningless or undefined in certain categories of people (e.g. blood pressure of dead patients), it is very hard to justify missing data constructions that give nonresponders the outcome distribution of responders and base conclusions on outcomes averaged over both groups. Furthermore, the MAR assumption is frequently adopted for mathematical convenience but may be difficult to interpret or justify. The goal and relevance
of any analysis, along with justified assumptions, must come first. In causal inference, for instance, one has fruitfully exploited missing data constructs (Vansteelandt and Goetghebeur, 2005). At other times one ignores valuable and simple MAR methods to fall for simplistic analyses that can be very misleading. Due caution is always necessary, additional thought is required in model selection and, with regard to missing data in general, the familiar adage holds: prevention is better than cure. EG/SV
[See also DROPOUTS]
Kenward, M. G., Goetghebeur, E. and Molenberghs, G. 2001: Sensitivity analysis for incomplete categorical data. Statistical Modelling 1, 31-41.
Little, R. J. A. and Rubin, D. B. 2002: Statistical analysis with missing data. New York: John Wiley & Sons, Inc.
Preisser, J. S., Lohman, K. K. and Rathouz, P. J. 2002: Performance of weighted estimating equations for longitudinal binary data with drop-outs missing at random. Statistics in Medicine 21, 3035-54.
Scharfstein, D. O., Daniels, M. J. and Robins, J. M. 2003: Incorporating prior beliefs about selection bias into the analysis of randomized trials with missing outcomes. Biostatistics 4, 495-512.
Vansteelandt, S. and Goetghebeur, E. 2005: Sense and sensitivity when correcting for observed exposures in randomized clinical trials. Statistics in Medicine 24, 191-210.
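To make the inverse probability weighting idea from the missing data entry above concrete, here is a small Python sketch for a two-stage design in which the chance of being observed at the second stage depends only on an always-observed first-stage result (a MAR setting). Everything in it is an assumption made for illustration: the data are invented, the observation probabilities are taken as known by design, and the estimator shown is the normalised (ratio-type) weighted mean rather than any method prescribed by the entry.

```python
def complete_case_mean(records):
    """Mean of the observed outcomes, ignoring the missing ones."""
    observed = [y for y, is_observed, _ in records if is_observed]
    return sum(observed) / len(observed)


def ipw_mean(records):
    """Weighted mean in which each observed outcome is upweighted by the
    inverse of its probability of being observed."""
    numerator = sum(y / p for y, is_observed, p in records if is_observed)
    denominator = sum(1.0 / p for _, is_observed, p in records if is_observed)
    return numerator / denominator


# Each record is (outcome, observed?, probability of being observed).
# Hypothetical design: subjects with a high first-stage result are always
# followed up; only half of those with a low first-stage result are.
records = (
    [(10.0, True, 1.0)] * 4      # high first-stage result, all observed
    + [(2.0, True, 0.5)] * 2     # low first-stage result, observed
    + [(None, False, 0.5)] * 2   # low first-stage result, outcome missing
)

naive = complete_case_mean(records)   # over-represents the high group
weighted = ipw_mean(records)          # reweights towards the full sample
print(naive, weighted)
```

If the two missing outcomes resembled the observed outcomes in their own group (the assumption behind the weights), the full-sample mean would be 6; the complete-case mean is pulled upwards because the fully observed group is over-represented.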
MLwiN See MULTILEVEL MODELS
mode The mode is a measure of location. It is simply the value that occurs most often. For example, the hair colour of 573 babies born at a UK maternity hospital is shown in the table. In this example, the mode is 'medium brown'.

Frequency distribution for hair colour

Hair colour          Frequency
Blond                    41
Pale brown/blond        147
Medium brown            244
Dark brown              121
Black                    14
Red                       6

The mode, however, is of limited value in summarising continuous data. In contrast to the MEAN, the mode is not sensitive to OUTLIERS but need not be a unique value, as a distribution of data may be bimodal or multimodal (having two or more modes respectively). SRC
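As a small illustration of the mode entry above, the following Python sketch finds the mode (or modes) of a frequency table. The hair colour counts are those of the table; the helper name `modes` and the second, bimodal table are hypothetical and serve only to show that the mode need not be unique.

```python
def modes(frequencies):
    """Return every value whose frequency equals the largest frequency."""
    top = max(frequencies.values())
    return [value for value, count in frequencies.items() if count == top]


hair_colour = {
    "Blond": 41,
    "Pale brown/blond": 147,
    "Medium brown": 244,
    "Dark brown": 121,
    "Black": 14,
    "Red": 6,
}

# Hypothetical table with two equally frequent values (bimodal).
bimodal_example = {"A": 10, "B": 3, "C": 10}

print(sum(hair_colour.values()))   # total number of babies
print(modes(hair_colour))          # single mode
print(modes(bimodal_example))      # two modes
```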
Mplus See STRUCTURAL EQUATION MODELLING

multicentre research ethics committee (MREC) See ETHICAL REVIEW COMMITTEES
multicentre trials These are studies that are carried out in several distinct centres, sites or units (hospitals, clinical departments, etc.). The first trials of new therapies in man (Phase I trials) require few subjects who must be monitored very tightly; therefore they are almost always carried out in a single centre. These trials are typically followed by medium sized multicentre efficacy trials (Phase II trials). If the results of these early trials are promising, then larger multicentre trials are carried out to confirm the efficacy and safety of the new therapies (Phase III trials). Multicentre trials are often performed in several countries or even several continents. The conduct of multicentre trials can be overseen by a steering committee, headed by the study chair and consisting of persons designated to represent study centres, disciplines or activities.

Multicentre trials are needed primarily to accrue the number of subjects or patients required for the study over a reasonably short time period. For common diseases, multicentre trials may have several centres with large numbers of subjects per centre or, in the case of rare diseases, they may have a large number of centres with very few subjects per centre.

Patients treated at different centres (let alone different countries or continents) may be expected to differ substantially in terms of their ethnicity, exposure to aetiologic or risk factors, living conditions and access to health resources, etc. Such heterogeneity may be a drawback for two main reasons: first, the VARIANCE of the outcome of interest is increased because of the heterogeneity and, second, a treatment benefit in some patient subpopulations might be missed in a trial that accrued other patient subpopulations as well. On closer inspection, neither of these two reasons argues against multicentre trials. The increase in sample size that results from heterogeneity is usually negligible compared to the potential number of patients available at multiple centres.
In many situations, no centre would be able to accrue the required number of patients; hence a single-centre trial would be infeasible. In addition, the results of a multicentre trial are more readily generalisable than those of a single-centre trial, because they are obtained in a patient sample more likely to reflect the population of interest. If a treatment is thought a priori to exert its benefit selectively in a patient subpopulation, a multicentre trial is still indicated with prior exclusion of patients unlikely to benefit.

An added advantage of multicentre trials is their ROBUSTNESS to fraud, delinquencies and other quality problems that may arise at a few centres. This was illustrated by a series of large multicentre trials in early breast cancer, in which exclusion of a fraudulent centre did not have any sizeable impact on the trial results (Peto et al., 1997). In contrast, another highly publicised case of fraud in a trial of high-dose chemotherapy for advanced breast cancer had disastrous consequences, because its results were completely dominated by fraudulent data. Very few centres had taken part in this trial and fraud from a single investigator was sufficient to cause dramatic BIAS in the reported results (Weiss et al., 2000). Published results of well-conducted multicentre trials often have a direct impact on clinical practice, while single-centre trials generally need to be reproduced on a larger scale before their results are accepted as valid.

The CLINICAL TRIAL PROTOCOL should encourage the participating centres to put the same procedures in place as regards patient management, measurement of treatment effects and other aspects of the study that may have a bearing on the therapeutic results. Some heterogeneity is unavoidable in multicentre trials, but such heterogeneity is unimportant so long as it does not directly affect the outcome of interest. In a randomised trial, in particular, differences between centres are ignorable if they do not impact the treatment effects of interest. Thus if one centre tended to recruit patients of poor prognosis and another centre tended to recruit patients of good prognosis, this difference would not compromise the trial results if the treatment effects were independent of prognosis. Such independence is generally unknown but postulated before the trial and it can, in fact, be studied within the trial itself. In the example just given, differences in prognostic mix between centres might be confounded by other centre-specific factors (such as concomitant medications, supportive care, etc.), and hence it is advisable to standardise trial procedures across all participating centres to the extent possible.

The logistics of a multicentre trial can be fairly complex and provisions must be made for drug shipment and storage, trial material distribution, ethical approval and compliance with local regulations, training of investigators and local staff to avoid variation in patient management, evaluation criteria, follow-up schemes, etc. All these issues can be discussed at investigator meetings and monitoring visits during the trial.
Statistical quality control checks can also be performed to identify discrepancies between centres that may call for more thorough investigation, especially in centres that are found to be clear OUTLIERS. Such checks are best performed while the trial is ongoing, so that remedial action can be taken early, and this can be facilitated by electronic data capture that feeds patient data to a central database in real time. There is no conclusive evidence that data quality is related to the number of patients enrolled in each centre (Sylvester et al., 1981; Hawkins et al., 1990).

In multicentre trials, RANDOMISATION of the patients is generally organised centrally, rather than performed in each centre. Centralised randomisation requires that all centres access a central resource, usually by internet, telephone or fax, to obtain the next treatment allocation. Such centralised control is useful to follow the accrual of patients into the trial and to check eligibility criteria in a uniform way prior to treatment allocation. Centralisation of the randomisation process also guarantees that it cannot be biased by prior knowledge of the next treatment assignment, which could happen in open-label trials with the use of randomisation
lists. It is usually desirable to stratify the allocation by centre and more generally by important prognostic factors measured at baseline, such as severity of disease, patient status, age, etc. Such stratification can be implemented with permuted lists of treatment allocations or through dynamic allocation using MINIMISATION. Allocation through minimisation has the advantages of being able to take account of many prognostic factors and of being completely unpredictable at any given centre in the absence of information on patients already randomised in all other centres. Multicentre trials are usually designed and their sample size calculated under the assumption that the treatment effect is the same in all centres. Whether this assumption is supported by the data can be tested formally. Assume that in the $i$th of $I$ centres, the true treatment effect is given by $\tau_i$ and the estimate of $\tau_i$, noted $\hat{\tau}_i$, is asymptotically normally distributed with variance $v_i$:

\[ \hat{\tau}_i \sim N(\tau_i, v_i) \]

The measure of treatment effect is taken such that no treatment effect corresponds to $\tau_i = 0$. Interest focuses first and foremost on whether there is statistical evidence of an overall treatment effect, which can be tested through the test statistic:

\[ X^2_1 = \frac{\left( \sum_{i=1}^{I} w_i \hat{\tau}_i \right)^2}{\sum_{i=1}^{I} w_i} \]

where $w_i = v_i^{-1}$ denotes the inverse of the variance of the treatment effect in the $i$th centre. Under the null hypothesis of no treatment effect ($\tau_1 = \tau_2 = \dots = \tau_I = 0$), this test statistic has an asymptotic chi-square distribution (see CHI-SQUARE DISTRIBUTION) with one DEGREE OF FREEDOM. Under the assumption of a common treatment effect in all trials ($\tau_1 = \tau_2 = \dots = \tau_I = \tau$), the treatment effect is estimated by:

\[ \hat{\tau} = \frac{\sum_{i=1}^{I} w_i \hat{\tau}_i}{\sum_{i=1}^{I} w_i} \]

In other words, this is a weighted average of the treatment effects in all centres. The presence of heterogeneity between centres can be tested through the test statistic:

\[ X^2_{I-1} = \sum_{i=1}^{I} w_i \left( \hat{\tau}_i - \hat{\tau} \right)^2 \]

which has an asymptotic $\chi^2$ distribution with $I - 1$ degrees of freedom. In practice, this test for heterogeneity in treatment effects between centres is not very informative, because it lacks POWER to detect true underlying differences, especially when there are many centres ($I$ large) with few patients per centre.
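As a minimal sketch of the three quantities just defined, the following Python fragment computes the overall test statistic, the inverse-variance weighted estimate and the heterogeneity statistic from per-centre estimates and variances. The function name `centre_summary` and the three-centre input values are hypothetical and chosen only for illustration; they do not come from any trial.

```python
def centre_summary(estimates, variances):
    """Return (overall X^2 on 1 df, pooled estimate, heterogeneity X^2 on I-1 df)."""
    weights = [1.0 / v for v in variances]            # w_i = 1 / v_i
    sum_w = sum(weights)
    sum_wt = sum(w * t for w, t in zip(weights, estimates))
    overall = sum_wt ** 2 / sum_w                      # test of no treatment effect
    pooled = sum_wt / sum_w                            # weighted average of centre effects
    heterogeneity = sum(w * (t - pooled) ** 2
                        for w, t in zip(weights, estimates))
    return overall, pooled, heterogeneity


# Hypothetical estimates and variances for I = 3 centres (illustration only).
tau_hat = [0.5, 0.3, 0.4]
v = [0.04, 0.01, 0.02]

overall, pooled, heterogeneity = centre_summary(tau_hat, v)
print(f"overall X^2 (1 df)        = {overall:.3f}")
print(f"pooled estimate           = {pooled:.3f}")
print(f"heterogeneity X^2 ({len(v) - 1} df)  = {heterogeneity:.3f}")
```

With these made-up inputs the weights are 25, 100 and 50, so the pooled estimate is 62.5/175, about 0.357, the overall statistic is about 22.3 and the heterogeneity statistic is about 0.93 on 2 degrees of freedom. Referring the statistics to chi-square distributions is left out to keep the sketch free of external libraries.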
Moreover, when statistical heterogeneity is found between centres, it may be difficult to ascribe it to a well-identified factor and the interpretation of the overall treatment effect may be controversial. The same test for heterogeneity is more useful when centres can be meaningfully combined according to a common characteristic (for instance all centres that have access to certain equipment or that use certain supportive treatments). When centres are thus combined for the purposes of statistical analysis, the grouping should be defined prospectively and blindly to treatment allocation and results in the various centres. A grouping of centres based solely on their sample sizes is unlikely to be informative. Even when the formal test of heterogeneity fails to reach statistical significance, heterogeneity can be explored through descriptive statistics or graphical displays of the treatment effects in individual centres or groups of centres. Marked differences in treatment effects between centres would cause concern, especially if much of the overall effect was attributable to an unexpectedly large effect in a single centre or if treatment had a markedly negative effect in some centres, an overall positive treatment effect notwithstanding. Whenever substantial heterogeneity is found, attempts should be made to find an explanation in terms of identifiable features of trial management or subject characteristics. Such an explanation may suggest further analyses or appropriate interpretation. In the absence of an explanation, alternative estimates of the treatment effect may be required in order to substantiate the robustness of the trial results (International Conference on Harmonisation, 1998).
Regardless of the presence of statistical heterogeneity between centres, the statistical model adopted for the estimation and testing of treatment effects may account for centre through stratification or by inclusion of a fixed or random effect for centre in the model. If the number of subjects per centre is limited, centre effects are poorly estimated and the inclusion of centre effects in the model may negatively affect the power of the treatment comparisons. In such cases, it is preferable to ignore the centre in the analysis. MB
Hawkins, B. S., Prior, M. J., Fisher, M. R. and Blackhurst, D. W. 1990: Relationship between rate of patient enrolment and quality of clinical center performance in two multicenter trials in ophthalmology. Controlled Clinical Trials 11, 374-94.
International Conference on Harmonisation 1998: E-9 document. Guidance on statistical principles for clinical trials. Federal Register 63, 179, 49583-98.
Peto, R., Collins, R., Sackett, D., Darbyshire, J., Babiker, A., Buyse, M., Stewart, H., Baum, M., Goldhirsch, A., Bonadonna, G., Valagussa, P., Rutqvist, L., Elbourne, D., Altman, D., Dalesio, O., Parmar, M., Hill, C., Clarke, M., Gray, R. and Doll, R. 1997: The trials of Dr Bernard Fisher: a European perspective on an American episode. Controlled Clinical Trials 18, 1-13.
Sylvester, R., Pinedo, H., De Pauw, M., Staquet, M., Buyse, M., Renard, J. and Bonadonna, G. 1981: Quality of institutional participation in multicenter clinical trials. New England Journal of Medicine 305, 852-5.
Weiss, R. B., Rifkin, R. M., Stewart, F. M., Theriault, R. L., Williams, L. A., Herman, A. A. and Beveridge, R. A. 2000: High-dose chemotherapy for high-risk primary breast cancer: an on-site review of the Bezwoda study. The Lancet 355, 999-1003.
multicollinearity This term is used particularly in MULTIPLE LINEAR REGRESSION to indicate situations where the EXPLANATORY VARIABLES are linearly related, thus making the estimation of regression coefficients in the usual way essentially impossible. Including the sum or average of the explanatory variables as a variable would lead to this problem. For example, in a blood pressure study one cannot include among explanatory variables systolic blood pressure (SBP), diastolic blood pressure (DBP) and, additionally, a linear combination of the two, such as mean blood pressure, without causing the model to break down completely. Another example is using too many dummy variables to code a categorical explanatory variable. In practice, of course, approximate multicollinearity, i.e. where one of the explanatory variables can be predicted with considerable accuracy from the other explanatory variables, will be of more cause for concern and can lead to inflated variances for the estimated regression coefficients. Some evidence for approximate multicollinearity can be found by looking at the multiple correlation coefficients (see CORRELATION) of each explanatory variable with the other explanatory variables; if any of these is close to one then multicollinearity should be suspected. There is no optimal way of dealing with multicollinearity but in many cases the simplest solution is to remove explanatory variables that are highly correlated or combine variables in some way. More details are given in Miles and Shevlin (2001). BSE
Miles, J. and Shevlin, M. 2001: Applying regression and correlation. London: Sage.
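The diagnostic described in this entry, the multiple correlation of each explanatory variable with the others, can be sketched in a few lines of Python. The blood pressure values below are invented for illustration, the third variable is constructed as (SBP + 2 DBP)/3 plus a small perturbation so that the collinearity is approximate rather than exact, and the helper names `solve` and `r_squared` are hypothetical. The variance inflation factor 1/(1 - R^2) printed alongside is a standard summary that the entry itself does not name.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(y, predictors):
    """Squared multiple correlation of y with the predictors (OLS with intercept)."""
    n = len(y)
    yc = [value - sum(y) / n for value in y]          # centring absorbs the intercept
    xc = [[value - sum(p) / n for value in p] for p in predictors]
    k = len(xc)
    xtx = [[sum(a * b for a, b in zip(xc[i], xc[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(xc[i], yc)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * xc[i][t] for i in range(k)) for t in range(n)]
    sse = sum((yc[t] - fitted[t]) ** 2 for t in range(n))
    return 1.0 - sse / sum(value ** 2 for value in yc)


# Hypothetical measurements on eight subjects (illustration only).
sbp = [120, 135, 110, 150, 142, 128, 160, 118]
dbp = [80, 85, 70, 95, 88, 82, 100, 76]
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.1]
mean_bp = [(s + 2 * d) / 3 + e for s, d, e in zip(sbp, dbp, noise)]

variables = {"SBP": sbp, "DBP": dbp, "mean BP": mean_bp}
r2 = {}
for name, values in variables.items():
    others = [v for other, v in variables.items() if other != name]
    r2[name] = r_squared(values, others)
    print(f"{name}: R^2 = {r2[name]:.4f}, VIF = {1.0 / (1.0 - r2[name]):.0f}")
```

Every squared multiple correlation here is very close to one, which is the warning sign the entry describes. Dropping the constructed mean blood pressure variable and recomputing would show how much of the problem it causes.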
multidimensional scaling This technique is often used in psychology but less often in medicine. The basis of the method is a proximity matrix arising either directly from experiments in which subjects are asked to assess the similarity of pairs of stimuli or, indirectly, as a measure of the CORRELATION or COVARIANCE of a pair of stimuli derived from a number of measurements made on each. In some cases, high proximity values correspond to stimuli that are similar (similarities); in others, the reverse is the case (dissimilarities). As an example, the table shows judgements about various brands of cola made by a subject using a visual analogue scale with the anchor points 'same' (having a score of 0) and 'different' (having a score of 100). In this example, the resulting rating for a pair of colas is a dissimilarity, a low value indicating that the two colas are regarded as alike and vice versa. A similarity measure would have been obtained had the anchor points been reversed, although similarities are usually scaled to lie between zero and one.
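To make concrete what is done with a dissimilarity matrix of this kind, the sketch below places points in the plane so that their Euclidean distances approximate the given dissimilarities, by reducing the sum over pairs of squared differences between dissimilarity and distance. It is a minimal illustration under stated assumptions, not the criterion any particular package uses: the 4 x 4 matrix is hypothetical (it is not the cola data), the starting configuration is arbitrary, and the plain gradient descent with step halving in `fit` is just one simple way to carry out the minimisation.

```python
import math


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def stress(points, delta):
    """Sum over pairs i < j of (dissimilarity - distance)^2."""
    n = len(points)
    return sum((delta[i][j] - distance(points[i], points[j])) ** 2
               for i in range(n) for j in range(i + 1, n))


def gradient(points, delta):
    """Partial derivatives of the stress with respect to every coordinate."""
    n, dim = len(points), len(points[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = distance(points[i], points[j])
            if i != j and d > 0.0:
                coef = -2.0 * (delta[i][j] - d) / d
                for k in range(dim):
                    grad[i][k] += coef * (points[i][k] - points[j][k])
    return grad


def fit(delta, start, steps=500, rate=0.05):
    """Gradient descent; a step is kept only if it lowers the stress."""
    points = [list(p) for p in start]
    current = stress(points, delta)
    for _ in range(steps):
        grad = gradient(points, delta)
        trial = [[x - rate * g for x, g in zip(p, gp)]
                 for p, gp in zip(points, grad)]
        trial_stress = stress(trial, delta)
        if trial_stress < current:
            points, current = trial, trial_stress
        else:
            rate *= 0.5                                # step too long: shorten it
    return points, current


# Hypothetical dissimilarities among four stimuli (illustration only).
delta = [[0, 3, 5, 4],
         [3, 0, 4, 5],
         [5, 4, 0, 3],
         [4, 5, 3, 0]]
start = [[0.0, 0.0], [1.0, 0.2], [0.8, 1.1], [-0.1, 0.9]]

initial_s = stress(start, delta)
configuration, final_s = fit(delta, start)
print(f"stress before fitting: {initial_s:.3f}")
print(f"stress after fitting:  {final_s:.6f}")
```

Because a step is accepted only when it reduces the criterion, the final value can never exceed the starting value. Like any local search the result can depend on the starting configuration, so trying several starts is a sensible precaution.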
[Table: multidimensional scaling. Dissimilarity data for pairs of 10 colas for a subject]

Researchers with data in the form of proximity matrices are generally interested in uncovering any structure or pattern they may contain and multidimensional scaling aims to help by representing the observed proximities as a spatial or geometrical model in which the distances between the points (usually taken to be Euclidean) correspond in some way to the observed proximities. In general, this simply means that the larger the dissimilarity (or the smaller the similarity), the further apart should be the points representing them in the final geometrical model. The required spatial model is defined by a set of $d$-dimensional points, each representing one of the stimuli of interest, and a measure of the distance between these points. The objective of multidimensional scaling is to determine both the dimensionality of the model (i.e. the value of $d$) and the values of the coordinates. The coordinates of the points in the model that represent the proximities can be found in a variety of ways. One simple approach is to choose the coordinate values (for a given value of $d$) to minimise $S$, defined as:

\[ S = \sum_{i<j} (\delta_{ij} - d_{ij})^2 \]

where $\delta_{ij}$ is the observed dissimilarity for stimuli $i$ and $j$, and $d_{ij}$ is the distance between the points representing stimuli $i$ and $j$. Since the distances $d_{ij}$ are a function of the coordinate values, so also is $S$. For various reasons, $S$ is not generally a suitable function for comparing distances and dissimilarities and full details of more suitable criteria can be found in, for example, Everitt and Rabe-Hesketh (1997). This also includes a discussion of how many dimensions are needed to give an adequate fit of the geometrical model to the observed proximities. An illustration of how multidimensional scaling has been used in a medical setting is provided in Cliff et al. (1995). Here a matrix of similarities is calculated in which each element is the number of months in which there were reported measles cases/deaths in both of a pair of areas. The greater the value of such a similarity, the greater the similarity of the time series of the occurrence of measles in the two areas. (In this study, the time series for each area considered consisted of monthly totals of measles cases for the 31-year period from January 1960 to December 1990.) The accompanying figure shows the multidimensional scaling solutions corresponding to a one-, two- and three-dimensional solution. BSE
Cliff, A. D., Haggett, P., Smallman-Raynor, M. R., Stroup, D. F. and Williamson, G. D. 1995: The application of multidimensional scaling methods to epidemiological data. Statistical Methods in Medical Research 4, 102-23.
Everitt, B. S. and Rabe-Hesketh, S. 1997: The analysis of proximity data. London: Arnold.

multilevel models Multilevel (also known as random
effect, hierarchical and mixed) models are an extensive and flexible class of models for correlated data, which arise widely in medical statistics. For example, adult height or weight may be correlated with those of other family members and the chance of post-surgical infection may be correlated with that of other patients with the same surgical team. Further, many studies involve the repeated measurement of subjects' outcomes throughout follow-up (LONGITUDINAL DATA). Such observations are usually quite strongly correlated. Multilevel models relax the assumption, required for ordinary least squares (OLS) regression, that each response is independent. They have their roots in agricultural experiments: indeed they embrace all the classical ANALYSIS OF VARIANCE models. They have found ready application in social science, medical and economic research. A brief history is given by Kreft and de Leeuw (1998, p. 16). The data structure is viewed as a series of levels (or hierarchies). For example, consider a multicentre trial where subjects' quantitative outcomes are recorded repeatedly over time. The first figure shows a possible data structure. Level 1 has the repeated observations that are nested within subjects at level 2. Subjects are in turn nested within centres at level 3. A multilevel analysis enables us to allow correctly for, and model, the CORRELATION induced by this structure. If we have longitudinal data, we can investigate how subjects change with time, which could be quite different to the cross-sectional relationship (Diggle et al., 2002, p. 16). In addition to the usual fixed parameters (whose interpretation is similar to their OLS counterparts), RANDOM EFFECTS are introduced to model the correlation structure, as described below. The mix of fixed and random effects gives rise to the term mixed models. Once alerted,
we see hierarchical structures everywhere: subjects within wards within counties, patients within hospitals within health authorities, and so on. Thus it is natural to ask what is gained by a multilevel model, and when such models are unnecessary.

[Figure: multidimensional scaling. MDS plots of USA regions in one-, two- and three-dimensional space. Data are from monthly time series of measles cases 1960-1990. Taken from Cliff et al., 1995]

[Figure: multilevel models. Data structure showing level 3, level 2 (subjects) and level 1]

First, OLS standard errors are wrong when the data are multilevel. For example, subjects within a cluster are similar to each other, i.e. not independent. They therefore convey less information about the value of a parameter than an independent (unclustered) sample of the same size (Goldstein, 2003, p. 23).
Second, OLS does not permit exploration of the variance structure. For example, we may wish to estimate the proportion of the total variance that is between subjects (the INTRACLUSTER CORRELATION COEFFICIENT (ICC), equation (1) below) or we may want to investigate how the variance changes as a function of covariates and time. Multilevel models will give an analysis that is effectively identical to an OLS analysis when the ICC is close to zero. However, it is wise to be cautious, as even a small ICC can have a nontrivial effect.
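A rough feel for how much a small ICC matters can be had from the usual variance inflation formula for clustered data, which is a standard result rather than something stated in this entry: with clusters of equal size m and intracluster correlation rho, the variance of an overall mean is 1 + (m - 1) rho times what it would be for the same number of independent observations. The sketch below applies it to a hypothetical study of 20 clusters of 50 subjects; the function names are illustrative.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation for a mean, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


# Hypothetical design: 20 clusters of 50 subjects with an ICC of only 0.02.
deff = design_effect(50, 0.02)
n_eff = effective_sample_size(20, 50, 0.02)

print(f"design effect              = {deff:.2f}")
print(f"effective sample size      = {n_eff:.0f} of 1000")
print(f"naive SE understated by    = {math.sqrt(deff):.2f} times")
```

In this made-up setting an ICC of 0.02 almost doubles the variance, so the 1000 clustered observations are worth roughly 505 independent ones.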
The plan of this article is as follows. First, the key ideas of multilevel models are outlined, followed by a discussion of commonly used algorithms for fitting multilevel models. Then extensions to discrete data are described and the relationship to GENERALISED ESTIMATING EQUATIONS (GEEs) is given. Medical applications and further extensions are discussed and then missing data, design and software. Some suggestions for further reading are given at the end of the entry.

[Figure: multilevel models. Schematic illustration of the random intercept model]

Consider the multicentre trial of the first figure. Focusing on levels 1 and 2, we begin by describing the simplest model, which allows for correlation between the observations, before outlining how more flexible models can be built up. The idea is to generalise OLS regression. An OLS model would have a single regression line relating the average response to time. A multilevel model, however, can be thought of as extending this to include a regression line for each subject. Thus, whereas in OLS regression the observations are distributed about a single regression line, in multilevel models we can view each subject's responses as distributed about their subject-specific regression line. The subject-specific regression lines are then distributed about the overall average regression line. This is illustrated in the second figure. Here, the overall average relationship between the response and time is given by the bold line, $Y = \alpha + \beta t$. Five subject-specific regression lines are shown, which are parallel to this. Each subject's observations are distributed about their regression line. Five examples of this are given in the top half of the figure. In this simple case, each subject's regression line is parallel to the overall average line. The distance between the $j$th subject-specific line, $Y = (\alpha + u_j) + \beta t$, and the average line $Y = \alpha + \beta t$ is $u_j$ (in the second figure, $j$ = 1, 2, 3, 4 or 5). These $u_j$ are known as the subject-specific random effects, also known as the level 2 residuals. They are assumed to be normally distributed about zero. The vertical distances between each subject's responses and their subject-specific regression line are known as the level 1 residuals. These are analogous to the residuals in OLS models and are likewise assumed to be normally distributed about zero. The normal densities of the level 2 and some level 1 residuals are shown in the second figure. In the top half, we see five observations, marked '+'. The vertical distances to their subject-specific regression line are five level 1 residuals. The NORMAL DISTRIBUTION of these residuals is illustrated by the five normal densities about the subject-specific regression lines. Then, on the left-hand side, the level 2 residuals, $u_1, \dots, u_5$, are shown. Their normal distribution is sketched on the left-hand side of the figure. The parameters $\alpha$ and $\beta$ in the figure are known as fixed parameters. They have a similar interpretation to their counterparts in the OLS models, so $\alpha$ is the average response at time zero and $\beta$ is the average change in response per unit change in time. However, we have two new parameters, known as random parameters, which are the variance of the level 2 residuals, called $\sigma_u^2$, and the variance of the level 1 residuals, called $\sigma_e^2$. Thus, in the second figure, the density of the $u_j$ is sketched on the left and has variance $\sigma_u^2$. The five densities of the level 1 residuals have common variance $\sigma_e^2$. Often, $\sigma_u^2$ is called the between-subject variance and $\sigma_e^2$ the within-subject variance. The second figure represents the simplest multilevel model. As each $u_j$ can be viewed as a random contribution to the intercept of the $j$th person's regression line, which is $(\alpha + u_j)$, this is often known as a RANDOM INTERCEPT MODEL. It is also
a simple example of a COMPONENTS OF VARIANCE model as there is a single variance term corresponding to each level in the model ($\sigma_u^2$ for level 2, $\sigma_e^2$ for level 1). The motivation for multilevel models was their ability to model the correlation structure of the data. We therefore consider the correlation structure implied by the random intercept model. To do this, we have to consider the variance of each observation and the COVARIANCE between observations. First, consider the variance. In multilevel models, the random component is the RESIDUALS. Residuals from different levels are always assumed to be independent. Likewise, residuals corresponding to different units within a level (i.e. different observations within level 1 and different subjects within level 2) are assumed to be independent. The total variance of each observation is thus the sum of the variance of the residuals at each level. Thus, in the random intercept model, where each observation has a residual at level 1 and level 2, the variance of an observation is $\sigma_u^2 + \sigma_e^2$. Second, consider the covariance. In the random intercept model of the second figure, different observations from the same subject, $j$, share a common random component, their level 2 residual $u_j$. Their covariance is therefore $\mathrm{Cov}(u_j, u_j) = E(u_j \cdot u_j) = \sigma_u^2$. However, observations from different subjects share no common residuals. Their covariance is therefore zero. Recalling that $\mathrm{Cor}(A, B) = \mathrm{Cov}(A, B)/\sqrt{\mathrm{Var}(A)\,\mathrm{Var}(B)}$, we see that the correlation structure implied by the random intercept model is:

\[
\rho =
\begin{cases}
1 & \text{same subject, same time} \\
\dfrac{\sigma_u^2}{\sqrt{(\sigma_u^2 + \sigma_e^2)(\sigma_u^2 + \sigma_e^2)}} = \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} & \text{same subject, different time} \\
0 & \text{different subjects}
\end{cases}
\tag{1}
\]

Thus the random intercept model of the second figure implies a fixed correlation, $\rho$, among a subject's responses, independent of how far apart in time they are. This is known as a compound symmetry or exchangeable correlation structure. The correlation, $\rho$, in equation (1) is also known as the intra level 2 unit correlation, or more commonly ICC: in random intercept models, $\rho$ measures the proportion of total variance which is between subjects. If $\rho = 0$, then observations are independent. Consider how the random intercept model illustrated in the second figure compares to fitting an OLS line to each subject in turn. Such OLS lines would be unbiased estimates of each subject's true line. However, they might be imprecisely estimated, particularly if a subject has few observations. Conversely, the estimate of the overall average line ($\alpha + \beta t$) is a precise, but biased, estimate of each subject's true line. Both extremes are undesirable. By fitting a multilevel model, we compromise between the two extremes. The estimates of the $u_j$ are known as 'best linear unbiased predictors' (BLUPs)
and, as their name suggests, have certain optimality properties (Verbeke and Molenberghs, 2000, p. 80). The practical effect is that the subject-specific regression lines estimated by the multilevel model are drawn (or shrunk) closer to the mean line than the OLS estimates, and the fewer the observations on a subject, the more their line is drawn towards (borrows strength from) the MEAN regression line. This is often referred to as shrinkage in the literature. Having fitted the random intercept model, we should examine the level 1 and level 2 residuals to check whether they are approximately normal, as the multilevel model assumes, and identify OUTLIERS. Level 2 residuals can also be used to distinguish outlying subjects: this has found wide application in medical settings. For most longitudinal data, the correlation between observations declines as the time between them increases. Thus the fixed correlation structure of the random intercept model, $\rho$, is insufficient. A natural extension is to allow subjects to have their own slopes as well as their own intercepts, as illustrated in the third figure. As before, the overall average regression line is $Y = \alpha + \beta t$. Now, however, the $j$th subject's regression line is $Y = (\alpha + u_j) + (\beta + v_j)t$. In the random intercept model the $u_j$ were normally distributed with mean 0 and variance $\sigma_u^2$. In the RANDOM INTERCEPT AND SLOPE MODEL $(u_j, v_j)$ have a BIVARIATE NORMAL DISTRIBUTION about (0, 0).
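The contrast between the two models, and the shrinkage just described, can be illustrated numerically. The sketch below rests on assumptions beyond the text: the names `var_v` (variance of the random slopes) and `cov_uv` (covariance of intercept and slope) are introduced here for illustration, the covariance formula follows from expanding Y = (alpha + u) + (beta + v) t + e with independent level 1 residuals, and the shrinkage weight n sigma_u^2 / (n sigma_u^2 + sigma_e^2) is the standard result for a random intercept model with known variance components. All numerical values are hypothetical.

```python
import math


def corr_random_intercept(var_u, var_e):
    """Correlation between two responses on one subject, as in equation (1)."""
    return var_u / (var_u + var_e)


def cov_random_slope(s, t, var_u, var_v, cov_uv, var_e):
    """Covariance of one subject's responses at times s and t."""
    shared = var_u + (s + t) * cov_uv + s * t * var_v
    return shared + (var_e if s == t else 0.0)       # level 1 variance only when s == t


def corr_random_slope(s, t, var_u, var_v, cov_uv, var_e):
    """Correlation implied by the random intercept and slope model."""
    args = (var_u, var_v, cov_uv, var_e)
    return cov_random_slope(s, t, *args) / math.sqrt(
        cov_random_slope(s, s, *args) * cov_random_slope(t, t, *args))


def shrinkage_factor(n_obs, var_u, var_e):
    """Weight on a subject's own mean deviation (random intercept model)."""
    return n_obs * var_u / (n_obs * var_u + var_e)


# Hypothetical variance components (illustration only).
var_u, var_v, cov_uv, var_e = 4.0, 1.0, 0.0, 2.0

rho = corr_random_intercept(var_u, var_e)
near = corr_random_slope(0, 1, var_u, var_v, cov_uv, var_e)
far = corr_random_slope(0, 4, var_u, var_v, cov_uv, var_e)
print(f"random intercept: correlation {rho:.3f} at every time separation")
print(f"random slope: correlation {near:.3f} (times 0 and 1), {far:.3f} (times 0 and 4)")
for t in (0, 1, 4):
    print(f"  variance at time {t}: {cov_random_slope(t, t, var_u, var_v, cov_uv, var_e):.0f}")
for n in (2, 20):
    print(f"shrinkage weight with {n} observations: {shrinkage_factor(n, var_u, var_e):.3f}")
```

With these values the random intercept correlation is fixed at 2/3, whereas under the random intercept and slope model the variance grows from 6 to 22 between times 0 and 4 and the correlation falls with separation. A subject with 2 observations keeps 80% of their own deviation from the mean line, one with 20 observations about 98%.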
[Figure: multilevel models. Schematic illustration of the random intercept and slope model]

As before, the level 1 residuals are the vertical distances between a subject's observations and their subject-specific regression line. The level 2 residuals are now $(u_j, v_j)$, so we have two level 2 residuals per subject, representing the random intercept and slope respectively. We can calculate the variance and covariance of observations in a similar way to the random intercept model, although the algebra is more involved. Then we can derive the correlation structure implied by this model. The variance of the responses is no longer constrained by the model to be constant: it can now increase with time. Further, the correlation between observations on the same subject can decline as the time between them increases. Hence this model is often more appropriate for longitudinal data. The way the random intercept and slope model builds on the random intercept model suggests many further extensions. To begin with, if we have additional covariates, they too can have random effects. For example, if we include a treatment variable, subject-specific treatment effects can be estimated. Levels can be added to the model to describe additional levels in the data. For example, the first figure shows that subjects are nested within centres. We can extend the random intercept model to include a random effect at the centre level. Such a model yields estimates of components of variance at each level (centre, subject and observation), so the proportion of the total variance between centres can be calculated. Further, the level 3 (centre) residuals can be examined to indicate outliers, and covariates can be given random centre-level terms as well as random subject-level terms. The level 1 variance (which is analogous to the residual variance in OLS models) can also be modelled by covariates: e.g. male level 1 residuals may be more variable than those from females. This is known as modelling complex variation. Sometimes the random intercept and slope model is not sufficiently flexible to model the correlation structure, particularly if observations are close together in time. Many options are possible: if subjects are observed at identical times then an attractive alternative is an unstructured COVARIANCE MATRIX, which imposes no parametric model on the covariance. Much has been written on this; see, for example, Verbeke and Molenberghs (2000, Chapter 16) and Diggle et al. (2002, Chapter 5). Multilevel models for quantitative data are typically based on the multivariate normal distribution. Thus, the likelihood of the data can be written down and maximised using adaptations of Newton-Raphson techniques (for details, see Raudenbush and Bryk, 2002, Chapter 14). Alternatively, a Bayesian approach can be adopted (see the chapter by Clayton, D.G. in Gilks, Richardson and Spiegelhalter, 1996). If likelihood methods are adopted, restricted maximum likelihood (REML) is usually used (Verbeke and Molenberghs, 2000, p. 43). This corrects the downward bias of maximum likelihood estimates of variance and entails negligible extra work computationally. However, changes in REML log-likelihoods cannot generally be used to compare nested models, so maximum likelihood may be preferred for model building (Goldstein, 2003, p. 36), although, in uncommon situations with many fixed parameters the two can give quite different answers (Verbeke and Molenberghs, 2000, p. 198). GENERALISED LINEAR MODELS (GLMs) extend OLS models to discrete responses. Analogously, generalised linear mixed
models (GLMMs), sometimes called nonlinear mixed models, extend multilevel models to discrete responses. As with GLMs, we model a function of the PROBABILITY that the response takes on a particular value. In GLMMs, however, for responses on the same subject, this probability shares a subject-specific term. For example, we can make the random intercept model, illustrated by the second figure, a GLMM by letting $Y$ follow a binomial distribution and writing the overall regression line as $\mathrm{logit}(\Pr\{Y = y\}) = \alpha + \beta t$. The subject-specific regression line for subject $j$ would be

\[ \mathrm{logit}(\Pr\{Y = y\}) = (\alpha + u_j) + \beta t \tag{2} \]

and, as before, the level 2 residuals, $u_j$, would be normally distributed about zero with variance $\sigma_u^2$. Note that, as in GLMs, in GLMMs the level 1 variance is a fixed function of the mean. Therefore, there is no term corresponding to $\sigma_e^2$. Also, as with GLMs, the function 'logit' in equation (2) is known as the link function. Alternative link functions (e.g. log, inverse normal) can be used together with other probability models such as the Poisson or negative binomial. Unfortunately, fitting GLMMs is not nearly as straightforward as fitting multilevel models to quantitative data, because the LIKELIHOOD is much more difficult to compute. Three approaches, all discussed by Goldstein (2003), are commonly adopted. The first approach is QUASI-LIKELIHOOD. There are two forms of this, penalised quasi-likelihood and marginalised quasi-likelihood. Both methods rely on approximations, which can be made to first or second order. The approximations involved mean that quasi-likelihood methods provide biased parameter estimates: in particular, estimates of variance components tend to be downwardly biased. This bias is
most marked in data sets with few level 1 units per level 2 unit or probabilities close to boundaries (e.g. 1 or 0 for binary data). The bias is least for second-order penalised quasi-likelihood. Another drawback is that, with quasi-likelihood, no estimate of the log-likelihood is available for comparing models. The second approach relies on numerical or Monte Carlo integration methods. This is computationally considerably more intensive if several random effects or levels are involved. Nevertheless, it is becoming increasingly feasible. An additional advantage is that these methods provide an estimate of the log-likelihood, which can be used for hypothesis testing and interval estimation. The third approach is to adopt a Bayesian formulation with uninformative priors. Many common models are implemented in MLwiN (Rasbash et al., 2000) and several models are described in the WinBUGS manual (Spiegelhalter, Thomas and Best, 1999). Note that these methods can be extended to provide multilevel versions of more general multinomial models (see the chapter by Yang, M. in Leyland and Goldstein, 2001). Multilevel models are likelihood based. An alternative class of methods, known as GENERALISED ESTIMATING EQUATIONS (GEEs), can also be used for multilevel data (see Diggle et al., 2002, Chapter 11). GEEs model the mean and variance of the data only; unlike multilevel models, a PROBABILITY DISTRIBUTION for the data (e.g. normal) is not specified. Standard errors are often estimated robustly from the sampling variance of the residuals, using the Huber-White sandwich estimator (see HUBER-WHITE ESTIMATOR) (Diggle et al., 2002, p. 80). A theoretical advantage of GEEs is that the fixed parameter estimates are consistent (i.e. reliable if there is sufficient data) even if the covariance structure is wrongly specified; however, they may be inefficient if the covariance structure is substantially misspecified (Goldstein, 2003, p. 21).
The drawback is that variance components are not explicitly modelled, but are treated as nuisance parameters, whereas from the multilevel modelling perspective the variance components contain useful insights. This is directly related to an important, but subtle, difference between the two. Fixed parameter estimates from multilevel models estimate the effect of a covariate on a subject conditional on the value of their subject-specific effects. GEE parameter estimates are marginalised over subject-specific effects: they estimate the average effect of a covariate over the population the data are drawn from. For multilevel models for quantitative data, conditional and marginal estimates of fixed parameters coincide. For discrete data they do not; often marginal estimates are markedly smaller in magnitude (compare Tables 11.1 and 8.2 in Diggle et al., 2002).
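The difference between conditional and marginal effects for discrete data can be checked numerically for the random intercept logistic model of equation (2). The sketch below averages the subject-specific probability over the normal distribution of the random intercept with a simple grid rule and compares the resulting population-averaged log odds ratio per unit of time with the conditional coefficient. The parameter values (alpha = 0, beta = 1, standard deviation of the random intercept 2) are hypothetical, and the grid integration in `marginal_probability` is a crude stand-in for the quadrature a real fitting routine would use.

```python
import math


def expit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_probability(linear_predictor, sd_u, grid=4001, width=8.0):
    """Average Pr(Y = 1) over u ~ N(0, sd_u^2) using an equally spaced grid."""
    step = 2.0 * width / (grid - 1)
    total = 0.0
    norm = 0.0
    for k in range(grid):
        z = -width + k * step                          # standardised random effect
        weight = math.exp(-0.5 * z * z)                # unnormalised normal density
        total += weight * expit(linear_predictor + sd_u * z)
        norm += weight
    return total / norm


# Hypothetical parameter values (illustration only).
alpha, beta, sd_u = 0.0, 1.0, 2.0

p0 = marginal_probability(alpha, sd_u)                 # population average at t = 0
p1 = marginal_probability(alpha + beta, sd_u)          # population average at t = 1
marginal_beta = logit(p1) - logit(p0)

print(f"conditional (subject-specific) log odds ratio: {beta:.3f}")
print(f"marginal (population-averaged) log odds ratio: {marginal_beta:.3f}")
```

The marginal value comes out clearly below the conditional coefficient of 1, and the gap widens as the random intercept variance grows. Setting `sd_u` to zero makes the two coincide.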
The appropriate approach depends on the scientific question. If the primary aim is to model the average response as a function of covariates and time, and the correlation is a nuisance, GEEs may be preferred. The resulting parameter estimates are often known as population averaged. Conversely, if understanding of the variance structure is important, e.g. in investigating determinants of variation in growth rates, multilevel models are required. A complication with conditional models is that, because the interpretation of the fixed parameters is conditional on the variance model, if this is changed the interpretation is generally altered. The literature on medical applications of multilevel models is vast and growing. A good starting point is the collection of papers in Leyland and Goldstein (2001), which includes models for growth data, spatial distribution of mortality and morbidity, and institutional comparisons. The latter is an important and widespread application of multilevel models. Applications to META-ANALYSIS are discussed by Hardy and Thompson (1996) (quantitative data) and Turner et al. (2000) (binary data), CROSSOVER TRIALS by Jones and Kenward (2003, Chapters 5 and 6) and CLUSTER RANDOMISED TRIALS by Donner and Klar (2000). So far, we have assumed that each subject at each time only has one response. However, the covariance model readily extends to allow multivariate responses at each time. For example, a subject's diastolic and systolic blood pressure can be modelled simultaneously (see the chapter by McLeod, A. in Leyland and Goldstein, 2001). The multilevel framework can also be extended to handle time-to-event data, with subjects having repeated events and a common frailty (the commonly adopted term for a subject-specific random effect in survival analysis). Indeed, frailties at different levels of the hierarchy can be fitted (Singer and Willett, 2002). Another extension is what is termed 'cross-classified' data.
Here subjects are members of more than one hierarchy. For example, subjects may be nested within general practices and health authorities, but may also be nested within distinct neighbourhoods, served by a number of general practices. They therefore belong to more than one hierarchy. Parameter estimation is no longer always straightforward (Goldstein, 2003, Chapter 11). Frequently in studies involving longitudinal follow-up, a proportion of the intended responses will be unobserved. An important advantage of multilevel models over classical techniques is that a complete set of observations on each subject included in the analysis is not required; subjects can still be included in the analysis with partially observed response data. Further, if subjects are missing responses, or drop out, then provided that, given their observed data, the reason for the dropout does not depend on the unseen responses (the MISSING AT RANDOM, MAR, assumption), parameter estimates from
multilevel models are still valid (Little and Rubin, 2002). Thus, if data are analysed using multilevel models and response data are MAR, ad hoc IMPUTATION techniques such as replacing a missing observation by the previous seen observation (LOCF) are not required; indeed they will generally introduce BIAS (Molenberghs et al., 2004). Sensitivity to MAR can be assessed; see, for example, Carpenter, Pocock and Lamm (2002). However, parameter estimates from GEEs are not valid under MAR: to guarantee their validity a stronger assumption, missing completely at random (MCAR), is required. This assumption states that the reason for a subject's unobserved data is independent of both their observed and unobserved data. Although GEEs can be modified to cope with MAR data, to do this efficiently requires a nontrivial multistage estimation process. The above does not apply to missing covariate information: if a nontrivial degree of covariate information is missing, it usually needs to be recovered using appropriate data imputation methods. The generality of multilevel models means that the distribution of many test statistics is only known under the null hypothesis, so simulation often has to be employed in sample size calculations. Simplifying assumptions enable progress in special cases (Diggle et al., 2002, p. 24). As multilevel models become more mainstream, software to fit the basic models for quantitative data is becoming increasingly available in standard packages. All the models described here can be fitted using MLwiN (Rasbash et al., 2000); many can be fitted using PROCs MIXED, GLIMMIX and NLMIXED in SAS version 8.x (SAS v. 8.1, 2002). For very large datasets, SAS is preferable. A comprehensive review of the capabilities of available packages is given on the MLwiN website, www.mlwin.com. Bayesian model fitting can be performed with the WinBUGS package (Spiegelhalter et al., 1999).
This is very flexible, but the user is required to write the program to fit the model, and a degree of knowledge about MARKOV CHAIN MONTE CARLO methods is required. Newcomers to multilevel modelling should start with one of the many excellent books now available. The least technical of these is Kreft and de Leeuw (1998), which gives a basic introduction to models for quantitative data, from a social science perspective. The software MLwiN (Rasbash et al., 2000) also comes with an accessible manual and many examples. Raudenbush and Bryk (2002) give a much more extensive treatment, including discrete response models, with many detailed social science examples. For the methodologically inclined, Verbeke and Molenberghs (2000) give a comprehensive overview for quantitative data from a longitudinal perspective. Examples are analysed in detail using mostly SAS (SAS v. 8.1, 2002) and some
MLwiN (Rasbash et al., 2000) and SPLUS (SPLUS v. 6, 2003). The latter half is given over to problems with missing data. Less detailed but more general is Diggle et al. (2002), who also come from a longitudinal standpoint, and discuss quantitative and discrete data, multilevel models, GEEs and transition models. Most recently, Goldstein (2003) gives a comprehensive account of the current state of multilevel modelling, including outlines of technical details and many illustrative examples. JRC
[Acknowledgement: James R. Carpenter was supported by ESRC Research Methods Programme grant H333250047, titled 'Missing data in multi-level models'.]
Carpenter, J., Pocock, S. and Lamm, C. J. 2002: Coping with missing data in clinical trials: a model based approach applied to asthma trials. Statistics in Medicine 21, 1043-66.
Diggle, P. J., Heagerty, P., Liang, K. Y. and Zeger, S. L. 2002: Analysis of longitudinal data, 2nd edition. Oxford: Oxford University Press.
Donner, A. and Klar, N. 2000: Design and analysis of cluster randomization trials in health research. London: Arnold.
Gilks, W. R., Richardson, S. and Spiegelhalter, D. J. (eds) 1996: Markov chain Monte Carlo in practice. London: Chapman & Hall.
Goldstein, H. 2003: Multilevel statistical models, 2nd edition. London: Arnold.
Hardy, R. J. and Thompson, S. G. 1996: A likelihood approach to meta-analysis with random effects. Statistics in Medicine 15, 619-29.
Jones, B. and Kenward, M. G. 2003: Design and analysis of cross-over trials, 2nd edition. London: Chapman & Hall.
Kreft, I. and de Leeuw, J. 1998: Introducing multilevel modelling. London: Sage.
Leyland, A. H. and Goldstein, H. (eds) 2001: Multilevel modelling of health statistics. Chichester: John Wiley & Sons, Ltd.
Little, R. J. A. and Rubin, D. B. 2002: Statistical analysis with missing data, 2nd edition. Chichester: John Wiley & Sons, Ltd.
Molenberghs, G., Thijs, H., Jansen, I., Beunckens, C., Kenward, M. G., Mallinckrodt, C. and Carroll, R. J. 2004: Analyzing incomplete longitudinal clinical trial data. Biostatistics 5, 445-64.
Rasbash, J., Browne, W., Goldstein, H., Yang, M., Plewis, I., Healy, M., Woodhouse, G., Draper, D., Langford, I. and Lewis, T. 2000: A user's guide to MLwiN (version 2.1). London: Institute of Education.
Raudenbush, S. W. and Bryk, A. S. 2002: Hierarchical linear models: applications and data analysis methods, 2nd edition. London: Sage.
SAS v. 8.1 2002: SAS Worldwide Headquarters, SAS Campus Drive, Cary, NC 27513-2414, USA. www.sas.com.
Singer, J. D. and Willett, J. B. 2002: Applied longitudinal data analysis: modelling change and event occurrence. New York: Oxford University Press.
Spiegelhalter, D. J., Thomas, A. and Best, N. G. 1999: WinBUGS version 1.2 user manual. Cambridge: MRC Biostatistics Unit.
SPLUS v. 6 2003: Insightful Switzerland, Christoph Merian-Ring 11, 4153 Reinach, Switzerland.
Turner, R. M., Omar, R. Z., Yang, M., Goldstein, H. and Thompson, S. G. 2000: A multilevel model framework for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine 19, 3417-32.
Verbeke, G. and Molenberghs, G. 2000: Linear mixed models for longitudinal data. New York: Springer Verlag.
multinomial distribution This is a generalisation of the BINOMIAL DISTRIBUTION to the case where more than two outcomes are possible for every 'trial'. Whereas the binomial distribution addresses the number of successes (and thus implicitly the number of failures also) in the case where every event or trial can only result in a success or a failure, the multinomial distribution models the numbers of each outcome in the case where each event or trial can have one of multiple outcomes. For example, Lossos et al. (2000) note that, when modelling genetic mutations in a situation with four rather than two distinct genotypes, it is necessary to extend the usual binomial model to a multinomial one. In general, for $n$ observations, each of which can independently take one of $N$ mutually exclusive outcomes with probabilities $p_1, p_2, \ldots, p_N$ (where $p_1 + p_2 + \cdots + p_N = 1$), the PROBABILITY of seeing $x_1$ observations achieving outcome 1, $x_2$ observations achieving outcome 2, etc., where $x_1 + x_2 + \cdots + x_N = n$, is given by:

\[ \Pr(x_1, x_2, \ldots, x_N) = \frac{n!}{x_1!\, x_2! \cdots x_N!}\; p_1^{x_1} p_2^{x_2} \cdots p_N^{x_N} \]
where $n!$ (factorial $n$) is the product of all the integers up to and including $n$, namely $n \times (n-1) \times (n-2) \times \cdots \times 3 \times 2 \times 1$, with $0!$ defined to be 1. Note that since the data are multidimensional, there is no single mean value of the distribution as such, although (as for the binomial distribution) the expected number to be seen with outcome $k$ is $p_k n$. AGL
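As a small worked illustration of the multinomial probability and of the expected counts $p_k n$, the following sketch evaluates both in Python using only the standard library. The function names and the example of four equally likely genotypes are hypothetical and chosen purely for illustration.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` under a multinomial with cell probabilities `probs`."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! x_2! ... x_N!), kept as an exact integer.
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    probability = float(coefficient)
    for x, p in zip(counts, probs):
        probability *= p ** x
    return probability


def expected_counts(n, probs):
    """Expected number of observations in each outcome, p_k * n."""
    return [p * n for p in probs]


# Hypothetical example: four equally likely genotypes, four observations,
# each genotype seen exactly once: 4!/(1!1!1!1!) * 0.25**4 = 24/256.
print(multinomial_pmf([1, 1, 1, 1], [0.25, 0.25, 0.25, 0.25]))  # 0.09375
print(expected_counts(4, [0.25, 0.25, 0.25, 0.25]))             # [1.0, 1.0, 1.0, 1.0]
```

With only two outcomes the function reduces to the binomial probability; for example, counts [2, 1] with probabilities [0.5, 0.5] give 3 x 0.125 = 0.375.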
Lossos, I. S., Tibshirani, R., Narasimhan, B. and Levy, R. 2000: The inference of antigen selection on Ig genes. Journal of Immunology 165, 5122-6.
multiple comparisons Procedures for a detailed examination of where differences between a set of MEANS lie, usually applied after a significant F-test in an ANALYSIS OF VARIANCE has led to the rejection of the hypothesis that all the means are equal. A large number of multiple comparison techniques have been proposed but no single technique is best in all situations. The major distinction between the techniques is how they control the inflation of the TYPE I ERROR that would occur if, for example, a simple STUDENT'S t-TEST was applied to test the equality of each pair of means. One very simple procedure for dealing with the inflation is to judge the P-values from each t-test against a significance level of $\alpha/m$ rather than $\alpha$, the nominal size of the Type I error, where $m$ is the number of t-tests performed - this is known as the BONFERRONI CORRECTION. Many alternative approaches are available, most of which are based on the usual t-statistic, but which differ in the choice of critical value against which the t-statistic is compared. A comprehensive account of multiple comparison procedures is given in Hsu (1996). BSE
Hsu, J. C. 1996: Multiple comparisons. London: Chapman & Hall.
multiple correlation coefficient
See CORRELATION
multiple imputation This is a method by which missing values in a dataset are replaced by more than one, usually between 3 and 10, simulated versions. Each of the simulated complete datasets is then analysed by the method relevant to the investigation to hand and the results combined to produce estimates, STANDARD ERRORS and CONFIDENCE INTERVALS that incorporate missing data uncertainty. Introducing appropriate random error into the imputation process makes it possible to get approximately unbiased estimates of all parameters, although the data must be missing at random for this to be the case. The multiple imputations themselves are created by a Bayesian approach (see BAYESIAN METHODS), which requires specification of a parametric model for the complete data and, if necessary, a model for the mechanism by which data become missing. A comprehensive account of multiple imputation and details of associated software are given in Schafer (1997). BSE
[See also DROPOUTS]
Schafer, J. 1997: The analysis of incomplete multivariate data. Boca Raton: CRC/Chapman & Hall.
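The entry above does not spell out how the results from the imputed datasets are combined. One widely used scheme is commonly attributed to Rubin ('Rubin's rules'): the pooled estimate is the mean of the per-dataset estimates, and its variance is the average within-imputation variance plus the between-imputation variance inflated by a factor (1 + 1/m). The sketch below assumes that scheme; the function name and the numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_estimates(estimates, standard_errors):
    """Combine m per-imputation estimates and standard errors (Rubin's rules).

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least two imputations with matching standard errors")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])  # average sampling variance
    between = variance(estimates)                       # variance across imputations (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical results from m = 3 imputed datasets.
estimate, standard_error = pool_estimates([1.0, 1.2, 0.8], [0.5, 0.5, 0.5])
print(round(estimate, 3), round(standard_error, 3))
```

The between-imputation term is what carries the missing data uncertainty: if every imputed dataset gave the same estimate, the pooled standard error would equal the ordinary complete-data one.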
multiple linear regression
This is a technique used to model, or characterise quantitatively, the relationship between a response variable, $y$, and a set of explanatory variables, $x_1, x_2, \ldots, x_q$. The explanatory variables are strictly assumed to be known or under the control of the investigator, i.e. they are not considered to be random variables. In practice, where this is rarely the case, the results from a multiple regression analysis are interpreted as being conditional on the observed values of the explanatory variables. The multiple regression model can be written as:

\[ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q + \varepsilon \]

where $\beta_0$ is an intercept and $\beta_1, \beta_2, \ldots, \beta_q$ are regression coefficients that measure the change in the response variable associated with a unit change in the corresponding explanatory variable, conditional on the other explanatory variables remaining constant. If the explanatory variables are highly correlated such an interpretation is problematic. The residual, $\varepsilon$, is assumed to have a normal distribution with MEAN zero and VARIANCE $\sigma^2$. An alternative way of writing the multiple regression model is that $y$ is distributed normally with mean $\mu$ and variance $\sigma^2$, where $\mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q$. This formulation makes it clear that the model is only
suitable for continuous response variables with, conditional on the values of the explanatory variables, a NORMAL DISTRIBUTION with constant variance. For a sample of $n$ response values along with the corresponding values of the explanatory variables, the aim of multiple regression is to arrive at a set of values for the regression coefficients that make the values of the response variable predicted from the model as 'close' as possible to the observed response values. Estimation of the parameters of the model ($\beta_1, \beta_2, \ldots, \beta_q$) is usually by least squares (see Rawlings, Pantula and Dickey, 1998). The variation in the response variable can be partitioned into a part due to regression on the explanatory variables and a residual. This partition can be set out in an ANALYSIS OF VARIANCE-type table as shown.
multiple linear regression An analysis of variance-type table

Source       DoF          SS      MS                  MSR
Regression   q            RGSS    RGSS/q              RGMS/RSMS
Residual     n - q - 1    RSS     RSS/(n - q - 1)
Under the null hypothesis that all the regression coefficients, $\beta_1, \beta_2, \ldots, \beta_q$, are zero, the mean square ratio (MSR) in this table can be tested against an F-DISTRIBUTION with $q$ and $n - q - 1$ DEGREES OF FREEDOM. The residual mean square is an estimator of $\sigma^2$ and is used in calculating STANDARD ERRORS of the estimated regression coefficients (see Rawlings, Pantula and Dickey, 1998). The MULTIPLE CORRELATION COEFFICIENT is the correlation between the observed values of the response and the values predicted by the model. The square of the multiple correlation coefficient gives the proportion of the variance of the response that can be explained by the explanatory variables. The overall test that all the regression coefficients in a multiple regression model are zero is seldom of great interest. The investigator is more likely to be concerned with assessing whether some subset of the explanatory variables might be almost as successful as the full set in explaining the variation in the response variable; i.e. a more parsimonious model is sought. Various procedures have been suggested to help in this search (see ALL SUBSETS REGRESSION, AUTOMATIC SELECTION PROCEDURES).
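The least squares fit and the analysis of variance-type partition described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library and the normal equations; the toy data, the function names and the dictionary keys are hypothetical, and in practice a statistical package would be used instead.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_multiple_regression(x_rows, y):
    """Least squares fit with an intercept, plus the ANOVA-type quantities."""
    n, q = len(y), len(x_rows[0])
    design = [[1.0] + list(row) for row in x_rows]  # column of ones for the intercept
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(q + 1)] for i in range(q + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(q + 1)]
    beta = solve_linear_system(xtx, xty)            # normal equations X'X beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
    rgss = sum((fi - y_bar) ** 2 for fi in fitted)           # regression sum of squares
    return {
        "beta": beta,
        "rgss": rgss,
        "rss": rss,
        "msr": (rgss / q) / (rss / (n - q - 1)),  # compare with F on q and n - q - 1 DoF
        "r_squared": rgss / (rgss + rss),         # squared multiple correlation coefficient
    }


# Hypothetical toy data: two explanatory variables, six observations.
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [4.1, 5.4, 9.05, 10.45, 14.1, 15.4]
fit = fit_multiple_regression(x_rows, y)
print([round(b, 3) for b in fit["beta"]], round(fit["r_squared"], 4))
```

Solving the normal equations directly is adequate for a small, well-conditioned example such as this one; numerically more stable decompositions are generally preferred when explanatory variables are highly correlated.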
Once a final model has been settled on, the assumptions of the multiple linear regression approach for the data to hand need to be checked. One way to investigate the possible failings of a model is to examine what are known as residuals, defined as:

Residual = observed response value - predicted response value
The $n$ sample residuals can be plotted in a variety of ways to assess particular assumptions of the multiple regression model: a HISTOGRAM or STEM-AND-LEAF PLOT of the residuals can be useful in checking for normality of the error terms in the model; plots of the residuals against the corresponding values of each explanatory variable may help to uncover when the relationship between the response and an explanatory variable is more complex than that originally assumed - it may suggest that a quadratic term is needed to model a 'U-shape' or 'J-shape' apparent relationship; a plot of the residuals against the fitted values may identify, for example, multivariate OUTLIERS that are worthy of further investigation and checking, or perhaps that the variance of the response increases with the fitted values, suggesting that a transformation of the response should be considered. There are now many other regression diagnostics available (see, for example, Lovie, 1991). To illustrate multiple regression we shall use the data shown in the second table. These data arise from a study of 20 patients with hypertension (Daniel, 1995). In practice, of course, there would be too few patients to allow a sensible analysis with six explanatory variables. The response variable here is the mean arterial blood pressure (mmHg).
multiple linear regression Data for 20 patients with hypertension

Patient   BP    Age   Weight   SA     TimeHt   Pulse   Stress
 1        105   47     85.4    1.75    5.1     63      33
 2        115   49     94.2    2.10    3.8     70      14
 3        116   49     95.3    1.98    8.2     72      10
 4        117   50     94.7    2.01    5.8     73      99
 5        112   51     89.4    1.89    7.0     72      95
 6        121   48     99.5    2.25    9.3     71      10
 7        121   49     99.8    2.25    2.5     69      42
 8        110   47     90.9    1.90    6.2     66       8
 9        110   49     89.2    1.83    7.1     69      62
10        114   48     92.7    2.07    5.6     64      35
11        114   47     94.4    2.07    5.3     74      90
12        115   49     94.1    1.98    5.6     71      21
13        114   50     91.6    2.05   10.2     68      47
14        106   45     87.1    1.92    5.6     67      80
15        125   52    101.3    2.19   10.0     76      98
16        114   46     94.5    1.98    7.4     69      95
17        106   46     87.0    1.87    3.6     62      18
18        113   46     94.5    1.90    4.3     70      12
19        110   48     90.5    1.88    9.0     71      99
20        122   56     95.7    2.07    7.0     75      99

BP: Mean arterial blood pressure (mmHg); Age: Age in years; Weight: Weight in kg; SA: Body surface area (square metres); TimeHt: Duration of hypertension (years); Pulse: Basal pulse (beats/min); Stress: Measure of stress.
multiple linear regression Results for data in the second table

Term          Estimated regression coefficient   Standard error   T-value    p-value
(Intercept)   -12.8705                           2.5566           -5.0341    0.0002
Age             0.7033                           0.0496           14.1770    0.0000
Weight          0.9699                           0.0631           15.3691    0.0000
SA              3.7765                           1.5802            2.3900    0.0327
TimeHt          0.0684                           0.0484            1.4117    0.1815
Pulse          -0.0845                           0.0516           -1.6370    0.1256
Stress          0.0056                           0.0034            1.6328    0.1265

Residual standard error: 0.4072 on 13 degrees of freedom; Multiple R-Squared: 0.9962.

The LEAST SQUARES ESTIMATES of the regression parameters are shown in the third table. The square of the multiple correlation coefficient is 0.99 and the mean squares ratio described above takes the value 560.6; tested against an F-distribution with 6 and 13 degrees of freedom the associated P-VALUE is extremely small. Clearly, the hypothesis that all the regression coefficients are zero can be safely rejected. For these data the sample size is too small for residual plots to be particularly informative. However, for interest, the figure shows a plot of residuals against fitted values. The plot gives no cause for concern in respect of the constant variance assumption. BSE

[Figure: residuals (vertical axis, approximately -0.8 to 0.4) plotted against fitted values of response (horizontal axis, 105 to 125)]
multiple linear regression Residuals plotted against fitted values
[See also GENERALISED LINEAR MODELS, LOGISTIC REGRESSION]
Daniel, W. 1995: Biostatistics: a foundation for analysis in the health sciences, 6th edition. New York: John Wiley & Sons, Inc.
Lovie, P. 1991: Regression diagnostics. In Lovie, P. and Lovie, A. D. (eds), New developments in statistics for psychology and the social sciences. London: Routledge.
Rawlings, J. O., Pantula, S. G. and Dickey, D. A. 1998: Applied regression analysis: a research tool. New York: Springer.

multiple record systems
See CAPTURE-RECAPTURE METHODS
multiple testing
This refers to carrying out multiple (more than one, but possibly very many) statistical SIGNIFICANCE TESTS. The problem is one of not controlling the overall TYPE I ERROR rate when we perform many significance tests. The Type I error is the probability of falsely rejecting the null hypothesis ($H_0$) when it is actually true. If we compare two treatments in a CLINICAL TRIAL, we generally state the null hypothesis to be that there is no difference in mean response (or in death rates, or cure rates, etc.) between the two treatments. This is not a statement about the data that we see in the trial (the sample means, $\bar{x}_i$) but rather one about the true (but unknown) population means, $\mu_i$. The alternative ($H_1$) is simply the converse - i.e. that there is a difference between the treatments. Now, usually (although it is a very arbitrary yardstick), we reject $H_0$ and declare that a difference between treatments exists if the calculated P-VALUE is less than 5%. So for any single significance test, if the null hypothesis is true, implicitly we are accepting a risk of being wrong of 5% - and for many situations, many people consider that an adequately small risk. However, what happens if we perform more than one significance test to answer the same (or related) questions and we are prepared to reject the null hypothesis if either (or both) tests give P < 0.05? In the simplest case of two independent tests, the PROBABILITY of either test 1 or test 2 (or, indeed, both tests) giving us a small (say, <5%) P-value is $1 - 0.95^2$, which equals 0.0975 (or close to 10%). If we carried out three, four, five or even ten independent significance tests, the probabilities that at least one of them will give a small (<5%) P-value would be, respectively, 0.143, 0.185, 0.226 and 0.401. These are the FALSE POSITIVE error rates and it is apparent that, very quickly, the risk of erroneously declaring a statistically significant difference between the treatments (i.e.
when all of the null hypotheses are true) becomes much (and unacceptably) higher than 5%. Therefore, we need methods to correct for this inflated Type I error rate. The simplest method to use is the BONFERRONI CORRECTION. Using this very simplistic line of reasoning, if we are to carry out two significance tests but want to ensure that the overall chance of making a false positive error is kept at 5%, then we should test each of the two null hypotheses at the 2.5% level. Then the probability of either test 1 or test 2 (or both tests) giving us a small (now 'small' means less than 2.5%) P-value is $1 - 0.975^2$, which is close to 0.05. So if either or both tests meet this more stringent level of statistical significance, we can reject the null hypothesis and only run a 5% risk of making a false positive claim
about a difference between the treatments. This idea extends very easily to more than two tests: for three we compare each calculated value of P with 0.05/3 (= 0.0167), for four tests we compare each calculated value of P with 0.05/4 (= 0.0125), and so on. There are many reasons why multiple testing (particularly in CLINICAL TRIALS - but in other medical applications too) might occur, even in studies that just compare two treatments. Examples include: many ENDPOINTS, more than one time point (e.g. short-term response and long-term response), more than one analysis method (e.g. using a NONPARAMETRIC METHOD and its equivalent parametric test), different definitions of response to treatment (e.g. difference in means, or mean changes from baseline, or percentage change from baseline, or proportion of 'responders', etc.), differences within (and between) various subgroups (see SUBGROUP ANALYSIS), multiple INTERIM ANALYSES, and so on. In studies comparing more than two treatments, there are all these same possible examples of multiplicity in addition to those of many comparisons between all the different treatment arms. It is therefore quite clear that very quickly the simple scenario of one significance test of one primary endpoint can become unrealistic. The Bonferroni method is simple but also very inefficient: it lacks statistical POWER to identify real differences between treatments (see, for example, Perneger, 1998). It can also lead to the very uncomfortable situation of two tests giving P-values of, say, 0.03 and 0.04. At first sight, these both appear to demonstrate a statistically significant difference between the groups; they are both less than 0.05. However, because there are two tests, Bonferroni says we need to compare each of them against the more stringent criterion of 0.025, but now neither meets this level of stringency! Hence, various (in fact,
numerous) other methods have been proposed - some are simple (albeit not as simple as Bonferroni) and some are very complex. The details are beyond the scope of this text although a good overview is given by, for example, Hochberg and Tamhane (1987). We will illustrate ideas using two simple methods. One is by Holm (1979). He proposed ordering all (of $k$) calculated P-values $P_1$ (the largest) to $P_k$ (the smallest). The smallest calculated P-value ($P_k$) should be compared with $\alpha/k$; if $P_k < \alpha/k$ then we declare this difference to be statistically significant and move to the next smallest of the P-values, $P_{k-1}$. If $P_{k-1} < \alpha/(k-1)$ then we declare this difference also to be statistically significant. The procedure continues until all of the calculated P-values have been tested, or until one of the P-values fails to meet the criterion for statistical significance. In this case, the procedure stops and no more of the tests are considered significant. Another method is called the 'closed testing' approach; it is best described by an example. If we wanted to compare two doses of an active drug with PLACEBO (or some other reference
product) then it might seem that we need to carry out each test at a 2.5% significance level. However, assuming we believe that the dose-response relationship will be monotonically increasing then we can begin by just testing the highest dose against the reference at the standard 5% level. If the P-value is smaller than 5%, then the difference is declared statistically significant and the next dose (and its P-value) are considered. This P-value is also compared to the 5% level and, if smaller than this, is declared as significant. The procedure could continue if there were several doses. Each test is carried out at the 5% significance level until one fails to meet it - then all testing stops and none of the other tests is considered significant. This is a much more powerful procedure than Bonferroni's although it has a major problem if a treatment turns out to have no effect on the first test (in this case the highest dose) but may have substantial effects at lower doses: none of the secondary tests can be considered 'significant' because the very first test (for the highest dose) failed. These two approaches (and there are many others) illustrate that there is no simple, single approach to solving multiplicity problems that is applicable in all situations. SD
Hochberg, Y. and Tamhane, A. C. 1987: Multiple comparison procedures. New York: John Wiley & Sons, Inc.
Holm, S. 1979: A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 65-70.
Perneger, T. V. 1998: What's wrong with Bonferroni adjustments? British Medical Journal 316, 1236-8.
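The arithmetic in the multiple testing entry above, together with the Bonferroni and Holm procedures it describes, can be checked with a short Python sketch. The function names are hypothetical; the strict inequalities follow the wording of the entry, and the family-wise calculation assumes independent tests, as the entry does.

```python
def familywise_error_rate(k, alpha=0.05):
    """Probability that at least one of k independent tests gives P < alpha
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_reject(p_values, alpha=0.05):
    """Compare every P-value with alpha / m, where m is the number of tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure: smallest P against alpha/k, next smallest
    against alpha/(k - 1), and so on, stopping at the first failure."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for rank, index in enumerate(order):
        if p_values[index] < alpha / (k - rank):
            reject[index] = True
        else:
            break  # this and all larger P-values are not significant
    return reject


for k in (2, 3, 4, 5, 10):
    print(k, round(familywise_error_rate(k), 4))

print(bonferroni_reject([0.03, 0.04]))  # [False, False]
print(holm_reject([0.03, 0.04]))        # [False, False]
print(bonferroni_reject([0.01, 0.04]))  # [True, False]
print(holm_reject([0.01, 0.04]))        # [True, True]
```

The last two lines show why Holm's method is never less powerful than Bonferroni's: once the smallest P-value has passed its threshold, the remaining P-values are judged against progressively less stringent levels.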
multistage cluster sample A nonprobabilistic method of sampling is used when members of a population are arranged in subgroups or clusters. In this method clusters are the sampling unit rather than individuals. Members within a cluster should be as different as possible whereas clusters, by way of contrast, should be as alike as possible. However, this condition is hard to satisfy and since two members of a cluster will be more alike than two from different clusters, it is better to have many small clusters than a few large clusters, as this reduces sampling error. Each cluster should be similar to the total population but on a smaller scale. Clusters must be distinct from each other and every member of the population should fall within a cluster. In some situations, it is necessary for all clusters to be of a similar size and this may require the pooling of some clusters. Otherwise, the PROBABILITY that a cluster is chosen can be made proportional to its size, so that bigger clusters are more likely to be chosen than smaller clusters and the probability of selecting an individual member of the cluster is inversely proportional to the size of the cluster. For a single-stage cluster sample a list of the clusters is constructed. Then a random sample of the clusters is taken. This may be a simple random sample, with each cluster having an equal probability of being included in the sample, or it may be that the probability of being in the sample can be proportional to the size of the cluster. Once the clusters have been selected each member of the cluster is included in the sample.
For a two-stage cluster sample the method is the same as a single-stage cluster sample but once the clusters have been selected then a SIMPLE RANDOM SAMPLE is used to select the members of the cluster to be included in the sample. Clusters may also form larger clusters, in which case multistage sampling would be used. First, the clusters would be sampled, followed by the subclusters. Depending on the makeup of the population there may be many stages to the multistage sampling. Multistage cluster sampling was used in a study of violations of the international code of marketing of breast milk substitutes (Taylor, 1998). Here, the capital cities of four chosen countries were the main clusters, districts were randomly selected subclusters, health facilities were randomly selected from the subclusters and mothers were systematically sampled from the health facilities. The main advantage is that no sampling frame is required. The main disadvantage is that the sampling is nonrandom and sampling error increases by taking multiple samples, as there is sampling error at each stage. For further details see Crawshaw and Chambers (1994). SLV
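The two-stage procedure just described can be sketched as follows. This is a minimal illustration, assuming clusters are held in a dictionary mapping a cluster name to its list of members and that both stages use simple random sampling with equal probabilities; the function name, the general practices and the patient labels are all hypothetical.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: simple random sample of clusters.
    Stage 2: simple random sample of members within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        size = min(n_per_cluster, len(members))  # small clusters are taken whole
        sample[name] = rng.sample(members, size)
    return sample


# Hypothetical population: patients nested within general practices.
practices = {
    "practice_A": ["A1", "A2", "A3", "A4", "A5"],
    "practice_B": ["B1", "B2", "B3"],
    "practice_C": ["C1", "C2", "C3", "C4"],
    "practice_D": ["D1", "D2"],
}
selected = two_stage_cluster_sample(practices, n_clusters=2, n_per_cluster=3, seed=1)
for practice, patients in selected.items():
    print(practice, patients)
```

A single-stage cluster sample corresponds to keeping every member of each chosen cluster, and further stages could be added by applying the same function again to subclusters.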
Crawshaw, J. and Chambers, J. 1994: A concise course in A level statistics, 3rd edition. Cheltenham: Stanley Thornes Publishers Ltd.
Taylor, A. 1998: Violations of the international code of marketing of breast milk substitutes: prevalence in four countries. British Medical Journal 316, 1117-22.

multistate models These are often adopted for analysing event history (see SURVIVAL ANALYSIS - AN OVERVIEW) and LONGITUDINAL DATA. They are commonly applied in studies where subjects are followed up over time with respect to a (stochastic) process of interest that is observed to occupy exactly one of a finite number of discrete states at any point in time. Multistate models have been found to be extremely useful in medical areas such as psoriatic arthritis, where individuals may move between a number of disability states (e.g. mild, moderate and severe disability) over the longitudinal course of their disease (Husted et al., 2005); in hepatitis C virus (HCV) disease progression studies where liver biopsy scores are used to determine the stage of HCV-related liver disease (Sweeting et al., 2006); in describing the states, characterised by the occurrence of various events (acute graft versus host disease, chronic graft versus host disease, relapse and death in remission), where a leukaemia patient may enter following bone marrow transplantation (Keiding, Klein and Horowitz, 2001); and in other areas such as Alzheimer's disease, bronchiolitis obliterans syndrome, cancer, cognitive impairment, diabetic retinopathy, HIV/AIDS, studies of twins and in competing risks of death studies. Multistate models are based on stochastic processes that move through a series of discrete states in continuous time. The movements between states are called transitions. States can be transient (movements out are allowed) or absorbing, if further transitions out are prohibited. The state structure describes the states and determines which transitions are allowed. Different state structures may characterise the same stochastic process and are thus not unique to the process. The state structure chosen depends on the questions of interest, the transparency of model assumptions and the ease with which to make inferences. This structure can be represented schematically with a multistate diagram in which boxes represent the various states and arrows between the boxes represent the possible transitions that can occur. Figures (a) to (e) present various multistate diagrams of commonly observed multistate process types.

[Figure: multistate diagrams. (a) Survival model (Alive to Dead); (b) Progressive model (Well to Sick to Dead); (c) Competing risks model; (d) Disability model (Healthy, Disabled, Dead); (e) Illness-recovery-death model]
multistate models Various diagrams of commonly observed multistate process types

A multistate process can be specified fully either through its transition intensities (also known as hazard functions; see SURVIVAL ANALYSIS - AN OVERVIEW) or by its transition probabilities. The transition intensities are the instantaneous probabilities per time unit (i.e. the transition rates) of going from one state to another, given the history (development) of the process just prior to the times of the transitions. Where a transition from one state directly to another is impossible the corresponding transition intensity is zero. The transition probabilities represent the conditional probabilities of the process being in particular states at various times, given that the process was observed in specific states at earlier times and given the histories of the process up to these earlier times. For movements out of an absorbing state the transition probabilities will clearly be zero.

Mathematically, given a multistate process for a subject, $X(t)$, at time $t \ge 0$ with a finite discrete state space denoted by $S = \{1, 2, \ldots, n\}$ and history up to some time just prior to $s < t$, $H(s^-)$, then the transition probability of moving from state $i$ at time $s$ to state $j$ at time $t$ is given by:

$$P_{ij}(s,t) = \Pr(X(t) = j \mid X(s) = i; H(s^-))$$

and the transition intensity of making an $i$ to $j$ instantaneous transition (denoted by $i \to j$) at time $t$ is given by:

$$\alpha_{ij}(t) = \lim_{\Delta t \to 0} \frac{\Pr(X(t + \Delta t) = j \mid X(t) = i; H(t^-))}{\Delta t}$$

It is important to note that both the transition intensities and the transition probabilities are essential for the development, estimation and inference of multistate models. The transition intensities are important in the formulation of necessary models for the data, in terms of incorporating covariates that explain some of the heterogeneity across subjects and for representing the assumptions (e.g. time homogeneity, Markov or semi-Markov, piecewise constant or Weibull baseline intensities, proportional intensities (see PROPORTIONAL HAZARDS), etc.) being made. The transition probabilities, on the other hand, are important for constructing the likelihood function (see LIKELIHOOD) to be maximised and for making long-range predictions.

A simple and mathematically tractable example of a multistate model, which has been described often in the literature, is the time homogeneous Markov model. Here transition intensities are assumed constant over, or independent of, time (i.e. time homogeneous) and only depend on the history of the process through the current state (i.e. Markov assumption). In this special case, a model for the transition intensities, incorporating baseline explanatory variables, can be specified using a multiplicative structure and the proportional intensities (or hazards) assumption proposed by Sir David Cox in 1972, which is that the transition intensity, $\alpha_{ij}(t)$, of making an $i \to j$ transition at time $t$ is given by:

$$\alpha_{ij}(t) = \alpha_{ij}^{0} \exp(\beta_{1ij} z_1 + \cdots + \beta_{pij} z_p)$$

where $\alpha_{ij}^{0}$ corresponds to a baseline intensity of making an $i \to j$ transition, which is being modified by the exponential of a linear combination of the baseline explanatory variables, $z_1, \ldots, z_p$. The $p$ regression coefficients, $\beta_{1ij}, \ldots, \beta_{pij}$, associated with these baseline explanatory variables are assumed here to be transition intensity-specific, although constraints on them can lead to more parsimonious models. The exponentials of the regression coefficients are interpreted as rate ratios. Further extensions beyond simple time homogeneous Markov multistate models have been made. Readers are referred to review articles by Commenges (1999), Hougaard (1999), Andersen and Keiding (2002) and Meira-Machado et al. (2009) for further details.
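The time homogeneous Markov model described above can be sketched numerically. The example below is only an illustration under stated assumptions: the three-state disability model, the baseline intensities, the single covariate and its coefficient are all hypothetical values, and the matrix exponential $P(t) = \exp(Qt)$ (a standard result for time homogeneous Markov processes, not spelled out in the entry) is approximated by a truncated Taylor series with scaling and squaring rather than a library routine.

```python
import numpy as np

def intensity_matrix(baseline, betas, z):
    """Proportional intensities: q_ij = baseline_ij * exp(beta_ij . z) for i != j.

    baseline: (n, n) baseline intensities, zero where a transition is impossible.
    betas:    (n, n, p) transition-specific regression coefficients.
    z:        (p,) baseline explanatory variables.
    The diagonal is set so that every row sums to zero.
    """
    q = np.asarray(baseline, dtype=float) * np.exp(
        np.tensordot(betas, z, axes=([2], [0]))
    )
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q

def transition_probabilities(q, t, squarings=10, terms=20):
    """Approximate P(t) = exp(Q t) by a Taylor series with scaling and squaring."""
    a = q * t / (2 ** squarings)
    p = np.eye(q.shape[0])
    term = np.eye(q.shape[0])
    for k in range(1, terms + 1):
        term = term @ a / k
        p = p + term
    for _ in range(squarings):
        p = p @ p
    return p

# Hypothetical disability model: 0 = healthy, 1 = disabled, 2 = dead (absorbing).
baseline = np.array([[0.0, 0.10, 0.02],
                     [0.0, 0.00, 0.05],
                     [0.0, 0.00, 0.00]])
betas = np.zeros((3, 3, 1))
betas[0, 1, 0] = np.log(2.0)   # assumed rate ratio of 2 for healthy -> disabled
z = np.array([1.0])            # a subject with the covariate present

q = intensity_matrix(baseline, betas, z)
p5 = transition_probabilities(q, t=5.0)   # transition probabilities over 5 time units
print(np.round(p5, 4))
```

Each row of `p5` gives the probabilities of occupying each state after five time units for a subject starting in the corresponding state; the row for the absorbing state stays at (0, 0, 1).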
Frequently in event history and longitudinal data studies in medical research, observation of the exact transition times may not occur. This may be because, by the end of follow-up, not all individuals under study have reached an absorbing state, which thus results in right censored observation times. Follow-up of some individuals may have only happened some time after the process began and various events may have occurred between the start of the process and the start of follow-up, which may result in left censoring if the times of these various events are unknown. Furthermore, there are many longitudinal studies where subjects are observed intermittently (i.e. discretely in time) and the times of transitions are interval censored (i.e. the exact times of transitions are unobserved), except possibly for an absorbing state such as death. Finally, left truncation may occur when individuals come under observation only some known time after the 'natural' defined time origin of the process. For example, potential participants eligible for a study would only enrol if they have not died by study commencement. The presence of these observation and selection schemes may pose special problems for valid inference, and assumptions on how these features may or may not be informative for the multistate process are required in order to construct the appropriate likelihood function. In most situations, at least initially, the 'sampling time process' is assumed to be noninformative (ignorable) for the multistate process. BT

Andersen, P. K. and Keiding, N. 2002: Multi-state models for event history analysis. Statistical Methods in Medical Research 11, 91-115. Commenges, D. 1999: Multi-state models in epidemiology. Lifetime Data Analysis 5, 315-27. Hougaard, P. 1999: Multi-state models: a review. Lifetime Data Analysis 5, 239-64. Husted, J. A., Tom, B. D., Farewell, V. T., Schentag, C. and Gladman, D. D. 2005: Description and prediction of physical functional disability in psoriatic arthritis: a longitudinal analysis using a Markov model approach. Arthritis Care and Research 53, 401-9. Keiding, N., Klein, J. P. and Horowitz, M. M. 2001: Multistate models and outcome prediction in bone marrow transplantation. Statistics in Medicine 20, 1871-85. Meira-Machado, L., de Uña-Álvarez, J., Cadarso-Suárez, C. and Andersen, P. K. 2009: Multi-state models for the analysis of time-to-event data. Statistical Methods in Medical Research 18, 195-222. Sweeting, M. J., De Angelis, D., Neal, K. R., Ramsay, M. E., Irving, W. L., Wright, M., Brant, L., Harris, H. E., Trent HCV Study Group, HCV National Register Steering Group 2006: Estimated progression rates in three United Kingdom hepatitis C cohorts differed according to method of recruitment. Journal of Clinical Epidemiology 59, 144-52.
multivariate analysis of variance (MANOVA) See ANALYSIS OF VARIANCE
multivariate normal distribution
This is a generalisation of the NORMAL DISTRIBUTION to more than one dimension and the PROBABILITY law that underlies many methods of multivariate analysis.
N

negative binomial distribution
This is the PROBABILITY DISTRIBUTION of the number of events required in order to observe $k$ 'successes'. Contrast this with the BINOMIAL DISTRIBUTION, which models the number of successes that will occur given a fixed number of trials. Also note that, since the GEOMETRIC DISTRIBUTION models the number of events required to observe one success, it is a special case of the negative binomial. If each event independently has a probability of success, $p$, then the probability mass function for the number of events, $x$, required before observing $k$ successes is:

$$\Pr(X = x) = \frac{(x-1)!}{(k-1)!\,(x-k)!}\, p^{k} (1-p)^{x-k}$$
where $n!$ (factorial $n$) is given by the product of integers up to and including $n$, and $0!$ is defined to be 1. The MEAN of the distribution is $k/p$ and the VARIANCE is $k(1-p)/p^2$. The distribution can be generalised to the case where the $k$ parameter is not an integer (by replacing the factorial terms with gamma functions as mentioned in GAMMA DISTRIBUTION), which then enables the following interpretation. Suppose we have observations of count data from a population of size $N$, where each person's count will be independently distributed as Poisson with some parameter $\lambda$. In this case, we would expect the counts in the population to be distributed again as a POISSON DISTRIBUTION. Yet often the population exhibits more variance than can be explained by a Poisson distribution, an example of OVERDISPERSION. One reason for this might be that individuals do not share the same value for $\lambda$. For example, Mwangi et al. (2008) show that counts of malaria episodes in 373 children show more variability than can be explained by a Poisson distribution, but show also that the negative binomial distribution provides a much better fit. This is attributed to variation in the susceptibility of the children, with some children being at increased risk of clinical malaria compared to others, and so the model assuming a common $\lambda$ does not hold. The negative binomial distribution is not the only distribution to allow for greater dispersion than the Poisson distribution, but, specifically, if values from individuals are Poisson distributed but the values of $\lambda$ vary between individuals according to a gamma distribution, then the population frequencies will be distributed as a negative binomial distribution. For further discussion of this and other aspects of the distribution see Grimmett and Stirzaker (1992), Gelman et al. (1995) and Glynn and Buring (1996). AGL
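As a rough numerical check of the probability mass function, mean and variance quoted in this entry, the following sketch evaluates the distribution for integer $k$. The values $k = 3$ and $p = 0.4$ are arbitrary illustrative choices, and the infinite sums are truncated at a point where the remaining tail is negligible.

```python
from math import comb

def neg_binomial_pmf(x, k, p):
    """Pr(X = x): probability that the k-th success occurs on event number x."""
    if x < k:
        return 0.0
    # comb(x - 1, k - 1) equals (x-1)! / ((k-1)! (x-k)!)
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

k, p = 3, 0.4            # illustrative values only
xs = range(k, 400)       # truncation: the tail beyond 400 is negligible here
probs = [neg_binomial_pmf(x, k, p) for x in xs]

total = sum(probs)
mean = sum(x * pr for x, pr in zip(xs, probs))
variance = sum((x - mean) ** 2 * pr for x, pr in zip(xs, probs))

print(total, mean, variance)   # compare with 1, k/p and k(1-p)/p^2
```

Setting `k = 1` reduces the function to the geometric case, $p(1-p)^{x-1}$.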
Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. 1995: Bayesian data analysis. Boca Raton: Chapman & Hall/CRC. Glynn, R. J. and Buring, J. E. 1996: Ways of measuring rates of recurrent events. British Medical Journal 312, 364-7. Grimmett, G. R. and Stirzaker, D. R. 1992: Probability and random processes, 2nd edition. Oxford: Clarendon Press. Mwangi, T. W., Fegan, G., Williams, T. N., Kinyanjui, S. M., Snow, R. W. et al. 2008: Evidence for over-dispersion in the distribution of clinical malaria episodes in children. PLoS ONE 3(5), e2196.
negative predictive value (NPV) This is defined for a diagnostic test for a particular condition as the PROBABILITY that those who have a negative test do not actually have the condition under investigation as measured by a reference or 'gold' standard. (Contrast this with the POSITIVE PREDICTIVE VALUE.)

If the data are set out as in the table, then:

$$\mathrm{NPV} = \frac{d}{c + d}$$

NPV can also be expressed as a percentage.
negative predictive value General table of test results

                 Disease
Test         Present    Absent    Total
Positive     a          b         a + b
Negative     c          d         c + d
Total        a + c      b + d     a + b + c + d
The NPV should be presented with CONFIDENCE INTERVALS (typically set at 95%) calculated using an appropriate method such as that of Wilson that will not produce impossible values (percentages greater than 100 or below 0) when NPV approaches extreme values. CLC
[See also FALSE NEGATIVE RATE, FALSE POSITIVE RATE, NEGATIVE PREDICTIVE VALUE, SENSITIVITY, SPECIFICITY]
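A minimal sketch of the calculation follows, using the cell labels $c$ and $d$ from the formula above and the Wilson score interval without continuity correction. The counts are invented for illustration, and the function name is an assumption of this sketch rather than anything defined in the entry.

```python
from math import sqrt

def npv_with_wilson(c, d, z=1.96):
    """Return (NPV, lower, upper) using the Wilson score interval.

    c: negative tests in people who have the condition (false negatives)
    d: negative tests in people without the condition (true negatives)
    z: normal quantile; 1.96 gives an approximate 95% interval
    """
    n = c + d
    npv = d / n
    denom = 1 + z * z / n
    centre = (npv + z * z / (2 * n)) / denom
    half = z * sqrt(npv * (1 - npv) / n + z * z / (4 * n * n)) / denom
    # Clamp only to guard against floating-point rounding at the boundaries.
    return npv, max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical counts: 20 negative tests, none of them false negatives.
npv, lower, upper = npv_with_wilson(c=0, d=20)
print(f"NPV = {npv:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

Even with an observed NPV of 100%, the interval stays within 0 to 100%, which is the property the entry asks for.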
Altman, D. G., Machin, D., Bryant, T. N. and Gardner, M. J. 2000: Statistics with confidence, 2nd edition. London: BMJ Books.
nested case-control studies
This is a form of CASE-CONTROL STUDY in which the cases and controls are drawn from within a larger study. In other words, they are nested within a parent study, which is usually a COHORT STUDY but sometimes a CROSS-SECTIONAL STUDY or a PREVALENCE study.
The nested nature of such studies provides the method's strength. One concern in the design of case-control studies is the appropriate choice of controls, all of whom should be eligible to be cases if they were to develop the disease. A case-control study nested within a cohort study overcomes this concern as a control within the cohort who developed the disease would be counted as a case. Usually, but not necessarily, the controls who are chosen are matched to the cases on various confounding factors such as age and sex. The usual reason for conducting a nested case-control study, rather than analysing data on the entire cohort or survey, is economy. Usually more data are collected on the participants of the nested study than in the main study. Sometimes these data are derived from the analysis of stored samples, blood or urine, for example, or from further information being obtained from the participants.

In a study to examine the role of sex steroid hormones in relation to endometrial cancer, Lukanova et al. (2004) nested a case-control study within three large cohort studies from Italy, Sweden and the United States. The cohorts comprised over 65 000 women from whom venous blood samples had been taken at enrolment in the cohorts. From within the cohorts, 124 cases of endometrial cancer were identified and two controls per case were chosen from within the same cohort as the case and matched on various factors including date of and age at blood donation. In this way, they were able to confine the processing of the samples to the 124 cases and corresponding controls, providing a great saving on processing samples from all participants in the three cohorts, yet without major loss of statistical POWER.

While the processing of blood samples is often a component of nested case-control studies, other samples can be the focus. In a study to assess the role of selenium in coronary heart disease in men, toenail clippings were obtained for selenium analysis (Yoshizawa et al., 2003). The study was nested within the Health Professionals' Study in the United States, which is a cohort study of over 50 000 men. Within the cohort 470 participants developed coronary heart disease and a matched control was chosen for each one. Thus, fewer than 1000 toenail samples had to be analysed for selenium.

The cost of sample processing is not the only reason for conducting a nested case-control study. While information collected on the cohort is of interest, sometimes further data collection is required. For example, London et al. (2003) conducted a case-control study of breast cancer nested within a cohort study of more than 50 000 women. Their interest was in residential magnetic field exposure and for this they were able to focus on the 743 cases identified in the cohort as having breast cancer and a comparable number of controls. Detailed assessment of the magnetic field exposure was made in the homes of the selected participants. Data on other risk factors for breast cancer and possible confounding factors were already available in the data that had been collected for the entire cohort.
Case-control studies can also be nested within cross-sectional surveys. Baker et al. (2003) conducted a survey of approximately 3000 men in southern England to ascertain the prevalence of knee disorders in the general population. A nested case-control component considered the cases who had undergone knee surgery. The focus was on occupational and sporting activities. The activities undertaken by the cases at the birthday prior to their reported onset of symptoms were considered. For each case, five controls were selected matched to the case within 1 year of age. The activities undertaken by the controls at the same birthday as the case were then considered. The nested case-control study thus allowed the investigators to avoid bias due to the cases being likely to give up activities at an earlier age than the controls, because of knee pain. A matched analysis (see MATCHED PAIRS ANALYSIS) was required and thus the entire cross-sectional study was a weaker tool for this particular analysis than the matched nested case-control method.

A further variant on the method is to nest a case-control study within a large routine data collection system. Agerbo (2003) analysed the risk of suicide in relation to spouse's psychiatric illness or suicide. The data were obtained by linking the Danish population registers using the unique personal identification numbers assigned to all people in the country. All suicides were identified, as were 20 matched controls per case. All the spouses and children living with the cases and controls were identified from the registers, along with information on diagnoses from the Danish psychiatric register. The study showed that there was a greater risk of suicide among those whose spouse had been admitted to hospital with a psychiatric disorder or who had died, particularly if the death had been by suicide. Few countries are able to conduct such studies as the linkage between record systems is not possible or is not allowed, but where it is possible, such as in Scandinavia, the opportunities for such epidemiological studies are great.

In a cohort study in which cases are recruited prospectively it is possible that controls identified at one time point become cases later. This is particularly likely to happen if the disease is common. An example of this is a nested case-control study within a birth cohort in Sweden (Emenius et al., 2003). Here the focus was on recurrent wheezing in children in relation to nitrogen dioxide exposure. Wheezing is common in childhood and cases were identified from assessment of the cohort at 1 and 2 years of age. Controls were chosen from within the cohort and matched to the cases on the day of birth. Three controls, selected to match cases identified at the age of 1 year, were found to be wheezing at the 2-year assessment and so were also included as cases at that time point. Such nested case-control studies in which controls can become cases are sometimes called case-cohort studies. HI
[See also BIRTH COHORT STUDIES, CASE-CONTROL STUDY]
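The selection of matched controls from within a cohort, as in the examples above, might be sketched as follows. Everything here is hypothetical: the cohort records, the field names `id`, `sex` and `age`, and the matching rule (same sex, age within 1 year). For simplicity the sketch never draws a control from among the known cases and allows the same person to serve as a control for more than one case; real studies, including those in which controls may later become cases, use more careful sampling schemes.

```python
import random

def select_matched_controls(cohort, case_ids, n_controls, max_age_gap=1, seed=0):
    """Pick up to n_controls matched controls per case from the same cohort."""
    rng = random.Random(seed)
    by_id = {person["id"]: person for person in cohort}
    matched = {}
    for case_id in sorted(case_ids):
        case = by_id[case_id]
        eligible = [
            person["id"]
            for person in cohort
            if person["id"] not in case_ids
            and person["sex"] == case["sex"]
            and abs(person["age"] - case["age"]) <= max_age_gap
        ]
        matched[case_id] = rng.sample(eligible, min(n_controls, len(eligible)))
    return matched

# Hypothetical cohort of 200 people with invented sex and age values.
cohort = [
    {"id": i, "sex": "F" if i % 2 else "M", "age": 40 + i % 20}
    for i in range(200)
]
case_ids = {3, 50, 121}

matched = select_matched_controls(cohort, case_ids, n_controls=2)
for case_id, controls in matched.items():
    print(case_id, controls)
```

Only the cases and their selected controls would then need the additional, more costly measurements.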
Agerbo, E. 2003: Risk of suicide and spouse's psychiatric illness or suicide: nested case-control study. British Medical Journal 327, 1025-6. Baker, P., Reading, I., Cooper, C. and Coggon, D. 2003: Knee disorders in the general population and their relation to occupation. Occupational and Environmental Medicine 60, 794-7. Emenius, G., Pershagen, G., Berglind, N., Kwon, H.-J., Lewné, M., Nordvall, S. L. and Wickman, M. 2003: NO2, as a marker of air pollution, and recurrent wheezing in children: a nested case-control study within the BAMSE birth cohort. Occupational and Environmental Medicine 60, 876-81. London, S. J., Pogoda, J. M., Hwang, K. L., Langholz, B., Monroe, K. R., Kolonel, L. N., Kaune, W. T., Peters, J. M. and Henderson, B. E. 2003: Residential magnetic field exposure and breast cancer risk: a nested case-control study from a multiethnic cohort in Los Angeles County, California. American Journal of Epidemiology 158, 969-80. Lukanova, A., Lundin, E., Micheli, A., Arslan, A., Ferrari, P., Rinaldi, S., Krogh, V., Lenner, P., Shore, R. E., Biessy, C., Muti, P., Riboli, E., Koenig, K. L., Levitz, M., Stattin, P., Berrino, F., Hallmans, G., Kaaks, R., Toniolo, P. and Zeleniuch-Jacquotte, A. 2004: Circulating levels of sex steroid hormones and risk of endometrial cancer in postmenopausal women. International Journal of Cancer 108, 425-32. Yoshizawa, K., Ascherio, A., Morris, J. S., Stampfer, M. J., Giovannucci, E., Baskett, C. K., Willett, W. C. and Rimm, E. B. 2003: Prospective study of selenium levels in toenails and risk of coronary heart disease in men. American Journal of Epidemiology 158, 852-60.
net monetary benefit (NMB) See COST-EFFECTIVENESS ANALYSIS

net reproduction rate (NRR) See DEMOGRAPHY
neural networks This is a general class of algorithms for MACHINE LEARNING. A neural network can be described as a parameterised class of functions, specified by a weighted graph (the network's architecture). The weights associated with the edges of the graph are the parameters. Originally, neural networks were motivated by analogy with the structure of the brain. The nodes of the neural network correspond to the neurons and the edges to neuron interactions. For directed graphs, we can distinguish recurrent architectures (containing cycles) and feedforward architectures (acyclic). A very important special case of feedforward networks is given by layered networks, in which the nodes of the graph are organised into layers such that connections are possible only between elements of two consecutive layers. The weight between the $j$th unit and the $k$th unit of successive layers $l-1$ and $l$ in a network is indicated by $w_{kj}$ and it is often assumed that all elements of a layer are connected to all elements of the successive layer (fully connected architecture). In this way, the connections between two layers $l-1$ and $l$ can be represented by a weight matrix $W_l$, whose entry at row $k$ and column $j$ corresponds to the weight $w_{kj}$ of the edge from node $j$ to node $k$ in the successive layers (see the figure). It is customary to call the first layer the input layer and the last one the output layer. The remaining ones are called hidden layers.
[Figure: layer labels Input layer, Hidden layer, Output layer.]
neural networks Connections between layers and the weight matrix
A 'perceptron' can be described as a network of this type with no hidden layers. It can also be seen as the building block of complex networks, in that each unit can be regarded as a perceptron (if instead of the transfer function one uses a threshold function, returning Boolean values). Therefore, layered feedforward neural networks as described above are also often referred to as 'multilayer perceptrons'. In a layered network, the function is computed sequentially, assigning the value of the argument to the input layer, then calculating the activation level of the successive layers as described below, until the output layer is reached. The output of the function computed by the network is the activation value of the output unit. All units in a layer are updated simultaneously and all the layers are updated sequentially, based on the output of the previous layer. The units of layer $l$ calculate their output values $y_l$ by a linear combination of the values at the previous layer $y_{l-1}$, followed by a nonlinear transformation $f$, as follows:

$$y_l = f(W_l\, y_{l-1})$$

where $W_l$ is the edge weight matrix between layer $l-1$ and layer $l$, and where $f$ is called the transfer function. A common choice for this transfer function is the logistic function:

$$f(z) = \frac{1}{1 + e^{-z}}$$

Notice that each neural network thus represents a class of nonlinear functions parameterised by the weights, whose values determine the input/output behaviour of the neural network. Training the network amounts to choosing the values of the weights automatically. For this, a (labelled) training dataset is needed and an error function for the performance of the network has to be fixed. Training a neural network can then be done by finding those weights that minimise the network's error on such samples (i.e. by fitting the network to the data).
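The layer-by-layer computation described above can be sketched in a few lines. The network size (two inputs, three hidden units, one output), the random weights and the input vector are arbitrary assumptions made for illustration; no bias terms are included because the entry's formula has none.

```python
import numpy as np

def logistic(z):
    """Logistic transfer function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, x):
    """Apply y_l = f(W_l y_{l-1}) layer by layer and return all activations."""
    y = np.asarray(x, dtype=float)
    activations = [y]                # input layer
    for w in weights:                # one weight matrix per pair of layers
        y = logistic(w @ y)
        activations.append(y)
    return activations

rng = np.random.default_rng(0)
weights = [
    rng.normal(size=(3, 2)),   # input layer (2 units) -> hidden layer (3 units)
    rng.normal(size=(1, 3)),   # hidden layer (3 units) -> output layer (1 unit)
]

activations = forward(weights, [0.5, -1.0])
output = activations[-1]
print(output)
```

Row $k$, column $j$ of each matrix holds the weight of the edge from unit $j$ in one layer to unit $k$ in the next, matching the convention in the text.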
More concretely, in the parameter space the error function evaluated on the training data translates to a cost function that associates each configuration of the edge weights with a given error on the training set. Such a function is typically nonconvex, so that it can be minimised only locally, which is often done by gradient descent. A technique known as 'backpropagation' provides a way to compute the necessary gradients efficiently, allowing the network to find a local minimum of the training error with respect to network weights. The fact that the training algorithm is thus only guaranteed to converge to a local minimum implies that the solution is affected by the initial estimate for the weights. This is one of the major problems of neural networks. Also problematic is the design of the architecture (e.g. the size and the number of hidden layers), often chosen as the result of trial and error. Some such problems have been overcome by the introduction of the related method of support vector machines. Other types of network arise from different design choices. For example, radial basis function networks use a different transfer function; Kohonen networks are used for clustering problems; Hopfield networks are used for combinatorial optimisation problems. Different training methods also exist. NenDB
[See also CLUSTER ANALYSIS IN MEDICINE]

Baldi, P. 1998: Bioinformatics: a machine learning approach. Cambridge, MA: MIT Press. Bishop, C. 1996: Neural networks for pattern recognition. Oxford: Oxford University Press. Cristianini, N. and Shawe-Taylor, J. 2000: An introduction to support vector machines. Cambridge: Cambridge University Press. Mitchell, T. 1995: Machine learning. McGraw-Hill.
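To illustrate the training procedure described in the neural networks entry above, here is a small sketch of gradient descent with backpropagation for a network with one hidden layer and the logistic transfer function. The choices made here are assumptions of the sketch, not of the entry: a half mean squared error cost, a fixed step size of 0.5, 2000 iterations, three hidden units, and a toy exclusive-or dataset whose third input row is a constant 1 standing in for a bias.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_gradients(w1, w2, x, t):
    """Half mean squared error and its gradients, computed by backpropagation.

    x: (n_inputs, n_samples) inputs; t: (1, n_samples) targets.
    """
    n = x.shape[1]
    h = logistic(w1 @ x)                      # hidden activations
    y = logistic(w2 @ h)                      # output activations
    err = y - t
    loss = 0.5 * np.mean(err ** 2)
    delta2 = err * y * (1 - y) / n            # error signal at the output layer
    grad_w2 = delta2 @ h.T
    delta1 = (w2.T @ delta2) * h * (1 - h)    # error propagated back to the hidden layer
    grad_w1 = delta1 @ x.T
    return loss, grad_w1, grad_w2

# Toy data (hypothetical): exclusive-or of the first two input rows.
x = np.array([[0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0, 1.0]])
t = np.array([[0.0, 1.0, 1.0, 0.0]])

rng = np.random.default_rng(0)
w1_init = rng.normal(size=(3, 3))   # the result depends on this initial estimate
w2_init = rng.normal(size=(1, 3))
w1, w2 = w1_init.copy(), w2_init.copy()

losses = []
for _ in range(2000):
    loss, g1, g2 = loss_and_gradients(w1, w2, x, t)
    losses.append(loss)
    w1 -= 0.5 * g1                  # gradient descent step
    w2 -= 0.5 * g2

print(losses[0], losses[-1])
```

Re-running with a different seed changes the starting weights and may lead to a different local minimum, which is the sensitivity to initial estimates noted in the entry.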
NMB Abbreviation for net monetary benefit. See COST-EFFECTIVENESS ANALYSIS

N-of-1 trials
An N-of-1 (or single-patient) trial combines clinical practice with the well-established methodology of the RANDOMISED CONTROLLED TRIAL to compare the effectiveness of two or more treatment options within an individual. The N-of-1 trial offers a design that facilitates identification of responders and nonresponders to treatment and subsequent determination of optimum therapy for the individual. Indeed, within the context of the hierarchy of evidence-based study designs, it has recently been suggested that N-of-1 trials deliver the highest strength of evidence for making individual patient treatment decisions (Guyatt et al., 2000).

In clinical practice, the clinician commonly performs a 'therapeutic trial' or 'trial of therapy', in which the individual patient receives a treatment and the subsequent clinical course determines whether treatment is judged effective and is continued. Such an approach has serious potential BIASES due to the PLACEBO effect, the natural history of the condition and the urge of the patient and clinician not to disappoint one another. The methodology of the N-of-1 trial at least partially overcomes some of these potential biases.

N-of-1 trials generally compare a single new therapy with a current standard therapy or a placebo. However, as with traditional randomised controlled trials, it is also possible to compare more than two treatment options. In an N-of-1 trial the individual serves as his or her own control, receiving all treatments under investigation. Ideally, such a trial is conducted as a double-blind (both the individual and outcome assessor blind to allocated treatment in any treatment period) multicrossover trial with three or more periods for each treatment. Repeated alternations between treatment periods with the new intervention and the control ensure several comparisons between the treatments. The trial design will, however, be tailored to the clinical entity and therapies involved.

The time commitment of such trials by both patients and health professionals is considerable. N-of-1 trials rely on cooperation between individual clinicians and patients. Hence, the patient's (and clinician's) commitment to the trial is essential for it to reach fruition. The duration of an N-of-1 trial will largely depend on the nature of the condition and the treatments under investigation, but is likely to continue for between several weeks and several months if not far longer. Hence, such trials are only effective for chronic and stable conditions where the natural history of the condition is unlikely to change dramatically over the course of the trial. Examples of their use in different clinical areas include osteoarthritis, gastroesophageal reflux disease, attention deficit hyperactivity disorder and chronic airflow limitation, among others (March et al., 1994).

One problem encountered in N-of-1 trials, as in CROSSOVER TRIALS, is carry-over effects of treatment, which may reduce the estimated treatment effect. The therapies under investigation should therefore have a rapid onset and cessation of effect that will help to minimise any carry-over effects. In addition, a washout period between treatments can be incorporated into the trial or a run-in period, where the first few days on each treatment are not evaluated.

Because of the expense and time involved, it is important to determine at the outset whether an N-of-1 trial is really indicated for an individual, i.e. is the effectiveness of treatment in doubt for this specific individual? Full criteria (summarised earlier) that should be satisfied before an N-of-1 trial is commenced are provided by Guyatt et al. (1988). When the individual patient and clinician are in agreement that an N-of-1 trial is justifiable, this design provides the additional opportunity to measure the symptoms that matter to the individual concerned. In addition to standardised and validated disease-specific and generic outcome measures, the individual is asked to identify their most troubling symptoms or problems associated with the illness that are important in their everyday lives. These then form the basis of a self-administered diary or questionnaire. It may be a daily diary or weekly summary depending on symptoms and treatment duration, but where possible several separate measurements should be taken within each treatment period. The opportunity to measure the symptoms that matter to the individual is a unique feature of N-of-1 trials.

In a classical randomised controlled trial the lowest experimental unit is the individual; in an N-of-1 trial it is the treatment period. Therefore the sample size in an N-of-1 trial is the number of treatment periods applied. Sample size or POWER calculations used with classical randomised controlled trials can also be used for N-of-1 trials. However, they make certain assumptions concerning the independence of data from each treatment period, which may not be reasonable. While a large number of treatment periods would increase the statistical power, the natural course of the clinical entity, therapy characteristics and patient compliance will generally put an upper limit on this number and thus statistical power will generally remain low.

Random assignment of subjects to treatments in classical randomised controlled trials is essential in order to obtain comparable groups with respect to explanatory and confounding variables. Correspondingly, random assignment of treatments to treatment periods is essential in N-of-1 trials. Once the number of treatment periods has been determined there are a number of ways of randomising the treatments to periods. The most recommended design when comparing two treatments (as is most commonly the case) is random allocation within pairs of treatment periods. For example, for the comparison of treatment A versus treatment B during eight treatment periods, the following RANDOMISATION schedule might be generated: AB AB BA AB. This approach avoids the possibility of several consecutive treatment periods with the same treatment.
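Random allocation within pairs of treatment periods, as just described, can be sketched as below. The function name and the seed are illustrative assumptions; in a real trial the schedule would be produced and concealed by someone independent of the patient and the outcome assessor.

```python
import random

def paired_schedule(n_pairs, treatments=("A", "B"), seed=None):
    """Randomise the order of the two treatments separately within each pair."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        pair = list(treatments)
        rng.shuffle(pair)            # AB or BA, each with probability 1/2
        schedule.append("".join(pair))
    return schedule

# Eight treatment periods arranged as four pairs.
schedule = paired_schedule(4, seed=2011)
print(" ".join(schedule))
```

Because every pair contains each treatment once, no treatment can occupy more than two consecutive periods.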
In terms of the analyses of an N-of-1 trial an important first approach is to plot the data and examine the results visually. The more theoretical methods depend heavily on the type of randomisation used. When the paired design is employed, the simplest approach may be to perform a SIGN TEST, which examines the LIKELIHOOD of the individual preferring the same treatment within each pair of treatment periods. However, this does not assess the strength of the treatment effect, only the direction of it. A more powerful alternative is the STUDENT'S t-TEST (either paired or unpaired depending on randomisation). For such analyses the paired design is again preferable since it goes some way to reduce the impact of AUTOCORRELATION (i.e. violation of the assumption of such a statistical test that observations from one treatment period to the next will be independent). Recording several measurements within each treatment period and comparing averages across the periods can reduce this problem further. Parametric tests also make the assumption of normality and nonparametric tests may alternatively be used. In addition, BAYESIAN METHODS are available (Zucker et al., 1997) for combining information from a series of N-of-1 trials. When an individual's N-of-1 trial has been completed the results will be summarised and disseminated during a feedback session between the clinician and patient to inform future treatment. SB
Guyatt, G., Sackett, D., Adachi, J., Roberts, R., [...] Users' guides to the medical literature XXV. Evidence-based medicine: principles for applying the users' guides to patient care. Journal of the American Medical Association 284, 1290-6. March, L., Irwig, L., Schwarz, J., Simpson, J., Chock, C. and Brooks, P. 1994: N of 1 trials comparing a non-steroidal anti-inflammatory drug with paracetamol in osteoarthritis. British Medical Journal 309, 1041-5. Zucker, D. R., Schmid, C. H., McIntosh, M. W., D'Agostino, R. B., Selker, H. P. and Lau, J. 1997: Combining single patient (N-of-1) trials to estimate population treatment effects and to evaluate individual patient responses to treatment. Journal of Clinical Epidemiology 50, 401-10.
noncompliance See ADJUSTMENT FOR NONCOMPLIANCE IN RANDOMISED CONTROLLED TRIALS
nonignorable dropout See DROPOUTS, MISSING DATA
noninferiority See ACTIVE CONTROL EQUIVALENCE STUDIES
noninformative censoring See CENSORED OBSERVATIONS, SURVIVAL ANALYSIS
noninformative censoring/dropout See CENSORED OBSERVATIONS, DROPOUTS, MISSING DATA
nonparametric methods - an overview These are inferential methods used when the assumptions of parametric methods are violated or the sample size is small, i.e. fewer than 25-30 in each group. Nonparametric methods do not assume that data are normally distributed as parametric methods do, although they usually have their own assumptions. Several situations suggest the use of nonparametric methods, including: when the independent and/or dependent variables are nominal in measurement; when the data are ordered with many ties; when the data are rank ordered; when there is a small sample size or unequal groups; when the dependent variable has a distribution other than a NORMAL DISTRIBUTION; when the groups have unequal variances; when there are unequal pairwise CORRELATIONS across repeated measurements; when the data have notable OUTLIERS. Nonparametric methods have common characteristics, including: independence of observations; few assumptions; dependent variable may be categorical; focus on rank ordering or frequencies; hypotheses in terms of rank, MEDIANS or frequencies; sample sizes are less stringent. Most parametric methods have at least one nonparametric alternative. Some may have several and which to choose depends on which assumptions are met and what is to be shown with the data.
There are two nonparametric correlation coefficients: they are nonparametric versions of PEARSON'S CORRELATION COEFFICIENT. These are SPEARMAN'S RANK CORRELATION COEFFICIENT, also known as SPEARMAN'S RHO (ρ) coefficient, and KENDALL'S TAU (τ) coefficient. There are several different versions of Kendall's τ including a, b and c. Spearman's rank is the Pearson correlation calculated on the ranks of the data rather than the raw data. It is therefore often preferred due to its similarity to Pearson; in fact, if the data are normally distributed both coefficients give numerically similar answers. Spearman's rank is difficult to interpret if there are many ties in the data, so in this situation Kendall's τ is often preferred. Kendall's τ can be extended to give a PARTIAL CORRELATION COEFFICIENT; this finds the correlation between variables while controlling for the effects of a third variable.
Nonparametric methods can also be used to analyse CONTINGENCY TABLES. The most common of these methods is the CHI-SQUARE TEST of independence, which is used if both variables in the contingency table are nominal. If the assumptions of the chi-square test are not met then FISHER'S EXACT TEST or its extension, the Fisher-Freeman-Halton test, can be used instead. Both the chi-square and Fisher's exact tests can only be used if the groups are independent. If there is an ordering in the data within a contingency table then a test for trend may be more powerful than a test of association. In a 2 × c table, the chi-square test for trend can be used to look for a trend in proportions between the two groups. In an r × c table where both variables are ordered, linear-by-linear ASSOCIATION, also known as the Mantel-Haenszel test for trend, can be used (see MANTEL-HAENSZEL METHODS).
McNEMAR'S TEST is used to analyse binary data in two groups where the groups are paired. These can be data before and after some event or matched pairs. McNemar's test takes two different forms depending on the sample size. There are two extensions to McNemar's test; these are the COCHRAN Q TEST, which is for more than two time points or more than two groups with a binary outcome, and the Stuart-Maxwell test, which is used when there are two paired groups with more than two outcomes. If agreement rather than association is of interest then the kappa coefficient can be used. Kappa measures agreement while adjusting for chance agreement. There are three forms of kappa: simple or Cohen's kappa for agreement between two raters rating on the same scale; weighted kappa, which takes into account the degree of disagreement; and multirater kappa, which allows for more than two raters (see KAPPA AND WEIGHTED KAPPA).
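The idea that kappa measures agreement after adjusting for chance agreement can be illustrated with a short Python sketch of simple (Cohen's) kappa. The function name `cohens_kappa` and the counts in `ratings` are assumptions made purely for illustration; they are not data from any study cited here.

```python
def cohens_kappa(table):
    """Simple (Cohen's) kappa for a square table of counts.

    Rows are the categories assigned by rater 1, columns those assigned
    by rater 2. Kappa = (observed - expected) / (1 - expected), where
    'expected' is the agreement anticipated by chance from the margins.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: two raters classifying 100 patients as positive/negative.
ratings = [[40, 10],
           [5, 45]]
kappa = cohens_kappa(ratings)
print(round(kappa, 2))
```

In this made-up table the raters agree on 85% of patients, but half of that agreement would be expected by chance alone given the marginal totals, so kappa is lower than the raw agreement.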
Contingency tables can also be analysed using contingency coefficients. The phi coefficient is used to give strength to the association found in a significant chi-square test of association in a 2 × 2 table. Cramer's V, also called Cramer's C, is the extension of the phi coefficient to an r × c table and should only be used when the chi-square test has already proved to be significant. If both variables in the contingency table are ordered, the gamma statistic, G, can be used to measure the strength of association. If there is a special distinction between the ordered variables, e.g. one is the dependent and one is the independent, Somers' D can be used; this is asymmetric and gives a different answer if the variables are interchanged.
Goodness-of-fit tests are usually nonparametric. These tests are used to see if a sample distribution is similar to a prespecified distribution or not. The binomial test is used for dichotomous data, i.e. data that can only take two outcomes. It sees whether such an extreme split into the two groups is likely to have occurred by chance or not. If there are more than two outcomes then the chi-square goodness-of-fit test could be used instead. The KOLMOGOROV-SMIRNOV TEST takes two different forms; the first is the two-sample test. This test compares the distribution of two different groups to see if they are similar or not. This is done by seeing if the largest difference between the distributions of the two groups could have occurred by chance alone. The one-sample test compares the observed data to a theoretical distribution to see if the largest differences between the two distributions could have occurred by chance. It is often used to test if data are normally distributed enough to use parametric tests. However, it should only be used if the parameters of the distributions can be specified in advance. If this is not the case then the Lilliefors test, which is similar to the Kolmogorov-Smirnov test but allows the MEAN and STANDARD DEVIATION to be estimated from the data rather than being specified in advance, should be used. The Shapiro-Wilk test is also similar to the Kolmogorov-Smirnov test and is usually used to compare to a normal or EXPONENTIAL DISTRIBUTION. The runs test sees whether the order of occurrence of two values of a variable is random. A run is a sequence of like observations. If a sample contains too many or too few runs then that sample may not be random.
Nonparametric tests for two samples are nonparametric versions of the t-test. One of these tests is the MANN-WHITNEY RANK SUM TEST, which tests for a difference in spread and location or medians between two independent groups. The hypothesis tested depends on whether the assumption of similarity of distributions is met or not. The SIGN TEST tests for a difference in medians between two paired groups but is less powerful than the WILCOXON SIGNED RANK TEST for doing the same. The Wilcoxon signed rank test can only be used if there is a similarity of difference scores about the true median difference.
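As a rough illustration of how a rank-based two-sample test works, the following Python sketch computes a Mann-Whitney U statistic and an exact two-sided P-value by enumerating every possible relabelling of the pooled observations. The function names and the two small data sets are hypothetical, and full enumeration is only practical for very small samples; statistical packages use more efficient algorithms.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U for group x: number of pairs (xi, yj) with xi > yj; ties count 1/2."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def exact_two_sided_p(x, y):
    """Exact permutation P-value based on the distance of U from its null mean."""
    pooled = list(x) + list(y)
    n_x, total = len(x), len(x) + len(y)
    centre = len(x) * len(y) / 2            # expected value of U under the null
    observed = abs(mann_whitney_u(x, y) - centre)
    extreme = count = 0
    for idx in combinations(range(total), n_x):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(total) if i not in chosen]
        count += 1
        # Count relabellings at least as far from the null mean as the data.
        if abs(mann_whitney_u(gx, gy) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / count


# Hypothetical measurements in two independent groups.
group_a = [1.1, 2.3, 2.9, 3.4]
group_b = [3.8, 4.5, 5.0, 6.2]
u = mann_whitney_u(group_a, group_b)
p = exact_two_sided_p(group_a, group_b)
print(u, round(p, 4))
```

With complete separation of two groups of four, only 2 of the 70 possible relabellings are as extreme as the data, which shows how the sample size limits the smallest P-value an exact rank test can produce.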
There are many nonparametric versions of ANALYSIS OF VARIANCE; the most commonly used is the KRUSKAL-WALLIS TEST, which is the extension of the Mann-Whitney rank sum test to more than two groups. The MEDIAN TEST can be used when the assumptions of the Kruskal-Wallis test are violated; it compares medians between k groups. The JONCKHEERE-TERPSTRA TEST is used when the independent variable is ordinal and looks for an increase in medians rather than a difference in medians, the order of the groups having been specified a priori. If there are repeated measures in the independent groups then the Scheirer-Ray-Hare test can be used; this is an extension of the Kruskal-Wallis test to allow for repeated measures within independent groups. The FRIEDMAN TEST is the nonparametric version of repeated measures analysis of variance. It is used for multiple paired samples. There is a multivariate extension of the Friedman test, the Quade test, which is similar to the Friedman test but takes account of the range of the data within a block. If the independent variable is ordinal then the Page test for ordered alternatives is more powerful than the Friedman test; it tests for an increase in medians rather than a difference in medians but the order of the groups must be specified a priori.
As the size of the samples used in nonparametric methods increases, the test statistics tend towards a normal or CHI-SQUARE DISTRIBUTION. Therefore, when the sample size is large enough a normal or chi-square approximation to the test statistic is used to calculate the P-VALUE. It is important to report this asymptotic P-value only when the sample size is large enough. If the sample size is small, the exact P-value should be quoted instead since it is more appropriate and more accurate than the asymptotic one. For further details see Pett (1997), Siegel and Castellan (1998) and Conover (1999). SLV
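The difference between an exact and an asymptotic P-value in a small sample can be seen with the sign test, sketched below in Python. The function names are hypothetical, the normal approximation is used without a continuity correction (an assumption made to keep the sketch short), and the example of 7 positive differences out of 8 is invented.

```python
from math import comb, erf, sqrt


def sign_test_exact(k, n):
    """Exact two-sided P-value for k positive differences out of n non-zero ones."""
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def sign_test_normal(k, n):
    """Asymptotic two-sided P-value from the normal approximation to the binomial."""
    z = (k - n / 2) / sqrt(n / 4)                  # mean n/2, variance n/4 under the null
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))        # standard normal distribution function
    return 2 * (1 - phi)


# Hypothetical small study: 7 of 8 paired differences are positive.
exact_p = sign_test_exact(7, 8)
asymptotic_p = sign_test_normal(7, 8)
print(round(exact_p, 4), round(asymptotic_p, 4))
```

In this invented example the asymptotic P-value falls below 0.05 while the exact P-value does not, which is the kind of discrepancy that makes the exact value the appropriate one to quote when the sample is small.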
Conover, W. J. 1999: Practical nonparametric statistics, 3rd edition. Chichester: John Wiley & Sons, Ltd. Pett, M. A. 1997: Nonparametric statistics for health care research. Thousand Oaks: Sage. Siegel, S. and Castellan, N. J. 1998: Nonparametric statistics for the behavioral sciences, 2nd edition. New York: McGraw-Hill.
nonresponse bias This occurs in all types of study when there is a systematic difference between the characteristics of those who choose to participate and those who do not. In surveys, it is common practice to select a representative sample from the target population and collect data by means of a QUESTIONNAIRE. Some of the sample will not respond, either because they cannot be contacted or because they do not wish to participate. If the nonrespondents differ in a systematic way in terms of characteristics (such as age, sex or deprivation) that are related to the response variable(s), biased estimates will result. If basic demographic information is available for the entire sample, including the nonrespondents, this information may be of use in adjusting estimates to account for the BIAS. However, the best approach is to maximise the response rate by using techniques such as incentives, reminders or enclosing stamped addressed envelopes. Edwards et al. (2002) summarise the evidence for a range of these techniques.
Nonresponse bias can also be a problem in comparative studies. For example, the classic Salk Polio Vaccine Trial of 1954 was conducted as a randomised, double-blind, placebo-controlled trial (see CLINICAL TRIALS) in some health departments in the USA (Tanur et al., 1989; Meldrum, 1998). Parents of almost 750 000 children aged 6-7 years were asked for permission to include their child in the randomised trial but 45% refused. Those who consented were randomly allocated to Salk vaccine or PLACEBO. All children, including those who refused inoculation, were followed up for one year. The INCIDENCE rates of polio in the following year were 28 per 100 000 in the vaccinated group, 71 per 100 000 in the placebo group and 46 per 100 000 in those who refused. Although other more subtle biases may be present, the large statistically significant difference in polio incidence between those randomised to placebo and those who refused is largely due to nonresponse bias. Those who refused were more likely to be from deprived households, who were known to have a lower incidence of polio in children aged 6-7 years. In this randomised trial, a valid comparison between vaccine and placebo among volunteers is obtained, giving convincing evidence of the effect of the vaccine. The nonresponse bias here is an interesting side issue, but a nonresponse bias of this magnitude in an observational study would be a serious problem.
Nonresponse can also occur among individuals lost from observation in LONGITUDINAL STUDIES. Any resulting bias is known as withdrawal bias or loss to follow-up bias (see DROPOUTS).
WHG
(See also BIAS, BIAS IN OBSERVATIONAL STUDIES)
Edwards, P., Roberts, I., Clarke, M. et al. 2002: Increasing response rates to postal questionnaires: systematic review. British Medical Journal 324, 1183-91. Meldrum, M. 1998: 'A calculated risk': the Salk Polio Vaccine Trial of 1954. British Medical Journal 317, 1233-6. Tanur, J. M. et al. 1989: Statistics: a guide to the unknown, 3rd edition. Pacific Grove, CA: Wadsworth and Brooks/Cole.
normal distribution Probably the most important of the PROBABILITY DISTRIBUTIONS, the normal distribution takes the form of the familiar, symmetric, unimodal, bell-shaped curve, as illustrated in the figure. Indeed, it is often referred to as the bell-shaped distribution or Gaussian distribution. The normal distribution has a number of properties that are appealing.
First and foremost, there is a mathematical result called the CENTRAL LIMIT THEOREM, which, broadly speaking, tells us that if a sample is taken from a single population and all observations in that sample are independent then the sample MEAN will be approximately normally distributed, with the approximation improving with the size of the sample. It is also the distribution that leads to a statistical model for a regression resulting in the same parameter estimates as a least squares regression, a property famously exploited by Gauss (hence Gaussian). The normal distribution is related to many others. The F-DISTRIBUTION, CHI-SQUARE DISTRIBUTION, t-DISTRIBUTION and LOGNORMAL DISTRIBUTION can all be derived from it. It is what is known as a limiting distribution for, and thus can be a good approximation to, the BETA and GAMMA DISTRIBUTIONS and even for discrete distributions such as the BINOMIAL and POISSON DISTRIBUTIONS. For further details on how the normal distribution relates to other distributions see Leemis (1986).
The normal distribution is defined by two parameters: the mean, μ, and the STANDARD DEVIATION, σ. The density function for a variable X following a normal distribution with parameters μ and σ is:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \]
By transforming the data X to make Z = (X − μ)/σ one can reduce the data to that with the standard normal density function with mean zero and standard deviation of one:
\[ f(z) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{z^2}{2}\right] \]
[Figure: normal distribution. Illustrating the form of the normal (or Gaussian) distribution. Illustrated here is the standard normal distribution with mean zero and variance of one (vertical axis f(z), horizontal axis value). Marked is the line corresponding to a value for the z-score of 1.96. As can be seen, 97.5% of the distribution lies to the left of 1.96 and 2.5% of the distribution is associated with larger values. The probability of seeing a value greater in magnitude than 1.96 is 2 × 2.5% as the distribution is symmetric about zero.]
Performing such a transformation is sometimes referred to as calculating the z-SCORE of the data. Since it is symmetric and unimodal, the median and the mode of the distribution equal the mean (μ). The distribution can be relocated by changing μ or rescaled by changing σ, but the shape of the distribution remains unchanged. Because of this, properties of the shape such as SKEWNESS and KURTOSIS are constant for all normal distributions. There is an added benefit that 95% of the density will lie within 1.96 (or approximately two) standard deviations of the mean. This result is invaluable in calculating confidence intervals or performing tests on the mean.
Despite the name, it is not always normal to see a normal distribution! Indeed, there are times when it would be a positively abnormal distribution. However, a number of statistical procedures such as ANALYSIS OF VARIANCE (and its variants) and the STUDENT'S t-TEST do rely on properties of the normal distribution. Other techniques and procedures that might make assumptions of normality include PEARSON'S CORRELATION COEFFICIENT, LINEAR REGRESSION, PRINCIPAL COMPONENT ANALYSIS and FACTOR ANALYSIS. This leaves the problem of testing to see if data or residuals are normally distributed. The normal distribution always stretches from minus infinity to infinity and, as such, while it can often provide an adequate approximation to a bounded distribution, if the data have to be positive, or between 0 and 1, or are constrained by the inclusion/exclusion criteria of a trial, then this might be cause for alarm. Essentially, if there are no observations near the boundary, i.e. the density of the distribution at the boundary is negligible, the approximation may well be fine. For further information on the normal distribution see Altman and Bland (1995) and Armitage and Colton (1998).
There are a number of tests available to seek evidence of nonnormality. One can test the skewness and kurtosis of the sample, but this is not generally advisable. One can conduct the KOLMOGOROV-SMIRNOV TEST, but this can be oversensitive, as can the Shapiro-Wilk W test, another popular alternative. A graphical assessment of the distribution, either through HISTOGRAMS or QUANTILE-QUANTILE (Q-Q) PLOTS, will often suffice. If one decides that a sample of data is not normally distributed, one has the choice of using a method that makes no assumption of normality (e.g. a NONPARAMETRIC METHOD) or performing a TRANSFORMATION of the data so that they approximate normality. For example, taking the logarithm of ratios often leaves them acceptably approximated by the normal distribution. This is the most common way of estimating confidence intervals for ODDS RATIOS and relative risks (see RELATIVE RISK AND ODDS RATIO).
It is not often that one is required to perform a normal test (as opposed to a test for normality), as this requires knowledge of the standard deviation of the distribution. If the standard deviation is to be estimated from the sample, then a t-test should be performed. There are, however, tables of probabilities associated with z-scores in many texts (e.g. Lindley and Scott, 1984). These are normally the probabilities of a standard normal variable taking a value less than the z-score, a function (the distribution function) usually denoted by the uppercase Greek letter phi, Φ. If the z-score is positive, then the PROBABILITY of observing a score as large in magnitude is 2(1 − Φ(z)). Alternatively, if the z-score is negative, the probability is 2Φ(z). When the z-score is zero, it is taking the mean value and so Φ(0) = 0.5 as the distribution is symmetric about zero. As an example of its use, Kanis (2002) models the density of bone minerals as a normal distribution and by so doing is able to calculate the effect of several variables on fracture risks. AGL
(See also MULTIVARIATE NORMAL DISTRIBUTION)
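The z-score transformation and the probabilities read from a table of Φ can be reproduced with a short Python sketch. The function names `z_score`, `phi` and `two_sided_p` are illustrative assumptions; the distribution function is obtained from the error function in the standard library via Φ(z) = (1 + erf(z/√2))/2.

```python
from math import erf, sqrt


def z_score(x, mu, sigma):
    """Standardise x using a known mean mu and standard deviation sigma."""
    return (x - mu) / sigma


def phi(z):
    """Standard normal distribution function, P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def two_sided_p(z):
    """Probability of a score at least as large in magnitude as z.

    Equals 2(1 - phi(z)) for positive z and 2*phi(z) for negative z.
    """
    return 2 * (1 - phi(abs(z)))


print(round(phi(1.96), 3))           # proportion of the distribution below 1.96
print(round(two_sided_p(1.96), 3))   # two-sided probability for |z| = 1.96
print(phi(0.0))                      # the distribution is symmetric about zero
```

The first two values correspond to the 97.5% and 2 × 2.5% quoted for a z-score of 1.96, and the same function gives the probability for a negative z-score because only the magnitude is used.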
Altman, D. G. and Bland, J. M. 1995: The normal distribution. British Medical Journal 310, 298. Armitage, P. and Colton, T. (eds) 1998: Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd. Kanis, J. A. 2002: Diagnosis of osteoporosis and assessment of fracture risk. The Lancet 359, 1929-36. Leemis, L. M. 1986: Relationships among common univariate distributions. The American Statistician 40, 2, 143-6. Lindley, D. V. and Scott, W. F. 1984: New Cambridge elementary statistical tables. Cambridge: Cambridge University Press.
normal probability plot See PROBABILITY PLOT
nQuery Advisor This is a software package useful for determining sample sizes when planning research studies. Details are available from Statistical Solutions Ltd, 8 South Bank, Crosse's Green, Cork, Ireland, www.statsol.ie/nquery/nquery.htm. BSE
nuisance parameter This is a parameter of a model in which there is little or no scientific interest but whose presence is needed to make valid inferences and estimates of the parameters that are of main interest. An example of a nuisance parameter is the VARIANCE of the random effect terms in a RANDOM INTERCEPT MODEL (see MULTILEVEL MODELS). BSE
null hypothesis See HYPOTHESIS TESTS
number needed to harm (NNH) See NUMBER NEEDED TO TREAT (NNT)
number needed to treat (NNT) Often a useful way to report the results of a randomised clinical trial, the NNT is the estimated number of patients who need to be treated with the new treatment rather than the standard treatment for one additional patient to benefit. The NNT is calculated as one divided by the absolute risk reduction (ARR), where the latter is simply the absolute value of the difference between the control group event rate and the experimental group event rate. The concept of NNT can equally well be applied to harmful outcomes as well as beneficial ones, when instead it becomes the number needed to harm (NNH). For example, in a study into the effective use of intensive diabetes therapy on the development and progression of neuropathy, 9.6% of patients randomised to usual care and 2.8% of patients randomised to intensive therapy suffered from neuropathy. Consequently, ARR = 9.6% − 2.8% = 6.8%, leading to NNT = 1/ARR = 1/6.8% = 14.7, which is rounded up to 15. This means 15 diabetic patients need to be treated with intensive therapy to prevent one from developing neuropathy. (This example is given on the website of the Centre for Evidence Based Medicine.) Altman (1998) shows how to calculate a confidence interval for NNT, although this is not considered helpful if the 95% confidence interval for ARR includes the value zero, as this gives rise to a nonfinite CONFIDENCE INTERVAL for NNT. Walter (2001) illustrates some statistical properties of NNT and similar measures, with examples drawn from different types of study design. While there have been a few critics of NNT as an index, most medical statisticians, including Altman and Deeks (2000), defend the concept as a useful communication tool when presenting results from clinical studies. Bandolier, an Oxford-based, independent research group promoting evidence-based medicine, maintain a useful website with further information on NNTs and their application: www.jr2.ox.ac.uk/bandolier/booth/booths/NNTs.html. BSE/CRP
Altman, D. G. 1998: Confidence intervals for the number needed to treat. British Medical Journal 317, 1309-12. Altman, D. G. and Deeks, J. J. 2000: Comment on the paper by Hutton. Journal of the Royal Statistical Society, Series A 163, 415-16. Walter, S. D. 2001: Number needed to treat (NNT): estimation of a measure of clinical benefit. Statistics in Medicine 20, 3947-62.
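The arithmetic behind the number needed to treat in the diabetes example can be checked with a minimal Python sketch. The function name `number_needed_to_treat` is a hypothetical label; the event rates are the 9.6% and 2.8% quoted in the entry.

```python
from math import ceil


def number_needed_to_treat(control_rate, experimental_rate):
    """Return (ARR, NNT) from two event rates given as proportions.

    ARR is the absolute difference between the event rates and
    NNT = 1 / ARR. An ARR of zero has no finite NNT.
    """
    arr = abs(control_rate - experimental_rate)
    if arr == 0:
        raise ValueError("ARR is zero, so the NNT is not finite")
    return arr, 1 / arr


# Event rates from the neuropathy example: usual care versus intensive therapy.
arr, nnt = number_needed_to_treat(0.096, 0.028)
print(round(arr, 3), round(nnt, 1), ceil(nnt))
```

Rounding up rather than to the nearest integer follows the convention in the entry, where 14.7 is reported as 15 patients.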
o obaervatlonal studl..
TheM ~ silUations when: medical raean:h has 10 be conducted using saudy clc:signs that do DOt involve RANDOMJ5A11ON. These obscn'alional studies include a range of difren:nt saudy types. four maia types or which are described and ilIuslnlcd brieRy below: CASE-CONI'ROL. COHORT. CROS5-SECI1O:W. and ECOLOOICAL SIlJDIES (sec EPIDBIJOLOOY). Case n:potts (or case series), which involve only shlclying patients with spcciRc diagnoses. an: sometimes included unclc:r this heading. While providing valuable information about characterisaics of patients. from the scientiftc pclSpcctive they an: limited and. then:fon:. are considc:rccl 110 further. Inlcn:sa in most abscmdional slUdies Iypically focuses on studying the n:lalionship between disease and exposure. Expasun: dala an: coIlccled for those diqnoscd with a disease and also far lhasc who an: disease fn:e. In the absence of randomisation. obscrvaliaaal studies are particularly pnJIIC to problems usac:iatcd with confoundinlo and lhis needs to be considen:cl at Ihe design and analysis SIaIe or such studies. SAMPLE SIZE DElERMINA'IImI IN OIISERVA11ON.o\L SIUDIES an: described elsewhen:. In ecoI"giml sludies. Ihe unit or aaalysis is a group of individuals. or ·community·. whc:n:CKh communily provides its IIIaIS1In: of disease OUlComc: and exposun:. Examining the Sln:ngth of ASSOCIAlION between disease and exposure in these studies is usefully examined graphically using a scllllel' diagmm (sec: SCATI'BlPLCJI"). A sIUcIy examining thcassocialion between iDciclcna: or squameJUH.'lCD can:inama of the eye and ambicnllcveJs of solar ullnviolCl is one such example (NeWlOn el til.• 1996). WIlen Ibis suggests a lincarrelationship bdwcen the discasc and exposun: a n:gn:ssiaa analysis can be infannalive. In this example. the incidence of squamous-cell can:inama was found todecrcase by 294J, pcrunilreduction in ullraviolct exposure. a finding ahat was highly Slalislically significant (p < 0.00(1). 
AssDciaiions observed between disease and cxposun:s in ecological studies may DOl necessarily n:8cct the paUem seen when individuals are the unil of analysis. CrtMS-sediontll slut/ies. in conlnlst. provide the opportunily 10 Slucly data on disease and cxposun: on individuals at a particular poinl in time. In a CIQSS-SCClionaI study of elderly women. for example. inlcrcsl focused on the piCvalence of falls (Lawlor, Palel and Ebrahim. 2003). Such disease measures can be examined in n:lalion to one or mon: expasun:s also recorded al thallimc. The eflccts or chranic diseases and drug usc on the IRvalencc of falls were of intcn:sl and daIa on these factors wen: also recanIcd aI thai time. The ftndinp from this EIfcyd"'~
C'1IMIJIIIItian 10 M.rKtlI S1.una: S«rMII Edition
study illustrate the importance or considering conrounding faclOn. While the pn:valcnce or falls increased with increasing numbers of simultaneously oceaning chronic diseases. no such n:lation was found with the number of drugs used after adjusting for such factors (Lawlor. Patel and Ebrahim. 2003). CAse-c"nl,.,,1 and cob",., sludies both offer the important advantage of measuring disease and exposun: at different points in time. Casc-control studies proceed by identirying a rcpn:scntative group of individuals diagnosed with a ccrlain condition Ceases') and a represenlative group of individuals who ~ disease free ('controls') (see the Rrst figUM on page 320). Information n:ganling specific cxposures of inlen:st is collected and compared in cases and controls using odds ratios (see REUTIVE RISK AND ODDS RA11O). which an: estimated b)' dividing the odds ofexposure in cases by lhe odds of exposure in controls. Casc-control studies have paaticular appeal for studying rarediscasc:sand il is nOICwonhy that this was the study desip first used 10 demonstrate the link between smoking and lung cancer (Doll and Hill. 1950). 11aey do. however, n:lyon individuals being able to n:calltheirpast exposun: hislories 8CCUI1Itcly: RfCAU. BIAS is a particular problcm in case-conlJOl studies. Caafounding in case-control studies can be taken inlO account by IMlchinl cases and eanarols althe study design. bul allowllDCC must be made far this in subsequent clata analyses. The United Kingdom Childhood ClIIICCr Study provides such an example. when: inten::sl focused on Ihe relationship between nc:onatal eXpOSure to vitamin K and childhood cancer (Fear el til.• 2003). Here, cases wen: childn:n diagnosed with cana:r and controls were Ihosc wilhaut canccr. One mc:asun: of ·cxposun:' was whelhc:.. neonatal vitamin K bad been reccived cnlly or by Ihe intramuscular (im) mule. Cases and controls wen: matched on sex. month and yC8l' of birth. and n:ciDD of n:sidence at dililnosis. 
'I1Ie odds ratio ror C8llCel' in childn:n who had received vitamin K cnlly compared to those who received it by the im mute was I.€» (9S'it. CI 0.94 to 1.61) after adjulingfarthemalchingandalhc:rconfaundiDlfactors. 'I1ais sIUcIy did nal. thcrd'cn. provide support for an association between expDSIR of nc:onaIc:s to vitamin K and subsequent risk of chilclloacl canc:cr. When ordc:ml accordiq to the Sln:ngth of scientific cvidc:ace that observational studies have the palCnliai 10 provide. cohon studies hold the top position... its simplest fann. this involves identifying discue-frce individuals. who an: cllUSified 8L"C1J1'ding 10 an exposure aI a particular point in
YICd by Briaa S. Everitt and Christapher R. PIaImer
C 2011 JohD Wiley lit Soar.. ....
319
ODDSRATIO ___________________________________________________________________
C8se-con1roI studies start with the si~le disease (or outcome) of interest and exarnme the association with past exposure(s) PAST
4
PRESENT
Exposwe
Exposed?
-....
-
-------........ -- ...
study suggested no difference in cancer rates belwc:ea these two cohorts (IRR =0.99. 95... Cl 0.13 to 1.17). When Ihe outcome studied was other symptoms of ilI-heaJth. however. Gulf War veterans w~ found to have an excess of illness at /.Me follow-up (Hotopf el al.• 20(3).
DoD. R. ..... _
.. ~......-------------
~
rate ratio (lRR). ARc:r adjusting for confounding factors, this
~
Unexposed? .-------------- Conlrols
ob8ervatlonal 8lud1es Principles of a case-controI study design
time and rollowcd up to determine which or them develops the disease of inlcn:st (see the second figu~). In conlnlst to case-aJlltrol studies that rely solei), on 00D5 RAnos., cohort studies provide the opportunity to measure disease in many diffen:nt ways using absolute (e.g. risks. rak:s or odds) or relative (risk ratios. rate ratios or odds ratios) mcasura. Cohort studies also havc the important advanlage or mllec:ting exposure elata prior to disease occurrencc. i.e. avoiding ~all bias. and studying a range or diffcrenl health outcomes. Such studies. howcvu. ha\'C the disadvanlales of ~quiring larger study size aDd taking much longer 10 mnduct. particularly far rare diseases.
A. 8. 1950: Smoking_ ~iDOma oflbe IIBIIPldimilwy n:port. BTitis/r MftiimJ Joumtll ii. 739-48. F..... N. T .. R......,E.. A..... P.. SI.....,.,J., o.y.N. ad Eden.O. B.2003: Vitamin Kand childhood Cllloer. a n:port fiom the United KinpIom Childhood Cancer Study. British Jourlftll ole.crr 19, 122l-ll. HoIopf, Me, o.,1d, A. s., Hall, L. Nlbllaa, V., VnwIa, C. IDII WIIIIIy,S.2ODl: GulfwariUncss-bctta. "MII'SC.orjusithesamc? A CCJhort study.lkiliJII Metlit.YJI Joumtl1327. 137~2. Lawlor, D. A., . . . . R.aadEbnIdm,S.200l: AssocialiOllbctwc:c:nfallsiDclderty women and chronic diseases and drus use: ClVss-sectional Sllnoey. Britis/r Medical Joumtl1327. 712-17...adutIIIe, G. J., B.... A...M.,Macaaoc:IIIe, N.. Rotapi', M.. Do)'1t, P...... LDI, M. 2003: Incideacc of cancer IIIDOIII UK Gulf \WI' veterans: CCJhort slUdy. Britis/r Mrtlit.YJI JOIIIYUII 327. IlU-7. N....... R., Fer"', J.. 1tIeY.., G., BInI, V. ad ParIdD, D. l\L 1996: Effect of ambient
solar ullJB\'iolet radildioa 011 incideaQe or squamous-cdl camaoma of the C)'C. 77re Ltmc:et 347, J4S~J.
odds raHo
See RELA11VE RISK &~ aDOS RATIO
ona-sample Heat
See S"nIoon"s I-'IUI'
[Figure: observational studies. Principles of a cohort study design. A cohort study begins with a selected group of disease-free people who are classified according to a specific exposure (exposed or unexposed). They are then observed over time to see who develops the disease or outcome(s) of interest.]
Studying disease rates in a specific cohort often requires identifying a separate comparison cohort. A cohort study of cancer incidence in 51 721 UK service personnel deployed in the 1991 Gulf War, for example, also involved assembling a cohort of 50 755 active service personnel who were not deployed in that war (era cohort) (Macfarlane et al., 2003). In order to take account of confounding, members of the Gulf War veterans cohort were matched to members of the era cohort according to age, sex, rank, service and level of fitness. The main outcome measure here was the INCIDENCE
one-sided tests In hypothesis tests, we try to distinguish between chance variation in a dataset and a genuine effect. We do this by comparing the NULL HYPOTHESIS, which states that there is no difference between the populations from which the data arose, to the alternative hypothesis, which states that there is a difference. For a one-sided test, this alternative hypothesis specifies the direction of the difference, i.e. we wish to distinguish chance variation from a decrease or an increase in comparison to the null hypothesis. The P-VALUE for a one-sided test is calculated by considering only one side of the test statistic's distribution. One-sided tests are used in situations where a genuine difference can only occur in one prespecified direction and any differences seen in the opposite direction are a result of mere chance. For example, in Völzke et al. (2002) only increases in cardiovascular risk were looked at, as a lower cardiovascular risk was not biologically plausible for the study group. One-sided tests are also useful in situations where we are only interested in differences in one direction. For example, when introducing a new cheaper and more convenient diagnostic test we might only be interested in whether it is less accurate than the current test. Tests should never be one-sided unless there is strong evidence present prior to data collection, suggesting that any change from the null hypothesis must be in one (specified) direction only. Using one-sided tests makes it easier to reject
the null hypothesis when the alternative is true; thus one-sided tests are attractive to those who define success as having a P-value less than the significance level. A one-sided test should not be used just because a difference in a particular direction is expected, as things do not always turn out as planned. More details can be found in Bland and Altman (1994). 14MB
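The relationship between one-sided and two-sided P-values described in this entry can be sketched numerically. The example below is only an illustration under stated assumptions: it assumes a test statistic that is approximately standard normal under the null hypothesis, and both the value z = 1.80 and the function names are hypothetical.

```python
import math


def normal_upper_tail(z):
    """Return P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_values(z):
    """Return the one-sided (upper), one-sided (lower) and two-sided P-values."""
    upper = normal_upper_tail(z)    # alternative: the true difference is positive
    lower = normal_upper_tail(-z)   # alternative: the true difference is negative
    two_sided = 2.0 * min(upper, lower)
    return upper, lower, two_sided


# Hypothetical test statistic, chosen only for illustration.
upper, lower, two_sided = p_values(1.80)
print(f"one-sided (upper): {upper:.4f}")
print(f"one-sided (lower): {lower:.4f}")
print(f"two-sided:         {two_sided:.4f}")
```

With this hypothetical statistic the upper one-sided P-value falls below 0.05 while the two-sided P-value does not, which is the sense in which a one-sided test makes rejection easier.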
Bland, J. M. and Altman, D. G. 1994: Statistics notes: one- and two-sided tests of significance. British Medical Journal 309, 248. Völzke, H., Engel, J., Kleine, V., Schwahn, C., Dahm, J. B., Eckel, L. and Rettig, R. 2002: Angiotensin I-converting enzyme insertion/deletion polymorphism and cardiac mortality and morbidity after coronary artery bypass graft surgery. Chest 122, 31-6.
One-way analysis of variance
See ANALYSIS OF VARIANCE (ANOVA)
ordered categorical data
See ORDINAL DATA
ordinal data Data that have been collected from research studies generally fall into three main categories: (1) nominal, (2) interval and (3) ordinal. In the case of ordinal data, the most appropriate methods for analysis are those that take advantage of the ordering of the response categories. These methods are collectively termed ordinal regression models. In the literature, there are six main types of ordinal regression model, including the following: polytomous model, proportional odds model, unconstrained/constrained partial proportional odds model, adjacent category model, continuation ratio model and stereotype model. Ordinal regression models have been included in the broader category of GENERALISED LINEAR MODELS and therefore consist of the usual components: a random component, which identifies the probability distribution of the response variable; a systematic component, which specifies a linear function of explanatory variables; and a link, which describes the functional relationship between the systematic component and the expected value of the random component. All ordinal regression models can be expressed as:

\[ F(\pi) = \alpha_j + X_1\beta_{j1} + X_2\beta_{j2} + \cdots + X_p\beta_{jp} \tag{1} \]

where $F(\pi)$ denotes the link function, also known as the logit or log odds, and this function includes the 'cumulative' logits, the 'continuation ratio' logits, the 'adjacent category' logits and 'generalised' logits. The $\alpha_j$ and $\beta_{j1}, \ldots, \beta_{jp}$ are the parameters to be estimated based on the $j$th cut-point (this is the point at which the scale is dichotomised) and the $X_1, \ldots, X_p$ are the covariates measured on the subjects in the study. If we let $\Pr(Y = y_j)$ denote the PROBABILITY that a subject falls into the $j$th category, then using equation (1) we can express the logit functions for various ordinal regression models. For the purpose of illustration, the five-point ordinal scale from the health status question on the SF-36 Health Survey (Ware and Gandek, 1998) is used. This question asks: 'In general would you say your health is "excellent", "very good", "good", "fair" or "poor"?'
The generalised logit or polytomous model is a straightforward extension of the LOGISTIC REGRESSION model for binary response and accommodates multinomial responses (Agresti, 1984). It does not, however, take account of the ordering of the categories. To indicate the form of the polytomous model, let $X_1$ and $X_2$ denote covariates of interest and $Y$ be the response measured on the ordinal scale of the health status question. Then taking the last category (i.e. 'poor') as referent, the polytomous model is expressed as:

\[ \log\left[\frac{\Pr(Y = y_j)}{\Pr(Y = y_5)}\right] = \alpha_j + X_1\beta_{j1} + X_2\beta_{j2}, \quad j = 1, 2, 3, 4 \tag{2} \]
There are four logit functions based on the cut-points 'excellent' versus 'poor'; 'very good' versus 'poor'; 'good' versus 'poor' and 'fair' versus 'poor'. The logit functions are expressed in terms of the four cut-point-specific intercept parameters ($\alpha_j$) and, for each of the covariates $X_1$ and $X_2$, the four cut-point-specific regression coefficients are $\beta_{j1}$ and $\beta_{j2}$ respectively. For a given covariate, say $X_1$, the parameters $\beta_{j1}$ correspond to the four log-odds of ($Y = y_j$) relative to the referent category ($Y = y_5$). Exponentiating the regression coefficients $\beta_{j1}$ results in the cut-point-specific ODDS RATIOS comparing ($Y = y_j$) versus ($Y = y_5$) for a unit increase in the levels of $X_1$, having adjusted for $X_2$.
The prime feature of the proportional odds model is that a single summary measure (in terms of an odds ratio) is used to summarise the relationship of the ordinal response and the covariates. The proportional odds model (sometimes known as the cumulative logit model) allows for the ordering of the response categories through the use of cumulative probabilities. The proportional odds model was first introduced by Walker and Duncan (1967) and their model was based on cumulative probabilities. McCullagh (1980) considered their model in great detail and derived from it the proportional odds model. For this latter model, it is assumed that one can combine the cut-points of the response into a single model, in which the same slope parameter $\beta$ is used for each logit. The proportional odds model fitted for the ordinal scale of the health status question would take on the form:

\[ \log\left[\frac{\Pr(Y \le y_j)}{\Pr(Y > y_j)}\right] = \alpha_j + X_1\beta_1 + X_2\beta_2, \quad j = 1, 2, 3, 4 \tag{3} \]

Here the logits are based on the four cut-points: 'excellent' versus ('very good', 'good', 'fair', 'poor'); ('excellent', 'very good') versus ('good', 'fair', 'poor'); ('excellent', 'very good', 'good') versus ('fair', 'poor'); ('excellent', 'very good', 'good', 'fair') versus 'poor'. There are four intercept parameters ($\alpha_j$) in this model. As $j$ increases, these parameters increase, reflecting an increase in the logits, as additional probabilities are added into the numerator (i.e. $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_4$). Also the $\beta_1$ and $\beta_2$ are the common slope parameters over the four cut-points for each covariate respectively.
There are two assumptions of the proportional odds model: (1) the existence of an underlying continuous variable (not all ordinal scales will have an underlying continuum, e.g. the total score; the proportional odds model can still be used in such circumstances, the only drawback being that the interpretation of the parameters becomes difficult); (2) homogeneity in the cut-point-specific regression parameters (known as the proportional odds assumption). Prior to fitting the proportional odds model, it is important that the assumption of proportionality be checked, either graphically or formally, for instance using the score test (Peterson and Harrell, 1990).
An appealing requirement for ordinal data is that the model should in some sense be invariant under a reversal of category order. This implies that the magnitude of the summary estimates does not depend on the direction employed in modelling the outcome, i.e. whether the cut-points are formed using increasing or decreasing levels of severity. However, the sign of the $\beta$ parameter is changed and the $\{\alpha_j\}$ reverse sign and order. The proportional odds model is also invariant under the collapsibility of the response categories. Hence, if two adjacent response categories are pooled together and the cut-point removed, the estimates of $\beta$ should remain essentially unchanged, although the $\{\alpha_j\}$ are affected. For the covariate, say $X_1$, the parameter $\beta_1$ corresponds to the global log-odds over all the four cut-points.
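The structure of model (3) can be illustrated with a short numerical sketch. The parameter values below are hypothetical (four increasing intercepts and two common slopes, chosen only for illustration), and the function names are not taken from any particular package. The sketch converts the cumulative logits into category probabilities and checks that the cumulative odds ratio for a unit increase in $X_1$ is the same at every cut-point.

```python
import math


def expit(x):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


def cumulative_probs(alphas, betas, x):
    """Pr(Y <= y_j) at each cut-point j under the cumulative logit model (3)."""
    eta = sum(b * xi for b, xi in zip(betas, x))
    return [expit(a + eta) for a in alphas]


def category_probs(alphas, betas, x):
    """Probabilities of the individual categories, from first to last."""
    cum = cumulative_probs(alphas, betas, x) + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Hypothetical parameter values, for illustration only.
alphas = [-2.0, -0.5, 1.0, 2.5]   # alpha_1 <= alpha_2 <= alpha_3 <= alpha_4
betas = [0.4, -0.3]               # common slopes beta_1 and beta_2

probs = category_probs(alphas, betas, [1.0, 0.0])

# Cumulative odds ratio for a unit increase in X1 (X2 held at 0), per cut-point.
odds_ratios = [
    odds(p1) / odds(p0)
    for p1, p0 in zip(cumulative_probs(alphas, betas, [1.0, 0.0]),
                      cumulative_probs(alphas, betas, [0.0, 0.0]))
]
print(probs)
print(odds_ratios)
```

Every entry of `odds_ratios` equals exp(0.4) up to rounding, which is the proportional odds assumption expressed numerically.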
The exponential of the regression coefficient $\beta_1$ results in a single estimate of the odds ratio for a unit increase in the levels of $X_1$, having adjusted for $X_2$.
The proportional odds model and the partial proportional odds models are collectively termed cumulative logit models. In practice, it is often difficult to find data for which a proportional odds model is a plausible description. There is, therefore, a need for a model that permits partial proportional odds, where some explanatory variables may meet the proportional odds assumption and others may not. Thus, the primary reason for the formulation of the 'partial proportional odds models' was to relax the stringent assumption of a constant odds ratio presented by the proportional odds model. The assumption that a constant slopes model holds when in fact, for a given variable, a constant log-odds ratio is not representative of all the log-odds ratios over the cut-points, can lead to the formulation of an incorrect model. The partial proportional odds models were initiated by the work of Peterson and Harrell (1990) and in general there are two types of partial proportional odds model: the unconstrained partial proportional odds model, for which no constraints are placed in the estimation of the parameters, and the constrained partial proportional odds model, for which a certain relationship may have been observed between the log-odds ratios and $j$, the point of dichotomisation. Such a relationship may be linear, for example, in which case a linear constraint is placed on the parameters of the model. The cut-points that are used for the partial proportional odds models are the same as for the proportional odds model. Assuming $X_1$ has proportional odds and $X_2$ does not have proportional odds, then the unconstrained partial proportional odds model for the ordinal scale of the health status question takes the form:

\[ \log\left[\frac{\Pr(Y \le y_j)}{\Pr(Y > y_j)}\right] = \alpha_j + X_1\beta_1 + X_2\beta_2 + T_2\gamma_{j2}, \quad j = 1, 2, 3, 4 \tag{4} \]

Here the $\beta_1$ and $\beta_2$ are the regression coefficients associated with the two covariates of interest. The $T_2$ is the covariate which is a subset of the $X_2$ for which the proportional odds assumption either is not assumed or is to be tested and $\gamma_{j2}$ are the regression coefficients associated with $T_2$, so that $T_2\gamma_{j2}$ is an increment associated only with the $j$th cumulative logit and $\gamma_{12} = 0$. If $\gamma_{j2} = 0$ for all $j$, then this model reduces to the proportional odds model. Thus a simultaneous test of the proportional odds model assumption is a test of the NULL HYPOTHESIS that $\gamma_{j2} = 0$ for all $j = 2, 3, 4$. Since $\gamma_{12} = 0$, the model uses only $\alpha_1 + X_2\beta_2$ to estimate the odds ratio associated with the dichotomisation of the $y$-response categories into the first category versus the rest of the categories, whereas the estimation of the odds ratios associated with the remaining cumulative probabilities involves incrementing $\alpha_j + X_2\beta_2$ by $T_2\gamma_{j2}$.
Given that the relationship of a covariate and the response is represented with nonproportional odds, then for the individual cut-point-specific odds ratios, often a certain type of trend may be anticipated: e.g. a linear trend may be expected. In such a case, a constraint can be placed on the parameters in the model, so that the trend is taken into account. When the constraints are incorporated into the unconstrained partial proportional odds model, for the scale of the health status question, this model takes the form:

\[ \log\left[\frac{\Pr(Y \le y_j)}{\Pr(Y > y_j)}\right] = \alpha_j + X_1\beta_1 + X_2\beta_2 + T_2\gamma_2\Gamma_j, \quad j = 1, 2, 3, 4 \tag{5} \]

Here the $\Gamma_j$ are fixed pre-specified scalars and $\Gamma_1 = 0$. The new parameter $\gamma_2$ is not subscripted by $j$. Although $\gamma_2$ does not depend on $j$, it is multiplied by the fixed constant scalar $\Gamma_j$ in the calculation of the $j$th cumulative logit.
The underlying assumptions of the partial proportional odds models are as for the proportional odds model. Given covariate $X_2$, the test of whether a single $\gamma_2$ parameter fits the data as well as 3 (number of categories - 2) $\gamma_{j2}$ parameters can be obtained by using the LIKELIHOOD RATIO test. Here we compare the log-likelihood of the unconstrained and constrained models. This gives an approximate chi-square with 2, i.e. (number of categories - 2) - 1, DEGREES OF FREEDOM (DoF).
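The degrees-of-freedom count and the likelihood ratio comparison just described can be sketched as follows. This is a minimal illustration, not output from a fitted model: the two maximised log-likelihood values are hypothetical, and the P-value uses the closed form of the chi-square upper tail that holds specifically for 2 degrees of freedom.

```python
import math

n_categories = 5                           # the five-point health status scale
n_gamma_unconstrained = n_categories - 2   # gamma_22, gamma_32 and gamma_42
n_gamma_constrained = 1                    # the single gamma_2 of model (5)
dof = n_gamma_unconstrained - n_gamma_constrained


def lr_test_two_dof(loglik_constrained, loglik_unconstrained):
    """Likelihood ratio statistic and P-value on 2 degrees of freedom."""
    statistic = 2.0 * (loglik_unconstrained - loglik_constrained)
    # For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical maximised log-likelihoods, for illustration only.
statistic, p_value = lr_test_two_dof(-512.3, -509.1)
print(dof, round(statistic, 2), round(p_value, 4))
```

A small P-value would suggest that the single constrained parameter does not describe the cut-point-specific departures as well as the three unconstrained ones.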
The adjacent category model utilises single-category probabilities rather than cumulative probabilities. Agresti (1984) states that when the response categories have a natural ordering, logit models should utilise that ordering. One can incorporate the ordering directly in the way we construct the logits. Like the proportional odds model, the adjacent categories logit model implies stochastic orderings of the response distributions for different predictor values. Agresti (1989) describes the adjacent category logistic model as modelling the ratio of the two probabilities $\Pr(Y = y_j)$ and $\Pr(Y = y_{j+1})$, $j = 1, 2, 3, 4$. The cut-points that are used for the adjacent categories, given the scale of the health status question, would be: 'excellent' versus 'very good'; 'very good' versus 'good'; 'good' versus 'fair'; 'fair' versus 'poor'.
There are two types of adjacent category logit model. The constant slope adjacent category model has the following representation:

\[ \log\left[\frac{\Pr(Y = y_j)}{\Pr(Y = y_{j+1})}\right] = \alpha_j + X_1\beta_1 + X_2\beta_2, \quad j = 1, 2, 3, 4 \tag{6} \]
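To see how the adjacent-category logits of model (6) determine the five category probabilities, consider the sketch below. The parameter and covariate values are hypothetical and the function name is illustrative only; the calculation relies on the fact that the log-odds of category $j$ against the last category is the sum of the adjacent logits from $j$ onwards.

```python
import math


def adjacent_category_probs(etas):
    """Category probabilities implied by adjacent-category logits.

    etas[j] is log[Pr(Y = y_j) / Pr(Y = y_{j+1})] for successive cut-points.
    """
    # Build log[Pr(Y = y_j) / Pr(Y = y_last)] by summing logits from the end.
    log_rel = [0.0]
    for eta in reversed(etas):
        log_rel.append(log_rel[-1] + eta)
    log_rel.reverse()
    top = max(log_rel)                       # subtract the maximum for stability
    weights = [math.exp(v - top) for v in log_rel]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values, for illustration only.
alphas = [0.3, 0.1, -0.2, -0.6]
betas = [0.25, -0.1]                         # constant slopes, as in model (6)
x = [2.0, 1.0]

linear = sum(b * xi for b, xi in zip(betas, x))
etas = [a + linear for a in alphas]
probs = adjacent_category_probs(etas)
print(probs)
```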
Manor, Matthews and Power (2000) described the adjacent category model in a slightly different way. Their version of the model is:

\[ \log\left[\frac{\Pr(Y = y_j)}{\Pr(Y = y_{j+1})}\right] = \alpha_j + X_1\beta_{j1} + X_2\beta_{j2}, \quad j = 1, 2, 3, 4 \tag{7} \]

In model (6), for a given covariate, say $X_1$, the parameter $\beta_1$ corresponds to the log-odds of falling in categories 'excellent' versus 'very good'; 'very good' versus 'good'; 'good' versus 'fair'; 'fair' versus 'poor'. If the exponential is taken, this results in the global odds ratios for the adjacent categories for each unit increase in the levels of $X_1$. On a similar note, model (7) provides the cut-point-specific adjacent category odds ratios, and for the health status question scale there are four of these, for each unit increase in the levels of a given covariate.
Given an ordinal scale, where one is particularly interested in assessing the relative chance of a given rating against all more favourable ones, one would normally consider employing the continuation ratio logits. The continuation ratio model is best suited to circumstances in which the individual categories are of particular interest. It is well suited for failure time data and outcomes that measure threshold points, where individuals at a given level of an outcome must have passed through all previous levels of an outcome. The logits for the continuation ratio model are based on the cut-points: 'excellent' versus ('very good', 'good', 'fair', 'poor'); 'very good' versus ('good', 'fair', 'poor'); 'good' versus ('fair', 'poor'); 'fair' versus 'poor'. As for the adjacent category models, there are two versions of the continuation ratio model. The form of the continuation ratio model was initially formulated by Feinberg (1980) and originated from survival time data. Various forms of the model exist; the most common is the forward formulation model and is written as:

\[ \log\left[\frac{\Pr(Y = y_j)}{\Pr(Y > y_j)}\right] = \alpha_j + X_1\beta_1 + X_2\beta_2, \quad j = 1, 2, 3, 4 \tag{8} \]

Model (8) is described by Cole and Ananth (2001) as a fully constrained continuation ratio model. It allows the cut-point-specific continuation ratios to be described by a single regression parameter (in a similar way to the proportional odds model). This model represents the probability of being in category $j$, conditional on being in category $j$ or higher. The intercept parameters are denoted by $\{\alpha_j\}$ and are the same as for the cumulative logit model, but are not necessarily ordered for the continuation ratio model. Essentially, this model can be viewed as the ratio of the two conditional probabilities, $\Pr(Y = y_j \mid Y \ge y_j)$ and $\Pr(Y > y_j \mid Y \ge y_j)$, i.e. one models the odds of falling in category $j$ as opposed to higher than category $j$, given that one has been in category $j$ or higher. By viewing the outcome as going from more to less severe, this model can be applied in reverse and forms the backward continuation ratios $\Pr(Y = y_j)/\Pr(Y < y_j)$. Because of the conditioning on adjacent cut-points, the continuation ratio, unlike the proportional odds, is affected by the direction chosen for the response variable, and the forward and backward ratios are not equivalent and yield different results. Thus, the continuation ratio model is not invariant under the reversal of categories unless $Y$ is binary, and one has to be careful which continuation ratio model one uses.
Another form is the different slopes continuation ratio model and for this model the regression parameters are allowed to vary by the cut-point. This model is written as:

\[ \log\left[\frac{\Pr(Y = y_j)}{\Pr(Y > y_j)}\right] = \alpha_j + X_1\beta_{j1} + X_2\beta_{j2}, \quad j = 1, 2, 3, 4 \tag{9} \]
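The forward continuation ratio logits in (8) and (9) can be turned into category probabilities by treating each logit as a conditional ('stop here versus continue') probability. The sketch below assumes hypothetical logit values and illustrative function names. The second function shows one common way, assumed here rather than taken from this entry, of restructuring a single ordinal response into binary records so that each cut-point can be handled by binary logistic regression.

```python
import math


def expit(x):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def forward_cr_probs(etas):
    """Category probabilities implied by forward continuation ratio logits.

    etas[j] is log[Pr(Y = y_j) / Pr(Y > y_j)] for successive cut-points.
    """
    probs = []
    remaining = 1.0                       # Pr(Y >= y_j), starts at 1
    for eta in etas:
        stop = expit(eta)                 # Pr(Y = y_j | Y >= y_j)
        probs.append(remaining * stop)
        remaining *= 1.0 - stop
    probs.append(remaining)               # last category takes what is left
    return probs


def expand_subject(category, n_categories):
    """Restructure one response into (cut_point, event) binary records.

    A subject contributes a record at every cut-point reached, with
    event = 1 at the cut-point where the response category is observed.
    """
    last_cut = min(category, n_categories - 1)
    return [(j, int(category == j)) for j in range(1, last_cut + 1)]


# Hypothetical logits for the four cut-points, for illustration only.
etas = [-1.2, -0.4, 0.3, 0.9]
probs = forward_cr_probs(etas)
print(probs)
print(expand_subject(3, 5))   # a subject whose response is the third category
```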
In this case, the multinomial likelihood factors into a product of the binomial likelihoods for the separate logits. The continuation ratio model has the advantage that the $c - 1$ logits produced when fitting the ratios in relation to the cut-points are asymptotically independent of each other. Thus, the estimation of the parameters in each of the $c - 1$ logits can be carried out separately, using the method of maximum likelihood, and the summation of the individual chi-square statistics gives the overall goodness-of-fit statistics for the set of the logit models. In practice, the continuation ratio model can be fitted in any statistical package that includes binary logistic regression, after suitable restructuring of the data. As the fully constrained model is nested within the different slopes continuation ratio model, the difference in -2 log-likelihood (deviance) provides a test of the validity of the assumption that the threshold-specific continuation ratios are equal and is distributed as a $\chi^2$ variate under the null with DoFs equal to the difference in the number of parameters between the nested models. The odds ratios for the different slopes and fully constrained continuation ratio models can be described in a similar way to those of the adjacent category models, as described earlier.
The stereotype ordinal regression model was introduced by Anderson (1984) as part of a general model for discrete multivariate outcomes and also arises naturally in the context of truly discrete outcomes. The stereotype model is a derivative of the polytomous logistic model (2). The polytomous model provides the best possible fit to the data, at the cost of a large number of parameters that can be difficult to interpret. The stereotype model aims to reduce the number of parameters by imposing constraints without reducing the adequacy of the model. For the health status scale, the starting point for the stereotype model is to impose a structure on the $\beta_{j1}$ and $\beta_{j2}$ such that:
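
\[ \beta_{j1} = -\phi_j\beta_1 \quad \text{and} \quad \beta_{j2} = -\phi_j\beta_2, \quad j = 1, 2, 3, 4 \tag{10} \]

Then model (2) becomes:

\[ \log\left[\frac{\Pr(Y = y_j)}{\Pr(Y = y_5)}\right] = \alpha_j - \phi_j(X_1\beta_1 + X_2\beta_2), \quad j = 1, 2, 3, 4 \tag{11} \]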
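A brief numerical sketch may help to show how model (11) ties all four generalised logits to one linear function of the covariates, scaled by the $\phi_j$. All values below are hypothetical, the function name is illustrative, and the referent category ('poor') is given a log-odds of zero.

```python
import math


def stereotype_probs(alphas, phis, betas, x):
    """Category probabilities under a one-dimensional stereotype model (11)."""
    linear = sum(b * xi for b, xi in zip(betas, x))
    # Log-odds of each of the first four categories against the referent.
    log_odds = [a - phi * linear for a, phi in zip(alphas, phis)]
    log_odds.append(0.0)                  # referent category ('poor')
    top = max(log_odds)                   # subtract the maximum for stability
    weights = [math.exp(v - top) for v in log_odds]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values, for illustration only.
alphas = [0.5, 1.0, 0.8, 0.2]
phis = [1.0, 0.7, 0.4, 0.1]               # scale values for categories 1 to 4
betas = [0.6, -0.2]
x = [1.0, 2.0]

probs = stereotype_probs(alphas, phis, betas, x)
print(probs)
```

Only two slope parameters appear, however many categories there are; the $\phi_j$ determine how strongly each category's log-odds responds to the covariates.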
There are certain features that are specifically relevant to the stereotype ordinal regression model and these include the dimensionality of the model, distinguishable $y$-response categories and the ordering of the $y$-response categories with respect to the covariates. The dimensionality of the model for the $y$-response and the covariates is determined by the number of linear functions required to describe the relationship. If only one linear function is used to describe the relationship between the ordinal response and a set of predictors, model (11) is a one-dimensional stereotype model. One-dimensional relationships are much more common in the literature compared to the higher dimensions.
Having decided on the dimensionality of the model, there are questions about ordering and model simplification, perhaps using distinguishability as a criterion. The concept of indistinguishability is described when a given covariate, $X_1$, affects two response categories $y_j$ and $y_k$ in an identical manner (thus $X_1$ is not predictive between the two categories): we then say that these two categories are indistinguishable with respect to $X_1$. The hypothesis that $Y = y_j$ is indistinguishable from $Y = y_k$, with respect to the covariates, takes the form $H_0$: $\beta_j = \beta_k$. In the one-dimensional stereotype model, this is equivalent to asking whether there are differences among the $\phi_j$ ($H_0$: $\phi_j = \phi_k$).
In the regression models discussed so far (with the exception of the polytomous model), the regression parameters and consequently the logits are based on the ordering of the $y$-response categories. Therefore the proportional odds, continuation ratio and adjacent category models assess the ASSOCIATION of the $y$-response and the covariates conditional on the order in which the categories occur. In this case the ordering is 'inbuilt' and assumed a priori. In many cases, one cannot be too certain about the relevance of the ordering of the response categories. The stereotype model is based on the polytomous model and therefore uses generalised logits. The polytomous model does not have the mechanism to account for the ordering of the $y$-response categories. Anderson (1984) took the latter model and assessed the relationship of the $y$-response and a given covariate. If the individual cut-point-specific regression parameters were ordered (leading to the stereotype model), then one could assume that an ordered nature existed in the response categories. This is quite different from the 'ordinality' aspect of the proportional odds, continuation ratio and adjacent category models. For these latter models, the ordered categories are accounted for through the formation of the logits. Therefore they are not necessarily ordered with respect to the covariates and in a sense it is not necessary to have any regressor variables. By contrast, in the stereotype model, the 'ordinality' only reveals itself through assessing the relationship of the $y$-response and covariates. If ordering is appropriate, the model orders the $\beta_j$ (in the polytomous model) instead of ordering the odds or the link function. The 'ordering' is more directly tied to the effects of the explanatory variables and becomes a testable statement. If the dimensionality is one, ordering of the odds ratios is easily verified. If $\beta_1 > 0$ and the odds ratios form a decreasing sequence $e^{\phi_1\beta_1} \ge e^{\phi_2\beta_1} \ge \cdots \ge e^{\phi_5\beta_1} \ge 1$, then:

\[ \phi_1 = 1 \ge \phi_2 \ge \cdots \ge \phi_5 = 0 \tag{12} \]

Note that (12) is not strictly ordinal, as adjoining categories may be indistinguishable. If (12) is satisfied, then the effect of the covariates upon the first odds ratio is greater than its effect on the second and so on. For the health status scale, model (11) has a standard multinomial intercept with four parameters for a response variable. It estimates three independent scale values of $\{\phi_j\}$ for the response factor and a single beta parameter for each independent variable. The larger the difference between any two $\phi_j$ values, the more the log odds between the outcomes is affected by the independent variables. The $\beta_1$ parameter shows how the independent variable, $X_1$, affects the log odds of higher versus lower scores, where 'higher' and 'lower' are defined by the $\phi_j$ scale.
Most of the ordinal regression models detailed here can be fitted in well-known statistical software packages. However, there are some models (e.g. the partial proportional odds models) that are not very well accommodated for, and in the literature these models cannot be easily fitted and are more computationally intensive. The goodness-of-fit for a $c$-category model is a natural extension of that for the two categories. However, methods to assess the residuals, where there is lack-of-fit, are underdeveloped and require further research. Indices such as AKAIKE'S INFORMATION CRITERION (AIC) are often used to compare different models from the same data. However, the use of such an index has been rarely cited and it would be worthwhile to exploit and evaluate such an index in the context of ordinal regression models. RL
Agresti, A. 1984: Analysis of ordinal categorical data. New York: John Wiley & Sons, Inc. Agresti, A. 1989: Tutorial on modelling ordered categorical response data. Psychological Bulletin 105, 290-301. Anderson, J. A. 1984: Regression and ordered categorical variables (with discussion). Journal of the Royal Statistical Society, Series B 46, 1-30. Cole, S. and Ananth, C. 2001: Regression models for unconstrained, partially or fully constrained continuation odds ratios. International Journal of Epidemiology 30, 1379-82. Feinberg, B. 1980: Analysis of cross-classified data, 2nd edition. Cambridge, MA: MIT Press. Manor, O., Matthews, S. and Power, C. 2000: Dichotomous or categorical response? Analysing self-rated health and lifetime social class. International Journal of Epidemiology 29, 149-57. McCullagh, P. 1980: Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, Series B 42, 109-42. Peterson, B. and Harrell, F. 1990: Partial proportional odds models for ordinal response variables. Applied Statistics 39, 205-17. Walker, S. and Duncan, D. 1967: Estimation of the probability of an event as a function of several independent variables. Biometrika 54, 167-79. Ware, J. and Gandek, B. 1998: Overview of the SF-36 health survey and the international quality of life assessment (IQOLA) project. Journal of Clinical Epidemiology 51, 11, 903-12.
ordinary least squares (OLS) See LEAST SQUARES ESTIMATION
outliers These are observations judged to be too far from their group average. Outliers may genuinely come from long-tailed distributions, but they may also be irrelevant or erroneous observations that need to be expunged from the datasets before analysis, with due precautions.
Outliers in the distribution of a single variable (univariate outliers) are defined in terms of the data spread (Ramsey and Schafer, 2002). The figure shows the BOXPLOT for some data with a MEDIAN value equal to 60, first quartile equal to 40 and third quartile equal to 80. The INTERQUARTILE RANGE of these data (shown as the central box) is thus equal to 40. Outliers are defined as all observations that are more than 1.5 interquartile ranges away from the box. Some STATISTICAL PACKAGES use another convention of showing extreme values more than 3 interquartile ranges away from the box. In the figure, all values exceeding 140 (= 80 + 1.5 × 40) are outliers (shown as circles).
[Figure: outliers. A boxplot showing three outliers.]
When the data come from a NORMAL DISTRIBUTION, a test can be used to detect univariate outliers. Grubbs' test statistic is defined as the largest absolute deviation from the sample MEAN in units of the sample STANDARD DEVIATION, $\max_i |y_i - m|/s$, where $y_i$ is the $i$th observation, $m$ is the sample mean and $s$ the sample standard deviation. Critical values for this test statistic can be computed from the t-DISTRIBUTION with $N - 2$ DEGREES OF FREEDOM, where $N$ is the sample size (Grubbs, 1969).
Some statistical tests are resistant to outliers, but many are not. The simplest example arises from the calculation of a sample mean: the mean is very sensitive to outliers; in contrast, the median is resistant to outliers. Likewise, the comparison of two sample means through a t-TEST is sensitive to outliers; in contrast, a test that uses the ranks of the observations, rather than the observations themselves, is resistant to outliers (such a rank test is nonparametric and is also robust to deviations from normality).
Outliers in the distribution of several variables (multivariate outliers) are defined in terms of the distance of each observation to the multivariate mean. The Mahalanobis distance is computed by standardising the variables of interest (subtracting the mean and dividing by the standard deviation) and summing the squares of these standardised variables (this description holds for uncorrelated variables; in general the covariance matrix of the variables is used). The sum approximately follows a CHI-SQUARE DISTRIBUTION with $m$ degrees of freedom, if $m$ variables are considered. Multivariate outliers can exert undue influence on the analysis, particularly if multivariate regression is used. Case-influence statistics, such as the leverage, Cook's distance or studentised residual, are useful to detect influential observations and to assess their impact on the results of the analyses (Ramsey and Schafer, 2002). [See also STUDENT'S t-TEST]
Grubbs, F. 1969: Procedures for detecting outlying observations in samples. Technometrics 11, 1-21. Ramsey, F. L. and Schafer, D. W. 2002: The statistical sleuth. A course in methods of data analysis. Pacific Grove, CA: Duxbury.
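The univariate rules in the outliers entry above can be sketched in a few lines. The quartiles 40 and 80 are those of the boxplot example in the text; the small data set and the function names are hypothetical, and the sketch stops at the Grubbs statistic itself without computing critical values.

```python
import statistics


def boxplot_fences(q1, q3, k=1.5):
    """Lower and upper limits lying k interquartile ranges beyond the box."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def grubbs_statistic(values):
    """Largest absolute deviation from the sample mean, in units of the SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)          # sample standard deviation (n - 1)
    return max(abs(y - m) for y in values) / s


# Quartiles from the boxplot example: first quartile 40, third quartile 80.
lower, upper = boxplot_fences(40, 80)             # the 1.5 IQR convention
far_lower, far_upper = boxplot_fences(40, 80, k=3.0)

# Hypothetical observations, for illustration only.
data = [52, 55, 58, 60, 61, 63, 66, 150]
flagged = [y for y in data if y < lower or y > upper]
g = grubbs_statistic(data)
print(lower, upper, flagged, round(g, 3))
```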
overdispersion Overdispersion occurs whenever the outcome (or the response variable in a regression model) has a larger VARIANCE than that predicted by whatever model is being used. It is not usually a problem in regression models with a continuous outcome and a normally distributed error term, as the NORMAL DISTRIBUTION has separate parameters for the variance and MEAN. Overdispersion arises more commonly in the case of discrete variables, usually either count or binary variables. For the analysis of count variables, a usual assumption is that they follow the POISSON DISTRIBUTION. For binary variables, the BINOMIAL DISTRIBUTION is often assumed. In each case, the distribution has only one parameter, so the variance is determined by the mean. The Poisson distribution assumes that the mean and variance of the distribution are equal. The binomial distribution assumes that the variance is the mean multiplied by (1 - the probability of success). Overdispersion occurs when the actual variance seen is greater than that predicted by the Poisson or binomial distributions. An example in ageing research is the analysis of activities of daily living (ADL) scores in a trial of a prehabilitation programme (Byers et al., 2003). ADL were scored on a 16-point scale and had a positively skewed distribution with a mode of 0 and mean of 2.8. The variance was 16.4, indicating considerable overdispersion. There are two common causes of overdispersion (Agresti, 1990). These are: (a) positive correlation (rather than independence) between observations; (b) the true sampling distribution being a mixture of Poisson distributions. The latter could be caused by heterogeneity among subjects. For example, suppose that the distribution of ADL for
women was Poisson with a mean of 9 and ror men was Poisson with a mean or 4. For a group of equal numbers of men and women. ADL would have a distribution with a mean ofapproximately 6.S but a 'Variance of approximately 13. thus showing overdispclSion. Overdispen.ion can be examined b)' comparilll the variance of a set of observations to the pn:dicacd \'8riance under an assumed distribution (as above). However. ovcnlispersion can also beeumined using regrasion models. Forexampie. using a Poisson regression model. if there is no ovcrdispersion and the model is cOlRClly spc:ciftcd. the DEVIANCE of the model would be ellpecled to cqualthe number or DEOREES OF f'IlEEDOM (Lindsay el Ill.• 2(02). If the deviance is larger than the number of degrees of freedom, this is traditionally taken to indicate ovcrdispenion. Forexample. in the ADL example (above). the I1ItiO or deviance to number of degrees of fn:cdom was 4. Lindsay eI al. m:ommend thal a de'Viance gn:ater than twice the number or degrees of rn:c:dom should be taken as an indication that ovcrdispersion ought to be examined. Overdispcrsion can rault in unclcratimation of STANDARD ERRORS. if the overdispersion is not taken into account. Tmdilionall)'. variances have been adjusted by multiplying them by an inftation factor. This inllation factor (or 'hete~ gcneity factor') is equal to the de'Viance divided by thc number of degn:c:s of fn:cdom. It will thus be grealer than 1 ir there is ovcrdispenion. Inference is then based on the JdAXiMUM UKB.IHOOO ES11MAT1ONS obtained b)' the fiUing or the Poisson model to the overdisperscd data. bul with multiplicalion of thc standard enors by the square root of the inftalion factOI'. However. in the em offast statistical programming. overdispersion can be investigated and taken into account using statistical models. rather than merel), CXN1'eCting the standard erron (Undsa)' eI al.. 20(2). For counl data. 
the standard altc:mati'Vc to the Poisson model is thc nc:pliYe binomial model. This model includes a disturbance OI'enortenn. i.e. it assumes thai the mean varies randomly in the population. For binary elata. the standard alternative to the binomial model is the beta-binomial. which assumes that the PROBABIlITY has a BETA DlSTRlBl1f1ON. Inclusion of appropriate eovariates in a simplc Poisson or binomial model may also be a way of accounting for ovcrdispersion. Undsa)' el QI. (2002) examined the presence and efl'ect of o'Verdispersion in data on the annual number of blackgrouse and the etTect of climatc on this number. Usilll the Poisson disbibution. there was cvidence of overdispenion. with the model having a deviance of 29 on 18 degrees of rreedom (inftation factor or 1.6). However. allowing for o'Vcrdispersion using the negative binomial model gave paramelU estimates and standanl erron that ~ 'Very close to those from the Poisson model. Another cxample has an innation factor of 1.9S, and yctthe negative binomial model fitted no
___________________________________________________________ OVERMATCHING bc:uer dian the PoilSDn IDDCIeI. In Ibis cue. fram the modelfitling the~ was no evidence of ovenlispenion - and yet usillllhc innalion factor would have muldplied Ihc slandanl errors by 1.4. The n:commcndation flOm Ibis paper is lhallhc deviance can be used 10 indicalc passible ovcnIispcrsion. but this should then be investigated using model-based Icchniques. rather lhan mcn:ly inHaling Ihc standard errors. KT
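The arithmetic in this entry can be checked with a short sketch. The function names below are illustrative only (they are not from any package), and the block assumes nothing beyond the figures quoted above: a 50:50 mixture of Poisson distributions with means 9 and 4, a deviance of 29 on 18 degrees of freedom and an inflation factor of 1.95.

```python
from math import sqrt


def poisson_mixture_moments(means, weights):
    """Mean and variance of a mixture of Poisson distributions."""
    mean = sum(w * m for m, w in zip(means, weights))
    # For a Poisson component with mean m, E[X^2] = m + m^2.
    second_moment = sum(w * (m + m * m) for m, w in zip(means, weights))
    return mean, second_moment - mean ** 2


def inflation_factor(deviance, degrees_of_freedom):
    """Deviance divided by its degrees of freedom ('heterogeneity factor')."""
    return deviance / degrees_of_freedom


def inflated_standard_error(standard_error, deviance, degrees_of_freedom):
    """Scale a standard error by the square root of the inflation factor."""
    return standard_error * sqrt(inflation_factor(deviance, degrees_of_freedom))


# Women Poisson(9), men Poisson(4), equal numbers of each.
mix_mean, mix_var = poisson_mixture_moments([9, 4], [0.5, 0.5])
print(mix_mean, mix_var)                       # 6.5 12.75

# Black grouse example: deviance 29 on 18 degrees of freedom.
print(round(inflation_factor(29, 18), 1))      # 1.6
```

The mixture variance (12.75) is roughly twice the mixture mean (6.5), which is the overdispersion described above. An inflation factor of 1.95 multiplies standard errors by the square root of 1.95, about 1.4.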
[See also QUASI-LIKELIHOOD]
Agresti, A. 1990: Categorical data analysis. New York: John Wiley & Sons, Inc. Byers, A. L., Allore, H., Gill, T. M. and Peduzzi, P. N. 2003: Application of negative binomial modeling for discrete outcomes: a case study in aging research. Journal of Clinical Epidemiology 56, 6, 559-64. Lindsay, J., Laurin, D., Verreault, R., Hebert, R., Helliwell, B., Hill, G. B. et al. 2002: Risk factors for Alzheimer's disease: a prospective analysis from the Canadian Study of Health and Aging. American Journal of Epidemiology 156, 5, 445-53.
overmatching
See CASE-CONTROL STUDIES
P
paired t-test
See STUDENT'S t-TEST
partial correlation coefficient
See CORRELATION
partial likelihood
This is a function, consisting of a product of conditional LIKELIHOODS, used in certain situations for estimation and hypothesis testing. The most commonly used partial likelihood is that used in COX'S REGRESSION MODEL. In this model, it is assumed that the hazard of failure at time $t$ for an individual with covariates $X = (X_1, \ldots, X_K)$ is $\lambda_0(t) f(X, \beta)$, where $f(X, \beta) = \exp\left(\sum_{k=1}^{K} \beta_k X_k\right)$, $\beta = (\beta_1, \ldots, \beta_K)$ are the (log) hazard ratio parameters of interest and $\lambda_0(t)$ is the unknown baseline hazard function. No assumptions are made about the form of $\lambda_0(t)$. In the full likelihood function for Cox's model, information about $\beta$ is tied up with information about the nuisance function $\lambda_0(t)$. Cox's partial likelihood sacrifices some of the information about $\beta$ contained in the data, in order to eliminate this dependence on $\lambda_0(t)$.
Let $t_j$ denote the time of the $j$th failure and let $R_j$ denote the risk set at time $t_j$, i.e. the set of individuals who, just before time $t_j$, remained not failed and not censored. Then, conditional on the risk set $R_j$ and the fact that a failure took place at time $t_j$, the PROBABILITY that it was person $i$ who failed is:
\[
\frac{\lambda_0(t_j) f(X_i, \beta)}{\sum_{l \in R_j} \lambda_0(t_j) f(X_l, \beta)} = \frac{f(X_i, \beta)}{\sum_{l \in R_j} f(X_l, \beta)}
\]
The partial likelihood is the product, over all the failure times, of these conditional probabilities. It uses not the actual failure times, but the ranks of the failure times. Note that it does not depend on $\lambda_0(t)$. Intuitively, since no assumptions are being made about the form of the baseline hazard and since censoring (see CENSORED OBSERVATIONS) is assumed to be noninformative, information about the actual times of failure and the censoring events would not be expected to reveal much about the parameters of interest, $\beta$, and indeed, this turns out to be the case. Thus, conditioning on these leads to very little loss of information about $\beta$. Less obviously, it also turns out that this partial likelihood function, although not a proper likelihood function, has similar statistical properties. Thus, it may be used to estimate $\beta$, to calculate a COVARIANCE MATRIX for $\beta$ and for LIKELIHOOD RATIO hypothesis testing.
A more general definition of the partial likelihood can be found in, for example, Kalbfleisch and Prentice (2002). For further details see Clayton and Hills (1993) and Collett (2003). SRS
Clayton, D. and Hills, M. 1993: Statistical models in epidemiology. Oxford: Oxford Science Publications. Collett, D. 2003: Modelling survival data in medical research, 2nd edition. London: Chapman & Hall. Kalbfleisch, J. D. and Prentice, R. L. 2002: The statistical analysis of failure time data, 2nd edition. Chichester: John Wiley & Sons, Ltd.
path analysis
This is a tool for evaluating the interrelationships among a set of observed (or latent) variables based on their correlational structure. The postulated relationships between the variables are often illustrated graphically by means of a path diagram, in which single-headed arrows indicate the direct influence of one variable on another and curved double-headed arrows indicate correlated variables. (For an example of such a diagram see CONFIRMATORY FACTOR ANALYSIS.) Originally introduced for simple regression models for observed variables, the method has now become the basis for more sophisticated procedures such as confirmatory factor analysis and use of structural equation models, involving both manifest and latent variables. SSE
path diagram
See PATH ANALYSIS
pattern recognition
See CLUSTER ANALYSIS IN MEDICINE, SUPPORT VECTOR MACHINES
Pearson's contingency coefficient
See CONTINGENCY COEFFICIENT
Pearson's correlation coefficient
See CORRELATION
per protocol
This term is used to describe a subset of participants in a randomised clinical trial who complied with the protocol (see PROTOCOLS FOR CLINICAL TRIALS). It is also used to describe an analysis based only on these participants. The per protocol dataset consists of those participants who complied with the protocol sufficiently to ensure that their data would be likely to show the effects of treatment. Aspects of compliance that could be considered include exposure to treatment and violations of the entry criteria. For example, participants could be excluded if they took less than 80% of the prescribed treatment, if they used an additional treatment that could also affect outcome or if retrospective evaluation shows that they did not meet eligibility criteria. The rules for inclusion and exclusion of participants from the per protocol dataset should be carefully specified before treatment allocations are unblinded. Any decisions made after the data are unblinded should be regarded with suspicion.
The rationale for per protocol analysis is to estimate the efficacy of an intervention, undiluted by factors such as noncompliance, lack of eligibility and additional treatments. It is therefore most often used in explanatory trials that aim to measure the efficacy of an intervention under equalised conditions, rather than in pragmatic trials that aim to measure the effectiveness of an intervention policy in routine practice (Roland and Torgerson, 1998).
However, per protocol analysis is subject to SELECTION BIAS because some participants are excluded after RANDOMISATION. Selection bias may be less when the nature and number of exclusions is similar in different randomised groups, but BIAS can still occur in this situation. Therefore per protocol analysis should only be used when estimating efficacy is more important than avoiding selection bias. When a per protocol analysis is done, other analyses, such as INTENTION-TO-TREAT (ITT), should also be reported.
One specific situation where both per protocol and intention-to-treat analyses are routinely reported is in EQUIVALENCE STUDIES (ICH E9, 1999). ITT is anticonservative for equivalence trials because the inclusion of participants who do not comply or switch treatments causes dilution of differences. However, since per protocol analysis is always potentially biased, it is best to do both and carefully characterise exclusions from the per protocol analysis (Jones et al., 1996).
Participants without outcome measurements may also sometimes be excluded from a per protocol analysis. However, this may lead to biased results and the potential impact of missing data should be considered (Shih, 2002). SH/IW
International Conference on Harmonisation E9 Expert Working Group (ICH E9) 1999: ICH harmonised tripartite guideline. Statistical principles for clinical trials. Statistics in Medicine 18, 15, 1905-42. Jones, B., Jarvis, P., Lewis, J. A. and Ebbutt, A. F. 1996: Trials to assess equivalence: the importance of rigorous methods. British Medical Journal 313, 36-9. Roland, M. and Torgerson, D. J. 1998: Understanding controlled trials: what are pragmatic trials? British Medical Journal 316, 285. Shih, W. J. 2002: Problems in dealing with missing data and informative censoring in clinical trials. Current Controlled Trials in Cardiovascular Medicine 3, 4.
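As an illustration of how prespecified per protocol rules might be applied, the following is a minimal sketch. The record layout, field names and participant data are hypothetical; the three exclusion rules are the examples given in the entry (less than 80% of prescribed treatment, use of an additional treatment, failure to meet eligibility criteria), and a real trial would fix its own rules before unblinding.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str                        # hypothetical participant identifier
    arm: str                        # randomised group
    fraction_taken: float           # fraction of prescribed treatment taken
    used_additional_treatment: bool
    met_eligibility: bool


def per_protocol_set(participants, min_fraction=0.80):
    """Keep only participants who satisfy every prespecified rule."""
    return [
        p for p in participants
        if p.fraction_taken >= min_fraction
        and not p.used_additional_treatment
        and p.met_eligibility
    ]


# Invented data for illustration only.
randomised = [
    Participant("A01", "new", 0.95, False, True),
    Participant("A02", "new", 0.60, False, True),      # poor compliance
    Participant("A03", "control", 1.00, True, True),   # additional treatment
    Participant("A04", "control", 0.85, False, False), # ineligible
    Participant("A05", "control", 0.80, False, True),
]

pp = per_protocol_set(randomised)
print([p.pid for p in pp])           # ['A01', 'A05']
print(len(randomised), len(pp))      # ITT uses all 5; per protocol uses 2
```

Reporting the number and nature of exclusions in each randomised group alongside the result helps readers judge the scope for selection bias.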
person-years at risk
This is time in years summed over several individuals. In any COHORT STUDY of incidence rates or rate ratios, different members of the cohort will be at risk for different amounts of time. Some members may enter the risk set later, because they joined the cohort later, and some will leave earlier, because they were censored or experienced the event of interest. The length of time at risk (measured in years) experienced by each of a set of individuals, summed over those individuals, is the total person-years at risk for that set. Incidence rates (measured in units of per person-year) are calculated by dividing the number of events occurring in a set of individuals while they are at risk by their total time at risk (measured in person-years). Notice the implicit assumption being made here: one person at risk for 10 years is equivalent to a group of 10 people at risk for 1 year: both yield 10 person-years at risk. For further details see Rothman and Greenland (1998). SRS
Rothman, K. J. and Greenland, S. 1998: Modern epidemiology, 2nd edition. Philadelphia: Lippincott-Raven Publishers.
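The calculation described in this entry is simple enough to sketch directly. The helper names and the small cohort below are invented for illustration.

```python
def person_years(years_at_risk):
    """Total time at risk, in years, summed over individuals."""
    return sum(years_at_risk)


def incidence_rate(n_events, years_at_risk):
    """Events per person-year among individuals while they are at risk."""
    total = person_years(years_at_risk)
    if total <= 0:
        raise ValueError("total time at risk must be positive")
    return n_events / total


# One person followed for 10 years and ten people followed for 1 year
# contribute the same total time at risk.
print(person_years([10]), person_years([1] * 10))    # 10 10

# Hypothetical cohort: three members at risk for differing lengths of
# time, one of whom experienced the event.
print(incidence_rate(1, [10.0, 2.5, 7.5]))           # 0.05 per person-year
```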
PEST
See SEQUENTIAL ANALYSIS
pharmacokinetics/pharmacodynamics (PK/PD)
See PHASE I TRIALS
pharmacovigilance
The terms 'post-marketing surveillance' and 'pharmacovigilance' are often used synonymously to refer to monitoring (both actively and prospectively or reacting to spontaneously occurring safety concerns) licensed pharmaceutical and biological medicinal products. At the time a marketing authorisation is granted (see MEDICINES AND HEALTHCARE PRODUCTS REGULATORY AGENCY) there may have been between a few hundred and a few thousand patients studied. Typically, each of the PHASE III CLINICAL TRIALS may have only randomised a few hundred patients to the new, experimental treatment. It is quite likely, therefore, that rare reactions (perhaps occurring in one in a thousand patients or fewer) may never have been seen in such trials. No medicine is ever completely safe and it is therefore important that once widespread use of a new one has begun, safety signals are monitored and followed up. Rare events may manifest themselves, as may adverse interactions with other commonly used drugs for which interaction studies were never carried out.
Pharmacovigilance is a relatively new science, its grounds perhaps being set out by Finney (1971), who suggested that 'the primary duty of a drug monitoring system is less to demonstrate danger or to estimate incidence than to initiate suspicion'. It might be argued that once 'danger' has been determined, the monitoring process is too late and patients have already been injured (or may have died). What we need is a system that can 'initiate suspicion' so that preventive measures can be taken. Evans (2000) discusses a similar theme.
The great challenge for pharmacovigilance is the uncontrolled nature of the data. Marketing authorisations are generally based on randomised, double-blind, PLACEBO (or active) controlled studies. Assessing post-marketing data is much closer to working in EPIDEMIOLOGY. Various national systems for reporting unexpected adverse reactions exist, including the 'Yellow Card' system in the United Kingdom and the 'MedWatch' system in the USA. These (and other) systems rely on reports from a variety of sources including doctors, pharmacists, pharmaceutical companies and sometimes from patients themselves. However, they rely on the judgement of whether an adverse reaction needs to be considered. By the very nature of the rarity of the reactions that pharmacovigilance systems are trying to identify, doctors may not realise that an adverse event in a particular patient has anything to do with the medication(s) he or she is receiving. Conversely, reporting any and all adverse events experienced by patients and highlighting a possible relationship to any or all of the medications the patient is receiving would overburden any doctor and monitoring system. A middle ground needs to be found. A further problem arises when a 'new' adverse reaction is suspected and reported (in the scientific or lay press) and then the incidence of spontaneous reports often suddenly increases. The system swings from (almost always) underreporting to (occasionally) overreporting.
Various methods have been proposed to try to overcome problems of over- and underreporting. Systems exist to record all medical events (prescriptions, illnesses, etc.) for samples of patients (say, all patients of selected general practitioners). These are better at overcoming selective reporting and, although they are based on samples, they are typically much larger samples than can be recruited into CLINICAL TRIALS. They also represent samples from practical experience in the community, rather than the highly controlled environment of a clinical trial.
For further reading, van der Heijden et al. (2002) proposed statistical methods that use 'all reports of reactions' as a means of adjusting for a general level of underreporting. Grigg, Farewell and Spiegelhalter (2003) have reviewed statistical methods taken from ideas in quality control to monitor streams of reports (adverse reactions and others) in 'real time', while Strom (2000) gives an excellent and detailed coverage of issues and methods used in pharmacovigilance. SD
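To see why a reaction occurring in one in a thousand patients may never be seen before licensing, the following sketch computes the chance of observing no cases at all. It assumes, purely for illustration, that patients are independent and share the same risk; the sample sizes are examples of 'a few hundred' and 'a few thousand' and are not taken from any particular programme.

```python
def prob_no_events(n_patients, risk):
    """Probability that none of n independent patients has the reaction."""
    return (1.0 - risk) ** n_patients


risk = 1 / 1000
for n in (300, 3000):
    print(n, round(prob_no_events(n, risk), 3))
# 300 0.741
# 3000 0.05
```

Under these assumptions, a trial with 300 patients on the new treatment would miss the reaction entirely about three times in four, whereas exposure of around 3000 patients would miss it only about one time in twenty.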
Evans, S. J. W. 2000: Pharmacovigilance: a science or fielding emergencies? Statistics in Medicine 19, 3199-209. Finney, D. J. 1971: Statistical aspects of monitoring for dangers in drug therapy. Methods of Information in Medicine 10, 1-8. Grigg, O. A., Farewell, V. T. and Spiegelhalter, D. J. 2003: Use of risk-adjusted CUSUM and RSPRT charts for monitoring in medical contexts. Statistical Methods in Medical Research 12, 147-70. Strom, B. (ed.) 2000: Pharmacovigilance, 3rd edition. Chichester: John Wiley & Sons, Ltd. van der Heijden, P. G. M., van Puijenbroek, E. P., van Buuren, S. and van der Hofstede, J. W. 2002: On the assessment of adverse drug reactions from spontaneous reporting systems: the influence of under-reporting on odds ratios. Statistics in Medicine 21, 2027-44.
Phase I trials
These are CLINICAL TRIALS carried out in the early development of a drug, after animal toxicology studies have been completed, when the drug will first go into man. They involve human volunteers, healthy or patient, and as such the subject of such trials expects no therapeutic benefit. The focus of Phase I trials is on the safety and tolerability of drugs and pharmacokinetics (PK), which can be considered as what the body does to the drug, and pharmacodynamics (PD), what the drug does to the body.
Pharmacokinetics involves measurement of drug concentrations in the body, determined by taking blood samples at specific times throughout the study. These are then summarised using such measures as AREA UNDER THE CURVE (AUC), representing total exposure, and maximum drug concentration ($C_{max}$). Other measures that might be used are time to maximum drug concentration ($t_{max}$) and elimination half-life. It is not possible to cover these topics in detail here, but Rowland and Tozer (1995) give more details and applications for these measurements as well as how to look at the relationships between them, using PK/PD modelling.
Phase I trials can also be carried out prior to drug submission or when a new formulation is being developed. Examples of such studies are bioequivalence, drug interactions (both pharmacokinetic and pharmacodynamic), the effect of food and certain special populations, such as renal or hepatic impairment, elderly or males versus females. These trials can also be carried out in early development to give a company a general idea of the effect studied, but a confirmatory study will inevitably be required during submission to a regulatory body. They are highly regulated studies, with design and analysis being very standard. Further details on the studies can be found at the FDA website: http://www.fda.gov/cder/guidance.
First-into-man studies take place after completion of toxicology tests.
The primary objective of Phase I trials should always be the examination of the safety and tolerability; however, PK and PD data are usually collected and analysed. Since the drug has not been previously administered to humans the doses have to be escalated, meaning that each subject starts on a low dose and proceeds through the doses to progressively higher ones. Obviously, data have to be reviewed prior to any escalation and, if there is any cause for concern over safety, escalation will not occur, remembering that the safety of subjects is most important. The table gives examples of dosing escalation schemes showing just single cohorts. In reality, there are likely to be up to 24 subjects in six or eight cohorts, although this depends on how many subjects can be recruited and how much study data needs to be collected.
Phase I trials  Examples of a rising dose scheme

Example 1
Subject   Period 1   Period 2   Period 3   Period 4
1         10 mg      20 mg      30 mg      Placebo
2         10 mg      Placebo    20 mg      30 mg
3         10 mg      20 mg      Placebo    30 mg
4         Placebo    10 mg      20 mg      30 mg

Example 2
Subject   Period 1   Period 2   Period 3
1         10 mg      Placebo    30 mg
2         10 mg      20 mg      Placebo
3         Placebo    20 mg      30 mg

Notice that both designs incorporate a PLACEBO period for each subject. This is so that the measurements from an active period can be put into some context and it can be determined, for example, whether headache is a real drug effect or merely a result of being in a clinical trial.
Much of the analysis for these types of study is likely to be data driven, since they are mainly exploratory. Such things as dose or exposure response curves, with safety and pharmacokinetics, can be examined, as can the determination of a maximum tolerated dose. In addition, Bayesian techniques (see BAYESIAN METHODS) can be used optimally to design and analyse these studies (Whitehead et al., 2001).
The TGN1412 Phase I trial (Senn et al., 2007) raised important issues regarding first-time-in-human trials of new treatments. In this trial all six healthy volunteers exposed to the active drug suffered immune reactions with severe and in some cases long-term sequelae. As a consequence of this trial the UK Secretary of State for Health set up an expert panel to investigate Phase I trials. Their report includes 22 recommendations and was published in November 2006 (Expert Scientific Group on Phase One Clinical Trials, 2006). The ABPI/BIA also published a report on the TGN1412 study (Early Stage Clinical Trial Taskforce, 2006). Senn et al. (2007) discuss statistical aspects of the trial, provide identification of shortcomings and make recommendations for future trials. AS
Early Stage Clinical Trial Taskforce 2006: Joint ABPI/BIA Report. London: Association of the British Pharmaceutical Industry/BioIndustry Association. Expert Scientific Group on Phase One Clinical Trials 2006: Final Report. London: The Stationery Office. Rowland, M. and Tozer, T. N. 1995: Clinical pharmacokinetics - concepts and applications, 3rd edition. Baltimore: Williams and Wilkins. Senn, S., Amin, D., Bailey, R. A., Bird, S. M., Bogacka, B., Colman, P., Garrett, A., Grieve, A. and Lachmann, P. 2007: Statistical issues in first-in-man studies. Journal of the Royal Statistical Society, Series A 170, 517-79. Whitehead, J., Zhou, Y., Patterson, S., Webber, D. and Francis, S. 2001: Easy-to-implement Bayesian methods for dose-escalation studies in healthy volunteers. Biostatistics 2, 47-61.
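The pharmacokinetic summary measures named in this entry (AUC, maximum concentration and time to maximum concentration) can be sketched for a single subject as follows. This is an assumption-laden illustration rather than a prescribed method: it uses the linear trapezoidal rule up to the last sample, which is one common choice among several, and the sampling times and concentrations are invented.

```python
def pk_summary(times, concentrations):
    """Return (AUC to last sample, Cmax, tmax) from one concentration profile.

    AUC is computed with the linear trapezoidal rule (an assumed choice).
    """
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need matching time and concentration lists")
    auc = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        auc += width * (concentrations[i] + concentrations[i - 1]) / 2.0
    cmax = max(concentrations)
    tmax = times[concentrations.index(cmax)]   # first time Cmax is reached
    return auc, cmax, tmax


# Hypothetical profile: hours after dosing and concentration (e.g. ng/mL).
hours = [0, 1, 2, 4, 8]
conc = [0.0, 4.0, 6.0, 3.0, 1.0]

auc, cmax, tmax = pk_summary(hours, conc)
print(auc, cmax, tmax)    # 24.0 6.0 2
```

Elimination half-life is not computed here because it requires a model for the terminal phase of the profile.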
Phase II trials
This is a CLINICAL TRIAL of a new agent or procedure in which the primary objective is typically to determine whether it has sufficient therapeutic efficacy with an acceptable safety profile in patients to warrant further testing and development in additional Phase II trials or large Phase III trials. Before starting a Phase II trial, however, a safe dose and schedule of the new drug or procedure has to be established in earlier dose-finding Phase I trials. As such, Phase II trials are similar to the therapeutic exploratory studies according to the ICH Harmonised Tripartite Guideline E8, and start with the studies in which the primary objective is exploration of and screening for therapeutic efficacy.
Phase II trials often employ study designs such as historical controlled studies in which comparison is made with the efficacy from the historical controls or self-controlled studies in which comparison is made with the baseline status. Phase II trials are often conducted in patients who meet narrowly defined eligibility criteria in order to ensure homogeneity in patient baseline characteristics. ENDPOINTS of Phase II trials can be biological or a surrogate of clinical outcome and sometimes Phase II trials may be further classified as IIa or IIb accordingly. Also, in Phase II trials additional objectives such as exploration of other study endpoints, therapeutic regimens including concomitant medications or target patient populations are evaluated and analyses for these objectives are, by necessity, exploratory and involve many subset analyses.
In the early development of a new therapeutic agent or procedure, the dose and schedule to be used in subsequent trials are determined in Phase I trials. For example, with traditional cytotoxic agents for cancer treatment, this dose is generally known as the maximum tolerated dose (MTD), although other choices, e.g. the dose level before the MTD, may be used. In subsequent Phase II trials, patients are treated at the dose level established as safe and acceptable in Phase I trials to screen if the treatment has sufficiently promising clinical activity or therapeutic efficacy, usually evaluated by the PROBABILITY of treatment success, e.g. objective response in cancer, for further investigation and development.
Since Phase II trials serve as initial screening, it is desirable to achieve the goals of the study with a minimal number of patients so that as few patients are given inactive treatment as possible. Also it is important to minimise the Type II error probability of reaching a false negative conclusion (see FALSE NEGATIVE RATE). To this end, sequential designs have been proposed in which a fixed number of patients are accrued in each stage and the study is stopped early if the observed number of treatment successes is too small.
Gehan (1961) was the first to propose a two-stage design for screening in Phase II trials in which 14 patients are initially accrued during stage 1. The study stops if no treatment success is observed and otherwise another cohort of patients is accrued in stage 2. The number of patients during stage 1 is chosen to keep the probability of early termination very small, say less than 0.05, when the treatment is active with the probability of treatment success $p \ge 0.20$. The sample size for stage 2 is determined to achieve a specified level of confidence in the estimation of the probability of treatment success with the maximum sample size. Later this notion of two-stage designs was formalised as a test of statistical hypotheses regarding the probability of treatment success, as in Simon (1989). Multistage designs have also been formalised by Fleming (1982), among others.
In general, a two-stage design is defined by the numbers of patients to be accrued, $n_1$ and $n_2$, and the boundary values, $r_1$ and $r$, during stages 1 and 2 respectively, and is denoted as $(r_1/n_1, r/n)$, where $n = n_1 + n_2$ is the maximum sample size. During stage 1, $n_1$ patients are initially enrolled and treated. If the number of treatment successes during stage 1 is less than or equal to $r_1$, the trial is terminated for lack of therapeutic efficacy and it is concluded that the treatment does not warrant further investigation. Otherwise, the study is continued to stage 2, during which an additional $n_2$ patients are enrolled and treated. If the total number of treatment successes after stage 2 exceeds $r$, it is concluded that the treatment has sufficient therapeutic efficacy and it is, therefore, considered for further investigation. Early acceptance of the treatment is not considered, as there are no ethical reasons to do so when there is evidence of therapeutic efficacy.
The ethical imperative for early termination occurs when the treatment has unacceptable therapeutic efficacy. Furthermore, when there is sufficient therapeutic efficacy, there is often interest and desire in studying additional patients to assess the safety of treatment more extensively.
The sample sizes and the boundary values for such two-stage designs can be determined based on a test of hypothesis. Consider testing $H_0: p \le p_0$ against $H_1: p \ge p_1$ with Type I and Type II error probabilities $\alpha$ and $\beta$, where $p$ denotes the probability of treatment success. The value of $p_0$ is chosen to represent the maximally unacceptable level of therapeutic efficacy and the value of $p_1$ is chosen to represent the minimally acceptable level of therapeutic efficacy. These parameters $(p_0, p_1, \alpha, \beta)$ are the design parameters. Then the probability of early termination because of insufficient therapeutic efficacy, indicated by no more than $r_1$ treatment successes, is given by $PET = B(r_1; n_1, p)$, where $B$ denotes the BINOMIAL DISTRIBUTION function, and the expected sample size for the true value of $p$ is determined by:
\[
E(N \mid p) = n_1 + (1 - PET)\, n_2
\]
The probability of rejecting a treatment with a success probability $p$ is thus given by:
\[
B(r_1; n_1, p) + \sum_{x = r_1 + 1}^{\min(n_1, r)} b(x; n_1, p)\, B(r - x; n_2, p)
\]
where $b$ denotes the binomial probability mass function.
Simon (1989) proposed two-stage designs that are optimal in the sense that the expected sample size $E(N \mid p)$ is minimised when $p = p_0$, i.e. when there is insufficient therapeutic efficacy, subject to the Type I and II error probabilities. Despite the minimum expected sample size, the maximum sample size $n = n_1 + n_2$ for the optimal design can be much larger than for other designs, which is undesirable. Therefore, Simon (1989) also suggested the minimax design, which minimises the maximum sample size, again subject to the same Type I and II error probabilities. However, Simon's minimax and optimal designs sometimes result in quite different sample size requirements. For example, the minimax design can have a much larger expected sample size $E(N \mid p_0)$ than the optimal design and the optimal design can have a much larger maximum sample size $n$ than the minimax design. For example, with the design parameters $(p_0, p_1, \alpha, \beta) = (0.3, 0.5, 0.05, 0.15)$, the minimax design is $(r_1/n_1, r/n) = (14/37, 17/42)$ and the optimal design is $(7/21, 19/48)$, whereas the expected sample size is 37.6 and 28.5 for the minimax and the optimal design respectively.
Jung et al. (2004) used a Bayesian decision-theoretic criterion of admissibility to define a class of designs based on a loss function, which is a weighted average of the maximum sample size and the expected sample size under the NULL HYPOTHESIS, the two criteria used by Simon (1989). The admissible designs include Simon's minimax and optimal designs as special cases and compromises between Simon's minimax and optimal designs. For our earlier example, one can find an admissible design $(4/15, 18/45)$ with a maximum sample size of 45, which is a compromise between the minimax and the optimal designs, and an expected sample size of 29.5, which is close to the expected sample size for the optimal design. These compromise admissible designs can be found on a convex hull formed by Simon's minimax and optimal designs.
The admissible designs of Jung et al. (2004) can be easily generalised to any number of stages. KK
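The operating characteristics quoted in this entry can be reproduced with exact binomial arithmetic. The sketch below uses only the Python standard library; the function names are illustrative, and the designs evaluated are the three given in the text for $p_0 = 0.3$. It also checks Gehan's first-stage size of 14 under the stated criterion (probability of no successes below 0.05 when $p = 0.20$).

```python
from math import comb


def binom_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n trials."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """B(x; n, p): probability of at most x successes in n trials."""
    return sum(binom_pmf(k, n, p) for k in range(0, min(x, n) + 1))


def two_stage_properties(r1, n1, r, n, p):
    """Return (PET, expected sample size, probability of rejecting treatment)."""
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)
    expected_n = n1 + (1 - pet) * n2
    reject = pet + sum(
        binom_pmf(x, n1, p) * binom_cdf(r - x, n2, p)
        for x in range(r1 + 1, min(n1, r) + 1)
    )
    return pet, expected_n, reject


def gehan_stage1_size(p, max_miss=0.05):
    """Smallest n for which P(no successes in n patients) is below max_miss."""
    n = 1
    while (1 - p) ** n >= max_miss:
        n += 1
    return n


designs = {"minimax": (14, 37, 17, 42),
           "optimal": (7, 21, 19, 48),
           "admissible": (4, 15, 18, 45)}
for name, (r1, n1, r, n) in designs.items():
    pet, expected_n, reject = two_stage_properties(r1, n1, r, n, p=0.3)
    print(name, round(expected_n, 1))

print(gehan_stage1_size(0.20))    # 14
```

Evaluating the same function at $p = p_0$ gives one minus the Type I error probability, and at $p = p_1$ it gives the Type II error probability, so the sketch can also be used to compare a candidate design against the chosen $\alpha$ and $\beta$.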
f'IIIIIIIIa. T. R. 1982: ODe sample multiple testing proccduJa for fhasc II clinicallrials. BiMltlrit.f 38. 143-51. Ge..... E. A. 1961: The ddamination of the DumIJer of patienlS mauiRd iD a follCM'-ap trial of a new chemothcrapeuaic: ~L JtJUmQ/ of Chronic Dismsrl 13. ~S3. J ..... S.-H., 1M, T.t KIm. K. ..... Gtraae. s. L 2004: Admissible ttAo-state designs for Phase II cancer clinical trials. SlalUtia in MedicineD. S61-9.SbDaII, R.1989: 0pWnaIl\\'O-SIqe desipdor Phase Dclinicallrials. ControlledCIi"ialITriDl" 10, 1-10.
Phase III btals New medicines. in parlicularnew drugs. an: generally developed in a phasc:d process. Phase I trials
333
~EWTRutS
_________________________________________________________
and Phase II trials are small trials, the former generally in volunteers, the latter in patients, and tend to be learning or exploratory trials. In contrast, Phase III trials are larger and are confirmatory.

Phase III trials provide confirmation of the properties of the new medicine that have been discovered in early phases of the development programme. In order to be approved for marketing, a new drug must be shown to constitute a worthwhile contribution to medical treatment. Phase III trials are designed to identify an appropriate population of patients who are better treated with the new drug than with other treatments. Additionally, they should identify those patients who do not benefit from treatment with the new drug and those who may be harmed by its use. The prime goal of such trials is to recommend treatment strategies, including the appropriate dose, or dose titration, for the prescribing physician. This goal is achieved by the detailed label whose content is informed by the results of all three phases of drug development, although primarily Phase III trials.

Phase III clinical trials will generally exhibit all the characteristics that have come to be associated with the 'mythical' gold standard trial. They will generally be randomised, be run double-blind, PLACEBO controlled and conducted in a well-defined population of patients. They will most often be parallel group, fixed sample trials. However, there is a growing appreciation that group sequential, adaptive or flexible designs provide drug sponsors with the opportunity to manage their drug development programmes in a more efficient way. This class of trial is characterised by the ability to make changes to the initial setting of the trial, in such a way as to protect the TYPE I ERROR. The changes include the dropping of treatment arms for inefficacy (futility), including the stopping of the trial, resample sizing based on learning about either the variability in the trial and/or the estimated effect and early stopping because of mounting evidence of a large beneficial effect. The ability of drug sponsors to stop individual trials early because of evidence of a large beneficial effect may be restricted by the requirement for a prescribed level of exposure of patients to the exploratory medicine. For example, it may be that a drug sponsor is required to have 500 patients exposed to the exploratory medicine for at least 12 months.

Phase III trials are normally conducted in parallel groups and while it is not unknown for a trial to include more than one dose of the experimental medicine, such trials are not the norm since the appropriate dose will normally have been chosen in Phase II.

Generally, two Phase III trials are required for the registration of a new medicine. One reason for this is that to require two trials each to be significant at the one-sided 0.025 level corresponds to an overall Type I error of 0.000625, corresponding to a FALSE POSITIVE RATE of 1/1600. The use of a one-sided SIGNIFICANCE LEVEL is necessary since we clearly require both tests to be positive, that is the treatment effects are in the same direction. It can be shown that if this is the
only requirement, then there are more efficient ways to ensure its achievement other than requiring the individual trials to be separately significant. More recently, the Food and Drug Administration (FDA) in the USA has set conditions under which it is possible to register a new medicine on the basis of a single, large CLINICAL TRIAL with a smaller than usual Type I error. The circumstances tend to be when there is a critical need among seriously ill patients. In such circumstances it is not unusual for standard practices to be relaxed. One well-known example is in the development of treatment for HIV and AIDS in which regulators, under pressure from patients, relaxed standard clinical practice.

The choice of control in Phase III trials is a matter of considerable debate. It was noted earlier that Phase III trials are generally placebo controlled. There are, however, circumstances in which it is not possible ethically to require patients to be treated with a placebo. Such circumstances typically, although not exclusively, arise when the disease under investigation is associated with a mortality outcome. The recent International Conference on Harmonization (ICH) guideline (E10), entitled 'Choice of control group and related issues in clinical trials', addresses a number of important issues when an active control group is chosen and the primary aim of the study is to demonstrate noninferiority. Of prime importance in this context is what E10 terms assay sensitivity. Assay sensitivity is the ability of a clinical trial to differentiate between effective, minimally effective and ineffective treatments. Assay sensitivity is important in all trials, but has particular implications in trials whose prime purpose is to demonstrate noninferiority. If a drug sponsor intends to claim effectiveness of a new medicine by showing it to be noninferior to active control, but the trial lacks assay sensitivity, it is possible that an ineffective treatment will be found to be noninferior and thereby lead to an erroneous conclusion of efficacy. The E10 guideline points to a number of issues that need to be addressed: Was the appropriate patient population chosen? Was the appropriate dose of the active comparator used? Was treatment with the active comparator of the appropriate duration? How was the noninferiority margin, which defines noninferiority, chosen? Is there historical evidence that the design has in the past shown itself to be capable of distinguishing effective from ineffective treatments? After the conclusion of the experiment, the guideline requires a demonstration that the results obtained on the active comparator are similar to those obtained in previous trials. AG
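The arithmetic behind the two-trial requirement discussed in this entry can be checked with a short sketch. It assumes, as an illustration only, that the two trials are independent and that the new medicine truly has no effect; the function name is hypothetical.

```python
from fractions import Fraction


def joint_false_positive_rate(alpha_one_sided, n_trials=2):
    """Chance that every one of n independent trials is falsely positive.

    Assumes the trials are independent and that the treatment has no
    true effect, so each trial is 'positive' with probability alpha.
    """
    return alpha_one_sided ** n_trials


# One-sided significance level of 0.025 in each of two trials.
alpha = Fraction(25, 1000)
overall = joint_false_positive_rate(alpha)

print(overall)         # 1/1600
print(float(overall))  # 0.000625
```

Exact fractions are used so that the 1/1600 figure appears without rounding error.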
Phase IV trials No totally adequate definition has yet been arrived at. One working definition, however, is that these types of trial encompass all the studies undertaken after obtaining a marketing licence. While studies conducted in the run-up to the registration process are regarded as adequate for the purpose of determining whether a new drug is efficacious
and in large part safe, they do, by their very nature, leave important questions unanswered. As an example, during the pre-registration phase, the toxicity of a drug could not have been accurately assessed in a Phase III clinical trial if the incidence of agranulocytosis was 1 in 20 000 or less. In general, Phase II and Phase III studies are restricted because they are based on limited numbers of patients, a limited duration of patient exposure and a restricted population of patients.

Phase IV trials are designed to answer detailed questions about the practical use of a drug. There are many different aspects of this, not all of which will be studied at the same time. Such trials may be conducted to investigate other doses or the scheduling of a new treatment; they may be used to look at side-effects in more detail, particularly in long-term chronic use; they can be used to investigate drug efficacy in long-term usage where, for example, the course of a disease may be modified over a period of months or years; they can be used to collect comparative data on long-term usage if the original studies were restricted to a comparison against PLACEBO; they can be used to investigate new uses and indications leading, potentially, to a new submission; or they can be used to investigate alternative populations.

Phase IV trials are usually conducted after a drug has already been through the full trial process and has already got a licence from the authorities for general use, although Phase IV trials can also be run during the registration process itself. In many aspects, Phase IV trials are similar to Phase III trials. They will, in the main, be randomised and controlled, although not generally placebo controlled; furthermore, some of their objectives may be investigated by uncontrolled studies.

It is important to distinguish between Phase IV trials and post-marketing surveillance (PMS) or monitored release studies (MRS).
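The point about rare adverse events such as agranulocytosis can be illustrated numerically. The sketch below assumes that events occur independently in each patient with a fixed incidence; the trial size of 3000 patients and the function names are hypothetical choices made for illustration, not figures from this entry.

```python
import math


def prob_at_least_one_event(incidence, n_patients):
    """Probability of observing one or more events among n patients,
    assuming independent patients with a common event probability."""
    return 1.0 - (1.0 - incidence) ** n_patients


def patients_needed(incidence, target_prob=0.95):
    """Smallest n giving at least target_prob of seeing one or more events."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - incidence))


incidence = 1 / 20000  # the 1 in 20 000 incidence quoted in the text

# Hypothetical pre-registration exposure of 3000 patients.
print(round(prob_at_least_one_event(incidence, 3000), 3))

# Exposure needed to have a 95% chance of seeing even a single case.
print(patients_needed(incidence))
```

Under these assumptions a programme of a few thousand patients would most likely observe no cases at all, while roughly 60 000 exposed patients would be needed to be reasonably sure of seeing one, which is why such questions tend to be left to post-registration studies.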
A PMS trial will tend to be observational and noninterventional and will be conducted primarily to monitor safety in a new medicine shortly after it has begun to be prescribed in daily practice. While there may be simple measures of efficacy included so that additional risk/benefit judgements may be made, this is not the prime purpose of the trial.

A further type of study that may be conducted postregistration is a so-called 'seeding' study. A 'seeding study' involves the drug sponsor providing a new medicine to physicians in order to familiarise them with its use and to encourage them to prescribe it. In such studies there is neither intention nor attempt to gather data that could provide useful scientific or medical information.

More recently Phase IV trials have been used in order to collect health economics, also known as outcome research, data concerning resource utilisation. Such data are required in order to provide information on the cost-effectiveness of new medicines for reimbursement agencies. Although this information may be required at, or shortly after, registration of a new medicine, forcing its collection during Phase II and Phase III trials - so-called piggybacking - may not be appropriate. Phase II and Phase III trials are highly controlled scientific experiments designed to minimise variation and to make it possible to detect small, but clinically relevant, treatment effects. In one sense, therefore, they are artificial and do not represent the environment in which new medicines will ultimately be prescribed. This has a number of consequences. First, protocols in Phase III trials may include investigations that are nonstandard in the clinical treatment of patients but are entirely appropriate in a CLINICAL TRIAL to monitor the safety of patients. If these 'extra' investigations are allocated against the new medicine, their cost may bias the overall evaluation of its cost and hence its cost-effectiveness. Second, health outcomes may be more variable than the clinical ENDPOINTS studied in Phase III trials. If, therefore, health economic outcomes are piggybacked on a Phase III trial this is likely to increase the sample size and lead to an even greater cost of drug development than hitherto. Ideally, health economic studies should be conducted in a more naturalistic setting. They will tend to be large trials, using endpoints that are patient centric so that quality of life outcomes (see QUALITY OF LIFE) may predominate over clinical outcomes and they will tend to be simple with the minimum of information being collected. Such trials may more appropriately be carried out in a Phase IV programme. AG
phenotype This term refers to the observable characteristics of an individual or organism, as distinct from its GENOTYPES, which, before modern molecular genetics, were not observable. Although many genotypes are also now measurable, the distinction is still useful. Phenotype covers a wide range of possibilities, including diseases (affected or not affected), quantitative traits (e.g. blood pressure or height) and biological measurements (e.g. blood glucose levels). In some cases, there is a close relationship between a particular gene and a particular phenotype. For example, in the ABO blood group system, there are four main blood groups, as measured by serology, determined by the ABO gene: A, B, AB and O. These phenotypes are determined by the genotypes: individuals with genotypes (A,O) and (A,A) are group A; genotypes (B,O) and (B,B) give rise to group B; (A,B) gives rise to group AB; and (O,O) to group O. Most phenotypes are, however, related to genotypes in a more complex fashion - most diseases and other common phenotypes are related to genotypes at many genetic loci and are also influenced by environmental or lifestyle factors. The PROBABILITIES with which particular genotypes are associated with particular phenotypes are referred to as penetrances. DE
(See also GENETIC EPIDEMIOLOGY, TWIN ANALYSIS)
Balding, D. J., Bishop, M. and Cannings, C. (eds) 2001: Handbook of statistical genetics. Chichester: John Wiley & Sons, Ltd. Sham, P. 1998: Statistics in human genetics. London: Arnold.
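The ABO genotype-to-phenotype relationship described in this entry is simple enough to write down as a lookup. The following is a minimal sketch; the function name and the representation of a genotype as an unordered pair of allele labels are illustrative assumptions.

```python
def abo_phenotype(allele_1, allele_2):
    """Return the ABO blood group for an unordered pair of alleles.

    Encodes the mapping in the text: (A,O) and (A,A) give group A;
    (B,O) and (B,B) give group B; (A,B) gives AB; (O,O) gives O.
    """
    alleles = {allele_1, allele_2}
    if not alleles <= {"A", "B", "O"}:
        raise ValueError("alleles must each be 'A', 'B' or 'O'")
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"


for genotype in [("A", "O"), ("A", "A"), ("B", "O"),
                 ("B", "B"), ("A", "B"), ("O", "O")]:
    print(genotype, "->", abo_phenotype(*genotype))
```

Six genotypes collapse to four phenotypes here, so the phenotype alone does not identify the genotype; for most traits the relationship is far less deterministic and has to be described by penetrances.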
pie chart This is a graphical display in which a series of frequencies or percentages are represented by sections of a circle having areas proportional to the observed values. An example is given in the figure.
pie chart Frequencies of first births in each month in a Swiss town (sample size is 500)
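The proportionality that defines a pie chart can be made explicit: each sector's angle (and hence its area) is the category's share of the total. In the sketch below the monthly counts are hypothetical values chosen only to sum to 500; they are not the data behind the figure.

```python
# Hypothetical monthly counts of first births (sum = 500); illustrative only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
counts = [38, 35, 44, 46, 47, 42, 45, 43, 41, 40, 39, 40]


def sector_angles(frequencies):
    """Angle in degrees of each pie sector, proportional to its frequency."""
    total = sum(frequencies)
    return [360.0 * f / total for f in frequencies]


angles = sector_angles(counts)

# Printing the same information as a table, with the sample size shown.
print(f"n = {sum(counts)}")
for month, count, angle in zip(months, counts, angles):
    print(f"{month}  {count:3d}  {100 * count / sum(counts):5.1f}%  {angle:6.2f} deg")
```

The printed table carries the same information as the chart together with the sample size on which the proportions are based.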
Although very popular in the media, both the general and scientific use of pie charts has been severely criticised (see Tufte, 1983; Wainer, 1997) and tables are preferable for most small datasets. Among particular dangers to be aware of concerning pie charts are the following: distortion of the basic shape from a circle, misleading use of three-dimensionality, detaching of slices from the pie, rotation to promote or hide a given slice and failure to show sample size on which the proportions are based. SSE
(See also BAR CHART, DOT PLOT, HISTOGRAM, STEM-AND-LEAF PLOT)
Tufte, E. R. 1983: The visual display of quantitative information. Cheshire, CT: Graphics Press. Wainer, H. 1997: Visual revelations. New York: Springer.
pilot studies These are small-scale research experiments primarily undertaken to inform or improve the conduct of related future research. In general, pilot studies do not in themselves aim to generate scientifically useful evidence. Instead, pilot studies are conducted for a variety of reasons, including: testing feasibility and appropriateness of data collection (e.g. are sufficient manpower and time resources planned to gather required information?); identifying problems with QUESTIONNAIRE wording, if extracting information in this manner (e.g. do patients' responses indicate that questions have been properly understood and answered?); identifying problems with data processing (e.g. does a null, or a blank, response mean an answer is missing, or not asked, or no response, or an unknown answer?); training observers or interviewers to equally high standards to help ensure uniformity of data quality, whenever multiple individuals are involved in data collection and processing, as in most large or multicentre studies (e.g. is there a learning curve in how data are extracted and do all attain a sufficient level of expertise?); assessing, at least initially, anticipated variations of responses, for this can help sample size estimation for the main study (e.g. when such information is either not available or not applicable from the published literature, being better than relying on 'educated guesswork' alone); estimating main study duration and cost (e.g. if uncertain, both time and money estimates may need to be firmed up before submitting a realistic budget to potential sponsors).

Although often on a hidden agenda, a well-conducted pilot study can also help serve to convince potential sponsors of the main study to follow and that the research team is indeed capable of performing their intended study, once awarded the necessary funding.

Pilot studies do not need to be large to serve their purposes. There may be the need for several drafts before the final version of a questionnaire is deemed most suitable, although one pilot is generally sufficient and its cost and effort are usually well rewarded. On the negative side, pilot studies can cause results from an investigation to be delayed, a classic case of the compromise between speed and quality. (This is reminiscent of the service salesperson's banter: 'You can have it fast, cheap or reliable - pick any one!') However, various attempts to circumvent this problem have been proposed that involve internal pilot studies. These are study designs that seek to incorporate a seamless transition from the pilot to the main study, in which the initial information arising helps to dictate the choice of the overall study sample size, a similar motivation to some DATA-DEPENDENT DESIGNS for CLINICAL TRIALS. See, for instance, contributions by Birkett and Day (1994), Wittes et al. (1999) and Zucker et al. (1999) for a flavour of these developments.

One warning about running a pilot study is the potential for BIAS brought about by inappropriately combining results from the pilot and the main studies, especially if changes occur after the pilot phase. This is so even if these changes are as innocuous looking as rephrasing questions from positive to negative or from open to closed (see QUESTIONNAIRES). Pilot patients will be in a different timeframe, so beware of any temporal or seasonal variations in responses. Furthermore, eligibility criteria for inclusion in the main study and in the pilot study may differ in important ways, meaning it would not be obvious to which population (if any) inferences from such a hybrid sample might apply.

In conclusion, pilot studies are well worth the initial effort involved. Too few are undertaken in advance of major studies. This may be attributable in part to too much research occurring on a tight timescale to accommodate individuals' medical career steps. Also, it is more than a pity that much
published research is labelled as a 'pilot' as if this were somehow a post-experimental justification for inadequate, or, worse still, absent, design planning. For one's own research, always run a pilot study unless there is a sound reason not to do so. Expect to discard its results from the main analysis, unless it causes nothing to change and it was pre-planned as an internal pilot study. CRP
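One use of a pilot study mentioned in this entry, informing sample size estimation from the observed variation in responses, might be sketched as follows. The formula is the standard normal-approximation sample size for comparing two means with equal group sizes; the pilot measurements, the target difference of 5 units and the function name are hypothetical. This is not the internal pilot methodology of the cited papers, which also addresses the effect of re-estimation on the Type I error.

```python
import math
from statistics import NormalDist, stdev


def n_per_group_for_means(sd, delta, alpha=0.05, power=0.90):
    """Approximate patients per group to detect a mean difference delta.

    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2,
    a two-sided normal approximation with equal allocation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


# Hypothetical pilot measurements (e.g. systolic blood pressure, mmHg).
pilot_values = [128, 141, 135, 119, 150, 132, 138, 125, 144, 130]
pilot_sd = stdev(pilot_values)

print(f"pilot SD = {pilot_sd:.1f}")
print("patients per group:", n_per_group_for_means(pilot_sd, delta=5.0))
```

Because a standard deviation estimated from a handful of pilot patients is itself imprecise, it is prudent to repeat the calculation with a somewhat larger value before settling on a main study size.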
Birkett, M. A. and Day, S. J. 1994: Internal pilot studies for estimating sample size. Statistics in Medicine 13, 2455-64. Wittes, J. T., Schabenberger, O., Zucker, D. M., Brittain, E. and Proschan, M. 1999: Internal pilot studies I: Type I error rate of the naive t-test. Statistics in Medicine 18, 3481-91. Zucker, D. M., Wittes, J. T., Schabenberger, O. and Brittain, E. 1999: Internal pilot studies II: comparison of various procedures. Statistics in Medicine 18, 3493-502.
pitfalls in medical research Many people working in healthcare become involved in the design, execution, analysis and dissemination of medical research at some point in their careers. Doctors are taught about medical research and in the course of their professional training may well be expected to be actively involved in research. Other health professionals and even managers are also likely to be required to be active researchers. Consumers (that band of once passive recipients of healthcare formerly known as patients) are also increasingly contributing to the commissioning and design of research. There will always be those for whom research is at the centre of their professional lives and those for whom it is at the periphery, but many individuals are expected at some point to be researchers. Research, however, presents many pitfalls for the inexperienced, some of which are described here in the hope that this may assist future researchers to avoid them.

Research and clinical audit are sometimes confused and the novice researcher may be uncertain what kind of activity it is they are undertaking. The UK NHS working definition of research is 'the attempt to derive generalisable new knowledge by addressing clearly defined questions with systematic and rigorous methods' (Department of Health, 2001), while clinical audit can be defined as 'a quality improvement process that seeks to improve patient care and outcomes through systematic review of care against explicit criteria and the implementation of change' (National Institute for Clinical Excellence (NICE), 2002). The distinction is not an academic one, as research has wider ethical and governance implications than audit, as patients must be protected and the scientific quality of the research guaranteed (World Medical Association (WMA), 2002). For example, access to confidential patient information for research purposes requires independent ethical review and generally the informed consent of the patient, but access for an audit activity aimed at improving patient care does not. Appropriate review for research should be obtained, including
scientific review, ethical review from a research ethics committee or institutional review board and other approvals. For example, in England and Wales, approval must be sought from each NHS trust hosting the research (see ETHICAL REVIEW COMMITTEES). Hence the first pitfall may sometimes be not to recognise research activity, thus denying participants the protection offered by ethical and scientific review.

Many individuals are involved in and contribute to research, but they do not all have the same experience or expertise. It follows that researchers who do not seek out the range of skills needed for their projects are likely to run into problems. Successful medical research no longer involves a single scientist working in isolation, if, indeed, it ever did. Major grant-awarding bodies look not only for individuals with research track records but also for a research team with appropriate clinical and other collaborators, including explicitly identified statistical expertise. Even a relatively junior and isolated researcher should seek to identify a research team that will support his or her project. This should ideally include senior clinicians and investigators with deeper knowledge of the research area, statistical support and advice (often available in research active institutions). Peer support enables researchers to benefit from shared experience.

Essential though it may be, collaboration, or, more particularly, academic supervision, can cause difficulties for researchers. Collaborators and supervisors may have overambitious views of what their researchers can achieve and may underestimate the time required. Supervisors may have their own special interests that are not shared by students or that do not constitute appropriate research for the students' objectives, particularly academic ones. Indeed, more than one supervisor may have conflicting special interests. There are no easy answers, but early development of an agreed protocol for the research and a project plan for the student's time offers some protection against conflicting interests and changes of direction.

Why might failing to seek statistical collaboration be a pitfall? This can be answered by considering what a statistician can contribute to a research project. Researchers whose memory of learning basic statistics consists of a succession of hypothesis tests might be surprised to find that a consultation with a statistician is unlikely to focus on the statistical tests required in the proposed research. Discussions will focus on, first, the question that the research will address, moving on to the appropriate research design to answer that question and the data to be collected. Only then can the proposed analysis and the required sample size be considered. The statistician contributes to the whole research design, not merely the statistical analysis (see CONSULTING A STATISTICIAN).

If the role of statistics in research is about design as much as it is about analysis, what problems are encountered where the research design is inadequate? The pitfalls involved in proceeding with research without a clearly thought-out
research design are several: the finished research may answer no clear question; although the researcher had a clear question, an inappropriate design that cannot give a clear answer has been used; some of the information needed properly to answer the question was not collected; the study size may be too small to answer the researcher's question.

Why does it matter that a research proposal should address a clear question? Medical research progresses by framing hypotheses and addressing answerable questions, and patient involvement in research cannot be justified when this does not apply. Without a research question, a statistician cannot develop a research design, just as an architect cannot develop a drawing unless he knows whether a house or a railway station is required. Research questions and research designs are so closely linked that once a researcher has framed a clear question, the appropriate design often becomes apparent. Without an a priori answerable question, the data collected may well be inadequate to provide the answer to any useful question.

Some researchers might find an evidence-based approach to medicine (see EVIDENCE-BASED MEDICINE) helpful in thinking through clinical research questions. Different types of clinical question, e.g. concerning treatment, prognosis or diagnosis, can be framed, and then a research protocol with the appropriate design to answer the question can be developed (Sackett et al., 2000). For example, a question concerning a medical intervention needs to define the population to be studied, an intervention and a comparator and an outcome. The design most likely to control for confounding in an intervention study and thus provide high-quality evidence is a randomised controlled trial (see CLINICAL TRIALS). In specifying the question, the preferred research design has also been determined. Not all research has a clinical focus - it might be epidemiological or laboratory research - but the researcher will still need to have clearly formulated, refutable hypotheses, if a protocol designed to test them is to be developed.

A researcher may have developed a question, but is it the right question? A further pitfall is that the research question may already have been adequately answered. Would the proposed research add anything new to the existing literature? Once the component parts of an initial question, e.g. the patient population, the precise intervention, the most valid comparators and outcomes, are clearly defined, they can be turned into search terms to be used in reviewing the medical literature, perhaps to be used in a formal systematic review (see SYSTEMATIC REVIEWS AND META-ANALYSIS) (Chalmers and Altman, 2001). If the question has been answered or there is a good-quality study in progress likely to give a definitive answer, then continuing to plan a new study is neither ethical nor cost-effective. Even where a thorough literature search confirms that the research question has not previously been adequately answered, it may be that the literature or contact with experts, clinical colleagues and patients suggests that
the initial question was not clinically relevant or has already been partially answered, and thus the question may need to be modified before developing a research protocol.

Once the research question has been framed, inappropriate research design presents the next pitfall. It is possible for a researcher to have a clear research question, but to have chosen a research design that cannot answer it. An epidemiologist evaluating the results of a study will consider whether they are a chance finding, are explained by confounding (when more than one factor is associated both with each other and with the outcome so that it is impossible to say what the true effect of each is on the outcome) or by BIAS (Rothman, 2002). The research design must answer the research question in a way that minimises the influence of chance, bias and confounding. Bias is any process that causes the study results systematically to depart from the true result. The selection of participants can be biased, perhaps because the sample is drawn from a highly selected hospital population but the intention is to generalise the results to the whole population (see SELECTION BIAS). The measurement of outcomes can be biased by poor ascertainment or poor response, by bias in the recording of information or by inadequate follow-up.

Sometimes proposed research has a biased design because the researcher starts with the data that are most easily available, not with the data that can answer the question. Suppose the accuracy of a blood test is under consideration and a researcher has access to test results and associated patient records. Suppose that diagnosis requires expensive tests. It is likely that only the most severe cases and quite probably only those with positive test results will have the expensive investigations and the results will reflect this bias, inflating the apparent accuracy of the test. If the researcher had started by considering an optimal design, then a COHORT STUDY including a representative sample of patients, all of whom receive both tests, would be the preferred design. Underpinning clearly framed research questions there should be an understanding of how research design can as far as possible avoid bias and confounding, whether that understanding is mediated through a traditional approach to epidemiology or to the design of experiments or by evidence-based medicine.

A further pitfall in the design of a research study is that the data collected may be inadequate for the purpose. The chosen outcome measures may not adequately measure the chosen outcome. Standardised instruments (including QUESTIONNAIRES and psychological tests) with known validity and reliability should be used where possible (see MEASUREMENT ERROR AND RELIABILITY). If researchers are developing their own outcome measures, whether laboratory tests or questionnaires, they should investigate the proposed outcome's properties. All questionnaires should be piloted, preferably using the target population (see PILOT STUDIES). Investigators should be wary of confusing process with outcomes. Some
potential outcome measures will not be available for all patients: e.g. the gold standard for clinical diagnosis may depend on histological confirmation of disease that can only be obtained in the most ill patients. Hence, protocol outcome definitions may need to be pragmatically adapted to the realities of the clinical setting.

Important data items may be omitted. For example, it is not unknown for inexperienced researchers interested in the survival of their patients carefully to collect information about deaths among their patients, but not explicitly to record the last date living patients were seen, without which SURVIVAL ANALYSIS is impossible. Important confounding factors (age and sex should almost always be recorded) may be omitted. Consultation with clinical and statistical collaborators and careful reading of previous relevant studies will prevent some of these mistakes.

It is equally possible to collect far more data than is necessary, particularly when many tests are repeated as part of clinical care or research is designed by large, multidisciplinary committees comprised of specialists, where each wants to promote his or her own area of clinical interest. Decisions should be made at the outset on which time points data will be collected and what will be measured. A researcher might, for example, want to know the lowest white cell count following a bone marrow transplant or the time point at which the white cell count recovered, but will certainly not need the results of daily blood tests. All of this should be specified in the research protocol, not after the end of data collection (see PROTOCOLS FOR CLINICAL TRIALS).

Researchers are often disappointed when their apparently interesting data do not show statistically significant results and thus may have occurred simply by chance. To avoid this disappointment, a researcher must before the research starts calculate what sample size is required in order to be reasonably certain that, if the desired results are obtained, they are precise enough to be convincing, as demonstrated by a narrow CONFIDENCE INTERVAL or by the result of a hypothesis
test that is statistically significant at a pre-specified level. To estimate how many patients are needed (a sample size calculation) the researcher must typically specify what the researcher expects to find, e.g. the outcome in the control group, and what the researcher hopes to detect, e.g. a clinically important difference that might be achieved with a new treatment. The researcher also needs to specify acceptable statistical power so that there is a reasonable chance of detecting such a difference if it does exist (the minimum acceptable is 80%, which will detect a true result in 80% of samples with a given level of statistical significance) and the significance level (typically α = 0.05) or precision (i.e. the confidence interval, typically 95%). Statistical aspects of sample size calculations are straightforward, given appropriate reference books, tables (e.g. Machin et al., 1997) or software (e.g. nQuery Advisor), but deciding what is a worthwhile outcome is not (see various SAMPLE SIZE DETERMINATION entries).

It is easy to manipulate the inputs to the calculation by adjusting the researcher's expectations and the statistical power required. All too often a researcher approaches a statistician for a sample size calculation in the expectation that the answer will be precisely the number of patients available. It is important, however, to explore what the sample size requirements would be if the initial assumptions were wrong. Researchers should aim for samples larger than the minimum necessary to identify a substantial improvement over standard treatment. There may be few data to inform calculations and the temptation is to underestimate sample size (e.g. by assuming implausibly small STANDARD DEVIATIONS).

As an illustration of what can go wrong, consider a placebo-controlled trial where there is good reason to expect a 60% response with the intervention and where α = 0.05 and POWER is 90%. PLACEBO response rates are notoriously variable. If an overoptimistic researcher assumes a 30% placebo response and the actual rate is only 5 percentage points higher, then 50% more patients will be required (see the figure).
Response rate: 60% drug, 30% placebo, n = approx 120
Response rate: 60% drug, 35% placebo, n = approx 180
Response rate: 60% drug, 40% placebo, n = approx 280
pitfalls in medical research Impact of different levels of placebo response on sample size
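The effect of the assumed placebo response on sample size can be sketched with the usual normal-approximation formula for comparing two proportions. This is a minimal illustration, not the method used to draw the figure: the function name, the equal allocation to two arms and the continuity correction are all assumptions of the sketch, so its totals only roughly track the figure's approximate values.

```python
from math import ceil, sqrt
from statistics import NormalDist


def total_sample_size(p_drug, p_placebo, alpha=0.05, power=0.90, continuity=True):
    """Total patients (two equal arms) needed to compare two response rates.

    Hypothetical helper: normal approximation, two-sided significance level.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # desired statistical power
    diff = abs(p_drug - p_placebo)
    variance = p_drug * (1 - p_drug) + p_placebo * (1 - p_placebo)
    n = (z_alpha + z_beta) ** 2 * variance / diff ** 2  # patients per arm
    if continuity:
        # Continuity correction: an assumption of this sketch.
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return 2 * ceil(n)


# 60% response on the drug; placebo response of 30%, 35% or 40%.
sizes = {p: total_sample_size(0.60, p) for p in (0.30, 0.35, 0.40)}
for placebo_rate, n_total in sizes.items():
    print(f"placebo {placebo_rate:.0%}: about {n_total} patients in total")
```

Rerunning the calculation over a range of plausible placebo rates, rather than a single optimistic one, is one simple way to explore what happens if the initial assumptions are wrong.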
Determination of sample size is not simply a statistical calculation. Researchers often overestimate the availability of patients. In practice, patients do not always consent to take part, default from clinics, drop out, move house and even die at inconvenient times (see DROPOUTS). Medical records are often indefinitely unavailable if not explicitly lost. A generous allowance should be added to the sample size to allow for such contingencies.
What can be done if the researcher cannot recruit enough patients to meet a realistically estimated sample size? If a proposed study cannot answer the proposed question, it would be unethical to ask patients to take part and the study should not go ahead. If a single-centre study had been proposed, it might be possible to find collaborators in other centres in order to obtain sufficient numbers, although MULTICENTRE TRIALS have further potential pitfalls compared to single-centre studies.
After careful consideration of the research question and the design, the researcher should have developed a research protocol (see PROTOCOLS FOR CLINICAL TRIALS). This should be sufficiently detailed to provide a clear picture of what the study will involve. Studies based on inadequate or even nonexistent protocols are likely to fail to provide data that match the researcher's objectives. If the study aims are unclear in the protocol, if it does not clearly specify which patients are to be recruited, what is to happen to them and what data are to be collected, then everyone involved in carrying out the study will share the same uncertainties. While not all studies need the detailed protocols of commercial drug trials, the protocol should describe the research aims and lay out the activities necessary to achieve them. The researcher might ask the question: 'Is there enough information in the protocol for someone from outside the research team to understand why the study is being carried out and what would be needed to replicate it?' If the answer is 'no', there is the danger that the research project will not follow the research design and will provide meaningless or biased results.
The research protocol is at the heart of good clinical practice (GCP) in research, the code of conduct that aims to protect the rights and safety of research subjects and to ensure the scientific quality of the research. While some aspects of GCP are specific to drug trials (International Conference on Harmonization (ICH), 1996), most of it applies to all clinical studies (Medical Research Council (MRC), 1998, 2000). Scientific and ethical review of the protocol ensures that the research design is valid and ethical. Adherence to the protocol while the research is in progress should ensure that the study aims are met. Many novice researchers have little awareness of GCP, but adhering to its principles would avoid many pitfalls for the research project in progress.
The principal investigator and other suitably qualified personnel involved in the research should be identified and, particularly in drug trials, responsibilities should be explicitly delegated in writing. In commonsense terms, this means that all members of the research team should understand their role in the project and should receive adequate training. The scientific quality of the study is assured by adherence to the protocol, but this is only meaningful if it is possible to monitor that adherence. This is done (extensively in commercially sponsored trials) by comparing the study documentation, the 'case report forms', against the source documentation, e.g. patient notes and laboratory records. Thus, not only should the study be meticulously documented but also patients' records need to be maintained to high standards. Even where a study is not likely to be audited in any detail, adherence to the protocol, careful study documentation and maintenance of high-quality patient records will help ensure that all subjects successfully complete the study and will contribute to the quality of the data collected. Conversely, in studies where standards of record keeping are low, even the investigators will find it difficult to find out what was actually done, should they need to go back to the study records and raw data.
Study records must be kept securely and, in clinical research, separately from the patient record, which should contain all important information, including that the patient has given informed consent. Study records should be archived after completion of the study, so that there is an audit trail should there be any queries concerning the way the study was carried out or the accuracy of the data.
A researcher will not be able to start a project as soon as the protocol is finalised. Enough time must be allowed at the start of the project to obtain research ethics approval, any institutional approvals required and any regulatory approvals required, e.g. a clinical trial authorisation for a trial of a medicinal product in the UK (see MEDICINES AND HEALTHCARE PRODUCTS REGULATORY AGENCY (MHRA)).
The time required should not be underestimated, as research ethics committees generally have queries that must be answered before final approval is given. Often these queries concern incomplete application forms or badly drafted patient information and might have been avoided if guidance on drafting forms and leaflets had been followed. Never allow only the minimum possible time for approvals to be obtained.
Even after all the data have been collected, some further pitfalls remain. Before preparing the data for analysis, the coding of data items should be considered, as tidying up a messy dataset can add considerably to the time needed for data analysis. Codes should be allocated for missing data, taking care to distinguish genuine missing data from 'unknown' and zero entries. Each patient should be allocated a study number, so that individuals can be easily identified in data analysis if necessary. The use of a database, with an appropriate form for DATA ENTRY incorporating checks on the data entered, can facilitate quick and accurate data entry. Databases are preferred to spreadsheets as it is easier to ensure the correct data are entered for a particular case. If, however, a spreadsheet is used, one row should be used for all the information on a single patient and care should be taken when sorting the data to ensure that the correct data remain attached to the correct patient. At least a sample of data should be checked by double data entry or proofreading (see DATA MANAGEMENT).
Data should be kept securely and password protected to protect patient confidentiality, should always be anonymised and should be backed up regularly so that at worst only the last few cases entered are lost. Data protection is a legal requirement as well as good practice: losing precious data when a portable computer is stolen is bad enough; it is worse if the backup copies were in the computer bag; it is far more serious if confidential personal information is lost or revealed.
Where available, STATISTICAL PACKAGES offer more flexibility and speed in statistical analysis than spreadsheets and databases and, an important factor, the statistical calculations made have been checked. Data can be imported from databases, spreadsheets and other formats. Menu-driven statistical packages are easy to use and offer extensive help files, but that can in itself present a hazard. A statistical method should only be used if the assumptions underlying it are met by the data to which it is applied. Advice should be sought if a researcher does not understand the application of assumptions behind the statistical methods they are using, else the researcher risks presenting superficially sophisticated but meaningless results.
Poor or nonexistent planning often means that a research project is never successfully completed. Good project management from the outset, however, increases a project's chance of successful completion. Research projects that have received substantial funding will be expected to have project plans monitored by a management committee to help ensure that the project achieves its aims on time and within budget. This approach can be usefully applied to smaller scale projects. The quality of the desired outcome should be specified and the time and other resources available (or else which must be obtained) must be identified. For example, a doctor in a training post planning some research might have a project plan that specifies: the quality of the research must be at least adequate for a conference presentation; the resources available are the researcher's time, some input into the research design from others, a little local funding and some secretarial time; and the time constraints are that sufficient results must be available by the conference abstract deadline and the project must be completed by the conference. It is important to be realistic when considering both the resources and the time available. A project completed over a year should allow a realistic amount of time for Christmas and other public holidays, vacations and illness (the researcher's, their supervisor's and the patients'), as well as building in some time for unexpected contingencies. Once these constraints have been identified, the
researcher should plan backwards from the ENDPOINT. He or she should ask the questions: Is there enough time and resources to achieve quality? If not, can the timescale be adjusted or further resources sought (which will itself take time)? Would a less ambitious project meet quality standards? The timelines can be plotted and an analysis made of the critical pathways where delay might occur. What steps will hold everything up (e.g. obtaining research ethics approval, retrieving records)? Once this has been identified the researcher should focus on how to achieve those steps in a timely fashion. Milestones critical to the success of the project should be identified and targets set for achieving them. A management committee can be set up to monitor progress against targets.
The researcher who wishes to avoid pitfalls will have clearly specified research questions, appropriate research designs, adequately detailed protocols and well-planned projects carried out in line with good practice. Statisticians contribute to all of these objectives. Statisticians, like doctors, are most effectively consulted in the earliest stages and the consequences of not taking this advice are well known. Perhaps, then, the biggest pitfall in medical research is to forget that statistical collaboration should start at the beginning and last throughout the life of a research project. CLC
Chalmers, I. and Altman, D. G. (eds) 2001: Systematic reviews in health care: meta-analysis in context. London: BMJ Books. Department of Health 2001: Research governance framework for health and social care. London: Department of Health. International Conference on Harmonization (ICH) 1996: E6 document: guideline for good clinical practice. ICH harmonised tripartite guideline. Federal Register. Machin, D., Campbell, M. J., Fayers, P. and Pinol, A. 1997: Sample size tables for clinical studies, 2nd edition. Oxford: Blackwell Sciences Ltd. Medical Research Council (MRC) 1998: MRC guidelines for good practice in clinical trials. London: Medical Research Council. Medical Research Council (MRC) 2000: MRC ethics series. Good research practice. London: Medical Research Council. National Institute for Clinical Excellence (NICE) 2002: Principles for best practice in clinical audit. Oxford: Radcliffe Medical Press. Rothman, K. J. 2002: Epidemiology: an introduction. Oxford: Oxford University Press. Sackett, D. L., Straus, S. E., Richardson, W. S., Rosenberg, W. and Haynes, R. B. 2000: Evidence-based medicine: how to practice and teach EBM, 2nd edition. London: Churchill Livingstone. World Medical Association (WMA) 2002: Declaration of Helsinki: ethical principles for medical research involving human subjects (amendment). In 52nd WMA General Assembly, Edinburgh.
placebo
This consists of a treatment that mimics a potentially active treatment in every respect except the ingredient or other feature through which the treatment is assumed to elicit its effects. A placebo can be pharmacological (e.g. a tablet or an injection), physical (e.g. a manipulation) or psychological (e.g. a conversation). The ideal
placebo is indistinguishable from its active counterpart in all respects other than its effects (appearance, taste, etc.). Placebos should avoid any risk inherent in the active intervention. For instance, in ophthalmology, a trial of intraocular injections of a new drug could use 'sham' subconjunctival injections of saline as the appropriate placebo.
Randomised clinical trials in which a placebo is used can be conducted with BLINDING or masking of the treatments. In a single-blind trial, the investigator is aware of the treatment but the subject is not, or vice versa. In a double-blind trial neither the subject nor the investigator is aware of the treatment received. Blinding limits the occurrence of conscious and unconscious BIAS arising from the influence that the knowledge of treatment may have on the recruitment and allocation of subjects, their subsequent care, the response of subjects to treatment, the assessment of ENDPOINTS, the handling of withdrawals and so on (International Conference on Harmonization (ICH), 1998).
Placebos can sometimes be useful, even in so-called active control trials, i.e. trials comparing two or more active treatments, say A and B. A first situation requiring the use of placebos is when blinding of treatment is deemed essential but A and B cannot be made identical. In this case, each subject can be allocated randomly to two sets of treatment ('double-dummy'): either A and placebo for B or B and placebo for A. Another use of placebos in active control trials is when two active treatments A and B must be compared, but a placebo arm can also reasonably be contemplated. In that case, the treatment contrast of interest is A versus B and the placebo group is of little use if A differs from B. However, if A does not differ from B, the contrasts of A or B versus placebo may suggest whether A and B are equally active or equally inactive.
The Declaration of Helsinki states that: 'The benefits, risks, burdens and effectiveness of a new method should be tested against those of the best current prophylactic, diagnostic and therapeutic methods. This does not exclude the use of placebo, or no treatment, in studies where no proven prophylactic, diagnostic or therapeutic method exists' (World Medical Association (WMA), 2002). The appropriateness of placebo control versus active control should be considered on a trial-by-trial basis. For serious illnesses, when a therapeutic treatment has been shown to be efficacious, a placebo-controlled trial is unethical. A placebo may also be less necessary in these cases because the assessment of 'hard' endpoints such as death or a major clinical event is unaffected by knowledge of the treatment.
In a placebo-controlled trial, the estimated treatment effect represents any effect of the active treatment A over and above that of the placebo (δ_A in the figure). This is generally the effect of interest. In contrast, if the active treatment is compared to an untreated control group, the estimated treatment effect includes any placebo effect (δ_A + δ_P in the figure). If there is interest in estimating the placebo effect on its own (δ_P in the figure), the trial should randomise patients between an untreated control group and a placebo group, as shown in the figure, in order to control for the natural evolution of the disease (e.g. spontaneous regressions).
[Figure: randomisation to no treatment, placebo treatment or active treatment, with δ_P marking the contrast between placebo and no treatment and δ_A the contrast between active treatment and placebo]
placebo A trial design randomising patients to a no-treatment control group, a placebo treatment group or an active treatment group
Is there such a thing as a placebo effect? A seminal paper on placebos (Beecher, 1955), largely responsible for the general adoption of the double-blind study design, reported an average placebo response rate of about one-third in 26 studies. Another paper (Roberts et al., 1993) suggested that the effects of placebos could be much greater 'under conditions of heightened expectations'. This claim is not supported by a recent systematic review of 130 CLINICAL TRIALS in which patients were randomly assigned to either placebo or no treatment (Hróbjartsson and Gøtzsche, 2001). Outcomes were binary in 32 trials and continuous in 82 trials. As compared with no treatment, placebo had no significant effect on binary outcomes, regardless of whether these outcomes were subjective or objective. For the trials with continuous outcomes, placebo showed a statistically significant beneficial effect on subjective outcomes, but the effect decreased with increasing sample size, indicating a possible bias in small trials. MB
Beecher, H. K. 1955: The powerful placebo. Journal of the American Medical Association 159, 1602-6. Hróbjartsson, A. and Gøtzsche, P. C. 2001: Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. New England Journal of Medicine 344, 1594-602. International Conference on Harmonization (ICH) 1998: E9 document: guidance on statistical principles for clinical trials. Federal Register 63, 179, 49583-91. Roberts, A. H., Kewman, D. G., Mercier, L. and Hovell, M. 1993: The power of nonspecific effects in healing: implications for psychosocial and biological treatments. Clinical Psychology Review 13, 375-91. World Medical Association (WMA) 2002: Declaration of Helsinki: ethical principles for medical research involving human subjects (amendment). In 52nd WMA General Assembly, Edinburgh.
placebo run-in In some CLINICAL TRIALS, especially trials of preventive (rather than therapeutic) interventions, the enrolled subjects go through a run-in placebo period prior to RANDOMISATION. The run-in period is useful to screen the subjects who are likely to comply with the intervention and to avoid randomising subjects who are unlikely to do so, thereby diluting any benefit of the intervention being tested. Run-in periods allow investigators to document why eligible subjects refuse trial enrolment or fail to be randomised at the end of the run-in period. They may also be used to evaluate the feasibility of strategies designed to promote trial enrolment and adherence. The disadvantage of selecting good compliers in a run-in period is that these subjects are not representative of the targeted population, thereby compromising the external validity of the trial.
For instance, a chemoprevention trial in patients at high risk of a recurrence of a head and neck cancer included an 8-week placebo run-in period (Hudmon, Chamberlain and Frankowski, 1997). Of 391 former cancer patients who entered the run-in period, 356 were randomised; the others were no longer interested in trial participation (n = 20), did not return within 10 weeks of enrolment date (n = 3), did not achieve a drug adherence level of at least 75% (n = 9) or were not randomised for another reason (n = 3). The most significant predictors of run-in outcome (randomised or not randomised) were education level and Karnofsky performance score. The odds of randomisation were more than twice as high in subjects with a good Karnofsky performance score and those with more than a high school education. MB
Hudmon, K. S., Chamberlain, R. M. and Frankowski, R. F. 1997: Outcomes of a placebo run-in period in a head and neck cancer chemoprevention trial. Controlled Clinical Trials 18, 228-40.
point-biserial correlation coefficient See CORRELATION
Planning and Evaluation of Sequential Trials (PEST) See SEQUENTIAL ANALYSIS
Poisson distribution This is a PROBABILITY DISTRIBUTION of the number of (rare) events occurring in a fixed time or space. Whereas the BINOMIAL DISTRIBUTION is used to model the number of 'successes' observed given the number of 'trials' taking place (and the independent PROBABILITY of success in any one trial), the Poisson distribution has the advantage that the number of trials need not be known (although it still needs to be large). For example, Armitage et al. (1999) model the number of cases of juvenile-onset Crohn's disease in Scotland as being Poisson random variables. This is a typical example of its use, as the risk for an individual is small, but the population of at-risk individuals, while unknown, can be presumed to be large.
Although the distribution was identified before he worked on it, it is named after the French mathematician Siméon Denis Poisson (1781-1840) and hence the 'P' in 'Poisson' is always capitalised. The distribution is defined by a single parameter (here we use λ) that represents both the MEAN and the VARIANCE and so must be positive. If X is a random variable taking a Poisson distribution with parameter λ, then the probability that we will observe X taking the value x, the probability mass function of X, is:
\[ \Pr(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \]
for nonnegative integer values of x. The characteristic that the mean and variance are the same is one of the signatures of the Poisson distribution and of use in identifying the correct model when we have several observations from the distribution. Note that:
\[ \Pr(X = x) = \frac{\lambda}{x} \Pr(X = x - 1) \]
and so the mode of the distribution is the largest integer value smaller than λ (since this is the last value for which λ/x is greater than one and thus the last value for which the probability is increasing) or, if λ is an integer, λ and λ - 1 are both modal values.
The Poisson distribution is the limiting distribution for a binomial distribution, i.e. as n increases in the binomial distribution and p decreases to keep np constant, then the Poisson distribution provides an increasingly better approximation. As λ increases in the Poisson distribution, then the NORMAL DISTRIBUTION provides a better and better approximation. It is of no surprise then to learn that the Poisson distribution is skewed, but that this SKEWNESS decreases as λ increases. For further details on how the Poisson distribution relates to other distributions see Leemis (1986).
While the normal distribution may provide a good approximation to the Poisson data, techniques such as the t-test may often not be valid for comparing two groups because the assumption of equal variances only holds under the NULL HYPOTHESIS. In this case, a square-root TRANSFORMATION of the data can stabilise the variances while leaving a distribution that is still reasonably approximated by the normal (Altman and Bland, 1995).
In practice, the data may suffer from OVERDISPERSION, i.e. there is more variation in the data than would be anticipated from a Poisson model. This may be a result of heterogeneity in the population: the probability of a 'success' may not be constant. Such variation may be accounted for by using a NEGATIVE BINOMIAL DISTRIBUTION or, if the sources of variation are known, it may be modelled via Poisson regression (see GENERALISED LINEAR MODEL). For further reading see Armitage and Colton (1998). AGL
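The properties of the Poisson distribution described in this entry (equal mean and variance, the recurrence between successive probabilities, the position of the mode and the binomial limit) can be checked numerically. The following is a small sketch using only the Python standard library; the function names and the choice of λ = 3.5 are illustrative assumptions.

```python
import math


def poisson_pmf(x, lam):
    """Pr(X = x) for a Poisson distribution with parameter lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def binomial_pmf(x, n, p):
    """Pr(X = x) for a binomial distribution with n trials, success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def max_gap_to_binomial(lam, n, upto=30):
    """Largest absolute difference between Binomial(n, lam/n) and Poisson(lam)."""
    return max(abs(binomial_pmf(x, n, lam / n) - poisson_pmf(x, lam))
               for x in range(min(n, upto) + 1))


lam = 3.5                # illustrative parameter value
support = range(60)      # truncated support; the omitted tail is negligible here

mean = sum(x * poisson_pmf(x, lam) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)
mode = max(support, key=lambda x: poisson_pmf(x, lam))

# Keeping np fixed at lam while n grows, the binomial moves towards the Poisson.
gaps = [max_gap_to_binomial(lam, n) for n in (10, 100, 1000)]

print(f"mean = {mean:.4f}, variance = {variance:.4f}, mode = {mode}")
print("max binomial-Poisson gaps for n = 10, 100, 1000:", gaps)
```

With real count data, a sample variance well above the sample mean would be one informal sign of the overdispersion discussed above.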
Altman, D. G. and Bland, M. J. 1995: Transforming data. British Medical Journal 312, 770. Armitage, P. and Colton, T. (eds) 1998: Encyclopaedia of biostatistics. Chichester: John Wiley & Sons, Ltd. Armitage, E., Drummond, H., Ghosh, S. and Ferguson, A. 1999: Incidence of juvenile-onset Crohn's disease in Scotland. The Lancet 353, 1496-7. Leemis, L. M. 1986: Relationships among common univariate distributions. The American Statistician 40, 2, 143-6.
Poisson regression See GENERALISED LINEAR MODEL (GLM)
population projection See DEMOGRAPHY
positive predictive value (PPV) PPV is defined for a diagnostic test for a particular condition as the PROBABILITY that those who have a positive test actually have the condition as measured by a reference or 'gold' standard (contrast with the NEGATIVE PREDICTIVE VALUE). If the data are set out as in the table (a general table of test results among a + b + c + d individual samples), then PPV = a/(a + b). PPV can also be expressed as a percentage. PPV depends on the prevalence of disease in the target population and this has important consequences in the context of population screening. In contrast, test SENSITIVITY and SPECIFICITY often remain the same when a test is applied in different populations. For example, suppose that test sensitivity and test specificity both equal 95%, and 1 000 000 people are tested in a screening programme. If the prevalence of the disease in the population is 10%, i.e. 1 in 10, then 95 000 out of the 140 000 with positive tests would have the disease and PPV = 68%. If the prevalence of the disease in the population, however, is only 0.1%, i.e. 1 in 1000, then 950 out of the 50 900 with positive tests would have the disease and PPV = 1.9%.
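The screening arithmetic in this example can be reproduced directly from sensitivity, specificity and prevalence. The sketch below is illustrative only; the function names are hypothetical, and it assumes that sensitivity and specificity stay fixed as prevalence changes, as the entry describes.

```python
def screening_counts(n_tested, sensitivity, specificity, prevalence):
    """Expected (true positives, false positives) when n_tested people are screened."""
    diseased = n_tested * prevalence
    healthy = n_tested - diseased
    true_positives = sensitivity * diseased
    false_positives = (1 - specificity) * healthy
    return true_positives, false_positives


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: true positives / all positive tests."""
    tp, fp = screening_counts(1.0, sensitivity, specificity, prevalence)
    return tp / (tp + fp)


for prevalence in (0.10, 0.001):
    tp, fp = screening_counts(1_000_000, 0.95, 0.95, prevalence)
    print(f"prevalence {prevalence:.1%}: {tp:,.0f} true positives among "
          f"{tp + fp:,.0f} positive tests, PPV = {ppv(0.95, 0.95, prevalence):.1%}")
```

The two calls give a PPV of about 68% at 10% prevalence and just under 2% at 0.1% prevalence, which is why the same test can be useful in one setting and of little use in another.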
positive predictive value General table of test results among a + b + c + d individual samples
                 Disease
Test        Present    Absent    Total
Positive    a          b         a + b
Negative    c          d         c + d
Total       a + c      b + d     a + b + c + d
Thus even with a relatively high prevalence of the condition in the population, many false positive test results may be generated (see FALSE POSITIVE RATE), with the attendant harm to the patient from anxiety and further investigations. Hence the usefulness of a diagnostic test depends not only on the test characteristics, i.e. the test's sensitivity and specificity, but also on the prevalence of the condition in the population in which the test is used. As disease prevalence is likely to be lower in community populations than in hospital populations, a diagnostic test may be useful in a hospital setting where there is a high prior probability that a patient has the disease, but may not be useful as a population screening test.
PPV should be presented with CONFIDENCE INTERVALS (typically set at 95%) calculated using an appropriate method such as that of Wilson, which will not produce impossible values (percentages greater than 100 or below 0) when PPV approaches extreme values. CLC
[See also FALSE POSITIVE RATE, POPULATION SCREENING, PREVALENCE, TRUE POSITIVE RATE]
Altman, D. G., Machin, D., Bryant, T. N. and Gardner, M. J. 2000: Statistics with confidence, 2nd edition. London: BMJ Books.
posterior distribution
This is a PROBABILITY DISTRIBUTION that represents the information, or beliefs, associated with a parameter of interest after data are collected or an experiment concluded. The posterior distribution is produced by combining the PRIOR DISTRIBUTION, which represents information about the parameter before the collection of data, with the LIKELIHOOD function, which represents the information about the parameter contained in the experimental data, by the use of BAYES' THEOREM. Formally the posterior distribution is derived as follows:
\[ p(\theta \mid \text{Data}) \propto p(\text{Data} \mid \theta)\, p(\theta) \]
in which p(θ) is the prior distribution expressing initial beliefs in the parameter of interest, θ, p(θ | Data) is the corresponding posterior distribution of beliefs and p(Data | θ) is the likelihood. The posterior distribution provides the full information concerning the parameter of interest. It is often useful to display the posterior on the same plot as the prior distribution and likelihood, in a so-called triplot. This allows the reader to understand to what extent the prior distribution is contributing to the overall inference. Examples of these plots are shown in the figure. In figure (a) an example is shown in which the prior distribution dominates the likelihood and almost fully determines the posterior. Figure (b), in contrast, shows an example in which the prior distribution is weak and the posterior is essentially determined by the likelihood.
[Figure, panels (a) and (b): curves for the prior, likelihood and posterior; horizontal axis from -10 to 50]
posterior distribution Examples of triplots of prior distribution, likelihood and posterior distribution
It is sometimes convenient to represent this posterior information by summaries. As posterior estimates, the posterior MEAN and posterior MODE are most often used and the uncertainty in the estimate is often reported by a posterior CREDIBLE INTERVAL.
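The two situations contrasted in this entry (a prior that dominates the likelihood, and a weak prior that leaves the posterior close to the likelihood) can be illustrated with a conjugate normal model, in which both the prior and the likelihood are treated as normal curves for θ. This is only a sketch: the normal model, the function names and all numerical values are assumptions chosen for illustration, not the settings behind the figure.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    """Posterior for theta when prior and likelihood are both normal.

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the data estimate.
    """
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / data_se ** 2
    post_variance = 1 / (prior_precision + data_precision)
    post_mean = post_variance * (prior_precision * prior_mean
                                 + data_precision * data_mean)
    return NormalDist(post_mean, post_variance ** 0.5)


def credible_interval(posterior, level=0.95):
    """Equal-tailed posterior credible interval."""
    tail = (1 - level) / 2
    return posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail)


# Strong prior, imprecise data: the posterior stays near the prior.
strong_prior = normal_posterior(prior_mean=10, prior_sd=2, data_mean=30, data_se=10)
# Weak prior, precise data: the posterior follows the likelihood.
weak_prior = normal_posterior(prior_mean=10, prior_sd=20, data_mean=30, data_se=3)

for label, post in (("strong prior", strong_prior), ("weak prior", weak_prior)):
    low, high = credible_interval(post)
    print(f"{label}: posterior mean {post.mean:.1f}, "
          f"95% credible interval ({low:.1f}, {high:.1f})")
```

For a normal posterior the mean and mode coincide, so a single point summary serves as both; for skewed posteriors the two summaries would differ.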
An important consideration following a Bayesian analysis is the reporting of the results. The statistician's job is not over when the analysis is completed and a report written, because it is at this stage that thought needs to be given to the transmission of information to diverse groups of remote customers. For example, in the context of a pharmaceutical drug trial, there are at least three groups of individuals who interact with each other during the drug development process. These groups are the experimenters, the reviewers and the consumers. The aim of the experimenters, among whom are individual pharmaceutical companies, research organisations and clinicians, is to influence the customers, who are the doctors treating patients. They do this by providing them with information that has, in a sense, been 'sanitised' to ensure objectivity by the reviewers, who are the editors of journals and regulatory authorities. Because they each have different motivations it is not at all clear that there will be a single 'parcel of information' appropriate for each customer group. One approach is to provide a range of posterior distributions based on a 'community of priors'. This approach works well if the community is broad enough to cover many different prior beliefs about treatment effects. Alternatively, the likelihood function can be reported, allowing remote customers to input their own prior distributions to derive the appropriate posterior distribution. This approach will only work if the remote customers are able to carry out the calculations, which may limit its use to only the simplest cases. AG
post hoc analyses These are analyses that were not specified in advance of data collection. Such analyses tend to be regarded with suspicion, because of the possibility that an observed ASSOCIATION may have been selected from among a large number of potential associations that were examined but not reported ('data dredging'). Such multiple comparisons can lead to inflation of TYPE I ERROR rates. If the number of associations tested is known then formal procedures such as the BONFERRONI CORRECTION may be used to correct P-VALUES (SIGNIFICANCE LEVELS). However, these methods have substantial disadvantages, one in particular being that they are highly conservative.
Unfortunately, the notion that the formulation of prior hypotheses is a guarantor against being misled is itself misleading. If we do 100 randomised trials of useless therapies, each testing only one hypothesis and only performing one statistical test, all statistically significant results will be spurious. Furthermore, it is impossible to police claims that reported associations were examined because of existing hypotheses. This notion has been satirised by Cole (1993), who announced that, using a computer algorithm, he had generated every possible hypothesis in EPIDEMIOLOGY so that all statistical tests are now of a priori hypotheses. In practice, the best approach is to report accurately the context in which an association was examined and to regard findings selected from among many comparisons as requiring confirmation by further research. JS/GDS
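The scale of the problem with many tests of true null hypotheses, and the conservatism of the Bonferroni correction, can be seen from a few lines of arithmetic. The sketch below assumes 100 independent tests at a significance level of 0.05; the independence assumption and the function names are illustrative choices, not part of the entry.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of 'significant' results when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of one or more false positives across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


n_tests, alpha = 100, 0.05
threshold = bonferroni_threshold(n_tests, alpha)

print("expected spurious significant results:", expected_false_positives(n_tests, alpha))
print(f"chance of at least one, uncorrected: {prob_at_least_one(n_tests, alpha):.3f}")
print(f"Bonferroni per-test threshold: {threshold}")
print(f"chance of at least one, corrected: {prob_at_least_one(n_tests, threshold):.3f}")
```

Under these assumptions about five of the 100 trials of useless therapies would be expected to reach significance by chance, whereas the corrected threshold is so strict that real effects of modest size would rarely pass it.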
Cole, P. 1993: The hypothesis generating machine. Epidemiology 4, 3, 271-3.
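The arithmetic behind the post hoc analyses entry can be sketched in a few lines. This is a minimal illustration, not a recommended analysis: it assumes a conventional significance level of 0.05 (the entry does not state one), independent tests and true null hypotheses throughout. The function names and the example P-values are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among n_tests independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of spurious 'significant' results under the same assumptions."""
    return alpha * n_tests


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted P-values: multiply each by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    alpha, n_trials = 0.05, 100  # assumed level; 100 trials of useless therapies
    print("expected spurious results:", expected_false_positives(alpha, n_trials))
    print("chance of at least one:", round(familywise_error_rate(alpha, n_trials), 3))
    # Hypothetical P-values from three post hoc comparisons.
    print("adjusted P-values:", bonferroni_adjust([0.001, 0.02, 0.04]))
```

The adjustment multiplies every P-value by the full number of tests, which is why the method becomes highly conservative as that number grows.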
power
See SAMPLE SIZE DETERMINATION IN CLINICAL TRIALS
power transformation
See TRANSFORMATIONS
P-P plots
See PROBABILITY PLOTS
prevalence The prevalence, or point prevalence, of a disease is the number of cases of a disease that exist at a specified point in time in a defined population. It is generally presented as a rate. Thus:
\[ \text{Prevalence} = \frac{\text{Number of cases of a disease at a particular point in time}}{\text{Number in the population at that point in time}} \]
This results in a number between 0 and 1, but for ease of presentation it is often expressed as a rate per 1000, per 100000 or per 1000000, depending on the disease rarity. As an example, the number of males living with colorectal cancer in Scotland on 31 December 2005 was reported as 9880, giving a prevalence rate of 401 per 100000 Scottish males (NHS National Services Scotland, Information Services Division: www.isdscotland.org). Care should be taken to distinguish between prevalence and INCIDENCE. Although the definitions appear similar at first sight, they are used for different purposes and it is essential to distinguish between them correctly. The prevalence of a disease clearly depends on the incidence and also on the duration of the disease. A disease with a high incidence rate, from which most sufferers die very quickly, will have a low prevalence at any point in time. Conversely, a chronic disease with a low incidence rate may have a high prevalence if the duration is long. If the incidence and average duration of the disease remain approximately constant over time the prevalence and incidence will be related by:
\[ \text{Prevalence} = \text{Incidence} \times \text{Duration} \]
Prevalence is of most use in determining the burden of chronic disease in the population and therefore is useful in allocating resources and planning healthcare services. It is of limited use in epidemiological studies of disease aetiology because it does not measure RISK (Woodward, 2004). Using prevalence to assess disease burden requires care when dealing with curable diseases. The prevalence rate quoted previously for colorectal cancer in Scottish males in 2005 (401 per 100000) includes cases first diagnosed up to 20 years previously. A more realistic assessment of current burden for males is obtained from the number of prevalent cases diagnosed within the previous year, which was 1463 (59 per 100000 Scottish males) on 31 December 2005 (NHS National Services Scotland, Information Services Division: www.isdscotland.org). Paradoxically, some important advances in treatment can lead to an increase in prevalence of a disease. For example, the introduction of insulin as a treatment for diabetes resulted in a reduction in the number of deaths and an increase in the prevalence of diabetes in the population. A more detailed discussion of prevalence and the relationship between prevalence, incidence and disease duration is given in Rothman, Greenland and Lash (2008). WHG
[See also INCIDENCE]
Rothman, K. J., Greenland, S. and Lash, T. L. 2008: Modern epidemiology, 3rd edition. Philadelphia: Lippincott Williams and Wilkins. Woodward, M. 2004: Epidemiology: study design and data analysis, 2nd edition. Boca Raton: Chapman & Hall.
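The two formulae in the prevalence entry can be illustrated with a short sketch. The function names and all input figures below are hypothetical and chosen only to make the arithmetic easy to follow. The steady-state relation is valid only under the entry's condition that incidence and average duration remain approximately constant.

```python
def prevalence_rate(cases, population, per=100_000):
    """Point prevalence expressed as a rate per `per` members of the population."""
    return cases / population * per


def steady_state_prevalence(incidence_rate, mean_duration_years):
    """Prevalence = Incidence x Duration, with incidence as an annual rate
    (per the same population unit) and duration in years."""
    return incidence_rate * mean_duration_years


if __name__ == "__main__":
    # Hypothetical population: 250 existing cases among 50 000 people.
    print(prevalence_rate(250, 50_000))     # rate per 100 000
    # Hypothetical diseases with the same annual incidence (50 per 100 000)
    # but very different average durations.
    print(steady_state_prevalence(50, 0.5))  # rapidly fatal: low prevalence
    print(steady_state_prevalence(50, 20))   # chronic: high prevalence
```

The last two calls show the point made in the entry: with equal incidence, the longer-lasting disease has the higher prevalence.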
primary endpoint
See ENDPOINT
principal component analysis
This is an exploratory technique, mainly used in partitioning the variation present in a quantitative multivariate dataset and in examining the data to highlight their important patterns or features. Suppose that p observations have been taken on each of n individuals and the values have been collected into a data matrix having n rows and p columns. For example, Jackson (1991, pp. 107-9) provides a dataset on hearing loss as assessed by audiometry for 100 males aged 39. Each subject had decibel loss measured at four frequencies for each of two ears, yielding a 100 x 8 data matrix with values between -10 and 99 in each cell. Other instances of multivariate medical datasets might arise when screening patients, e.g. when a variety of measurements are routinely taken on all individuals signing up for a health centre, or in disease characterisation, where measurements are taken on variables that should distinguish sufferers from healthy individuals.
One of the main stumbling blocks in trying to assimilate a multivariate data matrix is the fact that ASSOCIATIONS exist between the columns. For quantitative data, VARIANCE measures scatter while COVARIANCE or CORRELATION measures association. The table illustrates such associations for the audiometry data. The variables are denoted by ear (L/R) plus frequency (500, 1000, 2000, 4000 Hz); the values down the left-to-right diagonal give the variances of the variables; the entries below the diagonal give the covariances between pairs of variables; and the entries above the diagonal give the corresponding correlations.
In essence, principal component analysis simply transforms the p measured variables $x_1, x_2, \ldots, x_p$ into a set of linear combinations $y_1, y_2, \ldots, y_p$ (i.e. $y_i = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{ip}x_p$ for all i, where the coefficients $a_{ij}$ are suitable constants), which are mutually uncorrelated and which successively maximise the variance of such linear combinations. In other words, $y_1$ is the linear combination of the x that has the greatest possible sample variance among all linear combinations for the given dataset, $y_2$ is the linear combination of the x that has the next largest variance and so on. Technically, the variances of the y are given by the eigenvalues of the COVARIANCE MATRIX of the original x and the coefficients $a_{ij}$ are given by the elements of the corresponding eigenvectors. These quantities are obtainable from the raw data in all standard statistical software packages; the y are known as the principal components of the data.
Principal components are useful for multivariate data exploration in various ways. It is often the case that each $y_i$ can be interpreted by relating the size of the coefficients $a_{ij}$ to the variables that they multiply and hence a substantive meaning can be attached to the component. Since the components are arranged in decreasing order of variance, such interpretation will therefore identify the main sources of variation among the sample members. Second, by considering the variance of each component in relation to the total
principal component analysis Variances, covariances and correlations for audiometry data

          L500    L1000    L2000    L4000     R500    R1000    R2000    R4000
L500     41.07     0.78     0.40     0.26     0.70     0.64     0.24     0.20
L1000    37.73    57.32     0.54     0.27     0.55     0.71     0.36     0.22
L2000    28.13    44.44   119.70     0.42     0.24     0.45     0.70     0.33
L4000    32.10    40.83    91.21   384.78     0.18     0.26     0.32     0.71
R500     31.79    29.75    18.64    25.01    50.75     0.66     0.16     0.13
R1000    26.30    34.24    31.21    33.03    30.23    40.92     0.41     0.22
R2000    14.12    25.30    71.26    57.67    10.52    24.62    86.30     0.37
R4000    25.28    31.74    68.99   269.12    18.19    27.22    67.26   373.66
variance of all components, it is often apparent that just a few components account for most of the variability in the sample and the remaining components can be ignored as essentially representing the 'noise' in the system. A typical rule-of-thumb employed by practitioners is to retain only as many components as account for around 75% to 80% of the total variance.
Each sample individual's value (or score) is easily obtained on each principal component by applying the definition of the component to the original sample data. Taking the scores on the first two principal components and plotting the individuals in a SCATTERPLOT from these two sets of values then gives the best two-dimensional view of the data. This enables any OUTLIERS or groupings of individuals to be identified and other patterns in the data can also be readily discerned. Finally, since the principal components are uncorrelated, the scores on the y can form useful input to further statistical analysis.
However, forming a linear combination of the x only really makes sense if they are all similar entities. Thus, if they are all different (e.g. a height, a weight and a count, say) or if they are similar but of very varying magnitudes (e.g. height of individual, length of leg, length of arm and head circumference) then it is preferable to standardise the variables before analysis. In this case, the correlation rather than the covariance matrix should form the basis of the calculations. It is important to be aware that analysis of the standardised data gives different results from analysis of the unstandardised data, so careful consideration is needed at the outset to decide which analysis is the more appropriate.
To illustrate these ideas, consider a principal component analysis of the audiometry data. First, although all units of the eight variables are the same (decibels), it is evident from the diagonal elements of the table that the variances of the two highest frequencies (384.78, 373.66) are nearly 10 times the size of those of some lower frequencies (e.g. 40.92, 41.07). Hence standardisation is warranted and the analysis should be conducted on the correlation matrix. The eigenvalues of this matrix, i.e. the variances of the principal components, are found to be 3.93, 1.62, 0.98, 0.47, 0.34, 0.31, 0.20 and 0.15. The first four components have a combined variance of 7.0, which is 87.5% of the overall variance of 8.0, so we can effectively replace the eight original variables by the first four principal components. How might we interpret these components? The first one is:
\[ y_1 = 0.40x_1 + 0.42x_2 + 0.37x_3 + 0.28x_4 + 0.34x_5 + 0.41x_6 + 0.31x_7 + 0.25x_8 \]
where the numbering is from L500 for $x_1$ through to R4000 for $x_8$. The coefficients are all positive and of approximately the same size, so the component is approximately proportional to the sum of the x. This represents the overall hearing level of an individual and implies that individuals who suffer loss at some frequencies are likely to suffer loss at the other frequencies also. The main source of variation among the sample is thus in terms of the individual's overall hearing level. The second component is:
\[ y_2 = -0.32x_1 - 0.23x_2 + 0.24x_3 + 0.47x_4 - 0.39x_5 - 0.23x_6 + 0.32x_7 + 0.51x_8 \]
The coefficients are similar between the left and right ear for each frequency but now show a 'contrast': negative coefficients attached to low frequencies (500 and 1000 Hz), positive coefficients attached to high frequencies (2000 and 4000 Hz). It is known that hearing lost as individuals age is first noticeable at high frequencies, so the second most important source of variation among sample members is in terms of this form of hearing loss. The third component is:
\[ y_3 = 0.16x_1 - 0.05x_2 - 0.47x_3 + 0.43x_4 + 0.26x_5 - 0.03x_6 - 0.56x_7 + 0.43x_8 \]
Since 'small' coefficients correspond to 'unimportant' variables, we can conclude that the third most important source of variation among sample members is a contrast between just the two higher frequencies. Finally, the fourth component has negative coefficients for the first four variables and positive coefficients for the other four variables and is therefore a contrast between left and right ears. Thus we have been able to characterise the main sources of variability among sample members. A scatterplot of the scores on the first two components reveals some potential outliers in the sample, which should be checked for aberrant values and possibly removed from further analysis, but no other structure of interest.
In addition to its role in multivariate exploration and description, principal component analysis is often also used either as a prelude to, or in conjunction with, a range of other techniques such as variable selection or orthogonal regression. (See Jackson, 1991, and Jolliffe, 2002.) WK
[See also CORRESPONDENCE ANALYSIS, FACTOR ANALYSIS]
Jackson, J. E. 1991: A user's guide to principal components. New York: John Wiley & Sons, Inc. Jolliffe, I. T. 2002: Principal component analysis, 2nd edition. New York: Springer.
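The calculation described in the principal component analysis entry can be sketched for the first component only. The sketch standardises the tabulated audiometry covariance matrix to a correlation matrix and then finds the largest eigenvalue and its eigenvector by power iteration. Power iteration is used here purely to keep the example free of external libraries; standard statistical software would compute the full eigendecomposition. The function names are hypothetical and the data are transcribed from the lower triangle of the table.

```python
import math

LABELS = ["L500", "L1000", "L2000", "L4000", "R500", "R1000", "R2000", "R4000"]

# Lower triangle of the audiometry table: variances on the diagonal,
# covariances below it. The L2000-R500 covariance is entered as 18.64,
# the value consistent with the tabulated correlation of 0.24.
LOWER = [
    [41.07],
    [37.73, 57.32],
    [28.13, 44.44, 119.70],
    [32.10, 40.83, 91.21, 384.78],
    [31.79, 29.75, 18.64, 25.01, 50.75],
    [26.30, 34.24, 31.21, 33.03, 30.23, 40.92],
    [14.12, 25.30, 71.26, 57.67, 10.52, 24.62, 86.30],
    [25.28, 31.74, 68.99, 269.12, 18.19, 27.22, 67.26, 373.66],
]


def correlation_matrix(lower):
    """Expand a lower-triangular covariance table and standardise it."""
    p = len(lower)
    cov = [[lower[max(i, j)][min(i, j)] for j in range(p)] for i in range(p)]
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


def leading_component(matrix, iterations=500):
    """Largest eigenvalue and unit-length eigenvector by power iteration."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * mv[i] for i in range(p))  # Rayleigh quotient
    return eigenvalue, v


if __name__ == "__main__":
    corr = correlation_matrix(LOWER)
    variance, coefficients = leading_component(corr)
    print("variance of first component:", round(variance, 2))
    print("share of total variance:", round(variance / len(corr), 2))
    for label, a in zip(LABELS, coefficients):
        print(label, round(a, 2))
```

Because the correlation matrix has ones on its diagonal, the total variance equals the number of variables, so the share of variance explained is simply the eigenvalue divided by eight.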
principal stratification This is a method for the estimation of treatment effects in randomised CLINICAL TRIALS adjusting for an intermediate outcome that is not the primary ENDPOINT. It involves classifying subjects into classes that are defined by their joint potential responses of the intermediate variable to all possible random allocations. These classes are known as principal strata, which have the property that they are independent of treatment allocation and can be handled in the analysis in an analogous way to pre-randomisation variables. Frangakis and Rubin (2002) introduced the concept of principal effects, which compare treatments within principal strata (within-class or stratum-specific INTENTION-TO-TREAT (ITT) effects). The principal stratification method allows for hidden confounding (i.e. selection effects) between intermediate and final outcomes in the evaluation of treatment-effect mediation and post-randomisation sources of treatment-effect heterogeneity (see ADJUSTMENT FOR NONCOMPLIANCE IN RANDOMISED CONTROLLED TRIALS and DIRECT AND INDIRECT EFFECTS).
We now illustrate principal stratification in the context of assessing direct and indirect effects of RANDOMISATION on an outcome. Consider a binary randomisation variable ($z_i = 0, 1$) for subject i and a binary intermediate variable ($M_i = 0, 1$). We define $M_i(z)$ to be the potential values of the intermediate variable after random allocation, such that $M_i(0)$ is the value of the intermediate outcome if randomised to the control condition ($z_i = 0$) and $M_i(1)$ is the value if randomised to treatment ($z_i = 1$). These are referred to as potential values as only one of these values can be observed in each subject, depending on the value of $z_i$. The table shows there are four distinct classes for the joint combination of $M_i(1)$ and $M_i(0)$. These classes are the principal strata. Note that a subject's stratum membership is not usually known: e.g. a subject with $z_i = 1$ and $M_i = 1$ is only known to belong to class 2 or 4. In general, the stratum-specific average treatment effects $\tau_1$, $\tau_2$, $\tau_3$ and $\tau_4$ may differ. The direct effects of treatment are measured by $\tau_1$ and $\tau_2$, since in these classes there is no change in the value of the intermediate variable between random allocations. Since class membership is independent of treatment allocation, the intention-to-treat effect for all subjects, $\tau$, is the weighted average of the effects within strata, with weights $\pi_j$ equal to the proportion of subjects in stratum j:
\[ \tau = \sum_{j=1}^{4} \pi_j \tau_j \]
principal stratification The four possible principal strata with a binary mediator (M) and binary randomisation (Z), the proportion of the sample in each stratum and the stratum-specific treatment effect

Class (stratum)   M_i(0)   M_i(1)   (M_i(0), M_i(1))   Proportion of subjects   Treatment effect
1                 0        0        (0, 0)             pi_1                     tau_1
2                 1        1        (1, 1)             pi_2                     tau_2
3                 1        0        (1, 0)             pi_3                     tau_3
4                 0        1        (0, 1)             pi_4                     tau_4
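The structure of the four principal strata can be made concrete with a small sketch. It encodes the table of potential values, lists which strata are compatible with an observed allocation and intermediate value, and evaluates the ITT effect as the proportion-weighted average of the stratum-specific effects. The function names, proportions and effects used are hypothetical illustrations, not estimates from any trial.

```python
# Class -> (M(0), M(1)), following the table of principal strata.
STRATA = {1: (0, 0), 2: (1, 1), 3: (1, 0), 4: (0, 1)}


def compatible_strata(z, m):
    """Classes a subject could belong to, given allocation z and observed M = m."""
    return sorted(c for c, potential in STRATA.items() if potential[z] == m)


def itt_effect(proportions, effects):
    """Overall ITT effect: sum over strata of proportion x stratum-specific effect."""
    if set(proportions) != set(effects):
        raise ValueError("proportions and effects must cover the same strata")
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("stratum proportions must sum to one")
    return sum(proportions[c] * effects[c] for c in proportions)


if __name__ == "__main__":
    # A subject allocated to treatment (z = 1) with M = 1 is in class 2 or 4.
    print(compatible_strata(1, 1))
    # Hypothetical proportions and effects; classes 1 and 2 carry no direct effect.
    pi = {1: 0.5, 2: 0.2, 3: 0.1, 4: 0.2}
    tau = {1: 0.0, 2: 0.0, 3: -1.0, 4: 2.0}
    print(round(itt_effect(pi, tau), 3))
```

The first function shows why stratum membership is latent: every observed combination of allocation and intermediate value is compatible with two classes.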
We now consider how the treatment effects $\tau_1$, $\tau_2$, $\tau_3$ and $\tau_4$ may be estimated. Within each principal stratum c (c = 1 to 4), and given a set of k measured covariates, X, we may have a model for the outcome
\[ Y = \sum_{k} \varphi_k X_k + \tau_c z + \varepsilon \]
Superficially, it looks straightforward to estimate $\tau_c$, since it is an effect of randomisation, but, of course, we typically do not know to which stratum each subject belongs since the class, c, is frequently not identified. In order to identify this, we require baseline covariates that are predictors of stratum membership c. For estimation using principal stratification, we would construct a regression (latent class) model to predict stratum membership using baseline covariates, X. We would then simultaneously model the ITT effects on outcome within the principal strata. The estimation proceeds by specifying a full probability model using MAXIMUM LIKELIHOOD ESTIMATION. Alternatively, a fully Bayesian approach (see BAYESIAN METHODS) to estimation could be used. If we have missing outcome data (see MISSING DATA) we can also simultaneously fit a third model predicting missing outcomes, based on the assumptions of latent ignorability (see Frangakis and Rubin, 1999).
We now illustrate these concepts with some more examples where principal stratification is commonly applied. Consider a simple example where we have random allocation of subjects to receive either treatment or no treatment (control). Those allocated to the control group cannot get access to the treatment (there is no contamination), but those subjects in the treatment group can fail to receive the treatment, and so end up in the control group. In this case we can define two principal strata: compliers and noncompliers. The first principal stratum of compliers are those subjects who are treated if they are allocated to the treatment group, and not treated if allocated to the control group. The second principal stratum is the noncompliers who will never receive the treatment, regardless of their individual random allocation. It is possible to identify these two classes in those subjects who are allocated to treatment, but they remain hidden (latent) in the control group. The ITT effect within the principal stratum of compliers is known as the COMPLIER AVERAGE CAUSAL EFFECT, which is a special case of a principal effect (see ADJUSTMENT FOR NONCOMPLIANCE IN RANDOMISED CONTROLLED TRIALS). It can be estimated as described above or using INSTRUMENTAL VARIABLES methods.
Frangakis and Rubin (2002) introduced the concept of principal surrogacy to apply principal stratification to analysing surrogate or biomarker outcomes (see SURROGATE ENDPOINTS). The authors provide an example with random allocation to standard treatment (z = 0) or a new treatment (z = 1) where the outcome was survival time and the putative surrogate M is a measure of CD4 count at two months (1 = high and 0 = low). The table again illustrates the four basic principal strata: class 1 is those subjects who will have a low CD4 count unaffected by treatment; class 2 is those subjects who will have a high CD4 count regardless of treatment; class 3 is those subjects who have a higher CD4 count under the standard treatment than the new treatment; and class 4 is those subjects who have a lower CD4 count under the standard treatment than the new treatment. CD4 count is then defined as a principal surrogate for survival time if the group of subjects with no ITT effect on CD4 count also have no ITT effect on survival time. Mathematically, these are the groups in which $M_i(1) = M_i(0)$, which would be classes 1 and 2 in the table, and we have then identified a principal surrogate if $\tau_1 = \tau_2 = 0$. Principal surrogates allow estimation of true causal effects, as opposed to statistical surrogates (Prentice, 1989), because they avoid post-randomisation SELECTION BIAS (unmeasured confounding), which may be present between the surrogate and true outcomes.
Although the basic idea of principal stratification is the estimation of ITT effects within principal strata, it is possible to fit alternative explanatory models nested within principal strata. For example, typically we are interested in a univariate response, but we could investigate the advantages of simultaneously estimating effects for two or more different outcomes (i.e. multivariate responses). In the context of treatment compliance, Jo and Muthén (2002) have investigated the use of latent growth curve or trajectory models for longitudinal outcome data.
Recently there has been an extended use of principal stratification in assessing mediation of treatment effects by intermediate variables (see Jo, 2008; Gallop et al., 2009; Emsley, Dunn and White, 2010). The latter authors define principal strata according to measures of therapeutic alliance in a trial with random allocation to either treatment or control (no treatment), and where those allocated to the control group cannot get access to treatment. Therapeutic alliance is only measured in the treatment group. The strata are defined as being: principal stratum 1 is the low alliance stratum, where subjects would have a low therapeutic alliance when allocated to the treatment group, and are not treated when allocated to control; and principal stratum 2 is the high alliance stratum, where subjects would have a high therapeutic alliance when allocated to the treatment group, and are not treated when allocated to the control group. As previously, it is possible to identify these two classes in those allocated to the treatment group but they remain hidden in the control group. The authors then examine ITT effects and explanatory models for a dose-response relationship within the principal strata.
Principal stratification is most commonly used with binary treatment and binary intermediate variables. However, the framework can be extended to settings with multivariate, time-dependent and continuous intermediate variables, and to multiple random allocations (Frangakis and Rubin, 2002). RE/GD
Emsley, R. A., Dunn, G. and White, I. R. 2010: Mediation and moderation of treatment effects in randomised controlled trials of complex interventions. Statistical Methods in Medical Research (in press). Frangakis, C. E. and Rubin, D. B. 1999: Addressing complications of intention-to-treat analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika 86, 2, 365-79. Frangakis, C. E. and Rubin, D. B. 2002: Principal stratification in causal inference. Biometrics 58, 1, 21-9. Gallop, R., Small, D. S., Lin, J. Y., Elliott, M. R., Joffe, M. and Ten Have, T. R. 2009: Mediation analysis with principal stratification. Statistics in Medicine 28, 7, 1108-30. Jo, B. 2008: Causal inference in randomized experiments with mediational processes. Psychological Methods 13, 4, 314-36. Jo, B. and Muthén, B. O. 2002: Longitudinal studies with intervention and noncompliance: estimation of causal effects in growth mixture modeling. In Duan, N. and Reise, S. (eds), Multilevel modeling: methodological advances, issues, and applications. Lawrence Erlbaum Associates, pp. 112-39. Prentice, R. L. 1989: Surrogate endpoints in clinical trials - definition and operational criteria. Statistics in Medicine 8, 4, 431-40.
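For the compliance example in the principal stratification entry, the instrumental variables route to the complier average causal effect can be sketched as a simple ratio. This is a minimal illustration under stated assumptions, not the latent class estimation described in the entry: it assumes randomised allocation, no access to treatment in the control group, and no effect of allocation on the outcomes of noncompliers. Under those assumptions the ITT effect divided by the proportion of compliers estimates the effect in compliers. The function name and the data are hypothetical.

```python
from statistics import mean


def cace_ratio_estimate(y_treatment_arm, received_treatment, y_control_arm):
    """ITT effect divided by the proportion of compliers in the treatment arm.

    received_treatment holds 1 for subjects in the treatment arm who actually
    received treatment (compliers) and 0 for those who did not (noncompliers).
    """
    itt = mean(y_treatment_arm) - mean(y_control_arm)
    proportion_compliers = mean(received_treatment)
    if proportion_compliers == 0:
        raise ValueError("no compliers observed; the ratio is undefined")
    return itt / proportion_compliers


if __name__ == "__main__":
    # Hypothetical outcomes: half of the treatment arm received treatment.
    y_treat = [10, 12, 6, 4]
    received = [1, 1, 0, 0]
    y_control = [6, 6, 4, 4]
    print(cace_ratio_estimate(y_treat, received, y_control))
```

Because only half of the treatment arm complied in this made-up example, the estimated effect in compliers is twice the ITT effect.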
prior distribution This is a PROBABILITY DISTRIBUTION that represents the information, or beliefs, associated with a parameter of interest before data are collected or an experiment conducted. The prior distribution is an essential component of a Bayesian analysis (see BAYESIAN METHODS) and differentiates it from traditional, so-called frequentist, analysis. The prior distribution is used together with the LIKELIHOOD function, which represents the information about the parameter contained in the experimental data, to derive the posterior distribution from BAYES' THEOREM. Formally, a Bayesian analysis proceeds from the following formula:
\[ p(\theta \mid \text{Data}) \propto p(\text{Data} \mid \theta)\, p(\theta) \]
in which $p(\theta)$ is the prior distribution expressing initial beliefs in the parameter of interest, $\theta$, $p(\theta \mid \text{Data})$ is the corresponding POSTERIOR DISTRIBUTION of beliefs and $p(\text{Data} \mid \theta)$ is the likelihood.
There are many types of prior distribution and many ways to derive them. Prior distributions can be based on information available in the literature (historical data) and quantified through a formal process of meta-analysis (see SYSTEMATIC REVIEWS AND META-ANALYSIS). In the pharmaceutical industry, prior distributions about potential treatment effects can be determined from existing clinical databases of similar drugs from the same class of compounds. Where existing data are not available, prior distributions can be elicited from experts by a process of questioning and refinement. Formalistic prior distributions can be determined to represent cases in which there is little or no relevant prior information. Such prior distributions are normally called noninformative and a Bayesian analysis with a noninformative prior will usually give very similar conclusions to those obtained using a frequentist analysis. However, the output and presentation of the analysis will be very different from standard approaches and is often found to be more intuitive and helpful by nonstatisticians. Finally, a set of prior distributions may also be used in order that a range of assumptions may be tested against their subsequent conclusions. For example, a community of priors could consist, in addition to a neutral uninformative prior, of priors that represent both 'pessimistic' and 'optimistic' scenarios. A 'pessimistic' prior would be one in which, for example, there would be considerable doubt about the effectiveness of a treatment; an 'optimistic' prior, in contrast, would represent strong prior belief that the treatment is effective.
The use of prior distributions in a Bayesian analysis is not without controversy. From one perspective, their use is a strength in that it allows the scientist to access more information and thereby produce stronger inferences. This strength, however, can also be regarded as a weakness and gives rise to any number of questions. First, if the prior is based on historical data, is it relevant to the current investigation? If it is elicited from experts, to what extent are the resulting inferences subjective and hence of less credibility than inferences based on data alone? In practice, the influence of the prior distribution may be minimal if the information contained in experimental data outweighs that contained in the prior, which will generally be the case if the experimental study has a large sample size. AG
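The idea of a community of priors, and the diminishing influence of the prior as the sample grows, can be sketched with a conjugate example. The sketch assumes a binary outcome with a binomial likelihood and Beta prior distributions, for which the posterior is again a Beta distribution. The three priors, the function names and the data are hypothetical choices for illustration only.

```python
def posterior_beta(prior_a, prior_b, successes, n):
    """Beta(a, b) prior combined with s successes in n trials gives Beta(a + s, b + n - s)."""
    return prior_a + successes, prior_b + (n - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical community of priors for a response probability.
COMMUNITY = {
    "neutral": (1.0, 1.0),      # flat, uninformative
    "pessimistic": (2.0, 8.0),  # prior mean 0.2: doubt that many will respond
    "optimistic": (8.0, 2.0),   # prior mean 0.8: strong belief in effectiveness
}


def posterior_means(successes, n, priors=COMMUNITY):
    """Posterior mean of the response probability under each prior in the community."""
    return {
        name: beta_mean(*posterior_beta(a, b, successes, n))
        for name, (a, b) in priors.items()
    }


if __name__ == "__main__":
    for successes, n in [(3, 10), (3000, 10000)]:  # same observed proportion, 0.3
        means = posterior_means(successes, n)
        print(n, {name: round(value, 3) for name, value in means.items()})
```

With ten observations the three posterior means differ noticeably; with ten thousand they nearly coincide, which mirrors the entry's remark about large sample sizes.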
probability
The notion of probability has two connotations: (1) as a mathematical discipline concerning the study of uncertainty and (2) as a numerical scale from zero to one to describe the frequency of occurrence of, or degree-of-belief in, a given event. The first definition is akin to statistics as a discipline, with the broad distinction between the two being that whereas probability seeks to learn about the sample given characteristics of the population, statistics seeks to learn about the population from given characteristics of the sample. Their interrelationship is such that the theory of probability is the backbone of statistics. However, the second connotation of probability will be our focus here. Any chance event happens according to some numerical measure between zero and one, with these extremes representing the probability of impossible and certain events respectively. In practice, most probabilities lie between these extremes (see the first figure). Sometimes, probability is expressed on an entirely equivalent 0-100% scale, in which case it is conventionally known as chance.
[Figure: probability scale marked at 0 (0%, impossible), 0.05 (5%, a small chance), 0.5 (50%, equally likely to occur as not), 0.95 (95%, a large chance) and 1 (100%, certain)]
probability Probability quantifies the scale of uncertainty. Note that probabilities of 0.05 and 0.95 are simply convenient round numbers towards the ends of the scale, with 'small' and 'large' being just (nonstandard) descriptive labels
Usual notation to describe the probability of an event is to write the event in brackets or parentheses, namely {...}, [...] or (...), immediately after an abbreviation of probability to P, Pr or Prob, with no consensus about capitalising the initial letter. Thus:
Pr{a male is pregnant} = 0
P[a coin lands heads uppermost] = 0.5
prob(baby is either male or female) = 1
are just some of the many ways that simple probability statements can be written. For the sake of brevity, the event being considered is often reduced to a single letter, so, for example, when considering the coin-tossing scenario, one might say Pr{H} = Pr{T} = 0.5, where H and T are shorthand for landing heads and tails respectively, it being understood from the context that only one toss of a fair coin at a time is being considered. (Note that a generalisation to describe chances of sequences of events when tossing a coin three times, say, could be Pr{HHH} or Pr{2H, 1T}, although one has to be mindful to distinguish whether one means in the latter case the ordered sequence H, H, then T or any sequence with two Hs and one T.)
It is important to know there are two ways of interpreting probability, as follows: (1) long-run average proportion or 'frequentist' view and (2) degree-of-belief or 'subjective' view. The first type applies strictly only to those events that are repeatable, under assumed identical conditions, such as idealised coin tossing or die rolling, whereas the second applies universally, even to one-off events. Arguably, in medicine, all events are one-offs as they refer to individual patients with their own unique set of symptoms and treatments, genetic and environmental backgrounds and occur in particular places and times. Incidentally, statisticians need reminding from time to time there is no such thing as an average patient! (See CONSULTING A STATISTICIAN.) Largely due to mathematical convenience and history (see HISTORY OF MEDICAL STATISTICS), it is the frequentist definition that is by far the more commonly used in statistics, dominating almost all courses, textbooks and software. However, there is a growing tendency among biostatisticians nowadays to favour the subjective formulation of probability, being the cornerstone of the Bayesian approach (see BAYESIAN METHODS).
While same may criticise this approach ror beinltoo subjective andsomchow less scientific than the fl'aluentist approadJ. others que lhatthe ability to tailor n:sulas to one's own beliefs is actually advantageous and especially beneficial for applicalions in medical research. AI least the two philosophical approaches do not lead to conftidirq; results. even though the subjective approach is undoubtedly marc intuitively appeali~ and mon: obViously applicable. II is helpeulto bear in mind baIh definitions when
_________________________________________________________ thinking about probabilities and their inlclpR:talion. To iIIUS1nlIe. one can say Pr{ H, = 0.5 eil/wr becMlse the IDnInmndioornumberofheadstonumbcrorlOacsappuachcsthe wluc one-halras the number of tosses incmues indefinitely 01' because in a single lOss you n:ckon it is 50:50, being equally likely to raU tails as heads. Thus. a slalcmcnl such as a PCI'llDn'S blood group is type OJ =0.47 can, and docs, mean both that in a large random sample or the papuJaIion we CaD cxpcct 47 inevel)' lOOpc:oplc 10 have bload groupOanci that. in
Pre
the absence of further information, one's degree of belief about an individual patient being type O is 0.47.

How are probabilities assigned numerically? In simple situations, they can be enumerated by appealing to the somewhat circular notion of 'equally likely' outcomes. Thus, for example, a standard die of six sides is equally likely to land with any side facing upwards, hence the deduction Pr{getting a 6 on a single roll} = 1/6. This line of reasoning is unrealistic in medical applications, however. In practice, we assign probabilities by gathering (preferably large amounts of) data from random samples, in order to compute the relative frequencies. For instance, it may be that, based on a large, national, randomly selected cohort, the proportions of blood types O, A, B and AB are, respectively, 0.47, 0.42, 0.08 and 0.03, leading to the earlier pair of statements about type O blood group. Notice in this simple example that the proportions representing the probabilities sum to one. This is no surprise given that the list of ABO types was exhaustive.

A PROBABILITY DISTRIBUTION more generally describes the theoretical distribution of the total probability of one. These can be either for discrete values (some classic examples being BINOMIAL, POISSON and GEOMETRIC DISTRIBUTIONS), for which the respective formulae for the probability mass function sum to one, or else continuous distributions (e.g. NORMAL, UNIFORM and CHI-SQUARE DISTRIBUTIONS), for which the area under the probability distribution curve integrates to one. A HISTOGRAM can be thought of as a pictorial representation of a probability distribution. The larger the sample on which it is based, the closer the histogram's shape approximates the underlying population probability distribution.

Next consider the combining of probabilities to quantify chances of more complex interactions of events. There are two rules that apply for special types of events that are said to be 'mutually exclusive' or 'independent' respectively, as follows:

1. Addition rule. The probability that any of two or more mutually exclusive events occurs is given by the sum of their individual probabilities. For example:

Pr{person is type B or type AB} = Pr{person is type B} + Pr{person is type AB} = 0.08 + 0.03 = 0.11
Mutual exclusivity, or incompatibility, of events simply means that if one event occurs, then other events are precluded (e.g. if someone is of blood type B, they cannot also be simultaneously of type AB).

2. Multiplication rule. The probability that two (or more) independent events both (or all) occur together is given by the product of their individual probabilities. For example:

Pr{two unrelated people are both type A} = Pr{first is type A} × Pr{second is type A} = 0.42 × 0.42 = 0.176
Independence of events means the outcome of one has no bearing on the other(s), such as here the blood types of unrelated individuals. Note that the concept of independence or dependence of events plays a fundamental role in EPIDEMIOLOGY. In CASE-CONTROL STUDIES, patients with a disease (the 'cases') are compared with disease-free individuals (the 'controls') with respect to some possible risk factor. For instance, men with testicular cancer might be compared with controls regarding, say, milk consumption during adolescence. If more cases than controls were 'exposed to the risk' of high milk consumption, then the difference in the probabilities of exposure might suggest a possible causative link. In other words, the events {having the disease} and {having had the exposure} would not be independent events. Thus, an observational study can be thought of in probabilistic terms as seeking to demonstrate whether these two events are dependent or independent of one another. If deemed to be dependent, there is an ASSOCIATION (note, not the same as causation; see CAUSALITY) linking the exposure to the disease, a matter of central importance.

To explore this further, one needs to introduce the concept of CONDITIONAL PROBABILITY. In brief, probabilities of events can change in the light of unfolding information, in particular whether or not another event is known to have occurred. To illustrate this, consider that the probability someone has tuberculosis is no longer the same if it is known that their skin test for tuberculosis was positive. Letting D denote the event {person has disease} and T+ denote the event {skin test is positive}, interest focuses on the probability of {D given T+}, which is written Pr{D|T+} for short. The '|' sign is read as 'given', meaning events written after it are already known to have occurred. Now Pr{D|T+} is not equal to Pr{D}, since the skin test result, obviously, is not independent of the presence of the disease. Equally, no test is 100% accurate, for if it were in this example, knowledge of the skin test result would be equivalent to knowing the tuberculosis status, but in reality no test is this reliable (see POSITIVE PREDICTIVE VALUE).
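The addition and multiplication rules described in this entry can be checked numerically. The following is a minimal Python sketch using the blood-group proportions quoted in the text; the function names are illustrative choices, not part of the original entry.

```python
# Blood-group proportions quoted in the text (ABO system).
blood_type_prob = {"O": 0.47, "A": 0.42, "B": 0.08, "AB": 0.03}


def prob_any(types):
    """Addition rule: probability that one person has any of the given
    (mutually exclusive) blood types."""
    return sum(blood_type_prob[t] for t in types)


def prob_all_independent(types):
    """Multiplication rule: probability that unrelated (independent)
    people have the given blood types, one type per person."""
    result = 1.0
    for t in types:
        result *= blood_type_prob[t]
    return result


total = sum(blood_type_prob.values())          # exhaustive list of types
p_b_or_ab = prob_any(["B", "AB"])              # 0.08 + 0.03
p_both_a = prob_all_independent(["A", "A"])    # 0.42 * 0.42

print(round(total, 10), round(p_b_or_ab, 2), round(p_both_a, 3))
```

The addition rule is only valid here because one person cannot have two ABO types at once, and the multiplication rule only because the two people are assumed unrelated.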
In general, the conditional probability of any one event A given another (nonimpossible) event B is defined as Pr{A|B} = Pr{A and B}/Pr{B}, where '/' represents division. If it is the case that A and B are independent, then the multiplication rule says that the numerator Pr{A and B} = Pr{A} × Pr{B} and so it follows that Pr{A|B} = Pr{A}. Indeed, this is an alternative definition of independence of events, namely that the conditional probability of an event, A, given another, B, is the same as the unconditional probability of event A. (See the related and important result known as BAYES' THEOREM.)

An application of probability is in the use of the summary measure odds ratio (OR) or relative risk (RR) (see RELATIVE RISK AND ODDS RATIOS). Odds is simply an alternative way of expressing probability, adopted and favoured by bookmakers perhaps to make it less obvious that their total probabilities within a given horse race, say, sum to more than one in order to guarantee themselves a long-run profit margin. If a probability of an event is denoted by p, then the odds for the same event equal p/(1 − p). The odds of an event having probability 1/2 are quoted as evens. Probabilities below one-half are quoted as the number of chances of failing to the number of chances of occurring, for a suitable total number of chances to avoid fractions. For example, a probability 1/8 is the same as odds 7 to 1; probability 2/5 is odds 3 to 2; probability 1/21 is odds 20 to 1. Probabilities of events greater than one-half are expressed with the larger number first and then 'on' (instead of 'against', the default assumption): e.g. odds of '2 to 1 on' means a probability of 2/3, whereas 2 to 1 against is the complementary probability of 1/3. Odds are virtually never used in health applications by themselves, but do occur in the context of odds ratios when it is convenient, especially when analysing case-control studies, to have a comparative measure of relative probabilities across two groups, typically cases versus controls. When using LOGISTIC REGRESSION models also, computer output usually displays odds ratios and their CONFIDENCE INTERVALS in association with categorical variables.

The RR is defined as the ratio of two conditional probabilities:

RR = Pr{disease|exposed}/Pr{disease|not exposed}

If the disease status and exposure to a risk factor are independent, as when there is no causal link between them, then RR = Pr{disease}/Pr{disease} = 1, whereas if exposure and disease are dependent the RR will not equal 1.

Finally, there are two further rules for handling probability one can consider. The first of these, referring to any events A and B, regardless of whether independent or not, mutually exclusive or not, says:

Pr{A or B} = Pr{A} + Pr{B} − Pr{A and B}

This rule is best visualised with a Venn diagram showing overlapping events, A and B (see the second figure). Notice that it reduces to the addition law just given if A and B happen to be mutually exclusive, for then Pr{A and B} = 0, being impossible to occur together, so that a Venn diagram representation would show no overlap between sets representing A and B.

[Figure: probability. Venn diagram depicting two general events A and B among all those possible within sample space S. The general rule says that the probability of lying within the region covered by A or B is that of the two individual regions minus the double-counted shaded region where A and B overlap.]

The second general rule for combining probabilities is as follows:

Pr{A and B} = Pr{A} × Pr{B|A}

Note, by symmetry, that this can equivalently be stated as:

Pr{A and B} = Pr{B} × Pr{A|B}

Examples of the application of this rule are found in SCREENING STUDIES, where it is useful to consider SENSITIVITY and SPECIFICITY as well as FALSE POSITIVE and TRUE NEGATIVE RATES.

Other common uses of probability abound, for instance in the summary measures of incidence and prevalence, survival rates within life tables and applications in GENETIC EPIDEMIOLOGY, to mention but a few. The most common sighting of probability in medical journals, for better or worse, is due to the preference to summarise evidence from studies in terms of P-VALUES, measures of probability used to assess plausibility of the NULL HYPOTHESIS of chance variation. For further reading at a recreational level, Everitt (2008) provides an accessible account of the role of probability including health and research applications, while more formal accounts can be found in just about any textbook of medical statistics. CRP
Everitt, B. S. 2008: Chance rules: an informal guide to probability, risk and statistics, 2nd edition. New York: Springer.
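The conversions between probabilities and odds given in this entry can be reproduced with exact fractions. The sketch below is illustrative only: the helper names are assumptions introduced here, and the probabilities are the worked examples from the text.

```python
from fractions import Fraction


def odds_for(p):
    """Odds in favour of an event with probability p, i.e. p / (1 - p)."""
    p = Fraction(p)
    return p / (1 - p)


def odds_against(p):
    """Bookmaker-style odds against: chances of failing to chances of occurring."""
    p = Fraction(p)
    return (1 - p) / p


def prob_from_odds_against(failing, occurring):
    """Probability implied by quoted odds of 'failing to occurring' against."""
    return Fraction(occurring, failing + occurring)


def relative_risk(p_disease_exposed, p_disease_unexposed):
    """RR = Pr{disease|exposed} / Pr{disease|not exposed}."""
    return Fraction(p_disease_exposed) / Fraction(p_disease_unexposed)


evens = odds_for(Fraction(1, 2))               # probability 1/2
seven_to_one = odds_against(Fraction(1, 8))    # probability 1/8
three_to_two = odds_against(Fraction(2, 5))    # probability 2/5
twenty_to_one = odds_against(Fraction(1, 21))  # probability 1/21
two_to_one_on = odds_for(Fraction(2, 3))       # '2 to 1 on'
two_to_one_against = prob_from_odds_against(2, 1)
```

When disease and exposure are independent, the two conditional probabilities passed to `relative_risk` are equal and the function returns exactly 1.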
probability density function (PDF) See PROBABILITY DISTRIBUTION
probability distribution This is a statement of all the values that a variable can take, and the individual probabilities of obtaining those values. If outcomes are discrete (as in categorical variables, or variables that can only take integer numbers), we have two paired lists, one of outcomes and one of PROBABILITIES. For example, when interested in the number of heads achieved when tossing four fair coins, we have outcomes and probabilities:

Outcome (x)    Probability Pr(X = x)
0              0.0625
1              0.2500
2              0.3750
3              0.2500
4              0.0625

This list of probabilities defines our probability distribution. Note that the list of probabilities must sum to one. However, even with five possible outcomes such an approach is becoming cumbersome, and we prefer to relate the outcome and probability mathematically. For our example, we note that the outcome, x, and probability, Pr(X = x), are related as:

$$\Pr(X = x) = \frac{3}{2\,x!\,(4 - x)!}$$

where x! (x factorial) is the product of all the integers up to and including x, and 0! is defined to be one. We refer to Pr(X = x) as the probability mass function. To define our probability distribution, we can just provide the mass function (if one exists). Note that in our example the distribution is an example of a BINOMIAL DISTRIBUTION. Other discrete examples are the POISSON, GEOMETRIC and NEGATIVE BINOMIAL DISTRIBUTIONS.

If the outcomes are on a continuous scale, e.g. the length of a tumour, then there are an infinite number of possible outcomes. Since the total probability must be equal to one, then generally the probability of the outcome being exactly a particular value is 0. In this case, rather than defining a mass function, we define a probability density function (PDF). This is a curve, the area under which represents probabilities. The total area under the curve represents the total probability and therefore must be equal to one. In general, the area under the curve between any two values gives the probability that the outcome will lie between those two values. It is often the case that we will work with the distribution function (or cumulative distribution function), which gives the probability of being less than a particular value and so equates to the area under the curve to the left of a particular value.

As an example consider the success rate of a new type of operation. Suppose that it is equally likely to be anywhere from 0 (never successful) to 1 (always successful). Our probability density function will be a horizontal line (at y = 1 since the area under the curve must be equal to one) (see the figure). The probability that the operation is successful more than 80% of the time is the area under the curve between 0.8 and 1, which is equal to 0.2. Note that our example is an example of the BETA DISTRIBUTION. Other continuous probability distributions include the CHI-SQUARE, EXPONENTIAL, F, GAMMA, LOGNORMAL, t and (most importantly) NORMAL DISTRIBUTIONS. AGL

[Figure: probability distribution. The probability density function for the success rate of an operation, when all rates are equally likely. Horizontal axis: proportion of successes (0.0 to 1.0). Note that the area under the curve is equal to one and the area under the curve between 0.8 and 1 is 0.2.]

probability plots These are plots for comparing two PROBABILITY DISTRIBUTIONS or for assessing assumptions about the probability distribution of a sample of observations.
There are two types, the probability-probability plot (P-P plot) and the quantile-quantile plot (Q-Q plot). Each type is illustrated in the first figure. A plot of points whose coordinates are the cumulative probabilities $\{p_x(q), p_y(q)\}$ for different values of q is a probability-probability plot, whereas a plot of the points whose coordinates are the quantiles $\{q_x(p), q_y(p)\}$ for different values of p is a quantile-quantile plot. As an example, a quantile-quantile plot for investigating the assumption that a set of data is from a NORMAL DISTRIBUTION would involve plotting the ordered sample values $y_{(1)}, y_{(2)}, \ldots, y_{(n)}$ against the quantiles of a standard normal distribution, i.e. $\Phi^{-1}(p_i)$, where usually:

$$p_i = \frac{i - 1/3}{n + 1/3}, \qquad \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$

This is usually known as a normal probability plot. Two such plots are shown in the second figure: the first (a) is the probability plot for 100 points generated from a normal distribution and the second (b) is the corresponding plot for 100 points generated from an EXPONENTIAL DISTRIBUTION. Plot (a) is essentially linear, confirming that the sample is close to normally distributed. Plot (b) by contrast is clearly concave, indicating the presence of right SKEWNESS. KURTOSIS if present would show itself as an S-shaped plot. SSE

[Figure: probability plots. Illustration of P-P and Q-Q plots.]

[Figure: probability plots. Normal probability plots for (a) normal data and (b) exponential data, plotted against quantiles of the standard normal distribution.]
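The construction of a normal probability plot described in this entry can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions: the plotting positions follow the formula p_i = (i − 1/3)/(n + 1/3) as reconstructed from the entry, the random seed is arbitrary, and the function names are hypothetical. The code computes the plot coordinates only and leaves the drawing to whatever plotting tool is at hand.

```python
import random
from statistics import NormalDist


def plotting_positions(n):
    """p_i = (i - 1/3) / (n + 1/3) for i = 1..n (assumed convention)."""
    return [(i - 1 / 3) / (n + 1 / 3) for i in range(1, n + 1)]


def normal_qq_points(sample):
    """Pairs (standard normal quantile, ordered sample value)."""
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf(p)
                 for p in plotting_positions(len(ordered))]
    return list(zip(quantiles, ordered))


rng = random.Random(1)  # fixed seed so the sketch is repeatable
normal_sample = [rng.gauss(0, 1) for _ in range(100)]
exponential_sample = [rng.expovariate(1.0) for _ in range(100)]

qq_normal = normal_qq_points(normal_sample)            # roughly a straight line
qq_exponential = normal_qq_points(exponential_sample)  # curved: skewed data
```

Plotting the second coordinate of each pair against the first gives the normal probability plot; departures from a straight line point to non-normality.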
probit model A model for dichotomous or ordinal responses that can be defined either as a GENERALISED LINEAR MODEL or by using a latent response formulation. In a generalised linear model, the expectation $\mu_i$ of the response for unit i given covariates $\mathbf{x}_i = (x_{1i}, \ldots, x_{ki})'$ is modelled as $g(\mu_i) = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} = \mathbf{x}_i'\boldsymbol{\beta}$, where g(·) is a link function and $\boldsymbol{\beta}$ are regression parameters. In a binary probit model (for dichotomous responses), a probit link $\Phi^{-1}(\cdot)$ is specified as:

$$\Phi^{-1}(\mu_i) = \mathbf{x}_i'\boldsymbol{\beta} \quad \text{or} \quad \mu_i = \Phi(\mathbf{x}_i'\boldsymbol{\beta}) \qquad (1)$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function. The distribution of $y_i$ is then specified as a Bernoulli with MEAN $\mu_i$. Proportions out of a total of $n_i$ trials can be modelled using the same link function by specifying a BINOMIAL DISTRIBUTION with denominator $n_i$.

The same model can be formulated by specifying a linear regression model for an underlying latent (unobserved) continuous response $y_i^*$:

$$y_i^* = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i$$

with $\varepsilon_i$ having a standard NORMAL DISTRIBUTION. The observed dichotomous response $y_i$ then represents an indicator for $y_i^*$ exceeding zero:

$$y_i = \begin{cases} 0 & \text{if } y_i^* \le 0 \\ 1 & \text{if } y_i^* > 0 \end{cases} \qquad (2)$$

This latent response formulation is equivalent to a generalised linear model with a probit link and Bernoulli distribution:

$$\mu_i = E(y_i \mid \mathbf{x}_i) = \Pr(y_i = 1 \mid \mathbf{x}_i) = \Pr(y_i^* > 0 \mid \mathbf{x}_i) = \Pr(\mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i > 0) = \Pr(\varepsilon_i > -\mathbf{x}_i'\boldsymbol{\beta}) = \Pr(\varepsilon_i \le \mathbf{x}_i'\boldsymbol{\beta}) = \Phi(\mathbf{x}_i'\boldsymbol{\beta})$$

where the penultimate equality hinges on the symmetry of the normal density of $\varepsilon_i$.

Consider an ordinal response variable with S + 1 categories 0, ..., S. An ordinal probit model can be specified using the latent response formulation by extending the threshold model in (2) as follows:

$$y_i = \begin{cases} 0 & \text{if } -\infty < y_i^* \le \kappa_1 \\ 1 & \text{if } \kappa_1 < y_i^* \le \kappa_2 \\ \vdots & \\ S & \text{if } \kappa_S < y_i^* \le \infty \end{cases}$$

where $\kappa_s$, s = 1, ..., S are threshold parameters.

Binary and ordinal logit models can be defined in the same way as their probit counterparts by replacing the link in equation (1) by a logit link and the distribution in (2) by a logistic distribution. LOGIT MODELS are more popular than probit models because the regression parameters can be interpreted as log odds ratios and because the models can be used for CASE-CONTROL STUDIES. One advantage of probit models is that they can easily be extended to the multivariate case by using several latent responses having a MULTIVARIATE NORMAL DISTRIBUTION. For further details see Finney (1971). SRH/AS

Finney, D. J. 1971: Probit analysis. Cambridge: Cambridge University Press.
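The link between the latent response formulation and the response probabilities in this entry can be made concrete with a short Python sketch. It assumes, for illustration only, that the covariate vector carries a leading 1 for the intercept; the covariate values, coefficients and thresholds used below are hypothetical, not taken from any data set.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def linear_predictor(x, beta):
    """x'beta; x is assumed to include a leading 1 for the intercept."""
    return sum(xj * bj for xj, bj in zip(x, beta))


def binary_probit_prob(x, beta):
    """Pr(y = 1 | x) = Phi(x'beta), as in equation (1)."""
    return PHI(linear_predictor(x, beta))


def ordinal_probit_probs(x, beta, thresholds):
    """Category probabilities Pr(y = s | x), s = 0..S, for increasing
    thresholds kappa_1 < ... < kappa_S applied to y* = x'beta + e."""
    eta = linear_predictor(x, beta)
    cumulative = [PHI(k - eta) for k in thresholds] + [1.0]  # Pr(y <= s | x)
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


x = [1.0, 2.0]        # hypothetical covariates (intercept term first)
beta = [-0.5, 0.4]    # hypothetical regression parameters
p_binary = binary_probit_prob(x, beta)
p_ordinal = ordinal_probit_probs(x, beta, thresholds=[-1.0, 0.0, 1.5])
```

With a single threshold at zero the ordinal model collapses to the binary model in (2), which the second category probability then reproduces.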
product-limit estimator See KAPLAN-MEIER ESTIMATOR

product-moment correlation coefficient A synonym for Pearson's correlation coefficient. See CORRELATION
propensity scores In OBSERVATIONAL STUDIES aimed at comparing treatment arms, investigators have no control over the treatment assignment. Therefore, in contrast to randomised controlled trials (RCTs) (see CLINICAL TRIALS), estimates of group differences in outcome derived from such studies may be subject to confounding by observed pre-treatment covariates. The propensity score for an individual is defined as the CONDITIONAL PROBABILITY of being treated (or more generally of being in the group of interest), given the individual's covariate values. A sample's propensity scores can be used to balance the covariates in the two groups, thus creating a 'quasi-randomised' experiment and avoiding bias. The approach has been applied in many fields including medicine, epidemiology and health services research in attempts to derive causal effect estimates from observational studies.

As a motivating example consider the prospective cohort study (see COHORT STUDIES) recently utilised by Ye and Kaskutas (2009) to investigate the effect of Alcoholics Anonymous (AA) meeting attendance on alcohol abstinence. Observational studies had consistently found strong dose-response relationships between AA meeting attendance and abstinence but to date there was little evidence of such a relationship from experimental studies. The relationship between the treatment 'AA attendance' and the outcome 'abstinence' in observational studies is potentially subject to confounding, in that there may be a number of observable pre-treatment variables such as alcohol problem severity, self-motivation and coercion by others that affect study participants' decisions to go to AA meetings and independently contribute to them becoming abstinent. The authors employ propensity score methods to adjust the AA effect estimate for SELECTION BIAS due to observed confounders.

The ability of the propensity score to balance groups with respect to a large set of covariates was confirmed in a seminal paper by Rosenbaum and Rubin (1983). The propensity score is the coarsest balancing score, which is a function of the observed covariates such that the conditional distribution of the covariates given the balancing score is the same for treated and untreated individuals. Under a strongly ignorable treatment assignment, i.e. given the observed covariates treatment assignment is not determined by the potential outcomes, at any value of the balancing score, the difference between the treatment and control group means is an unbiased estimate of the average treatment effect at that value of the balancing score. Consequently, with a strongly ignorable treatment assignment:

• pair matching on propensity scores;
• subclassification on propensity scores (also referred to as stratification);
• and covariance adjustment (also referred to as regression adjustment)

can produce unbiased estimates of causal treatment effects.

Propensity score matching is part of the design of an observational study and refers to the procedure whereby individuals in one group are matched to individuals from the other group on the basis of similar propensity scores.
Typically, in observational studies the control group is much larger than the treated group, thus making it feasible to select a subset of controls that match the treated individuals. Matching can be one-to-one or one-to-k individuals and at the end of the matching process the sample is reduced to a smaller analysis sample. When controls are matched to treated individuals the analysis estimates the average treatment effect on the treated (ATT); conversely, if treated individuals are matched to controls the average treatment effect on the untreated (ATU) is estimated. There has been some discussion as to the mechanics of the matching process (random sampling of treated or untreated individuals, matching with or without replacement, definition of 'similar' propensity scores, etc.; for a review see, for example, D'Agostino, 1998) and a number of propensity score matching algorithms have been put forward (e.g. in general purpose packages such as SAS and Stata). It is important to realise that when propensity score matching is carried out as part of the design of the study this needs to be acknowledged by the statistical analysis method (as in matched CASE-CONTROL STUDIES), e.g. by including random effects for matched pairs or by allowing for within matched pair correlations by other approaches such as GENERALIZED ESTIMATING EQUATIONS (GEE). This is clearly a point not always appreciated by the practitioner (see Austin, 2008).

Stratification by propensity scores refers to the technique whereby individuals are grouped into strata on the basis of their propensity scores, treated and control subjects are contrasted within each stratum and a weighted average of these differences is constructed to estimate the average treatment effect (ATE, weight = stratum size), the ATT (weight = number of treated in stratum) or the ATU (weight = number of untreated in stratum) respectively. Rosenbaum and Rubin (1984) showed that Cochran's (1968) result, which states that five strata each containing 20% of the subjects remove 90% of the selection bias, holds for propensity score stratification. Thus stratification according to propensity score quintiles from the combined group is commonly employed by practitioners.

A third technique used to achieve propensity score conditioning is the inclusion of the propensity score as a covariate in the analysis model, the so-called 'regression adjustment'. However, while very simple, this technique further assumes that average treatment effects do not vary with the value of the propensity score and requires knowledge of the functional relationship between propensity scores and outcomes; thus propensity score matching or stratification are the preferred techniques.

A big issue for the practitioner is the construction of the propensity scores. These are not typically known and have to be estimated from the sample data at hand.
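The three weightings used in propensity score stratification (stratum size for the ATE, number treated for the ATT, number untreated for the ATU) can be sketched as follows. The function name, the dictionary keys and the two strata are hypothetical and serve only to show the arithmetic; they are not taken from any study.

```python
def stratified_effects(strata):
    """Weighted averages of within-stratum differences in mean outcome
    (treated minus control), using the three weightings in the text.

    strata: list of dicts with keys
        n_treated, n_control, mean_treated, mean_control.
    """
    def weighted(weight_of):
        total = sum(weight_of(s) for s in strata)
        return sum(weight_of(s) * (s["mean_treated"] - s["mean_control"])
                   for s in strata) / total

    return {
        "ATE": weighted(lambda s: s["n_treated"] + s["n_control"]),
        "ATT": weighted(lambda s: s["n_treated"]),
        "ATU": weighted(lambda s: s["n_control"]),
    }


# Hypothetical strata (illustrative numbers only).
example_strata = [
    {"n_treated": 10, "n_control": 30, "mean_treated": 0.70, "mean_control": 0.30},
    {"n_treated": 30, "n_control": 10, "mean_treated": 0.60, "mean_control": 0.50},
]
effects = stratified_effects(example_strata)
```

In this toy example the stratum with few treated individuals has the larger effect, so the ATU exceeds the ATE, which in turn exceeds the ATT.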
A standard approach is to use DISCRIMINANT FUNCTION ANALYSIS or modelling techniques for the binary outcome 'assignment to the group of interest/assignment to the comparison group', such as LOGISTIC REGRESSION or the PROBIT MODEL, to establish a rule for predicting the probability of being assigned to the group of interest, and then to estimate this probability from the relevant covariate values for each member of the original sample. Note that the underlying theory that supports the use of estimated propensity scores is based on the estimated propensity scores producing sample balance (see Rosenbaum and Rubin, 1983). Therefore, one tends to be overinclusive rather than underinclusive in the modelling of treatment assignment in this context, including nonlinear and interaction effects where possible. For example, Brookhart and colleagues (2006) show that standard model-building tools for predictive modelling will not always lead to good propensity score models, in particular in smaller studies. These authors further suggest that covariates related to the outcome should always be included since they add precision without
adding bias. If the covariate is a confounder, i.e. also related to treatment assignment, then its inclusion will also decrease BIAS. In contrast, covariates related to treatment assignment but not to the outcome will decrease precision without decreasing bias.

As the use of (estimated) propensity scores is motivated by their ability to balance covariates across groups after matching or stratification, such sample balance should be checked before proceeding to interpret treatment effects obtained using propensity scores as adjusted for selection bias. However, as Austin (2008) points out, in contrast to other applications, the statistic for assessing covariate balance should be a property of the sample (and not of an underlying population) and the sample size should not affect its value. This requirement rules out the use of statistical hypothesis testing. A number of appropriate statistics have been proposed, with perhaps the most commonly used one being the standardised difference, defined as the absolute group difference in sample means divided by an estimate of the pooled STANDARD DEVIATION (not STANDARD ERROR). In addition, when using propensity score matching the overlap between the distributions of the propensity scores between the treated and the control individuals should be assessed in order to be able to comment on the representativeness of the matched sample.

As an example consider the first table, which summarizes the distribution of potential confounders in the AA and abstinence study before and after propensity score matching. Propensity scores were estimated using logistic regression modelling. Then 102 nonattendees were matched to 282 attendees using the Stata user-written command PSMATCH2. Propensity score matching performed well in terms of balancing the potential confounders, with most standardised differences between the AA attendee and nonattendee groups reduced to below 10% of
the respective standard deviation. For example, the standardised difference in dependence symptoms was reduced from 55% of a standard deviation to about 9%. However, the matched sample had a higher average propensity score than the original sample due to the group being matched to (AA attendees) having higher propensities than the sample as a whole.

The second table shows the estimated AA effects on abstinence before and after propensity score matching. As expected, the ODDS RATIO associated with AA attendance is reduced considerably (from 3.6 to 2.2) after adjusting for observed confounders by propensity score matching. The bias adjustment was repeated using propensity score stratification. Study participants were classed into one of five strata using propensity score quintiles. This allowed the use of the whole sample of 334 AA attendees and 228 nonattendees. A stratified analysis then estimated the AA effect on abstinence within each stratum (see the middle section in the second table). This gave varying odds ratio estimates with larger AA effects observed in the strata with lower AA attendance propensity. The combined odds ratio was 3, again suggesting that part of the unadjusted AA effect was due to observed baseline confounders.

One might ask: 'Why use propensity score matching or stratification instead of traditional matching or stratification by covariates?' The answer is that a propensity score approach has the same aim, i.e. bias reduction, but the use of propensity scores has practical advantages, especially when there are a large number of covariates to consider. Even when there are only a few covariates, it can be difficult to find controls that match the treated individuals on all the covariate values. Propensity score matching overcomes this problem by allowing the investigator to control for multiple covariates by using only a single scalar matching variable. Similarly, in the context of stratification the number of possible strata grows exponentially with the number of binary background characteristics, and this will eventually lead to only one of the groups being present in some of the strata. Again the propensity score as a scalar summary of the covariates is useful as it can balance the distribution of the covariates across the groups within the strata without requiring a large number of strata.

propensity scores Pre-treatment covariate distributions before and after propensity score matching (extract from Table 1 in Ye and Kaskutas, 2009)

                                            AA attender       AA nonattender    Standardized difference in %
                                            (n = [illegible]) (n = [illegible]) Before matching   After propensity
Covariate                                   mean (SD)         mean (SD)         (whole sample)    score matching
Demographics
  Male                                      0.59 (0.49)       0.56 (0.50)       6.93              0.00
  Mean age                                  37.8 (10.1)       36.1 (11.5)       [illegible]       -11.1
  Ethnicity
    White                                   0.63 (0.48)       0.58 (0.49)       9.67              4.32
    Black                                   0.26 (0.44)       0.26 (0.44)       -0.36             [illegible]
    Others                                  0.11 (0.32)       0.16 (0.37)       -13.3             [illegible]
  Marital
    Married                                 0.34 (0.47)       0.45 (0.50)       -21.4             -9.72
    Sep/Div/Widow                           0.36 (0.48)       0.28 (0.45)       16.8              [illegible]
    Single                                  0.30 (0.46)       0.27 (0.45)       5.72              -10.1
  Level of education                        3.34 ([illegible]) 3.18 (0.98)      15.7              1.40
Motivation
  Readiness to change index                 50.0 (6.7)        46.6 (7.5)        48.3              -11.5
Coercion
  Number who pressure to get treatment      1.85 (1.31)       1.58 (1.16)       22.6              16.1
  Number who gave ultimatum                 0.58 (0.70)       0.53 (0.72)       5.96              0.93
Problem severity
  ASI composite alcohol score               0.43 (0.32)       0.34 (0.30)       30.8              [illegible]
  Number of dependence symptoms             5.20 (2.77)       3.69 (2.67)       55.1              9.14
  Number of alcohol-related consequences    [illegible] (1.42) 0.99 (1.15)      33.4              -3.81
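The standardised difference used to judge covariate balance can be computed directly from group means and standard deviations. The sketch below assumes one common convention, pooling the two group variances with equal weight, and keeps the sign of the difference as the first table does; the function name is illustrative. The example values are those read from the first table for the number of dependence symptoms before matching.

```python
from math import sqrt


def standardised_difference(mean_1, sd_1, mean_2, sd_2):
    """Difference in sample means as a percentage of the pooled standard
    deviation. The two variances are pooled with equal weight, which is
    an assumption of this sketch rather than a detail given in the entry."""
    pooled_sd = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return 100 * (mean_1 - mean_2) / pooled_sd


# Number of dependence symptoms before matching: attenders vs nonattenders.
d_symptoms = standardised_difference(5.20, 2.77, 3.69, 2.67)
print(round(d_symptoms, 1))
```

The result is roughly 55% of a standard deviation, in line with the tabulated value; small discrepancies are to be expected because the tabulated means and standard deviations are rounded. Because only sample means and standard deviations enter the formula, the statistic does not grow with sample size in the way a test statistic would.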
propensity scores Effect of AA attendance on abstinence estimated before and after propensity score adjustment (extract from Table 2 in Ye and Kaskutas, 2009)

                                        AA attendance   Actual sample size   Rate of abstinence   Difference in rate of     Odds ratio associated
Analysis                                at follow-up    at follow-up         mean (SE)            abstinence, mean (SE)     with AA attendance
Before propensity score adjustment      No              228                  0.377 (0.032)        -0.308 (0.041)*           3.6
                                        Yes             334                  0.686 (0.025)
Propensity score stratification
(subclass, from lowest AA propensity
to highest AA propensity)
  1                                     No              93                   0.323 (0.049)        -0.377 (0.116)*           4.90
                                        Yes             20                   0.700 (0.105)
  2                                     No              61                   0.368 (0.059)        -0.377 (0.091)*           5.00
                                        Yes             43                   0.744 (0.067)
  3                                     No              [illegible]          0.432 ([illegible])  -0.274 (0.095)*           3.16
                                        Yes             75                   0.707 (0.053)
  4                                     No              21                   0.476 (0.112)        -0.176 (0.117)            2.06
                                        Yes             91                   0.652 (0.050)
  5                                     No              9                    0.556 (0.176)        -0.117 (0.165)            1.65
                                        Yes             104                  0.673 (0.046)
  Adjusted                              No              228                  0.431 (0.106)        -0.265 (0.126)*           3.12
                                        Yes             334                  0.696 (0.061)
Propensity score matching (nearest      No              102                  0.479 (0.067)        -0.191 (0.073)*           2.2
distance with replacement), matching    Yes             282                  0.670 (0.028)
AA attendee group

*p [threshold illegible]
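The odds ratios quoted for the unadjusted and the matched analyses follow directly from the abstinence rates in the two groups. As a minimal sketch, with the rates taken as read from the second table and helper names that are illustrative only:

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)


def odds_ratio(p_group_1, p_group_0):
    """Ratio of the odds in group 1 to the odds in group 0."""
    return odds(p_group_1) / odds(p_group_0)


# Abstinence rates for AA attendees versus nonattendees.
or_unadjusted = odds_ratio(0.686, 0.377)   # before propensity score adjustment
or_matched = odds_ratio(0.670, 0.479)      # after propensity score matching

print(round(or_unadjusted, 1), round(or_matched, 1))
```

The same function applied to the stratum-specific rates reproduces the pattern noted in the text, with larger odds ratios in the strata with lower propensity to attend AA.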
Finally, ATE estimates are sometimes obtained by weighting observations inversely to the probability of selecting their treatment group (which is the propensity score for the treated individuals and 1 - propensity score for the controls). This is a particular application of INVERSE PROBABILITY WEIGHTING (IPW) estimators and here the motivation is somewhat different compared to propensity score matching or stratification. Inverse weighting methods are used in survey sampling and for MISSING DATA problems when the sample used to draw inferences is not a simple random sample (SRS). Rather, members of certain subpopulations are under- or over-sampled and thus in the analysis each individual is weighted relative to the inverse probability of being sampled. The concept can be applied to deal with selection bias by viewing the (hypothetical) sample of potential outcomes under treatment and no treatment as an SRS from the population of interest. The observed outcomes are then realised nonrandomly with the probability of observing a potential outcome being the probability of selecting its treatment group.
In summary, propensity scores are a useful tool to quasirandomise observational studies. The approach can have benefits in terms of generalisability to wider populations than those typically studied in clinical trials, reduced cost and time in obtaining the data and ability to investigate smaller effect sizes due to larger sample sizes. However, it needs to be emphasized that the approach can only reduce selection bias due to observed confounders. Thus there remains a need for experimental approaches (e.g. RCTs) or analysis approaches (e.g. the use of INSTRUMENTAL VARIABLES) that can address confounding by unobserved variables. (See websites http://www2.sas.com/proceedings/sugi26/p214-26.pdf and http://ideas.repec.org/c/boc/bocode/s432001.html.) SL
Austin, P. C. 2008: A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003. Statistics in Medicine 27, 2037-49. Brookhart, M. A., Schneeweiss, S., Rothman, K. J., Glynn, R. J., Avorn, J. and Stürmer, T. 2006: Variable selection for propensity score models. American Journal of Epidemiology 163, 1149-56. Cochran, W. G. 1968: The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics 24, 295-313. D'Agostino, R. B. 1998: Tutorial in biostatistics: propensity score methods for bias reduction in the comparison of a treatment to a non-randomised control group. Statistics in Medicine 17, 2265-81. Rosenbaum, P. R. and Rubin, D. B. 1983: The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41-55. Rosenbaum, P. R. and Rubin, D. B. 1984: Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association 79, 516-24. Ye, Y. and Kaskutas, L. A. 2009: Using propensity scores to adjust for selection bias when assessing the effectiveness of Alcoholics Anonymous in observational studies. Drug and Alcohol Dependence 104, 56-64.
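As a minimal sketch of the inverse probability weighting idea described in the propensity score entry above, the following Python fragment weights treated individuals by 1/e and controls by 1/(1 - e), where e is the propensity score, and compares the two weighted means. The function names, the use of normalised weights and the small data set are illustrative assumptions, not part of the entry; in practice the propensity scores would themselves have to be estimated.

```python
def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def ipw_ate(outcomes, treated, propensity):
    """Difference in inverse-probability-weighted mean outcomes.

    Treated individuals get weight 1/e and controls 1/(1 - e), where e is
    the propensity score. Scores of exactly 0 or 1 are assumed not to occur.
    """
    y1 = [y for y, t in zip(outcomes, treated) if t]
    w1 = [1.0 / e for e, t in zip(propensity, treated) if t]
    y0 = [y for y, t in zip(outcomes, treated) if not t]
    w0 = [1.0 / (1.0 - e) for e, t in zip(propensity, treated) if not t]
    return weighted_mean(y1, w1) - weighted_mean(y0, w0)


# Hypothetical data: a binary confounder x raises both the chance of
# treatment and the outcome; the true treatment effect is 1.
rows = ([(0, 1)] * 2 + [(0, 0)] * 8      # x = 0: 2 treated, 8 controls
        + [(1, 1)] * 8 + [(1, 0)] * 2)   # x = 1: 8 treated, 2 controls
treated = [t for _, t in rows]
outcomes = [t + 2 * x for x, t in rows]
propensity = [0.8 if x else 0.2 for x, _ in rows]

n_treated = sum(treated)
n_control = len(treated) - n_treated
naive_difference = (
    sum(y for y, t in zip(outcomes, treated) if t) / n_treated
    - sum(y for y, t in zip(outcomes, treated) if not t) / n_control
)
ipw_estimate = ipw_ate(outcomes, treated, propensity)
print(naive_difference, ipw_estimate)
```

In this constructed example the unweighted comparison gives a difference of about 2.2, whereas the weighted comparison recovers the effect of 1 that was built into the data, because the weights rebalance the confounder between the two groups.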
proportional hazards When comparing two groups with time-to-event data the summary statistic that is usually employed is the hazard rate in one group compared to the hazard rate in the other group, which is usually called the hazard ratio. If the hazard ratio is constant over time, i.e. it is independent of time, then the hazards in the two groups are said to be proportional: there is a constant multiplicative relationship between the two hazard rates. Mathematically, this relationship takes the form:
$$h_1(t)/h_0(t) = \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots)$$
where $h_0(t)$ and $h_1(t)$ are the hazard functions in the two groups (which may vary over time), $X_1, X_2, \ldots$ are the explanatory variables and $\beta_1, \beta_2, \ldots$ are the coefficients estimated from the data. The assumption of proportional hazards underlies the inclusion of any variable in COX'S REGRESSION MODEL. Therefore, it is usually important to assess that the hazards are approximately proportional across different groups before including a variable in a Cox model. There is, however, no unique or completely satisfactory way in which to test this assumption. Many approaches have been suggested and here we present one numerical and one graphical approach that seem to perform as well as most others (Persson, 1991).
A numerical approach was introduced by Cox himself in his original paper introducing the model. It involves including a time-dependent covariate in the model. Thus for one variable, the model would look like $h_1(t)/h_0(t) = \exp(\beta_1 X_1 + \gamma X_1(t))$, with a simple extension for more variables. The NULL HYPOTHESIS of proportional hazards, i.e. $\gamma = 0$, is tested by fitting the model and assessing this null hypothesis in the usual way.
One simple approach to assess this assumption graphically is to use a complementary log plot. This is a plot of $\log t$ against $\log\{-\log[S(t)]\}$, where $t$ is time from baseline in whatever unit is being used and $S(t)$ is the surviving proportion at time $t$ (see the figure). If the assumption of proportional hazards holds true then the curves for each group should be approximately parallel (i.e. the curves will be identical but vertically shifted by a constant). One may formally test that the 'slopes' of these curves do not differ, but this is not usually very helpful or informative (e.g. assuming that the curves are approximately linear) and thus usually we are left with visual inspection alone to assess whether the curves are indeed parallel. This is true of all graphical methods. Except for gross departures from proportional hazards or as a supplement to numerical methods, these methods cannot be widely recommended. MS/MP
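The reason the complementary log curves are parallel under proportional hazards can be sketched numerically. If the hazard ratio is a constant HR, the surviving proportions satisfy $S_1(t) = S_0(t)^{HR}$, so $\log\{-\log[S_1(t)]\}$ exceeds $\log\{-\log[S_0(t)]\}$ by $\log(HR)$ at every time. The short Python illustration below uses hypothetical survival proportions and a hypothetical hazard ratio of 2; none of these numbers come from the entry.

```python
import math


def cloglog(surviving_proportion):
    """Complementary log transform log(-log S) of a surviving proportion."""
    return math.log(-math.log(surviving_proportion))


# Hypothetical follow-up times and baseline-group surviving proportions.
times = [1, 2, 4, 8]
s0 = [0.95, 0.85, 0.70, 0.50]

# Under proportional hazards with a constant hazard ratio, S1(t) = S0(t) ** HR.
hazard_ratio = 2.0
s1 = [s ** hazard_ratio for s in s0]

# Points of the complementary log plot: (log t, log(-log S(t))) for each group.
curve0 = [(math.log(t), cloglog(s)) for t, s in zip(times, s0)]
curve1 = [(math.log(t), cloglog(s)) for t, s in zip(times, s1)]

# Vertical distance between the two curves at each time point.
gaps = [y1 - y0 for (_, y0), (_, y1) in zip(curve0, curve1)]
print(gaps)
```

Each gap equals log(2), about 0.69, whatever the time point, which is the constant vertical shift the entry refers to. With real data the surviving proportions would be estimates, so the gaps would only be approximately constant.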
[Figure: log(-log[S(t)]) plotted against log t, with separate curves for limited disease and extensive disease]
proportional hazards Graph of log(-log[S(t)]) against log t in patients treated with ECMV chemotherapy with limited and extensive small cell lung cancer (from Machin and Parmar, 1995)
Cleves, M. A., Gould, W. W. and Gutierrez, R. G. 2003: An introduction to survival analysis using Stata, revised edition. Texas: Stata Press. Machin, D. and Parmar, M. K. B. 1995: Survival analysis: a practical approach. Chichester: John Wiley & Sons, Ltd. Persson, I. 1991: Essays on the assumption of proportional hazards in Cox's regression. Acta Universitatis Upsaliensis. Uppsala: Comprehensive Summaries of Uppsala Dissertations from the Faculty of Social Sciences. Piantadosi, S. 1997: Clinical trials. New York: Wiley Interscience.
protocols for clinical trials These are formal documents outlining proposed procedures for carrying out a CLINICAL TRIAL. Protocols for clinical trials serve a variety of masters; consequently, a well-constructed protocol discusses a range of issues relevant to the trial it describes. The principal investigator uses the protocol as a tool to describe the scientific rationale behind the study and the goals of the trial. The study staff depends on the protocol for operational guidance on whom to recruit, what study procedures to perform and how and when to perform them. The protocol must include information about safeguards planned for the participants so that institutional boards and ETHICAL REVIEW COMMITTEES can assess the appropriateness of the plans to protect the participants in the research. If the trial is to be used as part of a regulatory submission, the protocol should include enough information so that when the trial is complete and the final results presented, the regulators will be able to assess the quality of the study, the results and the statistical interpretation. Finally, in anticipation of eventual publication of the results, the investigators, in writing the protocol, should consider the types of data they expect to include in publications. An ideal protocol addresses these multiple masters in a way that is well organised, easily readable, unambiguous and internally consistent.
Several guidelines are available to help the writer construct a protocol that covers all important aspects, notably those produced by the International Conference on Harmonisation (in particular E1, E9 and E10) (www.ich.org), the various guidances and the 'points to consider' documents provided by the United States Food and Drug Administration on disease-specific protocols (www.fda.gov), as well as the CONSORT statement on recommendations for reporting parallel-group randomised trials (Moher, Schulz and Altman, 2001). All of these provide useful advice to the constructors of clinical protocols. These documents are guidelines; a specific protocol should incorporate the relevant features described in these and related documents, individually tailored to the study at hand. A protocol that reads as if it were constructed of boilerplate sections from various previous protocols, or uncritically edited versions of standard templates, is not only boring to read but, worse, may lead to faulty compliance to its procedures.
Most well-constructed protocols have a similar structure. They begin with a section discussing the disease under study. Next comes a description of the context of the particular trial including information on the intervention under study. Depending on the nature of the study, this section might describe the pharmacology of the product and the justification for the particular dose or doses under study, the mechanism and structure of the device or the justification for the particular behavioural intervention. Another section describes the aims and goals of the study. The remainder of this entry assumes that the interventions under study will be drugs or biologics; however, the general considerations apply also to trials of devices and behavioural interventions.
The protocol will carefully define the population of interest, specify the entry criteria and delineate the important facets of the study design. The heart of the protocol will contain a description of the procedures for screening, enrolment and subsequent visits. The actual content of these sections will depend on the design and purpose of the study. PHASE I TRIALS or dose escalation studies, which are typically not hypothesis-driven, aim to provide evidence that the product is sufficiently safe to allow administration to more people. Consequently, the section describing the design of this type of study should address the considerations, statistical and otherwise, that led to the particular sample size and
to the choice of criteria for dose escalation. The protocol will describe the criteria by which the various doses under consideration will be evaluated and the methods by which dose escalation will proceed.
Typical PHASE II TRIALS studying nonclinical ENDPOINTS as well as both PHASE III and later confirmatory PHASE IV TRIALS studying clinical endpoints aim to test hypotheses. A protocol for such a trial should unambiguously describe the primary endpoint, the primary hypothesis and the statistical methods planned to test that hypothesis. A section on statistical POWER should justify the sample size. If the study has a control group, the protocol should describe its nature and should defend its choice. If the study is testing equivalence or noninferiority, the protocol should specify the appropriate equivalence or noninferiority margin (see ACTIVE CONTROL EQUIVALENCE STUDIES). A section on secondary endpoints should present a cogent rationale for each with a discussion of how each one will yield information that augments the data provided by the primary endpoint.
All protocols should contain a statistical section that specifies which participants the primary analysis will include, how missing data will be handled and the planned statistical methods. Early phase studies typically use descriptive and exploratory statistical methods, for such studies aim to produce data relevant to the design of subsequent trials. In Phase III or later trials, however, where the purpose is to test hypotheses, the statistical section should describe a rigorous approach to analysis that preserves the TYPE I ERROR rate. Failure to define the primary endpoint clearly, to select a rigorous statistical approach and to specify statistical analyses unambiguously jeopardises the ultimate validity of the inference from the trial.
Protocols for randomised clinical trials should describe the methods of RANDOMISATION and, if relevant, the nature of any stratification. If randomisation is blocked, the protocol should not include the block size because making that information available can potentially lead the investigator to deduce the treatment given to some participants. For studies involving BLINDING, a section should describe the methods used to conceal treatment allocation. Generally, protocols should discuss the methods planned for ensuring unbiased assessment of outcome.
The final sections of the protocol provide rules for drug disposition, handling unexpected adverse events, monitoring safety, adhering to regulatory guidelines and administrative matters essential to the conduct of the study. (These sections, unfortunately, sometimes read as if the writers had become tired by the time they reached them. If the trial is studying ovarian cancer, for instance, the use of the pronoun 'he', or even the more politically correct 'he or she', clearly indicates that the writer simply pasted the material from another protocol!) The protocol must describe the plans for monitoring the safety of the participants during the study, the methods of
follow-up and the way in which the investigators plan to protect the participant's rights and confidentiality. The protocol should be clear enough that the designers of the case report forms can use it to construct the forms.
The inclusion of all this necessary material makes many protocols very long. To ensure that the clinic staff implementing the protocol understands the purpose of the trial and the procedures, the document should contain two summaries. One, which generally comes at the beginning of the document, is a two- to (approximately) four-page synopsis briefly describing the product or other intervention, the objectives, the study design, the study population, the dosing and dosing regimen, the primary and secondary endpoints and the statistical plans for these endpoints along with a discussion of the power. The second summary is a one- or two-page flowchart listing the procedures to be performed at each visit. A helpful aid to accurate implementation of the protocol is a laminated pocket-sized card containing a miniature, but legible, version of this flowchart.
In summary, the protocol for a clinical trial should justify the study and describe its procedures, hypotheses, statistical plans and administrative guidelines. A clearly written protocol with close connection between the trial's goals, design and analysis plays a crucial role in implementing a study that is likely to result in a correct inference. JW
Moher, D., Schulz, K. F. and Altman, D. G. 2001: The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Annals of Internal Medicine 134, 657-62.
publication bias Publication bias refers to the publication or nonpublication of research findings, depending on the nature and direction of the results. Its potential to undermine the validity of medical research was noted by Begg and Berlin (1988). There is extensive empirical evidence for the existence of publication bias. Scherer et al. (2007) reviewed 79 studies describing subsequent full publication of research initially presented in abstract or short report form. Only about half of abstracts presented at conferences were later published in full, and subsequent publication was associated with factors such as 'positive' (usually equated with statistically significant) results, acceptance for oral presentation (versus poster presentation), clinical research (versus basic research) and randomized trial design (versus other study designs). Hopewell et al. (2009) reviewed COHORT STUDIES of registered CLINICAL TRIALS, in which investigators were subsequently contacted to determine the publication status of each completed study. Publication was more likely if results were positive (defined as results classified by the investigators as statistically significant (P < 0.05), or perceived as striking or important, or showing a positive direction of effect) rather than negative. Other factors such as the study size, funding source and
academic rank and the sex of the primary investigator were not consistently associated with the PROBABILITY of publication.
Publication bias should be seen as one of a number of reporting BIASES, which also include time lag bias (the rapid or delayed publication of research findings), multiple publication bias (the multiple or singular publication of research findings), location bias (the publication of research findings in journals with different ease of access or levels of indexing in standard databases), citation bias (the citation or noncitation of research findings), language bias (the publication of research findings in a particular language) and outcome reporting bias (the selective reporting of some outcomes but not others), depending on the nature and direction of the results (Sterne, Egger and Moher, 2008). Empirical evidence that selective reporting of outcomes within studies is an important threat to the validity of research findings has emerged from a series of cohort studies conducted by Chan et al. (2004).
Begg and Berlin (1988) noted that an asymmetric appearance of a FUNNEL PLOT might suggest that a meta-analysis (see SYSTEMATIC REVIEWS AND META-ANALYSIS) has been affected by publication bias. The subjectivity inherent in interpretation of graphical displays (see GRAPHICAL DECEPTION) has led a number of authors to propose that statistical tests for funnel plot asymmetry might be used to diagnose publication bias. The most widely used such test was proposed by Egger et al. (1997). These authors also noted that publication bias is only one of a number of possible causes of funnel plot asymmetry, which may also result from other reporting biases, methodological flaws that lead to spuriously inflated effects in smaller studies, true heterogeneity or chance. Therefore tests for funnel plot asymmetry should be seen as examining 'small-study effects', that is, a tendency for effects estimated in smaller studies to differ from those estimated in larger studies (Sterne, Gavaghan and Egger, 2000). Publication bias is one of a number of possible explanations for small-study effects (Egger et al., 1997; Sterne, Gavaghan and Egger, 2000; Sterne, Egger and Moher, 2008). Statistical problems that can affect the Egger test have led a number of authors to propose alternative tests for funnel asymmetry. Sterne, Egger and Moher (2008) reviewed a number of such tests, and provided recommendations on testing for funnel plot asymmetry. JS
Begg, C. B. and Berlin, J. A. 1988: Publication bias: a problem in interpreting medical data. Journal of the Royal Statistical Society, Series A 151, 419-63. Chan, A. W., Hróbjartsson, A., Haahr, M. T., Gøtzsche, P. C. and Altman, D. G. 2004: Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. Journal of the American Medical Association 291, 2457-65. Egger, M., Smith, G. D., Schneider, M. and Minder, C. 1997: Bias in meta-analysis detected by a simple, graphical test. British Medical Journal 315, 629-34. Hopewell, S., Loudon, K., Clarke, M. J., Oxman, A. D. and Dickersin, K. 2009: Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database of Systematic Reviews Issue 1. Art. No.: MR000006. DOI: 10.1002/14651858.MR000006.pub3. Scherer, R. W., Langenberg, P. and von Elm, E. 2007: Full publication of results initially presented in abstracts. Cochrane Database of Systematic Reviews Issue 2. Art. No.: MR000005. DOI: 10.1002/14651858.MR000005.pub3. Sterne, J. A. C., Egger, M. and Moher, D. on behalf of the Cochrane Bias Methods Group 2008: Chapter 10: Addressing reporting biases. In Higgins, J. P. T. and Green, S. (eds), Cochrane handbook for systematic reviews of interventions. Chichester: John Wiley & Sons, Ltd. Sterne, J. A. C., Gavaghan, D. and Egger, M. 2000: Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. Journal of Clinical Epidemiology 53, 1119-29.
P-values These were introduced by R. A. Fisher (1925) as a means of assessing the evidence against a NULL HYPOTHESIS. Often, such a null hypothesis states that there is no association between two variables, e.g. between hypertension and subsequent heart disease. The P-value is defined as the PROBABILITY, if the null hypothesis were true, that we would have observed an ASSOCIATION at least as large as the one we did by chance. If this probability is small we have evidence against the null hypothesis; in other words, the smaller the P-value, the stronger the evidence against the null hypothesis.
To illustrate the calculation of a P-value consider the results, displayed in the table, of a study to investigate whether smoking reduces lung function. Forced vital capacity (FVC) (a test of lung function) was measured in 100 men aged 25-29, of whom 36 were smokers and 64 nonsmokers. (For simplicity we will use large-sample formulae.) The mean FVC in smokers was 4.7 litres compared with 5.0 litres in nonsmokers. The difference in mean FVC, $\bar{x}_1 - \bar{x}_0$, is therefore 4.7 - 5.0 = -0.3 litres. The STANDARD DEVIATION (SD) in both groups was 0.6 litres. If the null hypothesis is true, then the MEAN of the SAMPLING DISTRIBUTION of ($\bar{x}_1 - \bar{x}_0$) is zero. The large-sample formulae state that the sampling distribution of ($\bar{x}_1 - \bar{x}_0$) is normal: its STANDARD ERROR (SE) is derived from the standard errors of $\bar{x}_1$ and $\bar{x}_0$ as:
$$SE = \sqrt{SE_1^2 + SE_0^2} = \sqrt{0.100^2 + 0.075^2} = 0.125 \text{ litres}$$
The test statistic $z = (\bar{x}_1 - \bar{x}_0)/SE$ measures by how many standard errors the mean difference ($\bar{x}_1 - \bar{x}_0$) differs from the null value of 0. In this example:
$$z = (-0.3)/0.125 = -2.4$$
The difference between the means is therefore 2.4 standard errors below 0. The figure shows that the probability of getting a difference of -2.4 standard errors or fewer (the area under the curve to the left of -2.4) is 0.0082 (this can be found using a computer or statistical tables). This probability is known as the one-sided P-value.

P-values Results of a study to investigate whether smoking reduces lung function
Group            Number of men   Mean FVC (litres)     Standard deviation   Standard error of mean FVC
Nonsmokers (0)   $n_0 = 64$      $\bar{x}_0 = 5.0$     $s_0 = 0.6$          $SE_0 = 0.6/\sqrt{64} = 0.075$
Smokers (1)      $n_1 = 36$      $\bar{x}_1 = 4.7$     $s_1 = 0.6$          $SE_1 = 0.6/\sqrt{36} = 0.100$
[Figure: standard normal curve with horizontal axis in standard errors and both tails beyond 2.4 standard errors marked; P-value = 0.0164]
P-values Using the test statistic z to determine the probability of getting a difference of -2.4 standard errors or fewer
By convention, we usually use two-sided P-values. The justification for this is that our assessment of the probability that the result is due to chance should be based on how extreme is the size of the departure from the null hypothesis and not its direction. We therefore include the probability that the difference might (by chance) have been in the opposite direction: the mean FVC might have been greater in smokers than nonsmokers. Because the NORMAL DISTRIBUTION is symmetrical, this probability is also 0.0082. The two-sided P-value (the probability of observing a difference at least as extreme as 2.4 standard errors, if the null hypothesis of no difference is correct) is thus found to be 0.0164 (= 0.0082 + 0.0082). Such a P-value provides evidence against the null hypothesis and suggests that smoking does, indeed, affect FVC.
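The arithmetic of this worked example can be retraced with a few lines of Python. This is only a sketch using the large-sample normal approximation from the text; the helper name normal_cdf is an illustrative choice, and the normal tail area is obtained from the standard library function math.erfc rather than from statistical tables.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Figures from the smoking and lung function example.
mean_smokers, mean_nonsmokers = 4.7, 5.0      # mean FVC in litres
sd = 0.6                                      # standard deviation in both groups
n_smokers, n_nonsmokers = 36, 64

se_smokers = sd / math.sqrt(n_smokers)        # 0.100
se_nonsmokers = sd / math.sqrt(n_nonsmokers)  # 0.075
se_difference = math.sqrt(se_smokers ** 2 + se_nonsmokers ** 2)  # 0.125

z = (mean_smokers - mean_nonsmokers) / se_difference  # about -2.4
one_sided_p = normal_cdf(z)       # area to the left of z
two_sided_p = 2.0 * one_sided_p   # the normal curve is symmetrical

print(round(z, 2), round(one_sided_p, 4), round(two_sided_p, 4))
```

Rounded to four decimal places, the one-sided and two-sided values agree with the 0.0082 and 0.0164 quoted in the text.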
While most standard statistical computer packages include P-values as part of their standard output, the interpretation of P-values causes confusion. For the reasons discussed in the entry on HYPOTHESIS TESTS, there is little justification for dividing results into 'significant' and 'nonsignificant' according only to whether the P-value is less than 0.05 (or on the basis of any other threshold). Three errors common in the interpretation of P-values are as follows. First, potentially clinically important associations observed in small studies, for which the P-value is more than 0.05, are denoted as nonsignificant and ignored. To protect ourselves against this error, we should always consider the range of possible values for the association shown by the CONFIDENCE INTERVAL as well as the P-value. Second, statistically significant (P < 0.05) findings are assumed to result from real associations, whereas by definition an average of 1 in 20 comparisons in which the null hypothesis is true will result in P < 0.05. Third, statistically significant (P < 0.05) findings are assumed to be of clinical importance, whereas given a sufficiently large sample size, even an extremely small association in the population will be detected as different from the null hypothesis value of zero.
Based on considerations of the power of studies and the proportion of null hypotheses that are, in fact, false, Sterne and Davey Smith (2001) adapted the work of Oakes (1986) to suggest that in situations typical of medical statistics P-values less than 0.001 could be considered to provide strong evidence against the null hypothesis. However, the interpretation of P-values will always depend on the context in which they were generated. For example, Wacholder and Chanock (2004) suggested that in the context of molecular epidemiological studies, in which many thousands or even millions of single-nucleotide polymorphisms (SNPs) may be tested for associations with a disease outcome, it may be appropriate to consider only far smaller P-values as providing evidence of a real association. JS
Fisher, R. A. 1925: Statistical methods for research workers. Edinburgh: Oliver and Boyd. Oakes, M. 1986: Statistical inference. Chichester: John Wiley & Sons, Ltd. Sterne, J. A. and Davey Smith, G. 2001: Sifting the evidence - what's wrong with significance tests? British Medical Journal 322, 226-31. Wacholder, S. and Chanock, R. M. 2004: Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. Journal of the National Cancer Institute 96, 434-42.
Q
Q-Q plots See PROBABILITY PLOTS
quality of life (QoL) measurement This is a standardised subjective approach to measuring a person's perception of their own health by using numerical scoring systems and may include one or several dimensions of quality of life (QoL). Quality of life is a complex concept with multiple aspects. These aspects (usually referred to as domains or dimensions) can include: cognitive functioning; emotional functioning; psychological well-being; general health; physical functioning; physical symptoms and toxicity; role functioning; sexual functioning; social well-being and functioning; spiritual/existential issues and many more. This broad definition of QoL includes scales or instruments that ask general questions, such as 'In general, how would you rate your health now?' and more specific questions on particular symptoms and side-effects, such as 'During the past week have you felt nauseated?' These measurement scales all have the common feature of using a standardised approach to assessing a person's perception of their own health by using numerical scoring systems and may include one or several dimensions of quality of life.
Researchers have used a variety of names to describe quality of life measurement scales. Some prefer to use the term health-related quality of life (HRQoL or HRQL) to stress that we are only concerned with health aspects. Others have used the terms health status or self-reported health. The United States FOOD AND DRUG ADMINISTRATION has adopted the term patient reported outcome (PRO) in its guidance to the pharmaceutical industry for supporting labelling claims for medical product development (Food and Drug Administration, 2006). The UK Medical Research Council (2009) used a similar term, patient reported outcome measures (PROMs). However, not all people who complete such outcomes are ill and patients and hence PRO could legitimately stand for person-reported outcome. Mostly, we shall assume that the quality of life instrument or outcome is self-reported by the person whose experience we are interested in, but it could be completed by another person or proxy. The term health outcome assessment has been put forward as an alternative that avoids specifying the respondent. This article will follow convention and use the now well-established term quality of life (QoL).
There is no formally agreed definition of QoL, so most investigators get around this problem by describing what they mean by QoL, and then letting the items (questions) in their QUESTIONNAIRE speak for themselves. Some QoL instruments focus upon a single concept (or dimension), such as physical functioning. Other QoL instruments have several dimensions, such as physical, emotional and social functioning. Since there are many potential dimensions of QoL it is impractical to assess all of these concepts simultaneously in one instrument. Furthermore, QoL is a subjective concept, because symptoms, such as pain or depression and even physical functioning, are experienced by the individual patient and therefore they cannot entirely be assessed by 'objective' measures.
So how do we actually measure QoL? Simplistically, QoL measures represent a standardised approach to assessing a patient's perception of their own health, using numerical scoring, and can include symptoms, function and well-being. The concepts forming the various QoL dimensions are subjective measures and should best be evaluated by asking the patient.
The Medical Outcomes Study (MOS) Short Form (SF)-36 is the most commonly used QoL measure in the world today. It originated in the USA (Ware and Sherbourne, 1992), but has been validated for use in the United Kingdom (Brazier et al., 1992). It contains 36 questions measuring health across eight dimensions: physical functioning (PF) 10 items; role limitation because of physical health (RP) 4 items; social functioning (SF) 2 items; vitality (VT) 4 items; bodily pain (BP) 2 items; mental health (MH) 5 items; role limitation because of emotional problems (RE) 3 items; and general health (GH) 5 items. (The first figure shows the 10 questions that make up the physical function dimension of the SF-36.) The responses to the 36 individual questions are classified into a mixture of binary (yes/no) and three-, five- and six-point ordered response categories.
In planning and analysis, the question responses are often analysed by assigning equally spaced numerical scores to the ordinal categories (e.g. 1 = 'yes, limited a lot', 2 = 'yes, limited a little' and 3 = 'no, not limited at all', for the 10 items in the figure). The raw scores across similar questions (e.g. the 10 physical functioning items shown in the figure) are summed to generate a raw dimension score. Thus the 10-item physical function scale of the SF-36, with items scored 1 to 3, would yield a raw score ranging from 10 to 30. Finally, these raw dimension scores are then transformed to generate a QoL score from 0 to 100, where 100 indicates 'good health'.
HEALTH AND DAILY ACTIVITIES
The following questions are about activities that you might do during a typical day. Does your health limit you in these activities? If so, how much?
Response options for each activity: 1 = Yes, limited a lot; 2 = Yes, limited a little; 3 = No, not limited at all.
ACTIVITIES
a. Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports
b. Moderate activities, such as moving a table, pushing a vacuum cleaner, bowling or playing golf
c. Lifting or carrying groceries
d. Climbing several flights of stairs
e. Climbing one flight of stairs
f. Bending, kneeling or stooping
g. Walking more than a mile
h. Walking half a mile
i. Walking 100 yards
j. Bathing and dressing yourself
quality of life (QoL) measurement The 10 questions that make up the SF-36 physical function dimension (Brazier et al., 1992)

In our PF dimension example this transformation is achieved by:
$$\left[\frac{\text{Raw score} - \text{Lowest possible score}}{\text{Range of possible scores}}\right] \times 100, \quad \text{i.e.} \quad \left[\frac{\text{Raw score} - 10}{20}\right] \times 100$$
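As a minimal sketch, the transformation can be written as a short Python function. The function name, its arguments and the example responses are illustrative assumptions; only the formula itself and the 1 to 3 item scoring come from the text.

```python
def standard_score(item_scores, lowest_item_score, highest_item_score):
    """Rescale a summed dimension score to the range 0 to 100.

    item_scores holds one numerical score per question in the dimension;
    each question is assumed to use the same lowest and highest score.
    """
    n_items = len(item_scores)
    raw_score = sum(item_scores)
    lowest_possible = n_items * lowest_item_score
    score_range = n_items * (highest_item_score - lowest_item_score)
    return (raw_score - lowest_possible) / score_range * 100.0


# Hypothetical responses to the 10 physical functioning items (scored 1 to 3).
pf_responses = [3, 3, 2, 2, 3, 1, 2, 3, 3, 3]   # raw score = 25
pf_score = standard_score(pf_responses, 1, 3)   # (25 - 10) / 20 x 100
print(pf_score)
```

For these responses the raw score of 25 becomes a PF score of 75. A respondent answering 'limited a lot' to every item would score 0 and one answering 'not limited at all' throughout would score 100.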
Fayers and Machin (2007) term this procedure the 'standard scoring method' and this basic procedure is used to score many QoL instruments besides the SF-36.
The SF-36 is an example of a QoL instrument that is intended for general use, irrespective of the illness or condition of the patient. Such instruments are often termed generic measures and may often be applicable to healthy people too, and hence used in population surveys. The second figure shows the distribution of the eight main dimensions of the SF-36 from a general population survey of United Kingdom residents (Brazier et al., 1992). The third figure shows how physical functioning in the general population (Walters, Munro and Brazier, 2001) declines rapidly with increasing age. The SF-36 is also an example of a profile QoL measure since it generates eight separate scores, one for each dimension of health (fourth figure). Other generic profile instruments such as the Sickness Impact Profile (SIP) and Nottingham Health Profile (NHP) are described in Bowling (2004) and Fayers and Machin (2007). Conversely, some other QoL measures generate a single summary score or single index, which combines the different dimensions of health into a single number. An example of a single index QoL outcome is the EuroQoL or EQ-5D as it is now named (Fayers and Machin, 2007).
Generic instruments are intended to cover a wide range of conditions and have the advantage that the scores from patients with various diseases may be compared against each other and against the general population. For example, the fourth figure compares the mean SF-36 dimension scores of a group of young male cancer survivors aged 25-44 with an age and sex matched control sample (Greenfield et al., 2007). The cancer survivors sample has a lower QoL on all eight dimensions of the SF-36 than the control sample. On the other hand, generic instruments may fail to focus on the issues of particular concern to patients with disease, and may often lack the SENSITIVITY to detect differences that arise as a consequence of treatments that are compared in CLINICAL TRIALS. This has led to the development of condition- or disease-specific QUESTIONNAIRES. Disease-specific QoL measurement scales are comprehensively reviewed by Bowling (2001). Examples of disease-specific QoL questionnaires described in Fayers and Machin (2007) include the cancer-specific 30-item European Organisation for Research and Treatment of Cancer (EORTC) QLQ-C30 questionnaire and the cancer-specific 30-item Rotterdam Symptom Checklist (RSCL).
The instruments described above claim to measure general QoL and usually include at least one question about overall QoL or health. Sometimes investigators may wish to explore particular aspects or concepts in greater depth. There are also instruments for specific aspects of QoL. These specific aspects may include anxiety and depression, physical functioning, pain and fatigue. Examples of instruments that evaluate specific aspects of QoL are again described in Fayers and Machin (2007) and include: the Hospital Anxiety and Depression Scale (HADS) and the Beck Depression Inventory (BDI) instruments for measuring anxiety and depression; the McGill Pain Questionnaire (MPQ) for the measurement of pain; the Multidimensional Fatigue Inventory (MFI) for assessing fatigue and the Barthel Index of Disability (BID) for assessing disability and functioning.
quality of life (QoL) measurement Distribution of the eight SF-36 dimensions from a general population survey (n = 1372), where a score of 100 indicates 'good health' (data from Brazier et al., 1992)
quality of life (QoL) measurement Mean SF-36 physical function scores by age (years) and sex (data from Walters, Munro and Brazier, 2001)
The historical development of QoL assessment is briefly discussed in Fayers and Machin (2007). One of the first instruments that broadened the assessment of patients beyond physiological and clinical examination was the Karnofsky Performance Scale, proposed in 1947 for use in clinical settings. Over the following years a number of other scales were developed to assess functional ability, such as the Barthel Index. The next generation of questionnaires from 1970 onwards, such as the SIP and NHP, attempted to quantify general health status and not just functional ability. The functional ability dimensions of the generic instruments may not be responsive enough to detect the 'small' changes in physical functioning experienced by patients undergoing treatment. For example, the least 'difficult' item of the SF-36 physical function dimension (see the first figure) is question 3j, 'bathing or dressing yourself'. Older adults with functional problems may have difficulty completing this item. For example, a general population survey of older adults aged 65 or more (Walters, Munro and Brazier, 2001) found that 6.5% of respondents were at the 'floor' (i.e. scored 0) on the original 10-item PF dimension. There are several solutions to this problem of unresponsive QoL measures, including extending the scales (by adding extra questions), computer adaptive testing (CAT) and patient-generated measures (PGM). With the increasing use
quality of life (QoL) measurement Profile of mean SF-36 dimension scores for a sample of young male cancer survivors (aged 25-44), compared with an age and sex matched control group (data from Greenfield et al., 2007)
of information technology we could use CAT to assess QoL. This would involve the use of 'tailored' or adapted tests. For example, to assess physical functioning (using the SF-36 say), if a patient cannot manage a short walk, why ask questions about long walks or running? With the use of CAT, the computer software can select questions of appropriate difficulty on the basis of earlier responses. This can result in the benefits of more precise grading of ability and fewer questions put to each person. A simpler solution to the problem of nonresponsive generic QoL instruments is to use patient generated measures (PGMs). These are a set of QoL instruments that ask patients to select their own dimensions and/or items. PGMs have the advantages that they are more relevant to the patients and likely to be more sensitive to change. The disadvantages of PGMs are the reduced comparability between patients, that the dimensions may change before and after treatment and that they are more complex to administer. Fayers and Machin (2007) give two examples of PGMs, the Patient Generated Index (PGI) and the Schedule for the Evaluation of Individual Quality of Life (SEIQoL). There is a continuing philosophical debate about the meaning of QoL and about what should be measured. Despite this, it is still important to measure health-related quality of life (QoL) as well as clinical and process-based outcomes. This is because 'all of these (QoL) concepts reflect issues that are of fundamental importance to patients' well being. They are all worth investigating and quantifying' (Fayers and Machin, 2007). Most investigators treat the 'summated scores' from the QoL instruments as if they are from a continuous distribution. This is probably not an unreasonable assumption, particularly if we believe that there exists an underlying continuous latent variable that measures QoL, and that the actual measured outcomes are ordered categories that reflect contiguous intervals along this continuum. Most QoL outcome measures, such as the SF-36, which use the standard scoring method described previously, generate data with discrete, bounded and nonstandard distributions (see the second figure). The scaling of QoL measures such as the SF-36 may lead to several problems in determining sample size and analysing the data (Walters, Campbell and Lall, 2001; Walters, Campbell and Paisley, 2001). The apparent continuum hides the fact that only a few discrete values are possible. For example, the role physical (RP) dimension of the SF-36 is scored on a 0 to 100 scale but there are only five possible categories/scores, i.e. 0, 25, 50, 75 and 100 (see the second figure, part (c)). Also, the scale may not be linear. For example, using the SF-36 RP dimension, is a change of score from 0 to 25 the same as a change from 75 to 100? Another common concern is a floor or ceiling effect. Patients cannot be worse than the worst category or better than the best category (in the case of the SF-36, a score of either 0 or 100). For some populations the level is wrong and most people score on either the best category or the worst category. Floor and ceiling effects are more likely to be a problem in longitudinal studies because they limit the ability of the instrument to detect an improvement or deterioration in a patient's QoL over time. Part (c) of the second figure shows that for the RP dimension of the SF-36 over 72% (1000 of 1372) of the general population sample had scored 100 and were at the ceiling of the distribution. Furthermore, methods based on the NORMAL DISTRIBUTION (such as MULTIPLE LINEAR REGRESSION) assume that the outcome variable has a constant VARIANCE. The variances of changes may depend on initial values. This is a common problem with range-limited values. Patients may enter the study with a wide variety of scores, but tend always to improve their scores. Thus patients who score lower at the start of the study have more range to improve than those who are already close to the maximum. Another issue is that normal approximations may not apply. Since the data are in fact categorical, they may require different techniques of analysis. By definition, no ordinal variable can be normally distributed, although in some cases a normal approximation will suffice. Also, it is difficult to quantify an effect size (e.g. a desirable difference in MEAN score between groups) in advance, and another concern in QoL measurement is that MISSING DATA are likely, e.g. in questionnaires that ask 'How far can you walk?' when the patient is in a wheelchair. The advantages in being able to treat QoL scales as continuous and normally distributed are simplicity in sample size estimation and statistical analysis. Therefore, it is important to examine such simplifying assumptions for different instruments and their scales.
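The ceiling effect described above can be checked with a few lines of code. The following is a minimal sketch, not part of the cited analyses: the function name `floor_ceiling_share` is hypothetical, and only the total (1372) and the number at the ceiling (1000) come from the text; the split of the remaining 372 respondents across the other four RP scores is invented purely for illustration.

```python
def floor_ceiling_share(scores, lowest=0, highest=100):
    """Return the proportions of scores at the floor and at the ceiling."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    at_floor = sum(1 for s in scores if s == lowest) / n
    at_ceiling = sum(1 for s in scores if s == highest) / n
    return at_floor, at_ceiling


# The RP dimension can take only five values: 0, 25, 50, 75 and 100.
# Assumption: 1000 of 1372 respondents score 100 (from the text); the
# counts for the other four scores are hypothetical.
counts = {0: 120, 25: 70, 50: 82, 75: 100, 100: 1000}
scores = [level for level, k in counts.items() for _ in range(k)]

floor_share, ceiling_share = floor_ceiling_share(scores)
print(f"at floor: {floor_share:.1%}, at ceiling: {ceiling_share:.1%}")
```

A large share at either bound is a warning that the instrument cannot register further deterioration (floor) or improvement (ceiling) in those respondents.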
Since QoL outcome measures may not meet the distributional requirements (usually that the data have a normal distribution) of parametric methods of sample size estimation and analysis, NONPARAMETRIC METHODS are often used to analyse QoL data. Conventional methods of analysis of QoL outcomes are extensively described in Fairclough (2002), Fayers and Machin (2007) and Walters (2009). The papers by Walters, Campbell and Lall (2001), Walters, Campbell and Paisley (2001) and Walters and Campbell (2005) discuss alternative ways of determining sample size and analysing QoL outcomes, including the use of the proportional odds model for ordinal data and the nonparametric BOOTSTRAP computer simulation method. There are numerous QoL instruments now available (and these are extensively described in Bowling, 2001, 2004). By far the easiest way to assess QoL is to use an off-the-shelf instrument rather than designing your own. So how do you choose between the various QoL instruments? This fundamentally depends on the purpose of the study and the reliability, validity, responsiveness and practicality of the instrument. A belt and braces approach is recommended for the assessment of QoL, meaning both a generic and a condition-specific instrument should be used. The interested reader is referred to Fayers and Machin (2007) for comprehensive guidelines on assessing, analysing and interpreting QoL data. Alternatively, Fairclough (2002) concentrates more deeply on the design and analysis of QoL studies in clinical trials. Walters (2009) presents practical and pragmatic guidelines for the design (i.e. sample size estimation) and analysis of trials involving QoL measures. Finally, Bowling (2001, 2004) provides an extensive review of generic and disease-specific QoL measurement scales. SJW
Bowling, A. 2001: Measuring disease: a review of disease-specific quality of life measurement scales, 2nd edition. Buckingham: Open University Press. Bowling, A. 2004: Measuring health: a review of quality of life measurement scales, 2nd edition. Buckingham: Open University Press. Brazier, J. E., Harper, R., Jones, N. M. B., O'Cathain, A., Thomas, K. J., Usherwood, T. and Westlake, L. 1992: Validating the SF-36 health survey questionnaire: new outcome measure for primary care. British Medical Journal 305, 160-4. Fairclough, D. L. 2002: Design and analysis of quality of life studies in clinical trials. New York: Chapman & Hall. Fayers, P. M. and Machin, D. 2007: Quality of life: the assessment, analysis and interpretation of patient-reported outcomes, 2nd edition. Chichester: John Wiley & Sons, Ltd. Food and Drug Administration 2006: Guidance for industry: patient-reported outcome measures: use in medical product development to support labelling claims (draft). New York: Food and Drug Administration. Greenfield, D. M., Walters, S. J., Coleman, R. E., Hancock, B. W., Eastell, R., Davies, H. A., Snowden, J. A., Derogatis, L., Shalet, S. M. and Ross, R. J. 2007: Prevalence and consequences of androgen deficiency in young male cancer survivors in a controlled cross-sectional study. Journal of Clinical Endocrinology and Metabolism 92, 9, 3476-82. Medical Research Council 2009: Patient reported outcome measures (PROMs): identifying UK research priorities. Report of an MRC Workshop, 12 January 2009, Royal College of Physicians, London. London: Medical Research Council. Walters, S. J. 2009: Quality of life outcomes in clinical trials and health care evaluation: a practical guide to analysis and interpretation. Chichester: John Wiley & Sons, Ltd. Walters, S. J. and Campbell, M. J. 2005: The use of bootstrap simulation methods for determining sample sizes for studies involving health-related quality of life measures. Statistics in Medicine 24, 1075-102. Walters, S. J., Campbell, M. J. and Lall, R. 2001: Design and analysis of trials with quality of life as an outcome: a practical guide. Journal of Biopharmaceutical Statistics 11, 3, 155-76. Walters, S. J., Campbell, M. J. and Paisley, S. 2001: Methods for determining sample sizes for studies involving health-related quality of life measures: a tutorial. Health Services and Outcomes Research Methodology 2, 83-99. Walters, S. J., Munro, J. F. and Brazier, J. E. 2001: Using the SF-36 with older adults: cross-sectional community based survey. Age and Ageing 30, 337-43. Ware, J. E., Jr and Sherbourne, C. D. 1992: The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Medical Care 30, 473-83.
quantile-quantile (Q-Q) plots
See PROBABILITY PLOTS
quantile regression
This is a statistical regression method that models any specified QUANTILE (e.g. MEDIAN, first quartile, 90th percentile) of a continuous dependent variable given a set of EXPLANATORY VARIABLES. It is analogous to linear regression (see MULTIPLE LINEAR REGRESSION), which models the MEAN of the dependent variable instead. When applied to the median, quantile regression is known as median regression. Quantile regression has several appealing features, some of which are illustrated in the following four examples. Although inspired by real-life applications, all the examples presented are fictitious. In them, descriptions and interpretations are kept as concise as possible and may occasionally be simplistic. It is hoped that they may nevertheless facilitate understanding of the prominent features of quantile regression.
Example 1. Quantiles are of substantive interest. Forced vital capacity (FVC) measures the total volume of air one can exhale after a deep inhalation and is commonly used along with other indexes to evaluate lung function. Lower values of FVC may be indicative of some pulmonary disorder. In clinical practice it is of great interest to compare an individual's observed measure with reference values of normal. FVC is known to change physiologically along with age, sex and height. Reference values should therefore be age-, sex- and height-specific. The first figure (on page 371) shows a SCATTERPLOT of FVC against age measured on healthy, nonsmoking men. The lines depict the 5th percentile estimated by quantile regression (solid) and mean FVC by LINEAR REGRESSION (dashed). The lines are estimated for 1.8-m tall men. The 5th percentile line can be interpreted as follows: at any given age, 95% of healthy 1.8-m tall men are expected to measure above the line. Observed FVC measures that fall below are typically considered subnormal. Note that mean FVC estimated by linear regression is hardly of any interest in this context. FVC measures of perfectly healthy individuals are expected to fall above and below the mean line. Insofar as they are not too low, they should raise no suspicion. Quantile regression may estimate other percentiles of normal (e.g. the 1st, the 10th). Quantiles are of research interest in many other settings, which include, for instance, median lethal dose in toxicology, percentiles of seawater concentration of chemicals in environmental studies, median survival time in CLINICAL TRIALS and 90th
quantile regression Scatterplot of forced vital capacity against age (years) measured in 1000 fictitious individuals with the 5th percentile estimated by quantile regression (solid line) and the mean estimated by linear regression (dashed line)
percentile of the time from an emergency call to admission in a hospital in emergency medicine.
Example 2. Quantiles provide insight. Body mass index (BMI = weight/height-squared, in kg/m2) is often used when studying obesity. The second figure on page 372 shows a scatterplot of BMI against age in sedentary children (left-hand panel) and in children on a physical activity programme (right-hand panel). The solid lines in each panel represent from bottom to top the estimated 5th, 25th, 50th, 75th and 95th percentiles. At 10 years of age the distributions of BMI values in the two groups look similar. With ageing, however, the two distributions separate. The larger BMI values are estimated to grow higher in the sedentary population than in the active population. However, the lower 50% of the BMI values in the two populations seem not to be conspicuously impacted by a sedentary lifestyle. Indeed, the slopes estimated by quantile regression do not differ significantly between the two groups for any percentile below the median. Linear regression (not shown) would provide estimates for the slopes of mean BMI. They would show a diluted, average effect, which could allow but a partial understanding of the complex impact of the physical activity programme on BMI.
Example 3. Quantiles allow variable transformation. Uranium is a naturally occurring alpha-emitting radionuclide and a toxic heavy metallic element with carcinogenic potential. Groundwater concentrations of uranium are measured in the vicinity of a polluting source. The third figure shows the scatterplot of uranium concentrations (left-hand panel) and the logarithm transform of uranium (right-hand panel) against distance (miles) from the source. The solid lines represent the 5th, 50th and 95th percentiles of uranium and the dashed line its mean. Modelling the relationship between uranium and distance is simpler on the logarithm scale, where it is approximately linear, than on the untransformed scale. Quantile regression allows transformation of the dependent variable. The quantiles of uranium are estimated on the logarithmic scale and then transformed back to the untransformed scale. In the untransformed scale the estimated quantile curves are thus constrained to be positive, which is clearly desirable. In general, in linear regression the dependent variable should not be transformed, despite this being common practice. Inference on the transformed outcome would carry no information about the untransformed outcome, unless strong distributional assumptions were made. A direct application of linear regression to untransformed uranium, however, produces nonsensical negative estimates of mean uranium. The nonlinear relationship between mean uranium and distance should instead be modelled with some other appropriate method (e.g. splines, see SCATTERPLOT SMOOTHERS, and nonlinear regression methods). Even then, however, inference about the mean only may be unsatisfactory. In the presence of skewed distributions the mean may be highly affected by few
quantile regression Scatterplot of body mass index against age (years) in 500 fictitious sedentary children (left-hand panel) and 500 fictitious children on a physical activity programme (right-hand panel) with the 5th, 25th, 50th, 75th and 95th percentiles estimated by quantile regression (lines bottom to top)
unusually large values. Inference about a set of quantiles with quantile regression generally permits more complete inference.
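The back-transformation step in Example 3 works because quantiles, unlike means, are preserved by monotone transformations such as the logarithm. The sketch below illustrates this with a handful of hypothetical, right-skewed concentration values (they are not data from the example), using only the Python standard library.

```python
import math
import statistics

# Hypothetical right-skewed concentrations; an odd count is used so that
# the sample median is one of the observed values.
uranium = [0.8, 1.1, 1.5, 2.3, 3.9, 7.4, 45.0]
log_uranium = [math.log(u) for u in uranium]

# Median: estimate on the log scale, then transform back.
median_direct = statistics.median(uranium)
median_via_log = math.exp(statistics.median(log_uranium))

# Mean: the same recipe gives the geometric mean, not the arithmetic mean.
mean_direct = statistics.mean(uranium)
mean_via_log = math.exp(statistics.mean(log_uranium))

print("median:", median_direct, "back-transformed:", median_via_log)
print("mean:", mean_direct, "back-transformed:", mean_via_log)
```

The two medians agree up to rounding error and the back-transformed value is necessarily positive, whereas the back-transformed mean of the logarithms is smaller than the mean of the raw values.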
Example 4. Quantiles are robust to outliers and measurement error. Sample data may sometimes contain unusually large or unusually small values, often referred to as OUTLIERS. Outliers may occur because they are present in the population from where the sample is drawn or because of measurement errors. Both cases are extremely frequent in real applications. When outliers are present, the median may be a better summary statistic than the mean to assess the location of a distribution because, unlike the mean, it is largely unaffected by them. When the distribution of some variable given the independent variables has unusually large or small values, the median may be more efficient than the mean, in that it has more POWER and gives narrower CONFIDENCE INTERVALS. If, for example, the distribution is normal, then the median is less efficient than the mean; if it is exponential they are equally efficient; if it is a STUDENT'S t-DISTRIBUTION with 3 DEGREES OF FREEDOM the median is more efficient. The robustness to outliers and measurement error applies to quantile regression as well and makes it preferable to linear regression when these issues may be relevant. The fourth figure shows a scatterplot of weight against height. The solid line represents median regression and the dashed line linear regression. The two outliers affect linear regression but not median regression. The slope of the estimated median weight is statistically significantly different from zero, while the slope of mean weight is not. Further, the confidence interval for the estimated slope of the mean is about 50% larger than that of the median. Quantile regression models can be easily estimated with most of the widely available STATISTICAL SOFTWARE (e.g. SAS, Stata, R/Splus). For instance, suppose it is of interest to infer median blood pressure in two populations: females and males (or treatment and placebo, exposed to some risk factor and unexposed). Given sample data from each population, median regression provides confidence intervals and P-VALUES for the medians in the two populations and their difference. In this case, the median regression model would have blood pressure as the dependent variable and sex as a binary independent variable, just as linear regression would have the same variables to infer about mean blood pressure in the two populations.
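The two-population blood pressure example can be sketched without specialised software. As background not stated in the text, quantile regression estimates minimise the sum of 'check' losses, which for the median is half the sum of absolute residuals; with a single binary covariate the minimising intercept is the median of the group coded 0 and the slope is the difference between the two group medians. The data below are hypothetical, and `check_loss` and `total_loss` are illustrative names rather than functions from any package.

```python
import statistics


def check_loss(residual, tau=0.5):
    """Quantile (check) loss of one residual for the quantile level tau."""
    return residual * (tau - (1 if residual < 0 else 0))


def total_loss(intercept, slope, x, y, tau=0.5):
    """Sum of check losses for the model: quantile of y = intercept + slope * x."""
    return sum(check_loss(yi - intercept - slope * xi, tau) for xi, yi in zip(x, y))


# Hypothetical systolic blood pressures (mmHg); sex coded 0 = female, 1 = male.
# The value 190 plays the role of an outlier.
sex = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
bp = [112, 118, 121, 125, 190, 119, 126, 131, 134, 140]

females = [b for s, b in zip(sex, bp) if s == 0]
males = [b for s, b in zip(sex, bp) if s == 1]

# Median regression fit for a single binary covariate.
intercept = statistics.median(females)
slope = statistics.median(males) - statistics.median(females)

# Linear regression fit for the same design: difference of group means.
mean_difference = statistics.mean(males) - statistics.mean(females)

print("difference in medians:", slope)
print("difference in means:", mean_difference)
print("check loss at the fit:", total_loss(intercept, slope, sex, bp))
```

In this toy sample the single outlier reverses the sign of the difference in means but leaves the difference in medians untouched, which is the robustness property described in Example 4. Confidence intervals and P-values would in practice come from a statistical package or from the bootstrap.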
quantile regression Scatterplot of 1000 fictitious measures of groundwater uranium concentration against distance (miles) from a polluting source in the natural scale (left-hand panel) and in the logarithmic scale (right-hand panel) with the 5th, 50th and 95th percentiles estimated by quantile regression (solid lines bottom to top) and the mean estimated by linear regression (dashed line)
quantile regression Scatterplot of weight against height in 50 fictitious individuals with median weight estimated by quantile regression (solid line) and mean weight estimated by linear regression (dashed line)
On a technical note, quantile regression models are estimated by a simple and fast iterative algorithm, an adaptation of the simplex algorithm from mathematical linear programming. Confidence intervals and P-values may be obtained from large-sample approximations, rank-score test inversions or the BOOTSTRAP. The latter has been shown to perform better than the others and is generally recommended. Quantile regression is a flexible, insightful and efficient statistical regression method. Its extensions are an active area of research and include, for example, methods for estimation of conditional quantiles with censored data, clustered data (see CLUSTER RANDOMISED TRIALS and CLUSTERED BINARY DATA), LONGITUDINAL DATA and count data. Given its appealing features, quantile regression is becoming increasingly popular in biomedical research, and its use has been recommended in tutorials, commentaries and research articles of numerous reputed journals. For further details see, among others, Koenker and Hallock (2001), Cade and Noon (2003), Austin et al. (2005), Koenker (2005), Gillman and Kleinman (2007) and Bottai, Cai and McKeown (2010). MB
Austin, P. C., Tu, J. V., Daly, P. A. and Alter, D. A. 2005: Tutorial in biostatistics: the use of quantile regression in health care research: a case study examining gender differences in the timeliness of thrombolytic therapy. Statistics in Medicine 24, 791-816. Bottai, M., Cai, B. and McKeown, R. E. 2010: Tutorials in biostatistics: logistic quantile regression for bounded outcomes. Statistics in Medicine 29, 309-17. Cade, B. S. and Noon, B. R. 2003: A gentle introduction to quantile regression for ecologists. Frontiers in Ecology and the Environment 1, 412-20. Gillman, M. W. and Kleinman, K. 2007: Invited commentary: antecedents of obesity - analysis, interpretation, and use of longitudinal data. American Journal of Epidemiology 166, 14-16. Koenker, R. 2005: Quantile regression. New York: Cambridge University Press. Koenker, R. and Hallock, K. F. 2001: Quantile regression. Journal of Economic Perspectives 15, 143-56.
quantiles
Cut-points that split either a sample of ordered data, or a PROBABILITY DISTRIBUTION, into regions of pre-specified and often equal size. The MEDIAN is the simplest quantile, which splits the sample into two equal halves. Similarly tertiles, quartiles, quintiles, deciles and centiles (or percentiles) split the sample into respectively thirds, quarters, fifths, tenths and hundredths. There is one fewer cut-point than regions, numbered from the lower to the upper tail of the distribution. For example, the three quartiles split the sample into four equal regions: a quarter below the lower quartile, a quarter above the upper quartile and a quarter either side of the mid-quartile (or median). Sample quantiles are required in a PROBABILITY PLOT. There are several ways to calculate them: Hyndman and Fan (1996) recommend the median-unbiased estimator:

$$p_i = \frac{i - \frac{1}{3}}{n + \frac{1}{3}}$$

where $p_i$ is the cumulative probability corresponding to the $i$th quantile, i.e. the $i$th ordered value in a sample of size $n$.
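As a small illustration, the cumulative probabilities given by the estimator p_i = (i - 1/3)/(n + 1/3) can be computed directly. This is a minimal sketch; the function name `median_unbiased_positions` is hypothetical and not taken from any statistical package.

```python
def median_unbiased_positions(n):
    """Cumulative probabilities p_i = (i - 1/3) / (n + 1/3) for i = 1, ..., n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [(i - 1 / 3) / (n + 1 / 3) for i in range(1, n + 1)]


# Pair each ordered value of a hypothetical sample with its probability.
sample = sorted([4.1, 2.7, 9.3, 5.5, 3.8])
positions = median_unbiased_positions(len(sample))
for value, p in zip(sample, positions):
    print(f"{value:5.1f}  {p:.3f}")
```

The positions are symmetric about 0.5, so in a sample of odd size the middle ordered value is assigned a cumulative probability of exactly one half.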
The alternative epidemiological definition of quantiles is the regions themselves, so that with this definition quintiles, for example, correspond to the five regions of the distribution rather than the four cut-points. The two definitions are often confused. TJC (See also INTERQUARTILE RANGE and GROWTH CHARTS)
Hyndman, R. J. and Fan, Y. 1996: Sample quantiles in statistical packages. American Statistician 50, 361-5.
quantitative trait loci (QTL)
These are chromosomal locations of functional variants that affect continuous characteristics (e.g. height, blood pressure) or common diseases that are thought to have an underlying continuous liability (e.g. hypertension). The mode of inheritance of such traits is usually consistent with a polygenic model that assumes multiple small genetic and environmental effects. The term QTL is sometimes used to describe all the constituent loci in the polygenic model, but is more often restricted to the loci that have relatively major and therefore potentially detectable effects, while the rest are known collectively as the residual polygenic background. Recent developments in molecular genetics have made available multiple genetic markers throughout the genome and enabled the localisation and detection of individual QTL by linkage and association strategies. The most popular method of QTL linkage analysis is based on an extension of the variance components model for partitioning phenotypic VARIANCE into genetic and environmental components. Traditional variance components models in genetics rely on different genetic relationships having different extents of genetic sharing and therefore different magnitudes of correlation for the genetic component of the trait. For example, monozygotic twins share all their genes and have a genetic correlation of 1, whereas dizygotic twins share half their genes and have a genetic correlation of 0.5, so that a greater trait similarity between monozygotic twins than between dizygotic twins would suggest the presence of a genetic component. The extension involves introducing a COMPONENT OF VARIANCE that is correlated between relatives to the same extent as the proportion of alleles they share at the QTL (in the identity-by-descent sense). In other words, relatives who share both alleles at the QTL will be completely correlated for the effects of the QTL and similarly those sharing one or none of the alleles will have correlations of 0.5 and 0 respectively for the effects of the QTL. A genome scan for QTL linkage would involve estimating the extent of allele sharing between family members in a sample from marker genotype data and systematically testing each chromosomal location for a significant QTL component. Another approach to QTL linkage analysis is based on linear regression of some measure of trait similarity on the proportion of allele sharing for the relative pairs in a sample of families. The original method, proposed by Haseman and Elston, uses the square of the trait difference between relatives as the measure of trait similarity, but more recent work has shown that another definition based on a weighted sum of the squared difference and squared sum (of the mean-centred variables) is more powerful. The original regression approach was restricted to sibling pairs but it has been extended to general pedigrees. A major problem of QTL linkage analysis is that the POWER to detect a QTL decreases rapidly with decreasing QTL effect size. If random population samples are studied, a sample size of tens of thousands is required for adequate power to detect a QTL that accounts for as much as 10% of the trait variance. Selective genotyping on the basis of informativeness has been proposed as a method for reducing the cost of linkage studies: typically families are selected for genotyping only if they contain individuals at the extreme of the trait distribution. Both variance components and regression methodologies can be modified to deal with samples with selective genotyping. ASSOCIATION analysis is complementary to linkage analysis for the localisation and identification of QTL. Both family and unrelated designs are used for QTL association studies, the former providing for the possibility of a within-family test that is robust to population stratification. Typically, QTL association data are analysed by LINEAR REGRESSION or an extension of linear regression to family data that makes allowance for correlated data. A popular method assumes that the trait has a MULTIVARIATE NORMAL DISTRIBUTION within a family, where the means are determined by a linear function of allelic effects and the COVARIANCE structure is determined by degree of genetic relationship, and possibly local allele sharing, between relative pairs. PS
[See also GENETIC EPIDEMIOLOGY, TWIN ANALYSIS]
Sham, P. 1997: Statistics in human genetics. London: Arnold.
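The original Haseman and Elston regression mentioned in the entry above can be sketched in a few lines: the squared trait difference of each sibling pair is regressed on the proportion of alleles the pair shares at the locus. The sib-pair values below are hypothetical and chosen only for illustration, and `ols_slope_intercept` is an illustrative helper rather than a routine from a genetics package. As general background (not stated in the entry), a negative slope is the pattern expected under linkage, since pairs sharing more alleles should be more alike.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical sib pairs: (trait of sib 1, trait of sib 2,
# proportion of alleles shared identical by descent at the test locus).
pairs = [
    (10.0, 14.0, 0.0),
    (9.0, 12.5, 0.0),
    (11.0, 13.0, 0.5),
    (12.0, 10.5, 0.5),
    (8.0, 9.5, 0.5),
    (10.5, 11.0, 1.0),
    (13.0, 12.5, 1.0),
]

squared_difference = [(a - b) ** 2 for a, b, _ in pairs]
sharing = [p for _, _, p in pairs]

slope, intercept = ols_slope_intercept(sharing, squared_difference)
print("slope:", slope, "intercept:", intercept)
```

A real analysis would estimate the sharing proportions from marker genotype data and test the slope formally at each chromosomal location.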
quasi-likelihood
nus is a generalisation of the
(Ell.
ERAUSEDl.INEARlIODELSapproach. In genemliscd linear model
(GLM) methods, such as Gaussian, Poisson and LOGISTIC REGRESSION, the distribution of the response variable is assumed to be one of the EXPONENTIAL FAMILY of distributions. The unknown parameters of the model are estimated by maximising the LIKELIHOOD function. This likelihood function is based on a specification of the whole distribution of the response variable conditional on the covariates. However, it turns out that the MAXIMUM LIKELIHOOD ESTIMATES and the estimated VARIANCE-COVARIANCE MATRIX (and so STANDARD ERRORS) depend only on the first two moments (the MEAN and VARIANCE) of the distribution of the response conditional on the covariates. The quasi-likelihood approach makes use of this property of GLMs. Models are fitted in which only the link and variance functions (the functions that determine how the mean and variance of the response depend on the covariates) are specified, rather than the whole distribution of the response. This may be done even if the link and variance functions do not correspond to a member of the exponential family.
An example may make this clearer. Suppose that $Y_1, \ldots, Y_N$ are the numbers of tumours induced in N mice. We might model the association between the expected number of tumours, $E(Y)$, and some covariate of interest, $X$, using Poisson regression with the log-link function. Therefore, $\log E(Y) = \alpha + \beta X$, where $\alpha$ and $\beta$ are unknown parameters. This model includes the assumption that $\mathrm{Var}(Y) = E(Y)$. In many experiments, however, there is overdispersion, i.e. the variance of the response is greater than its expected value. This will lead to the standard errors of $\alpha$ and $\beta$ being underestimated and the TYPE I ERROR rate of any hypothesis test being inflated. Instead, one might adopt a quasi-likelihood approach. One possibility would be to assume again that $\log E(Y) = \alpha + \beta X$, but that $\mathrm{Var}(Y) = \phi E(Y)$, where $\phi$ is some unknown parameter. If $\phi = 1$, we have the POISSON DISTRIBUTION; if $\phi > 1$, there is OVERDISPERSION. Also, note that if $\phi > 1$, the response distribution does not belong to the exponential family.
A generalisation of quasi-likelihood consists of GENERALISED ESTIMATING EQUATIONS (GEEs). Whereas quasi-likelihood is for independent responses, GEEs allow responses to be correlated. This might be of use, for example, for repeated measures data (see REPEATED MEASURES ANALYSIS OF VARIANCE). For further details see McCullagh and Nelder (1983) and Diggle et al. (2002). SRS
Diggle, P., Heagerty, P., Liang, K.-Y. and Zeger, S. 2002: Analysis of longitudinal data, 2nd edition. Oxford: Oxford University Press. McCullagh, P. and Nelder, J. A. 1983: Generalized linear models. London: Chapman & Hall.
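The tumour example can be made concrete with a small sketch. The counts below are hypothetical (four unexposed and four exposed mice, invented for illustration), and estimating $\phi$ by the Pearson chi-square statistic divided by the residual degrees of freedom is one common choice rather than something the entry prescribes. The regression estimates are the same as for Poisson regression; only the standard error is inflated by $\sqrt{\hat\phi}$.

```python
import math

def fit_poisson_loglinear(x, y, n_iter=25):
    """Fit log E(Y) = a + b*x by Newton-Raphson (Poisson maximum likelihood).

    Returns a, b, the fitted means and the model-based standard error of b."""
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter + 1):
        mu = [math.exp(a + b * xi) for xi in x]
        # Information matrix entries for (a, b)
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Score vector for (a, b)
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Newton step: inverse information times score
        a += (i11 * s0 - i01 * s1) / det
        b += (i00 * s1 - i01 * s0) / det
    mu = [math.exp(a + b * xi) for xi in x]
    return a, b, mu, math.sqrt(i00 / det)

# Hypothetical data: x = 0 (unexposed) or 1 (exposed), y = tumours per mouse
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 8, 8, 0, 0, 16, 16]

a, b, mu, se_b_poisson = fit_poisson_loglinear(x, y)

# Pearson estimate of the dispersion parameter phi (2 regression parameters)
phi = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu)) / (len(y) - 2)

# Quasi-likelihood standard error: Poisson standard error scaled by sqrt(phi)
se_b_quasi = math.sqrt(phi) * se_b_poisson

print(f"b = {b:.3f}, phi = {phi:.2f}")
print(f"SE(b): Poisson {se_b_poisson:.3f}, quasi-likelihood {se_b_quasi:.3f}")
```

With these invented counts the estimated dispersion is well above 1, so the Poisson standard error would overstate the precision of the estimated effect.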
questionnaires
These are a means of collecting information from participants in a study. They are useful in many research settings and good design is paramount to ensure that results are informative. Questionnaires can be self-administered or interviewer-administered, in which case this might be done face to face or over the telephone. Information gained from interviewer-administered questionnaires is more complete and is usually thought to be more accurate, because the interviewer is able to provide additional guidance to the respondent. Interviewer-administered questionnaires must be used with respondents who are illiterate or semi-literate in the language of the questionnaire. Self-administered questionnaires, by way of contrast, are cheaper to use and are generally quicker for the respondent to complete. Self-administered questionnaires are often considered a more appropriate technique when questions are of a very sensitive nature, such as enquiries about illegal drug use.
When developing a questionnaire the precise issues of interest should be considered carefully and numbers of questions apportioned correspondingly. Time spent conducting a PILOT STUDY on a small sample from the target population is rarely wasted; analysis of the process and responses will highlight problems with timing, omission of questions or misunderstanding of instructions.
There are two major types of question that can be included in any questionnaire: open and closed. Open questions ask respondents to reply in their own words. For example, the following open question could be included in a survey about children's attitudes to smoking: 'How did you feel when you had your first cigarette?' Such questions have the advantage that the respondent is not influenced by the researcher's suggestions and is able to provide a more detailed reply. However, supplying such answers takes more time and effort on the part of the respondent and the process of coding answers is time consuming and can be complex. Closed questions provide a list of responses from which the individual chooses their answer(s). For example, the open question from earlier could be made into a closed question by supplying the following list of possible responses.
[ ] I felt grown up
[ ] I enjoyed it
[ ] I was disappointed
[ ] I felt ill
[ ] I felt guilty
[ ] Other ..............................
When using a closed question in a face-to-face interviewer-administered questionnaire, it is helpful to list possible responses on a flashcard so that the participant can easily see all the options. Closed questions must provide all possible responses or include a category entitled 'other', as shown here. Note that a question that includes the 'other' category (as in this example) is sometimes deemed 'semi-open'. A pilot study can be useful to determine popular responses in the 'other' category, which can then be included as defined options on the final questionnaire.
To standardise responses, categories should be qualified as far as possible. For example, use the descriptions on the right of the following frequencies, rather than those on the left, in response to the question 'How often do you eat chocolate?'
[ ] Not often        [ ] Less than once a week
[ ] Fairly often     [ ] Between one and seven times a week
[ ] Very often       [ ] More than once a day
A further possible mistake to avoid is to have categories that are not mutually exclusive, e.g. by asking a participant to indicate their age by ticking a box below:
[ ] Under 18
[ ] 18-25
[ ] 25-30
[ ] 30-40
[ ] 40-55
[ ] 55 or older
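The overlap in these age bands can be checked mechanically. The following is a minimal sketch: the function name, the inclusive whole-year reading of each band and the revised bands are assumptions made for illustration, not part of the original example.

```python
def ages_in_multiple_bands(bands, max_age=110):
    """Return every whole-year age that falls into more than one band.

    Each band is (lowest_age, highest_age), both inclusive;
    highest_age=None means the band is open-ended."""
    clashes = []
    for age in range(max_age + 1):
        hits = sum(
            1
            for low, high in bands
            if age >= low and (high is None or age <= high)
        )
        if hits > 1:
            clashes.append(age)
    return clashes

# Bands as printed on the questionnaire above
original = [(0, 17), (18, 25), (25, 30), (30, 40), (40, 55), (55, None)]
# One possible (hypothetical) revision with mutually exclusive bands
revised = [(0, 17), (18, 24), (25, 29), (30, 39), (40, 54), (55, None)]

clashes_original = ages_in_multiple_bands(original)
clashes_revised = ages_in_multiple_bands(revised)

print("Ambiguous ages, original bands:", clashes_original)
print("Ambiguous ages, revised bands:", clashes_revised)
```

A respondent aged exactly 25, 30, 40 or 55 could tick two boxes under the original wording; the revised bands leave no such age.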
When deciding on categories it can be helpful to ensure that they will be comparable with external data, such as ethnic groupings used by government bodies.
Scales are a specific type of closed question; two commonly used scales are the LIKERT SCALE and the VISUAL ANALOGUE SCALE. The Likert scale requires a participant to choose a response indicating their level of agreement with a statement. For example, the participant might be asked whether they agree with the statement 'I am restricted in my activities because of pain'. He or she would have to choose one of the responses that follow.
[ ] Strongly agree
[ ] Agree
[ ] Neither agree nor disagree
[ ] Disagree
[ ] Strongly disagree
The visual analogue scale requires participants to indicate their response on a continuous scale, marked at either end. For example, the respondent might be asked to indicate their level of pain following an operation with a cross on the following scale.
[Visual analogue scale: a horizontal line with the lower end labelled 'No pain']
Once the data have been collected the researcher must measure the distance from the lower end of the scale to convert the response into a score. For convenience, the scale is often 10 cm long and measurements are taken to the nearest millimetre.
It is helpful if questions are as specific as possible, so that instead of asking 'Do you have a car?' ask 'Is there a car or van available for private use by you or a member of your household?' It is also important that the wording of questions is not ambiguous. For example, the question 'Where do you live?' might elicit responses about geographical locations or types of accommodation. Hedges (1979) describes other ways in which the wording of questions affects responses. In answering question (a) in the following questionnaire, 82% of respondents replied that they take enough care of their health, whereas only 68% gave this response to question (b):
(a) Do you feel you take enough care of your health or not?
(b) Do you feel you take enough care of your health or do you think you could take more care of your health?
The description of the alternative to taking enough care of your health may have influenced respondents. However, it is also notable that the two options in question (b) are not necessarily mutually exclusive: a respondent may be aware that he or she could take more care of his or her health, but at the same time considers that he or she takes enough care.
A respondent must only be asked one question at a time. The enquiry 'Were you satisfied with the treatment you received in hospital and at home?' would be better split into two separate questions. Difficulties also arise when respondents are asked questions that are irrelevant to them. If a questionnaire distributed to an elderly population asks whether they get breathless when doing housework, an individual who never does any housework may answer 'no' for this reason. For some measurements, such as birth weight, both imperial and metric units are in common use. More accurate responses will result if the respondent is allowed to report on either scale and the conversion done at the analysis stage.
The layout of questionnaires is also important. The form should be easy to read, particularly for self-administered questionnaires; it helps if there is plenty of white space on each page. Questions on the same topic are best grouped together and 'transitions' such as 'We would now like to find out about the health of your family' are useful. If sections on a questionnaire are to be skipped by some respondents, then this should be made as clear as possible, perhaps by the use of arrows.
Sampling considerations are as important in questionnaire-based studies as in any others and as such it is vital to use a representative sample from the population to whom the results are to be applied. Nonresponse often introduces BIAS (see NONRESPONSE BIAS) and this can be a considerable problem when using self-administered questionnaires, particularly postal questionnaires. To minimise nonresponse, questionnaires should be kept concise and easy to read, and should not begin with personal or difficult questions that discourage the participant from starting. A covering letter explaining the reason for the research and the investigator's
credentials may also help motivate the respondent. Pre-paid return envelopes should be included with postal questionnaires and one or preferably two reminders sent to those who do not respond, enclosing further copies of the questionnaire and pre-paid return envelopes. The investigator must make every effort to ensure that names and addresses used are current.
Before a questionnaire is used the issues of validity and reliability should be addressed (see MEASUREMENT PRECISION AND RELIABILITY). Validity assesses whether a questionnaire measures what it intends to measure, while reliability evaluates the consistency of the questionnaire when it is administered repeatedly to the same individual. It is therefore important that the questionnaire is known to be valid and reliable before time and resources are invested in the study. McDowell and Newell (1996) give information on how validity and reliability can be assessed.
An important consideration before writing a new questionnaire is whether a suitable instrument already exists. The use of an existing questionnaire saves time in writing and piloting. Also, information may be available regarding validity and reliability, and results could be more comparable with those from other studies. Again, McDowell and Newell (1996) provide detailed descriptions of existing questionnaires for measuring aspects of health such as depression, pain and quality of life. SRC
Hedges, B. M. 1979: Question wording effects: presenting one or both sides of a case. The Statistician 28, 83-99. McDowell, I. and Newell, C. 1996: Measuring health: a guide to rating scales and questionnaires, 2nd edition. Oxford: Oxford University Press.
quota sample
Quota sampling is a nonrandom SAMPLING METHOD. Before the sample is chosen, the population is divided into groups according to certain characteristics, e.g. age, sex or smoking status. The interviewer is then told to interview a specified number of people within each group, but is given no instructions on how to find the people to interview. Quota sampling is often used in opinion polls and in market research surveys. Quota sampling has the advantage that it is quick and easy to do. Any member of the sample can be replaced with another member with the same characteristics, which is not the case in random sampling. A major disadvantage of quota sampling is that, as it is completely nonrandom, there is likely to be a great deal of BIAS in the selection process. The interviewer is more likely to approach people who are easy to question or who appear cooperative. It is also difficult to find out about those who do not cooperate, since they are replaced in the sample. However, if no sampling frame exists then quota sampling may be the only practical method of obtaining a sample.
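The mechanics described above can be sketched in a few lines. Everything in this example is hypothetical: the passers-by, the quota sizes and the function name are invented for illustration, and the order in which people are approached stands in for whatever the interviewer happens to do.

```python
def fill_quotas(people, quotas):
    """Interview people in the order approached until every quota is full.

    Returns the sample and the identifiers of people who refused;
    a refusal is simply replaced by the next willing person in that group."""
    remaining = dict(quotas)
    sample, refused = [], []
    for person in people:
        group = (person["sex"], person["smoker"])
        if remaining.get(group, 0) == 0:
            continue  # quota for this group is already full
        if not person["cooperates"]:
            refused.append(person["id"])
            continue
        sample.append(person["id"])
        remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas filled
    return sample, refused

# Hypothetical passers-by, in the order the interviewer approaches them
passers_by = [
    {"id": 1, "sex": "F", "smoker": False, "cooperates": True},
    {"id": 2, "sex": "M", "smoker": True, "cooperates": False},
    {"id": 3, "sex": "M", "smoker": True, "cooperates": True},
    {"id": 4, "sex": "F", "smoker": False, "cooperates": True},
    {"id": 5, "sex": "F", "smoker": False, "cooperates": True},
    {"id": 6, "sex": "M", "smoker": False, "cooperates": True},
    {"id": 7, "sex": "F", "smoker": True, "cooperates": True},
    {"id": 8, "sex": "M", "smoker": False, "cooperates": False},
    {"id": 9, "sex": "M", "smoker": False, "cooperates": True},
    {"id": 10, "sex": "F", "smoker": True, "cooperates": True},
]

# Hypothetical quotas: number of interviews wanted per (sex, smoker) group
quotas = {("F", False): 2, ("F", True): 1, ("M", False): 2, ("M", True): 1}

sample, refused = fill_quotas(passers_by, quotas)
print("Interviewed:", sample)
print("Refused and replaced:", refused)
```

The final sample matches the quotas exactly, yet nothing about the people who refused is carried into it, which is the source of the selection bias noted above.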
As an example, in a paper assessing the priorities for allocation of donor liver grafts (Neuberger et al., 1998), quota sampling was used to choose members of the general public to be included. The quota was designed so that the sample would be 'nationally representative' and included 1000 people aged 15 or above. It was based on quotas for sex, household tenure, age and work status. Quota sampling was also used to choose the regions from which the family doctors came, quotas being based on region, with one practitioner per practice. Within regions the selection of practices was random. For further details see Crawshaw and Chambers (1994). SLY
Crawshaw, J. and Chambers, J. 1994: A concise course in A level statistics, 3rd edition. Cheltenham: Stanley Thornes Publishers. Neuberger, J., Adams, D., MacMaster, P., Maidment, A. and Speed, M. 1998: Assessing priorities for allocation of donor liver grafts: survey of public and clinicians. British Medical Journal 317, 172-5.
R
R This free statistical software offers extensive data analysis and graphics facilities. R runs on many different computer operating systems (including Windows, MacOS and various forms of Linux), provides a wide range of statistical analyses and has powerful facilities for producing publication-quality graphics. R is used worldwide by researchers in both universities and industry (including the pharmaceutical industry). In addition to the predefined statistical analyses and graphics capabilities, R provides a fully featured programming language for manipulating data and for creating new analysis and graphics functions. This programming language is similar to the S language, so code that is written for S-Plus will often run in R without modification. The basic functionality of R can be extended by loading add-on packages, of which there are now several thousand available (see the Comprehensive R Archive Network, http://cran.r-project.org/). The default user interface for R is a command line, but a number of GUI interfaces are available via related software projects and add-on packages.
For more information on R, see the R homepage (http://www.r-project.org/), which has links to download sites, mailing lists, documentation, add-on packages and related software projects. The documentation on this website includes several book-length introductions to using R that can be downloaded for free. There are also many published books: Dalgaard's Introductory statistics with R (2008) provides an entry-level statistical text while Murrell's R graphics (2005) focuses on producing a variety of charts and figures. Venables and Ripley's Modern applied statistics with S (2002) provides a more sophisticated treatment. Some of Everitt and Rabe-Hesketh's Analyzing medical data using S-Plus (2001) and all of Everitt and Hothorn's A handbook of statistical analyses using R (2006) are also applicable. PM
Dalgaard, P. 2008: Introductory statistics with R, 2nd edition. New York: Springer. Everitt, B. and Hothorn, T. 2006: A handbook of statistical analyses using R. Boca Raton, FL: Chapman & Hall/CRC. Everitt, B. and Rabe-Hesketh, S. 2001: Analyzing medical data using S-Plus. New York: Springer. Murrell, P. 2005: R graphics. Boca Raton, FL: Chapman & Hall/CRC. Venables, W. N. and Ripley, B. D. 2002: Modern applied statistics with S, 4th edition. New York: Springer.
random effect This is one of a set of effects on a response variable corresponding to a set of values taken by an explanatory variable. Random effects are included in a regression model to acknowledge that response tends to differ between the groups defined by the explanatory variable. By including random effects, the investigator can estimate the MEAN level of response across groups and the extent to which response varies between groups, or estimate the effect of another variable of interest while controlling for the differences between groups. Typical examples of explanatory variables include indicator variables for hospitals in a national survey, centres in a multicentre study, patients in a longitudinal dataset or studies in a META-ANALYSIS.
If the variable defines k distinct groups in the dataset, e.g. if k hospitals are recruited for a survey, the k random hospital effects in the present survey are assumed to be drawn from a distribution of effects associated with the population of hospitals in general. It is common to assume random effects to be drawn from a NORMAL DISTRIBUTION. The variance of this distribution is estimated in the analysis and represents the extent of variation between the groups. For example, researchers may be interested in the variability between hospitals in admission rates or the variability between family doctors in prescribing lipid-lowering drugs.
Random effects are appropriate when the investigator wishes to estimate or control for the distribution of the group effects defined by the explanatory variable over the population of possible groups. When fitting random effects, the groups are assumed to be exchangeable, which means that the investigator has no reason to distinguish one group from another and would (in principle) be prepared to mix up the names of the groups in the dataset before carrying out the analysis. An alternative approach to modelling the differences between groups is to model these as FIXED EFFECTS. This is appropriate when the focus is on the group effects for the k specific groups included in the dataset, e.g. when estimating the response of patients to each of three treatments compared in a CLINICAL TRIAL. RT
random effects models This is essentially a synonym for MULTILEVEL MODELS and LINEAR MIXED EFFECTS MODELS. This term is commonly used to represent regression models that include both FIXED EFFECTS and RANDOM EFFECTS. RT/BSE
random effects models for discrete longitudinal data Discrete responses include categorical responses (e.g. dichotomous or ordinal) and integers (e.g. counts). We can use well-known models such as LOGISTIC REGRESSION and POISSON REGRESSION to model the effects of covariates on the responses. However, an extra complication with LONGITUDINAL DATA is that the responses on the same subject tend to be dependent over time. For instance, in the example considered later, some subjects are consistently more prone to thought disorder than others. While some of this dependence is due to subject-specific covariates that we can include in the model, some extra dependence usually remains even after controlling, or adjusting, for the covariates. This extra dependence, which may be due to omitted covariates, can be modelled using RANDOM EFFECTS. Our focus here is on random effects models for dichotomous responses but we briefly consider RANDOM EFFECTS MODELS for counts or incidence rates later.
For dichotomous and ordinal responses the most popular models are logistic regression and probit regression (see PROBIT MODEL). Let $y_{ij}$ be the measurement at occasion $i = 1, \ldots, n_j$ for subject $j = 1, \ldots, N$. The simplest random effects logistic regression model includes a subject-specific random intercept $\zeta_{0j}$ to model the dependence among the repeated measurements. Here the log of the odds of a '1' response versus a '0' response is modelled as:
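$$\ln\left(\frac{\Pr(y_{ij} = 1 \mid \mathbf{x}_{ij}, \zeta_{0j})}{\Pr(y_{ij} = 0 \mid \mathbf{x}_{ij}, \zeta_{0j})}\right) = (\beta_0 + \zeta_{0j}) + \beta_1 x_{1ij} + \cdots + \beta_p x_{pij},$$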
where $\mathbf{x}_{ij} = (x_{1ij}, \ldots, x_{pij})$ are covariates with regression coefficients $\beta_1$ to $\beta_p$, $\beta_0$ is the mean intercept and $\zeta_{0j}$ is the deviation of subject j's intercept from the MEAN. The covariates typically include time or functions of time to model changes in the log odds over time. The random intercept $\zeta_{0j}$ is assumed to be normally distributed with zero mean. Inclusion of $\zeta_{0j}$ allows the overall log odds of the response to vary between subjects, even after controlling for the covariates. Since $\zeta_{0j}$ remains constant over time, the log odds and therefore the PROBABILITY of a '1' response for a given subject is either greater than expected given the covariates at all occasions (if $\zeta_{0j} > 0$) or smaller than expected (if $\zeta_{0j} < 0$), producing the required within-subject dependence. The random intercept can be interpreted as the component of the effects of all omitted covariates on the log odds that is constant over time and uncorrelated with the included covariates.
The random intercept logistic regression model can equivalently be expressed in terms of a latent (unobserved) response $y_{ij}^*$ (see LATENT VARIABLES) underlying the observed response $y_{ij}$, where $y_{ij} = 1$ if $y_{ij}^* > 0$ and $y_{ij} = 0$ otherwise. The logistic regression model becomes a linear regression model for the latent response:
$$y_{ij}^* = \beta_0 + \beta_1 x_{1ij} + \cdots + \beta_p x_{pij} + \zeta_{0j} + \epsilon_{ij},$$
where $\epsilon_{ij}$ is independent of $\zeta_{0j}$ and has a logistic distribution with mean zero and VARIANCE $\pi^2/3$.
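The strength of the residual within-subject dependence can be expressed by the intraclass CORRELATION for repeated latent responses $y_{ij}^*$ and $y_{i'j}^*$:
$$\rho \equiv \mathrm{Cor}(y_{ij}^*, y_{i'j}^* \mid \mathbf{x}_{ij}, \mathbf{x}_{i'j}) = \mathrm{Cor}(\zeta_{0j} + \epsilon_{ij}, \zeta_{0j} + \epsilon_{i'j}) = \frac{\mathrm{Var}(\zeta_{0j})}{\mathrm{Var}(\zeta_{0j}) + \pi^2/3}.$$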
The random intercept model assumes that the log odds change in the same way over time for all subjects with the same covariate values. Since this may be unrealistic, we can allow the linear growth or rate of change of the log odds to vary randomly between subjects by including a random slope $\zeta_{1j}$ of time in the model, giving the random coefficient model:
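$$\ln\left(\frac{\Pr(y_{ij} = 1 \mid \mathbf{x}_{ij}, \zeta_{0j}, \zeta_{1j})}{\Pr(y_{ij} = 0 \mid \mathbf{x}_{ij}, \zeta_{0j}, \zeta_{1j})}\right) = (\beta_0 + \zeta_{0j}) + (\beta_1 + \zeta_{1j}) x_{1ij} + \beta_2 x_{2ij} + \cdots + \beta_p x_{pij},$$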
where $x_{1ij}$ represents the time at measurement occasion i for subject j, $\beta_1$ is the mean slope and $\zeta_{1j}$ is the deviation of subject j's slope from the mean slope. The random intercept $\zeta_{0j}$ and slope $\zeta_{1j}$ are typically assumed to have a BIVARIATE NORMAL DISTRIBUTION with zero means. The model can be extended by including further random effects for other variables, for instance polynomials in time to model variability in nonlinear growth.
MAXIMUM LIKELIHOOD ESTIMATION is the state-of-the-art method for estimating RANDOM EFFECTS MODELS for discrete data. The marginal LIKELIHOOD is obtained by 'integrating out' the random effects. When the random effects are multinormal, the integration is typically performed using Gaussian quadrature or the superior adaptive quadrature approach (e.g. Rabe-Hesketh, Skrondal and Pickles, 2005). Computationally efficient but rather crude approximations such as penalised QUASI-LIKELIHOOD or marginal quasi-likelihood are commonly used. Sometimes the distribution of the random effects is left unspecified and nonparametric maximum likelihood is used (e.g. Aitkin, 1999).
The Madras Longitudinal Schizophrenia Study followed up patients monthly after their first hospitalisation for schizophrenia. We will use random effects logistic regression to investigate whether the course of illness differs between patients with early and late onset. The variables considered are: [Month]: number of months since first hospitalisation; [Early]: early onset (1: before age 20, 0: at age 20 or later); [y]: repeated measures of thought disorder (1: present, 0: absent). The first table contains a subset of the data, namely on whether thought disorder [y] was present or not for 44 female patients at 0, 2, 4, 6, 8 and 10 months after hospitalisation.
random effects models for discrete longitudinal data Data on thought disorder
Columns: j, early, y0, y2, y4, y6, y8, y10
I 6 10 13 14 15 16 22 23 25 27 21 31
0
I 0 I 0 I I I I 0 0 I 0 I I I I 0 I I 0 0 0 I I I 0 I 0 I 0 0 0 I I I I 0 0 I I I 0 I I
1 0 0 0 0 0 0 0
I I 0
I
34
0
36 43 44 45 46 48 50 51 52 53 56 57 59 61 62 65
I
66
67 68 71 72 75 76
0 0 0 0 0 0
I 0 0
I 0 0 0
I 1 0 0 0 0 0
I 0
77
I
79 80 85 86
0 0 1 0 0 0
17 90
MaximlDll
72 I
..
,.
,)'ao
)"
.Y4i
I 0 0 0 I 0 0 I 0 0 I
0 0 0 0 I 0 I 0 I 0 I
I I I
1 0
1
0 0 I I
0 0 0 0 0 I 0 0 I 0 I 0 0 0 0 0 0 0 0 0
0 0
0 0 0 I 0 0 I 0 I I 0 0 0 0 0 0 0 0 0 0 I I 0 0
0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0
1
0
0
0
1 1
I 1
0
I
0 0
0 0
0
0
I
1
0 0
0 0
0
0 1 0
1 I 0
I 0 0
I
0 0 0 0
I 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
I
0
I I I 0 0
I 1 0 0 0 1 0
I 0 0
I 1 1 0 0
0
0 0 I 1 0 I I 0
0 I 0 0 0 1 0
I
I
I
Maximum likelihood estimates based on adaptive quadrature are given in the second table (estimates were obtained using gllamm: http://www.gllamm.org). For the random intercept model the odds of thought disorder decrease over time in late-onset women with an estimated subject-specific ODDS RATIO of exp(-0.40) = 0.67 per month. Early-onset patients do not seem to have a higher odds of thought disorder at first hospitalisation (estimated subject-specific odds ratio exp(0.05) = 1.05). Early-onset patients appear to have a greater decline in their odds of thought disorder over time.
The odds ratios should be interpreted as conditional on the random intercept, sometimes referred to as subject-specific. For time, the odds ratio can be viewed as a within-subject comparison and, for subject-specific covariates such as early onset, the comparison is between two patients having the same value of the random intercept. The intraclass correlation of the latent responses is estimated as $\rho$ = 0.46.
random effects models for discrete longitudinal data Repeated measures of thought disorder: maximum likelihood estimates for dichotomous logistic regressions with random intercept and with random intercept and random slope for [Month]
                                          Random intercept model    Random coefficient model
                                          Est (SE)                  Est (SE)
Fixed part
  $\beta_0$ [Cons]                        1.01 (0.46)               1.36 (0.66)
  $\beta_1$ [Month]                       -0.40 (0.08)              -0.51 (0.13)
  $\beta_2$ [Early]                       0.05 (0.88)               0.04 (1.21)
  $\beta_3$ [Early] x [Month]             -0.07 (0.14)              -0.06 (0.21)
Random part
  Var($\zeta_{0j}$)                       2.76 (1.24)               7.17 (4.01)
  Var($\zeta_{1j}$)                                                 0.14 (0.09)
  Cov($\zeta_{0j}$, $\zeta_{1j}$)                                   -0.71 (0.53)
Log-likelihood                            -124.75                   -121.20
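As a check on how the quantities quoted in this example follow from the tabulated estimates, the arithmetic can be reproduced directly. This is only a sketch using the rounded values printed in the table; the variable names are illustrative.

```python
import math

# Random intercept model: estimates as printed in the table
beta_month = -0.40      # [Month]
beta_early = 0.05       # [Early]
var_intercept = 2.76    # variance of the random intercept

# Subject-specific odds ratios are exponentiated coefficients
or_month = math.exp(beta_month)
or_early = math.exp(beta_early)

# Intraclass correlation of the latent responses: the logistic error
# term has variance pi^2 / 3
icc = var_intercept / (var_intercept + math.pi ** 2 / 3)

# Random coefficient model: correlation between random intercept and slope
var_int_rc, var_slope_rc, cov_rc = 7.17, 0.14, -0.71
corr = cov_rc / math.sqrt(var_int_rc * var_slope_rc)

print(f"Odds ratio per month: {or_month:.2f}")
print(f"Odds ratio for early onset: {or_early:.2f}")
print(f"Intraclass correlation: {icc:.2f}")
print(f"Intercept-slope correlation: {corr:.2f}")
```

The correlation computed from the rounded table entries comes out at about -0.71, marginally different from the -0.70 reported in the entry, presumably because the entry used unrounded estimates.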
Estimates for the random coefficient model are also reported in the second table. The random slope variance is estimated as 0.14 and the COVARIANCE between the intercept and slope as -0.71, corresponding to a correlation of -0.70. Therefore those at higher risk of thought disorder at the time of hospitalisation experience a greater reduction in their risk over time than those at lower risk. It is important to note that the random intercept variance and the correlation between the random intercept and coefficient are interpreted at [Month] = 0. (Subtracting 5 months from [Month] yields an estimated correlation close to zero.)
To gain more insight into the model, we have plotted the conditional or subject-specific probabilities of thought disorder given various values of the random intercept (±3) and slope (±0.4) for women with early onset. These are shown as dashed curves in the figure, where the dotted curve is the conditional PROBABILITY for random intercept and slope both equal to their population means of zero, thus representing a 'typical' individual. Also shown as a solid bold curve is the population average or marginal probability of thought disorder obtained by integrating the conditional probability over the random effects distribution. Note that the population average curve is considerably flatter than that of a typical patient. Such attenuation of the effects of covariates in marginal models compared with conditional models is a well-known phenomenon for dichotomous responses (see GENERALISED ESTIMATING EQUATIONS).
[Figure: predicted probability of thought disorder plotted against Month; curves labelled Marginal, fixed part and Conditional]
random effects models for discrete longitudinal data Conditional and marginal predicted probabilities of thought disorder for women with early onset. Dotted curve is for conditional probability from the random coefficient model when both the random intercept and slope are zero
For counts or incidence rates, the most common model is Poisson regression. As for the random effects logistic model, RANDOM INTERCEPT MODELS and different kinds of random coefficient models can be specified. Note that the marginal or population-averaged effects equal the conditional or subject-specific effects for the random intercept Poisson model.
Random effects models for discrete longitudinal data can also be used for multilevel designs with more than two levels, for instance where repeated measures are nested in patients who are nested in hospitals. Rabe-Hesketh and Skrondal (2008) give a general treatment of random effects models for discrete data. SRH/AS
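The attenuation of the marginal curve can be illustrated numerically. The sketch below is a simplification: it uses the random intercept model estimates for women with early onset (the published figure is based on the random coefficient model), and it integrates over the random intercept with a plain equally spaced grid rather than the adaptive quadrature mentioned in the entry. Function names and grid settings are assumptions.

```python
import math

def expit(v):
    """Inverse logit: converts log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-v))

def marginal_probability(linear_predictor, variance, n_points=2001, width=8.0):
    """Average expit(linear_predictor + z) over z ~ N(0, variance).

    Uses an equally spaced grid over +/- width standard deviations,
    with weights proportional to the normal density."""
    sd = math.sqrt(variance)
    step = 2 * width * sd / (n_points - 1)
    total, weight_sum = 0.0, 0.0
    for k in range(n_points):
        z = -width * sd + k * step
        w = math.exp(-0.5 * (z / sd) ** 2)
        total += w * expit(linear_predictor + z)
        weight_sum += w
    return total / weight_sum

# Random intercept model estimates from the second table
b_cons, b_month, b_early, b_inter = 1.01, -0.40, 0.05, -0.07
var_intercept = 2.76

def log_odds_early_onset(month):
    """Fixed part of the log odds for a woman with early onset."""
    return b_cons + b_early + (b_month + b_inter) * month

conditional = {m: expit(log_odds_early_onset(m)) for m in (0, 10)}
marginal = {
    m: marginal_probability(log_odds_early_onset(m), var_intercept)
    for m in (0, 10)
}

for m in (0, 10):
    print(f"Month {m:2d}: conditional (zero random intercept) "
          f"{conditional[m]:.3f}, marginal {marginal[m]:.3f}")
```

At both time points the marginal probability lies closer to 0.5 than the probability for a 'typical' patient, so the marginal curve falls less steeply over the ten months.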
Aitkin, M. 1999: A general maximum likelihood analysis of variance components in generalised linear models. Biometrics 55, 117-28. Rabe-Hesketh, S. and Skrondal, A. 2008: Multilevel and longitudinal modeling using Stata, 2nd edition. College Station, TX: Stata Press. Rabe-Hesketh, S., Skrondal, A. and Pickles, A. 2005: Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics 128, 301-23.
random forest This is a nonparametric method for regression, classification and survival analysis (see SURVIVAL ANALYSIS - AN OVERVIEW), which works especially well for explanatory variables with nonlinear influence on the response, in the presence of higher order interactions and for HIGH-DIMENSIONAL DATA (Breiman, 2001). The method is motivated by the idea of growing an ensemble of trees on BOOTSTRAP samples of the original data. Predictions are computed by averaging the predictions of multiple trees (regression) or by a majority vote of class predictions (classification). Using the out-of-bootstrap observations, random forests compute an honest estimate of the generalisation error, which can be used to assess the model fit. Empirically, the algorithm has been found to be rather insensitive to the choice of hyperparameters. A drawback is that the fitted models are hard to interpret. Variable importance measures are commonly used to derive a ranking of the explanatory variables with respect to their influence on the response. However, these measures have been questioned by various authors and remain a matter of debate. Applications of random forests in medicine and biology include large-scale
ASSOCIATION studies for complex genetic diseases, e.g. to detect SNP-SNP (single-nucleotide polymorphism) interactions in the case-control context by means of computing a random forest variable importance measure for each polymorphism. Prediction of PHENOTYPES based on amino acid or DNA sequence is another important area to which random forests have been applied. TN
(See also BOOSTING)
Breiman, L. 2001: Random forests. Machine Learning 45, 5-32.
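The three ingredients described in this entry (bootstrap samples, a majority vote and an out-of-bootstrap error estimate) can be sketched with a deliberately simplified ensemble. This is not Breiman's algorithm: each 'tree' here is a single split (a stump) chosen from a random subset of features and a fixed grid of thresholds, whereas a real random forest grows full trees and samples features at every split. The simulated data and all names are assumptions.

```python
import random

def fit_stump(rows, labels, features, thresholds):
    """Choose the single split with the fewest training errors."""
    best = None
    for f in features:
        for t in thresholds:
            left = [y for x, y in zip(rows, labels) if x[f] <= t]
            right = [y for x, y in zip(rows, labels) if x[f] > t]
            # Majority class on each side of the split
            left_label = int(2 * sum(left) > len(left))
            right_label = int(2 * sum(right) > len(right))
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

def forest_oob_error(rows, labels, n_trees=200, mtry=2, seed=1):
    """Grow stumps on bootstrap samples; score each observation only
    with the stumps whose bootstrap sample did not contain it."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    thresholds = [k / 10 for k in range(1, 10)]
    votes = [[0, 0] for _ in range(n)]
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        features = rng.sample(range(p), mtry)           # random feature subset
        stump = fit_stump([rows[i] for i in in_bag],
                          [labels[i] for i in in_bag], features, thresholds)
        for i in set(range(n)) - set(in_bag):           # out-of-bootstrap cases
            votes[i][predict_stump(stump, rows[i])] += 1
    scored = [(v, y) for v, y in zip(votes, labels) if sum(v) > 0]
    errors = sum(int(v[1] > v[0]) != y for v, y in scored)  # majority vote
    return errors / len(scored)

# Simulated data: feature 0 determines the class, features 1 and 2 are noise
data_rng = random.Random(0)
rows = [[data_rng.random() for _ in range(3)] for _ in range(150)]
labels = [int(x[0] > 0.5) for x in rows]

oob_error = forest_oob_error(rows, labels)
print(f"Out-of-bootstrap error estimate: {oob_error:.3f}")
```

Because each observation is scored only by stumps that never saw it, the error estimate needs no separate test set.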
random intercept model See LINEAR MIXED EFFECTS MODEL
random intercept and slope model See LINEAR MIXED EFFECTS MODEL
randomisation This is the pnx:ess by which palienlS should be assigacd lo IrCallDenlS in a CLIMCAL TRIAL AI die oUlId we should CODlrasI nndom assignment (randomisation) with nuacIom samplilll: the lauer is the process by which weselcct individuals to lab pari in ourcxperimcnl and forms the basis of our concluding thai the rcsullS apply to a broad popuIaIiaa (sec 5AMJILINO MS11IODS-ANOVEIlVIEW).lfwe ~ to Reruit oaly ,oung healthy males to a clinieal biaI or any odler rorm ofraean:h. it would be unn:asonable toexpccl our n:sults 10 apply 10 a braad populatioD of men. women. adulls and children. UnrortUDlllCly. random sampling is almast DCYcr possible in the canlext of medical raearch although we may still be able to make: n:uonabIe MrCRIICC to a fUlUM population lo which our n:sullS may apply. Ultlc man: wiD be said heM about random samplilll. The key n:asDIl for USing randomisalion 10 decide which patienl n:ceiws whichlRabnc:nI is loeliminale BIAS (Altman and Bland. 1999). Bias may obviously be introduced delibcnIIely but il CD also be intnxluccd inad~ntly. $cleating healthier palieD15 oa: those with fewer - or less IC'VCM symptoms lOn:cei~ODC balmenll1llhcrthD aDDlhcrwould pmbably inRIICDCC die aulcomc fnHD a clinicallrial sach that we might incom:ctly infer thai one balmenl was beller al n:licviDg symptoms Ihan aDDlhcr. From Ibis it should be clear dial DOl only musl the sequence be random but those Msponsible fo.. recruiting palicDts must not know whal the sequence is - oIhcrwiIC it would still be possible to cDrol thc "milder' palicDIS on lo one tn:atmcnt ann and the 'more ICVCM· paliCDIS on lo the other. This is also a MasOn agaiDst 'alternate' allacation of one IRaImcnt followcd by the other (Chalmers. 1999). 'I1Ie SlRngth or a clinicallrial should be that causality CaD be infem:d because the lwo (or more) lIabnc:nt gmups 8M balanced - alleast in Ihc probabilislic seDse - for all factors cuept the tn:abDcDt ftlCCived.
The table contains an extract of the table of random digits published by Altman (1991, p. 540). There are many ways of using such a table to assign patients randomly to two treatment groups. The simplest approach might be to assign even numbers to receive treatment and odd numbers to receive placebo, or control. Reading across the first row, the 1st patient is assigned number 47 and so receives placebo; the 2nd patient, assigned number 44, would receive active treatment; the 3rd, 4th, 5th and 6th (assigned 76, 60, 72 and 56 respectively) all receive active treatment; the 7th patient (99) receives placebo and so on. We could split the numbers up so that the 1st patient is assigned number 4 and the 2nd number 7 (receiving active and placebo treatments respectively), and the next patients are assigned 4, 4, 7,
6, 6, etc.

randomisation Random numbers
[Table of two-digit random numbers (10 rows of 10), extracted from Altman (1991); row 1 begins 47 44 76 60 72 56 99]
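The even/odd rule and the digit-splitting variant described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the only table entries used are the seven numbers from row 1 that are quoted in the text.

```python
def assign_even_odd(numbers):
    """Even number -> active treatment, odd number -> placebo."""
    return ["active" if n % 2 == 0 else "placebo" for n in numbers]


def split_digits(numbers):
    """Split two-digit table entries into single digits (47 -> 4, 7)."""
    digits = []
    for n in numbers:
        digits.extend(divmod(n, 10))
    return digits


# Start of row 1 of the random number table, as quoted in the text.
first_row = [47, 44, 76, 60, 72, 56, 99]

print(assign_even_odd(first_row))                 # one patient per two-digit number
print(assign_even_odd(split_digits(first_row)))   # one patient per single digit
```

Whichever rule is chosen, it has to be fixed before the table is read; the code simply applies a rule that has already been decided.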
Alternatively, we could choose to read down the columns instead of across the rows. Any one of these procedures is perfectly valid and will result in an unbiased and (in the long run) balanced assignment between the two treatment groups, provided that the rule for using the random number digits is set out in advance. If we had three treatment groups then we could use the same tables but use the numbers 1, 2, 3 for assignment to treatment one; 4, 5, 6 for assignment to treatment two; and 7, 8, 9 for assignment to treatment three or placebo. An occurrence of a zero is ignored and the next digit used instead. Other rules can be established for assignment to any number of treatments. This is referred to as 'simple randomisation'. We shall make reference to this table of random digits later to illustrate different forms of randomisation.

Reference was made earlier to 'balanced' treatment groups. Balance is one of the most important aspects of a clinical trial because it ensures that we are making a fair comparison between the treatments and neither one nor the other is predisposed to showing a better or worse response. There are three different aspects to balance that are useful to discuss: ensuring the same number of patients receive each
of the treatments; ensuring that demographic data and disease severity data are similar between the two treatment groups; and ensuring that factors unknown to the experimenter but that may nevertheless influence outcome are also balanced between the treatment groups. Randomisation is quite a remarkable tool: not only does it balance the treatment groups for all the known important prognostic factors but it also balances for any factors that may be important but we may not know about.

In the example given using line 1 of the table, among the first 10 patient assignments only two of them are odd numbers and so would receive the placebo (patient numbers 1 and 7). Superficially, this seems to be a concern although, in fact, if we randomise a sufficiently large number of patients using the full random number table then we should, on average, assign an equal number of patients to each of the treatment arms. Trials of only 10 patients are extremely rare and would probably not be very convincing whatever the results. However, in a relatively small study, perhaps of some complex surgical technique requiring a lot of skill on the part of the surgeon, we might be concerned if so few of the early patients were assigned to PLACEBO (or treatment). It is quite possible that the skills of the surgeon might improve over time and so the overall outcome (regardless of treatment group) might tend to improve as the trial progresses. Therefore having an imbalance in the number of patients assigned to one or other treatment very early on could introduce a bias between the treatment groups. For this reason, we often use blocking to ensure that at regular intervals the number of patients assigned to each treatment group is the same. In the simplest case of two treatment groups, we may use block sizes of 4. This would mean that within every group of four patients, two are assigned to treatment and two are assigned to placebo.
We would not know which two patients receive either treatment or placebo, neither would it be the same two in every block of four treatment assignments. When we have two treatments (call them A and B) and assignment is in blocks of size four, there are six possible configurations of each block: AABB, ABAB, ABBA, BBAA, BABA and BAAB. Each of these blocks should be equally likely to occur. With two treatments, the block size does not have to be four but it could be any multiple of the number of treatments. Similarly, if there are four or more treatments then the block size might be anything from twice the number of treatments upwards. The advantage of using blocks should be quite clear: not only will the total number of patients assigned to each treatment be the same at the end of the study but also, on a regular basis (perhaps once every four patients), the number assigned to each treatment will also be the same. These types of processes are referred to as 'blocked randomisation' or 'restricted randomisation'.
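As a minimal sketch of blocked randomisation, the Python below enumerates the balanced blocks for two treatments in blocks of four and then builds an allocation sequence by drawing blocks at random, each with equal probability. The function names, the seed argument and the use of Python's built-in pseudo-random generator (rather than a printed table) are assumptions made for illustration.

```python
import random
from itertools import permutations


def possible_blocks(treatments="AB", block_size=4):
    """All distinct orderings of one block with equal numbers per treatment."""
    if block_size % len(treatments) != 0:
        raise ValueError("block size must be a multiple of the number of treatments")
    per_arm = block_size // len(treatments)
    pool = "".join(t * per_arm for t in treatments)
    return sorted({"".join(p) for p in permutations(pool)})


def blocked_sequence(n_blocks, treatments="AB", block_size=4, seed=None):
    """Concatenate randomly chosen balanced blocks, each equally likely."""
    rng = random.Random(seed)
    blocks = possible_blocks(treatments, block_size)
    return "".join(rng.choice(blocks) for _ in range(n_blocks))


print(possible_blocks())          # the six configurations listed in the text
print(blocked_sequence(5, seed=2011))
```

After every fourth patient the two arms contain the same number of patients, which is the property that blocking is intended to guarantee.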
There can be disadvantages to blocking since it can compromise blinding and so allow bias to be introduced. Consider an extreme situation where there are two treatments and the block size is two. If, for some reason, the assignment of the first patient were to be known (possibly because of typical adverse reactions or even some inadvertent unblinding of the treatment), then the identity of the next treatment is necessarily known. This is typically why a block size equal to the number of treatments would not be used. Even if the block size is twice the number of treatments, if one of the early patients in the block is unblinded, then still the probability of future assignments will not be 0.5. Such potential unblindings are arguments against blocking, or at least in favour of longer blocks, but the longer the block the less the balance on a regular basis and so a compromise has to be found. One strategy that is sometimes used is to vary the block length, perhaps between blocks of four treatments and blocks of six treatments. If the fifth patient were to be unblinded then the investigator would not know if this is the penultimate patient in a block of size six or if it were the first patient following a block of size four. This strategy can greatly help to eliminate possible biases due to unblinding but does increase complication in packing treatment.

This strategy of blocking ensures that there is balance on a regular basis through the duration of the trial, but a further feature upon which balance may be desired is that of demographic or disease severity factors. In the simplest case, consider that sex is a factor highly prognostic of treatment outcome so it is important to ensure that there is not an imbalance of men or women assigned to one or other treatment. Simple randomisation as just described should ensure that this is the case but only in a long-term or probabilistic sense and in any particular trial, if there were some imbalance,
then this might bring into question the validity or reliability of the results. Stratification is a very simple mechanism that uses different randomisation sequences for the different strata on which treatment balance is necessary. Using the random digits in the table, we may decide that the first five rows should be used for assigning men to either treatment (even numbers) or control (odd numbers) and that the second five rows should be used for similarly assigning women to either treatment or control. A common misconception is that stratification ensures equal numbers of men and women on each treatment, but considering random sampling, described at the beginning of this section, this is not necessarily the case. The proportion of men and women in the target population with the disease that is being studied may not be equal. The proportion of men and women in the target population who are prepared to take part in the clinical trial may not be equal. Stratification ensures that of the men who take part in the trial, half of them will be assigned to treatment and half to control and that of the women who
take part in the trial, similarly, half will receive treatment and half will receive control. Considering our earlier discussion concerning blocking, it will be evident that the equal allocation to treatment and placebo of the men (and of the women) will only be the case in the long term and it is quite common within stratified randomisation to also include blocking. This introduces very little extra complexity. One practicality of preparing stratified randomisation sequences is that, instead of one sequence being prepared for the entire trial, two or more sequences are prepared (one for each stratum). The methods used to introduce blocking into the single sequence used for a trial without stratification are simply replicated in each of the stratum-specific randomisation sequences.

In MULTICENTRE TRIALS, the most common feature of the randomisation scheme is stratification by investigator or centre. This ensures that each centre uses all of the treatments and there is no risk that, within any particular centre, all patients might receive only one of the treatments. It is quite possible to stratify on more than one factor. The simplest approach where this applies is in a multicentre study where there is one (noncentre) factor on which stratification is needed. Different randomisation sequences are prepared for each centre so that we have stratification by centre. In fact, each centre would be given two randomisation sequences, e.g. one to be used for males and one to be used for females. Such a study then has two stratification factors (centre and gender), although, in practice, each investigator would only see and use one stratification factor, that of gender. When there are two or more important prognostic factors that both need to be balanced for, then stratification begins to become rather more complex.
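Stratified randomisation with blocking amounts to preparing one blocked sequence per stratum. The sketch below does this for a hypothetical two-centre trial stratified by centre and gender; the centre names, the block size of four and all function names are assumptions made for illustration, and a pseudo-random generator stands in for the printed table.

```python
import random
from itertools import product


def permuted_block_sequence(n_blocks, rng, arms=("treatment", "control"), block_size=4):
    """One blocked sequence: every block holds each arm equally often, in random order."""
    per_arm = block_size // len(arms)
    sequence = []
    for _ in range(n_blocks):
        block = list(arms) * per_arm
        rng.shuffle(block)          # each balanced ordering is equally likely
        sequence.extend(block)
    return sequence


def stratified_sequences(strata, n_blocks, seed=None):
    """A separate, independently generated blocked sequence for every stratum."""
    rng = random.Random(seed)
    return {stratum: permuted_block_sequence(n_blocks, rng) for stratum in strata}


# Hypothetical strata: two centres crossed with gender gives four sequences.
strata = list(product(["centre 1", "centre 2"], ["male", "female"]))
sequences = stratified_sequences(strata, n_blocks=3, seed=42)
for stratum, sequence in sequences.items():
    print(stratum, sequence)
```

Each new patient takes the next unused entry from the sequence belonging to his or her own stratum, so the number of sequences is the product of the numbers of levels of the stratification factors.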
Consider the case where we wish to stratify by gender (male and female) and also stage of disease (early or advanced). We now need four randomisation sequences: the first can be used for females with early stage disease, the second for females with advanced stage disease, the third for males with early stage disease and the fourth for males with advanced stage disease. The potential for using the wrong randomisation sequence obviously becomes higher in this situation and if any of the factors has more than two levels or if there are more than two or three factors then the number of sequences usually becomes prohibitively large.

The logistics of randomly assigning patients to treatment can be eased by central (often telephone or web-based) randomisation schemes where the investigator does not have to concern himself or herself with using different randomisation sequences for different types of patients. If medication is supplied to the investigational site in sequentially numbered packages then a central randomisation system, if given details of the levels of the stratum for any particular patient, can simply inform the investigator which
treatment pack to assign to that patient. The details of multiple randomisation sequences need not then concern the investigator. Despite the fact that the investigator does not need to concern himself or herself with the multiple randomisation schemes, somebody does! The additional complexities of treatment packaging and assignment should not be underestimated and the potential for errors in randomising patients to treatments must be considered. The risk of introducing errors can become quite high. Also, it is quite possible, with many strata and relatively few patients, that some of the combinations of strata values may occur only very infrequently or not at all. Because of the desire to try to balance within these stratification factors there can sometimes be a problem that the treatment groups become unbalanced overall (i.e. in the total number of patients on each treatment). Methods exist to try to find a compromise between balancing individual factors and balancing the total treatment assignment.

Most trials are arranged such that the sequence of randomisation codes is established before patients are recruited to the trial and medication is then pre-packed and despatched to investigators. In this case, although the sequence is unknown to the investigators and the patients, it is fixed and potentially known to the study statistician or those responsible for packing the medication. Provided patients are assigned sequentially, balance (on average) will be maintained. Adaptive designs change the randomisation sequence as the trial progresses. One method is called MINIMISATION, which helps to solve the problem discussed earlier of multiple factors on which balance is required when stratified randomisation becomes too complex. Another type of ADAPTIVE RANDOMISATION is that called 'response-adaptive randomisation' (see Wei and Durham, 1978). These methods are rarely used but they have an intriguing and appealing ethical basis.
They can be used when the response to treatment for each patient is known relatively quickly and the recruitment of patients is relatively slow. Such a design would start with a traditional, equally balanced, randomisation scheme and the first 20 or 30 patients may be assigned randomly to each of the treatment groups. Thereafter, if one treatment is appearing to be superior to the other then the randomisation probabilities begin to change in favour of the most advantageous treatment. Early results from trials can be very unreliable and it is quite possible that the treatment appearing to be beneficial early on may subsequently appear less good than the alternative treatment. In this case, the allocation probabilities would then change back through 0.5 to again favour the treatment that is emerging as 'best'. (See DATA-DEPENDENT DESIGNS for further discussion.)
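The way allocation probabilities can drift towards the apparently better treatment, and drift back again, may be illustrated with a simple urn scheme. The sketch below is loosely modelled on the play-the-winner idea; the particular urn rule (one ball per arm to start, one ball added after each response), the response probabilities and all names are assumptions made for illustration, and the initial equally balanced run-in of 20 or 30 patients mentioned above is omitted for brevity.

```python
import random


def allocation_probability(urn, arm):
    """Probability that the next patient is assigned to `arm`."""
    return urn[arm] / sum(urn.values())


def update_urn(urn, arm, success):
    """A success adds a ball for the same arm; a failure adds one for the other arm."""
    other = next(a for a in urn if a != arm)
    updated = dict(urn)
    updated[arm if success else other] += 1
    return updated


def simulate(success_prob, n_patients, seed=None):
    """Assign patients one at a time, updating the urn after each observed response."""
    rng = random.Random(seed)
    urn = {arm: 1 for arm in success_prob}
    assignments = []
    for _ in range(n_patients):
        arms = list(urn)
        arm = rng.choices(arms, weights=[urn[a] for a in arms])[0]
        success = rng.random() < success_prob[arm]   # hypothetical true response rates
        urn = update_urn(urn, arm, success)
        assignments.append(arm)
    return assignments, urn


assignments, urn = simulate({"A": 0.7, "B": 0.4}, n_patients=50, seed=1978)
print(assignments.count("A"), assignments.count("B"), allocation_probability(urn, "A"))
```

A success on A followed by a failure on A returns the urn to equal allocation, which mirrors the remark that the probabilities can change back through 0.5 when early results prove unreliable.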
A particular application of randomisation in the design of clinical trials is the randomised consent design proposed by Zelen (1979, 1990). Such designs are used in highly pragmatic trials, i.e. those that are intended to mimic, as closely as possible, true clinical practice. They encompass the constraint that some patients may not wish to receive certain treatments so that, even if the physician might prescribe a particular treatment, the patient may wish not to take it. Subjects are randomised to one of two treatment groups and those who are randomised to receive standard therapy then receive that therapy. Those who are randomised to receive the new, experimental treatment are asked to consent to receive that treatment or, if they prefer, they can receive the standard treatment. Their preference is respected and they receive the treatment of their choice. It is absolutely critical that such trials are analysed by the intention-to-treat principle, so that patients are analysed according to which treatment they were randomised and not which treatment they actually received. Such a trial is then intended to answer the question about what would happen if patients are prescribed a particular treatment, considering the inevitable problem that some patients prescribed a particular treatment will choose not to take it.

A related type of trial is called Wennberg's design, where patients are randomised to receive either the treatment of their own choice or a standard, specified experimental treatment. Such a design is, again, highly pragmatic, but it is very difficult to judge the benefit of one treatment strategy over another for future patients since the results of the study are highly dependent on the preferences of the patients who take part.

Another use of randomisation is in testing treatment policies following an existing treatment regime.
Examples typically include answering the question: 'If patients do not respond to a low dose, are they likely to respond to a higher dose?' Very commonly, this question is addressed as follows: patients are randomised to treatment or control and efficacy assessments made at an appropriate follow-up time after randomisation. Treatment may have been shown to be, on average, superior to control, yet not all patients will have responded to the new active treatment. Those who have not responded are now treated with a higher dose (or some other modification of the treatment regime) and subsequent efficacy or response rates are recorded. The criterion for being included in such a follow-up is that the patient has not responded to the initial treatment and so any response at all is considered indicative of an additional benefit of the additional treatment regime. The question, of course, remains as to what would have happened to these patients had they either continued with the existing medication or followed some other course of treatment instead. The appropriate means of evaluating such a question is illustrated in the figure.
[Figure: flow diagram. Randomise; treatment (dose x); re-randomise non-responders to dose x or dose y]
randomisation Example of re-randomising 'non-responders'
Patients who are to be included in this follow-up regime should be randomised to either receive the new modified treatment (perhaps a dose increase) or to continue on existing treatment (usually meaning at the same dose). Again, note the criterion for being included in this follow-up study is that the patient has not responded to the initial treatment, and in such a design it is not unusual to see some of the patients who continue on the identical treatment respond now. This design, however, allows us to see possibly differential response rates in the patients who have continued on exactly the same treatment as those who have been randomised to receive the modified treatment. This then allows a proper assessment of the beneficial effects of changing the treatment regime as opposed to continuing with the same regime.

An ethical debate often arises around randomisation (see ETHICS AND CLINICAL TRIALS). How can it be right to choose treatment for a patient based on chance alone (or the 'throw of a die', to use a more emotional phrase)? This is an important consideration but many counter with the argument that it can be unethical not to randomise patients into clinical trials. The ethics of clinical trials is a very broad subject and cannot be fully covered here. Where genuine doubt exists, however, as to the relative benefits of one treatment over another (a state often called 'equipoise'), then not randomising patients into trials can lead to misleading or even false judgements about the relative efficacy of different therapies. Even where no alternative treatment exists for life-threatening conditions and a new potential therapy might offer the 'only hope' for a patient, without randomising patients between this new (possible) treatment and placebo, we can never gain a true understanding of the risks and benefits of the new treatment. Folklore often then suggests that the new treatment is, in fact, better than placebo when it has never been properly tested, and it then becomes impossible (even if still not unethical) to carry out a randomised trial. Randomised consent designs and some of the adaptive designs discussed in this section can help balance the ethical arguments. Using 'unequal' randomisation (assigning more patients to some treatment arms than to others, or to placebo) can also help.

Randomisation is one of the most fundamental and one of the most important considerations in the design of a clinical trial. In contrast to observational research and epidemiology, randomisation provides the basis for assigning causality of response to the assigned treatment. For a much fuller discussion of randomisation, readers are referred to Rosenberger and Lachin (2002). SD

Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall. Altman, D. G. and Bland, J. M. 1999: Treatment allocation in controlled trials: why randomise? British Medical Journal 318, 1209. Chalmers, I. 1999: Why transition from alternation to randomisation in clinical trials was made (letter to the editor). British Medical Journal 319, 1372. Rosenberger, W. F. and Lachin, J. M. 2002: Randomization in clinical trials. New York: Wiley & Sons, Inc. Wei, L. J. and Durham, S. D. 1978: The randomized play-the-winner rule in medical trials. Journal of the American Statistical Association 73, 840-3. Zelen, M. 1979: A new design for randomized clinical trials. New England Journal of Medicine 300, 1242-5. Zelen, M. 1990: Randomized consent designs for clinical trials: an update. Statistics in Medicine 9, 645-56.
J.I.
randomised controlled trial (RCT) See CLINICAL TRIALS
randomised-response technique This is a technique that aims to get accurate answers to a sensitive question that respondents might be reluctant to answer truthfully, for example, 'Have you ever had an abortion?' The randomised response technique protects the respondent's anonymity by offering both the question of interest and an innocuous question, which has a known probability (a) of yielding a 'yes' response, for example:

1. [Flip a coin.] Have you ever had an abortion?
2. [Flip a coin.] Did you get a head?

A random device is then used by the respondent to determine which question to answer. The outcome of the randomising device is seen only by the respondent, not by the interviewer. Consequently, when the interviewer records a 'yes' response, it will not be known whether this was a 'yes' to the first or second question (Warner, 1965). If the PROBABILITY of the random device posing question one (p) is known, it is possible to estimate the proportion of 'yes' responses to question one ($\hat{\pi}$) from the overall proportion of 'yes' responses ($P = n_1/n$, where $n_1$ is the total number of 'yes' responses in the sample size $n$):

\[ \hat{\pi} = \frac{P - (1 - p)a}{p} \]

So, for example, if P = 0.60 (360/600), p = 0.80 and a = 0.5 then $\hat{\pi}$ = 0.625. The estimated variance of $\hat{\pi}$ is:

\[ \mathrm{Var}(\hat{\pi}) = \frac{\hat{\pi}(1 - \hat{\pi})}{n} + \frac{(1 - p)^2 a(1 - a) + p(1 - p)\left[\hat{\pi}(1 - a) + a(1 - \hat{\pi})\right]}{n p^2} \]

For the example here, this gives $\mathrm{Var}(\hat{\pi})$ of approximately 0.0006. Further examples of the application of the technique are given in Chaudhuri and Mukerjee (1988). SSE
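The arithmetic of the worked example can be checked with a short Python sketch. It assumes that the estimator is the overall proportion of 'yes' responses corrected for the innocuous question, and that its variance is the binomial variance of that overall proportion divided by p squared, written out term by term; the function names are hypothetical.

```python
def rr_estimate(P, p, a):
    """Estimated proportion answering 'yes' to the sensitive question.

    P: overall proportion of 'yes' responses
    p: probability that the device poses the sensitive question
    a: probability of 'yes' to the innocuous question
    """
    return (P - (1 - p) * a) / p


def rr_variance(pi_hat, p, a, n):
    """Estimated variance of the estimate for a sample of size n."""
    sampling = pi_hat * (1 - pi_hat) / n
    device = ((1 - p) ** 2 * a * (1 - a)
              + p * (1 - p) * (pi_hat * (1 - a) + a * (1 - pi_hat))) / (n * p ** 2)
    return sampling + device


pi_hat = rr_estimate(P=360 / 600, p=0.80, a=0.5)
print(round(pi_hat, 3))                                     # 0.625
print(round(rr_variance(pi_hat, p=0.80, a=0.5, n=600), 6))  # 0.000625
```

The second term is the price paid for protecting anonymity: it vanishes as p approaches 1, when every respondent is asked the sensitive question directly.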
Chaudhuri, A. and Mukerjee, R. 1988: Randomized response: theory and techniques. New York: Marcel Dekker. Warner, S. L. 1965: Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association 60, 63-9.
range The range is simply the interval between the minimum and maximum values in a set of observations. The range is a MEASURE OF SPREAD, although it is of limited use because it is dependent only on the extreme (and possibly unusual) observations, and not on the majority of the values. For this reason, the INTERQUARTILE RANGE is often a preferred measure. However, if the sample size is very small the range may be considered to be a useful summary statistic. It is more informative if the minimum and maximum are both quoted (e.g. 21 to 54), rather than merely the difference between the two (e.g. 33). SRC
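A brief sketch of reporting the range as recommended, quoting both extremes as well as their difference. The observations are hypothetical, chosen only so that the extremes match the example of 21 to 54; the function name is likewise an assumption.

```python
def value_range(observations):
    """Return (minimum, maximum, maximum - minimum) for a set of observations."""
    lowest, highest = min(observations), max(observations)
    return lowest, highest, highest - lowest


ages = [21, 30, 34, 42, 54]        # hypothetical observations
lowest, highest, width = value_range(ages)
print(f"range {lowest} to {highest} (width {width})")
```

Replacing the largest value by an unusual one changes the range while leaving the bulk of the data untouched, which is why the range alone is of limited use.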
rank invariance Qualitative variables are commonly measured by different types of rating scales, with a discrete number of ordered categories or on a VISUAL ANALOGUE SCALE (VAS) having a continuous range of possible values between the ENDPOINTS of a straight line. The measurement level of data from scale assessment is called ordinal, having rank-invariant properties only. Such data remain invariant in all order-preserving transformations, which means that the category labels do not represent any mathematical value except the categorical order of the scale. Furthermore, ORDINAL DATA, irrespective of the type of labelling, only contain information about ordering and not about magnitude or distance between the categories. Thus, one succession of labels can be replaced by another, e.g. numerals by letters or by a set of increasing numbers or symbols or by a pictogram. For example, the five categories of a five-point scale are often labelled with the figures 1 to 5, but the categorical assignment could be any other set of ordered figures, such as 0, 25, 50, 75, 100. The lack of mathematical meaning of the categorical labels implies, for example, that the intermediate numerals between the labels 50 and 75 do not exist. Statistical methods must therefore be unaffected by any kind of re-labelling of categorical scores. This means that rank-based methods should be used, for example MEDIAN, QUARTILES or other centiles for description. (For further details see Stevens, 1946, 1955, Hand, 1996, and Svensson, 2001.) ES
[See also RANKING PROCEDURES, AGREEMENT, GLOBAL SCALES, VALIDITY]

Hand, D. J. 1996: Statistics and the theory of measurement. Journal of the Royal Statistical Society, Series A 159, 445-92. Stevens, S. S. 1946: On the theory of scales of measurement. Science 103, 677-80. Stevens, S. S. 1955: On the averaging of data. Science 121, 113-16. Svensson, E. 2001: Guidelines to statistical evaluation of data from ratings scales and questionnaires. Journal of Rehabilitation Medicine 33, 47-8.
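Rank invariance can be made concrete with a small check: re-labelling the categories of a five-point scale by any order-preserving rule leaves the ranks, and therefore any rank-based summary, unchanged. The ratings below are hypothetical and the helper name is an assumption; the two labellings are the ones mentioned in the entry.

```python
from statistics import median_low


def midranks(values):
    """Ranks 1..n, with tied values sharing the mean of the ranks they occupy."""
    first, count = {}, {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]


ratings = [1, 2, 2, 3, 5, 5, 5]                  # hypothetical five-point ratings
relabel = {1: 0, 2: 25, 3: 50, 4: 75, 5: 100}    # order-preserving re-labelling
relabelled = [relabel[r] for r in ratings]

print(midranks(ratings))                         # [1.0, 2.5, 2.5, 4.0, 6.0, 6.0, 6.0]
print(midranks(ratings) == midranks(relabelled))
print(relabel[median_low(ratings)] == median_low(relabelled))
```

The median category is carried along by the re-labelling, so it identifies the same category whichever labels are used; a mean of the labels would have no such interpretation because distances between categories are not defined.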
ranks, ranking procedures Nonparametric statistical methods (see NONPARAMETRIC METHODS - AN OVERVIEW) are useful for all types of data, while parametric statistical methods are applicable to quantitative data that meet the criteria of being normally distributed (see NORMAL DISTRIBUTION) or of following other known PROBABILITY DISTRIBUTIONS. A common approach of nonparametric statistical methods is to transform data to ranks. A ranking of n ordered observations is a set of numerical ranks [1, 2, ..., n] that will represent the observations in statistical analyses. The rank sum is n(n + 1)/2 and the mean rank is (n + 1)/2, where n is the number of observations. Assessments on rating scales with a limited number of possible categories imply that groups of observations will share the same category, and these observations will share the same rank value, which often is the MEAN of the ranks that belong to the group of observations, so-called tied ranks. The calculations of the MANN-WHITNEY RANK SUM TEST of the difference between two independent groups of data and of SPEARMAN'S RANK CORRELATION COEFFICIENT are based on this type of rank transformation (Siegel and Castellan, 1988; Gibbons and Chakraborti, 2003).

The first figure shows the frequency distribution of pairs of data from psychiatric assessments of the severity of fatigue and the level of concentration difficulties in 43 patients. The assessments of the two variables were made on rating scales having five ordered categories denoted F1, ..., F5 and C1, ..., C5, where F5 and C5 represent lack of symptom or difficulty. According to the frequency distribution of assessments on the fatigue scale 16 patients were judged to the category F1, which represents the most severe level of fatigue, and these will share the ranks from 1 to 16, the mean rank being 8.5. The seven patients in the category F2 will share the ranks 17 to 23, the mean rank being 16 + (7 + 1)/2 = 20. The mean ranks of the two sets of distributions are shown in this figure. The pairs of cell frequencies are replaced by pairs of ranks when the relationship
                      Fatigue
              F1     F2     F3     F4     F5    Total   Tied rank
C5             1      0      4      0     11      16      35.5
C4             2      1      2      0      1       6      24.5
C3             2      2      1      0      0       5      19
C2             3      3      0      0      0       6      13.5
C1             8      1      1      0      0      10       5.5
Total         16      7      8      0     12      43
Tied rank    8.5     20   27.5      -   37.5

ranks, ranking procedures The frequency distribution of pairs of data from psychiatric assessments of the severity of fatigue and the level of concentration difficulties in 43 patients. The rating scales have five ordered categories denoted F1, ..., F5 and C1, ..., C5, where F5 and C5 represent a lack of symptom or difficulty. The two sets of marginal distributions and the tied rank values of the marginal frequencies are given
between the severity of fatigue and concentration difficulty is calculated by Spearman's rank correlation coefficient. This means that the observation (F2, C1) will get the pair of tied ranks (20; 5.5) and the three observations (F2, C2) will get (20; 13.5), and so on. Spearman's rank correlation coefficient, when adjusted for tied observations, is 0.75.

A bivariate ranking approach developed for analysis of paired ordinal data regarding agreement and disagreement that takes account of the information given by the pairs of data is suggested by Svensson (1997). In this augmented ranking approach (aug-rank), the ranks are tied to the pairs of data, which means to the observations in the cells of a square CONTINGENCY TABLE or to the points of a SCATTERPLOT of data from VISUAL ANALOGUE SCALE (VAS) assessments. This means that the augmented rank of the assessments X depends on the pairing with Y. The second figure, part (a), shows the paired distribution of 50 assessments made by two raters labelled X and Y. The three individuals categorised A by rater X are found in the cells (A; A), (A; A) and (A; B), which means that rater Y has assessed one of these individuals to a higher category than has rater X, and this individual will therefore be given a higher aug-rank X-value. The aug-rank X-values of these three pairs are therefore 1.5, 1.5 and 3 respectively (see part (b)). According to rater Y, 14 individuals are categorised A, but (2, 8, 3, 1) of them are categorised A, B, C, D respectively by X, and therefore the aug-rank Y-values of these four groups of individuals will differ (see part (b)).
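The tied ranks and the tie-adjusted Spearman coefficient for the fatigue and concentration data can be recomputed from the cell frequencies. In the sketch below the 43 observations are entered as a table of counts as read from the first figure (rows C1 to C5, columns F1 to F5); the function names are assumptions, and the coefficient is obtained as the ordinary product-moment correlation of the tied ranks.

```python
from math import sqrt

# Cell frequencies as read from the first figure: rows C1..C5, columns F1..F5.
TABLE = [
    [8, 1, 1, 0, 0],    # C1
    [3, 3, 0, 0, 0],    # C2
    [2, 2, 1, 0, 0],    # C3
    [2, 1, 2, 0, 1],    # C4
    [1, 0, 4, 0, 11],   # C5
]


def tied_ranks(marginal):
    """Mean rank shared by the observations in each category (None if empty)."""
    ranks, used = [], 0
    for n in marginal:
        ranks.append(used + (n + 1) / 2 if n else None)
        used += n
    return ranks


def spearman_from_table(table):
    """Spearman's coefficient, adjusted for ties, as the correlation of tied ranks."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    y_rank, x_rank = tied_ranks(row_totals), tied_ranks(col_totals)
    mean = (sum(row_totals) + 1) / 2
    sxx = sum(t * (r - mean) ** 2 for t, r in zip(col_totals, x_rank) if t)
    syy = sum(t * (r - mean) ** 2 for t, r in zip(row_totals, y_rank) if t)
    sxy = sum(n * (x_rank[j] - mean) * (y_rank[i] - mean)
              for i, row in enumerate(table) for j, n in enumerate(row) if n)
    return sxy / sqrt(sxx * syy)


print(tied_ranks([16, 7, 8, 0, 12]))          # fatigue: [8.5, 20.0, 27.5, None, 37.5]
print(round(spearman_from_table(TABLE), 2))   # 0.75
```

Working from the table of counts avoids listing the 43 pairs individually; each cell simply contributes its frequency times the product of the two rank deviations.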
(a)
                     Rater X
Rater Y        A       B       C       D
D              0       0       1      [1]
C              0       2      [2]     14
B              1      [1]     11       3
A             [2]      8       3       1

ranks, ranking procedures (a) The paired distribution of inter-rater assessments of 50 individuals by a four-point scale (A, B, C, D). The agreement diagonal is marked (in square brackets)

(b)  (Aug-rank X; Aug-rank Y)
                              Rater X
Rater Y        A              B              C              D
D              -              -           (31; 49)       (50; 50)
C              -         (13.5; 31.5)   (29.5; 33.5)   (42.5; 41.5)
B           (3; 15)        (12; 16)       (23; 22)       (34; 29)
A          (1.5; 1.5)     (7.5; 6.5)      (16; 12)       (32; 14)
ranks, ranking procedures (b) The paired distribution of aug-ranks of the frequency distribution of paired assessments X and Y of (a)

This aug-rank approach to taking account of information from the pairs of ordered categorical assessments when replacing paired ordinal data with pairs of aug-ranks makes it possible to identify and separately analyse a possible systematic component of observed disagreement from the occasional, noise, variability (see AGREEMENT). A complete agreement in all pairs of aug-ranks defines the rank-transformable pattern of agreement (RTPA), which is uniquely related to the two marginal distributions. The RTPA is the distribution of pairs that is expected when the observed disagreement is completely explained by a systematic disagreement and in the case of complete agreement. ES
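The augmented ranks in part (b) of the second figure can be reproduced from the cell frequencies in part (a). The sketch below follows one reading of the rule described in the text: for the aug-rank of X, observations are ordered by the X category and, within the same X category, by the Y category, with observations in the same cell sharing a mean rank; the roles are swapped for Y. The frequencies are those read from part (a), and the function name is an assumption.

```python
CATEGORIES = "ABCD"

# Cell frequencies from part (a): outer index is rater Y (A..D), inner is rater X (A..D).
COUNTS = [
    [2, 8, 3, 1],     # Y = A
    [1, 1, 11, 3],    # Y = B
    [0, 2, 2, 14],    # Y = C
    [0, 0, 1, 1],     # Y = D
]


def aug_ranks(counts, labels=CATEGORIES):
    """Return {(x, y): (aug-rank X, aug-rank Y)} for every non-empty cell."""
    k = len(labels)
    aug_x, aug_y = {}, {}
    used = 0
    for x in range(k):                 # order by X, then by Y within X
        for y in range(k):
            n = counts[y][x]
            if n:
                aug_x[(labels[x], labels[y])] = used + (n + 1) / 2
                used += n
    used = 0
    for y in range(k):                 # order by Y, then by X within Y
        for x in range(k):
            n = counts[y][x]
            if n:
                aug_y[(labels[x], labels[y])] = used + (n + 1) / 2
                used += n
    return {cell: (aug_x[cell], aug_y[cell]) for cell in aug_x}


pairs = aug_ranks(COUNTS)
print(pairs[("A", "A")], pairs[("A", "B")], pairs[("D", "C")])
```

The three individuals rated A by X receive aug-rank X-values 1.5, 1.5 and 3, as stated in the text, because the one rated B by Y is placed after the two rated A by Y.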
[See also VALIDITY OF SCALES]

Gibbons, J. D. and Chakraborti, S. 2003: Nonparametric statistical inference, 4th edition, revised and expanded. New York: Marcel Dekker. Siegel, S. and Castellan, N. J. 1988: Nonparametric statistics for the behavioral sciences, 2nd edition. New York: McGraw-Hill. Svensson, E. 1997: A coefficient of agreement adjusted for bias in paired ordered categorical data. Biometrical Journal 39, 643-57.

reading the medical literature See CRITICAL APPRAISAL

recall bias This is a BIAS that can occur in CASE-CONTROL STUDIES when cases are more likely to report having been exposed than are controls. In some case-control studies retrospective data on exposure are obtained from historical records. However, in most situations such data are not available and data are instead obtained by interviewing cases and controls (or their relatives). When this is done there is a chance that cases may be more likely to remember having been exposed than are controls. Even where there is no genuine difference in frequency of exposure between cases and controls, this differential recall may cause an apparent difference, so that the exposure appears to be associated with disease. For example, in a case-control study of congenital malformations, mothers are asked about prior exposures to infectious diseases, drugs, environmental pollutants, etc. It is quite plausible that a mother who has given birth to a malformed child will be more interested in the study and make more effort to remember instances of past exposure. It is also possible for recall bias to operate in the opposite direction: e.g. if, through shame, cases were less likely than controls to admit exposure. (For further details see Hennekens and Buring, 1987, and Rothman and Greenland, 1998.) SRS
[See also BIAS IN OBSERVATIONAL STUDIES]

Hennekens, C. H. and Buring, J. E. 1987: Epidemiology in medicine. New York: Little, Brown and Company. Rothman, K. J. and Greenland, S. 1998: Modern epidemiology, 2nd edition. Philadelphia: Lippincott-Raven Publishers.

receiver operating characteristic (ROC) curve Diagnostic testing plays an increasingly important role in modern medicine and the ROC curve is a common graphical tool for displaying the discriminatory ability of a diagnostic marker (test) in distinguishing between diseased and healthy subjects. The outcome of a diagnostic test can be dichotomous (positive, negative), ordinal (e.g. normal, questionable, abnormal) or continuous (e.g. PSA measurements). The ROC curve arises only for ordinal and continuous outcomes. A diagnostic marker is generally evaluated by comparison to a definite gold standard procedure/test. Such gold standards are often complicated to conduct, intrusive, not sufficiently timely or expensive. This motivates the search for inexpensive, easily measurable and reliable alternatives. A subject is assessed as diseased or healthy depending on whether the corresponding marker value is above or below a given threshold. Associated with any threshold value are the PROBABILITY of a true positive (SENSITIVITY) and the probability of a true negative (SPECIFICITY). The ROC curve presents graphically the trade-off between sensitivity and specificity for every possible threshold value. By convention, the plot displays the sensitivity on the y axis and 1 - specificity on the x axis.
Consider, far example. ORDINAL DATA. arising in Ihe conlext of mccIicaJ imagil1l. A Iader is paenkd wilh imqcs rn. diseased and healthy subjects and is RqUired 10 nile each imqeon a discme ordinaJ scale (1-5): (l )definitcly nonnaI, (2) probably nannaI. (3) queslionablc:.. (4) probablyabnormal. (5)deftnilely abnormal. Suppose: the n:sults IR as shown in Ihc: finl table or c:quivalc:ndy exprased as cumulati~ pc:n:entagc:s. as in the second table.
receiver operating characteristic (ROC) curve Ordinal data results from a medical imaging study

                              Rating
True disease status    (1)    (2)    (3)    (4)    (5)    Total
Healthy                 28     14      5      2      1       50
Diseased                 2      4     10     14     20       50
Total                   30     18     15     16     21      100
receiver operating characteristic (ROC) curve Cumulative percentage results from a medical imaging study

                              Rating
True disease status    (1)    (2)    (3)    (4)    (5)
Healthy                56%    84%    94%    98%   100%
Diseased                4%    12%    32%    60%   100%
receiver operating characteristic (ROC) curve Specificity and sensitivity pairs

                              Threshold
                        (1)    (2)    (3)    (4)    (5)
Specificity      0%     56%    84%    94%    98%   100%
Sensitivity    100%     96%    88%    68%    40%     0%
[Figure] receiver operating characteristic (ROC) curve A typical ROC curve (horizontal axis: 1 - sensitivity, from 0 to 1)
If we use a threshold of (1), we would have a specificity of 56% and a sensitivity of 96% (100% - 4%). For a threshold of (2), the specificity is 84%, while the sensitivity is 88% (100% - 12%). If we ignored the test and called everyone diseased then the specificity is 0% with a corresponding sensitivity of 100%. This gives rise to the specificity/sensitivity pairs in the third table. The figure shows the plot of specificity versus 1 - sensitivity, the ROC curve or plot. A diagnostic marker shows good discriminatory ability if both sensitivity and specificity are high for a reasonable range of threshold values. In terms of the ROC plot, this means that the closer the curve comes to the left-hand border and then the top border the better the marker. The closer the curve is to the diagonal (45°) line the worse the discriminatory accuracy of the marker. For continuous data the calculations are carried out similarly. However, for this situation the number of outcome values will be much larger and usually only one of sensitivity or specificity will change when the threshold value is increased to its next observed value. Consequently, for data arising from a continuous scenario, the resulting ROC plot will tend to look like a step function. Both for the continuous case and the ordinal situation that has arisen from an underlying continuous mechanism, the true underlying ROC curve is a smooth function. Many methods, nonparametric, semiparametric and parametric, have been developed to provide smooth estimation procedures for the ROC curve. In addition, methodology for adjusting the ROC curve for covariate information and selection bias has been proposed. In certain situations the diagnostic markers are not measured directly on each subject but are taken on pooled groups of subjects. Faraggi, Reiser and Schisterman (2003) describe estimating the ROC curve for such pooled data.
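The threshold calculations just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the original entry: it assumes the rating counts implied by the cumulative percentages in the second table (50 healthy and 50 diseased subjects), and the function name `spec_sens_pairs` is hypothetical.

```python
from itertools import accumulate

# Counts per rating (1)-(5), as implied by the cumulative percentages
# in the second table (assumed: 50 healthy and 50 diseased subjects).
healthy = [28, 14, 5, 2, 1]
diseased = [2, 4, 10, 14, 20]


def spec_sens_pairs(healthy, diseased):
    """Return (specificity, sensitivity) for each threshold.

    A subject is called diseased when the rating is above the threshold,
    so specificity is the cumulative proportion of healthy subjects at or
    below the threshold and sensitivity is one minus the cumulative
    proportion of diseased subjects at or below it.
    """
    n_healthy, n_diseased = sum(healthy), sum(diseased)
    pairs = [(0.0, 1.0)]  # ignore the test: everyone called diseased
    for cum_h, cum_d in zip(accumulate(healthy), accumulate(diseased)):
        pairs.append((cum_h / n_healthy, 1 - cum_d / n_diseased))
    return pairs


pairs = spec_sens_pairs(healthy, diseased)
for spec, sens in pairs:
    print(f"specificity = {spec:.0%}, sensitivity = {sens:.0%}")
```

The first pair corresponds to ignoring the test altogether; the remaining five correspond to thresholds (1) to (5).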
The discriminatory power of the diagnostic marker that is indicated graphically by the ROC curve is often summarised by a one-number index. The AREA UNDER THE ROC CURVE (AUC) is the most commonly used index. Sometimes, only a particular range of sensitivity values is of interest and a partial area is computed as the area under the curve over the range of sensitivities considered important. An alternative index due to Youden (see Greiner, Pfeiffer and Smith, 2000) is to compute max{sensitivity + specificity - 1}, where the maximisation is carried out over all pairs of sensitivity and specificity values. The ROC curve is also useful in assessing the discriminatory power of statistical models and classifiers for binary outcomes. For further details see Shapiro (1999), Zhou, Obuchowski and McClish (2002) and Pepe (2003). DF/BR
Faraggi, D., Reiser, B. and Schisterman, E. F. 2003: ROC curve analysis for biomarkers based on pooled assessments. Statistics in Medicine 22, 2515-27. Greiner, M., Pfeiffer, D. and Smith, R. M. 2000: Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. Preventive Veterinary Medicine 45, 23-41. Pepe, M. S. 2003: The statistical evaluation of medical tests for classification and prediction. Oxford: Oxford University Press. Shapiro, D. E. 1999: The interpretation of diagnostic tests. Statistical Methods in Medical Research 8, 113-34. Zhou, X. H., Obuchowski, N. A. and McClish, D. K. 2002: Statistical methods in diagnostic medicine. New York: John Wiley & Sons, Inc.
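As a rough illustration of the two summary indices discussed in this entry, the sketch below computes an area under the curve and the Youden index from the specificity/sensitivity pairs of the imaging example. The trapezoidal rule is an assumption made here for simplicity (the entry does not prescribe an estimation method), and the function names are hypothetical.

```python
# (specificity, sensitivity) pairs from the medical imaging example.
pairs = [(0.00, 1.00), (0.56, 0.96), (0.84, 0.88),
         (0.94, 0.68), (0.98, 0.40), (1.00, 0.00)]


def trapezoid_auc(pairs):
    """Area under the plot of specificity against 1 - sensitivity.

    Assumption: neighbouring points are joined by straight lines
    (trapezoidal rule); smoother estimators exist.
    """
    points = sorted((1 - sens, spec) for spec, sens in pairs)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def youden_index(pairs):
    """Maximum of sensitivity + specificity - 1 over all thresholds."""
    return max(sens + spec - 1 for spec, sens in pairs)


auc = trapezoid_auc(pairs)
youden = youden_index(pairs)
print(f"AUC = {auc:.2f}, Youden index = {youden:.2f}")
```

For these data the maximum of sensitivity + specificity - 1 is attained at threshold (2).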
regression dilution bias
The term applies to any setting where one wishes to assess the relationship between a variable X that is measured with error and an outcome variable Y. When observed values of X are used as estimates of true values, the true relationship between X and Y is underestimated; this is regression dilution bias. In epidemiology, the term is often used to describe the situation where relationships between risk exposures of interest (such as blood pressure) and the risk of a particular disease occurring (such as a heart attack) are underestimated because of the use of single 'baseline' measurements of the risk exposure as estimates of the true underlying level. The situation arises because studies that aim to identify risk factors for a particular disease usually take 'baseline' assessments of individuals and relate these measurements to disease events observed over a particular follow-up period. However, BASELINE MEASUREMENTS often do not reflect the patient's true usual level during the period of follow-up because of MEASUREMENT ERRORS, short-term 'random' fluctuations from the individual's average level and longer term true changes. Although these effects are random, meaning that the baseline measurement is just as likely to over- as underestimate the patient's true level, the differences between patients estimated from a baseline sample (the between-person variation) exaggerate the true differences that really exist between those patients over a period of time (because the estimated VARIANCE consists of both within- and between-person variations). In other words, the differences in the level of the risk exposure between the study participants are not as large as one would estimate from the baseline sample alone. The effect of this on the estimated relationship between the risk exposure and the risk of disease is shown in the figure where, for the sake of illustration, it is assumed that the risk exposure is positively associated with the risk of disease and that a unit increase in the exposure leads to a proportional increase in the risk of disease (a log-linear relationship). The solid line in the figure shows the 'apparent' relationship between the risk exposure and the risk of disease obtained when using the baseline measurement levels as estimates of true usual levels. However, as already described, the true differences in exposure levels between the individuals are not likely to be as great as it would seem, in which case the difference in disease risk between those at the top and those at the bottom of the risk exposure distribution should correspond to a narrower range of values on the horizontal axis than suggested.
[Figure] regression dilution bias Effects of regression dilution bias (horizontal axis: risk exposure)
The true relationship between the usual risk exposure level and disease risk may therefore be obtained by 'shrinking' the line towards the middle by some predetermined amount (as indicated), so that the true relationship (shown by the dashed line) may be obtained. This line indicates the relationship between usual levels of the risk exposure and the risk of disease and, as can be seen, its slope (the strength of association) is greater than would otherwise be estimated. The extent of regression dilution bias (the difference between the two slopes) can usually be estimated by taking repeated measurements on individuals at various periods throughout the follow-up period, thus enabling estimation of the amount of within-person variation likely to be present. Thus, the apparent regression slope $\beta$ may be appropriately corrected by a factor $\lambda$ (the regression dilution ratio that may be estimated by the intraclass correlation) in order to obtain the true slope $\beta_{\mathrm{true}} = \beta/\lambda$. The concept of regression dilution bias is a very general one and is likely to apply in any setting where interest lies in the association between usual or average exposure levels over an exposure period and disease risk over that period. In situations where the interest is not in assessing relations with usual exposure levels, however, or where the outcome of interest is not determined by usual levels, correction for regression dilution bias would not be appropriate. For a practical example, see the study into blood pressure differences conducted by MacMahon et al. (1990), which adjusted for the effects of regression dilution bias. RM/JE
[See also REGRESSION TO THE MEAN]
MacMahon, S., Peto, R., Cutler, J., Collins, R., Sorlie, P., Neaton, J. et al. 1990: Blood pressure, stroke, and coronary heart disease. Part 1. Prolonged differences in blood pressure: prospective observational studies corrected for the regression dilution bias. Lancet 335, 765-74.
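The attenuation and its correction can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated (usual level, measurement error and outcome noise are all standard normal, and the true slope is set to 2), and the Pearson correlation between two replicate measurements is used as a simple stand-in for the intraclass correlation. All names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    return covariance(x, y) / covariance(x, x)


def correlation(x, y):
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 20000
true_slope = 2.0

# Assumed model: usual level plus independent measurement error.
usual = [rng.gauss(0, 1) for _ in range(n)]
baseline = [u + rng.gauss(0, 1) for u in usual]   # single baseline reading
repeat = [u + rng.gauss(0, 1) for u in usual]     # later repeat reading
outcome = [true_slope * u + rng.gauss(0, 1) for u in usual]

apparent_slope = ols_slope(baseline, outcome)     # diluted slope
dilution_ratio = correlation(baseline, repeat)    # stand-in for the ICC
corrected_slope = apparent_slope / dilution_ratio

print(f"apparent slope  {apparent_slope:.2f}")
print(f"dilution ratio  {dilution_ratio:.2f}")
print(f"corrected slope {corrected_slope:.2f}")
```

With equal between-person and within-person variances, as assumed here, the dilution ratio is about one half, so the apparent slope is roughly half the true slope before correction.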
regression to the mean
This is a phenomenon where an unusual or extreme value for a given variable is followed by a much less extreme value when re-measurement takes place. The phenomenon is an inevitable consequence of MEASUREMENT ERROR. A particularly well-known scenario concerns apparent hypertension in a subject who records a very high blood pressure reading. Re-measurement is very likely to result in a lower blood pressure reading, even if the subject has not been treated. The blood pressure has 'regressed' (gone back) towards the true underlying MEAN for that subject. The phenomenon was first noted by Francis Galton in the 19th century when he compared the heights of parents with those of their children. Galton demonstrated that the children with tall parents tended to be smaller than the parents, while children with small parents tended to be taller than the parents. Yet the phenomenon also worked the other way round: parents of tall children tended to be smaller than the children and parents of small children tended to be taller. Consequently, when Galton plotted the heights of children against the heights of parents, he observed that the line that best fitted the data had a slope less than one. Indeed, that is how the 'regression line', nowadays used more widely for relating any pair of continuous variables, obtained its name. The extent of regression to the mean may be estimated by measuring a group of individuals twice. PEARSON'S CORRELATION COEFFICIENT will estimate the extent to which single individual values would be expected to regress to the mean. Another method involves dividing the distribution of the first measurements on the individuals into fifths (MacMahon et al., 1990). We calculate the means for those in the top fifth and those in the bottom fifth. For these two extreme groups of individuals, we then calculate the mean of their measurements on the second occasion.
The means for the two groups on the second occasion will be more similar than on the first occasion. Indeed, the ratio of the difference in means on the second occasion to the difference in means obtained for the first occasion has been recommended as a good method of correcting for REGRESSION DILUTION BIAS.
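The fifths method described above can be sketched as follows. The data are simulated purely for illustration (a usual level with mean 150 and standard deviation 10, plus independent measurement error with standard deviation 10 on each occasion); these numbers and the function name `fifths_ratio` are assumptions, not values from the entry.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fifths_ratio(first, second):
    """Ratio of the top-minus-bottom difference in means on the second
    occasion to the same difference on the first occasion, with the
    fifths defined by the first measurement."""
    order = sorted(range(len(first)), key=lambda i: first[i])
    k = len(first) // 5
    bottom, top = order[:k], order[-k:]
    diff_first = mean([first[i] for i in top]) - mean([first[i] for i in bottom])
    diff_second = mean([second[i] for i in top]) - mean([second[i] for i in bottom])
    return diff_second / diff_first


rng = random.Random(7)  # fixed seed so the sketch is reproducible
usual = [rng.gauss(150, 10) for _ in range(10000)]
first = [u + rng.gauss(0, 10) for u in usual]    # first occasion
second = [u + rng.gauss(0, 10) for u in usual]   # second occasion

ratio = fifths_ratio(first, second)
print(f"ratio of differences: {ratio:.2f}")
```

A ratio below one shows the two extreme groups moving closer together on re-measurement; under the assumed variances it should be close to one half.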
Many examples have been provided about regression to the mean in various branches of medicine (see, for example, Bland and Altman, 1994a, 1994b; Morton and Torgerson, 2003). Individual patients, when treated for unusually high blood pressure, will be likely to improve; the mistake is to attribute such improvement to the treatment rather than regression to the mean. Similarly, public health action to prevent a given disease may be targeted at a geographical area where incidence of a given disease has been observed as unusually high. Incidence of the disease will be likely to decline subsequently, but that may have happened even in the absence of the public health intervention. In both these examples, the true test of the intervention involves an experiment that compares the change following intervention with that seen in a control group who, despite having unusually high initial values observed, are not subjected to the intervention. If the improvement observed in the intervention group was greater than that seen in the control group, then regression to the mean was not the solely responsible factor. Because of regression to the mean, the British Hypertension Society recommends at least two readings on each of several occasions (Ramsay et al., 1999). This is partly to address the possibility of a poor measurement technique for single readings. However, the blood pressure at the time of the reading may have been genuinely higher than that normally experienced by the individual. The actual clinical objective is to ascertain the individual's true underlying mean value, and repeating the measurement a week later may help to address this. RM/JE
Bland, J. M. and Altman, D. G. 1994a: Regression towards the mean. British Medical Journal 308, 1499. Bland, J. M. and Altman, D. G. 1994b: Some examples of regression towards the mean. British Medical Journal 309, 780. MacMahon, S., Peto, R., Cutler, J., Collins, R., Sorlie, P., Neaton, J. et al. 1990: Blood pressure, stroke, and coronary heart disease. Part 1. Prolonged differences in blood pressure: prospective observational studies corrected for the regression dilution bias. Lancet 335, 765-74. Morton, V. and Torgerson, D. J. 2003: Effect of regression to the mean on decision making in health care. British Medical Journal 326, 1083-4. Ramsay, L. E., Williams, B., Johnston, G. D., MacGregor, G. A., Poston, L., Potter, J. F. et al. 1999: British Hypertension Society guidelines for hypertension management: summary. British Medical Journal 319, 630-5.
regulatory statistical matters
National and international guidances have been written to set out (in greater or lesser levels of detail) how CLINICAL TRIALS should be planned, executed, analysed and reported for regulatory submissions. The earliest documents were national or regional: FOOD AND DRUG ADMINISTRATION (1988), Ministry of Health and Welfare (1992) and the CHMP Working Party on Efficacy of Medicinal Products (1995). Later, the International Conference on Harmonization (ICH) began to coordinate guidance in all areas of the regulated pharmaceutical industry (not just statistics and not just clinical trials). They produced the guidance known as 'ICH E9' (see ICH E9 Expert Working Group, 1999), which was adopted by the ICH Steering Committee in February 1998. The guidance is focused on statistical principles and not on specific methods. It is also aimed at a target audience of nonstatisticians as well as statisticians, and so should be mostly understandable to the target audience of the current volume.
It is the responsibility of a sponsor company to produce convincing evidence of quality, safety and efficacy of a new medicinal product. While the convincing nature of such evidence may be influenced by the specific method of analysis used in a clinical trial, it is more likely to be influenced by the more fundamental issues of design and conduct of the studies. Trials submitted for regulatory submission have to be described and reported in great detail (see MEDICINES AND HEALTHCARE PRODUCTS REGULATORY AGENCY).
ICH E9 explains that it is important that all the key features relating to the design, monitoring and analysis of a trial (and sets of trials) should be set out in detailed study protocols (see PROTOCOLS FOR CLINICAL TRIALS). There should then be a traceable record of any changes made to the original protocol, including when the decisions were made and when the changes were implemented. In this way, the protocol, its documented changes and the final clinical study report should all link together. The ICH E9 document is mostly aimed at confirmatory Phase III trials, although much of its content may be thought of as good statistical practice and applicable more widely. However, within the context of Phase III trials, the INTENTION-TO-TREAT principle (and, therefore, 'pragmatic' trials) is stressed in preference to 'explanatory' trials, where a PER PROTOCOL design and analysis may have a greater part to play. The document introduced a new term, 'full analysis set', referring to the data that are used to try to address the intention-to-treat question. This was a compromise situation, recognising that it is often not possible to include every patient in the analysis for a true intention-to-treat answer. This term 'full analysis set' has not really caught on as a concept outside of the regulatory field and perhaps not within it either; 'intention-to-treat', however, is firmly fixed in the language. The definition (and specification) of primary and secondary ENDPOINTS is clearly important and covered in detail, but further consideration is also given to 'composite variables' and 'multiple primary' (now more usually called 'co-primary') endpoints. Many trials use composite variables as primary (or secondary) endpoints; examples include psychiatry, where rating scales are used, dermatology (lesion counts), arthritis (total joint scores), dentistry (number of eroded surfaces) and so on.
'Global assessment scores' are widely used in many therapeutic areas and are a clear example of a composite endpoint (even though it may not be clear what the components are). It is important that endpoints (particularly primary endpoints and, more especially, composite endpoints) are well validated both for statistical properties and for face validity. Co-primary endpoints present further difficulties but obvious examples exist, particularly in cardiology studies. Reduction in 'all cause mortality, recurrent myocardial infarction, and stroke' is an endpoint often used in myocardial infarction treatment trials. While there may be a hierarchy to such endpoints (death being worse than MI and stroke), there are also elements that are not hierarchical (recurrent MI is not necessarily 'better' or 'worse' than having a stroke).
The fundamental techniques to avoid BIAS are considered to be BLINDING and RANDOMISATION. Each of these is covered in some detail. Fair consideration is given to the difficulties (in some circumstances) of designing trials to be double-blind or of maintaining the blind through the course of the study, but limitations of studies that are not blinded are outlined. Randomisation methods, including stratified and adaptive (such as MINIMISATION) methods, are discussed. Their benefits are explained although some concern is raised about the use of adaptive designs. If such methods are used, it remains for the sponsor to produce convincing evidence that the analysis is adequate to account for the assignment method. SENSITIVITY analyses (used here and in other contexts) may help to address this. Basic design issues (parallel groups, crossover and factorial designs) are described and the relative advantages and disadvantages discussed. CROSSOVER TRIALS, in particular, can be a very efficient method of experimentation but are highly prone to difficulties in the face of carryover effects between periods. This issue was particularly highlighted by Freeman (1989) but seemed to struggle to become recognised. The ICH E9 document clearly states in relation to carryover and the necessary chronic and stable nature of the disease: 'The fact that these conditions are likely to be met should be established in advance of the trial by means of prior information and data.' This was a significant step forward in the use of (or restriction of) crossover trials in regulatory work.
Another substantive step was in the explicit recognition and contrast between trials to show superiority (which are what most people think of when discussing clinical trials) and trials to show noninferiority or equivalence. Much has been written on this subject recently; although some was written before ICH E9 was published, until then it was still rather a new and unclear concept. Within the same section is comment on trials to show a dose-response relationship, another area previously suffering from lack of clear consideration. A whole raft of other issues is also discussed, such as handling MISSING VALUES and OUTLIERS, data transformations, estimation and hypothesis testing, multiplicity, subgroup analyses and interactions, use of baseline covariates and so on. The document is wide in its scope, covering safety as well as efficacy. Overall, the ICH E9 guidance has been highly influential both within and outside the pharmaceutical industry. However, as some problems begin to be solved, others come to the forefront and it is now recognised that further guidance on topics covered, perhaps rather sparingly, is necessary. To this end, the European Committee for Human Medicinal Products (CHMP) has set up various subcommittees, notably its Efficacy Working Party, which has identified various areas that need further explanation or clarity. This group has developed a number of 'guidelines', 'points to consider' documents and 'reflection papers' (not just in statistical areas). Currently, there are eight agreed documents relating specifically to statistical/methodological issues: switching between superiority and non-inferiority (August 2000); applications with meta-analyses and one pivotal study (May 2001); missing data (November 2001); multiplicity issues in clinical trials (September 2002); adjustment for baseline covariates (May 2003); data monitoring committees (January 2006); choice of non-inferiority margin (January 2006); and confirmatory trials with adaptive designs (October 2007). The missing data guidance is being updated and new guidance is being prepared on subgroup analyses. Numerous other guidance documents also cover issues relating to statistical points. All of these documents have been written to help clinical assessors to evaluate applications for Marketing Authorisations; they have not been written to guide statisticians; they have not been written to guide pharmaceutical companies. However, it is clear that they receive a lot of attention from these latter groups, and worthily so. They are all freely available on the EMA website at http://www.ema.europa.eu/ema. These are European documents and have no formal status outside of the EU but they are widely recognised as valuable and important guidances. In parallel, the US FDA has issued guidance on the use of data monitoring committees (March 2006), Bayesian statistics in medical device trials (February 2010), adaptive designs (February 2010) and non-inferiority studies (March 2010). SD
CHMP Working Party on Efficacy of Medicinal Products 1995: Biostatistical methodology in clinical trials in marketing authorizations for medicinal purposes. Statistics in Medicine 14, 1659-82. Food and Drug Administration 1988: Guideline for the format and content of the clinical and statistical sections of new drug applications. Rockville, Maryland: FDA, US Department of Health and Human Services. Freeman, P. R. 1989: The performance of the two-stage analysis of two-treatment, two-period crossover trials. Statistics in Medicine 8, 1421-32. ICH E9 Expert Working Group 1999: Statistical principles for clinical trials: ICH harmonised tripartite guideline. Statistics in Medicine 18, 1905-42. Ministry of Health and Welfare 1992: Guideline for the statistical analysis of clinical trials (in Japanese). Tokyo: MHW Pharmaceutical Affairs Bureau.
relative risk and odds ratio
These two approaches measure the effect of a risk factor. The way in which risk is presented can have an influence on how the associated risks are perceived. For example, one might be worried to hear that occupational exposure at one's place of work doubled the risk of a serious disease compared to some other occupation. However, the statement that the risk had increased from one in a million to two in a million might be less worrisome. In the first case, it is a relative risk that is presented and, in the second, an absolute risk. Both the relative risk and the ODDS RATIO are measures of relative risk/chance. They measure how the chance of an event, typically of getting a disease, varies between two categories, typically a group that is exposed to a risk factor and one that is not. Explicit definitions are most easily understood in terms of a 2 × 2 table of exposure by disease, as shown in the table. Therein, N denotes the population size and a, b, c and d the absolute frequencies of the respective combinations of the levels of the risk factor (exposed or not exposed) and the disease factor (present or absent). The (absolute) risk of the disease within a subpopulation is defined as the proportion of subjects within the group that have the disease, i.e. r(Disease present | Exposed group) = a/(a + b) and r(Disease present | Nonexposed group) = c/(c + d). As a result the risk ratio or relative risk, RR, of disease comparing the exposed with the nonexposed subpopulation is given by the ratio RR = [a/(a + b)]/[c/(c + d)]. For example, a relative risk of lung cancer comparing smokers with nonsmokers of 2 would be interpreted as doubling the risk of lung cancer when smoking.

relative risk and odds ratio Two-way classification of exposure by disease

                         Disease
Risk factor      Present    Absent    Total
Exposed             a          b      a + b
Not exposed         c          d      c + d
Total             a + c      b + d    N = a + b + c + d
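The risk and relative risk definitions for the 2 × 2 table can be written out directly. In this minimal sketch the cell counts (20, 80, 10, 90) are invented for illustration and the function names are hypothetical.

```python
def risks(a, b, c, d):
    """Absolute risk of disease in the exposed and nonexposed groups.

    a, b: exposed with / without disease
    c, d: not exposed with / without disease
    """
    return a / (a + b), c / (c + d)


def relative_risk(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)]."""
    risk_exposed, risk_nonexposed = risks(a, b, c, d)
    return risk_exposed / risk_nonexposed


# Hypothetical counts: 20 of 100 exposed and 10 of 100 nonexposed diseased.
rr = relative_risk(20, 80, 10, 90)
print(f"relative risk = {rr:.1f}")
```

With these invented counts the risk in the exposed group is twice that in the nonexposed group.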
In contrast, the odds of the disease within a subpopulation is defined as the ratio of subjects within the group that have the disease to those that do not, i.e. o(Disease present | Exposed group) = a/b and o(Disease present | Nonexposed group) = c/d. As a result the odds ratio (OR) of disease comparing the exposed with the nonexposed subpopulation is given by the ratio OR = (a/b)/(c/d). For example, an odds ratio of high blood pressure comparing treatment A with treatment B of 0.8 would be interpreted as a 20% reduction in the odds of high blood pressure under treatment A. One can evaluate, say, 95% CONFIDENCE INTERVALS for the population OR or RR by first transforming into the natural (base e) log scale and then exponentiating the lower and upper limits. The formula for the STANDARD ERROR of the log(OR) is memorable as the square root of the sum of the reciprocals of the entries in the 2 × 2 table of disease and risk factor. (The formula is given explicitly in the entry on CASE-CONTROL STUDIES.)
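A short sketch of the odds ratio and its confidence interval follows. It assumes the usual large-sample normal approximation on the log scale with multiplier 1.96 for a 95% interval (the entry does not state the multiplier); the counts are invented and the function name is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate confidence interval.

    The standard error of log(OR) is the square root of the sum of the
    reciprocals of the four cell counts; the limits are computed on the
    log scale and then exponentiated. z = 1.96 is an assumed normal
    multiplier for a 95% interval.
    """
    odds_ratio = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Working on the log scale keeps the interval limits positive and makes the interval asymmetric about the OR, which suits a ratio measure.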
Usually, the preferred choice for expressing the effect of exposure on disease is the RR. However, not all study designs allow the estimation of this parameter. A CROSS-SECTIONAL STUDY might fix the sample size and in a COHORT STUDY the sizes of the cohorts of exposed and nonexposed subjects (the row totals in the table) are under the control of the investigator. Both these restrictions allow the risk of disease within the exposure groups and hence the RR to be estimated. In contrast, in a case-control study the number of subjects with the disease and the number of subjects without the disease (the column totals in the table) are chosen by the investigator, making it impossible to estimate any risk of disease from the sample data. However, since the odds ratio of disease comparing an exposed with a nonexposed population is the same as the odds ratio of having been exposed to a risk factor comparing cases with controls, the odds ratio can be estimated from all the designs mentioned above. Case-control studies are frequently carried out in practice and this explains the widespread use of the OR as a measure of effect size. In addition, if a disease is relatively rare in the population (say below 5%) the OR can be used as an approximation to the RR. For further details see Dunn and Everitt (1995). SL
[See also CONTINGENCY TABLES, EPIDEMIOLOGY, LOGISTIC REGRESSION, MANTEL-HAENSZEL TEST]
Dunn, G. and Everitt, B. S. 1995: Clinical biostatistics: an introduction to evidence-based medicine. London: Arnold.
relative survival
This is the ratio of the observed survival of a given cohort of patients with the disease of interest to the survival that this group would be expected to experience based on the LIFE TABLE of a comparable group without the disease (suitably matched for age, sex, calendar time and possibly other covariates) from the background or reference population from which they were sampled. It is routinely applied to population-based cancer studies of survival utilising cancer registry data. However, this measure can equally be applied to population-based studies of other chronic diseases, such as cardiovascular disease. Relative survival (Ederer, Axtell and Cutler, 1961) was developed as a technique for adjusting the survival curve (see SURVIVAL ANALYSIS - AN OVERVIEW) obtained from the study cohort for 'the deaths that would have occurred anyway'. In the case of cancer, the relative survival separates the risk of death attributable directly to the cancer from the background risk of death from all other causes, and thus attempts to get at the 'net effect' of cancer on survival. It reflects the impact of the disease on mortality. If the cause-specific death information in the study is available and, importantly, accurate then a measure of the net effect of disease on survival could easily be obtained from a cause-specific analysis where only those deaths due directly to the disease of interest are considered to be events, while observation times of individuals who have not yet died or have died from other causes are considered to be right-censored. A straightforward application of survival analysis methods (see SURVIVAL ANALYSIS - AN OVERVIEW) will then provide an estimate of the disease-specific survival curve when deaths from other causes are removed. This estimate of the disease-specific survival curve is known as the 'net survival curve' (Esteve et al., 1990). In practice, information on cause of death, if available, tends to be unreliable in population-based research. Here cause of death is usually obtained via death certificates and this information has been shown to be inaccurate for both cancer and heart disease in terms of coding correctly the primary cause of death (Lauer et al., 1999; Welch and Black, 2002; Mant et al., 2006). In this situation a cause-specific (or competing risk) analysis approach should not be used, as the estimation of the net survival would be biased due to misclassification of the cause of death. However, relative survival can still be used, as it does not require information on cause of death (whether attributable to the disease or otherwise) to be available. All that is required are the occurrences of the deaths in the cohort during the study period and the external life table information from the comparison group, which is commonly obtained from national mortality data. It should be noted that under certain conditions the net survival curve and the relative survival function may be estimating the same 'net effect' of disease on survival. This can be clearly seen if the definition for relative survival is more precisely (i.e. mathematically) written out as:

$$S_{\mathrm{rel}}(t) = \frac{S_{o}(t)}{S_{\mathrm{exp}}(t)} \qquad (1)$$
whe~
S... (.), S.(.) and Snp(.) an: the mlllive survival runction and the obsencd and expected survival curves ~lively. If. additionally~ it is assumed Ibat the reJali\'e survival function. Snl(.). corresponds 10 a 'proper' survival curve (in Ibe mathemalical sense). thea equation (1) is equivalent to:
ld.(I) = A.eI(/) + Aexp(/) w~
(2)
A.el(.), 1o.{.) and Aexp(.) an: the hazard fwac:liaDs
(see SURVIVAL AlIlUYSIS-AN O\'ER\'I£W) assac:ialcd with SnI(.), s.(.) and Sexp(.) respectively. Thus equation (2) cOlMsponds to the cause-specificfcompeling risks (or additive hazanls)
model. when: Ani repn:sents the discas~spccific hazard and Au, the "other causes' - specific hazanl.1n lhe ~latiYe surviwllitenduR:., Am is n:rermi to as the cxcess mortalit)' or ex"ss hazard. 395
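As a purely illustrative sketch, equation (1) can be applied to tabulated survival proportions and equation (2) checked numerically. Equation (2) follows from (1) because a hazard is $\lambda(t) = -\mathrm{d}\log S(t)/\mathrm{d}t$, so the logarithm of a ratio of survival curves becomes a difference of hazards. The survival values below are hypothetical (generated from constant hazards of 0.12 for the cohort and 0.02 for the life table), and the function names are assumptions made for this example rather than anything taken from the cited papers.

```python
import math


def relative_survival(observed, expected):
    """Equation (1): ratio of observed to expected survival at each time point."""
    return [s_o / s_e for s_o, s_e in zip(observed, expected)]


def interval_hazard(survival, times):
    """Average hazard over each interval: -[log S(t2) - log S(t1)] / (t2 - t1)."""
    return [
        -(math.log(survival[i + 1]) - math.log(survival[i])) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]


# Hypothetical data: constant hazards of 0.12 (study cohort) and 0.02 (life table).
times = [0, 1, 2, 3, 4, 5]
s_obs = [math.exp(-0.12 * t) for t in times]
s_exp = [math.exp(-0.02 * t) for t in times]

s_rel = relative_survival(s_obs, s_exp)

# Equation (2) rearranged: excess hazard = observed hazard - expected hazard.
excess = [
    h_o - h_e
    for h_o, h_e in zip(interval_hazard(s_obs, times), interval_hazard(s_exp, times))
]

for t, s in zip(times, s_rel):
    print(f"t = {t}: relative survival = {s:.3f}")
print("excess hazard per interval:", [round(h, 3) for h in excess])
```

With real data the expected survival would come from a matched population life table rather than a formula, and the excess hazard would normally be modelled rather than differenced interval by interval.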
A number of approaches exist for estimating relative survival. The majority are based on regression models developed on the 'hazards scale', where covariates that affect mortality (either mortality from the disease or from other causes) can be easily incorporated. The most commonly used relative regression approaches are those based on additive hazards (2), although multiplicative models (see PROPORTIONAL HAZARDS) exist as well as those based on other approaches (e.g. FINITE MIXTURE DISTRIBUTIONS, TRANSFORMATIONS, etc.). Estimation of these models can be based on the full LIKELIHOOD or other approaches (e.g. EM ALGORITHM, grouped data, etc.). The reader is referred to articles by Hakulinen and Tenkanen (1987), Esteve et al. (1990), Andersen et al. (1999), Dickman et al. (2004), Pohar and Stare (2006), Nelson et al. (2007) and Perme, Henderson and Stare (2009) for further details. BT

Andersen, P. K., Horowitz, M. M., Klein, J. P., Socie, G., Stone, J. V. and Zhang, M.-J. 1999: Modelling covariate adjusted mortality relative to a standard population. Statistics in Medicine 18, 1529-40. Dickman, P. W., Sloggett, A., Hills, M. and Hakulinen, T. 2004: Regression models for relative survival. Statistics in Medicine 23, 51-64. Ederer, F., Axtell, L. M. and Cutler, S. J. 1961: The relative survival rate: a statistical methodology. National Cancer Institute Monograph 6, 101-21. Esteve, J., Benhamou, E., Croasdale, M. and Raymond, L. 1990: Relative survival and the estimation of net survival: elements for further discussion. Statistics in Medicine 9, 529-38. Hakulinen, T. and Tenkanen, L. 1987: Regression analysis of relative survival rates. Journal of the Royal Statistical Society, Series C (Applied Statistics) 36, 309-17. Lauer, M. S., Blackstone, E. H., Young, J. B. and Topol, E. J. 1999: Cause of death in clinical research: time for a reassessment? Journal of the American College of Cardiology 34, 618-20. Mant, J., Wilson, S., Parry, J., Bridge, P., Wilson, R., Murdoch, W., Quirke, T., Davies, M., Gammage, M., Harrison, R. and Warfield, A. 2006: Clinicians didn't reliably distinguish between different causes of cardiac death using case histories. Journal of Clinical Epidemiology 59, 862-7. Nelson, C. P., Lambert, P. C., Squire, I. B. and Jones, D. R. 2007: Flexible parametric models for relative survival, with application in coronary heart disease. Statistics in Medicine 26, 5486-98. Perme, M. P., Henderson, R. and Stare, J. 2009: An approach to estimation in relative survival regression. Biostatistics 10, 136-46. Pohar, M. and Stare, J. 2006: Relative survival analysis in R. Computer Methods and Programs in Biomedicine 81, 272-8. Welch, H. G. and Black, W. C. 2002: Are deaths within 1 month of cancer-directed surgery attributed to cancer? Journal of the National Cancer Institute 94, 1066-70.
reliability See MEASUREMENT PRECISION AND RELIABILITY

repeatability See MEASUREMENT PRECISION AND RELIABILITY

repeated measures analysis of variance This is a test to see if the MEAN varies with either (or both of) a categorical factor and time. The TWO-WAY ANALYSIS OF VARIANCE seeks to partition the variation in a sample into that due to one factor, that due to a second factor and the residual variation that cannot be explained by either factor. The repeated measures ANALYSIS OF VARIANCE (ANOVA) is a special case of the two-way analysis of variance, where one of the categorical factors is the time at which the measurement is taken.

As with other versions of the analysis of variance, the repeated measures ANOVA is usually employed following a designed experiment. If an experimental design decrees that measurements will be taken at baseline, after 1 month, after 7 months and after a year, then it is sensible to view time as a factor of four levels. Contrariwise, if we have naturally arising times (e.g. times at which patients choose to visit their GP) then viewing them as categorical will be difficult and repeated measures ANOVA will probably not be appropriate.

Cadogan et al. (1997) investigate the effect of increasing milk consumption on the mean bone density in adolescent girls. Measurements were taken at 0, 6, 12 and 18 months and the treatment was a two-level factor. For these reasons, the repeated measures ANOVA was the choice of analysis technique.

As for the other analysis of variance techniques, repeated measures ANOVA can be extended to more complicated situations and for greatest flexibility can be viewed from within a regression framework. For further details see Altman (1991). (See also LINEAR MIXED-EFFECTS MODELS) AGL

Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall. Cadogan, J., Eastell, R., Jones, N. and Barker, M. E. 1997: Milk intake and bone mineral acquisition in adolescent girls: randomised, controlled intervention trial. British Medical Journal 315, 1255-60.

replicate designs See CROSSOVER TRIALS

reproducibility See MEASUREMENT PRECISION AND RELIABILITY

re-randomisation designs See RANDOMISATION

resampling See BOOTSTRAP

research ethics board (REB) See ETHICAL REVIEW COMMITTEES

research ethics committee (REC) See ETHICAL REVIEW COMMITTEES

residual confounding See MEASUREMENT ERROR

residuals See MULTIPLE LINEAR REGRESSION
response feature analysis See SUMMARY MEASURE ANALYSIS

response variable See ENDPOINT

restricted maximum likelihood (REML) See MULTILEVEL MODELS

robustness This is a property of statistical procedures that implies that they continue to work well even when there are departures from the assumptions on which they were based. Many standard statistical procedures require that certain assumptions hold for the underlying theory to be applicable. For example, the TWO-SAMPLE t-TEST requires that the data are normally distributed with equal VARIANCE. In general, statistical procedures are considered to be robust if they are insensitive to small deviations from the underlying assumptions. If the optimal procedure requires the assumption of normality, then the corresponding robust procedures would not be influenced by departures from normality arising from slightly longer or shorter tails or slight SKEWNESS in the underlying distribution, which could result from the presence of a small proportion of OUTLIERS or spurious values. Robust procedures are ones such that these outliers, if they occurred, would have little effect on the analysis of the data.

Perhaps the most common estimator of the population MEAN is the sample mean, but this is not robust against quite small departures from the assumption of normality, being particularly sensitive to outliers. The MEDIAN, by way of contrast, although less efficient than the sample mean when the assumptions hold, is much more robust since it is hardly affected by the presence of outliers. A 'good' estimator is one that combines the properties of high efficiency and robustness.

One of the major problems in the development of robust procedures concerns the appropriate set of criteria that the procedure should satisfy: different criteria have led to different robust methods. Procedures that maintain the Type I error (declared significance level) or the CONFIDENCE LEVEL are known as validity robust, while those that maintain high power or size of CONFIDENCE INTERVAL are known as efficiency robust. Even within these broad categories there might be different kinds of departures from the assumptions that could be considered.

These competing influences have resulted in a wide range of robust estimation methods being developed: these include M-estimators (based on MAXIMUM LIKELIHOOD ESTIMATION), L-estimators (based on linear functions of order statistics) and R-estimators (based on ranking methods). The M-estimators (which include the sample mean and median as special cases) are based on bounded or re-descending weight functions, which give lower (or even zero) weight to extreme observations and usually involve an iterative solution of the resulting likelihood equations. The L-estimators include trimmed means and Winsorised means (once again, the sample mean and median as special cases), while the R-estimators lead to Wilcoxon and Mann-Whitney procedures based on signed ranks: see WILCOXON SIGNED RANK TEST and MANN-WHITNEY RANK SUM TEST.

Generally all these alternative robust methods lead to reasonably high efficiency and wider applicability than the 'optimal' procedures. These ideas have been extended to multivariate data and to regression problems. There is now an extensive literature on robust regression, robustness in scientific modelling and in experimental design. PP
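The sensitivity of the sample mean to a single outlier, and the relative stability of the median and of two L-estimators, can be seen in a small sketch. The data are hypothetical (nine plausible readings and one spurious value), and the helper functions `trimmed_mean` and `winsorised_mean` are written for this illustration, each taking the number `k` of observations to trim or replace in each tail.

```python
import statistics


def trimmed_mean(values, k):
    """Mean after discarding the k smallest and k largest observations."""
    data = sorted(values)
    return statistics.mean(data[k:len(data) - k])


def winsorised_mean(values, k):
    """Mean after replacing the k most extreme values in each tail
    by the nearest remaining observation."""
    data = sorted(values)
    if k > 0:
        data[:k] = [data[k]] * k
        data[-k:] = [data[-k - 1]] * k
    return statistics.mean(data)


# Hypothetical sample with one spurious value (58.0).
sample = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.9, 6.0, 6.2, 58.0]

print("mean            :", round(statistics.mean(sample), 4))
print("median          :", round(statistics.median(sample), 4))
print("trimmed mean    :", round(trimmed_mean(sample, 1), 4))
print("Winsorised mean :", round(winsorised_mean(sample, 1), 4))
```

The single outlier roughly doubles the sample mean, whereas the median, trimmed mean and Winsorised mean all stay close to the bulk of the data. Setting `k = 0` recovers the ordinary mean, which is the sense in which the mean is a special case of these estimators.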
ROC curves See RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE
S

sample size determination in clinical trials One hallmark of a well-designed study is to have a formally estimated required sample size before the study commences. Awareness of the importance of this has led to increasing numbers of medical journals demanding that full justification of the sample size chosen is published with reports of trials. The British Medical Journal, the Journal of the American Medical Association and numerous other journals issue checklists for authors of papers on CLINICAL TRIALS, in which there is a question relating to sample size justification. Investigators, grant-awarding bodies and biotechnology companies all wish to know how much a study is likely to cost them. They would also like to be reassured that their effort (and money) is well spent, by assessing the LIKELIHOOD that the study will give unequivocal results.

Providing a sample size is not simply a matter of providing a single number from a set of tables but is a two-stage process. At the preliminary stages, 'ballpark' figures are required that enable the investigator to judge whether or not to start the detailed planning of the study. If a decision is made to proceed, then a subsequent stage is to refine the calculations for the formal study protocol itself (see PROTOCOLS FOR CLINICAL TRIALS).

When a clinical trial is designed, the investigator must make a realistic assessment of the potential benefit (the anticipated effect size) of the proposed test therapy. The history of clinical trials research suggests that, in certain circumstances, rather ambitious or overoptimistic views of potential benefit have been claimed at the design stage. This has led to trials of inadequate size for the questions posed.
If too few subjects are involved, the trial may be a waste of time because realistic medical improvements are unlikely to be distinguished from chance variation. A small trial with no chance of detecting a clinically meaningful difference between treatments is unfair to all the subjects put to the risk and discomfort of the clinical trial. Too many subjects is a waste of resources and may be unfair as a larger than necessary number of subjects receive inferior treatment if one treatment could have been shown to be more effective with fewer patients.

The traditional approach to sample size determination is by consideration of significance or HYPOTHESIS TESTS. Suppose we wish to compare two groups with a continuous outcome variable. We set up a NULL HYPOTHESIS that the two population MEANS, $\mu_0$ and $\mu_1$, are equal. We carry out a SIGNIFICANCE TEST to test this hypothesis. We calculate the observed difference in means $d$. This significance test results in a P-VALUE, which is the PROBABILITY of getting the observed result, $d$, or one more extreme, by chance if the null hypothesis is true. If the P-value obtained from a trial is less than or equal to $\alpha$, then one rejects the null hypothesis and concludes that there is a statistically significant difference between treatments. The value we take for $\alpha$ is arbitrary, but conventionally either 0.05 or 0.01. Contrariwise, if the P-value is greater than $\alpha$, we do not reject the null hypothesis.

Even when the null hypothesis is, in fact, true there is still a risk of rejecting it. To reject the null hypothesis when it is true is to make a TYPE I ERROR. Plainly the associated probability of rejecting the null hypothesis when it is true equals $\alpha$. The quantity $\alpha$ is interchangeably termed the test size, SIGNIFICANCE LEVEL or probability of a Type I (or false positive) error.

sample size determination in clinical trials The left-hand curve shows the distribution of $d$ under the null hypothesis. If $d > d_\alpha$, then $H_0$ is rejected. The vertically hatched area represents the Type I error $\alpha$. The right-hand curve shows the distribution of $d$ under the alternative hypothesis that the difference in means is $\delta_1$ and the horizontal hatched area represents the Type II error, $\beta$

The left-hand curve in the figure shows the expected distribution of the observed difference $d$ under the null hypothesis, centred at zero. If $d$ is greater than some value $d_\alpha$, which is determined so that the shaded area to the right of $d_\alpha$ is equal to $\alpha$, then $H_0$ is rejected.

The clinical trial could yield an observed difference $d$ that would lead to a P-value above $\alpha$ even though the null hypothesis is not true, i.e. $\mu_0$ is indeed not equal to $\mu_1$. In such a situation, we then accept (more correctly phrased as 'fail to reject') the null hypothesis although it is truly false. This is called a TYPE II (false negative) ERROR and the probability of this is denoted by $\beta$. The probability of a Type II error is based on the assumption that the null hypothesis is not true, i.e. $\delta = \mu_0 - \mu_1 \neq 0$. There are clearly many possible values of $\delta$ in this instance and each would imply a different alternative hypothesis, $H_1$, and a different value for the probability $\beta$.

The POWER is defined as one minus the probability of a Type II error and thus the power equals $1 - \beta$; i.e. the power is the probability of obtaining a 'statistically significant' P-value if the null hypothesis is truly false. The right-hand curve of the figure illustrates the distribution of $d$ under the alternative hypothesis $H_1$, centred on the expected difference in means $\delta_1 = \mu_0 - \mu_1$. If $\delta$, $\alpha$ and $\beta$ are all fixed, it would appear there is nothing left to vary. However, the distribution of $d$ depends on the number of subjects in the two groups. With more subjects the STANDARD ERROR decreases, so the curves become narrower and so for a fixed $\alpha$ the value of $\beta$ decreases. The sample size calculation is a compromise between the power $(1 - \beta)$, the effect size $\delta$ and the sample size $n$. A key element in the design is the 'effect size' that it is reasonable to plan to observe - should it exist.
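The way $\beta$ falls as the number of subjects grows can be made concrete with a short sketch. It assumes the normal approximation that this entry derives later (equal SDs, $n$ subjects per group, $SE(d) = \sigma\sqrt{2/n}$), under which the power is $\Phi(\delta/SE(d) - z_{1-\alpha})$, with $z_{1-\alpha/2}$ substituted for a two-sided test. The function name and the planning values $\delta = 1$, $\sigma = 2$ are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(n_per_group, delta, sigma, alpha=0.05, two_sided=True):
    """Approximate power for comparing two means with n subjects per group.

    Uses the normal approximation with equal SDs; for a two-sided test the
    (negligible) chance of rejecting in the wrong direction is ignored.
    """
    std_normal = NormalDist()
    se = sigma * sqrt(2.0 / n_per_group)           # standard error of d
    tail = alpha / 2 if two_sided else alpha
    z_crit = std_normal.inv_cdf(1 - tail)          # critical value on the z scale
    return std_normal.cdf(delta / se - z_crit)     # P(d > d_alpha | H1)


# Illustrative planning values: difference of 1 unit, SD of 2 units.
for n in (16, 32, 64, 128):
    print(f"n per group = {n:3d}: power = {power_two_means(n, 1.0, 2.0):.3f}")
```

Doubling the group size repeatedly moves the power from well under one half towards one, which is the narrowing of the two curves described above.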
Sometimes there is prior knowledge, which then enables an investigator to anticipate what effect size between groups is likely to be observed, and the role of the study or trial is to confirm that expectation. In some situations, it may be possible to state that, for example, only a doubling of MEDIAN survival would be worthwhile to demonstrate in a planned trial. This might be because the new treatment, as compared to standard, is expected to be so toxic that only if substantial benefit could be shown would it ever be used. In such cases the investigator may have definite opinions about the difference that it is pertinent to detect.

In practice, a range of plausible effect size options are considered before the final effect size is agreed. For example, an investigator might specify a scientific or clinically useful difference that it is hoped could be detected and would then estimate the required sample size on this basis. The calculations might then indicate that an extremely large number of subjects is required. As a consequence, the investigator may next define a revised aim of detecting a rather larger difference than that originally specified. The calculations are repeated and perhaps the sample size now becomes realistic in that new context.

One additional problem when planning comparative clinical trials is that investigators are often optimistic about the magnitude of the improvement of new treatments over the standard. This optimism is understandable, since it can take considerable effort to initiate a trial and, in many cases, the trial would only be launched if the investigator is enthusiastic about the new treatment and is sufficiently convinced about its potential efficacy. However, experience suggests that as trials progress there is often a growing realism that, even at best, the initial expectations were optimistic. There is ample historical evidence to suggest that trials that set out to detect large treatment differences nearly always result in 'no significant difference was detected'. In such cases, there may have been a true and worthwhile treatment benefit that has been missed, since the level of detectable differences set by the design was unrealistically high and hence the sample size too small to establish the true (but less optimistic) size of the benefit.

The way in which possible effect sizes are determined will depend on the specific situation under consideration. For example, if a study is repeating one already conducted then very detailed information may be available on the options for the effect size suitable for planning the new study. Estimates of the anticipated effect size may be obtained from the available literature or formal META-ANALYSES of related studies or may be elicited from expert opinion. For clinical trials, in circumstances where there is little prior information available, Cohen (1988) has proposed a standardised effect size, $\Delta$.
In the case when the difference between two treatments 1 and 2 is expressed by the difference between their means $(\mu_1 - \mu_2)$ and $\sigma$ is the standard deviation (SD) of the ENDPOINT variable, which is assumed to be a continuous measure, then $\Delta = (\mu_1 - \mu_2)/\sigma = \delta/\sigma$. A value of $\Delta \le 0.2$ is considered a small standardised effect, $\Delta \approx 0.5$ as moderate and $\Delta \ge 1$ as large. Experience has suggested that, in many clinical areas, these can be taken as a good practical guide for design purposes.

In intermediate situations, for clinical trials at least, Bayesian approaches to obtaining a distribution of effect size have been suggested by Spiegelhalter, Freedman and Parmar (1994) (see BAYESIAN METHODS). These involve obtaining views on likely effect size from a survey of relevant experts and combining their responses into a PRIOR DISTRIBUTION of plausible effect sizes. Subsequently, this prior distribution is then combined with the data obtained from the trial once conducted to give a POSTERIOR DISTRIBUTION concerning the true effect size from which conclusions with regard to efficacy are then drawn. This approach has also been advocated by Tan et al. (2003) who suggest how information, from whatever source, may be synthesised into a prior distribution for the anticipated effect size that is then utilised for planning purposes.
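The combination of a prior distribution with trial data can be illustrated in the simplest conjugate case. The sketch below assumes, purely for illustration, that both the elicited prior for the effect size and the trial estimate are normally distributed; this normal-normal model is an assumption of the example and is not claimed to be the method of the papers cited above. All numerical values and the function name are hypothetical.

```python
from math import sqrt


def normal_posterior(prior_mean, prior_sd, estimate, standard_error):
    """Posterior mean and SD for an effect size under a normal prior and a
    normally distributed trial estimate (precision-weighted average)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / standard_error ** 2
    post_precision = prior_precision + data_precision
    post_mean = (
        prior_precision * prior_mean + data_precision * estimate
    ) / post_precision
    return post_mean, sqrt(1.0 / post_precision)


# Hypothetical values: experts expect a standardised effect near 0.5 (SD 0.25);
# the completed trial estimates 0.3 with standard error 0.125.
mean, sd = normal_posterior(0.5, 0.25, 0.3, 0.125)
print(f"posterior mean = {mean:.3f}, posterior SD = {sd:.3f}")
```

The posterior mean lies between the prior mean and the trial estimate, closer to whichever is more precise, which is one way of seeing how optimistic prior opinion is tempered by the data.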
Next, some theory and formulae are presented to show their derivation. In practice, one can refer in simple situations to a graphical approach that yields a suitably approximate sample size (see ALTMAN'S NOMOGRAM) (Altman, 1991).

In a trial comparing two groups, with $n$ subjects per group, if we assume that the outcome variable is continuous, $\bar{x}_1$ and $\bar{x}_2$ summarise the respective means of the observations taken. Further, if the data are normally distributed with equal (population) SDs, $\sigma$, then the standard errors (SE) are $SE(\bar{x}_1) = SE(\bar{x}_2) = \sigma/\sqrt{n}$. The two groups are compared using $d = \bar{x}_1 - \bar{x}_2$ with $SE(d) = \sigma\sqrt{2/n}$. Here we assume that $SE(d)$ is the same when the null hypothesis, $H_0$, of no difference is true and when the alternative hypothesis, $H_1$, that there is a difference of size $\delta$ is true.

Under the null hypothesis, $H_0$, the critical value $d_\alpha$ is determined by:

$$\frac{d_\alpha - 0}{\sigma\sqrt{2/n}} = z_{1-\alpha} \qquad (1)$$

In contrast, under the assumption that the alternative hypothesis, $H_1$, is true, $d$ has mean $\delta$ but the same $SE(d) = \sigma\sqrt{2/n}$. In this case the probability that $d$ exceeds $d_\alpha$ must be $1 - \beta$, and this implies that:

$$\frac{d_\alpha - \delta}{\sigma\sqrt{2/n}} = -z_{1-\beta} \qquad (2)$$

Solving the two equations (1) and (2) for $d_\alpha$, equating the results and rearranging, we obtain the sample size for each group in the trial as:

$$n = \frac{2\sigma^2\left(z_{1-\alpha} + z_{1-\beta}\right)^2}{\delta^2} = \frac{2\left(z_{1-\alpha} + z_{1-\beta}\right)^2}{\Delta^2} \qquad (3)$$

This is termed the fundamental equation; it arises, in one form or another, in many situations for which sample sizes are calculated. The use of equation (3) for the case of a two-tailed test, rather than the one-tailed test, involves a slight approximation since $d$ is also statistically significant if it is less than $-d_\alpha$. However, with $d$ positive the associated probability of observing a result smaller than $-d_\alpha$ is negligible. Thus, for the case of a two-sided test, we simply replace $z_{1-\alpha}$ in equation (3) by $z_{1-\alpha/2}$. For the commonly occurring situation of $\alpha = 0.05$ (two-sided) and $\beta = 0.2$, we find that equation (3) simplifies to:

$$n = \frac{16}{\Delta^2} \qquad (4)$$

This basic equation has to be modified to adapt to the specific experimental design, the allocation ratio (i.e. the possibility of the design stipulating unequal subject numbers in each group), the particular type of endpoint under consideration as well as, for CLINICAL TRIALS, the type of RANDOMISATION involved.

sample size determination in clinical trials Components necessary to estimate the size of a study
Effect size, $\delta$: Anticipated (planning) size of the difference between the two groups
Type I error, $\alpha$: Equivalently, the test size or significance level of the statistical test to be used in the analysis
Type II error, $\beta$: Equivalently, the power, $1 - \beta$ (usually expressed as a percentage)

Viljanen et al. (2003) specified $\alpha = 0.05$, $\beta = 0.2$ and the anticipated effect size, $\delta = 1$ unit with standard deviation 2 units. From this, the standardised effect size is $\Delta = 0.5$ and from equation (4) we find 64 patients per group are required.

When dealing with binary outcome variables the sample size derivations are similar to those for continuous data, but one has to specify two proportions, $\pi_1$ and $\pi_2$, and the effect size is the difference $\delta = \pi_1 - \pi_2$. Campbell, Julious and Altman (1995) give tables for the sample sizes required for the comparison of two binomial proportions, and this is shown in the second table.

As an example of use of the table, we consider the trial by Viljanen et al. (2003). They showed that the proportion of people in their control group with neck pain who had been on sick leave over 12 months was 15%. Suppose we wished to design a trial that proposed to reduce this to 10%. Then from the second table we would require 686 (say 700) people per group with 80% power at 5% significance. The point to note here is that although this is a 33% reduction in the rate, it still requires a large trial. In general, the binary outcomes will often require large trials because they contain much less information than a continuous outcome. For example, suppose the body mass index (BMI) in a population was about 28 kg/m², with a standard deviation of 2 kg/m². This means that about 16% of the group are defined as obese (BMI > 30 kg/m²). Suppose a trial tried to reduce this absolute proportion by 5%, to about 11%. Then from the second table we would need about 686 people to detect this with 5% significance and 80% power. However, to obtain about 11% obese persons in the population we would have to reduce the mean BMI to 27.54. Thus the standardised effect size is 0.23 = (28 - 27.54)/2 and from equation (4) we would require about 300 patients per group for 80% power at 5% significance level, or less than half the equivalent sample
size required for the binary outcome. The moral of the story is to try and have continuous outcome variables if possible.

Commonly, the number of patients that can be included in a study is governed by nonscientific forces such as time, money and human resources. Thus with a predetermined sample size, the researcher may then wish to know the probability of detecting a certain effect size with a study confined to this size. If the resulting power is small, say less than 50%, then the investigator may decide that the study should not go ahead. A similar situation arises if the type of subject under consideration is uncommon, as would be the case with a clinical trial in rare disease groups. In either case, the sample size is constrained and the researcher is interested in finding the size of effects that could be established for a reasonable power of, say, 80%. Thus the output from a sample size calculation should be a range of possible sample sizes against the effect sizes detectable with a number of levels of power.

In order to calculate the sample size of a study one must first have suitable background information together with some idea as to what is a realistic difference to seek. Sometimes such information is available as prior knowledge from the literature or other sources; at other times, a PILOT STUDY may be conducted. Traditionally, a pilot study is a distinct preliminary investigation, conducted before embarking on the main trial. However, Wittes and Brittain (1990) have explored the use of an internal pilot study. The idea here is to plan the clinical trial on the basis of best available information, but to regard the first patients entered as the internal pilot. When data from these patients have been collected, the sample size can be re-estimated with the revised knowledge so generated.

sample size determination in clinical trials Sample sizes to detect a difference in two proportions, $\pi_1$ and $\pi_2$, at a 5% significance level with 80% power (rows $\pi_1$, columns $\pi_2$)
π1    0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00
0.00   152   74   48   35   27   22   18   15   13   11   10    8    7    6    6    5    4    4    3    2
0.05     -  435  141   76   49   36   27   22   18   15   12   11    9    8    7    6    5    4    4    3
0.10     -    -  686  199  100   62   43   32   25   20   16   14   11   10    8    7    6    5    4    4
0.15     -    -    -  906  250  121   73   49   36   27   22   17   14   12   10    8    7    6    5    4
0.20     -    -    -    - 1094  294  138   82   54   39   29   23   18   15   12   10    8    7    6    5
0.25     -    -    -    -    - 1251  329  152   89   58   41   31   24   19   15   12   10    8    7    6
0.30     -    -    -    -    -    - 1377  356  163   93   61   42   31   24   19   15   12   10    8    6
0.35     -    -    -    -    -    -    - 1471  376  170   96   62   43   31   24   18   14   11    9    7
0.40     -    -    -    -    -    -    -    - 1534  388  173   97   62   42   31   23   17   14   11    8
0.45     -    -    -    -    -    -    -    -    - 1565  392  173   96   61   41   29   22   16   12   10
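Sample sizes of the kind quoted in this entry can be approximated with a few lines of code. The sketch below implements equation (3) in its two-sided form for a continuous outcome, and a commonly used normal-approximation formula for two proportions that pools the variance under the null hypothesis. That the published table was produced with exactly this second formula is an assumption; it does, however, reproduce the tabulated value of 686 for proportions of 0.10 and 0.15. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_group_means(std_effect, alpha=0.05, power=0.80):
    """Equation (3), two-sided: n = 2 (z_{1-alpha/2} + z_{1-beta})^2 / Delta^2."""
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_a + z_b) ** 2 / std_effect ** 2)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Normal approximation for two proportions (assumed formula):
    pooled variance under H0, separate variances under H1."""
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = _STD_NORMAL.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_a * sqrt(2 * p_bar * (1 - p_bar))
        + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


print(n_per_group_means(0.5))               # 63
print(n_per_group_proportions(0.15, 0.10))  # 686
```

The value 63 for $\Delta = 0.5$ is slightly below the 64 obtained from equation (4), because equation (4) rounds the constant $2(1.96 + 0.84)^2 \approx 15.7$ up to 16.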
Two vital features accompany the internal pilot approach: first, the final sample size should only ever be adjusted upwards, never down, and, second, one should only use the internal pilot in order to improve the estimation of factors that are independent of the treatment variable. This second point is crucial. It means that when comparing the means of two groups, it is valid to re-estimate the planning SD, $\sigma_{Plan}$, but not $\delta_{Plan}$. Both these points should be carefully observed to avoid distortion of the subsequent significance test and a possible misleading interpretation of the final study results. The advantage of an internal pilot is that it can be relatively large - perhaps half of the anticipated patients. It provides an insurance against misjudgement regarding the baseline planning assumptions. It is, nevertheless, important that the intention to conduct an internal pilot study is recorded at the outset and that full details are given in the study protocol. An internal pilot is an example of an ADAPTIVE DESIGN (Bauer, 2008), which is a more recent, flexible approach to clinical trial design.

In studies involving a single group, sample size calculations are couched in terms of the CONFIDENCE INTERVAL. Thus for a given study endpoint, for example, the mean systolic blood pressure (SBP), the hypotensive proportion or the median duration of fever, calculated from the subjects in a case series or a cross-sectional survey, it is usual also to quote the corresponding confidence interval (CI). Thus, when planning a case series survey, it would be appropriate to define $\omega$, the width of the desired CI. This width will depend on the variability from subject to subject (which we cannot control) and the number of subjects in the case series. We assume the object of the study is to estimate a population mean $\mu$, and this is thought to be close to $\mu_{Plan}$. Further, if the data can be assumed to follow a NORMAL DISTRIBUTION, then provided we choose a relatively large sample size $n$, the $100(1 - \alpha)\%$ CI for the population mean $\mu$ is likely to be close to:

$$\mu_{Plan} - z_{1-\alpha/2}\,\frac{\sigma_{Plan}}{\sqrt{n}} \quad \text{to} \quad \mu_{Plan} + z_{1-\alpha/2}\,\frac{\sigma_{Plan}}{\sqrt{n}} \qquad (5)$$

Here $\sigma_{Plan}$ is the standard deviation, which measures subject-to-subject variation. The width, $\omega$, of this CI is obtained from the difference between the upper and lower limits of equation (5) as:

$$\omega = 2 \times z_{1-\alpha/2} \times \frac{\sigma_{Plan}}{\sqrt{n}} \qquad (6)$$
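Equations (5) and (6) are easily evaluated, and solving equation (6) for $n$ gives $n = 4(\sigma_{Plan}/\omega)^2 z_{1-\alpha/2}^2$. The sketch below, whose function names are hypothetical, uses the figures quoted in this entry's worked example (mean 346 ms, SD 27 ms, 19 patients, target width 20 ms) as illustrative inputs.

```python
from math import ceil, sqrt
from statistics import NormalDist


def confidence_interval(mean, sd, n, alpha=0.05):
    """Equation (5): normal-theory CI for a mean, treating the SD as known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


def subjects_for_width(sd, width, alpha=0.05):
    """Equation (6) solved for n, rounded up to a whole subject."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(4 * (sd / width) ** 2 * z ** 2)


low, high = confidence_interval(346, 27, 19)
print(round(low), round(high))        # 334 358
print(subjects_for_width(27, 20))     # 29
```

Rounding up the exact value of about 28.0 gives 29 subjects; the entry's worked example rounds further to approximately 30. Because the width depends on $\sqrt{n}$, halving the width would require about four times as many subjects.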
Thus. for a planning value (0, the umber of subjects rr required is obtained by reorganising uation (6) to give the required study size as: (7) In practice, to calculate nPlIIR' a value of aPlIlft as well as w..... or a value for their lDIio has to be provided. The actual value ofPpt_ docs nol fealure in dais calculation. Once the study is
completed. the sample mean x replaces"p... and the sample slandanl deviation. s. replaces aP"n in the calculation of the CI or equation (5). For example. Weir. FUIICbi and Machin (1998) give the mcaa latency of the audilOly P300 measun:d in 19 righthanded palients with schizophrenia as 346 ms with SO 27 ms. Using equation (5). the CCIITCSponding 9Sc.t CI is rrom 334to 3S8 ms. The width or this CI is CAl = 358 - 334 = 24 ms. If the study wc:n: to be repeated but in (say) leR-bandcd patients, how many would be required 10 obtain a narrower width of the CI SCI allO ms? In this case. Wf'l1III = 20 ms. and. assuming the same SD of 27 ms. equation (7) suggests 4 X (27120) X (1.96)2 = 21.1 01' approximately 30 patients. Many dinicallrials are designed to show thai treatments are effectively equivalent. rathCl' than different (sec EQUlVA. LElIICB sruDIES). In Phase II trials one might like to show that a generic drug is equivllientto a standard one in lenns of its pharmacokinetics. This bioequimlence is often phrased in terms orthe AREA UNDER THE CURVE (AUC) of the sclVm levels of Ihe drug .ner consumption. Since it is impossible 10 prove equiwlence. one has 10 specify in aIlvanc:c a difference. 6, within which one is willing toconccdc: that there is, in fact. no diffcreace. For bioequivalenc:c, the convention is 10 accept that two drugs are equivalent if their AUCs are within 2Oc.t or each othe... or the ratio is belwcenO.& and 1.25. Funhcrdelails are given in DileUi, Hausc:hke and Steinijans ( 1991 ). Cohen (1988) is the classical n:fetmee for sample size calculations. The book by Machin el at (2008) givcsdctaUsof sample size calculations for a large number of othCl'designs. such as sludics with more than two groups. with ordinal or
survival outcomes and for paired data. Hints for sample size calculations are given by Lenth (2001). Nowadays there is much software, both commercial and freely available on the web, for performing sample size calculations: see, for example, http://www.stat.uiowa.edu/~rlenth/Power/.

Sample size calculations have been criticised by numerous authors on a number of grounds, such as that they depend on only one endpoint and yet trials will have several, that any size study can be justified by judicious choice of endpoint and power, that often an investigator has no idea of what a meaningful effect size is and that they concentrate on significance tests when in fact the purpose of most experiments is estimation. However, Williamson et al. (2000) gave a vigorous defence of the practice that forces an investigator a priori to name the main outcome variable, which can then be checked in the analysis, to protect against data dredging and to prevent investigators embarking on studies that scientifically have little chance of getting meaningful results. MJC

Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall/CRC.
Bauer, P. 2008: Adaptive designs: looking for a needle in a haystack - a new challenge in medical research. Statistics in Medicine 27, 1565-80.
Campbell, M. J., Julious, S. A. and Altman, D. G. 1995: Sample sizes for binary, ordered categorical and continuous outcomes in two group comparisons. British Medical Journal 311, 1145-8.
Cohen, J. 1988: Statistical power analysis for the behavioral sciences, 2nd edition. Mahwah: Lawrence Erlbaum.
Diletti, E., Hauschke, D. and Steinijans, V. W. 1991: Sample size determination for bioequivalence assessment by means of confidence intervals. International Journal of Clinical Pharmacology, Therapy and Toxicology 29, 1-8.
Lenth, R. V. 2001: Some practical guidelines for effective sample size determination. The American Statistician 55, 187-93.
Machin, D., Campbell, M. J., Tan, S. B. and Tan, S. H. 2008: Sample size tables for clinical studies, 3rd edition. Chichester: Wiley-Blackwell.
Spiegelhalter, D. J., Freedman, L. S. and Parmar, M. K. B. 1994: Bayesian approaches to randomized trials (with discussion). Journal of the Royal Statistical Society, Series A 157, 357-416.
Tan, S.-B., Dear, K. B. G., Bruzzi, P. and Machin, D. 2003: Towards a strategy for randomised clinical trials in rare cancers. British Medical Journal 327, 47-9.
Viljanen, M., Malmivaara, A., Uitti, J., Rinne, M., Palmroos, P. and Laippala, P. 2003: Effectiveness of dynamic muscle training, relaxation training or ordinary activity for chronic neck pain: randomised controlled trial. British Medical Journal 327, 475-7.
Williamson, P., Hutton, J. L., Bliss, J., Blunt, J., Campbell, M. J. and Nicholson, R. 2000: Statistical review by research ethics committees. Journal of the Royal Statistical Society, Series A 163, 5-13.
Weir, N. H., Fiaschi, K. and Machin, D. 1998: The distribution of latency of the auditory P300 in schizophrenia and depression. Schizophrenia Research 31, 151-8.
Wittes, J. and Brittain, E. 1990: The role of internal pilot studies in increasing the efficiency of clinical trials. Statistics in Medicine 9, 65-72.
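The confidence-interval-width calculation worked through in the entry above (the P300 latency example) can be checked with a short sketch. This is only an illustration under stated assumptions: the function names `n_for_ci_width` and `ci_for_mean` are hypothetical, and a normal-theory interval with a two-sided 95% level is assumed throughout.

```python
import math
from statistics import NormalDist


def n_for_ci_width(sd_plan: float, width_plan: float, conf_level: float = 0.95) -> float:
    """Unrounded number of subjects giving a normal-theory CI of total width width_plan.

    Implements n = 4 * (sd / width)^2 * z^2, i.e. the form used in the worked example.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return 4 * (sd_plan / width_plan) ** 2 * z ** 2


def ci_for_mean(mean: float, sd: float, n: int, conf_level: float = 0.95) -> tuple:
    """Normal-theory confidence interval for a mean (assumed form of the CI formula)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Observed study: 19 patients, mean 346 ms, SD 27 ms.
low, high = ci_for_mean(346, 27, 19)
print(round(low), round(high))          # 334 358

# Planned study: same SD, target CI width of 20 ms.
print(round(n_for_ci_width(27, 20), 1))  # 28.0
```

In practice the unrounded value would be rounded up, and the entry itself rounds further to approximately 30 patients.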
sample size determination in cluster randomised trials  When designing CLUSTER RANDOMISED TRIALS, the sample size should be carefully chosen, as when designing conventional CLINICAL TRIALS that randomise individual patients. The use of cluster randomisation must be taken into account at the design stage of the trial; the total number of patients required is larger under cluster randomisation than under individual randomisation, so a trial that randomises clusters without increasing the sample size will lack POWER. In a cluster randomised trial, the responses of patients from the same cluster cannot be assumed to be independent, because patients within a cluster are more similar than patients from different clusters. Standard formulae for sample size determination assume the outcomes for patients in the planned study to be independent and this assumption is invalid in a cluster randomised trial.

The size of a cluster trial has two components: the number of clusters recruited and the number of patients recruited from each cluster, which is referred to as the cluster size (not usually equal to the population size for each cluster, e.g. the number of patients in a hospital catchment area). To calculate how many patients are required in a cluster randomised trial, sample sizes given by standard formulae should be inflated by a factor known as the design effect. The design effect is equal to $1 + (\bar{n} - 1)\rho$, where $\bar{n}$ is the average cluster size and $\rho$ is the INTRACLUSTER CORRELATION COEFFICIENT (ICC), which represents the anticipated extent of similarity within clusters. For a cluster trial employing paired or STRATIFIED RANDOMISATION rather than SIMPLE RANDOMISATION, more complicated formulae are needed to calculate the sample size (Donner and Klar, 2000).

Deciding to randomise clusters rather than individual patients can have a substantial impact on the sample size required. Consider, for example, designing a trial to detect a difference of 0.25 STANDARD DEVIATIONS in total cholesterol at a (two-sided) 5% significance level. If the trial were to randomise individual patients, a total of 504 patients would provide 80% power to detect this difference. If choosing to randomise general practices (for example), the sample size required to provide the same level of power depends heavily on the anticipated value for the ICC, as shown in the table.
sample size determination in cluster randomised trials  Sample sizes that provide 80% power to detect the specified difference in a cluster randomised trial, at different levels of ICC

ICC     Average cluster size    Total sample size
0.01     50                      752
0.01    100                     1004
0.02     50                      998
0.02    100                     1502
0.05     50                     1740
0.10     50                     2974
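The table entries follow from inflating the individually randomised total of 504 patients by the design effect. The sketch below reproduces them; the function names are hypothetical, and the final rounding rule (round up, then up again to an even number so that two arms are equal) is an assumption chosen because it matches the tabulated values, not a rule stated in the entry.

```python
import math


def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Design effect 1 + (n_bar - 1) * rho for a cluster randomised trial."""
    return 1 + (avg_cluster_size - 1) * icc


def cluster_trial_total(n_individual: int, avg_cluster_size: float, icc: float) -> int:
    """Total patients needed: individually randomised total times the design effect.

    Assumption: round up to the next even integer (two equal arms).
    """
    inflated = math.ceil(n_individual * design_effect(avg_cluster_size, icc))
    return inflated + (inflated % 2)


scenarios = [(0.01, 50), (0.01, 100), (0.02, 50), (0.02, 100), (0.05, 50), (0.10, 50)]
for icc, size in scenarios:
    print(icc, size, cluster_trial_total(504, size, icc))
```

Even an ICC of 0.01 with 50 patients per cluster raises the requirement by about half, which is the point the surrounding text makes about not ignoring small ICC values.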
Even when the ICC is expected to be small, cluster randomised trials can require considerably increased numbers of patients in comparison with individually randomised trials, especially when cluster sizes are large. This demonstrates the flaw of an approach occasionally used in the past, in which researchers who anticipated a small ICC neglected to allow at all for the use of cluster randomisation in their design.

The desired level of power for a cluster trial is more easily achieved through raising the planned number of clusters than through raising the planned average cluster size. In some settings, however, the number of clusters available for recruitment may be limited or fixed in advance because of administrative or financial constraints. The occurrence of DROPOUTS in clinical trials is always undesirable, but in a cluster randomised trial there is the possibility that entire clusters will drop out. Because loss of clusters can seriously reduce the power of a trial, every attempt must be made to retain all clusters recruited and some authors recommend identifying a reserve of potential substitutes before starting the trial (Donner and Klar, 2000).

The value assumed for the ICC when calculating the sample size for a cluster randomised trial is usually based on available estimates for ICC values in similar settings. For example, in designing the hypothetical trial discussed above, the ICC value $\rho$ used in the formula for the design effect would ideally be based on ICC estimates representing similarity of total cholesterol measurements within general practices. In order that researchers planning trials have a good chance of locating relevant information on likely values for the ICC in their trial, it has been recommended that completed cluster trials publish the ICC estimates for all outcomes collected. In addition, some research groups are collating published and unpublished estimates into ICC databases, such as that of Ukoumunne et al. (1999).

Even when completely relevant information exists, available ICC estimates tend to be imprecise, i.e. the CONFIDENCE INTERVAL associated with the estimate tends to include a wide range of ICC values. The table demonstrates that if the ICC value in the future trial is higher than the value allowed for at the design stage, there could be a serious loss of trial power. For this reason, several researchers have recommended using a conservative value for the ICC or taking into account the uncertainty in the ICC estimates used (Feng and Grizzle, 1992; Turner, Prevost and Thompson, 2004). Alternatively, an internal pilot study design could be used, in which the sample size is recalculated some way into the trial on the basis of the current ICC estimate (Lake et al., 2002). RT

Donner, A. and Klar, N. 2000: Design and analysis of cluster randomisation trials in health research. London: Arnold.
Feng, Z. and Grizzle, J. E. 1992: Correlated binomial variates: properties of estimator of intraclass correlation and its effect on sample size calculation. Statistics in Medicine 11, 1607-14.
Lake, S., Kammann, E., Klar, N. and Betensky, R. 2002: Sample size re-estimation in cluster randomisation trials. Statistics in Medicine 21, 1337-50.
Turner, R. M., Prevost, A. T. and Thompson, S. G. 2004: Allowing for imprecision of the intracluster correlation coefficient in the design of cluster randomized trials. Statistics in Medicine 23, 1195-214.
Ukoumunne, O. C., Gulliford, M. C., Chinn, S., Sterne, J. A. C. and Burney, P. G. J. 1999: Methods for evaluating area-wide and organisation-based interventions in health and health care: a systematic review. Health Technology Assessment 3, 5.
sample size determination in observational studies  Consideration of sample size is as important for OBSERVATIONAL STUDIES as it is for randomised controlled trials (see CLINICAL TRIALS), the crucial issue being whether the study will be large enough to answer the research question (or questions) with sufficient statistical precision or POWER. As with randomised controlled trials, the degree of statistical uncertainty decreases with increasing sample size and, in turn, sample size requirements are substantially greater when studying rarer diseases or less common exposures. Observational studies that are too small may fail to detect important effects or produce estimates too imprecise to be useful, while those that are too large can waste resources and take too long to produce results.

At the same time, it should be noted that study size guidelines are not intended as rigid rules. It may be, for example, that the research question is of sufficient importance to outweigh an inadequate, yet unavoidable, sample size (e.g. research into a rare cancer). Although a single study may be too small to provide definitive results, if well designed it will still have the potential to make an important contribution to existing evidence. In contrast, a poorly designed study will be at high risk of producing biased results. The problem of BIAS will not diminish by increasing the sample size.

Sample size requirements for observational studies should be critically assessed at the study design stage. Regardless of the type of study being planned, the first step involves identifying the primary health outcome of interest and the primary exposure. Sample size calculations should then be carried out for a range of different scenarios. Particular consideration here should be given to the extent of missing data to be expected. Individuals selected for interview in a CASE-CONTROL STUDY using hospital records, for example, may not all be traced and some of those successfully traced may refuse to take part. In COHORT STUDIES, individuals may be lost to follow-up, particularly when the study period extends over several years.
Sample size calculations should be adjusted to take account of the likely effect of these possibilities. If there is more than one outcome measure or more than one exposure being studied, these should be ordered in terms of priority. Once the sample size required to achieve the needs for the primary research question has been determined,
this can subsequently be used to examine its adequacy for additional research questions.

The specific type of outcome measure(s) to be studied is an important determinant of sample size requirements for observational studies. Outcome measures in CROSS-SECTIONAL STUDIES, for example, can range from binary proportion or PREVALENCE (e.g. obesity) to those that are continuous where interest may focus on means (e.g. mean systolic blood pressure). In general, sample size requirements will be much larger when the outcome measure is binary. Sample size requirements can be addressed from two different perspectives: the statistical power to be achieved from a test of statistical significance or the level of precision to be attained from estimation. Standard formulae required when estimating a single prevalence rate or MEAN, or comparing prevalence (or means) in two groups, are provided elsewhere (Kirkwood and Sterne, 2003). Computer programs required to perform these calculations are also available (StataCorp, 2003). For brevity, the remainder of this entry considers certain issues associated with sample size requirements for case-control studies and cohort studies only where interest focuses on a binary exposure. Consideration of more advanced issues for these studies, such as tests for trend or interaction, is addressed elsewhere (Smith and Day, 1984; Breslow and Day, 1987).

Sample size requirements in case-control studies depend on the expected magnitude of the ODDS RATIO and the prevalence of exposure in the controls. In order to determine the study size required to achieve adequate statistical power, the number required in each group (cases and controls) can be determined from the formula used for comparing two proportions (Kirkwood and Sterne, 2003; StataCorp, 2003). In general, the closer the odds ratio to be detected is to the null value (1.0) the larger the sample size required, while the closer the prevalence is to 0.5 (or 50%) the smaller is the required sample size.
When the number of cases is fixed, as often occurs in very rare diseases, statistical precision (and power) can be improved by increasing the number of controls per case. The study size required to achieve a certain level of precision can be assessed by examining the width of 95% confidence intervals (95% CIs) for the odds ratio. The first figure shows the effect of increasing the number of controls per case on precision where the expected odds ratio is 2.0, the prevalence of exposure in controls is 25% and the number of cases is fixed at 100. As may be seen, the precision of the estimate increases with the number of controls per case, but little gain is observed beyond four controls per case. In general, unless the odds ratio is substantially different from unity there is little advantage in having more than four controls per case (Breslow and Day, 1987).
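The diminishing return from extra controls can be illustrated numerically. The sketch below is a minimal illustration, assuming an unmatched design, expected cell counts in place of observed data, and the usual large-sample (Woolf) standard error of the log odds ratio; the entry does not state which interval method was used for its figure, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def expected_or_ci(odds_ratio, p_exposed_controls, n_cases, controls_per_case,
                   conf_level=0.95):
    """Approximate CI for the odds ratio from expected 2x2 cell counts (Woolf method)."""
    odds_controls = p_exposed_controls / (1 - p_exposed_controls)
    odds_cases = odds_ratio * odds_controls
    p_exposed_cases = odds_cases / (1 + odds_cases)
    n_controls = n_cases * controls_per_case

    # Expected counts: exposed/unexposed cases and exposed/unexposed controls.
    a = n_cases * p_exposed_cases
    b = n_cases * (1 - p_exposed_cases)
    c = n_controls * p_exposed_controls
    d = n_controls * (1 - p_exposed_controls)

    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


# Scenario from the text: odds ratio 2.0, 25% of controls exposed, 100 cases.
for k in range(1, 7):
    low, high = expected_or_ci(2.0, 0.25, 100, k)
    print(f"{k} controls per case: 95% CI {low:.2f} to {high:.2f}")
```

The interval narrows quickly from one to two controls per case and only slightly thereafter, because the cases contribute a fixed amount to the standard error however many controls are added.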
[Figure: sample size determination in observational studies. Effect of increasing the number of controls per case on 95% confidence intervals for an odds ratio of 2.0 with 25% of controls exposed. Horizontal axis: number of controls per case.]

One potential route for improving the efficiency of a case-control study can be achieved at the design stage by matching cases and controls in relation to one or more specific confounders. Matching cases and controls on the basis of strong confounding factors can increase the precision (and power) of a study and also offers the potential of a smaller study size requirement. However, it should be noted that matching does not always yield such gains. In particular, unless the confounding factor is strongly related to the disease there may be little benefit from matching (Smith and Day, 1984).

Unlike case-control studies, cohort studies provide the opportunity to estimate the absolute magnitude of disease risks, rates or odds as well as the corresponding measures of effect (risk ratios, rate ratios or odds ratios). As a result, when considering the study size requirements in cohort studies it is particularly important to decide on the primary outcome measure in advance. When planning to compare disease occurrence in two groups (e.g. exposed and unexposed individuals) using a test of statistical significance, sample size requirements depend on: the magnitude of the ratio of (or difference in) disease outcomes to be detected, the level of disease occurrence expected in the unexposed group, the level of statistical significance and the statistical power required. The required formulae for these calculations are provided elsewhere (Kirkwood and Sterne, 2003). Alternatively, the level of statistical power can be determined for a variety of different study sizes. The second figure shows power curves obtained for a cohort study where the rate in the unexposed is 10 per 1000 PERSON-YEARS AT RISK and the two-sided level of statistical significance is 5%. Two different scenarios are shown. The lower curve shows the power to detect a rate ratio of 1.5 and the upper curve a rate ratio of 2.0. If the true rate ratio is 2.0 and the aim is to achieve a minimum of 90% power, this suggests that a study size of 3000 will be required in each group, i.e. a total study size of 6000. This could be achieved by studying 3000 individuals in each group for 1 year or 1500 in each group for 2 years and so on. Similar study sizes will only achieve 40% power, however, if the true rate ratio is only 1.5.

[Figure: sample size determination in observational studies. Power to detect the rate ratio in cohort studies with the rate in unexposed = 10 per 1000, with 5% level of statistical significance (two-sided). Horizontal axis: study size (person-years).]

Sample size requirements for cohort studies can also be considered from the perspective of precision of estimates, as addressed in case-control studies above. Again, consideration should also be given to the need to adjust for confounding factors, as this will tend to increase study size requirements (Breslow and Day, 1987). Assessing the effect of exposure on disease experienced by a cohort of individuals exposed to a particular substance may involve comparing the events observed in the cohort with those expected on the basis of rates in a reference population. Sample size requirements for this scenario are provided elsewhere (Breslow and Day, 1987). lAIC

Breslow, N. E. and Day, N. E. 1987: Statistical methods in cancer research, Vol. II. The design and analysis of cohort studies. Lyon: International Agency for Research on Cancer.
Kirkwood, B. R. and Sterne, J. A. C. 2003: Essential medical statistics, 2nd edition. Oxford: Blackwell.
Smith, P. G. and Day, N. E. 1984: The design of case-control studies: the influence of confounding and interaction effects. International Journal of Epidemiology 13, 356-65.
StataCorp 2003: Statistical software: release 8, Vol. 4 (Sampsi program). College Station, TX: Stata Corporation.
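The cohort power figures quoted in the entry above (about 90% power for a rate ratio of 2.0 and about 40% for 1.5, with 3000 person-years per group) can be roughly reproduced with a crude normal approximation. This is a sketch under explicit assumptions, not the formula from the cited references: event counts in each group are treated as Poisson, the two groups have equal person-years, and the test compares the two counts directly. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power_rate_ratio(rate_unexposed, rate_ratio, person_years_per_group,
                            alpha=0.05):
    """Approximate power of a two-sided test comparing two Poisson event counts.

    Assumption: equal person-years in each group; normal approximation with
    variance of the difference in counts equal to the sum of expected counts.
    """
    expected_unexposed = rate_unexposed * person_years_per_group
    expected_exposed = rate_unexposed * rate_ratio * person_years_per_group
    z_effect = abs(expected_exposed - expected_unexposed) / math.sqrt(
        expected_exposed + expected_unexposed)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(z_effect - z_alpha)


rate = 10 / 1000  # 10 events per 1000 person-years in the unexposed
for rr in (1.5, 2.0):
    for pyears in (1000, 2000, 3000, 4000, 5000):
        power = approx_power_rate_ratio(rate, rr, pyears)
        print(f"rate ratio {rr}, {pyears} person-years per group: power {power:.2f}")
```

Because the approximation is crude, its values should be read as being broadly in line with the published curves rather than as exact replacements for them.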
sample size reestimation  At the design stage of a CLINICAL TRIAL there are some uncertain parameters. For example, the sample size of a study will be based on an estimation of variability (see SAMPLE SIZE DETERMINATION IN CLINICAL TRIALS). Other things that are estimated, but have some uncertainty, include hazard rates or a group event rate (see PROPORTIONAL HAZARDS). There are also situations where the primary objective of the study relies on the exposure to a drug. This might be needed to assess safety and a minimum amount of exposure may be needed. With this uncertainty surrounding initial assumptions it may be wise to consider modifying the total sample size based on an interim assessment of the data. If the assumptions appear to be incorrect it might be possible to make mid-course adjustments based on the data collected. Chuang-Stein et al. (2006) provide a thorough review and recommendations for sample size reestimation for PHASE III TRIALS and PHASE IV TRIALS, but the principles can also be applied in any study where the adjustment of the sample size is needed.

Methodology to address sample size reestimation is wide and varied, and each method comes with an accompanying decision rule. This decision rule is important in order to understand the operating characteristics of the resulting design. For confirmatory trials the control of a TYPE I ERROR at the designated level (usually 5%) is paramount and any adjustment to sample size must ensure this. With the use of simulation one can investigate whether the TYPE I ERROR will be controlled. It should be noted that traditional group sequential methodology by performing interim analyses is based on the statistical information accrued, e.g. in an event-driven trial, and then adjustment based on the number of events required at the end of trial.

Sample size reestimation can be conducted in a blinded or unblinded fashion and the considerations for both will be discussed. Blinded sample size reestimation involves a review of the sample size based on a nuisance parameter such as the variance of a continuous ENDPOINT or the underlying rate for a binary event. Because it is blinded there will be no indication as to the observed treatment effect, and the sample size reestimation will be based on the original assumptions.
Gould (2001) reviewed methods of this kind and showed that they were comparable in performance to those using unblinded methodology. Because there is potentially little BIAS introduced owing to the review being blinded, these types of reviews are acceptable to regulatory agencies.

If there is greater uncertainty about the treatment difference as well as the NUISANCE PARAMETER, then one might want to choose to do an unblinded sample size reestimation. However, there would need to be procedural controls to minimise the effect of any potential bias that could be introduced by unblinding the data. Unblinded sample size reestimation methods present more of an issue because of the potential inherent bias they could introduce, but with more knowledge they should perform better than in the blinded case. However, there is the potential to increase the Type I error rate. In order to control the overall Type I error the employment of combination tests is undertaken. These are methods where the P-VALUES before and after the adaptation are combined. The Fisher combination test is commonly used and Lehmacher and Wassmer (1999) suggest an inverse normal method for combining the stages.

Any unblinded sample size reestimation should be undertaken with caution. Access to the data can potentially lead to operational bias and so put the integrity of the study in doubt. The analysis and results need to be properly managed as awareness of the methods and resulting sample size modification could lead to inference about the treatment difference. In addition, unblinded sample size reestimation is likely to be less acceptable to regulatory agencies.

Chuang-Stein et al. (2006) warn against using the interim treatment effect as a parameter of interest for the following reasons. First, such methods can be inefficient with interim treatment effects being highly variable, leading to unreliable estimates of the sample size required. Second, the actual effect size used in the original sample size calculation is not an expectation of the magnitude of the effect, but is more like the clinically relevant difference. Using a variable point estimate to determine the future sample size might actually not be achieving the desired objective. They make the following recommendation: 'Before implementing methods that modify sample size based on the interim treatment effect estimate, it should be strongly considered whether the sample size determination objectives can be achieved, statistically and procedurally, using either an appropriate group sequential scheme or an adaptive scheme that does not utilize the interim observed effect for reestimating sample size.' It is also recommended that the number of reestimations is kept to a minimum and usually the objective can be achieved with one. It is also recommended that following a sample size reestimation the total number of subjects should be either increased or stay the same. Reducing patient numbers can have inherent problems, particularly if a minimum amount of safety data is required. AS

Chuang-Stein, C., Anderson, K., Gallo, P. and Collins, S. 2006: Sample size reestimation: a review and recommendations. Drug Information Journal 40, 475-84.
Gould, A. L. 2001: Sample size re-estimation: recent developments and practical considerations. Statistics in Medicine 20, 2625-43.
Lehmacher, W. and Wassmer, G. 1999: Adaptive sample size calculations in group sequential trials. Biometrics 55, 1286-90.
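The two combination tests named in the entry above can be sketched for the simplest case of two stages. The sketch assumes one-sided stage-wise p-values computed from independent data before and after the adaptation, and weights fixed in advance (equal by default); the function names are hypothetical and the code is an illustration rather than a full adaptive design procedure.

```python
import math
from statistics import NormalDist


def fisher_combination_p(p1: float, p2: float) -> float:
    """Combined p-value from Fisher's product criterion for two stages.

    -2 * ln(p1 * p2) follows a chi-squared distribution with 4 degrees of
    freedom under the null, whose upper tail has the closed form used here.
    """
    product = p1 * p2
    return product * (1 - math.log(product))


def inverse_normal_combination_p(p1: float, p2: float,
                                 weight1: float = math.sqrt(0.5)) -> float:
    """Combined p-value from the weighted inverse normal method.

    Assumption: weights are prespecified and satisfy weight1^2 + weight2^2 = 1.
    """
    weight2 = math.sqrt(1 - weight1 ** 2)
    normal = NormalDist()
    z_combined = (weight1 * normal.inv_cdf(1 - p1)
                  + weight2 * normal.inv_cdf(1 - p2))
    return 1 - normal.cdf(z_combined)


# Hypothetical stage-wise one-sided p-values, for illustration only.
print(fisher_combination_p(0.10, 0.02))
print(inverse_normal_combination_p(0.10, 0.02))
```

Because the weights and the combination rule are fixed before the interim look, the second-stage sample size can be changed without altering the null distribution of the combined statistic, which is how these methods keep the Type I error under control.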
sampling distributions  These are PROBABILITY DISTRIBUTIONS of statistics calculated from random samples of a particular size. When we draw a sample from a population, it is just one of the many samples we could take. If we calculate a statistic from the sample, such as a MEAN or proportion, this will vary from sample to sample. The means or proportions from all the possible samples form the sampling distribution.

To illustrate this with a simple example, we could put lots numbered 1 to 9 into a hat and sample by drawing one out, replacing it, drawing another out, and so on. Each number would have the same chance of being chosen each time and the sampling distribution would be as part (a) in the figure. Now we change the procedure, draw out two lots at a time and calculate the average. There are 36 possible pairs, and some pairs will have the same average (e.g. 1 and 9, 4 and 6, both having the average 5.0). The sampling distribution of this average is shown in part (b) in the figure. Notice that it has a different shape to (a). The sampling distribution of a statistic does not necessarily have the same shape as the distribution of the observations themselves, which we call the parent distribution.

[Figure: sampling distributions. Sampling distribution for (a) a single digit drawn at random and (b) the mean of two digits drawn together. Horizontal axes: (a) digits 1 to 9; (b) mean of two digits.]

If we know the sampling distribution it can help us draw conclusions about the population from the sample, using CONFIDENCE INTERVALS and SIGNIFICANCE TESTS. We often use our sample statistic as an estimate of the corresponding value in the population, e.g. using the sample mean to estimate the population mean. The sampling distribution tells us how far from the population value the sample statistic is likely to be.

In most circumstances, we do not know what the sampling distribution is. However, we do not need to take many samples to estimate it. We can do this from a single sample only. Theory tells us into what general family of distributions the sampling distribution will fall and we can estimate which member of this family the sampling distribution is. For example, if we follow a case series of 100 patients and find that 89 of them have a satisfactory outcome, we would expect the sampling distribution from which the statistic 89 comes to be a member of the binomial family. The particular member
of that family is estimated to be the BINOMIAL DISTRIBUTION with parameters n = 100 and p = 89/100 = 0.89. For the mean of a large sample, we would expect a normal sampling distribution and we estimate the mean and VARIANCE of this NORMAL DISTRIBUTION from the mean and variance of the sample.

If the sample statistic is used as an estimate, we call the STANDARD DEVIATION of the sampling distribution the STANDARD ERROR. Rather confusingly, we use this term both for the unknown standard deviation of the sampling distribution and for the estimate of this standard deviation found from the data. JMB
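The lots-in-a-hat illustration in this entry is small enough to enumerate exactly. The sketch below lists every possible pair of lots drawn together and tabulates the distribution of their mean; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

lots = range(1, 10)  # lots numbered 1 to 9

# (a) A single lot drawn at random: every digit is equally likely.
single_dist = {digit: Fraction(1, 9) for digit in lots}

# (b) Two lots drawn together: enumerate all unordered pairs and their means.
pairs = list(combinations(lots, 2))
mean_counts = Counter(Fraction(a + b, 2) for a, b in pairs)
mean_dist = {mean: Fraction(count, len(pairs))
             for mean, count in sorted(mean_counts.items())}

print(len(pairs))  # 36 possible pairs
for mean, prob in mean_dist.items():
    print(float(mean), float(prob))
```

The single-digit distribution is flat, whereas the distribution of the mean peaks in the middle (four of the 36 pairs average 5.0) and tails off towards 1.5 and 8.5, which is the change of shape the entry describes.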
sampling methods - an overview  Sampling is a way of choosing a subset of the population of interest, within which it is easier to study the properties that are of interest within the main population.

The population is the group of people or items under investigation. The population may be small, large or infinite. The population is the entire set of units to which the results will be extrapolated. Once the precise purpose of a study has been defined then the target population should be decided. The aim of the process of data collection is to draw conclusions about this population. A sample from the population is the set of all units about which information is collected during the investigation.

The sampling units are the individual members that make up the population. It is important that the sampling units are clearly defined and care must be taken to choose the correct sampling unit to answer the question under consideration. A sampling frame is a list of all sampling units in the target population. It may not be possible, or may be too expensive or time consuming, to obtain a complete sampling frame for the population. It is important that if one is used it is as accurate and up to date as possible. It should be as free as possible from omissions and duplications. If MULTISTAGE CLUSTER SAMPLING is being used, then more than one sampling frame will be needed. In this case one sampling frame will be needed for each stage of sampling. The sampling units for the first stage of sampling are called the primary sampling units. Those for the final stage of sampling are called the listing units.

Information is collected using a survey. There are two different forms of survey: a census and a sample survey. A census is a survey that includes every member of a population. A census is often carried out if the population is small enough. In many countries, a population census is carried out every 10 years. When populations are large then a census may be very expensive and time consuming. If the population is very large it might not be possible to survey every member. In some circumstances it may not be sensible to carry out a census. When a census is not possible or is too difficult then a sample survey can be used instead. This is where less than the entire population is included in the survey. If a representative sample is taken then an accurate picture of the overall population can be obtained. Valid inferences can only be made from randomly selected subsamples. BIAS contaminates the study if the samples are chosen nonrandomly. SELECTION BIAS is the most common form of bias in samples.

The statistical objectives of an investigation are: to make inferences about a population by analysing sample data,
to make assessments of the extent or uncertainty in these interferences and to design Ihe prOClCSS and exlCDt of sampling to fonn a basis for wlid and accurate inf~nces. Samples can be chosen in two ways. by PROBABILrrYl random sampling or by nonprobability/nonrandom sampling. Random sampling is where Ihe probability of getting any particular sample can be calculated from a probability model. Usually each unit has a known. possibly equal. probability of being chosen to be included in the sample. In nonprobability sampling there is an uneven and unknown chance of being included in the sample and should only be used with caution whco making infc:rcnce to a general population. Nonprobability sampling is often used, however. as it can be less expensive and time consuming than probabilistic sampling. Then: IIIC several examples of nonprobability sampling. CONVENJENCE SI\MPLES an: chosen for ease of ac:uss. They might be patients within a clinic or docton who work in the same hospital. Snowball sampling is where the lirst respondeat n:commends a personal contact or a friend. and so on
SAMPUNGMET~-ANOVsmnsw
until no new members of the sample are found. Purposive or judgement sampling is where the investigator decides who should be included in the sample: they are usually selected to be representative of the population. For example, to estimate the number of blood samples drawn in a year in a clinic, a few typical days could be chosen and the records reviewed. The main problem with this sort of sampling is that there is no insight into the reliability of the estimates. If only a few days could be looked at then nonrandom sampling might include some atypical days that would make the estimate inaccurate. A case study is limited to one group or, in the case of N of 1, to one individual. In a QUOTA SAMPLE, the sample is chosen so that there are a certain number of units or individuals in each category. This is often used with convenience samples: the person carrying out the convenience sample may be told to interview a certain number of, say, males under 25, usually with no instruction on how to select those to be included.

The simplest type of random sampling is the SIMPLE RANDOM SAMPLE. This can be carried out in two ways, with or without replacement. Simple random sampling is equivalent to putting all the units in a hat and drawing one out. When the second selection is made there is a choice of replacing the first unit first or not. In the former case each item can be drawn more than once: this is sampling with replacement. In the latter case each unit can be chosen once at the most: this situation is sampling without replacement. Sampling without replacement is more precise than sampling with replacement.

SYSTEMATIC SAMPLING is similar to simple random sampling, with each unit having an equal and known probability of being chosen to be included in the sample. In systematic sampling a random starting point is chosen and then every kth unit is chosen to be included. It should be ensured that this does not hide a pattern in the data: e.g. if every (k - 1)th element has a fault then it is possible that this could be hidden by choosing every kth element.

If there are obvious subgroups or 'strata' within the population then STRATIFIED SAMPLING may be more efficient than simple random sampling. In this case the population is separated into the strata and simple random samples taken in each of the strata. Each stratum is then represented proportionately in the sample. If the population forms distinct strata then stratified sampling may give more precise information than a simple random sample and therefore may be more efficient.

If the population is arranged in a hierarchical structure then multistage cluster sampling could be used. In a single-stage cluster sample, a sample of the clusters is chosen at random and then a random sample of units is chosen from within this selection of clusters. In a multistage sample a random sample of clusters is chosen and then a random sample of clusters within these clusters. This is repeated and eventually a random selection of units is chosen within a cluster.
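The three probability designs described above (simple random, systematic and stratified sampling) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the `stratum_of` callback and the toy population of 100 numbered units are assumptions made for the example and do not come from the entry.

```python
import random
from collections import defaultdict


def simple_random_sample(units, n, replace=False, rng=random):
    """Draw n units with equal probability, with or without replacement."""
    if replace:
        # With replacement: the same unit may be drawn more than once.
        return [rng.choice(units) for _ in range(n)]
    # Without replacement: each unit can be chosen once at the most.
    return rng.sample(units, n)


def systematic_sample(units, k, rng=random):
    """Choose a random start among the first k units, then every kth unit."""
    start = rng.randrange(k)
    return units[start::k]


def stratified_sample(units, stratum_of, fraction, rng=random):
    """Take a simple random sample of the same fraction within each stratum,
    so that every stratum is represented proportionately."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        n = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical sampling frame: 100 units, the first 30 forming stratum "A".
rng = random.Random(1)
frame = list(range(100))
srs = simple_random_sample(frame, 10, rng=rng)
systematic = systematic_sample(frame, 10, rng=rng)
stratified = stratified_sample(frame, lambda u: "A" if u < 30 else "B", 0.1, rng=rng)
```

With a sampling fraction of 0.1 the stratified sample contains 3 units from stratum "A" and 7 from stratum "B", which is the proportionate representation the entry describes.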
If a probability sample can be obtained, it is better to do this than to obtain a nonprobability sample, as a probability sample should be less biased. A probability sample is representative of the population and can therefore be extrapolated to the population from which it was drawn. This is not the case with a nonprobability sample. There is no way of knowing how representative a nonprobability sample is of the population.

The main sources of bias in sampling are lack of a good sampling frame, the wrong choice of sampling unit, nonresponse by chosen units, biases introduced by the person gathering the data and self-selection bias. It is important, if a sampling frame is to be used, that it is a good one and is up to date and free from duplications and omissions. If this is not the case, then the probability that each unit will be included in the sample will not be equal, as some units will have a probability of 0 of being included in the sample, as they are not in the frame. If, for example, the telephone directory is used as the sampling frame then all those who are ex-directory or do not have a phone will not be included. The electoral register also misses people. Some units may have more than double the chance of another of being included in the sample, as they are included more than once in the sampling frame. If the wrong sampling unit is used then the correct inferences might not be drawn from the results as a slightly different question might be being answered: e.g. if individuals are sampled instead of households, then the same event or experience might be referred to by two individuals and counted twice instead of once. Nonresponse by particular units may arise because the particular unit chosen cannot be located, because the person refuses to respond or because the question is misunderstood. QUESTIONNAIRES should be worded clearly, be unambiguous and easy to understand; they should also be neutrally worded to avoid pointing towards a particular response. The interviewer can introduce bias by not interviewing people who look uncooperative, or the way in which a question is asked may influence the answer given to the question. Self-selection bias is due to those volunteering to be in the sample being systematically different to those who have not volunteered. Self-selected samples are unlikely to be representative of the population.

The size of a sample to be taken is a very important consideration (see SAMPLE SIZE DETERMINATION). There are many formulae in existence to calculate the size of a sample. It is important to make sure that the sample is large enough to make the inferences required from the sample. However, being unbiased is more important than the size of the sample and any sample is only representative of the population from which it was drawn. Only with caution should any extrapolations be made beyond that population. For further details see Crawshaw and Chambers (1994) and Levy and Lemeshow (1999). SLV
Crawshaw, J. and Chambers, J. 1994: A concise course in A level statistics, 3rd edition. Cheltenham: Stanley Thornes Publishers Ltd.
Levy, P. S. and Lemeshow, S. 1999: Sampling of populations: methods and applications, 3rd edition. Chichester: John Wiley & Sons, Ltd.
SAS See STATISTICAL SOFTWARE
saturated model
See LOG-LINEAR MODELS
scatterplot An xy plot of the values of two, usually continuous, variables that have been recorded on a sample of individuals. Such plots have been in use since at least the 18th century and they have many advantages for an initial examination of bivariate data. Indeed, according to Tufte (1983): 'The relational graphic - in its barest form the scatterplot and its variants - is the greatest of all graphical designs. It links at least two variables, encouraging and even imploring the viewer to assess the possible causal relationship between the plotted variables. It confronts causal theories that x causes y with empirical evidence as to the actual relationship between x and y.' Such a plot links the two variables, allows any relationship between them to be visually assessed and may help in identifying OUTLIERS or distinct groups of observations ('clusters'). The appropriate scatterplot should always be used when interpreting the numerical value of an estimated correlation between two variables. An example of a scatterplot in which the mortality rate from malignant melanoma of the skin for white males is plotted against latitude of the centre of the state for each state on the US mainland is shown in part (a) of the figure (page 411). The plot clearly demonstrates that mortality is strongly related to latitude. In many cases, scatterplots can be made more useful by adding the estimated regression line of the two variables or a locally weighted regression fit (see SCATTERPLOT SMOOTHERS). Both possibilities are illustrated in part (b) in the figure. Many other examples of interesting scatterplots are given in Tufte (1983). BSE
[See also SCATTERPLOT MATRICES]
Tufte, E. R. 1983: The visual display of quantitative information. Cheshire, CT: Graphics Press.
scatterplot matrices
This is a convenient arrangement of all the pairwise SCATTERPLOTS of the variables in a set of multivariate data that aids both in understanding the relationships between the variables and in uncovering any unusual features of the data, e.g. possible OUTLIERS (Cleveland, 1994). In a scatterplot matrix the separate scatterplots are arranged in the form of a square grid with the same number of rows and columns as the number of variables. Each panel of the grid contains a scatterplot of one pair of variables. The upper left-hand triangle of the grid contains all pairs of scatterplots, as does the lower right-hand triangle. The reason for including both the upper and lower triangles in the diagram, despite the seeming redundancy, is that it enables a row and column to be visually scanned to see one variable against all others, with the scale for the one variable lined up along the horizontal or the vertical. An example of the basic scatterplot matrix for three body measurements taken on 20 individuals is shown in the figure (page 411). The diagram illustrates the varying strengths of the positive relationships between each pair of variables but also points to the possibility that there are relatively distinct groups of observations in the data in the waist/hips scatterplot. Here the explanation of these possible 'groups' is straightforward - they simply correspond to the 10 men and 10 women in the data. The panels in a scatterplot matrix can often be enhanced in some way in an attempt to make the diagram more useful. An example is shown in the DENSITY ESTIMATION entry. BSE
[See also TRELLIS GRAPHICS]
Cleveland, W. S. 1994: Visualizing data. Summit, NJ: Hobart Press.

[Figure: scatterplot. (a) Scatterplot of mortality against latitude; (b) scatterplot as in (a) with locally weighted regression fit added. Horizontal axis: Latitude.]
[Figure: scatterplot matrices. Scatterplot matrix of three body measurements (chest, waist, hips) from 20 individuals.]
scatterplot smoothers
These are smooth, generally nonparametric curves added to a SCATTERPLOT to aid in understanding the relationships between the two variables forming the plot. They are often a useful alternative to the more familiar parametric curves such as simple linear or polynomial regression fits when the bivariate data plotted is too complex to be described by a simple parametric family. The simplest scatterplot smoother is a locally weighted regression or loess fit, first suggested by Cleveland (1979). In essence, this approach assumes that the variables x and y are related by the equation:

\[ y_i = g(x_i) + \epsilon_i \]

where $g$ is a 'smooth' function and the $\epsilon_i$ are random variables with mean zero and constant scale. Values $\hat{y}_i$ used to 'estimate' the $y_i$ at each $x_i$ are found by fitting polynomials using weighted least squares with large weights for points near to $x_i$ and small weights otherwise. Therefore smoothing takes place essentially by local averaging of the y values of observations having predictor values close to a target value.

Two parameters control the shape of a loess curve: the first is a smoothing parameter, $\alpha$, with larger values leading to smoother curves - typical values are 1/4 to 1. The second parameter, $\lambda$, is the degree of certain polynomials that are fitted by the method; $\lambda$ can take values 1 or 2. In any specific application, the choice of the two parameters must be based on a combination of judgement and of trial and error. Residual plots may, however, be helpful in judging a particular combination of values. The use of locally weighted regression is illustrated in the first figure (page 412) for data collected on the oxygen uptake and the expired ventilation of a number of subjects performing a standard exercise task. In this figure, parts (a), (b), (c) and (d) show plots of the data with added locally weighted regression fits with different values of $\lambda$ and $\alpha$. Here the four fitted curves are very similar and, in this relatively simple case, each of them is almost identical to a fitted polynomial containing a quadratic term in oxygen uptake - see (e) in the figure.

[Figure: scatterplot smoothers. Locally weighted regression fits for oxygen uptake data. Panels: (a) degree = 1, smoothness = 0.25; (b) degree = 1, smoothness = 0.75; (c) degree = 2, smoothness = 0.25; (d) degree = 2, smoothness = 0.75; (e) quadratic curve. Horizontal axis: Oxygen uptake.]

An alternative smoother that can often usefully be applied to bivariate data is some form of spline function. (In its nontechnical use, a spline is a term for a flexible strip of metal or rubber used by a draftsman to draw curves.) Spline functions are polynomials within intervals of the x variable that are connected across different values of x. The second figure, for example, shows a linear spline function, i.e. a piecewise linear function, of the form:

\[ f(x) = \beta_0 + \beta_1 x + \beta_2 (x - a)_+ + \beta_3 (x - b)_+ + \beta_4 (x - c)_+ \]

where

\[ (u)_+ = \begin{cases} u, & u > 0 \\ 0, & u \le 0 \end{cases} \]

The interval endpoints, a, b and c, are called knots. The number of knots can vary according to the amount of data available for fitting the function (see Harrell, 2001).

[Figure: scatterplot smoothers. A linear spline function with knots at a = 1, b = 3, c = 5.]

The linear spline is simple and can approximate some relationships, but it is not smooth and so will not fit highly curved functions well. The problem is overcome by using piecewise polynomials, in particular cubics, which have been found to have nice properties with good ability to fit a variety of complex relationships. The result is a cubic spline, which arises formally by seeking a smooth curve $g(x)$ to summarise the dependence of y on x, which minimises the expression:

\[ \sum_i [y_i - g(x_i)]^2 + \lambda \int g''(x)^2 \, dx \]

where $g''(x)$ represents the second derivative of $g(x)$ with respect to x. Although when written formally this criterion looks a little formidable, it is really nothing more than an effort to govern the trade-off between the goodness-of-fit of the data (as measured by $\sum_i [y_i - g(x_i)]^2$) and the 'wiggliness' or departure from linearity of $g$ as measured by $\int g''(x)^2 \, dx$; for a linear function, this latter part would be zero. The parameter $\lambda$ governs the smoothness of $g$, with larger values resulting in a smoother curve. The solution is a cubic spline, i.e. a series of cubic polynomials joined at the unique observed values of the explanatory variable, $x_i$. (For more details, see Friedman, 1991.)

The 'effective number of parameters' (analogous to the number of parameters in a parametric fit) or DEGREES OF FREEDOM of a cubic spline smoother is generally used to specify its smoothness rather than $\lambda$ directly. A numerical search is then used to determine the value of $\lambda$ corresponding to the required degrees of freedom. The complexity of a cubic spline is approximately the same as a polynomial of degree one less than the degrees of freedom. However, the cubic spline smoother 'spreads out' its parameters in a more even way and hence is much more flexible than polynomial regression.

We shall illustrate the use of cubic splines by fitting such a curve to the monthly deaths from bronchitis, emphysema and asthma in the UK from 1974 to 1979 for men and women. A scatterplot of the data and the fitted cubic spline is shown in the third figure (page 414). For these data, locally weighted regression is not so successful in representing the data. The fourth figure (page 414) shows a number of plots of the data with added locally weighted regression fits, again with different values of $\lambda$ and $\alpha$. Here the characteristic cyclical nature of the data is only picked up with $\lambda = 2$ and $\alpha = 0.25$. In the other three diagrams the amount of smoothing is too great to reveal the structure in the data. BSE

Cleveland, W. S. 1979: Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829-36.
Friedman, J. H. 1991: Multivariate adaptive regression splines. Annals of Statistics 19, 1-67.
Harrell, F. E. 2001: Regression modelling strategies with applications to linear models, logistic regression and survival analysis. New York: Springer.
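The locally weighted regression idea described in this entry (fit a low-degree polynomial around each target point by weighted least squares, with large weights for nearby points) can be sketched as follows. This is a simplified illustration and not Cleveland's full procedure: the tricube weight function and the nearest-neighbour bandwidth are assumptions taken from common practice, the robustness iterations are omitted, and the function name `loess` and its arguments are hypothetical.

```python
import numpy as np


def loess(x, y, alpha=0.75, degree=1):
    """Locally weighted polynomial fit evaluated at each observed x.

    alpha  : smoothing parameter, the fraction of points in each neighbourhood
    degree : degree of the local polynomial (1 or 2)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Neighbourhood size: at least degree + 2 points, because the farthest
    # neighbour receives weight zero under the tricube function.
    q = min(n, max(degree + 2, int(np.ceil(alpha * n))))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[q - 1]           # distance to the q-th nearest point
        if h == 0:
            h = 1.0                        # guard against ties at the target
        u = np.clip(dist / h, 0.0, 1.0)
        w = (1.0 - u ** 3) ** 3            # tricube weights (assumed choice)
        # Design matrix centred at the target, so the intercept is the fit there.
        X = np.vander(x - x[i], degree + 1, increasing=True)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]
    return fitted


# Hypothetical check data: exactly linear and exactly quadratic relationships.
x_demo = np.linspace(0.0, 10.0, 21)
line_fit = loess(x_demo, 2.0 + 3.0 * x_demo, alpha=0.5, degree=1)
quad_fit = loess(x_demo, x_demo ** 2, alpha=0.5, degree=2)
```

A local fit of degree 1 reproduces exactly linear data, and one of degree 2 reproduces exactly quadratic data, which gives a simple sanity check on the weighting and centring.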
[Figure: scatterplot smoothers. Locally weighted regression fits for monthly deaths from bronchitis in the UK 1974-1979. Panels (a) to (d) show fits of degree 1 or 2 with differing smoothness values. Horizontal axis: Month.]
scree plot
This is a SCATTERPLOT of the VARIANCES of the factors in a factor analysis or the components in a PRINCIPAL COMPONENT ANALYSIS against their RANKS in terms of magnitude. The plot can be used to provide an informal estimate of the number of factors (components) by retaining as many factors (components) as there are variances that fall before the last large drop on the plot. An example of such a plot that suggests three factors is shown in the figure. Other examples are given in Preacher and MacCallum (2003). BSE
[See KAISER'S RULE]
Preacher, K. J. and MacCallum, R. C. 2003: Repairing Tom Swift's electric factor analysis machine. Understanding Statistics 2, 13-44.

[Figure: scree plot. A scree plot for a principal components analysis of a correlation matrix of 10 observed variables. Horizontal axis: Component number.]

screening studies
These are planned investigations to determine the effect of administering a diagnostic test to detect the presence or absence of preclinical disease in asymptomatic individuals. The encounter is initiated by the health professional rather than by the patient, since no clinical symptoms are apparent that otherwise would drive the patient to seek medical diagnosis. The aim of screening is to separate the population into two groups: those with a high versus a low probability of the given disorder, usually one that is perceived to be a serious public health condition. Implicit to screening is the existence of a clearly recognisable outcome that is indicative of preclinical disease and the assumption that early diagnosis is beneficial in some way, such as better prognosis, earlier treatment, less invasive medical procedures, higher 'quality of life' or reduced chances of mortality. Examples of diagnostic tests used in screening are: 1. mammography, to detect preclinical breast cancer in women; 2. a blood test to measure prostate specific antigen (PSA), as high levels in men are believed to be associated with preclinical prostate cancer; 3. blood pressure and cholesterol levels, as high levels of both are associated with cardiac disease. Screening tests are not without costs (cost of examination; costs of false positive results arising from follow-up laboratory procedures; costs of false negative results arising from false hope of disease-free status). Screening studies are designed to quantify: the nature of the benefit (e.g. reduction in mortality, extended survival time, measures of quality of life); the target population that is expected to benefit from screening (in terms of age/gender/ethnic groups); and the error rates (false positives and false negatives). The FALSE POSITIVE RATE is the probability that the test asserts 'disease' when, in fact, no disease is present (conversely, SPECIFICITY, or the probability of obtaining a negative result when disease is absent); the FALSE NEGATIVE
RATE is the probability that the test asserts 'no disease' when in fact disease is present (conversely, SENSITIVITY, or the probability of obtaining a positive result when disease is present). For diagnostic purposes, sensitivity should be high, while, for screening purposes, specificity should be high, to avoid unnecessary follow-up testing of disease-free individuals. Screening tests are indicated when the benefits are judged to outweigh the potential drawbacks (costs, risks of false positives and false negatives, etc.).

Because nonrandomised trials are subject to self-SELECTION BIAS, randomised screening trials offer the best and most reliable mechanism for evaluating the potential benefit from screening. In randomised screening trials, study arm participants are offered screening at regular intervals and the control arm participants follow their 'usual medical care'. Due to the cost of screening, such trials are usually conducted using a 'stop-screen design', in which screening is offered for a limited time only (e.g. annual screens for 3 to 5 years).

Several important differences between screening studies, used to evaluate the potential benefit of a screening intervention, and CLINICAL TRIALS, for the evaluation of a specific therapeutic intervention, make the design and analysis of screening studies very challenging. In clinical treatment trials, the cases are specified in the protocol to be comparable in both the study and control arms of the trial; in screening trials, participants are initially asymptomatic and are randomised to the study ('offered screening') or control ('follow usual medical care') arms of the trial and cases evolve as the study progresses. If the screening test is successful, then cases will arise sooner in the study arm than in the control arm, so survival times (time of ENDPOINT minus time of diagnosis) will be longer in the study arm than in the control arm, even in the absence of a screening benefit. This BIAS in the evaluation of screening is known as 'lead time bias'. Also, because cases with longer preclinical disease durations are more likely to be detected by screening than cases with shorter preclinical durations, the cases that arise in the study arm of a screening trial are more likely to be less aggressive and hence have a more favourable prognosis, even in the absence of a screening benefit. This phenomenon is known as LENGTH-BIASED SAMPLING. Study arm participants also experience 'overdiagnosis bias', or the tendency of the screening test to suggest apparent but truly nonthreatening disease. (In an ideal world, this bias would not affect the results if further diagnostic tests later eliminate these individuals as cases of disease.) Finally, noncompliance in both arms is inevitable: some participants in the study arm may refuse screening, while some in the control arm may seek screening. Thus, the cases that arise in the two arms of a screening trial may not be comparable as they are in a treatment trial. RANDOMISATION ensures that the participant characteristics are the same in both arms, including those that lead to noncompliance of either type in either arm, allowing for an INTENTION-TO-TREAT analysis (Byar et al., 1976).

The most common measures used to evaluate screening are reduction in mortality (comparison of death rates) and mean benefit time (difference in the MEAN survival time between the time of entry into the trial and the case endpoint). Randomisation ensures that: 1. the participant characteristics are the same in the two trial arms, including those that lead to noncompliance of either type in either arm; and 2. bias due to lead time is eliminated when survival is measured from the time of entry into the trial. Statistical methods to estimate the benefit (reduction in mortality or extended survival time), lead time and the effect of length-biased sampling have been proposed. For overviews of the issues related to screening and for statistical methodology of design and analysis of screening studies, see Zelen and Feinleib (1969), Zelen (1976), Goldberg and Wittes (1981), Prorok and Connor (1986), Gastwirth (1987), Shapiro et al. (1988), Prorok, Connor and Baker (1990), Connor and Prorok (1994), Kafadar and Prorok (1994, 1996, 2003, 2005) and Baker, Kramer and Prorok (2002).

Screening studies are also used to evaluate the outcomes of designed trials to screen drug compounds for their potential to be biologically active. A typical drug screening protocol may involve several stages based on the response of the compound to various reactions: e.g. 'Conduct experiment 1: if the energy from the reaction is less than a specified level, reject the compound; otherwise, conduct experiment 2: if the second reaction is less than a second specified level, reject; otherwise, submit the compound for further testing.' The evaluation of such drug screening designs involves the same kind of consideration as the evaluation of randomised screening trials used on human subjects, described earlier. See Roseberry and Gehan (1964) and Schultz et al. (1973) for designs and analysis of drug screening trials, as well as related articles in the literature on screening designs to detect unacceptable products in manufacturing. KKa
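The error rates defined in this entry (sensitivity, specificity and the false positive and false negative rates) follow directly from a two-by-two table of test result against true disease status. The sketch below shows the arithmetic; the function name and the counts are hypothetical and are chosen only for illustration.

```python
def screening_error_rates(tp, fp, fn, tn):
    """Error rates of a screening test from a 2 x 2 table of counts.

    tp: test positive, disease present    fp: test positive, disease absent
    fn: test negative, disease present    tn: test negative, disease absent
    """
    diseased = tp + fn
    disease_free = fp + tn
    return {
        # P(positive result | disease present)
        "sensitivity": tp / diseased,
        # P(negative result | disease absent)
        "specificity": tn / disease_free,
        # P(test asserts 'no disease' | disease present) = 1 - sensitivity
        "false_negative_rate": fn / diseased,
        # P(test asserts 'disease' | disease absent) = 1 - specificity
        "false_positive_rate": fp / disease_free,
    }


# Hypothetical screening results for 1100 individuals.
rates = screening_error_rates(tp=90, fp=50, fn=10, tn=950)
```

In this made-up table only 5% of disease-free individuals test positive, yet they account for 50 of the 140 positive results, which illustrates why high specificity matters when mostly disease-free populations are screened.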
Baker, S. G., Kramer, B. S. and Prorok, P. C. 2002: Statistical issues in randomised trials of cancer screening. BMC Medical Research Methodology 2, 11: www.biomedcentral.com/1471-2288/2/11.
Byar, D. P., Simon, R. M., Friedewald, W. T. et al. 1976: Randomized clinical trials: perspectives on some recent ideas. New England Journal of Medicine 295, 74-80.
Connor, R. J. and Prorok, P. C. 1994: Issues in the mortality analyses of randomized controlled trials of cancer screening. Controlled Clinical Trials 15, 81-99.
Gastwirth, J. L. 1987: The statistical precision of medical screening procedures. Statistical Science 2, 213-38.
Goldberg, J. D. and Wittes, J. T. 1981: The evaluation of medical screening procedures. The American Statistician 35, 4-11.
Kafadar, K. and Prorok, P. C. 1994: A data-analytic approach for estimating lead time and screening benefit based on survival curves in randomised trials. Statistics in Medicine 13, 569-86.
Kafadar, K. and Prorok, P. C. 1996: Computer simulation experiments of randomized screening trials. Computational Statistics and Data Analysis 23, 263-91.
Kafadar, K. and Prorok, P. C. 2003: Alternative definitions of comparable case groups and estimates of lead time and benefit time in randomized cancer screening trials. Statistics in Medicine 22, 83-111.
Kafadar, K. and Prorok, P. C. 2005: Computational methods in medical decision making: to screen or not to screen? Statistics in Medicine 24, 569-81.
Prorok, P. C. and Connor, R. J. 1986: Screening for the early detection of cancer. Cancer Investigation 4, 225-38.
Prorok, P. C., Connor, R. J. and Baker, S. G. 1990: Statistical considerations in cancer screening programs. Urologic Clinics of North America 17, 699-708.
Roseberry, T. D. and Gehan, E. A. 1964: Biometrics 20, 73-84.
Schultz, J. R., Nichol, F. R., Elfring, G. L. and Weed, S. D. 1973: Multiple-stage procedures for drug screening. Biometrics 29, 293-300.
Shapiro, S., Venet, W., Strax, P. and Venet, L. 1988: Periodic screening for breast cancer: the Health Insurance Plan project and its sequelae, 1963-1986. Baltimore: Johns Hopkins University Press.
Zelen, M. 1976: Theory of early detection of breast cancer in the general population. In Heuson, J. C., Mattheiem, W. H. and Rozencweig, M. (eds), Breast cancer: trends in research and treatment. New York: Raven Press, pp. 287-301.
Zelen, M. and Feinleib, M. 1969: On the theory of screening for chronic diseases. Biometrika 56, 601-14.
seamless Phase II/III trials Traditional drug development follows several distinct phases of development through to registration. PHASE I TRIALS are usually followed by PHASE II TRIALS in order to choose the optimal dose, and then after some planning time PHASE III TRIALS are initiated. Although it is highly desirable to reduce the time between Phase II and Phase III there is usually a minimum time that teams want to spend planning Phase III. It should be an objective to reduce this time. One possible way of reducing this time is to carry out a seamless Phase II/III design. In this type of design the time between Phase II and Phase III is reduced and the two are combined into a single trial. An adaptive seamless trial is one in which the final analysis will use data from patients enrolled before and after the adaptation. The figure gives an illustration of the difference between a traditional Phase II/Phase III approach and a seamless learn/confirm approach.

[Figure: seamless Phase II/III trials. Comparison of the traditional Phase II/Phase III approach (top panel) and seamless learning/confirming (bottom panel). Arms shown: Dose A, Dose B, Dose C, Dose D and Placebo; the bottom panel is divided into Stage A (learning) and Stage B (confirming).]

There are certain considerations that need to be taken into account for a seamless design to be feasible. The most important is the time a patient needs to be followed to reach the ENDPOINT, which is to be used to make the dose selection. If the time to reach the endpoint is short in relation to patient recruitment then enrolment can continue while the decision is made and the number of overrunning patients, i.e. those randomised to doses not taken forward, will be minimised. However, if this time is long then the number of overrunning patients will be more significant and a seamless study less applicable. It is also advisable to use a well-established endpoint measure or surrogate marker (see SURROGATE ENDPOINTS) when implementing a seamless design. Where the goal of Phase II is to establish an endpoint for Phase III, it is unlikely that a seamless trial will be accepted.

There are also some logistical considerations that need to be taken into account, particularly around the drug supply and drug packaging. To this extent drug development programmes that do not have complicated or expensive regimens are more suited to seamless designs. Regulatory agencies are also likely to have many questions around the use of a seamless study and are likely to require a second confirmatory study. It is unlikely that a seamless Phase II/III study will be accepted as a single confirmatory study. One other consideration is that of maintaining the blind until all data are frozen at the end of the Phase III part. This then requires the use of an independent data monitoring committee to make the decision to switch to Phase III.

There are many methods of analysis that can be used in a seamless study. All methods must be seen to control the overall TYPE I ERROR, as this is a paramount requirement for the regulatory agencies and is not negotiable for a confirmatory study. Todd and Stallard (2005) consider a group sequential method (see INTERIM ANALYSIS) that incorporates a treatment selection based on a short-term endpoint, followed by a confirmatory phase that uses a longer term endpoint. Bauer and Kieser (1999) consider the use of P-VALUE combination tests in a seamless trial by combining the information before and after the adaptation. Inoue, Thall and Berry (2002) and Schmidli, Bretz and Racine-Poon (2007) consider Bayesian decision rules to decide which dose to take forward to the confirmatory phase. However, while Inoue, Thall and Berry (2002) use BAYESIAN METHODS in both the Phase II and Phase III parts of the study, Schmidli, Bretz and Racine-Poon (2007) use Bayesian methods for dose selection but traditional frequentist methods in Phase III, using a combination test to control the Type I error. Whichever method is used, simulations need to be carried out to understand the operating characteristics of the design. Submission of these designs to regulatory agencies would require these simulations. A more thorough coverage of the considerations, operational aspects and examples is given in Maca et al. (2006). Sponsor representation in seamless designs is controversial, but in a seamless adaptive design there can be more motivation for sponsor participation in the dose decision process. AB

Bauer, P. and Kieser, M. 1999: Combining different phases in the development of medical treatments within a single trial. Statistics in Medicine 18, 1833-48.
Inoue, L. Y. T., Thall, P. F. and Berry, D. A. 2002: Seamlessly expanding a randomized Phase II trial to Phase III. Biometrics 58, 823-31.
Maca, J., Bhattacharya, S., Dragalin, V., Gallo, P. and Krams, M. 2006: Adaptive seamless Phase II/III designs - background, operational aspects and examples. Drug Information Journal 40, 463-73.
Schmidli, H., Bretz, F. and Racine-Poon, A. 2007: Bayesian predictive power for interim adaptation in seamless Phase II/III trials where the endpoint is survival up to some specified timepoint. Statistics in Medicine 26, 4925-38.
Todd, S. and Stallard, N. 2005: A new clinical trial design combining Phases II and III: sequential designs with treatment selection and a change of endpoint. Drug Information Journal 39, 109-18.
secondary endpoints See ENDPOINTS
segregation analysis
The observation of characteristic segregation ratios among the offspring of particular parental crosses was first made by the Austrian monk Gregor Mendel (1822-1884) in his experiments on the garden pea. These observations enabled him to formulate a theory of genetic transmission from parent to offspring. Mendel studied discrete traits (called PHENOTYPES) in the garden pea (e.g. smooth versus wrinkled seed) and, after many generations of inbreeding, obtained pure lines (with uniform phenotype, e.g. all having smooth seeds, over many generations) for each trait. When two pure lines with different phenotypes (e.g. smooth and wrinkled seeds) were crossed, all the offspring (called the F1 generation) were of the same phenotype (e.g. all smooth). The trait that is uniformly present in the F1 generation is said to be dominant, while the absent alternative is said to be recessive. When F1 individuals were crossed with the recessive pure line (which is called a 'back-cross'), half the offspring had the dominant phenotype and the other half the recessive phenotype. When two F1 individuals were crossed (which is called an 'inter-cross'), three-quarters of the offspring had the dominant phenotype and one-quarter the recessive phenotype. These characteristic 1:1 and 3:1 ratios are called segregation ratios.
Segregation ratios are explained by the fact that each individual receives a complete set of genes from both parents, so that each gene is present in duplicate. When there are different forms of the same gene, each form (or allele) may correspond to a different phenotype, but when an individual has two different alleles (i.e. is heterozygous), the phenotype of one of the alleles (the recessive allele) is completely masked by the phenotype of the other allele (the dominant allele). Thus the F1 generation from two different pure (i.e. homozygous) lines will be all heterozygous and therefore display the dominant phenotype. A back-cross will result in half the offspring having the heterozygous genotype and the other half having the homozygous recessive genotype. An inter-cross will result in half the offspring being heterozygous, one-quarter being homozygous dominant and one-quarter homozygous recessive.
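As an illustration only, the 1:1 and 3:1 ratios described above can be reproduced by enumerating the four equally likely allele combinations that a cross can produce. The Python sketch below is not taken from the original entry: the function name `cross` and the allele labels 'A' (dominant) and 'a' (recessive) are assumptions made for the example, and complete dominance at a single locus is assumed.

```python
from fractions import Fraction
from itertools import product


def cross(parent1, parent2, dominant="A"):
    """Return expected genotype and phenotype proportions among offspring.

    Each parent is a two-letter genotype such as 'AA', 'Aa' or 'aa'
    (hypothetical labels). Each parent transmits one of its two alleles
    with probability 1/2, so the four combinations are equally likely.
    """
    genotypes = {}
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that 'aA' and 'Aa' are counted as the same genotype.
        genotype = "".join(sorted((allele1, allele2)))
        genotypes[genotype] = genotypes.get(genotype, 0) + Fraction(1, 4)

    phenotypes = {}
    for genotype, proportion in genotypes.items():
        # Complete dominance: one copy of the dominant allele is enough.
        label = "dominant" if dominant in genotype else "recessive"
        phenotypes[label] = phenotypes.get(label, 0) + proportion
    return genotypes, phenotypes


# F1 generation from two pure lines, then a back-cross and an inter-cross.
f1_genotypes, f1_phenotypes = cross("AA", "aa")
back_genotypes, back_phenotypes = cross("Aa", "aa")
inter_genotypes, inter_phenotypes = cross("Aa", "Aa")
```

Under these assumptions the back-cross gives equal proportions of the two phenotypes and the inter-cross gives three-quarters dominant to one-quarter recessive, matching the segregation ratios in the text.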
Classical segregation analysis is the examination of the offspring of different mating types to see if Mendelian segregation ratios are present. When such ratios are observed, the inference is made that the phenotype in question is determined by a single underlying genetic locus. Complex segregation analysis is a further development of this method for traits in which Mendelian segregation ratios may be masked by complexities such as the involvement of background genetic or environmental factors in addition to a locus of major effect. PS
(See also ALLELIC ASSOCIATION, GENETIC EPIDEMIOLOGY, GENETIC LINKAGE, GENOTYPE, PHENOTYPE)
selection bias This arises from systematic differences between those who are selected for study and those who are not selected, so that the selected sample is not representative of the target population. For example, in a survey of the smoking habits of 14 year olds, a convenient sampling frame would be children attending schools in a defined geographical area. However, not all 14 year olds will be included in this sampling frame and if the reasons for exclusion are associated with the smoking habit a biased estimate of the prevalence of smoking will be obtained. Another area where the choice of sampling frame might lead to BIAS is in telephone sampling, where households without telephones would be systematically excluded.
Even when an appropriate sampling frame is used for a survey, nonrandom sampling can lead to biased estimates. For example, in a study of overcrowding, an appropriate sampling frame might be all households in an electoral ward or postcode sector, listed in order of postal address. However, a systematic sample of every eighth household might overrepresent certain types of accommodation, such as flats on a particular floor (e.g. ground floor or top floor) in tenement blocks of eight. If the average number of people per household differs systematically between floors, this is likely to lead to a biased estimate of overcrowding. Ideally, probability sampling methods should be used to avoid selection bias in surveys (see SAMPLING METHODS - AN OVERVIEW).
One type of study that is almost never carried out on a random sample of the target population is a randomised CLINICAL TRIAL. Trials rely on random allocation to treatment groups for their internal validity but, because of tight eligibility criteria for patient selection, those in the trial may not be representative of all patients with the condition being treated.
Epidemiological studies, especially CASE-CONTROL STUDIES, are susceptible to selection bias. In case-control studies it can be extremely difficult to obtain a control group that is representative of all noncases in the same target population that the cases arise from. This can result in biased estimates of the ODDS RATIO in either direction, depending on the form of selection bias. These issues are discussed in detail in Sackett (1979) and Ellenberg (1994). Even in carefully designed OBSERVATIONAL STUDIES it can be difficult or impossible to rule out selection bias as a possible explanation for an observed association (Boydell et al., 2001).
Selection bias can occur in many other contexts. For example, Kho et al. (2009) describe how it can result from the requirement for written informed consent in studies of medical records. WHG
Boydell, J., van Os, J., McKenzie, K. et al. 2001: Incidence of schizophrenia in ethnic minorities in London: ecological study into interactions with environment. British Medical Journal 323, 1336-8.
Ellenberg, J. H. 1994: Selection bias in observational and experimental studies. Statistics in Medicine 13, 557-67.
Kho, M. E., Duffett, M., Willison, D. J. et al. 2009: Written informed consent and selection bias in observational studies using medical records: systematic review. British Medical Journal 338, b866. DOI: 10.1136/bmj.b866.
Sackett, D. L. 1979: Bias in analytic research. Journal of Chronic Diseases 32, 51-63.
sensitivity This is a measure of how well an alternative test performs when it is compared with the reference or 'gold' standard test for the diagnosis of a condition. Sensitivity is the proportion of patients who are correctly identified by the test as having the condition out of all patients who have the condition. Sensitivity may also be expressed as a percentage and is the counterpart to SPECIFICITY. The reference standard may be the best available diagnostic test or may be a combination of diagnostic methods, including following up patients until all with the disease have presented with clinical symptoms. For example, in a study of mammography, the reference standard for breast cancer would include all women who went on to develop breast cancer, whether they were first diagnosed radiologically, histologically or symptomatically. Thus, the best design when a diagnostic test is evaluated against a reference standard is a COHORT STUDY with complete follow-up. When the data are set out as in the table:
$$\text{Sensitivity} = \frac{a}{a + c}$$
sensitivity General table of test results among a + b + c + d individuals sampled
                    Disease
Test        Present    Absent    Total
Positive    a          b         a + b
Negative    c          d         c + d
Total       a + c      b + d     a + b + c + d
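Using the notation of the table, sensitivity and an accompanying confidence interval can be computed directly from the counts a (test positive, disease present) and c (test negative, disease present). The Python sketch below is illustrative only: the function name and the example counts are assumptions, and the interval is the Wilson score interval for a binomial proportion written out from its standard formula, with z = 1.96 assumed for an approximately 95% interval.

```python
import math


def sensitivity_with_wilson_ci(a, c, z=1.96):
    """Return (sensitivity, lower, upper) from the table counts a and c.

    a: diseased individuals with a positive test (true positives)
    c: diseased individuals with a negative test (false negatives)
    z: normal quantile; 1.96 is assumed for an approximate 95% interval
    """
    n = a + c
    if n <= 0:
        raise ValueError("at least one diseased individual is required")
    p = a / n
    denominator = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denominator
    half_width = (z / denominator) * math.sqrt(
        p * (1.0 - p) / n + z * z / (4.0 * n * n)
    )
    # Clamp only to guard against floating-point rounding at the extremes.
    lower = max(0.0, centre - half_width)
    upper = min(1.0, centre + half_width)
    return p, lower, upper


# Hypothetical example: 45 of 50 diseased individuals test positive.
estimate, lower, upper = sensitivity_with_wilson_ci(45, 5)

# Small sample with observed sensitivity of 1: the upper limit stays at 1.
perfect = sensitivity_with_wilson_ci(10, 0)
```

The interval is not symmetric about the estimate, and its limits cannot fall outside the range 0 to 1, which is the property that matters when the observed sensitivity is close to 1 in a small sample.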
Sensitivity should be presented with CONFIDENCE INTERVALS, typically set at 95%, calculated using an appropriate method such as that of Wilson (described in Altman et al., 2000), which will produce asymmetric confidence intervals without impossible values, i.e. that will not give values for the upper confidence limit > 1 when sensitivity approaches 1 and the sample size is small.
Where a test result is a continuous measurement, e.g. liver enzymes in serum, a cut-off point for abnormal values is chosen. If a lower value is chosen, then sensitivity will be relatively high, but specificity relatively low. The impact of all possible cut-off points can be displayed graphically in a RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE by plotting sensitivity at each cut-off point on the y axis against 1 - specificity at each cut-off point on the x axis. The choice of cut-off point is not, however, solely a statistical decision, as the balance between the FALSE POSITIVE RATE and the FALSE NEGATIVE RATE should be related to the clinical context and consequences of wrong diagnosis for the patient and healthcare system.
A sample size calculation for sensitivity can be made by specifying a confidence interval (e.g. 95%) and an acceptable width for the lower bound of the confidence interval. Where the anticipated sensitivity is high and the sample size small, a 'small sample' method should be used; a sample size table can be found in Machin et al. (1997). CLC
(See also LIKELIHOOD RATIO, NEGATIVE PREDICTIVE VALUE, POSITIVE PREDICTIVE VALUE, TRUE POSITIVE RATE)
Altman, D. G., Machin, D., Bryant, T. N. and Gardner, M. J. 2000: Statistics with confidence, 2nd edition. London: BMJ Books.
Machin, D., Campbell, M. J., Fayers, P. M. and Pinol, A. P. Y. 1997: Sample size tables for clinical studies, 2nd edition. Oxford: Blackwell Sciences Ltd.
sequential analysis
A method allowing hypothesis tests to be conducted on a number of occasions as the data accumulate through the course of a CLINICAL TRIAL. A trial monitored in this way is usually called a sequential trial. This approach is in contrast to the use of a standard fixed sample size trial design, in which a single hypothesis test is conducted at the end of a trial, usually when some specified sample size has been attained, with no allowance to collect further data and repeat the test. Sequential analysis methods are attractive in clinical trials since, for ethical reasons, it is often important to analyse the data as they accumulate and to stop the study as soon as the presence or absence of a treatment effect is indicated sufficiently clearly. Although the total sample size for a sequential trial is not fixed in advance - it depends on the observed data - an additional advantage of sequential methodology is that trials may be constructed so that the expected sample size is smaller than that for a fixed sample size trial with the same Type I error rate and POWER.
Suppose that in a clinical trial we wish to compare two groups of patients, with one receiving the experimental treatment and the other the control treatment. Formally, we define some measure of the treatment difference between the experimental and control groups, which we will denote by $\theta$. This treatment difference may, for example, be measured by the difference between the MEAN response for a normally distributed ENDPOINT, the log-odds ratio for a binary endpoint or the log-hazard ratio for a survival time endpoint. We generally wish to test the NULL HYPOTHESIS that there is no difference between the treatment groups, i.e. that $\theta = 0$. In a standard fixed sample size test, some test statistic is obtained and compared with a critical value. The critical value is chosen so as to give a specified Type I error rate, i.e. to ensure that the risk of concluding that there is a treatment difference when, in fact, the treatments are identical is controlled, usually to be no more than 5%. If this standard hypothesis test is repeated at a number of INTERIM ANALYSES, there are a number of opportunities to conclude that the treatments are different. The risk of doing so on at least one occasion therefore increases above 5%, so that the overall Type I error rate exceeds 5% and a valid test is no longer provided. This problem is addressed by sequential analysis, in which the repeated hypothesis tests are conducted in such a way as to maintain an overall Type I error rate for the sequential trial as a whole.
Although sequential monitoring methods have been proposed based on a range of possible test statistics (see, for example, Jennison and Turnbull, 2000, for a discussion of possible methods), a general sequential approach is based on the use of the efficient score statistic (see Whitehead, 1997) as a measure of the treatment difference. Large positive values correspond to an indication of superiority of the experimental treatment, large negative values to an indication of superiority of the control treatment, while values close to zero indicate little difference between the treatments.
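The inflation of the overall Type I error rate when an unadjusted test is repeated at each interim analysis can be illustrated by simulation. The Python sketch below is an illustration under stated assumptions rather than a method from the entry: it assumes equally spaced analyses, a standardised test statistic that is normally distributed under the null hypothesis, and a two-sided nominal 5% test (critical value 1.96) at every analysis. The function name, the number of simulated trials and the seed are all arbitrary choices.

```python
import math
import random


def simulate_overall_type1(n_looks, n_trials=20000, z_crit=1.96, seed=1):
    """Estimate the chance of at least one 'significant' result under H0.

    The cumulative statistic S_k is a sum of k independent N(0, 1)
    increments (one per equally spaced analysis); the standardised
    statistic tested at analysis k is S_k / sqrt(k).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        cumulative = 0.0
        for k in range(1, n_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)
            if abs(cumulative) / math.sqrt(k) > z_crit:
                rejections += 1
                break  # the trial would stop at the first rejection
    return rejections / n_trials


single_look = simulate_overall_type1(1)  # a fixed sample size test
five_looks = simulate_overall_type1(5)   # five unadjusted analyses
```

With a single analysis the estimated rate should be close to the nominal 5%, whereas with five unadjusted analyses it is noticeably larger, which is the problem that sequential methods are designed to address.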
The exact form of the score statistic depends on the type of data used and the way in which the treatment difference is measured. As an example, for binary data, with the treatment difference measured by the log-odds ratio, if equal numbers of patients have received the experimental and control treatments, the score statistic is half of the difference in observed numbers of successes on the experimental and control arms. For survival data, with the treatment difference measured by the log-hazard ratio, the score statistic is the log-rank statistic (see SURVIVAL ANALYSIS).
In a sequential trial, a number of interim analyses are conducted. The value of the score statistic is calculated at each interim analysis together with the observed Fisher's information, a quantity related to the sample size summarising the amount of information available. If, at any interim analysis, the value of the score statistic is sufficiently large, the trial is stopped and it is concluded that the experimental treatment is superior to the control treatment. If the score statistic is too small, the trial is stopped and, depending on the way in which the test is constructed, it may either be concluded that the experimental treatment is inferior to the control or that there is insufficient evidence to distinguish between the two treatments. If neither criterion is met, that is for intermediate values of the score statistic, the trial continues to the next interim analysis.
Graphically, the observed values of the score statistic may be plotted against the values of the information. As the information available increases throughout the trial the plotted points form what is called a sample path. At each interim analysis, the sample path is compared with upper and lower critical values, with the trial stopped as soon as the score statistic lies either above the upper critical value or below the lower critical value. The critical values, which, in general, take different values at the different interim analyses, thus define a continuation region.
As already explained, the problem of sequential analysis is the calculation of the critical values so as to give a specified Type I error rate, for example, of 5%. As the choice of critical values to achieve this aim is not unique, problems of appropriate choices for use in a sequential clinical trial setting are also of interest. In particular, in contrast to fixed sample size hypothesis tests, asymmetric sequential methods are possible. A fixed sample size test that is designed to have specified power, say 90%, to detect a treatment effect of given size, say $\theta = \theta^*$, has equal power to detect the opposite treatment effect of the same magnitude, i.e. $\theta = -\theta^*$. A sequential test may be constructed to have power 0.9 to detect $\theta = \theta^*$, but lower power to detect $\theta = -\theta^*$. Such a sequential test may have a smaller expected sample size when $\theta = -\theta^*$ than when $\theta = \theta^*$. This is sometimes desirable in clinical trials, when it is advantageous to stop a trial as soon as possible if the experimental treatment appears to be inferior to the control and there is no desire to continue recruiting patients to test whether or not this inferiority is statistically significant.
The method based on the score statistic is a very flexible one, since, as shown by Scharfstein, Tsiatis and Robins (1997) for a wide range of problems, conditional on the observed information values, the score statistics at the interim analyses are approximately normally distributed. This means that critical values can be obtained based on this normality to provide sequential tests that can be used for many different types of data and choices of measure for the treatment difference.
Two distinct approaches to the calculation of the critical values with which the efficient score statistics are compared have been developed. The first, which is sometimes called the boundaries approach, is based on modelling a continuous sample path. The second uses the assumed normality of the score statistics directly, evaluating the critical values via a recursive numerical integration technique, with the form of the sequential test often specified by what is called a spending function. The two approaches are described in detail later. A more general approach, the adaptive design method, which is not based on the asymptotic normality of the score statistics, is also briefly described. After a brief example, the problem of analysis at the end of a sequential trial is then discussed. We then continue with a description of the related area of response-driven designs and end with some comments on the role of a DATA AND SAFETY MONITORING COMMITTEE in a sequential clinical trial.
In the boundaries approach, the approximate NORMAL DISTRIBUTION of the score statistics evaluated at the interim analyses means that the observed values can be considered as points on a Brownian motion with drift equal to the treatment difference, observed at times given by the observed information. This has led to the consideration of the abstract concept of continuous monitoring, in which the value of the test statistic is taken to be observed at all times rather than at the discrete times given by the interim analyses. The plotted sample path thus forms a continuous line, which is compared with continuous boundaries, which may be expressed as functions of the information level. Many of the theoretical developments in sequential analysis have been based on consideration of this problem. A consequence of this formulation is that, since the sample path is considered to be continuous, the trial stops exactly on a boundary, whereas for a discretely monitored trial, there is some overshoot of the critical value when the trial stops.
The boundaries approach stems from the work of Wald (1947), who developed the sequential probability ratio test (SPRT) for the testing of armaments during the Second World War. In Wald's SPRT, after each observation, the LIKELIHOOD RATIO for the simple alternative hypothesis relative to the null hypothesis is calculated and the test continues so long as this likelihood ratio falls within some fixed range, equivalent to the plotted values of the score statistic lying between two parallel straight boundaries. Wald derived stopping limits so as to give a test with a specified Type I error rate and power under the assumption of continuous monitoring. Among all tests with the same properties, the SPRT minimises the expected sample size when either the null or alternative hypothesis holds. However, the parallel boundaries give a test that, although it terminates with probability 1, has no finite maximum sample size. This feature makes it unsuitable for many clinical trials.
Following the work of Wald, a number of alternative forms for boundaries that maintain the overall Type I error rate have been proposed. Whitehead (1997) describes a wide range of such tests. One form that is particularly commonly used in sequential clinical trials is the triangular test. This test has straight boundaries that form a triangular-shaped continuation region. The test approximately minimises the maximum expected sample size among all tests with the same error rates and has a high probability of stopping with a sample size below that of the equivalent fixed sample size test.
The critical values obtained using the boundaries approach maintain the overall Type I error rate for a continuously monitored test. In practice, monitoring is necessarily discrete, since even if an interim analysis is conducted after observation of each patient, the information will increase in small steps. This means that if the critical values from the boundaries approach are used, the Type I error rate will be less than the planned level of, for example, 5%. Whitehead (1997) has proposed a correction to modify the continuous boundaries to allow for the discretely monitored sample path. This correction brings in the critical values by an amount equal to the expected overshoot of the discrete sample path. The correction is particularly accurate for the triangular test.
In general, specialist software is needed for the construction of critical values using the boundaries approach. A commercially available software package, Planning and Evaluation of Sequential Trials (PEST), is available from the Medical and Pharmaceutical Statistics Research Unit at Lancaster University for the calculation of the boundaries.
An alternative approach is provided by a recursive numerical integration method for calculation of the overall Type I error rate for a sequential trial with specified critical values under the assumption that the score statistics observed at the interim analyses are normally distributed (Armitage, McPherson and Rowe, 1969). As well as demonstrating the effect of conducting interim analyses without adjusting for MULTIPLE TESTING, this method allows the construction of critical values to maintain an overall Type I error rate of, say, 5%. Using this approach, Pocock (1977) and O'Brien and Fleming (1979) calculated critical values for sequential tests that preserve the overall Type I error rate to be 5% when, for O'Brien and Fleming's design, the critical values with which the score statistics are compared are the same at each interim analysis, and, for Pocock's design, the critical values correspond to the same P-VALUE for a conventional analysis performed at each interim analysis.
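The idea of recursive numerical integration can be sketched in a few lines. The Python code below is a simplified illustration in the spirit of the approach described above, not a reproduction of the published algorithm: it assumes equally spaced analyses, a cumulative statistic built from independent standard normal increments under the null hypothesis, symmetric two-sided critical values given on the standardised scale, and a plain trapezoidal rule on a fixed grid. The function names and the grid size are assumptions, and 2.413 is used as the constant commonly tabulated for Pocock's design with five analyses at an overall 5% level.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def overall_type1_error(z_crit, n_grid=300):
    """Overall two-sided Type I error for critical values z_crit[k-1]
    applied to the standardised statistic S_k / sqrt(k) at analysis k."""
    weights = [0.5] + [1.0] * (n_grid - 1) + [0.5]  # trapezoidal rule
    total, grid, dens, step = 0.0, [], [], 0.0
    n_looks = len(z_crit)
    for k, c in enumerate(z_crit, start=1):
        b = c * math.sqrt(k)  # boundary on the scale of the cumulative sum
        if k == 1:
            total += 2.0 * (1.0 - normal_cdf(b))
        else:
            # Probability of first crossing at analysis k: integrate the
            # sub-density of S_(k-1) over the previous continuation region.
            total += step * sum(
                w * f * (1.0 - normal_cdf(b - u) + normal_cdf(-b - u))
                for w, u, f in zip(weights, grid, dens)
            )
        if k == n_looks:
            break
        # Sub-density of S_k on the continuation region (-b, b).
        new_grid = [-b + 2.0 * b * j / n_grid for j in range(n_grid + 1)]
        if k == 1:
            new_dens = [normal_pdf(s) for s in new_grid]
        else:
            new_dens = [
                step * sum(w * f * normal_pdf(s - u)
                           for w, u, f in zip(weights, grid, dens))
                for s in new_grid
            ]
        grid, dens, step = new_grid, new_dens, 2.0 * b / n_grid
    return total


unadjusted = overall_type1_error([1.96] * 5)    # nominal 5% test each time
pocock_like = overall_type1_error([2.413] * 5)  # constant critical value
```

Under these assumptions the unadjusted calculation gives an overall error rate well above 5%, while the constant 2.413 brings it back to approximately 5%. In practice, validated specialist software would be used rather than a sketch of this kind.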
The critical values obtained were tabulated to allow easy implementation without the need for additional computation. Although these methods, particularly that proposed by O'Brien and Fleming, remain in use, they are not always the most appropriate designs in the clinical trial setting. Pocock's design has been criticised because it has a relatively high chance of leading to rejection of the null hypothesis very early in the trial. O'Brien and Fleming's design, in contrast, is unlikely to stop early in the trial unless there is very strong evidence of a treatment difference. If the two treatments are very similar, both designs are likely to lead to a trial requiring more patients than the equivalent fixed sample size trial.
A more flexible design approach is provided by the spending function method proposed by Lan and DeMets (1983). In this approach, the total overall Type I error rate of, say, 5% is considered to be spent through the course of the trial, with the rate at which it is spent controlled by the specified spending function. Not only does this introduce flexibility in the choice of the shape of the stopping boundaries but it also, in contrast to the tests of Pocock and O'Brien and Fleming, allows construction of a test that maintains the Type I error rate if interim analyses are not taken at the planned times. Many forms can be used for the spending function, but families of functions to give tests with certain properties have been proposed. A thorough review of the approach is given by Jennison and Turnbull (2000). As with the boundaries approach, specialist software is required to calculate the critical values. The software package EaST produced by Cytel Software Corporation and the S-PLUS module SeqTrial produced by MathSoft perform the necessary calculations.
An alternative to the sequential design approaches based on the assumption of normality for score statistics just described is the adaptive design approach described by Bauer and Köhne (1994). Although the ideas can be extended to trials with greater numbers of stages, Bauer and Köhne focus on a two-stage design and assume that the data from each stage are independent of those from the other stage. Suppose that a standard hypothesis test of the null hypothesis that there is no treatment difference is conducted based on the data obtained from each stage, leading to two P-values, $P_1$ and $P_2$. A result of Fisher cited by Bauer and Köhne shows that, if there is no treatment difference, $-2\log(P_1 P_2)$ follows a CHI-SQUARE DISTRIBUTION on 4 DEGREES OF FREEDOM, allowing the data from the two stages to be combined in a single test. The fact that the only assumption made is the independence of data from the two stages means that this approach has great flexibility, enabling changes to many features of the trial design without invalidating the final test. The most common change discussed is modification of the sample size of the second stage based on the predicted power of the trial at the end of the first stage, but possible changes go far beyond this to include changes of the endpoint being measured and the null hypothesis being tested. The adaptive design approach has been criticised, however, for the fact that the test statistic, $-2\log(P_1 P_2)$, is not a sufficient statistic for the treatment difference. This leads to a lack of power for the test, so that, if the flexibility of the adaptive design is not utilised, a sequential test based on the boundaries approach or the spending function method can be found that is as powerful and has smaller expected sample size.
As an example of a sequential analysis, the figure shows results from the analysis of a small trial to assess the efficacy of Viagra in men suffering erectile dysfunction as a result of spinal cord injury. Eligible men with a regular female partner, who were attending clinics in Southport, Belfast and Stoke Mandeville, were randomised between Viagra and a matching PLACEBO pill. After four weeks they were asked whether the treatment received had improved their erections. The trial was designed using the boundaries approach with the triangular test being chosen as an appropriate design. The solid lines on the figure illustrate the continuation region for this test when the efficient score statistic, Z, is plotted against the observed Fisher's information, V. The trial continues until the values of Z and V lead to a point outside this triangular region.
sequential analysis Continuation region and sample path for a clinical trial of Viagra in men with spinal cord injury
At the first interim analysis, 12 men had completed four weeks' treatment with 5/6 on Viagra and 1/6 on placebo reporting improvement. The first plotted point on the figure represents these data. To allow for the fact that the trial is not monitored continuously, the boundaries are adjusted using the so-called Christmas tree correction, so that the plotted point is compared with the inner dotted boundaries shown. As the point is between these boundaries, recruitment to the trial continued. At the third interim analysis, the observed improvement rates were 8/10 on Viagra and 1/10 on placebo. On the basis of these data, the upper boundary was reached and recruitment closed. When the results on the 6 men under treatment were added, the improvement rates became 9/12 and 1/14 respectively, leading to the fourth plotted point. The design allowed a strong positive conclusion to be drawn after only 26 men had been treated. This is in comparison with a conventional fixed sample size trial, for which 57 subjects, more than twice as many, would have been required for a design of the same power.
The methods described earlier mainly lead to tests conducted at a number of interim analyses of the data obtained in a clinical trial, with the possibility of stopping the trial as soon as sufficient evidence of a treatment effect, or the lack of such an effect, is obtained. Much more general methods can be envisaged in which many aspects of a clinical trial design may be reconsidered following an interim analysis. Such methods are sometimes referred to as DATA-DEPENDENT DESIGNS, or response-driven designs, or, rather confusingly given that the same terms are used for the different approaches described here, as either sequential designs or adaptive designs. A review of response-driven design methods is given by Rosenberger (1996).
A simple response-driven design is the play-the-winner design for a clinical trial comparing two treatments on the basis of a success/failure endpoint. The purpose of this design is to replace the random allocation of treatment to patients with a method that leads to more patients receiving the superior treatment. Since, of course, it is not known at the beginning of the trial which treatment is superior, the first patient may be assigned to a treatment at random. If a success is observed from the treatment of this patient, the next patient receives the same treatment. If a failure is observed, the next patient receives the other treatment. Each subsequent patient receives the same treatment as the previous one if a success was observed and the other treatment if a failure was observed. In practice, the simple play-the-winner rule is generally modified to include some random element. Several other rules with different properties, but with the common aim of assigning more patients to the most successful treatment arm, have also been suggested.
Response-driven designs have found most use in early-phase clinical trials for dose finding. Here, the dose of the experimental treatment that is to be given to each patient in the trial is determined depending on the responses from patients treated earlier. Often the aim is to determine a dose that leads to a certain proportion of patients experiencing some event: in trials in oncology, for example, toxicity rates of 20% are often considered optimal. The use of response-driven designs in such trials means that the optimal dose can be efficiently estimated without exposing patients to large doses that may be highly toxic.
The use of a sequential stopping rule in a clinical trial means that many of the standard analysis methods are no longer appropriate.
Suppose that a sequential trial has stopped at some interim analysis with the test statistic exceeding the upper critical value, i.e. with the conclusion that the experimental treatment is superior to the control treatment. The trial has stopped precisely because of the large observed value of the random test statistic. This means that a standard unbiased estimate of the treatment difference based on the observed value of the test statistic, e.g. one obtained by the common MAXIMUM LIKELIHOOD ESTIMATION, will, on average, overestimate the true value of the treatment difference. The P-value from a standard analysis will, in a similar way, on average be too small; i.e. it will overstate the evidence against the null hypothesis. Special methods of analysis allowing for the sequential monitoring have been developed. These are described in detail by Whitehead (1997) and Jennison and Turnbull (2000) and are implemented in the software packages PEST, EaSt and SeqTrial. In large-scale clinical trials, monitoring of accumulating data is commonly undertaken by an independent data and
it uses the sign of the differences rather than their magnitude
safety monitoring committee (DMC). The primary role of such a committee is to ensure the safety of patients recruited to the trial. It is therefore natural, in a sequential clinical trial, that the DMC should be involved in the interim analyses conducted to assess the treatment difference and in decisions on whether or not the study should be stopped. The involvement of a DMC, the use of a carefully chosen sequential stopping rule, approved by the DMC before the start of the study, and a final analysis that allows for the sequential monitoring provide a clinical trial that can be stopped when appropriate without compromising the statistical integrity of the results obtained. NS
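The simple play-the-winner rule described earlier in this entry can be sketched in a few lines of Python. This is only an illustration of the unmodified rule, without the random element that the entry says is usually added in practice; the function name, the arm labels "A" and "B" and the success probabilities are hypothetical choices made for the example.

```python
import random


def play_the_winner(n_patients, p_success, first="A", rng=None):
    """Allocate patients to arms "A" and "B" by the simple play-the-winner rule.

    p_success maps each arm to an assumed (hypothetical) success probability.
    Returns a list of (arm, success) pairs, one per patient.
    """
    rng = rng or random.Random()
    other = {"A": "B", "B": "A"}
    current = first  # the first patient is assumed to be assigned at random
    history = []
    for _ in range(n_patients):
        success = rng.random() < p_success[current]
        history.append((current, success))
        if not success:
            # A failure sends the next patient to the other treatment;
            # a success keeps the next patient on the same treatment.
            current = other[current]
    return history


if __name__ == "__main__":
    trial = play_the_winner(20, {"A": 0.7, "B": 0.3}, rng=random.Random(1))
    print("".join(arm for arm, _ in trial))
```

Because a treatment is abandoned only after a failure, the arm with the higher success probability tends to receive longer runs of patients.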
and is therefore less sensitive than the WILCOXON SIGNED RANK TEST. It can be used for two samples that are matched or paired
with a NULL HYPOTHESIS that the MEDIANS are not different between the two groups. Alternatively, it can be used in the one-sample case to compare to a particular value, e.g. the median, where the null hypothesis is that the group median is not different to the proposed median. It is a nonparametric version of both the paired and the ONE-SAMPLE t-TEST.
For the two-sample case, find the sign of the difference between the two values in the pair. Calculate N, the number of differences showing a sign. For the one-sample case, find the sign of the difference between each subject's value and the value of interest. Calculate N, the number of observations that are different to the value of interest. Then for both cases let x be the number of the less frequent sign, x = min(number of + signs, number of - signs), and compare x to the critical region of the BINOMIAL DISTRIBUTION with parameters N and 1/2. Reject the null hypothesis if x is less than or equal to the critical value.
As part of a study, the general health section of the SF-36 was collected. The subjects' values (shown in the first table) are to be compared to the expected value of 72 within the population. There are 5 plus signs, 9 minus signs and 1 tie; therefore x = 5 and N = 14. From the tables of the binomial distribution (N = 14, P = 1/2) the critical value is 3. As 5 is greater than 3 there is insufficient evidence to reject the null hypothesis, so it is concluded that this group's general health scores are not different from those expected in the population.
General health scores were collected on this group of subjects at a second time point; the scores at this time point are shown in the second table. This time, there are 7 plus signs, 8 minuses and no ties. Therefore x = 7 and N = 15. Compare x = 7 to the critical value of the binomial distribution with parameters 15 and 1/2. This value is 3; as 7 is greater than 3 there is insufficient evidence to reject the null hypothesis. Therefore the general health scores are not different at the two time points.
For further details see Pett (1997) and Siegel and Castellan (1998). SLV
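The one-sample calculation in the worked example can be reproduced with a short Python sketch. It counts the signs, discards ties and, instead of looking up a tabulated critical value, computes an exact two-sided P-value from the binomial distribution with parameters N and 1/2. The function name is hypothetical, and the 15 scores are the general health values as read from the first table of this entry.

```python
from math import comb


def sign_test(values, reference):
    """One-sample sign test against a reference value.

    Returns (x, n, p): the count of the less frequent sign, the number of
    non-tied observations, and the exact two-sided binomial P-value.
    """
    plus = sum(v > reference for v in values)
    minus = sum(v < reference for v in values)
    n = plus + minus          # ties carry no sign and are dropped
    x = min(plus, minus)
    lower_tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    return x, n, min(1.0, 2 * lower_tail)


if __name__ == "__main__":
    gh = [60, 55, 75, 100, 55, 60, 50, 60, 72, 40, 90, 75, 70, 75, 55]
    x, n, p = sign_test(gh, 72)
    print(x, n, round(p, 3))
```

For paired data the same function can be applied to the within-pair differences with a reference value of 0.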
Armitage, P., McPherson, C. K. and Rowe, B. C. 1969: Repeated significance tests on accumulating data. Journal of the Royal Statistical Society, Series A 132, 235-44.
Bauer, P. and Köhne, K. 1994: Evaluation of experiments with adaptive interim analyses. Biometrics 50, 1029-41.
Jennison, C. and Turnbull, B. W. 2000: Group sequential methods with applications to clinical trials. Boca Raton: Chapman & Hall/CRC.
Lan, K. K. G. and DeMets, D. L. 1983: Discrete sequential boundaries for clinical trials. Biometrika 70, 659-63.
O'Brien, P. C. and Fleming, T. R. 1979: A multiple testing procedure for clinical trials. Biometrics 35, 549-56.
Pocock, S. J. 1977: Group sequential methods in the design and analysis of clinical trials. Biometrika 64, 191-9.
Rosenberger, W. F. 1996: New directions in adaptive designs. Statistical Science 11, 137-49.
Scharfstein, D. O., Tsiatis, A. A. and Robins, J. M. 1997: Semiparametric efficiency and its implications on the design and analysis of group-sequential studies. Journal of the American Statistical Association 92, 1342-50.
Wald, A. 1947: Sequential analysis. New York: John Wiley & Sons.
Whitehead, J. 1997: The design and analysis of sequential clinical trials. Chichester: John Wiley & Sons, Ltd.
SeqTrial See SEQUENTIAL ANALYSIS
sequential probability ratio test (SPRT) See SEQUENTIAL ANALYSIS
shrinkage See MULTILEVEL MODELS
Pett, M. A. 1997: Nonparametric statistics for health care research. Thousand Oaks: Sage.
Siegel, S. and Castellan, N. J. 1998: Nonparametric statistics for the behavioral sciences, 2nd edition. New York: McGraw-Hill.
sign test This is one of the oldest nonparametric methods and one of the simplest. It is so named because
sign test Subjects' values in the general health section of the SF-36, using signs from the sign test (comparison value 72)

GH value   Sign
60         -
55         -
75         +
100        +
55         -
60         -
50         -
60         -
72         =
40         -
90         +
75         +
70         -
75         +
55         -
sign test Second recording of subjects' values in the general health section of the SF-36 (a plus sign marks a score that is higher at Time 1 than at Time 2)

Time 1   Time 2   Sign
60       40       +
55       45       +
75       100      -
100      50       +
55       70       -
60       95       -
50       95       -
60       65       -
72       85       -
40       55       -
90       70       +
75       45       +
70       75       -
75       65       +
55       50       +
significance tests and significance levels
Significance tests were introduced by R. A. Fisher (1925) as a means of assessing the evidence against a NULL HYPOTHESIS. Often, such a null hypothesis states that there is no association between two variables: e.g. between hypertension and subsequent heart disease. Significance tests are conducted by calculating the P-VALUE, defined as the PROBABILITY, if the null hypothesis were true, that we would have observed an association as large as we did by chance. The term significance level is sometimes used as a synonym for the P-value. If the P-value is small we have evidence against the null hypothesis: the entry on P-values describes their calculation and interpretation in more detail.
Fisher suggested that if the P-value is sufficiently small then the result of the test should be regarded as providing evidence against the null hypothesis. He advocated that a conventional line be drawn at 5% significance (although he rejected fixed rules) and described results of experiments in which the P-value was sufficiently small as statistically significant. Sterne and Davey Smith (2001) have argued that in situations typical of modern medical research, P-values of around 0.05 provide only modest evidence against the null hypothesis.
A different use of the phrase significance level arises from the hypothesis testing approach to the interpretation of experiments advocated by Neyman and Pearson (1933), who showed how to find optimal rules that would minimise the TYPE I and TYPE II ERROR rates over a series of many experiments. We make a Type I error if we reject the null hypothesis when it is in fact true, while we make a Type II error if we accept the null hypothesis when it is, in fact, false (see HYPOTHESIS TESTS). The Type I error rate, usually denoted as α, is closely related to the P-value since if, for example, the Type I error rate is fixed at 5%, then we will reject the null hypothesis when P < 0.05. Therefore, researchers using the Neyman-Pearson approach often report simply that the P-value for their test was less than their chosen significance level.
There is, however, an important distinction between the use of the term significance level to refer to the evidence against the null hypothesis provided by a particular experiment (Fisher's approach) and the choice of a fixed significance level that, together with the Type II error rate, will be used to determine our behaviour with regard to the results. Goodman (1999) discusses the confusion caused by the failure to appreciate this distinction in more detail. JS
Fisher, R. A. 1925: Statistical methods for research workers. Edinburgh: Oliver & Boyd.
Goodman, S. N. 1999: Toward evidence-based medical statistics. 1: The P-value fallacy. Annals of Internal Medicine 130, 995-1001.
Neyman, J. and Pearson, E. 1933: On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society, Series A 231, 289-337.
Sterne, J. A. and Davey Smith, G. 2001: Sifting the evidence - what's wrong with significance tests? British Medical Journal 322, 226-31.
simple random sample
This is the most basic sampling technique, in which a smaller group, a sample, is chosen by chance from a population. Each member of the population has an equal and known probability of being chosen to be in the sample. Each sample of a given size also has an equal probability of being chosen from the population. Sampling is usually done without replacement, so that each member of the population can only be selected for inclusion in the sample once.
To choose a random sample, first a list is needed of every member of the population to be sampled: this is the sampling frame. Each member of this list is then assigned a number from 1 to N (where N is the total size of the population) in any order. Each member of the population then has a probability of 1/N of being selected on any given draw. A random number generator, or table, is then used to select a random number. The member of the population assigned that number is then selected to be included in the sample. This process is repeated until a sample of the required size is obtained.
For example, suppose that a survey of doctors' opinions is to be carried out. There are 500 doctors in a hospital and a 10% sample is to be collected. First, the sampling frame needs to be obtained: a list of all doctors in the hospital. Next, each doctor is assigned a number from 1 to 500, e.g. in alphabetical order or the order on the list. Now look at a random number table, which gives the following numbers, say:

28049 11632 68254 14217 44612 05049
16831 13213 76103 07222 31852 43501

Reading each row in groups of three digits, the sample would include doctors numbered 280, 491, 163, 268, 254, 142, 174, 461, 205, 49, 168, 311, 321, 376, 103, 72, 223, 185, 243, 501. As 501 is outside the range of the numbers assigned it is ignored. Care is needed so as not to ignore leading zeros or else some numbers might be inadvertently overlooked.
The main advantage of this method of sampling is the lack of classification error, as no information needs to be known about items except that they are in the population. It is useful when little is known about the population, other than that it is likely to be homogeneous. The main disadvantage is that it might not be possible to find the sampling frame. In the example given earlier, there might not be a list of all the doctors in the hospital, meaning that a different method would need to be used. For further details see Crawshaw and Chambers (1994). SLV
Crawshaw, J. and Chambers, J. 1994: A concise course in A level statistics, 3rd edition. Cheltenham: Stanley Thornes Publishers Ltd.
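The reading of the random number table in the doctors example can be sketched as follows. The code treats the table as two rows of six five-digit groups, which is an assumption about the printed layout that reproduces the listed selections; the function name and the decision to skip repeated numbers (sampling without replacement) are choices made for this illustration.

```python
def sample_from_digits(rows, population_size, width=3):
    """Read each row of random digits in fixed-width groups and keep the
    numbers that identify a member of the population (1..population_size).

    Repeats are skipped, since sampling is without replacement.
    """
    chosen = []
    for row in rows:
        digits = row.replace(" ", "")
        for i in range(0, len(digits) - width + 1, width):
            number = int(digits[i:i + width])  # int("049") == 49: leading zeros kept
            if 1 <= number <= population_size and number not in chosen:
                chosen.append(number)
    return chosen


if __name__ == "__main__":
    table = [
        "28049 11632 68254 14217 44612 05049",
        "16831 13213 76103 07222 31852 43501",
    ]
    print(sample_from_digits(table, 500))
```

In everyday practice a pseudo-random number generator would replace the printed table, for instance `random.sample(range(1, 501), 50)` from the Python standard library for a 10% sample of 500.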
simple randomisation
See RANDOMISATION
Simpson's paradox
This is the observation that a measure of association between two categorical variables may be identical within the levels of a third categorical
variable, but can take on an entirely different value when the variable is disregarded and the association measure calculated from the pooled data. As an example consider the three-way contingency table shown in the table. Infants born in two clinics during a certain time period were categorised according to survival and amount of pre-natal care received.
have positive or right-hand skew. When the tail of the distribution is on the left-hand side (see part (b) in the first figure) the data have negative or left-hand skew.
[Figure: skewness Example of right-hand and left-hand skew]
Many distributions encountered in analyses of medical data are positively skewed. For example, leptin, a fat-related growth hormone, was measured in umbilical cord blood samples taken from 407 babies born at 37 weeks' gestation or later. The distribution of the results is given in the second figure; the data are positively skewed since relatively few babies have cord leptin levels above 20 ng/ml.

Simpson's paradox Three-way classification of infant survival and amount of pre-natal care in two clinics, taken from Everitt (1992)

                                     Infant survival
Clinic   Amount of pre-natal care    Died   Survived
A        Less                        3      176
A        More                        4      293
B        Less                        17     197
B        More                        2      23
Calculated within clinics, the odds of survival vary little between the two pre-natal care groups (ODDS RATIO (OR) for survival, comparing the higher with the lower amount of care, clinic A: OR = 1.25; clinic B: OR = 0.99) and the corresponding CHI-SQUARED TESTS of independence of survival and amount of care do not reach significance. If, however, the data are collapsed over clinics, the odds ratio becomes OR = 2.82 and is statistically significant according to a chi-squared test, and the conclusion would be that the amount of care and survival are related.
Such a situation occurs when the third variable is associated with both the other variables and, therefore, confounds the association between the variables of interest. Here, relatively more pre-natal care is given in clinic A and the survival percentage is also higher in clinic A than B. Therefore, to some extent the pooled measure of ASSOCIATION between survival and pre-natal care measures both the association with pre-natal care as well as that with clinics. To take account of the levels of a confounding variable, such as a clinic, a pooled within-level measure of association can be constructed (see MANTEL-HAENSZEL METHODS) or a statistical model can be used to adjust the association of interest for the confounder (see LOGISTIC REGRESSION, LOG-LINEAR MODELS). SL
Everitt, B. S. 1992: The analysis of contingency tables, 2nd edition. Boca Raton: Chapman & Hall/CRC.
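The odds ratios quoted in this entry can be checked directly from the cell counts in the table. The following Python sketch computes the odds ratio for survival, comparing more with less pre-natal care, within each clinic and after pooling; the function and variable names are hypothetical and the counts are those read from the table taken from Everitt (1992).

```python
def odds_ratio(died_less, survived_less, died_more, survived_more):
    """Odds of survival with more care divided by odds of survival with less care."""
    return (survived_more / died_more) / (survived_less / died_less)


# (died, survived) by amount of care, within each clinic
clinic_a = {"less": (3, 176), "more": (4, 293)}
clinic_b = {"less": (17, 197), "more": (2, 23)}

# Collapse over clinics by adding the corresponding cells
pooled = {
    care: (clinic_a[care][0] + clinic_b[care][0],
           clinic_a[care][1] + clinic_b[care][1])
    for care in ("less", "more")
}

for label, tab in (("A", clinic_a), ("B", clinic_b), ("pooled", pooled)):
    print(label, round(odds_ratio(*tab["less"], *tab["more"]), 2))
```

The within-clinic values are close to 1 while the pooled value is well above 2, which is the reversal the entry describes.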
skewness
Data are described as skewed if they have an asymmetric distribution. When the tail of the distribution is on the right-hand side (see part (a) in the first figure) the data
Analysis of skewed data can proceed either using the RANKS of the data or using transformed values. Analyses using ranks are known as nonparametric or distribution-free methods, because they make no assumptions about the distribution of the data. When describing skewed data using nonparametric methods the MEDIAN is a suitable MEASURE OF LOCATION.
Alternative analysis techniques are based on transformed values. These use parametric methods, which rest on the assumption that the data have a particular distribution, usually a NORMAL DISTRIBUTION. Although skewed data do not conform to this assumption, it may be possible to apply a mathematical TRANSFORMATION to the data so that they do. When the data are positively skewed it is often found that the logarithmic (log) transformation is appropriate. If the leptin data are logged they have an approximately normal distribution, as shown in the third figure. When describing skewed data in this situation then the GEOMETRIC MEAN is an appropriate parametric measure of location.
[Figure: skewness Logarithmic transformation of leptin readings; horizontal axis: log cord leptin (log ng/ml)]
Mathematical measures of skewness can be used to describe distributions. Data with a symmetric distribution, such as the normal distribution, have a skewness of zero. Positive values for skewness indicate a positively skewed distribution whereas negative values for skewness indicate a negative skew. The skewness of the raw cord leptin measurements is 2.7, whereas that of the log-transformed measurements is 0.2, which is considerably closer to zero. SRC
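The effect of a log transformation on skewness, and the link to the geometric mean, can be illustrated with a small Python sketch. The entry does not say which skewness formula produced the quoted values of 2.7 and 0.2, so the moment-based coefficient used here is an assumption, and the seven data values are invented for illustration rather than taken from the leptin study.

```python
from math import exp, log


def skewness(xs):
    """Moment coefficient of skewness: third central moment over the
    second central moment raised to the power 3/2."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def geometric_mean(xs):
    """Back-transformed mean of the logged values."""
    return exp(sum(log(x) for x in xs) / len(xs))


if __name__ == "__main__":
    raw = [1, 2, 4, 8, 16, 32, 64]      # hypothetical positively skewed data
    logged = [log(x) for x in raw]
    print(skewness(raw), skewness(logged), geometric_mean(raw))
```

For these invented values the logged data are exactly symmetric, so their skewness is zero apart from rounding error, and the geometric mean coincides with the median.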
software See STATISTICAL PACKAGES
spatial epidemiology
This is the analysis of epidemiological or public health data that are geographically referenced. Typically the data arise in two forms: either (a) the residential addresses of cases of disease are known or (b) arbitrary small areas such as census tracts, zip codes or postcodes have counts of disease observed within them. The locational information is used in the analysis, usually to make inferences about spatial health effects. Often hypotheses of interest in spatial EPIDEMIOLOGY focus on whether the residential address of cases of disease yields insight into etiology of the disease or, in a public health application, whether adverse environmental health hazards exist locally within a region (as exemplified by local increases in disease risk). For example, in a study of the relationship between malaria endemicity and diabetes in Sardinia a strong negative relationship has been found. This relation had a spatial expression and the geographical distribution of malaria was important in generating explanatory models for the relation (Bernardinelli et al., 1999). In public health practice, it is of considerable importance to be able to assess whether localised areas that have larger than expected numbers of cases of disease are related to any underlying environmental cause. Here, spatial evidence of
a link between cases and a source is fundamental in the analysis. Evidence such as a decline in risk with distance from the putative source of hazard or elevation of risk in a preferred direction is important in this regard (see, for example, Lawson, 2001, 2006, Chapter 7; Lawson et al., 1999).
There are four main areas where statistical methods have seen development in spatial epidemiology: DISEASE MAPPING, DISEASE CLUSTERING, ecological analysis and disease map surveillance. Before looking in detail at each of these areas, it is appropriate to consider some common themes or issues that arise in all areas of the subject.
A fundamental feature of data available for analysis in spatial epidemiology is that it is usually discrete (either in the form of a point process or counting process), and the cases of concern arise from within a local human population that varies in spatial density and in susceptibility to the disease of interest. Hence any model or test procedure must make allowance for this background (nuisance) population effect. The background population effect can be allowed for in a variety of ways. For count data it is common to obtain expected rates for the disease of interest based on the age-sex structure of the local population, and some crude estimates of local relative risk are often computed from the ratio of observed to expected counts (e.g. STANDARDISED mortality/incidence RATIOS, or SMRs). For case event data, expected rates are not available at the resolution of the case locations and the use of the spatial distribution of a control disease has been advocated. In that case the spatial variation in the case disease is compared to the spatial variation in the control disease. A major issue in this approach is the correct choice of control disease. It is important to choose a control that is matched to the age-sex structure of the case disease but is unaffected by the feature of interest. For example,
[Figure: spatial epidemiology Distribution of cases of childhood lymphoma and leukaemia in Humberside, UK, 1974-1986]
in the analysis of cases around a putative health hazard, a control disease should not be affected by the health hazard. Counts of control disease cases could also be used instead of expected rates when analysing count data. The first and second figures display case event and control data maps for a region of the UK for a fixed time period. The third figure displays a typical count data example.
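The crude relative risk estimate for count data mentioned above, the ratio of observed to expected counts, can be sketched as follows. The expected count is obtained here by applying reference rates to the local population in each age-sex stratum (indirect standardisation); the function names, stratum labels, populations and rates are all hypothetical values chosen only to show the arithmetic.

```python
def expected_count(local_population, reference_rates):
    """Expected number of cases if the reference rates applied locally.

    Both arguments map a stratum label (e.g. an age-sex group) to a value.
    """
    return sum(local_population[s] * reference_rates[s] for s in local_population)


def smr(observed, expected):
    """Standardised mortality/incidence ratio: observed over expected count."""
    return observed / expected


if __name__ == "__main__":
    population = {"0-4": 1000, "5-14": 2000}   # hypothetical local population
    rates = {"0-4": 0.002, "5-14": 0.001}      # hypothetical reference rates
    e = expected_count(population, rates)
    print(e, smr(6, e))
```

A ratio above 1 indicates more cases than the reference rates would predict for a population of that structure, although with small expected counts such estimates are very unstable.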
[Figure: spatial epidemiology Control distribution: distribution of a sample of live births from the birth register in Humberside, UK, 1974-1986]
For case event data, location is often the residential addresses of cases and the cases arise from a heterogeneous population that varies both in spatial density and in susceptibility to disease. A heterogeneous Poisson process model is often assumed as a starting point for further analysis. The focus of interest for making inference concerns parameters describing excess risk in a relative risk function, which is included in the first-order intensity of the Poisson process.
It is possible that population or environmental heterogeneity may be unobserved in the data set. This could be because either the population background hazard is not directly available or the disease displays a tendency to cluster (perhaps due to unmeasured covariates). The heterogeneity could be spatially correlated, or it could lack CORRELATION, in which case it could be regarded as a type of OVERDISPERSION. One can include such unobserved heterogeneity within the framework of conventional models as a RANDOM EFFECT.
A considerable literature has developed concerning the analysis of count data in spatial epidemiology (e.g. see reviews in Elliott et al., 2000; Lawson, 2001, 2006; Lawson and Williams, 2001; Lawson et al., 2003). The usual model adopted for the analysis of region counts assumes that the counts are independent Poisson random variables with parameter λ_i in the ith region. This model may be extended to include unobserved heterogeneity between regions by introducing a prior distribution for the log relative risks (log λ_i). Incorporation of such heterogeneity has become a common approach and the Besag, York and Mollié (BYM) model is now a standard model. A full Bayesian analysis using this model (see BAYESIAN METHODS) is available on WinBUGS and many extensions of this model have been proposed (for various examples see Lawson, 2009). AL
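For orientation, the count model with unobserved heterogeneity is commonly written along the following lines. This is a sketch of the usual textbook form of the BYM model rather than a formula taken from this entry: the symbols y_i (observed count), e_i (expected count), θ_i (relative risk), α (overall level), v_i (uncorrelated heterogeneity), u_i (spatially correlated heterogeneity), n_i (number of neighbours of region i) and the variance parameters are notation assumed here, with the Poisson parameter of the entry corresponding to λ_i = e_i θ_i.

```latex
\begin{align*}
  y_i &\sim \mathrm{Poisson}(e_i \theta_i), \\
  \log \theta_i &= \alpha + u_i + v_i, \\
  v_i &\sim \mathrm{N}(0, \sigma_v^2), \\
  u_i \mid u_{j},\, j \neq i &\sim \mathrm{N}\!\left( \frac{1}{n_i} \sum_{j \sim i} u_j,\; \frac{\sigma_u^2}{n_i} \right),
\end{align*}
```

where j ~ i denotes the regions that are neighbours of region i. The v_i term captures overdispersion without spatial structure, while the u_i term lets neighbouring regions share similar risks.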
[Figure: spatial epidemiology Distribution of counts of sudden infant death (SID) within the counties of North Carolina, USA, 1974-1978]
Bernardinelli, L., Pascutto, C. et al. 1999: Ecological regression with errors in covariates: an application. In Lawson, A. B., Biggeri, A., Boehning, D., Lesaffre, E. et al. (eds), Disease mapping and risk assessment for public health. New York: John Wiley & Sons, Inc., p. 329.
Elliott, P., Wakefield, J. et al. (eds) 2000: Spatial epidemiology: methods and applications. London: Oxford University Press.
Lawson, A. B. 2001: Statistical methods in spatial epidemiology. New York: John Wiley & Sons, Inc.
Lawson, A. B. 2006: Statistical methods in spatial epidemiology, 2nd edition. New York: John Wiley & Sons, Inc.
Lawson, A. B. 2009: Bayesian disease mapping: hierarchical modeling in spatial epidemiology. New York: CRC Press.
Lawson, A. B. and Williams, F. L. R. 2001: An introductory guide to disease mapping. New York: John Wiley & Sons, Inc.
Lawson, A., Biggeri, A. et al. 1999: A review of modelling approaches in health risk assessment around putative sources. In Lawson, A. B., Biggeri, A., Boehning, D., Lesaffre, E. et al. (eds), Disease mapping and risk assessment for public health. New York: John Wiley & Sons, Inc., pp. 231-45.
Lawson, A. B., Browne, W. et al. 2003: Disease mapping with WinBUGS and MLwiN. New York: John Wiley & Sons, Inc.
spatio-temporal disease surveillance
This is the detection of aberrations in health or disease, usually as they arise. This definition stresses both the unusual nature of the disease event and also the importance of temporal change in surveillance. How 'unusual' an event or sequence of events is becomes, of course, an issue in the design of any surveillance system. The Centers for Disease Control (CDC) define surveillance as: '... the ongoing, systematic collection, analysis, and interpretation of health data essential to the planning, implementation, and evaluation of public health practice, closely integrated with the timely dissemination of these data to those who need to know. The final link of the surveillance chain is the application of these data to prevention and control. A surveillance system includes a functional capacity for data collection, analysis, and dissemination linked to public health programs' (Thacker, 1994).
This definition stresses the collection, analysis and dissemination of data in a timely manner, and hence it is very broad and stresses the focus on public health needs. However, it is possible to distinguish two basic types of surveillance that play different roles in public health activities. First, retrospective surveillance concerns the collection of historical data on disease occurrence and its examination. The purpose of such analysis may be to inform decision makers as to temporal or spatial trends and other features of disease behaviour. This form of surveillance is closely associated with classical epidemiological analysis (see EPIDEMIOLOGY), and differs mainly in its focus on public health needs. Second, prospective surveillance is the online or active examination of disease data to discover changes in disease at the time or close to the time of occurrence. In this case, monitoring of disease occurrence is done 'as data arrive' so that decisions can be made concerning outbreaks of disease.
The importance of this form of surveillance has been heightened with the recent rise in terrorism and the potential threat to health from bioterrorism. The release of toxic or highly infectious agents into a population would be of grave concern in this context and so it is now important that fast and accurate surveillance of disease be undertaken to detect changes as early as possible.
The design of disease surveillance systems must consider some of the following issues:
1. Early detection. It is important to detect changes to disease incidence as early as possible. For example, for certain infectious diseases it may take up to 7 days to receive laboratory confirmation of cases. However, such a delay may be unacceptable if a serious event had occurred. Hence ways to speed up detection could be important.
2. Syndromic methods. The use of ancillary information is required in order to speed up detection of population health aberrations. A formal definition is given by Sosin (2003): '... the public health term Syndromic Surveillance has been applied to systematic and ongoing collection, analysis and interpretation of data that precede diagnosis (e.g. laboratory test requests, emergency department chief complaint, ambulance response logs, prescription drug purchases, school or work absenteeism, as well as signs and symptoms recorded during acute care visits) and that can signal a sufficient probability of an outbreak to warrant public health investigation.' Often covariate information could be useful in helping to establish the character of an outbreak. The first figure displays the time series of pharmaceutical sales and gastrointestinal disease reporting for a Canadian example (for further examples see Lawson and Kleinman, 2005).
3. Sensitivity and specificity of detection methods. The calibration of a surveillance system is very important in that false aberration alarms (false positives) could lead to unnecessary public alarm, while false aberration negatives could lead to health disasters. Farrington and Andrews (2004) and Le Strat (2005) provide more detail on these issues.
4. Which disease and what to look for? In retrospective surveillance the disease is usually known and the features of interest are also known (e.g. trends). In prospective surveillance, particularly where bioterrorism may play a role, there could be little a priori knowledge of the disease and the aberration to look for. Hence, detection systems must have the capability to deal with multiple diseases and possibly multiple forms of aberration. This is known as multivariate-multifocus surveillance. Because of the potentially huge database searching problem that results from this, DATA MINING approaches have been adopted (see, for example, Wong et al., 2002, and Madigan, 2005).
[Figure: spatio-temporal disease surveillance The Battlefords, Saskatchewan: epidemic curve and time series of over-the-counter (OTC) sales of antinauseant and antidiarrheal products, January to May 2001 (from Edge et al., 2004, Canadian Journal of Public Health)]
1. Temporal. In temporal surveillance a time series of events is available (either as a time series of counts of disease in intervals or as a point process of reporting or notification times). With time series of discrete counts, the changes that might be of interest (aberrations or disease clusters) are highlighted in the second figure, which shows that there are three main changes (A, B, C). The first aberration (A) is a sharp rise in risk (jump) or changepoint in level. The second aberration (B) that might be of interest is a cluster of risk (an increase and then a decrease in risk). This can only be detected retrospectively, of course. The third aberration (C) is an overall process change where the level of the process is changed but also the variability is increased or decreased. When a point process of event times is observed, decisions need to be made whenever a new event arrives. Aberrations that are common in point processes can take the form of unusual aggregations of points (temporal clusters), sharp changes in the rate of occurrence (jumps), and overall change (as above).
[Figure: spatio-temporal disease surveillance Schematic features of aberrations found in temporal surveillance]
2. Spatial and spatio-temporal. If a spatial domain (disease map) is to be monitored then spatial and spatio-temporal aberrations must be considered. Spatial aberrations could consist of discontinuities in risk between regions (jumps in
risk across region boundaries), spatial clusters of risk (localised aggregations of cases of disease) and a long-range trend in risk. When disease maps are monitored in time then changes in the above features might be of interest. The development of new clusters of (say) infectious disease in time might signal localised outbreaks, for example.
Multivariate issues
In health surveillance systems designed for prospective surveillance, many of the above features would have to be detectable across a wide range of diseases. Not only would whole disease streams be monitored but subsets of disease streams may be required to be examined. For example, for early detection of effects it may be important to monitor finer subsets (such as old or young age groups). Also, it may be that CORRELATIONS between subsets may be important when making detection decisions. In addition, if disease maps are to be monitored over time as well as time series, then the problem grows considerably. The BioSense system developed by CDC (www.cdc.gov/phin/componentinitiatives/biosense/index.html) has a multivariate and mixed stream (spatial and temporal) capacity.
Models
One of the current concerns about large-scale surveillance systems is the ability to model correctly the variation in disease and to calibrate properly SENSITIVITY and specificity with such a multivariate and multifocus task. The application of Bayesian hierarchical modelling has been advocated for surveillance purposes and this may prove to be useful in its ability to deal with large-scale systems evolving in time in a natural way (Zhou and Lawson, 2008). For example, recursive Bayesian learning could be important. Clearly computational issues could be paramount here, given the potentially multivariate nature of the problem and the need to optimise computation. Sequential Monte Carlo methods have been proposed to deal with computational speedups (Kong, Liu and Wong, 1994; Doucet, De Freitas and Gordon, 2001; Doucet, Godsill and Andrieu, 2005; Vidal Rodeiro and Lawson, 2006) as have novel algorithmic speedups (Neill and Moore, 2005). AL
[See also TIME SERIES IN MEDICINE]

Doucet, A., De Freitas, N. and Gordon, N. 2001: Sequential Monte Carlo methods in practice. New York: Springer.
Doucet, A., Godsill, S. and Andrieu, C. 2005: On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing 10, 197-201.
Farrington, P. and Andrews, N. 2004: Outbreak detection: application to infectious disease surveillance. In Brookmeyer, R. and Stroup, D. (eds), Monitoring the health of populations: statistical principles and methods for public health surveillance. New York: Oxford University Press.
Kong, A., Liu, J. and Wong, W. 1994: Sequential imputations and Bayesian missing data problems. Journal of the American Statistical Association 89, 278.
Lawson, A. B. and Kleinman, K. 2005: Spatial and syndromic surveillance for public health. New York: John Wiley & Sons, Inc.
Le Strat, Y. 2005: Overview of temporal surveillance. In Lawson, A. B. and Kleinman, K. (eds), Spatial and syndromic surveillance for public health. New York: John Wiley & Sons, Inc.
Madigan, D. 2005: Bayesian data mining for health surveillance. In Lawson, A. B. and Kleinman, K. (eds), Spatial and syndromic surveillance for public health. New York: John Wiley & Sons, Inc.
Neill, D. and Moore, A. W. 2005: Efficient scan statistic computations. In Lawson, A. B. and Kleinman, K. (eds), Spatial and syndromic surveillance for public health. New York: John Wiley & Sons, Inc.
Sosin, D. 2003: Draft framework for evaluating syndromic surveillance systems. Journal of Urban Health 80, i8-i13.
Thacker, S. 1994: Historical development. In Teutsch, S. and Churchill, R. (eds), Principles and practice of public health surveillance. New York: Oxford University Press.
Vidal Rodeiro, C. and Lawson, A. B. 2006: Online updating of space-time disease surveillance models via particle filters. Statistical Methods in Medical Research 15, 1-22.
Wong, W., Moore, A., Cooper, G. and Wagner, M. 2002: Rule-based anomaly pattern detection for detecting disease outbreaks. In National Conference on Artificial Intelligence. Cambridge, MA: MIT Press.
Zhou, H. and Lawson, A. B. 2008: EWMA smoothing and Bayesian spatial modelling for health surveillance. Statistics in Medicine 27, 5907-28.
Spearman's rank correlation coefficient See CORRELATION
Spearman's rho (ρ)
Also known as Spearman's rank correlation coefficient, this is a measure of the relationship between two variables that uses only the rankings of the observations. If the ranked values of the two variables for a set of $n$ individuals are $a_i$ and $b_i$, with $d_i = a_i - b_i$, then the coefficient is defined explicitly as:

\[
\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n^3 - n}
\]
In essence, ρ is simply Pearson's product moment correlation coefficient (see CORRELATION) between the rankings a and b. We can illustrate the coefficient on the data shown in the table, which were collected to investigate the relationship between MEAN annual temperature and the mortality rate for a type of breast cancer in women. The data relate to certain regions of Great Britain, Norway and Sweden (see Lea, 1965). Here, the Spearman correlation is 0.90 and Pearson's product moment correlation is 0.87. In general, the Spearman coefficient is more robust against the presence of outliers. SSE
Spearman's rho (ρ) Breast cancer mortality and temperature

Temperature   Mortality
51.3          102.5
49.9          104.5
50.0          100.4
49.2           95.9
48.5           87.0
47.8           95.0
47.3           88.6
45.1           89.2
46.3           78.9
42.1           84.6
44.2           81.7
43.5           72.2
42.3           65.1
40.2           68.1
31.8           67.3
34.0           52.5
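As a rough check of the definition, the following Python sketch ranks the two variables and applies the explicit formula based on the squared rank differences. The data values are those of the breast cancer mortality and temperature table as read here (a few digits were restored from a damaged scan, which is an assumption), and the helper names `ranks` and `spearman_rho` are illustrative only. The simple ranking helper assumes there are no tied values, which holds for these data.

```python
def ranks(values):
    """Return ranks 1..n (1 = smallest). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho from the explicit formula 1 - 6*sum(d^2)/(n^3 - n)."""
    a, b = ranks(x), ranks(y)
    n = len(x)
    sum_d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Mean annual temperature and breast cancer mortality for 16 regions
# (values as read from the table; treat them as an assumption).
temperature = [51.3, 49.9, 50.0, 49.2, 48.5, 47.8, 47.3, 45.1,
               46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34.0]
mortality = [102.5, 104.5, 100.4, 95.9, 87.0, 95.0, 88.6, 89.2,
             78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5]

rho = spearman_rho(temperature, mortality)
print(round(rho, 2))  # 0.9, in line with the value of 0.90 quoted in the entry
```

With tied observations, average ranks would be needed and the explicit formula would no longer agree exactly with Pearson's coefficient computed on the ranks.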
[See also CORRELATION, NONPARAMETRIC METHODS - AN OVERVIEW]

Lea, A. J. 1965: New observations on distribution of neoplasms of female breast in certain European countries. British Medical Journal 1, 488-90.
specificity This is a measure of how well an alternative test performs when it is compared with the reference or 'gold' standard test for diagnosis of a condition. Specificity is the proportion of patients correctly identified as free from the condition by the diagnostic test out of all patients who do not have the condition. Specificity may also be expressed as a percentage and is the counterpart to SENSITIVITY. The reference standard may be the best available diagnostic test or may be a combination of diagnostic methods, including following up patients until all patients with the disease have presented with clinical symptoms. It follows that the best design when a diagnostic test is evaluated against a reference standard is a COHORT STUDY. Investigators should consider whether verification BIAS is present: this occurs when obtaining a negative result on one diagnostic test influences the chances of a patient going on to have further tests, so that some patients with the condition never receive the correct diagnosis. When the data are set out as in the table:
\[
\text{Specificity} = \frac{d}{b + d}
\]
Specificity should be presented with CONFIDENCE INTERVALS, typically set at 95%, calculated using an appropriate method such as that of Wilson (described in Altman et al., 2000) that will not produce impossible values, i.e. that will not give values for the upper confidence limit > 1 when specificity approaches 1 and the sample size is small. When a test result is a continuous measurement, for example HDL cholesterol, a cut-off point for abnormal values is chosen. If a higher value is chosen, then specificity will be relatively high, but sensitivity relatively low. The impact of all possible cut-off points can be displayed graphically in a RECEIVER OPERATING CHARACTERISTIC (ROC) curve. The choice of cut-off point is not, however, solely a statistical decision, as the balance between the TRUE POSITIVE RATE and the FALSE NEGATIVE RATE should be related to the clinical context and the consequences of wrong diagnosis both for the patient and the healthcare system. A sample size calculation for specificity can be made by stipulating a confidence interval (e.g. 95%) and an acceptable width for the lower bound of the confidence interval. Where the anticipated specificity is high and the sample size is small, a 'small sample' method should be used: a sample size table is included in Machin et al. (1997). etc (See also NEGATIVE PREDICTIVE VALUE, POSITIVE PREDICTIVE
specificity General table of test results among a + b + c + d individuals sampled

                 Reference standard
Test result      Positive   Negative   Total
Positive         a          b          a + b
Negative         c          d          c + d
Total            a + c      b + d      a + b + c + d
VALUE, TRUE POSITIVE RATE)
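The point about interval methods that cannot produce impossible values can be illustrated with a short Python sketch. It computes specificity as d/(b + d) and compares a Wilson score interval with the simple normal-approximation (Wald) interval. The counts b = 1 and d = 29 are hypothetical, chosen only to show a high specificity in a small sample, and the function names are illustrative rather than taken from any package.

```python
import math


def specificity(b, d):
    """Proportion of condition-free patients with a negative test: d / (b + d)."""
    return d / (b + d)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts among patients WITHOUT the condition:
b, d = 1, 29          # b = test positive (false positives), d = test negative
n = b + d

spec = specificity(b, d)
low, high = wilson_interval(d, n)

# Simple (Wald) upper limit, shown only for contrast with the Wilson limit.
wald_upper = spec + 1.96 * math.sqrt(spec * (1 - spec) / n)

print(f"specificity = {spec:.3f}")
print(f"Wilson 95% interval = ({low:.3f}, {high:.3f})")
print(f"Wald upper limit = {wald_upper:.3f}")  # exceeds 1 for these counts
```

For these counts the Wilson limits stay inside the range 0 to 1, whereas the Wald upper limit does not, which is the behaviour the entry warns about.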
Altman, D. G., Machin, D., Bryant, T. N. and Gardner, M. J. 2000: Statistics with confidence, 2nd edition. London: BMJ Books.
Machin, D., Campbell, M., Fayers, P. and Pinol, A. 1997: Sample size tables for clinical studies, 2nd edition. Oxford: Blackwell Science Ltd.
spending function See SEQUENTIAL ANALYSIS

spline function See SCATTERPLOT SMOOTHERS

S-PLUS See STATISTICAL PACKAGES

SPSS See STATISTICAL PACKAGES

stable population See DEMOGRAPHY

stacked bar chart See BAR CHART
standard deviation
This is a measure of spread intended to give an indication of the spread of a series of values ($x_1, x_2, \ldots, x_n$) about their MEAN ($\bar{x}$). Taking the average of the differences from the mean may initially seem a good measure of their spread, but in fact this is always zero. Therefore, the standard deviation is based on the average of the squared differences from the mean, since these are all positive. Taking the square root of this result gives a measure that is in the same units as the original values. Thus, the standard deviation ($s$) is calculated using the following formula. Here $n$ is the number of observations, $i$ takes values from 1 to $n$ and the $\sum$ notation denotes the sum, i.e. $(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2$:

\[
s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}
\]

Note that the formula involves division by $n - 1$, rather than $n$, when taking the average of the squared differences. This gives a result that is a better estimate of the standard deviation in the whole population, which is being estimated from the sample available. The standard deviation can be denoted SD, sd, s or σ, although the last technically refers to the standard deviation of a population, rather than a sample. To calculate the standard deviation by hand there is a more convenient and mathematically equivalent formula:

\[
s = \sqrt{\frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n - 1}}
\]

As an example, the bone mineral content (BMC) of 10 babies was measured using dual energy X-ray absorptiometry (DXA). The measurements in grams are: 46.6, 46.9, 49.2, 49.8, 53.2, 61.1, 68.1, 73.1, 77.1 and 78.6. It is simple to calculate that the sum of the observations $\sum x_i = 603.7$ and the sum of the squares of the observations $\sum x_i^2 = 37938.89$. Thus,

\[
s = \sqrt{\frac{37938.89 - (603.7)^2 / 10}{9}} = 12.88\ \text{g}
\]

The VARIANCE of a set of measurements is the square of their standard deviation. Although the variance has many uses, the standard deviation is a more meaningful descriptive statistic because it is in the same units as the raw data. Whereas square millimetres, mm², may have an obvious interpretation, square millimetres of mercury, mmHg², does not. Altman (1991) suggests that standard deviations may be quoted with one or two more decimal places than the original values.

The standard deviation is typically used as a measure of spread alongside the mean and is most appropriate when the data are approximately symmetrically distributed. It has the useful property that when the data follow a NORMAL DISTRIBUTION then approximately 95% of the observations will be within two standard deviations of the mean. The figure shows the case of a standard normal distribution, which has a mean of 0 and a standard deviation of 1. SRC

[Figure: standard deviation. Standard normal distribution, with mean of 0 and SD of 1. Horizontal axis: standard deviations; the central region within two standard deviations of the mean is marked 95%]

Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.
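The hand calculation in the standard deviation entry can be reproduced with a few lines of Python. This is only a sketch: it applies the definitional formula (squared differences from the mean divided by n - 1) and the equivalent 'by hand' formula to the ten bone mineral content measurements quoted in the entry, and cross-checks both against `statistics.stdev` from the standard library. The variable names are illustrative.

```python
import math
import statistics

# Bone mineral content (grams) of the 10 babies quoted in the entry.
bmc = [46.6, 46.9, 49.2, 49.8, 53.2, 61.1, 68.1, 73.1, 77.1, 78.6]
n = len(bmc)

# Definitional formula: average squared difference from the mean, dividing by n - 1.
mean = sum(bmc) / n
s_definition = math.sqrt(sum((x - mean) ** 2 for x in bmc) / (n - 1))

# 'By hand' formula using the sum and the sum of squares.
sum_x = sum(bmc)
sum_x2 = sum(x * x for x in bmc)
s_by_hand = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(round(sum_x, 1), round(sum_x2, 2))          # 603.7 37938.89
print(round(s_definition, 2), round(s_by_hand, 2))  # 12.88 12.88
print(round(statistics.stdev(bmc), 2))              # 12.88
```

The 'by hand' formula subtracts two large, nearly equal numbers, so in floating-point work on large datasets the definitional form (or a library routine) is usually the safer choice.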
standard error This is the STANDARD DEVIATION of the SAMPLING DISTRIBUTION of a statistic. For example, the standard error of the sample MEAN of $n$ observations is $\sigma/\sqrt{n}$, where $\sigma^2$ is the VARIANCE of the original observations. A useful aide-memoire to distinguish when to use standard deviation (SD) and when to use standard error (SE) is to recall: 'SD for description, SE for estimation.' In particular, when describing patient characteristics in a sample, as in a research paper's typical 'Table 1', means and SDs should be reported, whereas when seeking to learn from the sample and apply results to the relevant population, i.e. performing statistical inference either by HYPOTHESIS TESTS or estimation by CONFIDENCE INTERVALS, then the standard error is used. For any sample of more than one observation the SE is necessarily smaller than the SD, and it is wrong to use SE as a MEASURE OF SPREAD when describing samples. More generally, standard errors can be attached to any sample-based quantity, not just the mean of a single sample of continuously distributed data, as just discussed. The general form of a large-sample 95% confidence interval for a population parameter (numerical characteristic) is the sample-based point estimate ± 1.96 (standard errors), where 1.96 arises from the standard NORMAL DISTRIBUTION and the standard error is that of the point estimate, itself the best sample-based guess for the value of the parameter. For two-sample inference, this is usually a quantity such as the difference in population means, for continuous data, or the difference in population proportions, for categorical data. SSE
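The distinction between SD and SE, and the general form 'point estimate ± 1.96 standard errors', can be sketched in Python as follows. The data are the ten bone mineral content measurements quoted in the standard deviation entry, reused here purely for illustration. With only ten observations a multiplier from the t distribution would normally be preferred to 1.96, so the interval below should be read as a demonstration of the large-sample formula and not as a recommended analysis.

```python
import math
import statistics

# Illustrative sample (grams); reused from the standard deviation entry.
values = [46.6, 46.9, 49.2, 49.8, 53.2, 61.1, 68.1, 73.1, 77.1, 78.6]
n = len(values)

mean = statistics.mean(values)
sd = statistics.stdev(values)   # describes the spread of the observations
se = sd / math.sqrt(n)          # describes the precision of the sample mean

# Large-sample 95% confidence interval: point estimate +/- 1.96 standard errors.
ci_low = mean - 1.96 * se
ci_high = mean + 1.96 * se

print(f"mean = {mean:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
print(f"approximate 95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The SD would be reported when describing the sample, whereas the SE and the interval relate to inference about the population mean.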
standard population See DEMOGRAPHY

standardised mortality ratio (SMR) See DEMOGRAPHY

STATA See STATISTICAL PACKAGES

statistical consulting See CONSULTING A STATISTICIAN
statistical methods in molecular biology
Molecular biology is the branch of biology that studies the structure and function of the biological macromolecules of a cell, and especially their genetic role. Three types of macromolecules are the main subjects of interest: deoxyribonucleic acids (DNA), ribonucleic acids (RNA) and proteins. Genetic information is encoded in the DNA and passed from parents to children. When expressed, a gene, the basic unit of inheritance, is first transcribed to messenger RNA, which then carries the information to a cellular machinery (the ribosome) for protein production. This basic principle of the information flow in biology is often referred to as the 'central dogma', put forward by Francis Crick in 1958. A central goal of molecular biology is to decipher the genetic information and understand the regulation of protein synthesis and interaction in cellular processes.

The rapid advance of biotechnology in the past few decades has facilitated manipulation of these important biopolymers and allowed scientists to clone, sequence and amplify DNA. As a result, a large amount of biological sequence and structural information has been generated and deposited into publicly accessible databases. The phenomenal growth of biological data is underpinned by the development of high-throughput DNA sequencing and microarray technologies and by the recent progress of giant research projects such as the human genome project, which produced the sequence of the human genome. The word 'genome' refers to the entire collection of genetic material of an organism. These advances result in many complex and massive datasets, sometimes decoupled from the specific biological questions under investigation.

The need to extract scientific insights from these rich data by computational and analytic means has spawned the new field of bioinformatics and computational molecular biology, which deals with the storage, retrieval and analysis of biological data. These can consist of information stored in the genetic code, but also experimental results from various sources, patient statistics and scientific literature. Bioinformatics is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, physics, chemistry, biochemistry and linguistics. Nowadays, various biological databases and practical applications of bioinformatics are readily available through the internet and are widely used in biological and medical research.
A wide spectrum of statistical methods has been successfully applied in bioinformatics, ranging from basic summary statistics and exploratory data analysis tools to sophisticated hidden Markov models and Bayesian resampling methods (see BAYESIAN METHODS, MARKOV CHAIN MONTE CARLO). Analyses in bioinformatics focus on three types of datasets: genome sequences, macromolecule structures and large-scale functional genomics experiments. Various other data types are also involved, such as taxonomy trees, sequence polymorphisms, relationship data from metabolic pathways, patient statistics, text from scientific literature and so on. DNA sequences are the primary data from the sequencing projects and they only become really valuable through multiple layers of annotation and organisation. Several areas of bioinformatics analysis are relevant when dealing with DNA and protein sequences: sequence assembly, to establish the correct order of sequence contigs for a contiguous sequence; prediction of functional units, to identify subsets of sequences that code for various functional signals such as protein-coding genes, promoters, splice sites and regulatory elements; and sequence comparison and database search, to retrieve data efficiently from organised databases. Most of these analyses involve sequence alignment, one of the classic problems in the early development of bioinformatics. Sequence alignment is the basic tool that allows us to determine the similarity of two or more sequences and infer components that might be conserved through evolution and natural selection. To align two protein sequences, similarity scores are assigned to all possible pairs of residues and the sequences are aligned to each other so as to maximise the sum total of scores in the sequence pairings induced by the alignment.
Dynamic programming-based algorithms were developed to overcome the large search space for the solution of optimal global and local alignment problems (Needleman and Wunsch, 1970; Smith and Waterman, 1981). Dynamic programming is a general algorithmic technique that solves an optimisation problem by recursively using 'divide and conquer' for its subproblems. Faster heuristic word-based alignment algorithms were later introduced for large database similarity searches (BLAST by Altschul et al., 1990; FASTA by Pearson and Lipman, 1988). These algorithms build alignments by extending or joining common short patterns ('words'); they are computationally efficient, but often yield suboptimal solutions. The interpretation of alignment scores and database search results was aided by statistical significance derived from simulations and the PROBABILITY theory of extreme value distributions under the framework of standard statistical hypothesis testing (Karlin and Altschul, 1990). These classic results have become indispensable tools for biomedical researchers and computational biologists to analyse molecular sequence data.
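The dynamic programming idea behind optimal global alignment can be sketched in a few lines of Python. This is a minimal illustration in the spirit of Needleman and Wunsch (1970), not a reproduction of their original formulation: it returns only the optimal score, and it assumes a deliberately simple scoring scheme (match = +1, mismatch = -1, gap = -1) in place of the residue similarity scores that would be used for real protein sequences. The function name and parameters are hypothetical.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings s and t by dynamic programming.

    score[i][j] holds the best score for aligning the first i characters of s
    with the first j characters of t.
    """
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is built from three smaller subproblems.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align s[i-1] with t[j-1]
                score[i - 1][j] + gap,       # gap in t
                score[i][j - 1] + gap,       # gap in s
            )
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7: seven matches
print(global_alignment_score("AC", "AGC"))           # 1: two matches, one gap
```

The table has (len(s) + 1) × (len(t) + 1) cells, so time and memory grow with the product of the sequence lengths, which is why heuristic word-based methods are preferred for large database searches.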
Statistical models are also routinely used to construct probabilistic profiles to characterise the regularity of biological signals based on collections of prealigned sequences and to increase the SENSITIVITY of searches. For example, a block-based product multinomial model can be used to describe the position-specific base distributions of the 5' splice site (exon-intron junction) signal in humans (see the figure), which gives a richer representation of the sequence motif than the consensus CAG|GTGAG ('|' indicates the exon-intron junction). A position-specific scoring matrix can be derived subsequently using logarithms of the ODDS RATIO of signal to background base frequencies, to evaluate matches of new query sequences to the sequence motif and to quantify the information content of the signal sequence pattern. The information content of a signal is defined as the average score of random sequence matches, measured in 'bits', using the log (base two) odds ratio scores that represent the number of 0-1s necessary to code for this signal in a binary coding system. For instance, the human 5' splice site depicted in the figure contains 8 bits of information, meaning that 'decoy' splice sites will be observed roughly every 2^8 = 256 bases in random sequence. Note that the information content can also be formulated as the relative entropy (or Kullback-Leibler distance) of the signal to background nucleotide frequency distributions in the context of information theory. More sophisticated models and scoring matrices are also available to capture dependencies among neighbouring positions using Markov models and others.

Another area of biological sequence analysis that relies heavily on statistical reasoning is gene finding or, more generally, predicting complex features from a sequence. The goal of protein-coding gene finding is to locate gene features such as exons and introns in a DNA genomic sequence, which is the essential first-pass annotation of the genome project products. In addition to inferring homologous (evolutionarily related) gene structures from database similarity searches, statistical ab initio gene-finding programmes have been developed to integrate all known features and 'grammars' of protein-coding genes in a probabilistic model. Hidden Markov models (HMMs) are at the heart of the most popular gene finders (Genscan by Burge and Karlin, 1997, and reviewed in Durbin et al., 1998). HMMs were originally developed in the early 1970s by electrical engineers for the problem of speech recognition: to identify what sequence of phonemes (or words) was spoken from a long sequence of category labels representing the speech signal. The resemblance of the gene-finding problem to speech recognition and the way HMMs are formulated make them especially suited in this context. In addition, HMMs are theoretically well-founded models, combining probabilistic modelling and formal language theory that guarantees 'sensible' predictions that obey specified grammatical rules even though they might not be the correct genes. There are also well-documented and computationally efficient methods for parameter estimation (e.g. expectation-maximisation) and optimisation (Viterbi algorithm).

A Markov chain is a series of random events occurring with probabilities conditionally dependent on the state of the preceding event(s). A hidden Markov model is a Markov chain in which each state generates an observation according to some rule (usually stochastic). The objective is to infer the hidden state sequence that has the highest posterior probability given the observed event sequence and the model. For example, the hidden states may represent words or phonemes and the observations are the acoustic signal.

statistical methods in molecular biology The human 5' splice site (exon-intron junction signal)

Aligned example sequences (positions -3 to +8):
AAGGTGCTGTG CAGGTGAGTGG AATGTACGTGT CAGGTGAGCGG CAGGTATGCGG AAGGTAAAGTT
CAGGTGAGCCC GCGGTAAGAGG GGGGTGAGTCA GAGGTGTGTGC CAGGTAATCAA ACGGTAAGCCC
GTGGTGAGCGG AAGGTGGGTGC GAGGTGAGAGG AAGGTGAGGGC CAGGTAAGGCA CAGGTGAGCCT

[Sequence logo not reproduced]

Position   -3    -2    -1    +1    +2    +3    +4    +5    +6    +7    +8
A          0.34  0.65  0.10  0.00  0.00  0.61  0.70  0.09  0.18  0.29  0.22
C          0.36  0.10  0.03  0.00  0.01  0.03  0.07  0.06  0.15  0.19  0.25
G          0.18  0.11  0.81  1.00  0.00  0.34  0.11  0.78  0.19  0.30  0.24
T          0.11  0.14  0.07  0.00  0.99  0.03  0.12  0.08  0.49  0.22  0.29
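The statement that the human 5' splice site carries about 8 bits of information can be checked approximately from the position-specific base frequencies in the figure. The Python sketch below computes the relative entropy (Kullback-Leibler distance) of each position against a background distribution and sums over positions. Two assumptions are made here that the entry does not state: the background is taken to be uniform (0.25 for each base), and the frequencies are those read from the figure, with a few digits restored from a damaged scan. The names `frequencies` and `information_per_position` are illustrative.

```python
import math

# Position-specific base frequencies for positions -3 ... +8 (from the figure).
frequencies = {
    "A": [0.34, 0.65, 0.10, 0.00, 0.00, 0.61, 0.70, 0.09, 0.18, 0.29, 0.22],
    "C": [0.36, 0.10, 0.03, 0.00, 0.01, 0.03, 0.07, 0.06, 0.15, 0.19, 0.25],
    "G": [0.18, 0.11, 0.81, 1.00, 0.00, 0.34, 0.11, 0.78, 0.19, 0.30, 0.24],
    "T": [0.11, 0.14, 0.07, 0.00, 0.99, 0.03, 0.12, 0.08, 0.49, 0.22, 0.29],
}


def information_per_position(freqs, background=0.25):
    """Relative entropy (bits) of each position against a uniform background.

    Terms with zero frequency contribute nothing (0 * log 0 is taken as 0).
    """
    n_positions = len(next(iter(freqs.values())))
    bits = []
    for j in range(n_positions):
        bits.append(sum(f[j] * math.log2(f[j] / background)
                        for f in freqs.values() if f[j] > 0))
    return bits


per_position = information_per_position(frequencies)
total_bits = sum(per_position)

print([round(b, 2) for b in per_position])
print(round(total_bits, 1))   # close to the 8 bits quoted in the entry
print(2 ** 8)                 # 256: expected spacing of 'decoy' sites in random sequence
```

The invariant G at position +1 contributes the maximum of 2 bits, whereas positions +7 and +8, whose frequencies are close to the background, contribute almost nothing.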
Motif discovery is an area under active research and has benefited from sophisticated modern statistical techniques. In a typical setting, a collection of sequences derived from MICROARRAY EXPERIMENTS or various sources are believed to share common sequence motifs that often represent functional domains or regulatory elements, and the challenge is to find the unknown signals and locate them in individual sequences. One approach is to formulate the multiple alignment information as MISSING DATA and infer them together with other parameters of the statistical model, given only the sequences as observables. Advanced statistical modelling and iterative computation techniques such as the EM ALGORITHM and Markov chain Monte Carlo are typically used for simultaneous model estimation (Liu, Neuwald and Lawrence, 1999).

The function of a protein is determined by its three-dimensional structure. The problem of predicting the three-dimensional structure of a protein from its amino acid sequence (or the protein-folding problem, because proteins are capable of quickly folding into their stable, unique three-dimensional structure, starting from a random coil conformation without additional genetic mechanisms) is one of the biggest challenges in bioinformatics. There are three major lines of approach for protein structure prediction: comparative modelling, fold recognition and ab initio prediction. Comparative modelling makes use of sequence alignment and database searches and builds on the fact that evolutionarily related proteins with similar sequences have a similar structure. For proteins without a homologous sequence of known structure, the approach of 'threading' has been developed. It is assumed that a small collection of 'folds', perhaps several hundreds in number, can be used to model the majority of protein domains in all organisms. The protein-folding problem is thus reduced to the task of classifying the query protein, based on its primary sequence, into one of the folding classes in a database of known three-dimensional structures. This classification is often accomplished using complicated statistical techniques such as Gibbs sampling and HMMs to parameterise the fit of a sequence to a given fold and solve the optimisation problem accordingly. Analogous to the gene-finding problem, one may attempt to compute a protein's structure directly from its sequence, based on biophysical understanding of how the three-dimensional structure of proteins is attained. The challenge can be broken down into two components: devising a scoring function that can distinguish between correct and incorrect structures and a search method to explore the conformational space efficiently. If successful, direct folding certainly would give a deeper insight than the 'top-down' threading or homology modelling approaches. However, currently no reliable method has yet emerged in this category.

During the past few years, the development of DNA array technology has scaled up the traditionally one-gene-at-a-time functional studies to allow the monitoring of hundreds of thousands of genes simultaneously. A large number of statistical issues arise in connection with these studies and these have fostered unprecedented conversation and collaborations between biologists and statisticians to establish means to plan, process and analyse these massive datasets. Many branches of statistics have been revived and/or extended by their recent applications in the analysis of functional genomics and molecular data, including DATA MINING methods to discover and classify patterns, MULTIPLE TESTING procedures to adjust P-VALUES to control false discovery rates and meta-analysis (see SYSTEMATIC REVIEWS AND META-ANALYSIS) to combine experimental results from various sources. New statistical methods will soon be needed when combining information from multiple distinct data types (sequence, gene expression, protein structures, sequence variation and phenotypes) for the same subjects. RFY
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. 1990: Basic local alignment search tool. Journal of Molecular Biology 215, 403-10.
Burge, C. B. and Karlin, S. 1997: Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268, 78-94.
Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. 1998: Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press.
Karlin, S. and Altschul, S. F. 1990: Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences of the United States of America 87, 2264-8.
Liu, J. S., Neuwald, A. and Lawrence, C. 1999: Markovian structures in biological sequence alignments. Journal of the American Statistical Association 94, 1-15.
Needleman, S. B. and Wunsch, C. D. 1970: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 443-53.
Pearson, W. R. and Lipman, D. J. 1988: Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States of America 85, 2444-8.
Smith, T. F. and Waterman, M. S. 1981: Identification of common subsequences. Journal of Molecular Biology 147, 195-7.
statistical packages
In 2010 the Association for Survey Computing (ASC) website (www.asc.org.uk) listed around 200 statistical packages. Many of these have been under development for nearly 40 years and therefore it is both a very mature and diverse software market. While many of these packages are developed for niche markets, there are still several generic software suites. It seems almost invidious to try to select and discuss individual packages. However, there are clearly some well-known and long-established packages, and to many the term 'statistical package' is almost synonymous with SPSS™ or possibly SAS™. Given the variety of analyses that these packages offer, they can meet most user needs. It would seem likely that a virtual monopoly should exist, but in fact there have been new entrants gaining popularity. Comparing these is instructive about trends in the development of statistical software. The packages in the first table are the ones on which we will concentrate here.
statistical packages Major statistical packages

SPSS     www.spss.com
SAS      www.sas.com
STATA    www.stata.com
S-Plus   www.insightful.com
The prevalence of these major packages notwithstanding, there are other packages, as listed in the second table, although these will not be further discussed. Competition has been good for the development of programs and potential purchasers should always be aware of options outside the norm that may well fit their requirements. Together with the ASC website (given earlier), it will always be profitable to make comparisons when purchasing.
Other major statistical packages

Genstat      www.vsn-intl.com
STATISTICA   www.statsoft.com
NCSS         www.ncss.com
SYSTAT       www.systat.com
Naturally enough. one wants a slalislical package to do statistics and Ihe leadinl paclc.qeseovcr a wide range. These include basic descriptive statistics. including EDA-style charting. cvll1JRhc:asive c:mss-tabulalion _alysis. means testing. the gc:aeral linear model. mulliwriaIC methods. data mlucUon and clustering. nonpa.ramelrics. IOI-lincar modelling. time series - and more. The convcnion in the late 198Os-cady 1990s or the packages SPSS and SA! to run on desktop PCs seemed to cause a hiatus in the development of statistical metlladoloJy within these suila. Quite possibly, one of the maia reasons for this was the nec:cllo develop new user interfaces, as an alternative the command-line format preYiously used on maiDfnunc and minicompaten. With the DOS interface model heiq rapidly succcedcd by that ofWinclowslM• major c:onsecutive design changes MR needed. This did seem to leave a window of appoJtUnity for new enballts to the 11'UIIket. which could write clin:clly usiq modern programming an:hiteClurcs. S-Plus is perhaps the earliest example or this. initially written for the UNIX system and then subsc:quc:atly ported to
PCs. The desip wascvaceplually novel. based on the notion or an extensible slDlistical calculator. It provides adyancc:cl graphics facilities and has become papular with professional slatisticians for its ability todevclop analysis methodologies. ratheT thllll being tied to a rigid rramework. Over time S-Plus has developed 10 acid extensive user interface c:ahanccments as well as larpr statistical Ubraries. 11Ie public domain OR' (www.r-pmjecl.org) is based on a similar philosophy 10 S-Plus (sec R). STATA has become a very popular altanalive far similar reasons. Startilll out as a command-Iine-driven pqram. it has malwal over the years 10 offer a windowing interrace in addilion. III atlnlcliveness to raean:hcn has been a modem appraac:h to statistical teslinl. as \Yell as ill ability to incorporate new mcthacIoIogiesquickiy. NOl only do Ihc devclopcrs have an ardJiteclUrc that pc:rmilscasy incremental ex....sion. users thc:msclvcs can pl'OJIBIR their own pracedun:s. This has .ained the support of die professional statistical community. who IhrouJh their educative role haye pmmotcd the package's popularity. Panly as a n:sull ofcompetition. packqcs have also begun to differentiate themselves in lenns of extending extra su~ pon to the whole data analysis procca. While the &dual lest result remains the core of any aaalysis. data lII8I1IIIemcni is farlllCR demlllldiq in lelmsortime.1'1Ie n:saumcs ncc:dedto support DATA MANAOEMENT in a MULnCEHrRE clinical11UAL an: significantly IlU'Ier than those for a classical experiment. In these scenarios. IIIBIUIIiIll and mIIIIipuJatilll data prior 10 analysis becomes very impaltanL SAS bas long specialised in data mIIIIagcmcat support. willi ftexible proccdun:s far mcrgiq and manipulaliq datascts. as well as links to dalabase packages. In the phannaceulical industry SAS is almost a de racto standanI far majar analyses. reRecting ill abiUty 10 Supparlthe SIIonJ audit n:quircmc:nts in the induslry. 
To a certain extent other packages have been restricted to the rectangular data model (or spreadsheet) view of data, although all are now improving these features. One direct effect of the development of statistical packages has been to introduce the possibility of statistical data analysis to a wider audience than just statisticians. Since these users are often in finance and commerce, they represent a significant revenue stream to package producers and making the programs more friendly for nonspecialist audiences has become a priority for some. SPSS's menu-driven 'point-and-click' interface, for example, epitomises this model. In contrast, the command-line models of SAS, STATA or S-Plus require more dedicated training, although as noted earlier all have developed similar facilities. (STATA 8 introduced a menu-driven interface in 2003 to complement its traditional command-line orientation.) Integrating advanced data-entry features with a statistical analysis package is common. The predominant spreadsheet
data entry model can be enhanced to include data entry forms, data checking and audit. The large packages such as SPSS and SAS provide 'add-on' programs for this. Other programs provide direct database links so that data entry can be provided in a normal programming package such as Microsoft Access and then directly imported for analysis. While traditionally the results of an analysis were interpreted and then incorporated into a final report, packages have begun to differentiate themselves on their ability to produce tables and results that can be directly pasted into a presentation quality report. Packages vary widely on their ability to do this and support can be patchy. SPSS provides a very good ability to move results tables, but the exported graphics are not of such a good quality. STATA, by contrast, does not offer sophisticated export of results, but has in its latest versions excellent graphical output. SAS offers full programmable reporting features that are very flexible, but challenging for the naive user. While the main focus of any statistical user is on the large packages, dedicated packages still have a role. As an example, programs such as NQUERY (www.statsol.ie), dedicated to sample size estimation, do one particular job very well and are popular as a result. The lone, innovative researcher (an example perhaps being MX found at www.vcu.edu/mx/) is also a likely producer of innovative software. An important dimension for the individual consumer can be price. Some of the major packages have prices that match their capabilities: the single researcher, particularly in the commercial sector, may find this an important factor in choice. All the relevant websites can give guidance on obtaining price quotations. Rather than ossifying, the marketplace for statistical software is healthy and researchers can find themselves well supported with a choice of diverse packages. CS
statistical parametric map See STATISTICS IN IMAGING
statistical refereeing
There have been hundreds of review articles published in the biomedical literature that point out statistical errors in the design, conduct, analysis, summary and presentation of research studies. The contents of every general medical journal (most notably Annals of Internal Medicine, British Medical Journal, Journal of the American Medical Association, Lancet and New England Journal of Medicine), as well as of many specialist ones, have been subjected to this intense scrutiny, sometimes frequently. These review articles have focused on particular statistical tests, frequency of usage and correct application of techniques of statistical analysis, design of CLINICAL TRIALS and epidemiological studies, use of POWER calculations and CONFIDENCE INTERVALS and many other aspects. Their almost universal conclusion is that a substantial percentage of research studies, perhaps as many as 50%,
published in the biomedical literature contains errors of sufficient magnitude to cast some doubt on the validity of the conclusions that have been drawn. This does not mean that the conclusions are wrong, but it does imply that they may not be right, and this inevitably leads to serious concern about the consequences both for understanding of disease and for the treatment of patients. One solution to this problem has been the introduction of medical statisticians into the peer review process. Some have advocated that all submitted papers should be scrutinised in this way, arguing that statistical review of those that are not published, no matter how poor, will at least lead to higher standards in research and improvement in future papers. In view of the very large number of biomedical journals and the huge numbers of papers submitted for publication every year, such a remedy is impracticable. An alternative, now used by several journals, is to divide the peer review process into two stages, whereby papers considered by the editors as candidates for publication are sent first to subject matter referees (physicians, surgeons, epidemiologists, etc.) and those recommended for publication by them are then sent to statisticians for further specialist review. The process of statistical review is complex, requires sophisticated judgement and varies considerably in its application to every section of a paper (abstract, introduction, methods, results and discussion). Altman (1998) reviews some of the difficulties and provides practical examples of both definite errors and matters of judgement, within study design, analysis, presentation and interpretation. There are 12 broad aims of statistical review that can be summarised as follows: to prevent publication of studies that have a fundamental flaw in design; to prevent publication of papers that have a fundamental flaw in interpretation; to ensure that key aspects of background, design and methods of analysis are reported clearly; to ensure that key features of the design are reflected in the analysis; to ensure that the best methods of analysis, appropriate to the data, are used; to ensure that the presentation of results is adequate and employs summary statistics that are justified by the design, the data and the analysis; to ensure that tables are accurate and are consistent both with the text and with each other; to ensure that the style of figures is appropriate, that they are consistent with text and tables and not unduly repetitious of other content; to guard against excessive analysis and spurious accuracy; to ensure that conclusions are justified by the results; to ensure that content of the discussion is justified by the results and, in particular, that it avoids generalisation far beyond the confines of the paper; and, finally, to ensure that the abstract accords with the paper. The statistical reviewer may also comment on subject matter when an expert within the medical specialty of the paper, but will not indicate typos, except when these are critical for accuracy within formulae or text. Indeed, pointing
out inconsequential typos is not part of any aspect of any review process: they should be disregarded by expert reviewers and left entirely to the journal's copyeditor! Since statistical review is complicated and, for the reviewer, sometimes excessively tedious, with the necessity of making very similar, sometimes the same, comments about manuscript after manuscript, detailed statistical guidelines and checklists have been written with the specific intention of helping authors (and reviewers). These have been supported by the editors of many biomedical journals and referred to in the journal's guidelines to authors. Examples can be found in Altman et al. (2000) and Gardner et al. (2000). Those most widely used for clinical trials are the CONSORT guidelines (Moher, Schulz and Altman, 2001, updated 2010), for which there is accompanying explanation (Altman et al., 2001, also updated 2010), and extension to cluster trials, noninferiority and equivalence randomised trials, herbal medicine interventions, nonpharmacological interventions, harms, abstracts and pragmatic trials (see www.consort-statement.org). The checklist that forms part of the CONSORT statement is intended to accompany a submitted paper and to indicate where in the manuscript each item in the checklist has been addressed, thus serving as a useful reminder to authors and an aide to referees. There are also recent guidelines for reporting META-ANALYSIS (PRISMA, which supersedes QUOROM), for observational studies (STROBE) and for genetic ASSOCIATION studies (STREGA): details can be found through the EQUATOR network (www.equator-network.org). Statistical review is intended to be helpful and constructive: it should also reassure authors and readers that published papers are sound. However,
it is not always seen from this perspective and editors of journals need to be vigilant in ensuring that it does not become a focus for controversy and dispute, as can happen, for example, when authors parade the views of 'their own statistician' to counter comments from a referee. There is at present little incentive for statisticians to engage in such review - it does not enhance their careers, there is no specific training for it, small (if any) remuneration, it is time consuming and 'the only likely concrete consequence of good reviewing is future requests for more reviews' (Bacchetti, 2002). Bacchetti also points out that statistics is a rich area for finding mistakes and, when coupled with 'the notion that finding flaws is the key to high quality peer review', can lead to 'finding flaws that are not really there'. This reinforces the need for sound statistical judgement. Statisticians may also have to counter mistaken criticisms from subject matter reviewers with limited statistical knowledge (Bacchetti, 2002). The final part of statistical review is usually a recommendation to the journal's editor either to accept, accept with revision, revise and resubmit, or reject the paper. The distinction between the second and third is sometimes difficult and can only be made by balancing the extent and nature of the revisions against the capabilities of the authors as evinced
from the submitted paper. Rejection by the statistician can also lead to provocation, especially as authors will be aware that their 'subject matter' peers have already judged it sound. In 1937 the Lancet's leading article that heralded the series of classic papers by Bradford Hill on The Principles of Medical Statistics forewarned: 'It is exasperating, when we have studied a problem by methods that we have spent laborious years in mastering, to find our conclusions questioned, and perhaps refuted, by someone who could not have made the observations himself. It requires more equanimity than most of us possess to acknowledge that the fault is in ourselves.' Authors of papers are advised to read statistical reviews carefully, put them aside for 48 hours and only then start to think about how to respond. For further information and discussion see Rubinstein (2005), Smith (2005) and Ware (2005). TJ
Altman, D. G. 1998: Statistical reviewing for medical journals. Statistics in Medicine 17, 2661-74.
Altman, D. G., Gore, S. M., Gardner, M. J. and Pocock, S. J. 2000: Statistical guidelines for contributors to medical journals. In Altman, D. G., Machin, D., Bryant, T. N. and Gardner, M. J. (eds), Statistics with confidence, 2nd edition. London: BMJ Books, 171-90.
Altman, D. G., Schulz, K. F., Moher, D., Egger, M., Davidoff, F., Elbourne, D., Gøtzsche, P. C. and Lang, T. for the CONSORT Group 2001: The revised CONSORT statement for reporting randomised trials: explanation and elaboration. Annals of Internal Medicine 134, 663-94.
Bacchetti, P. 2002: Peer review of statistics in medical research: the other problem. British Medical Journal 324, 1271-73.
Gardner, M. J., Machin, D., Campbell, M. J. and Altman, D. G. 2000: Statistical checklists. In Altman, D. G., Machin, D., Bryant, T. N. and Gardner, M. J. (eds), Statistics with confidence, 2nd edition. London: BMJ Books, 191-201.
Moher, D., Schulz, K. F. and Altman, D. G. for the CONSORT Group 2001: The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Annals of Internal Medicine 134, 657-62.
Rubinstein, L. V. 2005: Statistical review for medical journals, guidelines for authors. In Colton, T. and Armitage, P. (eds), Encyclopedia of Biostatistics, 2nd edition. Chichester: John Wiley & Sons, Ltd, pages 5190-5192.
Smith, R. 2005: Statistical review for medical journals, journal's perspective. In Colton, T. and Armitage, P. (eds), Encyclopedia of Biostatistics, 2nd edition. Chichester: John Wiley & Sons, Ltd, pages 5193-5196.
Ware, J. H. 2005: Statistical review for medical journals. In Colton, T. and Armitage, P. (eds), Encyclopedia of Biostatistics, 2nd edition. Chichester: John Wiley & Sons, Ltd, pages 5186-5190.
statistics in imaging
This is the use of statistical techniques to analyse and quantify information contained in digital image format. Imaging is widely used in medicine to visualise objects, structures and even physical processes in vivo and in vitro. A significant advantage in medical imaging is the ability to visualise structures or processes without relying on surgical operations. Thus, animals may be recycled in drug discovery and development or patients may not suffer from intrusive procedures. The ability to acquire information without intrusive procedures is also a
disadvantage to medical imaging. This raises the issue of surrogate imaging ENDPOINTS (or biomarkers): i.e. how well do the conclusions from an imaging experiment correspond to physical properties obtained from an intrusive procedure? Although the human visual system is very good at extracting information from images, the sheer amount of data being produced creates the common problem of 'not enough time to look at everything'. Statistical techniques using computers enable researchers and clinicians to summarise large numbers of images rapidly so that patterns, trends, regions of activation, etc., may be identified and quantified. Besides the amount of information, medical imaging systems also see beyond the visible light spectrum and are able to process information from a wide range of the electromagnetic spectrum. Examples of medical imaging systems include conventional radiology (X-rays), angiography (imaging of a system of blood vessels using X-rays), positron emission tomography (PET), X-ray transmission computed tomography (CT), magnetic resonance imaging (MRI), microscopy, single photon emission (computed) tomography (SPET or SPECT), spectroscopy and ultrasound imaging. Even electroencephalograms (EEGs) or magnetoencephalograms (MEGs) are examples of imaging systems, albeit with very poor spatial resolution when compared to MRI or PET. An image is a two-dimensional function that depends on spatial coordinates, where the amplitude of the function represents the brightness or grey level of the image at a particular point. Individual elements of the image are known as picture elements, pixels for short. Images may be
collected to form a three-dimensional data structure, or volume, where the individual elements are called voxels. This is common in, for example, MRI and PET, where an experiment on a single subject will involve acquiring information in three spatial dimensions and in time. Traditional statistical techniques in image analysis include areas such as signal and morphological processing. Signal processing applications include image enhancement, image restoration, colour image processing, wavelets and compression. Morphological processing assumes that set theory may be applied to manipulate structures present in an image. A relatively new area of research in imaging is the use of MRI in functional or pharmacological studies of the brain. Functional MRI (fMRI) is now well developed and seeks to associate brain functions (human or animal) with specific regions of the brain. Pharmacological MRI (phMRI) is relatively new and seeks to associate pharmacokinetics with specific regions of the (animal) brain. Although group studies are widespread, consider a single-subject analysis from a typical fMRI experiment. After data acquisition, a set of images associated with distinct slices of the brain is available for analysis. Each slice will have a time sequence associated with it; i.e. the imaging experiment contains both spatial and temporal information. Given knowledge of the study design, the goal is to identify regions of the brain where significant activation was observed, where activation is measured by the intensity of the signal observed in the fMRI experiment. Signal intensity is related to the ratio of oxygenated and deoxygenated blood locally in the brain.
[Figure: statistics in imaging. Example of an MRI slice (left) and voxel time course (right) for the voxel at (x, y) = (50, 30), plotted against time. The experimental design has been superimposed on the time course plot, where the visual stimulation is shown by a dashed line and the audio stimulation is shown by a dotted line (data provided by the Brain Mapping Unit, Department of Psychiatry, University of Cambridge).]
The time course in the figure shows a typical slice from an MRI experiment and the study design of on/off sequences for visual (dashed line) and auditory (dotted line) stimuli. Each voxel in the image has an associated time course; a mask that eliminates nonbrain voxels is typically used to focus the data analysis. LINEAR REGRESSION, or, more fully, fitting the GENERALISED LINEAR MODEL (GLM), is performed on each voxel using the experimental design, convolved with a function to model the haemodynamic response of the patient, as the independent variable. Trend removal is an important step and may be applied as a preprocessing step or by incorporating low-frequency terms explicitly in the GLM. The typical assumption of independence between observations is not true in fMRI data: methods such as prewhitening, autoregressive modelling and least squares with adjustment for correlated errors are attempts to overcome the limitations of ordinary least squares. Fitting the GLM to fMRI data may be performed on an individual voxel, on a cluster of voxels known as a region of interest (ROI), where the data are averaged in space to produce a single time course, or on every brain voxel in the image. For the first two cases, standard theory for statistical inference on regression models may be applied. For the third case, techniques such as Gaussian random field theory, resampling (see BOOTSTRAP) and adjustments by multiple comparison procedures have been used. Regardless of which method is applied, a set of voxels is obtained where significant activation during the experiment was detected. Researchers then relate the images to the anatomical regions identified in the activation image, also known as a statistical parametric map (SPM). Information from a group of patients may be combined or compared by first registering all images with a standard brain. The most common brain atlas used is the Talairach atlas. Then,
a random effects or fixed effects model (see LINEAR MIXED EFFECTS MODELS) may be used to apply a statistical hypothesis test between groups of subjects in the experiment. For more details see Serra (1982), Glasbey and Horgan (1995), Moonen and Bandettini (1999), Gonzalez and Woods (2002) and Worsley et al. (2002). BW
Glasbey, C. A. and Horgan, G. W. 1995: Image analysis for the biological sciences. Chichester: John Wiley & Sons, Ltd.
Gonzalez, R. C. and Woods, R. E. 2002: Digital image processing, 2nd edition. Englewood Cliffs, NJ: Prentice Hall.
Moonen, C. T. W. and Bandettini, P. A. (eds) 1999: Functional MRI. Berlin: Springer-Verlag.
Serra, J. 1982: Image analysis and mathematical morphology. London: Academic Press.
Worsley, K. J., Liao, C. H., Aston, J., Petre, V., Duncan, G. H., Morales, F. and Evans, A. C. 2002: A general statistical approach for fMRI data. NeuroImage 15, 1, 1-15.
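The single-voxel analysis described in this entry can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gamma-shaped kernel, the block length of 10 scans, the simulated voxel data and the function names (boxcar, gamma_kernel, convolve, fit_voxel) are all hypothetical, and the fit uses ordinary least squares, which ignores the correlated errors that the entry warns about.

```python
import math
import random


def boxcar(n_scans, period):
    """On/off design: off for `period` scans, then on for `period` scans, repeated."""
    return [1.0 if (t // period) % 2 == 1 else 0.0 for t in range(n_scans)]


def gamma_kernel(length, shape=6.0, scale=1.0):
    """Illustrative gamma-shaped response kernel, normalised to sum to 1."""
    raw = [(t ** (shape - 1)) * math.exp(-t / scale) for t in range(length)]
    total = sum(raw)
    return [v / total for v in raw]


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * signal[t - k]
        out.append(acc)
    return out


def fit_voxel(y, x):
    """Least squares fit of y on an intercept and x; returns (slope, t statistic)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


random.seed(0)
# Independent variable: the stimulus design convolved with the response kernel.
design = convolve(boxcar(100, 10), gamma_kernel(20))
# Two simulated voxels: one responding to the stimulus, one containing noise only.
active = [100 + 5.0 * d + random.gauss(0, 1) for d in design]
inactive = [100 + random.gauss(0, 1) for _ in design]
slope_a, t_a = fit_voxel(active, design)
slope_i, t_i = fit_voxel(inactive, design)
print(f"active voxel:   slope={slope_a:.2f}, t={t_a:.1f}")
print(f"inactive voxel: slope={slope_i:.2f}, t={t_i:.1f}")
```

Repeating fit_voxel over every brain voxel and thresholding the t statistics would give a crude activation map, but the multiple comparison adjustments mentioned in the entry would still be needed before interpreting it.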
StatXact This is a specialised software package for the exact analysis of small-sample categorical and nonparametric
data with special emphasis on data in the form of contingency tables. The term 'small-sample' applies equally to datasets with only a few observations, to large but unbalanced datasets or to CONTINGENCY TABLES with zeros and small cell counts in some of the cells but large cell counts in other cells. In these settings, StatXact produces exact P-VALUES and exact CONFIDENCE INTERVALS instead of relying on possibly unreliable large-sample theory for its inferences. The inference is based on generating permutation distributions of the appropriate test statistics in a conditional reference set. Different reviews of StatXact are given by Lynch, Landis and Localio (1991), Wass (2000) and Oster (2002). The current version, StatXact 6, offers exact P-values for one-, two- and K-sample problems, 2 x 2, 2 x c and r x c contingency tables and measures of ASSOCIATION. The data may be either unstratified or stratified. Both independent and blocked samples are accommodated. This version computes the exact confidence interval for ODDS RATIOS that arise from 2 x 2 and 2 x c contingency tables, as well as an exact confidence interval for the MEDIAN shift parameter in an ordered 2 x c contingency table. StatXact offers procedures that cater explicitly to binomial data, Poisson data, nominal categorical data, ordered categorical data, ordered correlated categorical data, continuous complete data and continuous right-censored data. For comparing two proportions (either from dependent or independent samples), StatXact provides the exact unconditional confidence interval for a difference in proportions or the ratio of two proportions and computes exact P-values for tests of equivalence and noninferiority. In addition to tools for exact inference, StatXact also provides exact power and sample-size calculations for study designs involving one, two or several binomial populations. In the two-binomial case,
these features include exact power and sample-size calculations for designing noninferiority and equivalence studies. In case the computation of an exact P-value becomes infeasible due to the lack of either time or computing memory, StatXact produces an unbiased estimate of the exact P-value to at least two decimal digits of accuracy using efficient Monte Carlo simulation strategies (see MARKOV CHAIN MONTE CARLO). The user can arbitrarily increase the number of Monte Carlo simulations in order to increase the accuracy. StatXact 6 runs on Microsoft Windows NT/2000/XP as a standalone product. In addition, a special version, StatXact PROCs for SAS Users, is available as external SAS procedures for both the Microsoft Windows and Unix operating systems. CC/PS/CM/NP
Lynch, J. C., Landis, J. R. and Localio, A. R. 1991: StatXact. The American Statistician 45, 2, 151-4.
Oster, R. A. 2002: An examination of statistical software packages for categorical data analysis using exact methods. The American Statistician 56, 3, 235-46.
Wass, J. A. 2000: StatXact for Windows. Biotech Software and Internet Report 1, 1, 17-23.
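The idea of an exact conditional P-value, and of a Monte Carlo estimate of it, can be illustrated for a single 2 x 2 table. The sketch below is not StatXact's algorithm or interface; it is a plain Python illustration of Fisher's exact test under the assumption of fixed margins, and the function names and the example table are hypothetical.

```python
import math
import random


def hypergeom_prob(a, row1, row2, col1):
    """P(top-left cell = a) for a 2 x 2 table with all margins fixed."""
    n = row1 + row2
    return math.comb(row1, a) * math.comb(row2, col1 - a) / math.comb(n, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Exact two-sided P-value: total probability of all tables in the
    conditional reference set that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = hypergeom_prob(a, row1, row2, col1)
    probs = (hypergeom_prob(k, row1, row2, col1) for k in range(lo, hi + 1))
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


def monte_carlo_p(a, b, c, d, n_sims=20000, seed=1):
    """Monte Carlo estimate of the same P-value by permuting column labels."""
    row1, row2, col1 = a + b, c + d, a + c
    rng = random.Random(seed)
    labels = [1] * col1 + [0] * (row1 + row2 - col1)
    p_obs = hypergeom_prob(a, row1, row2, col1)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(labels)
        k = sum(labels[:row1])  # simulated top-left cell count
        if hypergeom_prob(k, row1, row2, col1) <= p_obs * (1 + 1e-9):
            hits += 1
    return hits / n_sims


# Hypothetical small table: rows are groups, columns are outcomes.
p_exact = fisher_exact_two_sided(3, 1, 1, 3)
p_mc = monte_carlo_p(3, 1, 1, 3)
print(f"exact P = {p_exact:.4f}, Monte Carlo estimate = {p_mc:.4f}")
```

For this table the reference set contains only five tables, so exact enumeration is trivial; the Monte Carlo route matters when the reference set is too large to enumerate, and its accuracy improves as n_sims is increased.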
stem-and-leaf plot Essentially, this is an enhanced HISTOGRAM in which the actual data values are retained for inspection. Observed values are each divided into a suitable 'stem' and 'leaf', e.g. the tens figure and the units figure in many examples, and then all the leaves corresponding to a particular stem are listed (usually horizontally) next to the value of the stem. An example is shown in the figure.
[Figure: stem-and-leaf plot. A stem-and-leaf plot for the heights in centimetres of 351 elderly women. Stems run from 14 to 17, each split over several rows; the leaves are the units digits.]
The plot combines the visual picture of the data provided by the histogram with a display of the ordered data values.
The design of stem-and-leaf plots is discussed in Velleman and Hoaglin (1981). It is important to use a typeface for which each digit occupies equivalent space, otherwise a key feature of being 'a histogram on its side' is lost. SSE
Velleman, P. F. and Hoaglin, D. C. 1981: Applications, basics, and computing of exploratory data analysis. Boston: Duxbury.
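The construction described in this entry is simple enough to sketch in Python. The function name stem_and_leaf and the short list of heights are hypothetical, and for brevity this version prints one row per stem rather than splitting each stem over several rows as in the figure.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display for non-negative integers,
    using the tens (and above) as the stem and the units digit as the leaf."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    rows = []
    # Include empty stems so that gaps in the data remain visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem:>3} : {leaves}")
    return rows


# Hypothetical heights in centimetres.
heights = [142, 145, 151, 153, 153, 158, 160, 160, 162, 167, 171]
rows = stem_and_leaf(heights)
for row in rows:
    print(row)
```

Printed in a fixed-width typeface, the row lengths reproduce the shape of a histogram on its side while every data value remains readable.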
stepwise regression See LOGISTIC REGRESSION, MULTIPLE LINEAR REGRESSION
stochastic process This is any system that develops in accordance with probabilistic laws, usually in time but sometimes in space and possibly even in both time and space. For example, the spread of an epidemic is a stochastic process and its development can be tracked in time, across some terrain or at the conjunction of both time and position. The constituents of a stochastic process are its state, X say, and its indexing variable, s or t. The state is the primary measure of interest, such as number of individuals ill, while the indexing variable denotes either the time (t) or the position (s) at which the state is measured. A discrete indexing variable is usually shown as a subscript, but a continuous index appears within traditional function notation. For example, suppose that the state of the epidemic is the number of individuals who are ill. Then X_t would denote the
number of individuals ill at time t if observations were taken at the start of each day, while X(s) would denote the number of individuals ill at position s measured continuously in space. Of course, the state of the process can also be either discrete (e.g. number of individuals ill) or continuous (e.g. ECG reading of a cardiac patient). An essential ingredient in a stochastic process is the dependence of either successive or neighbouring observations. Different assumptions about the dependence structure lead to different types of stochastic process, which can be used as models for many observations collected in practice. The objective is usually to derive theoretical PROBABILITIES for the various states of the system and thus to use these probabilities either for predicting the future behaviour of the system or for gaining some understanding of its mechanism. Many practical systems can be modelled adequately by assuming a Markovian dependence structure, in which the PROBABILITY DISTRIBUTION of X depends only on the most recent or neighbouring value. Standard stochastic processes that accord with such an assumption include random walks, Markov chains, branching processes, birth-and-death processes, queues and Poisson processes. Jones and Smith (2001) provide an accessible introduction to the mathematics of such processes. Some classical applications of stochastic models to medicine are described in Gurland (1964). Successful uses of Markov models in medical contexts range in time and application from the planning of patient care (Davies, Johnson and Farrow, 1975) to resource provision (Davies and Davies, 1994) and the cost-effectiveness of vaccines (Byrnes, 2002). Many more examples can be found in journals such as Health Care Management Science. WK
Byrnes, G. B. 2002: A Markov model for sample size calculation and inference in vaccine cost-effectiveness studies. Statistics in Medicine 21, 3249-60.
Davies, R. and Davies, H. T. O. 1994: Modelling patient flows and resource provision in health systems. Omega, International Journal of Management Science 22, 123-31.
Davies, R., Johnson, D. and Farrow, S. 1975: Planning patient care with a Markov model. Operational Research Quarterly 26, 599-607.
Gurland, J. (ed.) 1964: Stochastic models in medicine and biology. Madison, WI: University of Wisconsin Press.
Jones, P. W. and Smith, P. 2001: Stochastic processes, an introduction. London: Arnold.
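The Markovian assumption discussed in this entry, under which the distribution of the next state depends only on the current state, can be illustrated with a small discrete-time Markov chain. The three states and the transition probabilities below are invented purely for illustration and do not come from any of the cited studies; the function names are likewise hypothetical.

```python
import random

# Hypothetical daily states for one individual during an epidemic.
STATES = ["well", "ill", "recovered"]
# P[i][j] = probability of moving from state i to state j in one day.
P = [
    [0.9, 0.1, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]


def step_distribution(dist, matrix):
    """One-step update of the state probabilities: dist multiplied by matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def distribution_after(n_steps, start, matrix):
    """Theoretical state probabilities after n_steps transitions."""
    dist = list(start)
    for _ in range(n_steps):
        dist = step_distribution(dist, matrix)
    return dist


def simulate_path(n_steps, start_state, matrix, rng):
    """Simulate one realisation X_0, X_1, ..., X_n of the chain."""
    path = [start_state]
    for _ in range(n_steps):
        current = path[-1]
        nxt = rng.choices(range(len(matrix)), weights=matrix[current])[0]
        path.append(nxt)
    return path


dist2 = distribution_after(2, [1.0, 0.0, 0.0], P)
path = simulate_path(14, 0, P, random.Random(42))
print("P(state after 2 days):", dict(zip(STATES, dist2)))
print("one simulated fortnight:", [STATES[s] for s in path])
```

The first calculation gives the theoretical probabilities for the states of the system, and the second gives a single simulated history; averaging many such histories would approximate the same probabilities.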
stratified randomisation See RANDOMISATION
stratified sampling Stratified sampling occurs within defined strata of some population. This should be carried out when the population contains easily identifiable subpopulations. If the sizes of the strata are different then proportional allocation should be used. If the STANDARD DEVIATIONS are known in advance then optimal or Neyman allocation can be used to minimise the VARIANCE of the estimate of the population MEAN. If they are unknown it is possible to use a pilot study to estimate the standard deviations.
The method is as follows. Define the strata that the population falls into. Decide if the strata are of a similar size and if the standard deviations are known. For similar sized strata use simple random sampling to select members of each stratum. If the sizes are different then the number in each stratum is proportional to stratum size. Then simple random sampling is used to obtain the correct number in each stratum. If the standard deviations are known in advance then, for a fixed total sample size n, the allocation is obtained by choosing n_j so that:

\[ \frac{n_j}{n} = \frac{N_j S_j}{\sum_{k=1}^{s} N_k S_k} \]

where N_j is the number in the stratum, S_j is the standard deviation of values of items within the stratum, n is the fixed total sample size, n_j is the number to be chosen by simple random sampling from the stratum and s is the number of strata. Thornhill et al. (2000) used stratified sampling in a study of disability following head injury. The patients were stratified according to the Glasgow coma score. The mild and unclassified patients were further stratified by the presenting hospital and a simple random sample was taken. In general, if the population can be separated into distinguishable strata then the estimates from stratified sampling will be more precise than from a SIMPLE RANDOM SAMPLE and therefore it can be efficient. The disadvantages are that it can be difficult to choose the strata, it is not useful without homogeneous subgroups, it can require accurate information about the population and it can be expensive. For more details see Crawshaw and Chambers (1994) and Upton and Cook (2002). SLV
Crawshaw, J. and Chambers, J. 1994: A concise course in A-level statistics, 3rd edition. Cheltenham: Stanley Thornes Publishers Ltd.
Thornhill, S., Teasdale, G. M., Murray, G. D., McEwen, J., Roy, C. W. and Penny, K. I. 2000: Disability in young people and adults one year after head injury: prospective cohort study. British Medical Journal 320, 1631-5.
Upton, G. and Cook, I. 2002: Dictionary of statistics. Oxford: Oxford University Press.
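The two allocation rules in this entry can be compared with a short Python sketch. The stratum sizes, standard deviations and total sample size below are hypothetical, as are the function names; the results are left as fractions and would need rounding to whole numbers in practice.

```python
def proportional_allocation(n, sizes):
    """Sample size per stratum proportional to stratum size N_j."""
    total = sum(sizes)
    return [n * size / total for size in sizes]


def neyman_allocation(n, sizes, sds):
    """Optimal (Neyman) allocation: n_j proportional to N_j * S_j."""
    weights = [size * sd for size, sd in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical population with three strata.
sizes = [500, 300, 200]  # N_j
sds = [10, 20, 40]       # S_j, assumed known in advance
n = 100                  # fixed total sample size

prop = proportional_allocation(n, sizes)
neyman = neyman_allocation(n, sizes, sds)
print("proportional:", [round(x, 1) for x in prop])
print("Neyman:      ", [round(x, 1) for x in neyman])
```

With these numbers the smallest stratum receives the largest share under Neyman allocation because its values are the most variable, whereas proportional allocation looks only at stratum size.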
structural equation modelling software The four most commonly used packages for fitting structural equation models are:
EQS (http://www.mvsoft.com/)
LISREL (http://www.ssicentral.com/)
MPlus (http://www.statmodel.com/order.html)
AMOS (http://www.amosdevelopment.com/)
All four allow the fitting of complex models relatively easily, although MPlus is possibly the most flexible. Each package's website provides specific information on their capabilities, as well as availability and cost. SSE
structural equation models
The operational definition provided by Pearl (2000, p. 160) states: 'An equation y = βx + ε is said to be structural if it is to be interpreted as follows: In an ideal experiment where we control X to x and any other set Z of variables (not containing X or Y) to z, the value of Y is given by βx + ε, where ε is not a function of the settings x and z.' The key word here is 'control'. We are observing values of Y after manipulating or fixing the values of X. The model implies that the values of Y, in fact, are determined by the values of X. A structural equation model is a description of the causal effect of X on Y. It is a CAUSAL MODEL, and the parameter β is a measure of the causal effect of X on Y. It should be clearly distinguished from a linear regression equation that simply describes the ASSOCIATION between two random variables, X and Y. If we are able, in practice, to intervene and control the values of X (by random allocation, for example) then it is straightforward to use the resulting data to obtain a valid estimate of β. If, however, we do not have control of X, but can only observe the values of X and Y (and Z), as in an epidemiological or other type of OBSERVATIONAL STUDY, for example, this does not invalidate the above operational definition, but the challenge for the data analyst is to find a valid (i.e. unbiased) estimate of the causal parameter β under these circumstances. The equation y = βx + ε is, of course, a description of a very simple structural model. It is common to collect data on several response variables (Ys) and several explanatory variables (Xs) and to construct a series of structural equations of the following form:
(i = 1 to I; j, k = 1 to J) (1)
in which several of the $\beta$ values will be fixed to be zero, a priori. The others are to be estimated from the data. The form of the equations defined in (1) - i.e. the structural theory that determines the pattern of $\beta$ values to be estimated and those fixed at zero - is determined by the investigator's prior knowledge or hypotheses concerning the causal processes generating the data. Quoting Byrne (1994, p. 3): 'Structural equation modeling (SEM) is a statistical methodology that takes a hypothesis-testing (i.e. confirmatory) approach to the multivariate analysis of a structural theory bearing on some phenomenon.' Typically, SEM involves (a) the specification of a set of structural equations, (b) representation of these structural equations using a graphical model (a path diagram - see later), (c) simultaneously fitting the set of structural equations to a given set of data in order to estimate the $\beta$ values and to test the adequacy of the model. If the model fails to fit then the investigator may revise the model and try again. The success of the exercise is likely to be highly dependent upon the quality of the investigator's prior knowledge of the likely
causal mechanisms under test and how much thought he or she has given to the design of the study in the first place. Good design and subsequent statistical analyses require technical knowledge, skill and experience. For technical knowledge, readers are referred to introductory texts by Dunn, Everitt and Pickles (1993), Byrne (1994) and Shipley (2000), and to the advanced monograph by Bollen (1989). Discussion of SEM in the context of recent work on causal inference can be found in Pearl (2000, 2009) and, again, in Shipley (2000). Traditionally, SEM has concentrated on structural models for quantitative data, which are usually assumed to be multivariate normal. Extensions from the traditional linear structural equations (i.e. LINEAR REGRESSION) to generalized linear structural equations are discussed by Skrondal and Rabe-Hesketh (2004). It is frequently the case that we cannot measure constructs directly, or at least not without considerable MEASUREMENT ERROR. This gives rise to the idea of LATENT VARIABLES. These are characteristics that are not directly observable. They may be straightforward concepts such as height, weight, amount of exposure to a known toxin, or concentration of a given metabolite in blood or urine, but we explicitly allow that they cannot be measured without error. The observed measurement is a manifest or indicator variable, while the corresponding unknown, but true, value is a latent variable. However, latent variables may be more abstract theoretical constructs that are introduced to explain COVARIANCE between manifest or indicator variables. An example of this last type is the set of scores on a battery of cognitive tests that are assumed in some way to reflect a subject's cognitive ability or general intelligence. Another example could be a set of symptom severity scores (the manifest variables), which are assumed to be indicators of a patient's overall degree of depression (the latent variable). Usually,
a data analyst will propose a formal measurement model (usually equivalent to some form of factor analysis representation) to relate the observed measurements with the underlying latent variables. We can then proceed to propose structural or causal hypotheses involving the latent variables instead of the fallible (error-prone) indicators. We start, for example, with a COVARIANCE MATRIX for the observed variables. We fit a general structural equation model to this covariance or moments matrix. This procedure will involve the simultaneous fitting of the measurement equations for the relevant latent variables and their corresponding indicators and of the structural equations thought to reflect the assumed causal relationships between the latent variables. Specialist software packages are now widely available for such analyses (see STRUCTURAL EQUATION MODELLING SOFTWARE).
Structural equation models are very often represented by a graphical structure known as a path diagram (see PATH ANALYSIS). In a path diagram the proposed relationships between variables (whether manifest or observed) are represented either by a single-headed arrow (indicating the direction of a causal effect) or a double-headed one (indicating covariance). The observed or manifest variables are usually placed within a rectangular or square box, while latent variables are placed within an oval or a circle. Random measurement errors and residuals from structural equations, although they are strictly speaking latent variables, are not traditionally placed within a circle or oval. Path diagrams are very closely related to the graphical representations (directed acyclic graphs, or DAGs, for example: see GRAPHICAL MODELS) that have relatively recently been developed elsewhere (see Pearl, 2000, 2009, for example). Two simple examples of path diagrams are shown in the two figures. A detailed explanation will be given in the following section.
structural equation models Path diagram to represent the structural equations linking encouragement to stop smoking during pregnancy (Z), the amount smoked during pregnancy (X) and the birth weight of the child (Y). $D_X$ and $D_Y$ are randomly distributed residuals
structural equation models Path diagram to represent the structural equations linking encouragement to stop smoking during pregnancy (Z), the true amount smoked during pregnancy ($T_X$) and the birth weight of the child (Y). $D_X$ and $D_Y$ are randomly distributed residuals. X1 and X2 are error-prone indicators of smoking, with uncorrelated measurement errors E1 and E2 respectively
For an example, Permutt and Hebel (1989) describe a trial in which pregnant women were randomly allocated to receive encouragement to reduce or stop their cigarette smoking during pregnancy (the treatment group) or not (the control group) - indicated by the binary variable, Z. An intermediate outcome variable (X) was the amount of cigarette smoking recorded during pregnancy. The ultimate outcome (Y) was the birth weight of the newborn child. Smoking is likely to have been reduced in the group subject to encouragement, but also in the control group (although, presumably, to a lesser extent). There are also likely to be hidden confounders (e.g. other health promoting behaviours) that are associated with both the mother's smoking during pregnancy and the child's birth weight. Smoking (X) is an endogenous treatment variable: the above confounding will result in the residual from a structural equation model to explain the level of smoking by RANDOMISATION to receive encouragement being correlated with the residual from the structural equation linking observed levels of smoking to the birth weight of the child. We assume that there is no direct effect of randomisation (Z) on outcome (Y): the effect of Z on Y is an indirect one through smoking (X), i.e. Z is an INSTRUMENTAL VARIABLE. Ignoring the intercept terms, the two structural equations are the following:
$$X = \gamma Z + D_X \quad \text{and} \quad Y = \beta X + D_Y$$
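As a rough illustration of why the distinction between a structural equation and a regression equation matters here, the following sketch simulates the two structural equations above with correlated residuals. All numerical values (the sizes of the effects, the variances and the sample size) are hypothetical and chosen only for illustration, and the simple ratio estimator shown is one basic instrumental-variable estimator, not necessarily the estimation method used by Permutt and Hebel.

```python
import random

def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)

# Hypothetical parameter values, chosen only for illustration.
GAMMA = -2.0   # effect of encouragement (Z) on amount smoked (X)
BETA = -0.5    # causal effect of smoking (X) on birth weight (Y)

rng = random.Random(2011)
z, x, y = [], [], []
for _ in range(20000):
    zi = rng.randint(0, 1)                 # randomised encouragement (0 or 1)
    hidden = rng.gauss(0.0, 1.0)           # unobserved confounder
    d_x = hidden + rng.gauss(0.0, 1.0)     # residual D_X
    d_y = -hidden + rng.gauss(0.0, 1.0)    # residual D_Y, correlated with D_X
    xi = GAMMA * zi + d_x                  # X = gamma * Z + D_X
    yi = BETA * xi + d_y                   # Y = beta * X + D_Y
    z.append(zi)
    x.append(xi)
    y.append(yi)

# Ordinary regression slope of Y on X: biased, because D_X and D_Y are correlated.
beta_regression = covariance(x, y) / covariance(x, x)

# Ratio (instrumental-variable) estimate: uses only the variation in X induced by Z.
beta_iv = covariance(z, y) / covariance(z, x)

print(f"true beta       = {BETA}")
print(f"regression beta = {beta_regression:.3f}")
print(f"IV beta         = {beta_iv:.3f}")
```

In this simulated setting the regression slope describes only the association between X and Y, whereas the ratio estimate should lie close to the causal parameter, because Z is randomised and is assumed to affect Y only through X.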
In fitting these two models to the appropriate data we acknowledge the correlation ($\rho$) between the residuals, $D_X$ and $D_Y$ (those components of X and Y not explained by Z and X respectively). The overall model is illustrated by the first figure. Now, what if we acknowledge that smoking levels cannot be measured accurately and we decide to obtain two different measurements on each person in the trial (X1 and X2, say, being self-reported numbers of packs per day, obtained at 6 months and 8 months into the pregnancy)? The true level of smoking is now represented by the variable $T_X$. Our measurement model is represented by the two equations:
$$X_1 = T_X + E_1 \quad \text{and} \quad X_2 = T_X + E_2$$
We assume that the E1 and E2 measurement errors are uncorrelated and that there is no change in the true level of smoking between the two times. The revised structural equations now use $T_X$ rather than X, as follows:
$$T_X = \gamma Z + D_X \quad \text{and} \quad Y = \beta T_X + D_Y$$
The corresponding path diagram is shown in the second figure. Note that not all of the model parameters implied by the model in the second figure can be estimated. The model is too complex for the data at hand. The model as a whole is said to be underidentified, but the good news is that we can still estimate $\beta$, the parameter most likely to be of interest to the investigator. Problems of underidentification are beyond the scope of this entry, but are covered by the standard textbooks on structural equation modelling referenced below. GD
Bollen, K. A. 1989: Structural equations with latent variables. New York: John Wiley & Sons, Inc. Byrne, B. M. 1994: Structural equation modeling with EQS and EQS/Windows. Thousand Oaks, CA: Sage Publications. Dunn, G., Everitt, B. S. and Pickles, A. 1993: Modelling covariances and latent variables using EQS. London: Chapman & Hall. Pearl, J. 2000; 2nd edition, 2009: Causality. Cambridge: Cambridge University Press. Permutt, T. and Hebel, J. R. 1989: Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. Biometrics 45, 619-22. Shipley, B. 2000: Cause and correlation in biology. Cambridge: Cambridge University Press. Skrondal, A. and Rabe-Hesketh, S. 2004: Generalized latent variable modeling: multilevel, longitudinal and structural equation models. Boca Raton, FL: Chapman & Hall/CRC.
Student's t-distribution
See t-DISTRIBUTION
Student's t-test
William Sealy Gosset, who worked under the pseudonym of 'Student', developed the Student's t-test. The Student's t-test is commonly referred to merely as the t-test. The simplest use of the t-test is in comparing the MEAN of a sample to some specified population mean; this is usually called the one-sample t-test. The t-test can be modified to compare the means of two independent samples (the two-sample t-test) and for paired data to compare the differences between the pairs (the paired t-test). Student's t-test is a parametric test and certain assumptions are made about the data. These are that the observations within each group (with independent samples) or the differences (with paired samples) are approximately normally distributed and for the two-sample case we also require the two groups to have similar VARIANCES. If the sample data do not meet the assumptions then the analysis is seriously flawed. However, the t-test is 'robust' and is not greatly affected by a moderate failure to meet the assumptions.
The one-sample t-test can be used to compare the mean of a sample to a certain specified value. This value is usually the population mean. The NULL HYPOTHESIS states that there is no significant difference between the sample mean and the population mean and the alternative hypothesis states that there is a significant difference between the sample mean and the population mean. The assumption we make is that the data are a random sample of independent observations from an underlying normal distribution. The test statistic is given by:
$$t = \frac{\text{Sample mean} - \text{Hypothesised mean}}{\text{Standard error of sample mean}}$$
This is compared against the t-DISTRIBUTION with $n - 1$ DEGREES OF FREEDOM, where $n$ is the sample size. So $t$ is the deviation of a normal variable from its hypothesised mean measured in STANDARD ERROR units. The standard error of the sample mean is estimated by STANDARD DEVIATION/$\sqrt{n}$.
For example, suppose BMI values for a sample of 25 people were measured and a mean value of 24.5 was found with a sample standard deviation of 2.5. To test if this sample mean BMI is significantly different from a population mean BMI of 26 we can use the one-sample t-test, where our null hypothesis is that there is no difference between the sample mean of 24.5 and the population mean of 26. This allows us to calculate the test statistic as follows:
$$t = \frac{24.5 - 26}{2.5/\sqrt{25}} = -3.0$$
Using the t-distribution with $(n - 1) = 24$ degrees of freedom, we find a P-VALUE of 0.0062. The result is statistically significant and we therefore accept the alternative hypothesis that the mean BMI of the sample is significantly different from 26.
We can use the two-sample t-test to determine the statistical significance of an observed difference between the mean values of some variable between two subgroups or between separate populations. For example, we could look at the differences in heights between males and females. The test statistic for the two-sample t-test is given by:
$$t = \frac{\text{Difference in sample means} - \text{Difference in hypothesised means}}{\text{Standard error of the difference in the two sample means}}$$
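The one-sample BMI calculation above can be checked with a few lines of code. The sketch below uses only the Python standard library; the function names are illustrative, and the P-value is obtained by numerically integrating the t density with Simpson's rule rather than by calling a statistical package, so it should be read as an approximation under that assumption.

```python
import math

def one_sample_t(sample_mean, hypothesised_mean, sample_sd, n):
    """Return the one-sample t statistic and its degrees of freedom."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error, n - 1

def t_density(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """Approximate two-sided P-value by Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = total * h / 3          # P(0 < T < |t|)
    return 1.0 - 2.0 * central_area

# Values taken from the BMI example in the text.
t_stat, df = one_sample_t(sample_mean=24.5, hypothesised_mean=26.0, sample_sd=2.5, n=25)
p_value = two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, P = {p_value:.4f}")
```

In practice a statistical package would normally be used for the P-value; the explicit integration is shown only to keep the sketch self-contained.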
Frequently the null hypothesis of interest is whether the two groups have equal means and the corresponding two-sided alternative hypothesis is that the means are in fact different. For example, when comparing the mean outcome for two different treatments, is the difference in means observed a statistically significant one? In this case the test statistic reduces to:
$$t = \frac{\text{Difference in the two sample means}}{\text{Standard error of the difference in the two sample means}}$$
This is then compared to the t-distribution with $n_1 + n_2 - 2$ degrees of freedom, where $n_1$ is the sample size for the first group and $n_2$ is the sample size for the second group. The standard error of the difference in the two sample means is given by:
$$\mathrm{SE}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
and $s_1$ and $s_2$ are the standard deviations for groups one and two respectively.
For the paired t-test, the data are dependent, i.e. there is a one-to-one correspondence between the values in the two samples. Paired data can occur from two measurements on the same person, e.g. before and after treatment, or the same subject measured at different times. It is incorrect to analyse paired data ignoring the pairing in such circumstances, as important information is lost. Some factors you do not control in the experiment will affect the before and the after measurements equally, so they will not affect the difference between before and after. By looking only at the differences, a paired t-test corrects for these factors. The paired t-test usually tests the null hypothesis that the population mean of the paired differences of the two samples is zero. We assume that the paired differences are independent. To perform the paired t-test we calculate the difference between each set of pairs and then perform a one-sample t-test on the differences with the null hypothesis that the population mean of the differences is equal to zero. More details can be found in Altman (1991). MMB
Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.
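The two-sample and paired calculations described in the Student's t-test entry above can be sketched as follows. The before and after measurements are hypothetical numbers made up for illustration, and the function names are not from any particular package. The same data are analysed twice, once (incorrectly) as two independent samples and once as pairs, to show how ignoring the pairing can lose information.

```python
import math
from statistics import mean, stdev

def pooled_two_sample_t(sample1, sample2):
    """Two-sample t statistic using the pooled standard error; returns (t, df)."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance) * math.sqrt(1 / n1 + 1 / n2)
    return (mean(sample1) - mean(sample2)) / standard_error, n1 + n2 - 2

def paired_t(first, second):
    """Paired t statistic: a one-sample t-test on the within-pair differences."""
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    return mean(differences) / standard_error, n - 1

# Hypothetical measurements on the same five subjects before and after treatment.
before = [10, 12, 9, 11, 13]
after = [11, 14, 10, 12, 15]

t_unpaired, df_unpaired = pooled_two_sample_t(after, before)
t_paired, df_paired = paired_t(before, after)
print(f"ignoring pairing: t = {t_unpaired:.2f} on {df_unpaired} df")
print(f"paired analysis:  t = {t_paired:.2f} on {df_paired} df")
```

With these made-up numbers the between-subject variation is large relative to the consistent within-subject change, so the paired statistic is much larger than the one obtained by ignoring the pairing.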
subgroup analysis
This form of analysis is often employed in CLINICAL TRIALS in an attempt to identify particular subgroups of patients for whom a treatment works better (or worse) than for the overall patient population. For example, does a treatment work better for men than for women? Such a question is a natural one for clinicians to ask since they do not treat 'average' patients and, when confronted with a female patient with a certain condition, would like to know whether the accepted treatment for the condition works, say, less well for women. Assessing whether the effect of treatment varies according to the value of one or more patient characteristics is relatively straightforward from a statistical viewpoint, involving nothing more than fitting a treatment by covariate interaction. However, many statisticians would caution against such analyses and, if undertaken at all, suggest that they are interpreted extremely cautiously in the spirit of 'exploration' rather than anything more formal.
The reasons for such caution are not difficult to identify. First, trials can rarely provide sufficient POWER to detect such subgroup interaction effects; clinical trials accrue sufficient participants to provide adequate precision for estimating quantities of primary interest, usually overall treatment effects. Confining attention to subgroups almost always results in estimates of inadequate precision. A trial just large enough to evaluate an overall treatment effect reliably will almost inevitably lack precision for evaluating differential treatment effects between different population subgroups. Second, randomisation ensures that the overall treatment groups in a clinical trial are likely to be comparable. Subgroups may not enjoy the same degree of balance in patient characteristics. Finally, there are often many possible prognostic factors in the baseline data, e.g. age, gender, race, type or stage of disease, from which to form subgroups, so that analyses may quickly degenerate into 'data dredging', from which arises the potential for post hoc emphasis on the subgroup analysis giving results of most interest to the investigator, with emphasis given to results deemed 'statistically significant' contributing, in turn, to a preponderance of 'p < 0.05' results published in the medical literature (an excess of false positive findings, therefore). Other potential dangers of subgroup analysis can be found in detail in Pocock et al. (2002). SSE
Pocock, S. J., Assmann, S. E., Enos, L. E. and Kasten, L. E. 2002: Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Statistics in Medicine 21, 2917-30.
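Two of the cautions raised in the subgroup analysis entry above, loss of precision and multiplicity, can be made concrete with some simple arithmetic. The sketch below assumes a two-arm trial with equal arm sizes, a common outcome standard deviation, two equally sized subgroups and independent significance tests; the specific numbers are hypothetical and only illustrate the orders of magnitude involved.

```python
import math

def se_difference_in_means(sd, n_per_arm):
    """Standard error of a difference in means between two arms of equal size."""
    return sd * math.sqrt(2.0 / n_per_arm)

# Hypothetical trial: common standard deviation 1.0 and 200 patients per arm.
sd = 1.0
n_per_arm = 200

# Precision of the overall treatment effect.
se_overall = se_difference_in_means(sd, n_per_arm)

# Each arm is split into two equal subgroups (e.g. men and women).
se_within_subgroup = se_difference_in_means(sd, n_per_arm // 2)

# The interaction is the difference between the two independent subgroup effects.
se_interaction = math.sqrt(2.0) * se_within_subgroup

def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more 'significant' results among independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

print(f"SE of overall effect: {se_overall:.3f}")
print(f"SE of interaction:    {se_interaction:.3f}")
print(f"ratio:                {se_interaction / se_overall:.1f}")
print(f"P(any false positive in 10 tests): {prob_at_least_one_false_positive(10):.2f}")
```

Under these assumptions the interaction is estimated with twice the standard error of the overall effect, which is why a trial sized for the overall comparison has little power for subgroup differences.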
sufficient-component cause model
See CAUSAL MODELS
summary measure analysis
This is a relatively straightforward approach to the analysis of LONGITUDINAL DATA, in which the repeated measurements of a response variable made on each individual in the study are reduced in some way to a single number that is considered to capture an essential feature of the response over time. In this way, the multivariate nature of the repeated observations is transformed to a univariate measure. The approach has been in use for many years - see, for example, Oldham (1962) and Matthews et al. (1989). The most important consideration when applying a summary measure analysis is the choice of a suitable summary measure, a choice that needs to be made before any data are collected. The measure chosen needs to be relevant to the particular questions of interest in the study and in the broader scientific context in which the study takes place. A wide range of summary measures has been proposed, as shown in the first table. According to Frison and Pocock (1992), the average response over time is often likely to be the most relevant, particularly in CLINICAL TRIALS.
Having chosen a suitable summary measure, analysis will involve nothing more complicated than the application of Student's t-test or calculation of a CONFIDENCE INTERVAL for the group difference when two groups are being compared or a one-way ANALYSIS OF VARIANCE when there are more than two groups. If considered more appropriate because of the distributional properties of the selected summary measure, then analogous NONPARAMETRIC METHODS might be used. The summary measure approach can be illustrated using the data shown in the second table, which come from a study of alcohol dependence. Two groups of subjects, one with severe dependence and one with moderate dependence on alcohol, had their salsolinol excretion levels (in millimoles) recorded on four consecutive days.
summary measure analysis  Salsolinol excretion data

Subject    Day 1    Day 2    Day 3    Day 4
Group 1 (moderate dependence)
1          0.33     0.70     2.33     3.20
2          5.30     0.90     1.80     0.70
3          2.50     2.10     1.12     1.01
4          0.98     0.32     3.91     0.66
5          0.39     0.69     0.73     3.86
6          0.31     6.34     0.63     3.86
Group 2 (severe dependence)
7          0.64     0.70     1.00     1.40
8          0.73     1.85     3.60     2.60
9          0.70     4.20     7.30     5.40
10         0.40     1.60     1.40     7.10
11         2.50     1.30     0.70     0.70
12         7.10     1.20     2.60     1.80
13         1.90     1.30     4.40     2.80
14         0.50     0.40     1.10     8.10
summary measure analysis  Possible summary measures (from Matthews et al., 1989)

Type of data | Question of interest | Summary measure
Peaked | Is overall value of outcome variable the same in different groups? | Overall mean (equal time intervals) or area under curve (unequal intervals)
Peaked | Is maximum (minimum) response different between groups? | Maximum (minimum) value
Peaked | Is time to maximum (minimum) response different between groups? | Time to maximum (minimum) response
Growth | Is rate of change of outcome different between groups? | Regression coefficient
Growth | Is eventual value of outcome different between groups? | Final value of outcome or difference between last and first values or percentage change between first and last values
Growth | Is response in one group delayed relative to the other? | Time to reach a particular value (e.g. a fixed percentage of baseline)
Using the mean of the four measurements available for each subject as the summary measure leads to the results shown in the third table. There is no evidence of a group difference in salsolinol excretion levels.
summary measure analysis  Results from using the mean as a summary measure for the data in the second table

          Moderate    Severe
Mean      1.80        2.49
sd        0.60        1.09
n         6           8
t = -1.40, df = 12, p = 0.19
95% CI: [-1.77, 0.39]
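The two steps of a summary measure analysis, reducing each subject's repeated measurements to one number and then comparing groups on that number, can be sketched as below. The first part uses the four daily values for subject 1 from the salsolinol table; the second part works from the rounded group means, standard deviations and sample sizes in the results table, so it reproduces the reported t statistic and confidence interval only up to rounding. The function names are illustrative, and the critical value 2.179 is the tabulated 97.5% point of the t-distribution with 12 degrees of freedom.

```python
import math

def summarise(measurements):
    """Reduce one subject's repeated measurements to candidate summary measures."""
    return {
        "mean": sum(measurements) / len(measurements),
        "maximum": max(measurements),
    }

# Subject 1 (moderate dependence), days 1 to 4, from the salsolinol table.
subject_1 = [0.33, 0.70, 2.33, 3.20]
subject_1_summary = summarise(subject_1)
print(subject_1_summary)

def compare_groups(mean1, sd1, n1, mean2, sd2, n2, t_critical):
    """Pooled two-sample t statistic, degrees of freedom and confidence interval."""
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    difference = mean1 - mean2
    interval = (difference - t_critical * standard_error,
                difference + t_critical * standard_error)
    return difference / standard_error, n1 + n2 - 2, interval

# Rounded summary statistics of the per-subject means (moderate vs severe).
T_CRITICAL_12_DF = 2.179   # 97.5% point of t with 12 degrees of freedom (from tables)
t_stat, df, ci = compare_groups(1.80, 0.60, 6, 2.49, 1.09, 8, T_CRITICAL_12_DF)
print(f"t = {t_stat:.2f}, df = {df}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Because the inputs are already rounded to two decimal places, the t statistic computed this way can differ slightly in the second decimal place from the value obtained from the unrounded subject means.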
A possible alternative to the use of the MEAN as a summary measure is to use the maximum excretion rate recorded over the four days. Applying the WILCOXON RANK SUM TEST to this summary measure results in a test statistic of 36 and associated P-VALUE of 0.21. The summary measure approach to the analysis of longitudinal data can accommodate missing data but the implicit assumption is that these are missing completely at random (see DROPOUTS). SSE
(See also AREA UNDER CURVE)
Frison, L. and Pocock, S. J. 1992: Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design. Statistics in Medicine 11, 1685-704. Matthews, J. N. S., Altman, D. G., Campbell, M. J. and Royston, P. 1989: Analysis of serial measurements in medical research. British Medical Journal 300, 230-5. Oldham, P. D. 1962: A note on the analysis of repeated measurements of the same subjects. Journal of Chronic Disorders 15, 969-77.
support vector machines
These are algorithms for learning complex classification and regression functions, belonging to the general family of 'kernel methods' discussed later. Their computational and statistical efficiency recently made them one of the tools of choice in certain biological DATA MINING applications.
Support vector machines (SVMs) work by embedding the data into a feature space by means of kernel functions (the so-called 'kernel trick'). In the binary classification case, a hyperplane that separates the two classes is sought in this feature space. New data points will be classified into one of the two classes according to their position with respect to this hyperplane. SVMs owe their name to their property of isolating a (often small) subset of data points called 'support vectors', which have interesting theoretical properties.
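The 'kernel trick' mentioned above can be illustrated with a small numerical check. The sketch below uses a degree-2 polynomial kernel on two-dimensional inputs as an example (chosen for illustration, not taken from the entry) and verifies that evaluating the kernel in the original space gives the same number as an ordinary inner product after an explicit embedding into a three-dimensional feature space.

```python
import math

def polynomial_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit embedding into the 3-D feature space induced by that kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def inner_product(a, b):
    """Ordinary inner product of two equal-length tuples."""
    return sum(u * v for u, v in zip(a, b))

x = (1.0, 2.0)
z = (3.0, -1.0)

kernel_value = polynomial_kernel(x, z)
explicit_value = inner_product(feature_map(x), feature_map(z))
print(kernel_value, explicit_value)
```

The point of the trick is that an algorithm written purely in terms of inner products never needs to construct the feature vectors; for kernels whose feature space is very high dimensional, only the left-hand computation is practical.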
The SVM approach has several important virtues when compared with earlier approaches: the choice of the hyperplane is founded on statistical arguments; the hyperplane can be found by solving a convex (quadratic) optimisation problem, which means that training an SVM is not subject to local minima; when a nonlinear kernel function is used, the hyperplane in the feature space can correspond to a complex (nonlinear) decision boundary in the original data domain. Even more interestingly, kernel functions can be defined not only on vectorial data but on virtually any kind of data, making it possible to classify strings, images, trees or nodes in a graph; the classification of unseen data points is generally computationally cheap and depends on the number of support vectors. First introduced in 1992, support vector machines are now one of the standard tools in PATTERN RECOGNITION applications, mostly due to their computational efficiency and statistical stability. In recent years extensions of this algorithm to deal with a number of important data analysis tasks have been proposed, resulting in the general family of 'kernel methods' (Shawe-Taylor and Cristianini, 2004) (see DENSITY ESTIMATION).
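To see why classifying an unseen point depends on the number of support vectors, it may help to write out the usual form of the decision function, the sign of a weighted sum of kernel evaluations between the new point and each support vector plus an offset. In the sketch below the support vectors, their labels and weights, the offset and the kernel width are all hypothetical values typed in by hand, not the result of training on any data.

```python
import math

def rbf_kernel(a, b, width=1.0):
    """Gaussian (radial basis function) kernel between two points."""
    squared_distance = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-squared_distance / (2.0 * width ** 2))

# Hypothetical, hand-picked values (not trained): (support vector, label, weight).
SUPPORT_VECTORS = [
    ((0.0, 0.0), -1, 1.0),
    ((2.0, 2.0), +1, 1.0),
]
OFFSET = 0.0

def decision_value(point):
    """Weighted sum of kernel evaluations against each support vector, plus offset."""
    total = sum(weight * label * rbf_kernel(vector, point)
                for vector, label, weight in SUPPORT_VECTORS)
    return total + OFFSET

def classify(point):
    """Assign the class according to the side of the hyperplane in feature space."""
    return 1 if decision_value(point) >= 0 else -1

print(classify((1.8, 1.9)), classify((0.1, 0.2)))
```

Each classification costs one kernel evaluation per support vector, regardless of how many training points were originally available.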
The kinds of relation detected by kernel methods include classifications, regressions, clustering (see CLUSTER ANALYSIS IN MEDICINE), principal components (see PRINCIPAL COMPONENT ANALYSIS), canonical correlations (see CANONICAL CORRELATION ANALYSIS) and many others. In the same way as with SVMs, the kernel trick allows these methods to be applied in a feature space that is induced by this kernel, making kernel methods applicable to virtually any kind of data. Elegantly, the development of kernel methods can always be decomposed into two modular steps: the kernel design, on the one hand, and the choice of the algorithm, on the other hand. The kernel design part implicitly defines the feature space, which should contain all available information that is relevant for the problem at hand. The choice of the algorithm (which needs to be written in terms of kernels) can be done independently from the kernel design. As with SVMs, most kernel methods reduce their training phase to optimising a convex cost function or to solving a simple eigenvalue problem, hence avoiding one of the main computational pitfalls of NEURAL NETWORKS. However, since they often implicitly make use of very high dimensional spaces, kernel methods run the risk of overfitting. For this reason, their design needs to incorporate principles of statistical learning theory, which help to identify the crucial parameters that need to be controlled in order to avoid this risk (see Vapnik, 1995). For further reference on SVMs, see Cristianini and Shawe-Taylor (2000). NC/TDB
Cristianini, N. and Shawe-Taylor, J. 2000: An introduction to support vector machines. Cambridge: Cambridge University Press (www.support-vector.net). Shawe-Taylor, J. and Cristianini, N. 2004: Kernel methods for pattern analysis. Cambridge: Cambridge University Press (www.kernel-methods.net). Vapnik, V. 1995: The nature of statistical learning theory. New York: Springer.
surrogate endpoints
These are ENDPOINTS that can replace a clinical endpoint for the purpose of assessing the effects of new treatments earlier, at lower cost, or with greater statistical SENSITIVITY. Surrogate endpoints can include measurements of a biomarker, defined as 'a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention' (Biomarkers Definitions Working Group, 2001). Use of a biomarker as a surrogate endpoint can also be useful if the final endpoint measurement is unduly invasive or uncomfortable. For an endpoint to be a surrogate for a clinical endpoint it must be a measure of disease such that: (a) the size (or frequency) correlates strongly with that clinical endpoint (e.g. blood pressure is positively correlated with the risk of stroke) and (b) treatments producing a change in the surrogate endpoint also modify the risk of that particular clinical endpoint (e.g. reducing blood pressure reduces the risk of stroke).
Surrogate endpoints are routinely used in early drug development, where interest focuses on showing that new treatments have enough activity to warrant further research. In confirmatory PHASE III TRIALS, however, interest focuses on showing that new treatments have the anticipated clinical benefits, and in such situations surrogate endpoints can only be used if they have undergone rigorous statistical evaluation (or 'validation') (Burzykowski, Molenberghs and Buyse, 2005). Indeed, some promising surrogate endpoints have proven to be unreliable predictors of clinical benefits. For example, cardiac arrhythmia was believed to be a good surrogate endpoint for mortality after an acute heart attack, since in these circumstances patients with a higher risk of such an arrhythmia have a greater risk of death. However, several drugs (e.g. lignocaine, flecainide) that prevent arrhythmias after a heart attack actually increase mortality (Echt et al., 1991). Similarly, some blood pressure-lowering drugs (such as angiotensin-converting enzyme inhibitors) have much larger effects on vascular mortality than might be predicted from their effects on blood pressure (Heart Outcomes Prevention Evaluation Study Investigators, 2000). In contrast, disease-free survival has recently been validated as an acceptable surrogate for overall survival in patients with colorectal cancer treated with fluoropyrimidines (Sargent et al., 2005).
Prentice (1989) proposed a definition and operational criteria for the validation of surrogate endpoints. Although the strict criteria proposed by Prentice seem too stringent to ever be met in practice, his landmark paper sparked interest in developing statistical methods that could be used to show that a surrogate is acceptable (or 'validated') for the purposes of assessing a specific class of treatments in a specific disease setting. One approach consists of using a MULTILEVEL MODEL to show that the surrogate endpoint predicts the true endpoint ('individual-level' surrogacy), and that the effects of a treatment on the surrogate endpoint predict the effects of the treatment on the true endpoint ('trial-level' surrogacy) (Buyse et al., 2000). The latter condition requires data to be available from several units, usually from a META-ANALYSIS of several trials. Another approach consists of using a CAUSAL MODEL to compare the causal effect of treatment on the true endpoint in patients for whom treatment does, and does not, affect the surrogate. See Weir and Walley (2006) for a review of the terminology and surrogate validation models. CB/MB
Biomarkers Definitions Working Group 2001: Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clinical Pharmacology and Therapeutics 69, 89-95. Burzykowski, T., Molenberghs, G. and Buyse, M. (eds) 2005: Evaluation of surrogate endpoints. Springer. Buyse, M., Molenberghs, G., Burzykowski, T., Renard, D. and Geys, H. 2000: The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 1, 49-67. Echt, D. S., Liebson, P. R., Mitchell, L. B. et al. and the Cardiac Arrhythmia Suppression Trial (CAST) Investigators 1991: Mortality and morbidity in patients receiving encainide, flecainide, or placebo: the cardiac arrhythmia suppression trial. New England Journal of Medicine 324, 781-8. Heart Outcomes Prevention Evaluation Study Investigators 2000: Effects of an angiotensin-converting enzyme inhibitor, ramipril, on death from cardiovascular causes, myocardial infarction, and stroke in high-risk patients. New England Journal of Medicine 342, 145-53. Prentice, R. L. 1989: Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431-40. Sargent, D., Wieand, S., Haller, D. G. et al. 2005: Disease-free survival (DFS) vs overall survival (OS) as a primary endpoint for adjuvant colon cancer studies: individual patient data from 20,898 patients on 18 randomized trials. Journal of Clinical Oncology 23, 8664-70. Weir, C. J. and Walley, R. J. 2006: Statistical evaluation of biomarkers as surrogate endpoints: a literature review. Statistics in Medicine 25, 183-203.
survival analysis - an overview
This covers methods for the analysis of time-to-event data, e.g. survival times. Survival data occur when the outcome of interest is the time from a well-defined time origin to the occurrence of a particular event or ENDPOINT. If the endpoint is the death of a patient the resulting data are, literally, survival times. However, other endpoints are possible, e.g. the time to relief or recurrence of symptoms. Such observations are often referred to as time-to-event data although survival data is commonly used as a generic term. Standard statistical methodology is not usually appropriate for such data, for two main reasons. First, the distribution of survival time in general is likely to display positive SKEWNESS and so assuming normality for an analysis (as done, for example, by a t-TEST or a regression) is probably not reasonable.
Second, more critical than doubts about normality, however, is the presence of censored observations, where the survival time of an individual is referred to as censored when the endpoint of interest has not yet been reached (more precisely, right censored). For true survival times this might be because the data from a study are analysed at a time point when some participants are still alive. Another reason for censored event times is that an individual might have been lost to follow-up for reasons unrelated to the event of interest, e.g. due to moving to a location that cannot be traced or due to accidental death (see DROPOUTS). When censoring occurs all that is known is that the actual, but unknown, survival time is larger than the censored survival time. Specialised statistical techniques developed to analyse such censored and possibly skewed outcomes are known as survival analysis. An important assumption made in standard survival analysis is that the censoring is noninformative, i.e. that the actual survival time of an individual is independent of any mechanism that causes that individual's survival time to be censored. For simplicity, this description also concentrates on techniques for continuous survival times - the analysis of discrete survival times is described in Collett (2003).

As an example, consider data that arise from a double-blind, randomised controlled clinical trial (RCT) (see CLINICAL TRIALS) to compare treatments for prostate cancer (placebo versus 1.0 mg of diethylstilbestrol (DES) administered daily by mouth). The full dataset is given in Andrews and Herzberg (1985) and the first table shows the first seven of a subset of 38 patients used here and discussed in Collett (2003). In this study, the time of origin was the date on which a cancer sufferer was randomised to a treatment and the endpoint is the death of a patient from prostate cancer. The survival times of patients who died from other causes or were lost during the follow-up process are regarded as right censored. The 'status' variable in the first table takes the value unity if the patient has died from prostate cancer and zero if the survival time is censored. In addition to survival times, a number of prognostic factors were recorded, namely the age of the patient at trial entry, their serum haemoglobin level in gm/100 ml, the size of their primary tumour in cm² and the value of a combined index of tumour stage and grade (the Gleason index with larger values indicating more advanced tumours). The main aim of this study was to compare the survival experience between the two treatment groups.

In general, to describe survival two functions of time are of central interest - the survival function and the hazard function. These are described in some detail next. The survival function S(t) is defined as the probability that an individual's survival time, T, is greater than or equal to time t, i.e.:
$$S(t) = \mathrm{Prob}(T \ge t)$$

The graph of S(t) against t is known as the survival curve. The survival curve can be thought of as a particular way of displaying the frequency distribution of the event times, rather than by, say, a HISTOGRAM. When there are no censored observations in the sample of survival times, the survival function can be estimated by the empirical survivor function:

$$\hat{S}(t) = \frac{\text{Number of individuals with survival times} \ge t}{\text{Number of individuals in the data set}}$$

Since every subject is 'alive' at the beginning of the study and no one is observed to survive longer than the largest of the observed survival times, $t_{\max}$, then:

$$\hat{S}(0) = 1 \quad \text{and} \quad \hat{S}(t) = 0 \ \text{for} \ t > t_{\max}$$

Furthermore, the estimated survivor function is assumed constant between two adjacent death times, so that a plot of Ŝ(t) against t is a step function that decreases immediately after each 'death'. This simple method cannot be used when there are censored observations since the method does not allow for information provided by an individual whose survival time is censored before time t to be used in the computing of the
survival analysis - an overview  Survival times of prostate cancer patients

Patient  Treatment      Survival  Status         Age      Serum haem.  Size of  Gleason
number   (1 = placebo,  time      (1 = died,     (years)  (gm/100 ml)  tumour   index
         2 = DES)       (months)  0 = censored)                        (cm²)
1        1              65        0              67       13.4         34        8
2        2              61        0              60       14.6          4       10
3        2              60        0              77       15.6          3        8
4        1              58        0              64       16.2          6        9
5        2              51        0              65       14.1         21        9
6        1              51        0              61       13.5          8        8
7        1              14        1              73       12.4         18       11
estimate at t. The most commonly used method for estimating the survival function for survival data containing censored observations is the product-limit or KAPLAN-MEIER ESTIMATOR. The essence of this approach is the use of a product of a series of conditional probabilities. One alternative estimator for censored survival times, derived differently but in practice often similar, is the Nelson-Aalen estimator. Approximate STANDARD ERRORS and pointwise symmetric or asymmetric CONFIDENCE INTERVALS for the survivor function at a given time can be derived to determine the precision of the estimator - details are given in Collett (2003).

The Kaplan-Meier estimators of the survivor curves for the two prostate cancer treatments are shown graphically in the figure. The survivor curves are step functions that decrease at the time points when participants died of the cancer. The censored observations in the data are indicated by the 'cross' marks on the curves. In our patient sample there is approximately a difference of 20% in the proportion surviving for at least 50 to 60 months between the treatment groups.

Since the distribution of survival times tends to be positively skewed the MEDIAN is the preferred summary measure of location. The median survival time is the time beyond which 50% of the individuals in the population under study are expected to survive and, once the survivor function has been estimated by Ŝ(t), can be estimated by the smallest observed survival time, $t_{50}$, for which the value of the estimated survivor function is less than 0.5. The estimated median survival time can be read from the survival curve by finding the smallest value on the x axis for which the survival proportion reaches less than 0.5. The figure shows that the median survival in the placebo group can be estimated as 69 months while an estimate for the DES group is not available since survival exceeds 50% throughout the study period. A similar procedure can be used to estimate other percentiles of the distribution of the survival times and approximate confidence intervals can be found once the variance of the estimated percentile has been derived from the VARIANCE of the estimator of the survivor function.
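The product of conditional probabilities and the median rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names and the eight follow-up times are hypothetical and are not taken from the prostate cancer trial.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate: list of (death time, estimate just after that time)."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    curve, surv = [], 1.0
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        surv *= 1.0 - deaths[t] / at_risk          # conditional probability of surviving t
        curve.append((t, surv))
    return curve


def median_survival(curve):
    """Smallest death time at which the estimate falls below 0.5, else None."""
    for t, s in curve:
        if s < 0.5:
            return t
    return None  # survival stays at or above 50% throughout follow-up


# Hypothetical follow-up times in months; event 1 = died, 0 = censored.
toy_times = [2, 3, 3, 5, 7, 8, 9, 12]
toy_events = [1, 1, 0, 1, 0, 1, 1, 0]

km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"t = {t:2d}  S(t) = {s:.3f}")
print("median survival:", median_survival(km_curve))
```

Censored individuals never trigger a step, but they remain in the risk set until their censoring time, which is how their partial information enters the estimate. A return value of None corresponds to the DES group above, where no median can be estimated.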
In the analysis of survival data, it is often of some interest to assess which periods have the highest and which the lowest chance of death (or whatever the event of interest happens to be) among those people alive at the time. The appropriate quantity for such risks is the hazard function, h(t), defined as the (scaled) PROBABILITY that an individual experiences an event in a small time interval δt, given that the individual has survived up to the beginning of the interval. The hazard function therefore represents the instantaneous death rate for an individual surviving to time t. It is a measure of how likely an individual is to experience an event as a function of the age of the individual. The hazard function may remain constant, increase or decrease with time or take some more complex form. The hazard function of death in human beings, for example, has a 'bathtub' shape. It is relatively high immediately after birth, declines rapidly in the early years and then remains relatively constant until beginning to rise during late middle age. A Kaplan-Meier type estimator of the hazard function is given by the proportion of individuals experiencing an event in an interval per unit time, given that they have survived to the beginning of the interval. However, the estimated hazard function is generally considered 'too noisy' for practical use. Instead, the cumulative or integrated hazard function, which is derived from the hazard function by summation, is usually displayed to describe the change in hazard over time.
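One common way of carrying out that summation is the Nelson-Aalen form, in which the proportion dying among those at risk is added up over the observed death times. The sketch below assumes this form; the function name and the toy data are hypothetical.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Cumulative hazard: running sum of (deaths / number at risk) over death times."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    cumulative, steps = 0.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # individuals still followed at t
        cumulative += deaths[t] / at_risk          # increment in hazard at t
        steps.append((t, cumulative))
    return steps


# Hypothetical follow-up times in months; event 1 = died, 0 = censored.
toy_times = [2, 3, 3, 5, 7, 8, 9, 12]
toy_events = [1, 1, 0, 1, 0, 1, 1, 0]

cum_hazard = nelson_aalen(toy_times, toy_events)
for t, h in cum_hazard:
    print(f"t = {t:2d}  H(t) = {h:.3f}")
```

A roughly straight cumulative hazard plot suggests a constant hazard, while upward or downward curvature suggests a hazard that rises or falls with time.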
[Figure] survival analysis - an overview  Display of Kaplan-Meier survivor curves (vertical axis from 0.0 to 1.0; horizontal axis: time (months), 0 to 80)
In addition to comparing survivor functions graphically, a more formal statistical test for a group difference is often required in order to compare survival times analytically. In the absence of censoring, a nonparametric test such as the Mann-Whitney test could be used (see MANN-WHITNEY RANK SUM TEST). In the presence of censoring, the log-rank or Mantel-Haenszel test is the most commonly used nonparametric test (see MANTEL-HAENSZEL METHODS). It tests the NULL HYPOTHESIS that the population survival functions $S_1(t), S_2(t), \ldots, S_k(t)$ are the same in k groups. Briefly, the test is based on computing the expected number of deaths for each observed 'death' time in the dataset, assuming that the chances of dying, given that subjects are at risk, are the same in the groups. The total number of expected deaths is then computed for each group by adding the expected number of deaths for each failure time. The test finally compares the observed number of deaths in each group with the expected number of deaths using a CHI-SQUARE TEST with k - 1 DEGREES OF FREEDOM (see Hosmer and Lemeshow, 1999). The log-rank test statistic weights contributions from all failure times equally. Several alternative test statistics have been proposed that give differential weights to the failure times. For example, the generalised Wilcoxon test (or Breslow test) uses weights equal to the number at risk. For the prostate cancer data in the first table the log-rank test (χ² = 4.4 on 1 degree of freedom, P = 0.036) detects a significant group difference in favour of longer survival on DES treatment while the Wilcoxon test, which puts relatively more weight on differences between the survival curves at earlier times, fails to reach significance at the 5% test level (χ² = 3.4 on 1 degree of freedom, P = 0.065).

Modelling survival times is useful especially when there are several explanatory variables to consider. For example, in the prostate cancer trial patients were randomised to treatment groups so that the theoretical distributions of the prognostic factors were the same in the two groups. However, empirical distributions in the patient sample might still vary and if the prognostic variables are related to survival they might confound the group difference. A survival analysis that 'adjusts' the group difference for the prognostic factor(s) is needed. The main approaches used for modelling the effects of covariates on survival can be divided roughly into two classes - models based on assuming proportional hazards and models for direct effects on the survival times.

The main technique used for modelling survival times is due to Cox (1972) and is known as the PROPORTIONAL HAZARDS model or, more simply, Cox's regression (see COX'S REGRESSION MODEL). In essence, the technique acts as the analogue of multiple regression for survival times containing censored observations, for which multiple regression itself is clearly not suitable. Briefly, the procedure
models the hazard function and central to it is the assumption that the hazard functions for two individuals at any point in time are proportional, the so-called proportional hazards assumption. In other words, if an individual has a risk of 'death' at some initial time point that is twice as high as another individual, then at all later times the risk of death remains twice as high. Cox's model is made up of an unspecified baseline hazard function, $h_0(t)$, which is then multiplied by a suitable function of an individual's explanatory variable values, to give the individual's hazard function. The interpretation of the regression parameter of the ith covariate, $\beta_i$, is that $\exp(\beta_i)$ gives the hazard or INCIDENCE rate change associated with an increase of one unit in the ith covariate, all other explanatory variables remaining constant.

Cox's regression is considered a semi-parametric procedure because the baseline hazard function, $h_0(t)$, and by implication the PROBABILITY DISTRIBUTION of the survival times, does not have to be specified. The baseline hazard is left unspecified: a different parameter is essentially included for each unique survival time. These parameters can be thought of as NUISANCE PARAMETERS whose purpose is merely to control the parameters of interest for any changes in the hazard over time.

Cox's regression can be used to model the prostate cancer survival data. To start with, a model containing only the single treatment factor is fitted. The estimated regression coefficient of a DES indicator variable is -1.98 with a standard error of 1.1. This translates into an (unadjusted) hazard ratio of exp(-1.98) = 0.138. In other words, DES treatment is estimated to reduce the hazard of immediate death by 86.2% relative to PLACEBO treatment. According to a LIKELIHOOD RATIO (LR) test, the unadjusted effect of DES is statistically significant at the 5% level (χ² = 4.55 on 1 degree of freedom, P = 0.033).

For the prostate cancer data, it is of interest to determine the effect of DES after controlling for the other prognostic variables. Likelihood ratio tests showed that dropping age and serum haemoglobin from a model that contains the treatment indicator variable and all four prognostic variables did not significantly worsen the model fit (at the 10% level); the fit of the final model is shown in the second table. After adjusting for the effects of tumour size and stage the hazard reduction for DES relative to placebo treatment is reduced to 67.1% and is no longer statistically significant (LR test: χ² = 0.48 on 1 degree of freedom, P = 0.49). Both tumour size and Gleason index have a hazard ratio above unity, indicating that increases in tumour size and advanced stages are estimated to increase the chance of death.

Cox's model does not require specification of the probability distribution of the survival times. The hazard function is not restricted to a specific form and as a result the
survival analysis - an overview  Parameter estimates from Cox's regression of survival on treatment group, tumour size and Gleason index

Effect          Regression        Standard     Hazard ratio  95% CI for exp(β)
                coefficient (β̂)   error        (exp(β̂))      Lower limit  Upper limit
DES             -1.113            [illegible]  0.329         0.031        3.47
Tumour size      0.0826           0.048        1.086         0.990        1.19
Gleason index    0.7102           0.338        2.034         1.049        3.95
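The step from a Cox regression coefficient to a hazard ratio and its confidence interval is simple arithmetic, sketched below. The coefficients and standard errors are the values read from the second table for tumour size and Gleason index; the helper name, the dictionary and the use of 1.96 for a Wald-type 95% interval are assumptions made for illustration.

```python
import math

Z_95 = 1.96  # normal quantile assumed for a two-sided 95% interval


def hazard_ratio(coef, se, z=Z_95):
    """Return (hazard ratio, lower limit, upper limit) from a log-hazard coefficient."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# (coefficient, standard error) as read from the second table
fitted = {"Tumour size": (0.0826, 0.048), "Gleason index": (0.7102, 0.338)}
results = {name: hazard_ratio(b, se) for name, (b, se) in fitted.items()}

for name, (hr, low, high) in results.items():
    print(f"{name:14s} HR = {hr:.3f}  95% CI = ({low:.3f}, {high:.3f})")

# Unadjusted DES effect quoted in the text: exp(-1.98)
unadjusted_hr = math.exp(-1.98)
print(f"unadjusted DES hazard ratio = {unadjusted_hr:.3f}, "
      f"hazard reduction = {100 * (1 - unadjusted_hr):.1f}%")
```

Small differences in the last digit relative to the table are to be expected, because the tabulated coefficients and standard errors are themselves rounded.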
survival analysis - an overview  Parameter estimates from log-logistic accelerated failure time model of survival on treatment group, tumour size and Gleason index

Effect          Regression        Standard     Acceleration       95% CI for exp(-α)
                coefficient (α̂)   error        factor (exp(-α̂))   Lower limit  Upper limit
DES              0.628            [illegible]  0.534              0.182        1.568
Tumour size     -0.031            [illegible]  1.031              0.988        1.077
Gleason index   -0.335            0.203        1.398              0.939        2.080

semi-parametric model has considerable flexibility and is widely used. However, if the assumption of a particular probability distribution for the data is valid, inferences based on such an assumption are more precise. For example, estimates of hazard ratios or median survival times will have smaller standard errors.

A fully parametric proportional hazards model makes the same assumptions as Cox's regression but in addition also assumes that the baseline hazard function, $h_0(t)$, can be parameterised according to a specific model for the distribution of the survival times. Survival time distributions that can be used for this purpose, i.e. that have the proportional hazards property, are principally the EXPONENTIAL, Weibull and Gompertz DISTRIBUTIONS. Different distributions imply different shapes of the hazard function, and in practice the distribution that best describes the functional form of the observed hazard function is chosen - for details see Collett (2003).

A family of fully parametric models that accommodate direct multiplicative effects of covariates on survival times and hence do not have to rely on proportional hazards are accelerated failure time models. A wider range of survival time distributions possesses the accelerated failure time
property, principally the exponential, Weibull, log-logistic, generalised GAMMA or LOGNORMAL DISTRIBUTIONS. In addition, this family of parametric models includes distributions (e.g. the log-logistic distribution) that model unimodal hazard functions while all distributions suitable for the proportional hazards model imply hazard functions that increase or decrease monotonically. The latter property might be limiting, for example, for modelling the hazard of dying after a complicated operation that peaks in the postoperative period. The general accelerated failure time model for the effects of p explanatory variables, $x_1, x_2, \ldots, x_p$, can be represented as a log-linear model for survival time, T, namely:

$$\ln(T) = \alpha_0 + \sum_{i=1}^{p} \alpha_i x_i + \text{error}$$

where $\alpha_1, \ldots, \alpha_p$ are the unknown coefficients of the explanatory variables and $\alpha_0$ an intercept parameter. The parameter $\alpha_i$ reflects the effect that the ith covariate has on log-survival time with positive values indicating that the survival time increases with increasing values of the covariate and vice versa. In terms of the original timescale, the
model implies that the explanatory variables measured on an individual act multiplicatively and so affect the speed of progression to the event of interest. The interpretation of the parameter $\alpha_i$ is therefore that $\exp(\alpha_i)$ gives the factor by which any survival time percentile (e.g. the median survival time) changes per unit increase in $x_i$, all other explanatory variables remaining constant. Expressed differently, the probability that an individual with covariate value $x_i + 1$ survives beyond t is equal to the probability that an individual with value $x_i$ survives beyond $\exp(-\alpha_i)t$. Hence $\exp(-\alpha_i)$ determines the change in the speed with which individuals proceed along the timescale, and the coefficient is known as the acceleration factor of the ith covariate. Software packages typically use the log-linear formulation.

The regression coefficients from fitting a log-logistic accelerated failure time model to the prostate cancer survival times using treatment, size of tumour and Gleason index as predictor variables are shown in the third table. The negative regression coefficients suggest that the survival times tend to be shorter for larger values of tumour size and Gleason index. The positive regression coefficient for the DES treatment indicator suggests that survival times tend to be longer for individuals assigned to the active treatment after adjusting for the effects of tumour size and stage. The estimated acceleration factor for an individual in the DES group compared with the placebo group is exp(-0.628) = 0.534, i.e. DES is estimated to slow down the progression of the cancer by a factor of about 2. While possibly clinically relevant, this effect is, however, not statistically significant (LR test: χ² = 1.57 on 1 degree of freedom, P = 0.210).

In summary, survival analysis is a powerful tool for analysing time-to-event data. The classical techniques, Kaplan-Meier estimation, Cox's regression and accelerated failure time modelling, are implemented in most general purpose STATISTICAL PACKAGES, with the S-PLUS package having particularly extensive facilities for fitting and assessing nonstandard Cox models. The area is complex and one of active current research. For more recent advances, such as frailty models to include RANDOM EFFECTS, MULTISTATE MODELS to model different transition rates and models for competing risks, the reader is referred to Andersen (2002), Crowder (2001) and Hougaard (2000).
SL
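As a closing illustration, the log-rank comparison of two groups described in this entry can be sketched as follows. The sketch assumes the usual Mantel-Haenszel form, in which the squared difference between observed and expected deaths in one group is divided by a hypergeometric variance; the function name and the two small samples are hypothetical.

```python
import math


def logrank_two_groups(times1, events1, times2, events2):
    """Return (observed deaths, expected deaths, chi-square, P) for group 1."""
    pooled = zip(times1 + times2, events1 + events2)
    death_times = sorted({t for t, e in pooled if e == 1})
    observed1 = expected1 = variance = 0.0
    for t in death_times:
        n1 = sum(1 for u in times1 if u >= t)  # at risk in group 1
        n2 = sum(1 for u in times2 if u >= t)  # at risk in group 2
        d1 = sum(1 for u, e in zip(times1, events1) if u == t and e == 1)
        d2 = sum(1 for u, e in zip(times2, events2) if u == t and e == 1)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n                # expected deaths under the null hypothesis
        if n > 1:                              # no variance when one subject is left
            variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    chi2 = (observed1 - expected1) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square on 1 df
    return observed1, expected1, chi2, p_value


# Hypothetical survival times; event 1 = died, 0 = censored.
obs1, exp1, chi2, p = logrank_two_groups([1, 3, 5], [1, 1, 1], [2, 4, 6], [1, 0, 1])
print(f"observed = {obs1:.0f}, expected = {exp1:.2f}, chi-square = {chi2:.3f}, P = {p:.3f}")
```

Every death time contributes with equal weight here; multiplying each term by the number at risk at that time would give a Wilcoxon-type variant that emphasises early differences.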
Andersen, P. K. (ed.) 2002: Multistate models. Statistical Methods in Medical Research 11. London: Arnold. Andrews, D. F. and Herzberg, A. M. 1985: Data. New York: Springer. Collett, D. 2003: Modelling survival data in medical research, 2nd edition. London: Chapman & Hall/CRC. Cox, D. R. 1972: Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B 34, 187-220. Crowder, M. J. 2001: Classical competing risks. Boca Raton, FL: Chapman & Hall/CRC. Hosmer, D. W. and Lemeshow, S. 1999: Applied survival analysis. New York: John Wiley & Sons, Inc. Hougaard, P. 2000: Analysis of multivariate survival data. New York: Springer.
survival curve See KAPLAN-MEIER ESTIMATION, SURVIVAL ANALYSIS - AN OVERVIEW

survival function See SURVIVAL ANALYSIS
systematic reviews and meta-analysis This is an approach to the combining of results from the many individual CLINICAL TRIALS of a particular treatment or therapy that may have been carried out over the course of time. Such a procedure is needed because individual trials are rarely large enough to answer the questions we want to answer as reliably as we would like. In practice, most trials are too small for adequate conclusions to be drawn about potentially small advantages of particular therapies. Advocacy of large trials is a natural response to this situation, but it is not always possible to launch very large trials before therapies become widely accepted or rejected prematurely. An alternative possibility is to examine the results from all relevant trials, a process that involves two components, one qualitative, i.e. the extraction of the relevant literature and description of the available trials, in terms of their relevance and methodological strengths and weaknesses (the systematic review), and the other quantitative, i.e. mathematically combining results from different studies, even on occasions when these studies have used different measures to assess outcome. This component is known as a meta-analysis (Normand, 1999).

Informal synthesis of evidence from different studies is, of course, nothing new, but it is now generally accepted that meta-analysis gives the systematic review an objectivity that is inevitably lacking in the classical review article and can also help the process to achieve greater precision and generalisability of findings than any single study. There remain sceptics who feel that the conclusions from a meta-analysis often go far beyond what the technique and the data justify, but despite such concerns, the demand for systematic reviews of healthcare interventions has developed rapidly during the last decade, initiated by the widespread adoption of the principles of EVIDENCE-BASED MEDICINE both among healthcare practitioners and policymakers. Such reviews are now increasingly used as a basis for both individual treatment decisions and the funding of healthcare and healthcare research worldwide. This growth in systematic reviews is reflected in the current state of the COCHRANE COLLABORATION database containing as it does more than 1200 complete systematic reviews, with a further 1000 due to be added soon.

Systematic reviews and the subsequent meta-analysis have a number of aims: to review systematically the available evidence from a particular research area; to provide quantitative summaries of the results from each study; to combine the results across studies if appropriate - such combination of results leads to greater statistical power in estimating treatment effects; to assess the amount of variability between studies; to estimate the degree of benefit associated with a particular study treatment; to identify study characteristics associated with particularly effective treatments.

Ideally, the trials included in a systematic review should be clinically homogeneous. For example, they might all study a similar type of patient for a similar duration with the same treatment in the two arms of each trial. In practice, of course, the trials included are far more likely to differ in some aspects, such as eligibility criteria, duration of treatment, length of follow-up and how ancillary care is used. On occasions, even treatment itself may not be identical in all the trials. This implies that, in most circumstances, the objective of a systematic review cannot be equated with that of a single large trial, even if that trial has wide eligibility. While a single trial focuses on the effect of a specific treatment in specific situations, a meta-analysis aims for a more generalisable conclusion about the effect of a generic treatment policy in a wider range of areas.

When the trials included in a systematic review do differ in some of their components, therapeutic effects may very well be different, but these differences are likely to be in the size of the effects rather than their direction. It would, after all, be extraordinary if treatment effects were exactly the same when estimated from trials in different countries, in different populations, in different age groups or under different treatment regimens. If the studies were big enough it would be possible to measure these differences reliably, but in most cases this will not be possible. However,
meta-analysis allows the investigation of sources of possible heterogeneity in the results from different trials, as we shall see later, and discourages the common, simplistic and often misleading interpretation that the results of individual clinical trials are in conflict because some are labelled 'positive' (i.e. statistically significant) and others 'negative' (i.e. statistically nonsignificant). A systematic approach to synthesising information can often both estimate the degree of benefit from a particular therapy and whether the benefit depends on specific characteristics of the studies.

The selection of studies is the greatest single concern in applying meta-analysis and there are at least three important components of the selection process, namely breadth, quality and representativeness (Pocock, 1996). Breadth relates to the decision as to whether to study a very specific narrow question (e.g. the same drug, disease and setting for studies following a common protocol) or a more generic problem (e.g. a broad class of treatments for a range of conditions in a variety of settings). The broader the meta-analysis, the more difficulty there is in interpreting the combined evidence as regards future policy. Consequently, the broader the meta-analysis, the more it needs to be interpreted qualitatively rather than quantitatively.

Quality and reliability of a systematic review is dependent on the quality of the data in the included studies, although criticisms of meta-analyses for including original studies of questionable quality are typical examples of shooting the messenger who bears bad news. Aspects of quality of the original articles that are pertinent to the reliability of the meta-analysis include a valid RANDOMISATION process (we are assuming that in meta-analysis of clinical trials, only randomised trials will be selected), MINIMISATION of potential BIASES introduced by DROPOUTS, acceptable methods of analysis, level of BLINDING and recording of adequate clinical details. Several attempts have been made to make this aspect of meta-analysis more rigorous by using the results given by applying specially constructed quality assessment scales to assess the candidate trials for inclusion in the analysis. Determining quality would be helped if the results from so many trials were not so poorly reported. In the future, this may be improved by the CONSORT statement (CONSOLIDATED STANDARDS FOR REPORTING TRIALS).

The representativeness of the studies in a systematic review depends largely on having an acceptable search strategy. Once the researcher has established the goals of the systematic review, an ambitious literature search needs to be undertaken, the literature obtained and then summarised. Possible sources of material include the published literature, unpublished literature, uncompleted research reports, work in progress, conference/symposia proceedings, dissertations, expert informants, granting agencies, trial registries, industry and journal hand searching. The search will probably begin by using computerised bibliographic databases of published and unpublished research review articles, for example, MEDLINE. This is clearly a sensible strategy, although there is some evidence of deficiencies in MEDLINE when searching for RANDOMISED CONTROLLED TRIALS.

Ensuring that a meta-analysis is truly representative can be problematic. It has long been known that journal articles are not a representative sample of work addressed to any particular area of research. Research with statistically significant results is potentially more likely to be submitted and published than work with null or nonsignificant results, particularly if the studies are small. The problem is made worse by the fact that many medical studies look at multiple outcomes and there is a tendency for only those outcomes suggesting a significant effect to be mentioned when the study is written up. Outcomes that show no clear treatment effect are often ignored and so will not be included in any later review of studies looking at those particular outcomes. Publication bias is likely to lead to an overrepresentation of positive results.
Clearly it becomes of some importance to assess the likelihood of publication bias in any meta-analysis reported in the literature. A well-known informal method of investigating this potential problem is the so-called FUNNEL PLOT, usually a plot of a measure of a study's precision (e.g. one over the STANDARD ERROR) against effect size. The most precise estimates (e.g. those from the largest studies) will be at the top of the plot and those from less precise or smaller studies at the bottom. The expectation of a 'funnel' shape in the plot relies on two empirical observations. First, the variances of studies in a meta-analysis are not identical, but are distributed in such a way that there are fewer precise studies and rather more imprecise ones and, second, at any fixed level of VARIANCE, studies are symmetrically distributed about the MEAN. Evidence of publication bias is provided by an absence of studies on the left-hand side of the base of the funnel. The assumption is that, whether because of editorial policy or author inaction or some other reason, these studies (which are not statistically significant) are the ones that might not be published. An example of a funnel plot suggesting the possible presence of publication bias is given in the figure (taken from Duval and Tweedie, 2000).

Various proposals have been made as to how to test for publication bias in a systematic review although none of these is wholly satisfactory. The danger of the testing approach is the temptation to assume that, if the test is not significant, there is no problem and the possibility of publication bias can be conveniently ignored. In practice, however, publication bias is very likely endemic to all empirical research and so should be assumed present, whatever the result of some testing procedures with possibly low POWER.

Once the studies for systematic review have been selected and the possible problems of publication bias addressed,
effect sizes and variance estimates are extracted from the selected papers, reports, etc., and subjected to a meta-analysis, in which the aim is to provide a global test of significance for the overall NULL HYPOTHESIS of no effect in all studies and to calculate an estimate and a CONFIDENCE INTERVAL of the overall effect size. Two models are usually considered, one involving FIXED EFFECTS and the other RANDOM EFFECTS (Fleiss, 1993; Sutton et al., 2000). The former assumes that the true effect is the same for all studies whereas the latter assumes that individual studies have different effect sizes that vary randomly around the overall mean effect size. Thus the random effects model specifically allows for the existence of both between-study heterogeneity and within-study variability. When the research question concerns whether treatment has produced an effect, on the average, in the set of studies being analysed, then the fixed effects model for the studies may be the more appropriate; here there is no interest in generalising the results to other studies. Many statisticians believe, however, that the random effects model is more appropriate than a fixed effects model for meta-analysis, because between-study variation is an important source of uncertainty that should not be ignored when assigning uncertainty into pooled results. Tests of homogeneity are available, i.e. a test that the between-study variance component is zero - if it is, a fixed effects model is considered justified. Such a test is, however, likely to be of low power for detecting departures from homogeneity and so its practical consequences are probably quite limited. The essential feature of both the fixed and random effects models for meta-analysis is the use of a weighted mean of treatment effect sizes from the individual studies, with the weights usually being the reciprocals of the associated
[Figure, panel (b): funnel plot; horizontal axis: Effect size]
systematic reviews and meta-analysis (a) Funnel plot of 35 simulated studies and meta-analysis with true effect size of zero; estimated effect size is 0.080 with a 95% confidence interval of [-0.018, 0.178]; (b) funnel plot as in (a) with five leftmost studies suppressed; overall effect size is now estimated as 0.124 with a 95% confidence interval of [0.037, 0.210]. Reprinted from Duval and Tweedie, 2000, with permission from The Journal of the American Statistical Association. Copyright 2000 by the American Statistical Association. All rights reserved
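The weighted-mean pooling described in this entry can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular package: it assumes the weights are the reciprocals of the study variances (the usual inverse-variance choice), it uses the DerSimonian-Laird moment estimator for the between-study variance (one common option, not named in the entry), and the function names and the study data are hypothetical.

```python
import math

def pool_effects(effects, variances):
    """Inverse-variance pooling of study effect sizes (needs >= 2 studies).

    Returns the fixed-effects summary and a random-effects summary based on
    the DerSimonian-Laird estimate of the between-study variance (tau2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]              # weights = reciprocal variances
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    se_fixed = math.sqrt(1.0 / sum_w)

    # Homogeneity statistic (Cochran's Q): weighted squared deviations
    # of the study effects from the fixed-effects mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)            # truncated at zero

    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    random_mean = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se_random = math.sqrt(1.0 / sum_w_star)

    return {"fixed": fixed, "se_fixed": se_fixed, "q": q, "tau2": tau2,
            "random": random_mean, "se_random": se_random}

def ci95(estimate, se):
    """Large-sample 95% confidence interval: estimate +/- 1.96 standard errors."""
    return (estimate - 1.96 * se, estimate + 1.96 * se)

# Hypothetical effect sizes and variances for four studies.
result = pool_effects([0.10, 0.30, 0.25, -0.05], [0.02, 0.05, 0.01, 0.04])
print(result["fixed"], ci95(result["fixed"], result["se_fixed"]))
print(result["random"], ci95(result["random"], result["se_random"]))
```

When the estimated between-study variance is zero the two summaries coincide; otherwise the random-effects interval is the wider of the two, which reflects the additional source of uncertainty discussed above.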
standard deviation This is a measure of spread intended to give an indication of the spread of a series of values ($x_1, x_2, \ldots, x_n$) about their MEAN ($\bar{x}$). Taking the average of the differences from the mean may initially seem a good measure of their spread, but in fact this is always zero. Therefore, the standard deviation is based on the average of the squared differences from the mean, since these are all positive. Taking the square root of this result gives a measure that is in the same units as the original values. Thus, the standard deviation ($s$) is calculated using the following formula. Here $n$ is the number of observations, $i$ takes values from 1 to $n$ and the $\sum$ notation denotes the sum, i.e. $(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2$:

$$ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} $$

Note that the formula involves division by $n-1$, rather than $n$, when taking the average of the squared differences. This gives a result that is a better estimate of the standard deviation in the whole population, which is being estimated from the sample available. The standard deviation can be denoted SD, sd, $s$ or $\sigma$, although the last technically refers to the standard deviation of a population, rather than a sample. To calculate the standard deviation by hand there is a more convenient and mathematically equivalent formula:

$$ s = \sqrt{\frac{\sum x_i^2 - \left(\sum x_i\right)^2/n}{n-1}} $$

As an example, the bone mineral content (BMC) of 10 babies was measured using dual energy X-ray absorptiometry (DXA). The measurements in grams were: 46.6, 46.9, 49.2, 49.8, 53.2, 61.1, 68.1, 73.1, 77.1 and 78.6. It is simple to calculate that the sum of the observations $\sum x_i = 603.7$ and the sum of the squares of the observations $\sum x_i^2 = 37938.89$. Thus:

$$ s = \sqrt{\frac{37938.89 - (603.7)^2/10}{9}} = 12.88\ \text{g} $$

The variance of a set of measurements is the square of their standard deviation. Although the variance has many uses, the standard deviation is a more meaningful descriptive statistic because it is in the same units as the raw data. Whereas square millimetres, mm², may have an obvious interpretation, square millimetres of mercury, mmHg², does not. Altman (1991) suggests that standard deviations may be quoted with one or two more decimal places than the original values. The standard deviation is typically used as a measure of spread alongside the mean and is most appropriate when the data are approximately symmetrically distributed. It has the useful property that when the data follow a NORMAL DISTRIBUTION then approximately 95% of the observations will be within two standard deviations of the mean. The figure shows the case of a standard normal distribution, which has a mean of 0 and a standard deviation of 1. SRC

standard deviation Standard normal distribution, with mean of 0 and SD of 1 (horizontal axis: standard deviations; 95% region marked)

Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.

standard error This is the STANDARD DEVIATION of the SAMPLING DISTRIBUTION of a statistic. For example, the standard error of the sample MEAN of $n$ observations is $\sigma/\sqrt{n}$, where $\sigma^2$ is the VARIANCE of the original observations. A useful aide-memoire to distinguish when to use standard deviation (SD) and when to use standard error (SE) is to recall: 'SD for description, SE for estimation.' In particular, when describing patient characteristics in a sample, as in a research paper's typical Table 1, means and SDs should be reported, whereas when seeking to learn from the sample and apply results to the relevant population, i.e. performing statistical inference either by HYPOTHESIS TESTS or estimation by CONFIDENCE INTERVALS, then the standard error is used. The SE is necessarily smaller than the SD and it is wrong to use SE as a MEASURE OF SPREAD when describing samples. More generally, standard errors can be attached to any sample-based quantity, not just the mean of a single sample of continuously distributed data, as just discussed. The general form of a large-sample 95% confidence interval for a population parameter (numerical characteristic) is the sample-based point estimate ± 1.96 (standard errors), where 1.96 arises from the standard NORMAL DISTRIBUTION and the standard error is that of the point estimate, itself the best sample-based guess for the value of the parameter. For two-sample inference, this is usually a quantity such as the difference in population means, for continuous data, or the difference in population proportions, for categorical data. SSE
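The worked example for the standard deviation, and the standard error of the mean that follows from it, can be checked numerically. The sketch below uses only the ten measurements quoted in the text; the variable names are illustrative. The interval uses the large-sample multiplier 1.96 given in the standard error entry, which is only a rough guide with ten observations (a multiplier from the t distribution would normally be used for so small a sample).

```python
import math

# The ten measurements (grams) quoted in the standard deviation entry.
values = [46.6, 46.9, 49.2, 49.8, 53.2, 61.1, 68.1, 73.1, 77.1, 78.6]
n = len(values)
mean = sum(values) / n

# Definition: root of the summed squared deviations divided by n - 1.
sd_definition = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

# Hand-calculation form: uses only the sum and the sum of squares.
sum_x = sum(values)
sum_x2 = sum(x * x for x in values)
sd_shortcut = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

# Standard error of the mean: SD divided by the square root of n.
se_mean = sd_definition / math.sqrt(n)

# Large-sample 95% interval: point estimate +/- 1.96 standard errors.
ci_low, ci_high = mean - 1.96 * se_mean, mean + 1.96 * se_mean

print(f"mean = {mean:.2f}, SD = {sd_definition:.2f}, SE = {se_mean:.2f}")
print(f"approximate 95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The two forms of the standard deviation agree to rounding error, and the standard error is smaller than the standard deviation by the factor √n, in line with 'SD for description, SE for estimation'.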
standard population See DEMOGRAPHY

standardised mortality ratio (SMR) See DEMOGRAPHY

STATA See STATISTICAL PACKAGES

statistical consulting See CONSULTING A STATISTICIAN
statistical methods in molecular biology

Molecular biology is the branch of biology that studies the structure and function of biological macromolecules of a cell, and especially their genetic role. Three types of macromolecules are the main subjects of interest: deoxyribonucleic acids (DNA), ribonucleic acids (RNA) and proteins. Genetic information is encoded in the DNA and inherited from parents to children and, when expressed, a gene, the basic unit of inheritance, is first transcribed to messenger RNA, which then carries the information to a cellular machinery (ribosome) for protein production. This basic principle of the information flow in biology is often referred to as the 'central dogma', put forward by Francis Crick in 1958. A central goal of molecular biology is to decipher the genetic information and understand the regulation of protein synthesis and interaction in cellular processes. The rapid advance of biotechnology in the past few decades has facilitated manipulation of these important biopolymers and allowed scientists to clone, sequence and amplify DNA. As a result, a large amount of biological sequence and structural information has been generated and deposited into publicly accessible databases. The phenomenal growth of biological data is underpinned by the developments of high-throughput DNA sequencing and microarray technologies and the recent progress in giant research projects such as the human genome project that produced the sequence of the human genome. The word 'genome' refers to the entire collection of genetic material of an organism. These advances result in many complex and massive datasets, sometimes decoupled from specific biological questions under investigation. The need to extract scientific insights from these rich data by computational and analytic means has spawned the new field of bioinformatics and computational molecular biology, which deals with storage, retrieval and analysis of biological data. These can consist of information stored in the genetic code, but also experimental results from various sources, patient statistics and scientific literature. Bioinformatics is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, physics, chemistry, biochemistry and linguistics. Nowadays, various biological databases and practical applications of bioinformatics are readily available through the internet and are widely used in biological and medical research.
A wide spectrum of statistical methods has been successfully applied in bioinformatics, ranging from the basic summary statistics and exploratory data analysis tools, to sophisticated hidden Markov models and Bayesian resampling methods (see BAYESIAN METHODS, MARKOV CHAIN MONTE CARLO). Analyses in bioinformatics focus on three types of datasets: genome sequences, macromolecule structures and large-scale functional genomics experiments. Various other data types are also involved, such as taxonomy trees, sequence polymorphisms, relationship data from metabolic pathways, patient statistics, text from scientific literature and so on. DNA sequences are the primary data from the sequencing projects and they only become really valuable through multiple layers of annotation and organisation. Several areas of bioinformatics analysis are relevant when dealing with DNA and protein sequences: sequence assembly, to establish the correct order of sequence contigs for a contiguous sequence; prediction of functional units, to identify subsets of sequences that code for various functional signals such as protein coding genes, promoters, splice sites, regulatory elements; and sequence comparison and database search, to retrieve data efficiently from organised databases. Most of these analyses involve sequence alignment, one of the classic problems in the early development of bioinformatics. Sequence alignment is the basic tool that allows us to determine the similarity of two or more sequences and infer components that might be conserved through evolution and natural selection. To align two protein sequences, similarity scores are assigned to all possible pairs of residues and the sequences are aligned to each other so as to maximise the sum total of scores in the sequence pairings induced by the alignment.

Dynamic programming-based algorithms were developed to overcome the large search space for the solution of optimal global and local alignment problems (Needleman and Wunsch, 1970; Smith and Waterman, 1981). Dynamic programming is a general algorithmic technique that solves an optimisation problem by recursively using 'divide and conquer' for its subproblems. Faster heuristic word-based alignment algorithms were later introduced for large database similarity searches (BLAST by Altschul et al., 1990; FASTA by Pearson and Lipman, 1988). These algorithms build alignments by extending or joining common short patterns ('words'); they are computationally efficient, but often yield suboptimal solutions. The interpretation of alignment scores and database search results was aided by statistical significance derived from simulations and PROBABILITY theory of extreme value distributions under the framework of standard statistical hypothesis testing (Karlin and Altschul, 1990). These classic results have become indispensable tools for biomedical researchers and computational biologists to analyse molecular sequence data.
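As an illustration of the dynamic programming idea, the following is a minimal sketch of global pairwise alignment in the style of Needleman and Wunsch. It is a teaching example under simplifying assumptions: a flat match/mismatch score stands in for a residue similarity matrix, gaps carry a constant linear penalty, and the function name and default scores are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b by dynamic programming.

    Returns (best score, aligned a, aligned b); '-' marks a gap.
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair residues
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

best, row_a, row_b = needleman_wunsch("GATTACA", "GATACA")
print(best)
print(row_a)
print(row_b)
```

The table has (n + 1)(m + 1) cells, each filled from three neighbours, so the search over an exponentially large set of alignments is reduced to work proportional to the product of the sequence lengths. That cost is still too high for searching large databases, which is what motivated the word-based heuristics mentioned above.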
Statistical models are also routinely used to construct probabilistic profiles to characterise the regularity of biological signals based on collections of prealigned sequences and to increase SENSITIVITY of searches. For example, a block-based product multinomial model can be used to describe the position-specific base distributions of the 5' splice site (exon-intron junction) signal in humans (see the figure), which gives a richer representation of the sequence motif than the consensus CAG|GTGAG ('|' indicates the exon-intron junction). A position-specific scoring matrix can be derived subsequently using logarithms of the ODDS RATIO of signal to background base frequencies to evaluate matches of new query sequences to the sequence motif and to quantify the information content of the signal sequence pattern. The information content of a signal is defined as the average score of random sequence matches, measured in 'bits' using the log (base two) odds ratio scores that represent the number of 0-1s necessary to code for this signal in a binary coding system. For instance, the human 5' splice site depicted in the figure contains 8 bits of information, meaning that 'decoy' splice sites will be observed roughly every $2^8 = 256$ bases in random sequence. Note that the information content can also be formulated as the relative entropy (or Kullback-Leibler distance) of the signal to background nucleotide frequency distributions in the context of information theory. More sophisticated models and scoring matrices are also available to capture dependencies among neighbouring positions using Markov models and others.

[Figure, aligned human 5' splice site sequences:]
AAGGTGCTGTG CAGGTGACTGG AATGTACGTGT CAGGTGACCGG CAGGTATGGGG AAGGTAAAGTT
CAGGTGACCCC GCGGTAAGAGG GGGGTGAGTCA GAGGTGTGTGC CAGGTAATCAA ACGGTAAGCCC
GTGGTGAGCGG AAGGTGGGTGC GAGGTGAGAGG AAGGTGAGGGC CAGGTAAGGCA CAGGTGAGCCT

Another area of biological sequence analysis that relies heavily on statistical reasoning is gene finding or, more generally, predicting complex features from a sequence. The goal of protein-coding gene finding is to locate gene features such as exons and introns in a DNA genomic sequence, which is the essential first-pass annotation of the genome project products. In addition to inferring homologous (evolutionarily related) gene structures from database similarity searches, statistical ab initio gene-finding programmes have been developed to integrate all known features and 'grammars' of protein-coding genes in a probabilistic model. Hidden Markov models (HMMs) are at the heart of the most popular gene finders (Genscan by Burge and Karlin, 1997, and reviewed in Durbin et al., 1998). HMMs were originally developed in the early 1970s by electrical engineers for the problem of speech recognition: to identify what sequence of phonemes (or words) was spoken from a long sequence of category labels representing the speech signal. The resemblance of the gene-finding problem to speech recognition and the way HMMs are formulated make them especially suited in this context. In addition, HMMs are theoretically well-founded models, combining probabilistic modelling and formal language theory that guarantees 'sensible' predictions that obey specified grammatical rules even though they might not be the correct genes.
There are also well-documented and computationally efficient methods for parameter estimation (e.g. expectation-maximisation) and optimisation (Viterbi algorithm). A Markov chain is a series of random events occurring with probabilities conditionally dependent on the state of the preceding event(s). A hidden Markov model is a Markov chain in which each state generates an observation according to some rule (usually stochastic). The objective is to infer the hidden state sequence that maximises the posterior probability of the observed event sequence given the model. For example, the hidden states may represent words or phonemes and the observations are the acoustic signal.
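The Viterbi algorithm mentioned above finds the single most probable hidden state path for an observed sequence. The sketch below is a toy illustration only: the two states ('E' for an exon-like, GC-rich region and 'I' for an intron-like, AT-rich region), the start, transition and emission probabilities, and the function name are all hypothetical and are not taken from any gene finder. Log probabilities are used to avoid numerical underflow, so every probability in the model is assumed to be strictly positive.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden state path for obs under a simple HMM.

    Works in log space; all probabilities must be strictly positive.
    Returns (path as a list of states, log probability of that path).
    """
    # best[t][s]: log probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, value = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = value + math.log(emit_p[s][obs[t]])
            back[t][s] = prev          # remember the best predecessor of s
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical two-state model.
STATES = ["E", "I"]
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {
    "E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

path, log_prob = viterbi("GGCCAATT", STATES, START, TRANS, EMIT)
print("".join(path))
```

The recursion keeps only the best path into each state at each position, so the work grows linearly with sequence length rather than with the number of possible state paths.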
Position:    -3     -2     -1     +1     +2     +3     +4     +5     +6     +7     +8
A           0.34   0.65   0.10   0.00   0.00   0.61   0.70   0.09   0.18   0.29   0.22
C           0.38   0.10   0.03   0.00   0.01   0.03   0.07   0.06   0.15   0.19   0.25
G           0.18   0.11   0.81   1.00   0.00   0.34   0.11   0.78   0.19   0.30   0.24
T           0.11   0.14   0.0?   0.00   0.99   0.03   0.12   0.08   0.49   0.22   0.29

statistical methods in molecular biology The human 5' splice site (exon-intron junction signal)
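The information content quoted for the 5' splice site can be approximated from the position-specific frequencies in the figure. The sketch below makes several assumptions that are not in the source: the background is taken to be uniform (0.25 per base), the one illegible entry (T at position -1) is set to 0.06 so that its column sums to one, and a small pseudo-count is added when scoring query sequences so that a base never seen at a position does not give a score of minus infinity. The function names are illustrative.

```python
import math

# Base frequencies by position (-3 .. +8), as read from the figure.
# ASSUMPTION: T at position -1 is illegible in the source; 0.06 is used here.
FREQ = {
    "A": [0.34, 0.65, 0.10, 0.00, 0.00, 0.61, 0.70, 0.09, 0.18, 0.29, 0.22],
    "C": [0.38, 0.10, 0.03, 0.00, 0.01, 0.03, 0.07, 0.06, 0.15, 0.19, 0.25],
    "G": [0.18, 0.11, 0.81, 1.00, 0.00, 0.34, 0.11, 0.78, 0.19, 0.30, 0.24],
    "T": [0.11, 0.14, 0.06, 0.00, 0.99, 0.03, 0.12, 0.08, 0.49, 0.22, 0.29],
}
BACKGROUND = 0.25  # ASSUMPTION: uniform background base frequencies

def column_information(column):
    """Relative entropy (bits) of one column against the background;
    terms with zero frequency contribute nothing."""
    return sum(p * math.log2(p / BACKGROUND) for p in column if p > 0)

def information_content(freq):
    """Sum of the per-position relative entropies, in bits."""
    width = len(freq["A"])
    return sum(column_information([freq[b][j] for b in "ACGT"]) for j in range(width))

def log_odds_score(seq, freq, pseudo=0.01):
    """Position-specific log2 odds score of an 11-base query sequence."""
    total = 0.0
    for j, base in enumerate(seq):
        signal = (freq[base][j] + pseudo) / (1 + 4 * pseudo)  # smoothed frequency
        total += math.log2(signal / BACKGROUND)
    return total

bits = information_content(FREQ)
print(f"information content: {bits:.2f} bits")
print(f"one chance match expected roughly every {2 ** bits:.0f} bases")
print(log_odds_score("CAGGTGAGCCT", FREQ), log_odds_score("CAGCCGAGCCT", FREQ))
```

Under these assumptions the total comes out close to the 8 bits quoted in the text, and a query that keeps the invariant GT at positions +1 and +2 scores far higher than one that does not.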
Motif discovery is an area under active research and has benefited from sophisticated modern statistical techniques. In a typical setting, a collection of sequences derived from MICROARRAY EXPERIMENTS or various sources is believed to share common sequence motifs that often represent functional domains or regulatory elements, and the challenge is to find the unknown signals and locate them in individual sequences. One approach is to formulate the multiple alignment information as MISSING DATA and infer them together with other parameters of the statistical model, given only the sequences as observables. Advanced statistical modelling and iterative computation techniques such as the EM ALGORITHM and Markov chain Monte Carlo are typically used for simultaneous model estimation (Liu, Neuwald and Lawrence, 1999).

The function of a protein is determined by its three-dimensional structure. The problem of predicting the three-dimensional structure of a protein from its amino acid sequence (or the protein-folding problem, because proteins are capable of quickly folding into their stable, unique three-dimensional structure, starting from a random coil conformation without additional genetic mechanisms) is one of the biggest challenges in bioinformatics. There are three major lines of approaches for protein structure prediction: comparative modelling, fold recognition and ab initio prediction. Comparative modelling makes use of sequence alignment and database searches and builds on the fact that evolutionarily related proteins with similar sequences have a similar structure. For proteins without a homologous sequence of known structure, the approach of 'threading' has been developed. It is assumed that a small collection of 'folds', perhaps several hundreds in number, can be used to model the majority of protein domains in all organisms. The protein-folding problem is thus reduced to the task of classifying the query protein based on its primary sequence into one of the folding classes in a database of known three-dimensional structures. This classification is often accomplished using complicated statistical models such as Gibbs sampling and HMMs to parameterise the fit of a sequence to a given fold and solve the optimisation problem accordingly. Analogous to the gene-finding problem, one may attempt to compute a protein's structure directly from its sequence, based on biophysical understanding of how the three-dimensional structure of proteins is attained. The challenge can be broken down into two components: devising a scoring function that can distinguish between correct and incorrect structures and a search method to explore the conformational space efficiently. If successful, direct folding certainly would give a deeper insight than the 'top-down' threading or homology modelling approaches. However, no reliable method has yet emerged in this category.

During the past few years, the development of DNA array technology has scaled up the traditionally one-gene-at-a-time functional studies to allow the monitoring of hundreds of thousands of genes simultaneously. A large number of statistical issues arise in connection with these studies and these have fostered unprecedented conversation and collaborations between biologists and statisticians to establish means to plan, process and analyse these massive datasets. Many branches of statistics have been revived and/or extended by their recent applications in the analysis of functional genomics and molecular data, including DATA MINING methods to discover and classify patterns, MULTIPLE TESTING procedures to adjust P-VALUES to control false discovery rates and meta-analysis (see SYSTEMATIC REVIEWS AND META-ANALYSIS) to combine experimental results from various sources. New statistical methods will soon be needed when combining information from multiple distinct data types (sequence, gene expression, protein structures, sequence variation and phenotypes) for the same subjects. RFY
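As one concrete example of adjusting P-values to control the false discovery rate, the sketch below implements the Benjamini-Hochberg step-up adjustment. The entry does not name a specific procedure, so this choice is an assumption made for illustration, and the function name and the P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    The adjusted value for the P-value of rank r (1 = smallest) among m tests
    is the minimum of p * m / r' over all ranks r' >= r, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the running minimum
    # keeps the adjusted values monotone in the original ordering.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P-values from four tests (e.g. four genes on an array).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adj]
print(adj, significant)
```

Declaring significant those tests whose adjusted value is at or below a chosen level (0.05 in this sketch) controls the expected proportion of false discoveries at that level under the procedure's assumptions (independent or positively dependent tests), which is a less stringent target than controlling the chance of any false positive among thousands of tests.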
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. 1990: Basic local alignment search tool. Journal of Molecular Biology 215, 403-10.
Burge, C. B. and Karlin, S. 1997: Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268, 78-94.
Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. 1998: Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press.
Karlin, S. and Altschul, S. F. 1990: Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences of the United States of America 87, 2264-8.
Liu, J. S., Neuwald, A. and Lawrence, C. 1999: Markovian structures in biological sequence alignments. Journal of the American Statistical Association 94, 1-15.
Needleman, S. B. and Wunsch, C. D. 1970: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 443-53.
Pearson, W. R. and Lipman, D. J. 1988: Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States of America 85, 2444-8.
Smith, T. F. and Waterman, M. S. 1981: Identification of common molecular subsequences. Journal of Molecular Biology 147, 195-7.
statistical packages
In 2010 the Association for Survey Computing (ASC) website (www.asc.org.uk) listed around 200 statistical packages. Many of these have been under development for nearly 40 years and therefore it is both a very mature and diverse software market. While many of these packages are developed for niche markets, there are still several generic software suites. It seems almost invidious to try to select and discuss individual packages. However, there are clearly some well-known and long-established packages, and to many the term 'statistical package' is almost synonymous with SPSS or possibly SAS. Given the variety of analyses that these packages offer, they can meet most user needs. It would seem likely that a virtual monopoly should exist, but in fact there have been new entrants gaining popularity. Comparing these is instructive about trends in the development of statistical software. The packages in the first table are the ones on which we will concentrate here.
statistical packages Major statistical packages

Package    Website
SPSS       www.spss.com
SAS        www.sas.com
STATA      www.stata.com
S-Plus     www.insightful.com
The prevalence of these major packages notwithstanding, there are other packages, as listed in the second table, although these will not be further discussed. Competition has been good for the development of programs and potential purchasers should always be aware of options outside the norm that may well fit their requirements. Together with the ASC website (given earlier), it will always be profitable to make comparisons when purchasing.
statistical packages Other major statistical packages

Package       Website
Genstat       www.vsn-intl.com
STATISTICA    www.statsoft.com
NCSS          www.ncss.com
SYSTAT        www.systat.com
Naturally enough, one wants a statistical package to do statistics and the leading packages cover a wide range. These include basic descriptive statistics, including EDA-style charting, comprehensive cross-tabulation analysis, means testing, the general linear model, multivariate methods, data reduction and clustering, nonparametrics, log-linear modelling, time series and more. The conversion in the late 1980s-early 1990s of the packages SPSS and SAS to run on desktop PCs seemed to cause a hiatus in the development of statistical methodology within these suites. Quite possibly, one of the main reasons for this was the need to develop new user interfaces, as an alternative to the command-line format previously used on mainframe and minicomputers. With the DOS interface model being rapidly succeeded by that of Windows, major consecutive design changes were needed. This did seem to leave a window of opportunity for new entrants to the market, which could write directly using modern programming architectures.

S-Plus is perhaps the earliest example of this, initially written for the UNIX system and then subsequently ported to PCs. The design was conceptually novel, based on the notion of an extensible statistical calculator. It provides advanced graphics facilities and has become popular with professional statisticians for its ability to develop analysis methodologies, rather than being tied to a rigid framework. Over time S-Plus has developed to add extensive user interface enhancements as well as larger statistical libraries. The public domain 'R' (www.r-project.org) is based on a similar philosophy to S-Plus (see R). STATA has become a very popular alternative for similar reasons. Starting out as a command-line-driven program, it has matured over the years to offer a windowing interface in addition. Its attractiveness to researchers has been a modern approach to statistical testing, as well as its ability to incorporate new methodologies quickly. Not only do the developers have an architecture that permits easy incremental expansion, users themselves can program their own procedures. This has gained the support of the professional statistical community, who through their educative role have promoted the package's popularity.

Partly as a result of competition, packages have also begun to differentiate themselves in terms of extending extra support to the whole data analysis process. While the actual test result remains the core of any analysis, data management is far more demanding in terms of time. The resources needed to support DATA MANAGEMENT in a MULTICENTRE clinical TRIAL are significantly larger than those for a classical experiment. In these scenarios, managing and manipulating data prior to analysis becomes very important. SAS has long specialised in data management support, with flexible procedures for merging and manipulating datasets, as well as links to database packages. In the pharmaceutical industry SAS is almost a de facto standard for major analyses, reflecting its ability to support the strong audit requirements in the industry. To a certain extent other packages have been restricted to the rectangular data model (or spreadsheet) view of data, although all are now improving these features.

One direct effect of the development of statistical packages has been to introduce the possibility of statistical data analysis to a wider audience than just statisticians. Since these users are often in finance and commerce, they represent a significant revenue stream to package producers and making the program user friendly for nonspecialist audiences has become a priority for some. SPSS's menu-driven 'point-and-click' interface, for example, epitomises this model. In contrast, the command-line models of SAS, STATA or S-Plus require more dedicated training, although as noted earlier all have developed similar facilities. (STATA 8 introduced a menu-driven interface in 2003 to complement its traditional command-line orientation.)

Integrating advanced data-entry features with a statistical analysis package is common. The predominant spreadsheet data entry model can be enhanced to include data entry forms, data checking and audit. The large packages such as SPSS and SAS provide 'add-on' programs for this. Other programs provide direct database links so that data entry can be provided in a normal programming package such as Microsoft Access and then directly imported for analysis.

While traditionally the results of an analysis are interpreted and then incorporated into a final report, packages have begun to differentiate themselves on their ability to produce tables and results that can be directly pasted into a presentation quality report. Packages vary widely on their ability to do this and support can be patchy. SPSS provides a very good ability to move results tables, but the exported graphics are not of such a good quality. STATA, by contrast, does not offer sophisticated export of results, but has in its latest versions excellent graphical output. SAS offers full programmable reporting features that are very flexible, but challenging for the naive user.

While the main focus of any statistical user is on the large packages, dedicated packages still have a role. As an example, programs such as NQUERY (www.statsol.ie), dedicated to sample size estimation, do one particular job very well and are popular as a result. The lone, innovative researcher (an example perhaps being MX found at www.vcu.edu/mx/) is also a likely producer of innovative software.

An important dimension for the individual consumer can be price. Some of the major packages have prices that match their capabilities: the single researcher, particularly in the commercial sector, may find this an important factor in choice. All the relevant websites can give guidance on obtaining price quotations. Rather than ossifying, the marketplace for statistical software is healthy and researchers can find themselves well supported with a choice of diverse packages. CS
statistical parametric map See STATISTICS IN IMAGING
statiaUcaI refereeing There have been hundreds or review artic::les published in the biomedical literature that point out statistical enon in the design. conduct. analysis. summary and presentation or research studies. 1be contenls of every generul medical journal (most notably AnMIs 0/ In1eT1fD1 Medicine. BTili:lh Medical JOllmol. JoumoJ
of lire
Amerialll Medical Auociolion. Ltnlcel and Nen' Englo",' JOIlTnoi 0/Medicine). as well as or many spc:ciaiiSl ones. have been subjectcd 10 this intense scrutiny sometimes frequently. 1bese n:view articles have focused on particular SIaIistical tests, rrequency or usage and corn:ct application of tc:chniques or statistical analysis. design of CUNlC'AL TRIALS and epidemiological studies. use of POWER calculalions and CON. FIDI!NCEJN1'ERVALS and many other aspecls. Their almoSl univcnaJ conclusion is Ihat a substantial pcn:cntage or n:scarch studies. perhaps as many as SO'.it.
published in the biomedic'aI literature contains cnors of suflk:ient mqnitude to cast some doubt on the "alidily of the conclusions that ha,,, becn drawn. This does not mean that the conclusions IR wrong. but it does imply thai they may not be righ" and this inevitably leads to serious concern about the consequences both for understanding ofdisease and ror the tn:atment of patients. One solution 10 this problem has been Ihe introduction of medical slatislicians into Ihe peer ~view process. Some have advocatcd thai all submitted papen should be scrutinised in this way. arguing Ibal statiSlical review of Ihosc that are not published. no matter how poor. will at IcaSl lead to higher standards in research and improvement in ruture papers. In view of the very large number of biomedical journals and the huge numbc:rsorpapcn submillcd for publicalionevcry year. such a remedy is impracticable. An alternative. now used by severaljoumals. is 10 divide the peer review process inlG two Slages. whereby papers considenxl by the editors as candidates for publication are sent first to subject mallCr refem:s (physicians. surgeons. epidemiologists. ele.) and those R:COmmcnded for publication by them aM then sent 10 staliSlicians ror further specialist review. 1be process of statistical revie\\' is atrnplex. requires sophisticated judgement and varies considerably in its ape pUcalion to evcry section of a paper (absllac" inlroduclion. methods. results and discussion). Altman (1998) ~views some of the diRiculties and provides practical examples of boIh definite enors and matters or judgement. within Sludy design. analysis. pn:sentation and interpretation. There an: 12 broad aims of slatiSlicai revicw thai can be summarised as rollows: to prevent publication or studies that ha\" a fundamental law in design: 10 prevcnt publication of papers that have a fundamental flaw in in'erprelolion~ to cnsure that key aspects or background. 
design and methods of analysis an: reported clearly: to ensure lhat key fcatures of the design an: reftcctcd in the analysis: to ensure Ihat the best methods of analysis. approprialc to the data. aM used: to ensure that the pn:sentation of n:suIls is adequate and employs summary statislics Ihat are justified by the design. Ihc data and the analysis: to ensure that tables are accurate and are consistent boIh wilh the text and with each other: to ensure thai the style orftgun:s is appropriate. that they an: consistent wilh text and tables and not Wlduly n:pctitious of other content: 10 guard against excessive analysis and spurious accuracy; to ensure that conclusions ~ justiftc:d by the n:sulls: 10 ensure that content or the discussion is justified by the results and, in particular. Ibat it avoids generalisation rar beyond the confines or the paper: and. finally. 10 cnsure that the abstract accords with the paper. 1be statistical reviewer may also comment on subject matter when an expert within the medical specially of the paper. bUI will nul indic'atc typos. except when these an: critical for accuracy within rormulae or text.lndecd. pointing
out inconsequential typos is not part of any aspect of any review process; they should be disregarded by expert reviewers and left entirely to the journal's copyeditor! Since statistical review is complicated and, for the reviewer, sometimes excessively tedious, with the necessity of making very similar, sometimes the same, comments about manuscript after manuscript, detailed statistical guidelines and checklists have been written with the specific intention of helping authors (and reviewers). These have been supported by the editors of many biomedical journals and referred to in the journal's guidelines to authors. Examples can be found in Altman et al. (2000) and Gardner et al. (2000). Those most widely used for clinical trials are the CONSORT guidelines (Moher, Schulz and Altman, 2001, updated 2010), for which there is accompanying explanation (Altman et al., 2001, also updated 2010), and extension to cluster trials, noninferiority and equivalence randomised trials, herbal medicine interventions, nonpharmacological interventions, harms, abstracts and pragmatic trials (see www.consort-statement.org). The checklist that forms part of the CONSORT statement is intended to accompany a submitted paper and to indicate where in the manuscript each item in the checklist has been addressed, thus serving as a useful reminder to authors and an aide to referees. There are also recent guidelines for reporting META-ANALYSIS (PRISMA, which supersedes QUOROM), for observational studies (STROBE) and for genetic ASSOCIATION studies (STREGA): details can be found through the EQUATOR network (www.equator-network.org). Statistical review is intended to be helpful and constructive; it should also reassure authors and readers that published papers are sound. However, it is not always seen from this perspective, and editors of journals need to be vigilant in ensuring that it does not become a focus for controversy and dispute, as can happen, for example, when authors parade the views of 'their own statistician' to counter comments from a referee. There is at present little incentive for statisticians to engage in such review: it does not enhance their careers, there is no specific training for it, there is small (if any) remuneration, it is time consuming and 'the only likely career consequence of good reviewing is future requests for more reviews' (Bacchetti, 2002). Bacchetti also points out that statistics is a rich area for finding mistakes and, when coupled with 'the notion that finding flaws is the key to high quality peer review', can lead to 'finding flaws that are not really there'. This reinforces the need for sound statistical judgement. Statisticians may also have to counter mistaken criticisms from subject matter reviewers with limited statistical knowledge (Bacchetti, 2002). The final part of statistical review is usually a recommendation to the journal's editor either to accept, accept with revision, revise and resubmit, or reject the paper. The distinction between the second and third is sometimes difficult and can only be made by balancing the extent and nature of the revisions against the capabilities of the authors evinced
from the submitted paper. Rejection by the statistician can also lead to provocation, especially as authors will be aware that their 'subject matter' peers have already judged it sound. In 1937 the Lancet's leading article that heralded the series of classic papers by Bradford Hill on The Principles of Medical Statistics forewarned: 'It is exasperating, when we have studied a problem by methods that we have spent laborious years in mastering, to find our conclusions questioned, and perhaps refuted, by someone who could not have made the observations himself. It requires more equanimity than most of us possess to acknowledge that the fault is in ourselves.' Authors of papers are advised to read statistical reviews carefully, put them aside for 48 hours and only then start to think about how to respond. For further information and discussion see Rubinstein (2005), Smith (2005), Ware (2005). TJ

Altman, D. G. 1998: Statistical reviewing for medical journals. Statistics in Medicine 17, 2661-74. Altman, D. G., Gore, S. M., Gardner, M. J. and Pocock, S. J. 2000: Statistical guidelines for contributors to medical journals. In Altman, D. G., Machin, D., Bryant, T. N. and Gardner, M. J. (eds), Statistics with confidence, 2nd edition. London: BMJ Books, 171-90. Altman, D. G., Schulz, K. F., Moher, D., Egger, M., Davidoff, F., Elbourne, D., Gøtzsche, P. C. and Lang, T., for the CONSORT Group 2001: The revised CONSORT statement for reporting randomised trials: explanation and elaboration. Annals of Internal Medicine 134, 663-94. Bacchetti, P. 2002: Peer review of statistics in medical research: the other problem. British Medical Journal 324, 1271-73. Gardner, M. J., Machin, D., Campbell, M. J. and Altman, D. G. 2000: Statistical checklists. In Altman, D. G., Machin, D., Bryant, T. N. and Gardner, M. J. (eds), Statistics with confidence, 2nd edition. London: BMJ Books, 191-201. Moher, D., Schulz, K. F. and Altman, D. G., for the CONSORT Group 2001: The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Annals of Internal Medicine 134, 657-62. Rubinstein, L. V. 2005: Statistical review for medical journals, guidelines for authors. In Colton, T. and Armitage, P. (eds), Encyclopedia of Biostatistics, 2nd edition. Chichester: John Wiley & Sons, Ltd, pages 5190-5192. Smith, R. 2005: Statistical review for medical journals, journal's perspective. In Colton, T. and Armitage, P. (eds), Encyclopedia of Biostatistics, 2nd edition. Chichester: John Wiley & Sons, Ltd, pages 5193-5196. Ware, J. H. 2005: Statistical review for medical journals. In Colton, T. and Armitage, P. (eds), Encyclopedia of Biostatistics, 2nd edition. Chichester: John Wiley & Sons, Ltd, pages 5186-5190.

statistics in imaging
This is the use of statistical techniques to analyse and quantify information contained in digital image format. Imaging is widely used in medicine to visualise objects, structures and even physical processes in vivo and in vitro. A significant advantage in medical imaging is the ability to visualise structures or processes without relying on surgical operations. Thus, animals may be recycled in drug discovery and development or patients may not suffer from intrusive procedures. The ability to acquire information without intrusive procedures is also a disadvantage to medical imaging. This raises the issue of surrogate imaging ENDPOINTS (or surrogates); i.e. how well do the conclusions from an imaging experiment correspond to physical properties obtained from an intrusive procedure?

Although the human visual system is very good at extracting information from images, the sheer amount of data being produced creates the common problem of 'not enough time to look at everything'. Statistical techniques using computers enable researchers and clinicians to summarise large numbers of images rapidly so that patterns, trends, regions of activation, etc., may be identified and quantified. Besides the amount of information, medical imaging systems also see beyond the visible light spectrum and are able to process information from a wide range of the electromagnetic spectrum. Examples of medical imaging systems include conventional radiology (X-rays), angiography (imaging of a system of blood vessels using X-rays), positron emission tomography (PET), X-ray transmission computed tomography (CT), magnetic resonance imaging (MRI), microscopy, single photon emission (computed) tomography (SPET or SPECT), spectroscopy and ultrasound imaging. Even electroencephalograms (EEGs) or magnetoencephalograms (MEGs) are examples of imaging systems, albeit with very poor spatial resolution when compared to MRI or PET.

An image is a two-dimensional function that depends on spatial coordinates, where the amplitude of the function represents the brightness or grey level of the image at a particular point. Individual elements of the image are known as picture elements, pixels for short. Images may be collected to form a three-dimensional data structure, or volume, where the individual elements are called voxels. This is common in, for example, MRI and PET, where an experiment on a single subject will involve acquiring information in three spatial dimensions and in time.

Traditional statistical techniques in image analysis include areas such as signal and morphological processing. Signal processing applications include image enhancement, image restoration, colour image processing, wavelets and compression. Morphological processing assumes that set theory may be applied to manipulate structures present in an image.

A relatively new area of research in imaging is the use of MRI in functional or pharmacological studies of the brain. Functional MRI (fMRI) is now well developed and seeks to associate brain functions (human or animal) with specific regions of the brain. Pharmacological MRI (phMRI) is relatively new and seeks to associate pharmacokinetics with specific regions of the (animal) brain. Although group studies are widespread, consider a single-subject analysis from a typical fMRI experiment. After data acquisition, a set of images associated with distinct slices of the brain is available for analysis. Each slice will have a time sequence associated with it; i.e. the imaging experiment contains both spatial and temporal information. Given knowledge of the study design, the goal is to identify regions of the brain where significant activation was observed, where activation is measured by the intensity of the signal observed in the fMRI experiment. Signal intensity is related to the ratio of oxygenated and deoxygenated blood locally in the brain.
[Figure] statistics in imaging Example of an MRI slice (left) and voxel time course (right) at (X, Y) = (50, 30), plotted against time. The experimental design has been superimposed on the time course plot, where the visual stimulation is shown by a dashed line and the audio stimulation is shown by a dotted line (data provided by the Brain Mapping Unit, Department of Psychiatry, University of Cambridge)
The time course in the figure (page 440) shows a typical slice from an MRI experiment and the study design of on-off sequences for visual (dashed line) and auditory (dotted line) stimuli. Each voxel in the image has an associated time course; a mask that eliminates nonbrain voxels is typically used to focus the data analysis. LINEAR REGRESSION or, more fully, using the GENERALISED LINEAR MODEL (GLM), is performed on each voxel using the experimental design, convolved with a function to model the haemodynamic response of the patient, as the independent variable. Trend removal is an important step and may be applied as a preprocessing step or by incorporating low-frequency terms explicitly in the GLM. The typical assumption of independence between observations is not true in fMRI data; methods such as pre-whitening, autoregressive modelling and least squares with adjustment for correlated errors are attempts to overcome the limitations of ordinary least squares.

Fitting the GLM to fMRI data may be performed on an individual voxel, on a cluster of voxels known as a region of interest (ROI), where the data are averaged in space to produce a single time course, or on every brain voxel in the image. For the first two cases, standard theory for statistical inference on regression models may be applied. For the third case, techniques such as Gaussian random field theory, resampling (see BOOTSTRAP) and adjustments by multiple comparison procedures have been used. Regardless of which method is applied, a set of voxels is obtained where significant activation during the experiment was detected. Researchers then relate the images to the anatomical regions identified in the activation image, also known as a statistical parametric map (SPM).

Information from a group of patients may be combined or compared by first registering all images with a standard brain. The most common brain atlas used is the Talairach atlas. Then, a random effects or fixed effects model (see LINEAR MIXED EFFECTS MODELS) may be used to apply a statistical hypothesis test between groups of subjects in the experiment. For more details see Serra (1982), Glasbey and Horgan (1995), Moonen and Bandettini (1999), Gonzalez and Woods (2002) and Worsley et al. (2002). BW

Glasbey, C. A. and Horgan, G. W. 1995: Image analysis for the biological sciences. Chichester: John Wiley & Sons, Ltd. Gonzalez, R. C. and Woods, R. E. 2002: Digital image processing, 2nd edition. Englewood Cliffs, NJ: Prentice Hall. Moonen, C. T. W. and Bandettini, P. A. (eds) 1999: Functional MRI. Berlin: Springer-Verlag. Serra, J. 1982: Image analysis and mathematical morphology. London: Academic Press. Worsley, K. J., Liao, C. H., Aston, J., Petre, V., Duncan, G. H., Morales, F. and Evans, A. C. 2002: A general statistical approach for fMRI data. NeuroImage 15, 1, 1-15.
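The single-voxel regression described in the statistics in imaging entry can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the on-off block length, the smooth response kernel, the noise level and all function names are invented for the example, the errors are simulated as independent (which, as the entry notes, is not true of real fMRI data), and the trend is handled by including a linear term in the model.

```python
import math
import random

def boxcar(n_scans, block):
    """On-off design: `block` scans off, then `block` scans on, repeated."""
    return [1.0 if (t // block) % 2 == 1 else 0.0 for t in range(n_scans)]

def convolve(design, kernel):
    """Causal discrete convolution, truncated to the length of the design."""
    return [sum(kernel[k] * design[t - k] for k in range(len(kernel)) if t - k >= 0)
            for t in range(len(design))]

def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(columns, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(p)]
    return solve(xtx, xty)

n_scans = 100
# Hypothetical smooth response kernel; not a calibrated haemodynamic model.
kernel = [t * math.exp(-t / 1.5) for t in range(10)]
total = sum(kernel)
kernel = [k / total for k in kernel]

regressor = convolve(boxcar(n_scans, 10), kernel)   # design convolved with kernel
intercept = [1.0] * n_scans
trend = [t / n_scans for t in range(n_scans)]        # low-frequency (linear) term

# Simulated time course for one voxel with a known activation effect.
rng = random.Random(0)
true_beta = 2.0
y = [100.0 + 3.0 * trend[t] + true_beta * regressor[t] + rng.gauss(0.0, 0.5)
     for t in range(n_scans)]

b0, b_trend, b_stim = ols([intercept, trend, regressor], y)
print(f"estimated activation effect: {b_stim:.2f}")
```

In a whole-brain analysis the same fit would be repeated for every voxel inside the mask, which is what makes the multiple comparison adjustments mentioned in the entry necessary.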
StatXact This is a specialised software package for the exact analysis of small-sample categorical and nonparametric data with special emphasis on data in the form of contingency tables. The term 'small-sample' applies equally to datasets with only a few observations, to large but unbalanced datasets or to CONTINGENCY TABLES with zeros and small cell counts in some of the cells but large cell counts in other cells. In these settings, StatXact produces exact P-VALUES and exact CONFIDENCE INTERVALS instead of relying on possibly unreliable large-sample theory for its inferences. The inference is based on generating permutation distributions of the appropriate test statistics in a conditional reference set. Different reviews of StatXact are given by Lynch, Landis and Localio (1991), Wass (2000) and Oster (2002).

The current version, StatXact 6, offers exact P-values for one-, two- and K-sample problems, 2 x 2, 2 x c and r x c contingency tables and measures of ASSOCIATION. The data may be either unstratified or stratified. Both independent and blocked samples are accommodated. This version computes the exact confidence interval for ODDS RATIOS that arise from 2 x 2 and 2 x c contingency tables, as well as an exact confidence interval for the shift parameter in an ordered 2 x c contingency table. StatXact offers procedures that cater explicitly to binomial data, Poisson data, nominal categorical data, ordered categorical data, ordered correlated categorical data, continuous complete data and continuous right-censored data. For comparing two proportions (either from dependent or independent samples), StatXact provides the exact unconditional confidence interval for a difference in proportions or the ratio of two proportions and computes exact P-values for tests of equivalence and noninferiority.

In addition to tools for exact inference, StatXact also provides exact power and sample-size calculations for study designs involving one, two or several binomial populations. In the two-binomial case, these features include exact power and sample-size calculations for designing noninferiority and equivalence studies. In case the computation of an exact P-value becomes infeasible due to the lack of either time or computing memory, StatXact produces an unbiased estimate of the exact P-value to at least two decimal digits of accuracy using efficient Monte Carlo simulation strategies (see MARKOV CHAIN MONTE CARLO). The user can arbitrarily increase the number of Monte Carlo simulations in order to increase the accuracy. StatXact 6 runs on Microsoft Windows NT/2000/XP as a standalone product. In addition, a special version, StatXact PROCs for SAS Users, is available as external SAS procedures for both the Microsoft Windows and Unix operating systems. CC/PS/CM/NP

Lynch, J. C., Landis, J. R. and Localio, A. R. 1991: StatXact. The American Statistician 45, 2, 151-4. Oster, R. A. 2002: An examination of statistical software packages for categorical data analysis using exact methods. The American Statistician 56, 3, 235-46. Wass, J. A. 2000: StatXact 4 for Windows. Biotech Software and Internet Report 1, 1, 17-23.
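To make the idea of exact inference in a conditional reference set concrete, the sketch below computes a two-sided exact P-value for a single 2 x 2 table by enumerating every table with the same margins (Fisher's exact test). It is not StatXact's algorithm or interface; the function name and the example table are assumptions made purely for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact P-value for the 2 x 2 table [[a, b], [c, d]].

    Conditional on the row and column totals, the top-left cell follows a
    hypergeometric distribution. The P-value is the total probability of all
    tables in this reference set that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance keeps tied tables from being lost to rounding.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Invented small-sample table: 3 of 4 successes versus 1 of 4 successes.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4))
```

Full enumeration like this is only feasible for small tables; for larger problems the entry describes Monte Carlo estimation of the same P-value.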
stem-and-leaf plot
Essentially, this is an enhanced HISTOGRAM in which the actual data values are retained for inspection. Observed values are each divided into a suitable 'stem' and 'leaf', e.g. the tens figure and the units figure in many examples, and then all the leaves corresponding to a particular stem are listed (usually horizontally) next to the value of the stem. An example is shown in the figure.
[Figure] stem-and-leaf plot A stem-and-leaf plot for the heights in centimetres of 351 elderly women

The plot combines the visual picture of the data provided by the histogram with a display of the ordered data values. The design of stem-and-leaf plots is discussed in Velleman and Hoaglin (1981). It is important to use a typeface for which each digit occupies equivalent space, otherwise a key feature of being 'a histogram on its side' is lost. SSE

Velleman, P. F. and Hoaglin, D. C. 1981: Applications, basics, and computing of exploratory data analysis. Boston: Duxbury.
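A stem-and-leaf display of the kind described in this entry is easy to build by hand. The sketch below uses the tens as the stem and the units as the leaf; the function name and the small set of heights are invented for illustration and are not the data from the figure.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines with the tens as stem and the units as leaf.

    Assumes non-negative whole-number data.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        leaves[stem].append(str(leaf))
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        # Empty stems are kept so the shape of the distribution is preserved.
        lines.append(f"{stem:>3} | {''.join(leaves.get(stem, []))}")
    return lines

# Invented heights in centimetres.
heights = [142, 145, 145, 151, 153, 158, 158, 159, 160, 162, 162, 167, 171, 174]
for line in stem_and_leaf(heights):
    print(line)
```

Printed in a fixed-width typeface, the rows form the 'histogram on its side' that the entry describes.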
stepwise regression See LOGISTIC REGRESSION, MULTIPLE LINEAR REGRESSION
stochastic process
This is any system that develops in accordance with probabilistic laws, usually in time but sometimes in space and possibly even in both time and space. For example, the spread of an epidemic is a stochastic process and its development can be tracked in time, across some terrain or at the conjunction of both time and position. The constituents of a stochastic process are its state, $X$ say, and its indexing variable(s), $s$ or $t$. The state is the primary measure of interest, such as number of individuals ill, while the indexing variable denotes either the time ($t$) or the position ($s$) at which the state is measured. A discrete indexing variable is usually shown as a subscript, but a continuous index appears within traditional function notation. For example, suppose that the state of the epidemic is the number of individuals who are ill. Then $X_t$ would denote the number of individuals ill at time $t$ if observations were taken at the start of each day, while $X(s)$ would denote the number of individuals ill at position $s$ measured continuously in space. Of course, the state of the process can also be either discrete (e.g. number of individuals ill) or continuous (e.g. ECG reading of a cardiac patient).

An essential ingredient in a stochastic process is the dependence of either successive or neighbouring observations. Different assumptions about the dependence structure lead to different types of stochastic process, which can be used as models for many observations collected in practice. The objective is usually to derive theoretical PROBABILITIES for the various states of the system and thus to use these probabilities either for predicting the future behaviour of the system or for gaining some understanding of its mechanism. Many practical systems can be modelled adequately by assuming a Markovian dependence structure, in which the PROBABILITY DISTRIBUTION of $X$ depends only on the most recent or neighbouring value. Standard stochastic processes that accord with such an assumption include random walks, Markov chains, branching processes, birth-and-death processes, queues and Poisson processes. Jones and Smith (2001) provide an accessible introduction to the mathematics of such processes.

Some classical applications of stochastic models to medicine are described in Gurland (1964). Successful uses of Markov models in medical contexts range in time and application from the planning of patient care (Davies, Johnson and Farrow, 1975) to resource provision (Davies and Davies, 1994) and the cost-effectiveness of vaccines (Byrnes, 2002). Many more examples can be found in journals such as Health Care Management Science. WK

Byrnes, G. B. 2002: A Markov model for sample size calculation and inference in vaccine cost-effectiveness studies. Statistics in Medicine 21, 3249-60. Davies, R. and Davies, H. T. O. 1994: Modelling patient flows and resource provision in health systems. Omega, International Journal of Management Science 22, 123-31. Davies, R., Johnson, D. and Farrow, S. 1975: Planning patient care with a Markov model. Operational Research Quarterly 26, 599-607. Gurland, J. (ed.) 1964: Stochastic models in medicine and biology. Madison, WI: University of Wisconsin Press. Jones, P. W. and Smith, P. 2001: Stochastic processes, an introduction. London: Arnold.
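As a small illustration of the Markovian dependence structure described in this entry, the sketch below propagates the state probabilities of a discrete-time Markov chain from one day to the next. The three states and all transition probabilities are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def step(dist, transition):
    """One step of a discrete-time Markov chain: new_j = sum_i dist_i * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

def distribution_after(dist, transition, n_steps):
    """State probabilities after n_steps transitions from the starting distribution."""
    for _ in range(n_steps):
        dist = step(dist, transition)
    return dist

# Hypothetical daily transition probabilities between three patient states.
states = ["well", "ill", "recovered"]
P = [
    [0.90, 0.10, 0.00],  # well      -> well / ill / recovered
    [0.00, 0.70, 0.30],  # ill       -> well / ill / recovered
    [0.00, 0.00, 1.00],  # recovered is an absorbing state
]

start = [1.0, 0.0, 0.0]          # everyone is well on day 0
day7 = distribution_after(start, P, 7)
for name, prob in zip(states, day7):
    print(f"P({name} on day 7) = {prob:.3f}")
```

Because tomorrow's distribution depends only on today's, the whole future behaviour of the system follows from the starting distribution and the transition matrix.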
stratified randomisation See RANDOMISATION
stratified sampling Stratified sampling occurs within defined strata of some population. This should be carried out when the population contains easily identifiable subpopulations. If the sizes of the strata are different then proportional allocation should be used. If the STANDARD DEVIATIONS are known in advance then optimal or Neyman allocation can be used to minimise the VARIANCE of the estimate of the population MEAN. If they are unknown it is possible to use a pilot study to estimate the standard deviations.
The method is as follows. Define the strata that the population falls into. Decide if the strata are of a similar size and if the standard deviations are known. For similar sized strata use simple random sampling to select members of each stratum. If the sizes are different then the number in each stratum is proportional to stratum size. Then simple random sampling is used to obtain the correct number in each stratum. If the standard deviations are known in advance then, for a fixed total sample size $n$, the allocation is obtained by choosing $n_j$ so that:

$$\frac{n_j}{n} = \frac{N_j S_j}{\sum_{k=1}^{s} N_k S_k}$$

where $N_j$ is the number in stratum $j$, $S_j$ is the standard deviation of values of items within the stratum, $n$ is the fixed total sample size, $n_j$ is the number to be chosen by simple random sampling from the stratum and $s$ is the number of strata.

Thornhill et al. (2000) used stratified sampling in a study of disability following head injury. The patients were stratified according to the Glasgow coma score. The mild and unclassified patients were further stratified by the presenting hospital and a simple random sample was taken.

In general, if the population can be separated into distinguishable strata then the estimates from stratified sampling will be more precise than from a SIMPLE RANDOM SAMPLE and therefore it can be efficient. The disadvantages are that it can be difficult to choose the strata, it is not useful without homogeneous subgroups, it can require accurate information about the population and it can be expensive. For more details see Crawshaw and Chambers (1994) and Upton and Cook (2002). SLV

Crawshaw, J. and Chambers, J. 1994: A concise course in A level statistics, 3rd edition. Cheltenham: Stanley Thornes Publishers Ltd. Thornhill, S., Teasdale, G. M., Murray, G. D., McEwen, J., Roy, C. W. and Penny, K. I. 2000: Disability in young people and adults one year after head injury: prospective cohort study. British Medical Journal 320, 1631-5. Upton, G. and Cook, I. 2002: Dictionary of statistics. Oxford: Oxford University Press.
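The two allocation rules in the stratified sampling entry can be compared with a short calculation. The stratum sizes, standard deviations and total sample size below are hypothetical, the function names are invented for the example, and the results are left unrounded; in practice each stratum sample size would be rounded to a whole number.

```python
def proportional_allocation(n, sizes):
    """Sample size per stratum in proportion to stratum size N_j."""
    total = sum(sizes)
    return [n * size / total for size in sizes]

def neyman_allocation(n, sizes, sds):
    """Neyman allocation: n_j = n * N_j * S_j / sum_k (N_k * S_k), unrounded."""
    weights = [size * sd for size, sd in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

sizes = [500, 300, 200]    # hypothetical stratum sizes N_j
sds = [2.0, 5.0, 10.0]     # hypothetical within-stratum standard deviations S_j
n = 100                    # fixed total sample size

print("proportional:", proportional_allocation(n, sizes))
print("Neyman:      ", [round(x, 1) for x in neyman_allocation(n, sizes, sds)])
```

With these invented figures the most variable stratum is the smallest, so Neyman allocation samples it far more heavily than proportional allocation would.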
structural equation modelling software
The four most commonly used packages for fitting structural equation models are:

EQS (http://www.mvsoft.com/)
LISREL (http://www.ssicentral.com/)
MPlus (http://www.statmodel.com/order.html)
AMOS (http://www.amosdevelopment.com/)

All four allow the fitting of complex models relatively easily, although MPlus is possibly the most flexible. Each package's website provides specific information on its capabilities, as well as availability and cost. SSE
structural equation models The operational definition provided by Pearl (2000, p. 160) states: 'An equation $y = \beta x + \varepsilon$ is said to be structural if it is to be interpreted as follows: In an ideal experiment where we control $X$ to $x$ and any other set $Z$ of variables (not containing $X$ or $Y$) to $z$, the value of $Y$ is given by $\beta x + \varepsilon$, where $\varepsilon$ is not a function of the settings $x$ and $z$.' The key word here is 'control'. We are observing values of $Y$ after manipulating or fixing the values of $X$. The model implies that the values of $Y$, in fact, are determined by the values of $X$. A structural equation model is a description of the causal effect of $X$ on $Y$. It is a CAUSAL MODEL, and the parameter $\beta$ is a measure of the causal effect of $X$ on $Y$. It should be clearly distinguished from a linear regression equation that simply describes the ASSOCIATION between two random variables, $X$ and $Y$. If we are able, in practice, to intervene and control the values of $X$ (by random allocation, for example) then it is straightforward to use the resulting data to obtain a valid estimate of $\beta$. If, however, we do not have control of $X$, but can only observe the values of $X$ and $Y$ (and $Z$), as in an epidemiological or other type of OBSERVATIONAL STUDY, for example, this does not invalidate the above operational definition, but the challenge for the data analyst is to find a valid (i.e. unbiased) estimate of the causal parameter $\beta$ under these circumstances.

The equation $y = \beta x + \varepsilon$ is, of course, a description of a very simple structural model. It is common to collect data on several response variables ($Y$s) and several explanatory variables ($X$s) and to construct a series of structural equations of the following form:

[equation (1); $i = 1$ to $I$; $j, k = 1$ to $J$]

in which several of the $\beta$ values will be fixed to be zero, a priori.
The others are to be estimated from the data. The form of the equations defined in (1) - i.e. the structural theory that determines the pattern of $\beta$ values to be estimated and those fixed at zero - is determined by the investigator's prior knowledge or hypotheses concerning the causal processes generating the data. Quoting Byrne (1994, p. 3): 'Structural equation modeling (SEM) is a statistical methodology that takes a hypothesis-testing (i.e. confirmatory) approach to the multivariate analysis of a structural theory bearing on some phenomenon.' Typically, SEM involves (a) the specification of a set of structural equations, (b) representation of these structural equations using a graphical model (a path diagram - see later), (c) simultaneously fitting the set of structural equations to a given set of data in order to estimate the $\beta$ values and to test the adequacy of the model. If the model fails to fit then the investigator may revise the model and try again. The success of the exercise is likely to be highly dependent upon the quality of the investigator's prior knowledge of the likely causal mechanisms under test and how much thought he or she has given to the design of the study in the first place. Good design and subsequent statistical analyses require technical knowledge, skill and experience. For technical knowledge, readers are referred to introductory texts by Dunn, Everitt and Pickles (1993), Byrne (1994) and Shipley (2000), and to the advanced monograph by Bollen (1989). Discussion of SEM in the context of recent work on causal inference can be found in Pearl (2000, 2009) and, again, in Shipley (2000).

Traditionally, SEM has concentrated on structural models for quantitative data, which are usually assumed to be multivariate normal. Extensions from the traditional linear structural equations (i.e. LINEAR REGRESSION) to generalized linear structural equations are discussed by Skrondal and Rabe-Hesketh (2004).

It is frequently the case that we cannot measure constructs directly, or at least not without considerable MEASUREMENT ERROR. This gives rise to the idea of LATENT VARIABLES. These are characteristics that are not directly observable. They may be straightforward concepts such as height, weight, amount of exposure to a known toxin, or concentration of a given metabolite in blood or urine, but we explicitly acknowledge that they cannot be measured without error. The observed measurement is a manifest or indicator variable, while the corresponding unknown, but true, value is a latent variable. However, latent variables may be more abstract theoretical constructs that are introduced to explain COVARIANCE between manifest or indicator variables. An example of this last type is the set of scores on a battery of cognitive tests that are assumed in some way to reflect a subject's cognitive ability or general intelligence. Another example would be a set of symptom severity scores (the manifest variables), which are assumed to be indicators of a patient's overall degree of depression (the latent variable).
Typically, a data analyst will propose a formal measurement model (usually equivalent to some form of factor analysis representation) to relate the observed measurements with the underlying latent variables. We can then proceed to propose structural or causal hypotheses involving the latent variables instead of the fallible (error-prone) indicators. We start, for example, with a COVARIANCE MATRIX for the observed variables. We fit a general structural equation model to this covariance or moments matrix. This procedure will involve the simultaneous fitting of the measurement equations for the relevant latent variables and their corresponding indicators and of the structural equations thought to reflect the assumed causal relationships between the latent variables. Specialist software packages are now widely available for such analyses (see STRUCTURAL EQUATION MODELLING SOFTWARE).

Structural equation models are often represented by a graphical structure known as a path diagram (see PATH ANALYSIS). In a path diagram the proposed relationships between variables (whether manifest or latent) are represented either by a single-headed arrow (indicating the direction of a causal effect) or a double-headed one (indicating CORRELATION). The observed or manifest variables are usually placed within a rectangular or square box, while latent variables are placed within an oval or a circle. Random measurement errors and residuals from structural equations, although they are strictly speaking latent variables, are not traditionally placed within a circle or oval. Path diagrams are very closely related to the graphical representations (directed acyclic graphs, or DAGs, for example: see GRAPHICAL MODELS) that have relatively recently been developed elsewhere (see Pearl, 2000, 2009, for example). Two simple examples of path diagrams are shown in the two figures. A detailed explanation will be given in the following section.

[Figure] structural equation models Path diagram to represent the structural equations linking encouragement to stop smoking during pregnancy ($Z$), the amount smoked during pregnancy ($X$) and the birth weight of the child ($Y$). $D_X$ and $D_Y$ are randomly distributed residuals

[Figure] structural equation models Path diagram to represent the structural equations linking encouragement to stop smoking during pregnancy ($Z$), the true amount smoked during pregnancy ($T_X$) and the birth weight of the child ($Y$). $D_X$ and $D_Y$ are randomly distributed residuals. $X_1$ and $X_2$ are error-prone indicators of smoking, with uncorrelated measurement errors $E_1$ and $E_2$ respectively
For an example. Pennutt and Hebel (1989) describe a trial in which pn:;naal women were randomly allocated 10 m:eive eneouragcmenl to reduce or stop their cigareUC smoking during pregnancy (the lmItmenl ;roup) or nol (the control ;roup) - indicated by the binary variable. Z An intcnnediate
outcome variable (X) was die amount of cipn:UC IIDOking m:onIc:cI dwilll prepanc:y. The ullimalC oulcome (y) was the birthwcightofthe newbomchild. Smoking is likely toha~ been nxlucecI in Ihc poup subject 10 encaurqemcat. bit also in the cOllbaI paup (allhough. presumably. 10 alc:ucr~tml). 'I1acR IR also likely to be hielden confounders (e.g. ocher health pRJIDOIing behaviours) Ihat IR aaacialcd with bath the molhcr's srnokins durilll JRII18IM:y and the child·s birth weight Smalcing (I) is aD endogenous lIcaImcnl wriable the above confaunding will n:suIt in the raicIuaI f'nmI a sb'Uc1Unl equalion madel to ~plain die leYel or smoking by IANDOMlS.VION lDnx:ei~CIICIJW1IFmc:nt beingcam:lalCd wilh the raidual rrom the slnlttural equalion linking observed l~ls of smoking ID the birth weight or the child. We assume that then: is no din:ct eft"cct or mndomizalion (Z) an OUlaHlle (Y): theel1'cctorZon Vis an iJadin,ct ane tluaup smoking (I): i.e. Zis anDBJIUIENTAL VAIL\II.E.lgJIIJIiIJI thc.inleKepl tenDs. the two stnIctural equaliaos IR the rollowilll:
X = yZ + Ox and Y = /lX + Dr In filling these IWO models to Ihc appropliale daIa we acknowledge the c."CII'R:JaIic (p) bclween the mWluaIs.. Dx and D,. (Ihase camponealS or X and Y nat explained by Z and X n:&pectively). 11Ieovcnll model isilluslrated by thefint ftgun: (pap 444). Now. what if we acknowledge thatsmokinglCM:1s cannal be measun:d accurately and we decide ID obtain two different measun:menlS on each pc:nan in the IriaJ (XI and.n. SBy,beingselr-Rponcdnwnbenofpac:bperclay,ablaincciat6 monlhs and I monIhs into the )RI1I8DC)')? The InIe level or smoking is now n:pn:sc:atc:d by the variable Tx. Our measlRmeat madel is rqRSCDted by the lwo equalions:
$$X_1 = T_X + E_1 \quad \text{and} \quad X_2 = T_X + E_2$$
We assume that the $E_1$ and $E_2$ measurement errors are uncorrelated and that there is no change in the true level of smoking between the two times. The revised structural equations now use $T_X$ rather than $X$, as follows:
$$T_X = \gamma Z + D_X \quad \text{and} \quad Y = \beta T_X + D_Y$$
The corresponding path diagram is shown in the second figure (page 444). Note that not all of the model parameters implied by the model in the second figure can be estimated. The model is too complex for the data at hand. The model as a whole is said to be underidentified, but the good news is that we can still estimate $\beta$, the parameter most likely to be of interest to the investigator. Problems of underidentification are beyond the scope of this entry, but are covered by the standard textbooks on structural equations modelling referenced below. GD
Bollen, K. A. 1989: Structural equations with latent variables. New York: John Wiley & Sons, Inc. Byrne, B. M. 1994: Structural equation modeling with EQS and EQS/Windows. Thousand Oaks, CA: Sage Publications. Dunn, G., Everitt, B. S. and Pickles, A. 1993: Modelling covariances and latent variables using EQS. London: Chapman & Hall. Pearl, J. 2000; 2nd edition, 2009: Causality. Cambridge: Cambridge University Press. Permutt, T. and Hebel, J. R. 1989: Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. Biometrics 45, 619-22. Shipley, B. 2000: Cause and correlation in biology. Cambridge: Cambridge University Press. Skrondal, A. and Rabe-Hesketh, S. 2004: Generalized latent variable modeling: multilevel, longitudinal and structural equations models. Boca Raton, FL: Chapman & Hall/CRC.
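The role of randomisation as an instrumental variable in the smoking and birth weight example can be illustrated with a small simulation. The sketch below is not the analysis of Permutt and Hebel; it assumes invented parameter values (gamma = -1.0, beta = -0.5), a single hidden confounder `u` that makes the two residuals correlated, and the simple ratio estimator cov(Z, Y) / cov(Z, X), which is one standard way of estimating beta when there is one instrument and one endogenous variable.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def simulate(n, gamma, beta, seed=1):
    """Simulate Z (randomised), X (smoking) and Y (birth weight).

    All numerical values are hypothetical. The hidden confounder u enters
    both residuals, so D_X and D_Y are correlated as in the entry's model.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.randint(0, 1)                      # randomised encouragement
        u = rng.gauss(0, 1)                         # hidden confounder
        xi = gamma * zi + u + rng.gauss(0, 1)       # X = gamma*Z + D_X
        yi = beta * xi - 2.0 * u + rng.gauss(0, 1)  # Y = beta*X + D_Y
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


true_gamma, true_beta = -1.0, -0.5
z, x, y = simulate(20000, true_gamma, true_beta)

beta_ols = cov(x, y) / cov(x, x)  # naive regression of Y on X, confounded
beta_iv = cov(z, y) / cov(z, x)   # instrumental variable (ratio) estimate

print(f"naive estimate of beta: {beta_ols:.2f}")
print(f"IV estimate of beta:    {beta_iv:.2f} (true value {true_beta})")
```

Under these assumptions the naive regression slope is pulled away from the true beta by the confounder, whereas the ratio estimate, which uses only the variation in smoking induced by randomisation, stays close to it.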
Student's t-distribution
See t-DISTRIBUTION
Student's t-test
William Sealy Gosset, who worked under the pseudonym of 'Student', developed the Student's t-test. The Student's t-test is commonly referred to merely as the t-test. The simplest use of the t-test is in comparing the MEAN of a sample to some specified population mean; this is usually called the one-sample t-test. The t-test can be modified to compare the means of two independent samples (the two-sample t-test) and for paired data to compare the differences between the pairs (the paired t-test). Student's t-test is a parametric test and certain assumptions are made about the data. These are that the observations within each group (with independent samples) or the differences (with paired samples) are approximately normally distributed and for the two-sample case we also require the two groups to have similar VARIANCES. If the sample data do not meet these assumptions then the analysis is seriously flawed. However, the t-test is 'robust' and is not greatly affected by a moderate failure to meet the assumptions. The one-sample t-test can be used to compare the mean of a sample to a certain specified value. This value is usually the population mean. The NULL HYPOTHESIS states that there is no significant difference between the sample mean and the population mean and the alternative hypothesis states that there is a significant difference between the sample mean and the population mean. The assumption we make is that the data are a random sample of independent observations from an underlying normal distribution. The test statistic $t$ is given by:
$$t = \frac{\text{Sample mean} - \text{Hypothesised mean}}{\text{Standard error of sample mean}}$$
This is compared against the t-DISTRIBUTION with $n - 1$ DEGREES OF FREEDOM, where $n$ is the sample size. So $t$ is the deviation of a normal variable from its hypothesised mean measured in STANDARD ERROR units. The standard error of the sample mean is calculated by $s/\sqrt{n}$, where $s$ is the sample STANDARD DEVIATION.
For example, suppose BMI values for a sample of 25 people were measured and a mean value of 24.5 was found with a sample standard deviation of 2.5. To test if this sample mean BMI is significantly different from a population mean BMI of 26 we can use the one-sample t-test, where our null hypothesis is that there is no difference between the sample mean of 24.5 and the population mean of 26. This allows us to calculate the test statistic as follows:
$$t = \frac{24.5 - 26}{2.5/\sqrt{25}} = -3.0$$
Using the t-distribution with $(n - 1) = 24$ degrees of freedom, we find a P-VALUE of 0.0062. The result is statistically significant and we therefore accept the alternative hypothesis that the mean BMI of the sample is significantly different from 26.
We can use the two-sample t-test to determine the statistical significance of an observed difference between the mean values of some variable between two subgroups or between separate populations. For example, we could look at the differences in heights between males and females. The test statistic for the two-sample t-test is given by:
$$t = \frac{\text{Difference in sample means} - \text{Difference in hypothesised means}}{\text{Standard error of the difference in the two sample means}}$$
Frequently the null hypothesis of interest is whether the two groups have equal means and the corresponding two-sided alternative hypothesis is that the means are in fact different. For example, when comparing the mean outcome for two different treatments, is the difference in means observed a statistically significant one? In this case the test statistic reduces to:
$$t = \frac{\text{Difference in the two sample means}}{\text{Standard error of the difference in the two sample means}}$$
This is then compared to the t-distribution with $n_1 + n_2 - 2$ degrees of freedom, where $n_1$ is the sample size for the first group and $n_2$ is the sample size for the second group. The standard error of the difference in the two sample means is given by:
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}$$
where
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
and $s_1$ and $s_2$ are the standard deviations for groups one and two respectively.
For the paired t-test, the data are dependent, i.e. there is a one-to-one correspondence between the values in the two samples. Paired data can occur from two measurements on the same person, e.g. before and after treatment, or the same subject measured at different times. It is incorrect to analyse paired data ignoring the pairing in such circumstances, as important information is lost. Some factors you do not control in the experiment will affect the before and the after measurements equally, so they will not affect the difference between before and after. By looking only at the differences, a paired t-test corrects for these factors. The two-sample paired t-test usually tests the null hypothesis that the population mean of the paired differences of the two samples is zero. We assume that the paired differences are independent. To perform the paired t-test we calculate the difference between each set of pairs and then perform a one-sample t-test on the differences with the null hypothesis that the population mean of the differences is equal to zero. More details can be found in Altman (1991). MMB
Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.
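The one-sample BMI calculation and the pooled two-sample statistic described in this entry can be checked numerically. The following is a minimal sketch using only the Python standard library. The function names are illustrative, the two-sample input values are invented, and the P-value is obtained by numerically integrating the t density with Simpson's rule rather than by calling a statistics package.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value for Student's t, by Simpson's rule on the density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(v):
        return c * (1 + v * v / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps                        # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3              # P(0 < T < |t|)
    return 1 - 2 * central


def one_sample_t(mean, mu0, sd, n):
    """t statistic and degrees of freedom for a one-sample t-test."""
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1


def pooled_two_sample_t(m1, s1, n1, m2, s2, n2):
    """t statistic and degrees of freedom using the pooled variance."""
    df = n1 + n2 - 2
    s2_pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(s2_pooled / n1 + s2_pooled / n2)
    return (m1 - m2) / se, df


# BMI example from the text: n = 25, mean 24.5, sd 2.5, hypothesised mean 26
t_bmi, df_bmi = one_sample_t(24.5, 26, 2.5, 25)
p_bmi = t_two_sided_p(t_bmi, df_bmi)
print(f"t = {t_bmi:.1f}, df = {df_bmi}, P = {p_bmi:.4f}")

# Hypothetical two-sample input (not from the text)
t_two, df_two = pooled_two_sample_t(10.0, 2.0, 10, 8.0, 2.0, 10)
print(f"t = {t_two:.3f}, df = {df_two}")
```

A paired t-test needs no separate function under this sketch: compute the within-pair differences and pass their mean, standard deviation and count to `one_sample_t` with a hypothesised mean of zero.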
subgroup analysis
This form of analysis is often employed in CLINICAL TRIALS in an attempt to identify particular subgroups of patients for whom a treatment works better (or worse) than for the overall patient population. For example, does a treatment work better for men than for women? Such a question is a natural one for clinicians to ask since they do not treat 'average' patients and, when confronted with a female patient with a certain condition, would like to know whether the accepted treatment for the condition works, say, less well for women. Assessing whether the effect of treatment varies according to the value of one or more patient characteristics is relatively straightforward from a statistical viewpoint, involving nothing more than testing a treatment by covariate interaction. However, many statisticians would caution against such analyses and, if undertaken at all, suggest that they are interpreted extremely cautiously in the spirit of 'exploration' rather than anything more formal. The reasons for such caution are not difficult to identify. First, trials can rarely provide sufficient POWER to detect such subgroup/interaction effects; clinical trials accrue sufficient participants to provide adequate precision for estimating quantities of primary interest, usually overall treatment effects. Confining attention to subgroups almost always results in estimates of inadequate precision. A trial just large enough to evaluate an overall treatment effect reliably will almost inevitably lack precision for evaluating differential treatment effects between different population subgroups. Second, RANDOMISATION ensures that the overall treatment groups in a clinical trial are likely to be comparable. Subgroups may not enjoy the same degree of balance in patient characteristics. Finally, there are often many possible prognostic factors in the baseline data, e.g. age, gender, race, type or stage of disease, from which to form subgroups, so that analyses may quickly degenerate into 'data dredging', from which arises the potential for post hoc emphasis on the subgroup analysis giving results of most interest to the investigator, with undue emphasis given to results deemed 'statistically significant' contributing, in turn, to a preponderance of 'P < 0.05' results published in the medical literature (an excess of false positive findings, therefore). Other potential dangers of subgroup analysis can be found in detail in Pocock et al. (2002). SSE
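The excess of false positive findings produced by data dredging can be quantified with a simple calculation. The sketch below assumes, purely for illustration, that there is no true subgroup effect and that the k subgroup tests are independent, each carried out at the 5% level. Real subgroup tests are usually correlated, so the figures are indicative only.

```python
def prob_any_false_positive(k, alpha=0.05):
    """Chance of at least one 'significant' result among k independent
    tests when no true effect exists (illustrative assumption)."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 10, 20):
    print(f"{k:2d} subgroup tests: P(at least one P < 0.05) = "
          f"{prob_any_false_positive(k):.2f}")
```

Under these assumptions, ten subgroup tests already give roughly a 40% chance of at least one nominally significant result.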
Pocock, S. J., Assmann, S. E., Enos, L. E. and Kasten, L. E. 2002: Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Statistics in Medicine 21, 2917-30.
sufficient-component cause model
See CAUSAL MODELS
summary measure analysis
This is a relatively straightforward approach to the analysis of LONGITUDINAL DATA, in which the repeated measurements of a response variable made on each individual in the study are reduced in some way to a single number that is considered to capture an essential feature of the response over time. In this way, the multivariate nature of the repeated observations is transformed to a univariate measure. The approach has been in use for many years - see, for example, Oldham (1962) and Matthews et al. (1989). The most important consideration when applying a summary measure analysis is the choice of a suitable summary measure, a choice that needs to be made before any data are collected. The measure chosen needs to be relevant to the particular questions of interest in the study and in the broader scientific context in which the study takes place. A wide range of summary measures has been proposed, as shown in the first table. According to Frison and Pocock (1992), the average response over time is often likely to be the most relevant, particularly in CLINICAL TRIALS. Having chosen a suitable summary measure, analysis will involve nothing more complicated than the application of Student's t-test or calculation of a CONFIDENCE INTERVAL for the group difference when two groups are being compared or a one-way ANALYSIS OF VARIANCE when there are more than two groups. If considered more appropriate because of the distributional properties of the selected summary measure, then analogous NONPARAMETRIC METHODS might be used. The summary measure approach can be illustrated using the data shown in the second table, which come from a study of alcohol dependence. Two groups of subjects, one with severe dependence and one with moderate dependence on alcohol, had their salsolinol excretion levels (in millimoles) recorded on four consecutive days.

summary measure analysis Possible summary measures (from Matthews et al., 1989)
Type of data | Question of interest | Summary measure
Peaked | Is overall value of outcome variable the same in different groups? | Overall mean (equal time intervals) or area under curve (unequal intervals)
Peaked | Is maximum (minimum) response different between groups? | Maximum (minimum) value
Peaked | Is time to maximum (minimum) response different between groups? | Time to maximum (minimum) response
Growth | Is rate of change of outcome different between groups? | Regression coefficient
Growth | Is eventual value of outcome different between groups? | Final value of outcome or difference between last and first values or percentage change between first and last values
Growth | Is response in one group delayed relative to the other? | Time to reach a particular value (e.g. a fixed percentage of baseline)

summary measure analysis Salsolinol excretion data
Subject   Day 1   Day 2   Day 3   Day 4
Group 1 (moderate dependence)
1         0.33    0.70    2.33    3.20
2         5.30    0.90    1.80    0.70
3         2.50    2.10    1.12    1.01
4         0.98    0.32    3.91    0.66
5         0.39    0.69    0.73    3.86
6         0.31    6.34    0.63    3.86
Group 2 (severe dependence)
7         0.64    0.70    1.00    1.40
8         0.73    1.55    3.60    2.60
9         0.70    4.20    1.30    5.40
10        0.40    1.60    1.40    7.10
11        1.50    1.30    0.70    0.10
12        1.80    1.20    2.60    1.80
13        1.90    1.30    4.40    2.80
14        0.50    0.40    1.10    8.10
Using the mean of the four measurements available for each subject as the summary measure leads to the results shown in the third table. There is no evidence of a group difference in salsolinol excretion levels.

summary measure analysis Results from using the mean as a summary measure for the data in the second table
         Moderate   Severe
Mean     1.80       2.49
sd       0.60       1.09
n        6          8
t = -1.40, df = 12, P = 0.19, 95% CI: (-1.77, 0.39)

A possible alternative to the use of the MEAN as a summary measure is to use the maximum excretion rate recorded over the four days. Applying the WILCOXON RANK SUM TEST to this summary measure results in a test statistic of 36 and associated P-VALUE of 0.21. The summary measure approach to the analysis of longitudinal data can accommodate missing data but the implicit assumption is that these are missing completely at random (see DROPOUTS). SSE
(See also AREA UNDER CURVE)
Frison, L. and Pocock, S. J. 1992: Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design. Statistics in Medicine 11, 1685-704. Matthews, J. N. S., Altman, D. G., Campbell, M. J. and Royston, P. 1989: Analysis of serial measurements in medical research. British Medical Journal 300, 230-5. Oldham, P. D. 1962: A note on the analysis of repeated measurements of the same subjects. Journal of Chronic Disorders 15, 969-77.
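The two steps of a summary measure analysis, reducing each subject's repeated measurements to one number and then comparing groups on that number, can be sketched as follows. The first part uses the four daily values of subject 1 from the salsolinol table. The second part starts from the rounded group means, standard deviations and sample sizes in the results table, so the t statistic it produces is only approximately the reported value. The critical value 2.179 (the 97.5% point of the t-distribution with 12 degrees of freedom) is taken from standard tables, and the function names are illustrative.

```python
import math


def summarise(measurements, measure=None):
    """Reduce one subject's repeated measurements to a single number.

    By default the summary measure is the mean; pass e.g. max for the
    maximum response.
    """
    if measure is None:
        return sum(measurements) / len(measurements)
    return measure(measurements)


def pooled_t_from_summary(m1, s1, n1, m2, s2, n2, t_crit):
    """Pooled two-sample t statistic, df and confidence interval."""
    df = n1 + n2 - 2
    s2_pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    diff = m1 - m2
    return diff / se, df, (diff - t_crit * se, diff + t_crit * se)


# Step 1: subject 1 of the salsolinol data, four consecutive days
subject1 = [0.33, 0.70, 2.33, 3.20]
subject1_mean = summarise(subject1)
subject1_max = summarise(subject1, max)
print(f"subject 1: mean = {subject1_mean:.2f}, maximum = {subject1_max:.2f}")

# Step 2: group comparison from the rounded summary statistics
t_stat, df, ci = pooled_t_from_summary(1.80, 0.60, 6, 2.49, 1.09, 8, t_crit=2.179)
print(f"t = {t_stat:.2f}, df = {df}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the inputs are rounded to two decimal places, the t statistic here is close to, but need not equal, the -1.40 quoted in the results table. The confidence interval agrees with the published one to two decimal places.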
support vector machines
These are algorithms for learning complex classification and regression functions, belonging to the general family of 'kernel methods' discussed later. Their computational and statistical efficiency recently made them one of the tools of choice in certain biological DATA MINING applications. Support vector machines (SVMs) work by embedding the data into a feature space by means of kernel functions (the so-called 'kernel trick'). In the binary classification case, a hyperplane that separates the two classes is sought in this feature space. New data points will be classified into one of the two classes according to their position with respect to this hyperplane. SVMs owe their name to their property of isolating a (often small) subset of data points called 'support vectors', which have interesting theoretical properties.
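The 'kernel trick' can be made concrete with a small numerical check. The sketch below uses one textbook example chosen here for illustration, the homogeneous quadratic kernel on two-dimensional inputs, and shows that evaluating the kernel on the original data gives the same number as an ordinary inner product after an explicit embedding into a three-dimensional feature space. The function names and input points are hypothetical.

```python
import math


def quadratic_kernel(x, z):
    """k(x, z) = (x . z)^2, computed in the original two-dimensional space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def feature_map(x):
    """Explicit embedding phi(x) whose inner product reproduces the kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Ordinary inner product."""
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, -1.0)
k_value = quadratic_kernel(x, z)
feature_value = dot(feature_map(x), feature_map(z))
print(k_value, feature_value)
```

The two numbers agree up to floating point rounding, so an algorithm written purely in terms of inner products can work in the feature space without ever constructing `feature_map(x)` explicitly.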
The SVM approach has several important virtues when compared with earlier approaches: the choice of the hyperplane is founded on statistical arguments; the hyperplane can be found by solving a convex (quadratic) optimisation problem, which means that training an SVM is not subject to local minima; when a nonlinear kernel function is used, the hyperplane in the feature space can correspond to a complex (nonlinear) decision boundary in the original data domain. Even more interestingly, kernel functions can be defined not only on vectorial data but on virtually any kind of data, making it possible to classify strings, images, trees or nodes in a graph; the classification of unseen data points is generally computationally cheap and depends on the number of support vectors. First introduced in 1992, support vector machines are now one of the standard tools in PATTERN RECOGNITION applications, mostly due to their computational efficiency and statistical stability. In recent years, extensions of this algorithm to deal with a number of important data analysis tasks have been proposed, resulting in the general family of 'kernel methods' (Shawe-Taylor and Cristianini, 2004) (see DENSITY ESTIMATION).
The kinds of relation detected by kernel methods include classifications, regressions, clustering (see CLUSTER ANALYSIS IN MEDICINE), principal components (see PRINCIPAL COMPONENT ANALYSIS), canonical correlations (see CANONICAL CORRELATION ANALYSIS) and many others. In the same way as with SVMs, the kernel trick allows these methods to be applied in a feature space that is induced by this kernel, making kernel methods applicable to virtually any kind of data. Elegantly, the development of kernel methods can always be decomposed into two modular steps: the kernel design, on the one hand, and the choice of the algorithm, on the other hand. The kernel design part implicitly defines the feature space, which should contain all available information that is relevant for the problem at hand. The choice of the algorithm (which needs to be written in terms of kernels) can be done independently from the kernel design. As with SVMs, most kernel methods reduce their training phase to optimising a convex cost function or to solving a simple eigenvalue problem, hence avoiding one of the main computational pitfalls of NEURAL NETWORKS. However, since they often implicitly make use of very high dimensional spaces, kernel methods run the risk of overfitting. For this reason, their design needs to incorporate principles of statistical learning theory, which help to identify the crucial parameters that need to be controlled in order to avoid this risk (see Vapnik, 1995). For further reference on SVMs, see Cristianini and Shawe-Taylor (2000). NC/TDB
Cristianini, N. and Shawe-Taylor, J. 2000: An introduction to support vector machines. Cambridge: Cambridge University Press (www.support-vector.net). Shawe-Taylor, J. and Cristianini, N. 2004: Kernel methods for pattern analysis. Cambridge: Cambridge University Press (www.kernel-methods.net). Vapnik, V. 1995: The nature of statistical learning theory. New York: Springer.
surrogate endpoints
These are ENDPOINTS that can replace a clinical endpoint for the purpose of assessing the effects of new treatments earlier, at lower cost, or with greater statistical SENSITIVITY. Surrogate endpoints can include measurements of a biomarker, defined as 'a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention' (Biomarkers Definitions Working Group, 2001). Use of a biomarker as a surrogate endpoint can also be useful if the final endpoint measurement is unduly invasive or uncomfortable. For an endpoint to be a surrogate for a clinical endpoint it must be a measure of disease such that: (a) the size (or frequency) correlates strongly with that clinical endpoint (e.g. blood pressure is positively correlated with the risk of stroke) and (b) treatments producing a change in the surrogate endpoint also modify the risk of that particular clinical endpoint (e.g. reducing blood pressure reduces the risk of stroke). Surrogate endpoints are routinely used in early drug development, where interest focuses on showing that new treatments have enough activity to warrant further research. In confirmatory PHASE III TRIALS, however, interest focuses on showing that new treatments have the anticipated clinical benefits, and in such situations surrogate endpoints can only be used if they have undergone rigorous statistical evaluation (or 'validation') (Burzykowski, Molenberghs and Buyse, 2005). Indeed, some promising surrogate endpoints have proven to be unreliable predictors of clinical benefits. For example, cardiac arrhythmia was believed to be a good surrogate endpoint for mortality after an acute heart attack, since in these circumstances patients with a higher risk of such an arrhythmia have a greater risk of death. However, several drugs (e.g. lignocaine, flecainide) that prevent arrhythmias after a heart attack actually increase mortality (Echt et al., 1991). Similarly, some blood pressure-lowering drugs (such as angiotensin-converting enzyme inhibitors) have much larger effects on vascular mortality than might be predicted from their effects on blood pressure (Heart Outcomes Prevention Evaluation Study Investigators, 2000). In contrast, disease-free survival has recently been validated as an acceptable surrogate for overall survival in patients with colorectal cancer treated with fluoropyrimidines (Sargent et al., 2005). Prentice (1989) proposed a definition and operational criteria for the validation of surrogate endpoints. Although the strict criteria proposed by Prentice seem too stringent to ever be met in practice, his landmark paper sparked interest in developing statistical methods that could be used to show that a surrogate is acceptable (or 'validated') for the purposes of assessing a specific class of treatments in a specific disease setting. One approach consists of using a MULTILEVEL MODEL to show that the surrogate endpoint predicts the true endpoint ('individual-level' surrogacy), and that the effects of a treatment on the surrogate endpoint predict the effects of the treatment on the true endpoint ('trial-level' surrogacy) (Buyse et al., 2000). The latter condition requires data to be available from several units, usually from a META-ANALYSIS of several trials. Another approach consists of using a CAUSAL MODEL to compare the causal effect of treatment on the true endpoint in patients for whom treatment does, and does not, affect the surrogate. See Weir and Walley (2006) for a review of the terminology and surrogate validation models. CB/MB
Biomarkers Definitions Working Group 2001: Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clinical Pharmacology and Therapeutics 69, 89-95. Burzykowski, T., Molenberghs, G. and Buyse, M. (eds) 2005: Evaluation of surrogate endpoints. Springer Verlag. Buyse, M., Molenberghs, G., Burzykowski, T., Renard, D. and Geys, H. 2000: The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 1, 49-67. Echt, D. S., Liebson, P. R., Mitchell, L. B. and the Cardiac Arrhythmia Suppression Trial (CAST) Investigators 1991: Mortality and morbidity in patients receiving encainide, flecainide, or placebo: the cardiac arrhythmia suppression trial. New England Journal of Medicine 324, 781-8. Heart Outcomes Prevention Evaluation Study Investigators 2000: Effects of an angiotensin-converting enzyme inhibitor, ramipril, on death from cardiovascular causes, myocardial infarction, and stroke in high-risk patients. New England Journal of Medicine 342, 145-53. Prentice, R. L. 1989: Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431-40. Sargent, D., Wieand, S., Haller, D. G. et al. 2005: Disease-free survival (DFS) vs overall survival (OS) as a primary endpoint for adjuvant colon cancer studies: individual patient data from 20,898 patients on 18 randomized trials. Journal of Clinical Oncology 23, 8664-70. Weir, C. J. and Walley, R. J. 2006: Statistical evaluation of biomarkers as surrogate endpoints: a literature review. Statistics in Medicine 25, 183-203.
survival analysis - an overview
This covers methods for the analysis of time-to-event data, e.g. survival times. Survival data occur when the outcome of interest is the time from a well-defined time origin to the occurrence of a particular event or ENDPOINT. If the endpoint is the death of a patient the resulting data are, literally, survival times. However, other endpoints are possible, e.g. the time to relief or recurrence of symptoms. Such observations are often referred to as time-to-event data although survival data is commonly used as a generic term. Standard statistical methodology is not usually appropriate for such data, for two main reasons. First, the distribution of survival time in general is likely to display positive SKEWNESS and so assuming normality for an analysis (as done, for example, by a t-TEST or a regression) is probably not reasonable.
Second, more critical than doubts about normality, however, is the presence of censored observations, where the survival time of an individual is referred to as censored when the endpoint of interest has not yet been reached (more precisely, right censored). For true survival times this might be because the data from a study are analysed at a time point when some participants are still alive. Another reason for censored event times is that an individual might have been lost to follow-up for reasons unrelated to the event of interest, e.g. due to moving to a location that cannot be traced or due to accidental death (see DROPOUTS). When censoring occurs all that is known is that the actual, but unknown, survival time is larger than the censored survival time. Specialised statistical techniques developed to analyse such censored and possibly skewed outcomes are known as survival analysis. An important assumption made in standard survival analysis is that the censoring is noninformative, i.e. that the actual survival time of an individual is independent of any mechanism that causes that individual's survival time to be censored. For simplicity, this description also concentrates on techniques for continuous survival times - the analysis of discrete survival times is described in Collett (2003). As an example, consider data that arise from a double-blind, randomised controlled clinical trial (RCT) (see CLINICAL TRIALS) to compare treatments for prostate cancer (placebo versus 1.0 mg of diethylstilbestrol (DES) administered daily by mouth). The full dataset is given in Andrews and Herzberg (1985) and the first table shows the first seven of a subset of 38 patients used here and discussed in Collett (2003). In this study, the time of origin was the date on which a cancer sufferer was randomised to a treatment and the endpoint is the death of a patient from prostate cancer.
The survival times of patients who died from other causes or were lost during the follow-up process are regarded as right censored. The 'status' variable in the first table takes the value unity if the patient has died from prostate cancer and zero if the survival time is censored. In addition to survival times, a number of prognostic factors were recorded, namely the age of the patient at trial entry, their serum haemoglobin level in gm/100 ml, the size of their primary tumour in cm² and the value of a combined index of tumour stage and grade (the Gleason index with larger values indicating more advanced tumours). The main aim of this study was to compare the survival experience between the two treatment groups. In general, to describe survival two functions of time are of central interest - the survival function and the hazard function. These are described in some detail next. The survival function $S(t)$ is defined as the probability that an individual's survival time, $T$, is greater than or equal to time $t$, i.e.:
$$S(t) = \text{Prob}(T \geq t)$$
The graph of $S(t)$ against $t$ is known as the survival curve. The survival curve can be thought of as a particular way of displaying the frequency distribution of the event times, rather than by, say, a HISTOGRAM. When there are no censored observations in the sample of survival times, the survival function can be estimated by the empirical survivor function:
$$\hat{S}(t) = \frac{\text{Number of individuals with survival times} \geq t}{\text{Number of individuals in the data set}}$$
Since every subject is 'alive' at the beginning of the study and no one is observed to survive longer than the largest of the observed survival times, $t_{\max}$, then:
$$\hat{S}(0) = 1 \quad \text{and} \quad \hat{S}(t) = 0 \text{ for } t > t_{\max}$$
Furthennon:. the eslimlllCd IIII'ViVOl' filllction is assumed constanl betweea lwo adjacent cIeath limes. so th8l8 plot of 5(/) qaiDSlI is a step fUnction that cIemues immecliately after each 'death·. This simple mc:Ihod clllUlOl be used when there are ccnsend obsemdions since the methacI does not allow far information provlclal by an individual whose sun'iwl time is censored befon: lime 110 be used in die compulinl or the
1IUIVInI • ...,.... SUtvivai times of pmsIate CBIJCfN' patients P,,'iIfIIl
IIIIIPIbo 1 2 3
4 5 6 7
SllIla
Age
I.
O-died,
(yean)
flllDlldu,
0= eelUlWaI)
TrftJIIIWIII {I =pllleebo.
Sunival
2-DESJ 1 2 2 1 2 1 1
6S 61 60
0 0 0 0 0 0 1
sa 51 51 14
Smnn
S;:e t1f
G/mson
llaem.
l1IIlIOUt' (mi,
iIrIII.y:
34
a
4
10
(1m/IOO ml)
67 60 77 64
65 61 73
13.4 14.6 15.6 16.2 14.1 13.5 12.4
3 6 21 8 18
a 9 9
a 11
estimated. The most commonly used method for estimating the survival function for survival data containing censored observations is the product-limit or KAPLAN-MEIER ESTIMATOR. The essence of this approach is the use of a product of a series of conditional probabilities. One alternative estimator for censored survival times, derived differently but in practice often similar, is the Nelson-Aalen estimator. Approximate STANDARD ERRORS and pointwise symmetric or asymmetric CONFIDENCE INTERVALS for the survival function at a given time can be derived to determine the precision of the estimator; details are given in Collett (2003).

The Kaplan-Meier estimators of the survivor curves for the two prostate cancer treatments are shown graphically in the figure. The survivor curves are step functions that decrease at the time points when participants died of the cancer. The censored observations in the data are indicated by the 'cross' marks on the curves. In our patient sample there is approximately a difference of 20% in the proportion surviving for at least 50 to 60 months between the treatment groups.

Since the distribution of survival times tends to be positively skewed, the MEDIAN is the preferred summary measure of location. The median survival time is the time beyond which 50% of the individuals in the population under study are expected to survive and, once the survivor function has been estimated by Ŝ(t), can be estimated by the smallest observed survival time, t50, for which the value of the estimated survivor function is less than 0.5. The estimated median survival time can be read from the survival curve by finding the smallest value on the x axis for which the survival proportion reaches less than 0.5. The figure shows that the median survival in the placebo group can be estimated as 69 months while an estimate for the DES group is not available since survival exceeds 50% throughout the study period. A similar procedure can be used to estimate other percentiles of the distribution of the survival times, and approximate confidence intervals can be found once the variance of the estimated percentile has been derived from the VARIANCE of the estimator of the survival function.

In the analysis of survival data, it is often of some interest to assess which periods have the highest and which the lowest chance of death (or whatever the event of interest happens to be) among those people alive at the time. The appropriate quantity for such risks is the hazard function, h(t), defined as the (scaled) PROBABILITY that an individual experiences an event in a small time interval dt, given that the individual has survived up to the beginning of the interval. The hazard function therefore represents the instantaneous death rate for an individual surviving to time t. It is a measure of how likely an individual is to experience an event as a function of the age of the individual. The hazard function may remain constant, increase or decrease with time or take some more complex form. The hazard function of death in human beings, for example, has a 'bathtub' shape. It is relatively high immediately after birth, declines rapidly in the early years and then remains relatively constant until beginning to rise during late middle age.

A Kaplan-Meier type estimator of the hazard function is given by the proportion of individuals experiencing an event in an interval per unit time, given that they have survived to the beginning of the interval. However, the estimated hazard function is generally considered 'too noisy' for practical use. Instead, the cumulative or integrated hazard function, which is derived from the hazard function by summation, is usually displayed to describe the change in hazard over time.
[Figure: step-function survivor curves; vertical axis from 0.0 to 1.0, horizontal axis Time (months) from 0 to 80]
survival analysis  Display of Kaplan-Meier survivor curves
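The product-limit idea and the median rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `kaplan_meier` and `median_survival` and the six toy observations are assumptions made for the example and are not the prostate cancer data.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, estimated S(t)) at each distinct time with at least one event.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every subject whose recorded time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Deaths and censored subjects both leave the risk set after t.
        n_at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest event time at which the estimated survivor function is below 0.5."""
    for t, s in curve:
        if s < 0.5:
            return t
    return None  # the curve never drops below 0.5, so no estimate is available


# Hypothetical toy data: survival times in months and event indicators.
times = [6, 7, 10, 15, 19, 25]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
print(median_survival(curve))
```

When the estimated curve stays at or above 0.5 for the whole follow-up, as for the DES group in the text, the sketch returns `None` rather than a number.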
In addition to comparing survivor functions graphically, a more formal statistical test for a group difference is often required in order to compare survival times analytically. In the absence of censoring, a nonparametric test such as the Mann-Whitney test could be used (see MANN-WHITNEY RANK SUM TEST). In the presence of censoring, the log-rank or Mantel-Haenszel test is the most commonly used nonparametric test (see MANTEL-HAENSZEL METHODS). It tests the NULL HYPOTHESIS that the population survival functions S_1(t), S_2(t), ..., S_k(t) are the same in k groups. Briefly, the test is based on computing the expected number of deaths for each observed 'death' time in the dataset, assuming that the chances of dying, given that subjects are at risk, are the same in the groups. The total number of expected deaths is then computed for each group by adding the expected number of deaths for each failure time. The test finally compares the observed number of deaths in each group with the expected number of deaths using a CHI-SQUARE TEST with k - 1 DEGREES OF FREEDOM (see Hosmer and Lemeshow, 1999).

The log-rank test statistic, X², weights contributions from all failure times equally. Several alternative test statistics have been proposed that give differential weights to the failure times. For example, the generalised Wilcoxon test (or Breslow test) uses weights equal to the number at risk. For the prostate cancer data in the first table the log-rank test (X² = 4.4 on 1 degree of freedom, P = 0.036) detects a significant group difference in favour of longer survival on DES treatment while the Wilcoxon test, which puts relatively more weight on differences between the survival curves at earlier times, fails to reach significance at the 5% test level (X² = 3.4 on 1 degree of freedom, P = 0.065).

Modelling survival times is useful especially when there are several explanatory variables to consider. For example, in the prostate cancer trial patients were randomised to treatment groups so that the theoretical distributions of the diagnostic factors were the same in the two groups. However, empirical distributions in the patient sample might still vary and if the prognostic variables are related to survival they might confound the group difference. A survival analysis that 'adjusts' the group difference for the prognostic factor(s) is needed. The main approaches used for modelling the effects of covariates on survival can be divided roughly into two classes - models based on assuming proportional hazards and models for direct effects on the survival times.

The main technique used for modelling survival times is due to Cox (1972) and is known as the PROPORTIONAL HAZARDS model or, more simply, Cox's regression (see COX'S REGRESSION MODEL). In essence, the technique acts as the analogue of multiple regression for survival times containing censored observations, for which multiple regression itself is clearly not suitable. Briefly, the procedure
models the hazard function and central to it is the assumption that the hazard functions for two individuals at any point in time are proportional, the so-called proportional hazards assumption. In other words, if an individual has a risk of 'death' at some initial time point that is twice as high as another individual, then at all later times the risk of death remains twice as high. Cox's model is made up of an unspecified baseline hazard function, h0(t), which is then multiplied by a suitable function of an individual's explanatory variable values, to give the individual's hazard function. The interpretation of the regression parameter of the ith covariate, β_i, is that exp(β_i) gives the hazard or INCIDENCE rate change associated with an increase of one unit in the ith covariate, all other explanatory variables remaining constant.

Cox's regression is considered a semi-parametric procedure because the baseline hazard function, h0(t), and by implication the PROBABILITY DISTRIBUTION of the survival times, does not have to be specified. The baseline hazard is left unspecified; a different parameter is essentially included for each unique survival time. These parameters can be thought of as NUISANCE PARAMETERS whose purpose is merely to control the parameters of interest for any changes in the hazard over time.

Cox's regression can be used to model the prostate cancer survival data. To start with, a model containing only the single treatment factor is fitted. The estimated regression coefficient of a DES indicator variable is -1.98 with a standard error of 1.1. This translates into an (unadjusted) hazard ratio of exp(-1.98) = 0.138. In other words, DES treatment is estimated to reduce the hazard of immediate death by 86.2% relative to PLACEBO treatment. According to a LIKELIHOOD RATIO (LR) test, the unadjusted effect of DES is statistically significant at the 5% level (X² = 4.55 on 1 degree of freedom, P = 0.033).

For the prostate cancer data, it is of interest to determine the effect of DES after controlling for the other prognostic variables. Likelihood ratio tests showed that dropping age and serum haemoglobin from a model that contains the treatment indicator variable and all four prognostic variables did not significantly worsen the model fit (at the 10% level); the fit of the final model is shown in the second table. After adjusting for the effects of tumour size and stage the hazard reduction for DES relative to placebo treatment is reduced to 67.1% and is no longer statistically significant (LR test: X² = 0.48 on 1 degree of freedom, P = 0.49). Both tumour size and Gleason index have a hazard ratio above unity, indicating that increases in tumour size and advanced stages are estimated to increase the chance of death.

Cox's model does not require specification of the probability distribution of the survival times. The hazard function is not restricted to a specific form and as a result the
semi-parametric model has considerable flexibility and is widely used. However, if the assumption of a particular probability distribution for the data is valid, inferences based on such an assumption are more precise. For example, estimates of hazard ratios or median survival times will have smaller standard errors. A fully parametric proportional hazards model makes the same assumptions as Cox's regression but in addition also assumes that the baseline hazard function, h0(t), can be parameterised according to a specific model for the distribution of the survival times. Survival time distributions that can be used for this purpose, i.e. that have the proportional hazards property, are principally the EXPONENTIAL, Weibull and Gompertz DISTRIBUTIONS. Different distributions imply different shapes of the hazard function, and in practice the distribution that best describes the functional form of the observed hazard function is chosen - for details see Collett (2003).

survival analysis  Parameter estimates from Cox's regression of survival on treatment group, tumour size and Gleason index

                     Effect estimate                                     95% CI for exp(β)
Predictor variable   Regression         Standard   Hazard ratio    Lower limit   Upper limit
                     coefficient (β̂)    error      (exp(β̂))
DES                  -1.113             1.203      0.329           0.031         3.47
Tumour size           0.0826            0.048      1.086           0.990         1.19
Gleason index         0.7102            0.338      2.034           1.049         3.95

A family of fully parametric models that accommodate direct multiplicative effects of covariates on survival times and hence do not have to rely on proportional hazards are accelerated failure time models. A wider range of survival time distributions possesses the accelerated failure time property, principally the exponential, Weibull, log-logistic, generalised GAMMA or LOGNORMAL DISTRIBUTIONS. In addition, this family of parametric models includes distributions (e.g. the log-logistic distribution) that model unimodal hazard functions while all distributions suitable for the proportional hazards model imply hazard functions that increase or decrease monotonically. The latter property might be limiting, for example, for modelling the hazard of dying after a complicated operation that peaks in the post-operative period.

survival analysis  Parameter estimates from log-logistic accelerated failure time model of survival on treatment group, tumour size and Gleason index

                     Effect estimate                                         95% CI for exp(-α)
Predictor variable   Regression         Standard      Acceleration      Lower limit   Upper limit
                     coefficient (α̂)    error         factor (exp(-α̂))
DES                   0.628             (illegible)   0.534             0.182         1.568
Tumour size          -0.031             0.022         1.031             0.981         1.077
Gleason index        -0.335             0.203         1.393             0.939         2.080
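The hazard ratios and confidence limits in the Cox regression table can be checked from the printed coefficients and standard errors by exponentiating a Wald interval. The sketch below assumes the usual normal multiplier of 1.96; the helper name `ratio_with_ci` is hypothetical, and small disagreements in the last digit are to be expected because the printed coefficients are themselves rounded.

```python
import math


def ratio_with_ci(coef: float, se: float, z: float = 1.96):
    """Exponentiate a regression coefficient and its Wald confidence limits."""
    ratio = math.exp(coef)
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return ratio, lower, upper


# Coefficients and standard errors as printed in the Cox regression table.
cox_fit = {
    "DES": (-1.113, 1.203),
    "Tumour size": (0.0826, 0.048),
    "Gleason index": (0.7102, 0.338),
}

for name, (coef, se) in cox_fit.items():
    hr, lower, upper = ratio_with_ci(coef, se)
    print(f"{name}: HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.2f})")

# Percentage reduction in hazard for DES relative to placebo.
des_hr, des_lower, des_upper = ratio_with_ci(*cox_fit["DES"])
des_reduction = 100.0 * (1.0 - des_hr)
print(f"DES hazard reduction: {des_reduction:.1f}%")
```

The same arithmetic applied to the negated coefficients of an accelerated failure time fit would give acceleration factors of the form exp(-α) rather than hazard ratios.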
The general accelerated failure time model for the effects of p explanatory variables, x_1, x_2, ..., x_p, can be represented as a log-linear model for survival time, T, namely:

\ln(T) = \alpha_0 + \sum_{i=1}^{p} \alpha_i x_i + \text{error}

where α_1, ..., α_p are the unknown coefficients of the explanatory variables and α_0 an intercept parameter. The parameter α_i reflects the effect that the ith covariate has on log-survival time, with positive values indicating that the survival time increases with increasing values of the covariate and vice versa. In terms of the original timescale, the model implies that the explanatory variables measured on an individual act multiplicatively and so affect the speed of progression to the event of interest. The interpretation of the parameter α_i is therefore that exp(α_i) gives the factor by which any survival time percentile (e.g. the median survival time) changes per unit increase in x_i, all other explanatory variables remaining constant. Expressed differently, the probability that an individual with covariate value x_i + 1 survives beyond t is equal to the probability that an individual with value x_i survives beyond exp(-α_i)t. Hence exp(-α_i) determines the change in the speed with which individuals proceed along the timescale, and this term is known as the acceleration factor of the ith covariate.

Software packages typically use the log-linear formulation. The regression coefficients from fitting a log-logistic accelerated failure time model to the prostate cancer survival times using treatment, size of tumour and Gleason index as predictor variables are shown in the third table. The negative regression coefficients suggest that the survival times tend to be shorter for larger values of tumour size and Gleason index. The positive regression coefficient for the DES treatment indicator suggests that survival times tend to be longer for individuals assigned to the active treatment after adjusting for the effects of tumour size and stage. The estimated acceleration factor for an individual in the DES group compared with the placebo group is exp(-0.628) = 0.534, i.e. DES is estimated to slow down the progression of the cancer by a factor of about 2. While possibly clinically relevant, this effect is, however, not statistically significant (LR test: X² = 1.57 on 1 degree of freedom, P = 0.21).

In summary, survival analysis is a powerful tool for analysing time-to-event data. The classical techniques, Kaplan-Meier estimation, Cox's regression and accelerated failure time modelling,
are implemented in most general purpose STATISTICAL PACKAGES, with the S-Plus package having particularly extensive facilities for fitting and assessing nonstandard Cox models. The area is complex and one of active current research. For more recent advances, such as frailty models to include RANDOM EFFECTS, MULTISTATE MODELS to model different transition rates and models for competing risks, the reader is referred to Andersen (2002), Crowder (2001) and Hougaard (2000). SL

Andersen, P. K. (ed.) 2002: Multistate models. Statistical Methods in Medical Research 11. London: Arnold.
Andrews, D. F. and Herzberg, A. M. 1985: Data. New York: Springer.
Collett, D. 2003: Modelling survival data in medical research, 2nd edition. London: Chapman & Hall/CRC.
Cox, D. R. 1972: Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B 34, 187-220.
Crowder, M. J. 2001: Classical competing risks. Boca Raton, FL: Chapman & Hall/CRC.
Hosmer, D. W. and Lemeshow, S. 1999: Applied survival analysis. New York: John Wiley & Sons, Inc.
Hougaard, P. 2000: Analysis of multivariate survival data. New York: Springer.
survival curve
See KAPLAN-MEIER ESTIMATION, SURVIVAL ANALYSIS - AN OVERVIEW

survival function
See SURVIVAL ANALYSIS
systematic reviews and meta-analysis
This is an approach to the combining of results from the many individual CLINICAL TRIALS of a particular treatment or therapy that may have been carried out over the course of time. Such a procedure is needed because individual trials are rarely large enough to answer the questions we want to answer as reliably as we would like. In practice, most trials are too small for adequate conclusions to be drawn about potentially small advantages of particular therapies. Advocacy of large trials is a natural response to this situation, but it is not always possible to launch very large trials before therapies become widely accepted or rejected prematurely. An alternative possibility is to examine the results from all relevant trials, a process that involves two components, one qualitative, i.e. the extraction of the relevant literature and description of the available trials, in terms of their relevance and methodological strengths and weaknesses (the systematic review), and the other quantitative, i.e. mathematically combining results from different studies, even on occasions when these studies have used different measures to assess outcome. This component is known as a meta-analysis (Normand, 1999).

Informal synthesis of evidence from different studies is, of course, nothing new, but it is now generally accepted that meta-analysis gives the systematic review an objectivity that is inevitably lacking in the classical review article and can also help the process to achieve greater precision and generalisability of findings than any single study. There remain sceptics who feel that the conclusions from a meta-analysis often go far beyond what the technique and the data justify, but despite such concerns, the demand for systematic reviews of healthcare interventions has developed rapidly during the last decade, initiated by the widespread adoption of the principles of EVIDENCE-BASED MEDICINE both among healthcare practitioners and policymakers. Such reviews are now increasingly used as a basis for both individual treatment decisions and the funding of healthcare and healthcare research worldwide. This growth in systematic reviews is reflected in the current state of the COCHRANE COLLABORATION database, containing as it does more than 1200 complete systematic reviews, with a further 1000 due to be added soon.

Systematic reviews and the subsequent meta-analysis have a number of aims: to review systematically the available evidence from a particular research area; to provide quantitative
summaries of the results from each study; to combine the results across studies if appropriate - such combination of results leads to greater statistical power in estimating treatment effects; to assess the amount of variability between studies; to estimate the degree of benefit associated with a particular study treatment; to identify study characteristics associated with particularly effective treatments.

Ideally, the trials included in a systematic review should be clinically homogeneous. For example, they might all study a similar type of patient for a similar duration with the same treatment in the two arms of each trial. In practice, of course, the trials included are far more likely to differ in some aspects, such as eligibility criteria, duration of treatment, length of follow-up and how ancillary care is used. On occasions, even treatment itself may not be identical in all the trials. This implies that, in most circumstances, the objective of a systematic review cannot be equated with that of a single large trial, even if that trial has wide eligibility. While a single trial focuses on the effect of a specific treatment in specific situations, a meta-analysis aims for a more generalisable conclusion about the effect of a generic treatment policy in a wider range of areas.

When the trials included in a systematic review do differ in some of their components, therapeutic effects may very well be different, but these differences are likely to be in the size of the effects rather than their direction. It would, after all, be extraordinary if treatment effects were exactly the same when estimated from trials in different countries, in different populations, in different age groups or under different treatment regimens. If the studies were big enough it would be possible to measure these differences reliably, but in most cases this will not be possible. However, meta-analysis allows the investigation of sources of possible heterogeneity in the results from different trials, as we shall see later, and discourages the common, simplistic and often misleading interpretation that the results of individual clinical trials are in conflict because some are labelled 'positive' (i.e. statistically significant) and others 'negative' (i.e. statistically nonsignificant). A systematic approach to synthesising information can often both estimate the degree of benefit from a particular therapy and whether the benefit depends on specific characteristics of the studies.

The selection of studies is the greatest single concern in applying meta-analysis and there are at least three important components of the selection process, namely breadth, quality and representativeness (Pocock, 1996). Breadth relates to the decision as to whether to study a very specific narrow question (e.g. the same drug, disease and setting for studies following a common protocol) or a more generic problem (e.g. a broad class of treatments for a range of conditions in a variety of settings). The broader the meta-analysis, the more difficulty there is in interpreting the combined evidence as regards future policy. Consequently, the broader the meta-analysis, the more it needs to be interpreted qualitatively rather than quantitatively.

Quality and reliability of a systematic review is dependent on the quality of the data in the included studies, although criticisms of meta-analyses for including original studies of questionable quality are typical examples of shooting the messenger who bears bad news. Aspects of quality of the original articles that are pertinent to the reliability of the meta-analysis include a valid RANDOMISATION process (we are assuming that in meta-analysis of clinical trials, only randomised trials will be selected), MINIMISATION of potential BIASES introduced by DROPOUTS, acceptable methods of analysis, level of BLINDING and recording of adequate clinical details. Several attempts have been made to make this aspect of meta-analysis more rigorous by using the results given by applying specially constructed quality assessment scales to assess the candidate trials for inclusion in the analysis. Determining quality would be helped if the results from so many trials were not so poorly reported. In the future, this may be improved by the CONSORT statement (CONSOLIDATED STANDARDS FOR REPORTING TRIALS).
The representativeness of the studies in a systematic review depends largely on having an acceptable search strategy. Once the researcher has established the goals of the systematic review, an ambitious literature search needs to be undertaken, the literature obtained and then summarised. Possible sources of material include the published literature, unpublished literature, uncompleted research reports, work in progress, conference/symposia proceedings, dissertations, expert informants, granting agencies, trial registries, industry and journal hand searching. The search will probably begin by using computerised bibliographic databases of published and unpublished research review articles, for example, MEDLINE. This is clearly a sensible strategy, although there is some evidence of deficiencies in MEDLINE when searching for RANDOMISED CONTROLLED TRIALS.

Ensuring that a meta-analysis is truly representative can be problematic. It has long been known that journal articles are not a representative sample of work addressed to any particular area of research. Research with statistically significant results is potentially more likely to be submitted and published than work with null or nonsignificant results, particularly if the studies are small. The problem is made worse by the fact that many medical studies look at multiple outcomes and there is a tendency for only those outcomes suggesting a significant effect to be mentioned when the study is written up. Outcomes that show no clear treatment effect are often ignored and so will not be included in any later review of studies looking at those particular outcomes. Publication bias is likely to lead to an overrepresentation of positive results.
Clearly it becomes of some importance to assess the likelihood of publication bias in any meta-analysis reported in the literature. A well-known informal method of investigating this potential problem is the so-called FUNNEL PLOT, usually a plot of a measure of a study's precision (e.g. one over the STANDARD ERROR) against effect size. The most precise estimates (e.g. those from the largest studies) will be at the top of the plot and those from less precise or smaller studies at the bottom. The expectation of a 'funnel' shape in the plot relies on two empirical observations. First, the variances of studies in a meta-analysis are not identical, but are distributed in such a way that there are fewer precise studies and rather more imprecise ones and, second, at any fixed level of VARIANCE, studies are symmetrically distributed about the MEAN. Evidence of publication bias is provided by an absence of studies on the left-hand side of the base of the funnel. The assumption is that, whether because of editorial policy or author inaction or some other reason, these studies (which are not statistically significant) are the ones that might not be published. An example of a funnel plot suggesting the possible presence of publication bias is given in the figure (taken from Duval and Tweedie, 2000).

Various proposals have been made as to how to test for publication bias in a systematic review although none of these is wholly satisfactory. The danger of the testing approach is the temptation to assume that, if the test is not significant, there is no problem and the possibility of publication bias can be conveniently ignored. In practice, however, publication bias is very likely endemic to all empirical research and so should be assumed present, whatever the result of some testing procedures with possibly low POWER.
[Figure: two funnel-plot panels, (a) and (b); horizontal axis: Effect size]
systematic reviews and meta-analysis  (a) Funnel plot of 35 simulated studies and meta-analysis with true effect size of zero: estimated effect size is 0.080 with a 95% confidence interval of [-0.018, 0.178]; (b) funnel plot as in (a) with five leftmost studies suppressed; overall effect size is now estimated as 0.124 with a 95% confidence interval of [0.037, 0.210]. Reprinted from Duval and Tweedie, 2000, with permission from The Journal of the American Statistical Association. Copyright 2000 by the American Statistical Association. All rights reserved

Once the studies for systematic review have been selected and the possible problems of publication bias addressed, effect sizes and variance estimates are extracted from the selected papers, reports, etc., and subjected to a meta-analysis, in which the aim is to provide a global test of significance for the overall NULL HYPOTHESIS of no effect in all studies and to calculate an estimate and a CONFIDENCE INTERVAL of the overall effect size. Two models are usually considered, one involving FIXED EFFECTS and the other RANDOM EFFECTS (Fleiss, 1993; Sutton et al., 2000). The former assumes that the true effect is the same for all studies whereas the latter assumes that individual studies have different effect sizes that vary randomly around the overall mean effect size. Thus the random effects model specifically allows for the existence of both between-study heterogeneity and within-study variability. When the research question concerns whether treatment has produced an effect, on the average, in the set of studies being analysed, then the fixed effects model for the studies may be the more appropriate; here there is no interest in generalising the results to other studies. Many statisticians believe, however, that the random effects model is more appropriate than a fixed effects model for meta-analysis, because between-study variation is an important source of uncertainty that should not be ignored when assigning uncertainty into pooled results. Tests of homogeneity are available, i.e. a test that the between-study variance component is zero - if it is, a fixed effects model is considered justified. Such a test is, however, likely to be of low power for detecting departures from homogeneity and so its practical consequences are probably quite limited.

The essential feature of both the fixed and random effects models for meta-analysis is the use of a weighted mean of treatment effect sizes from the individual studies, with the weights usually being the reciprocals of the associated variances. Effect sizes might be standardised mean differences for continuous RESPONSE VARIABLES or RELATIVE RISKS AND ODDS RATIOS for binary outcomes. Both fixed effects and random effects models result in a test of zero effect size and a confidence interval for effect size. However, it should be remembered that, in general, a more important aspect of meta-analysis is often the exploration of the likely heterogeneity of effect sizes from the different studies. Random effect models, for example, allow for such heterogeneity but they do not offer any way of exploring and potentially explaining the reasons study results vary. In other words, random effects models do not 'control for', 'adjust for' or 'explain away' heterogeneity. Understanding heterogeneity should perhaps be the primary focus of the majority of meta-analyses carried out in medicine.

The examination of heterogeneity may begin with formal statistical tests for its presence, but even in the absence of statistical evidence of heterogeneity, exploration of the relationship of effect size to study characteristics may still be valuable. The question of importance is, what causes heterogeneity in systematic reviews of clinical trials? Study of the causes of heterogeneity of treatment effects in a meta-analysis often involves the technique generally known as META-REGRESSION. Essentially, this is nothing more than a weighted regression analysis with effect size as the dependent variable, a number of study characteristics as explanatory variables and weights usually being the reciprocal of the sum of the estimated variance of a study and the estimated between-study variance, although other more complex approaches have been described. Meta-regression can, like subgroup analysis within a single clinical trial, quickly become little more than DATA DREDGING. This danger can be partially dealt with at least by prespecification of the covariates, which will be investigated as potential sources of heterogeneity.
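The weighted mean with reciprocal-variance weights described above can be sketched as follows. The text does not name an estimator for the between-study variance; this sketch assumes the DerSimonian-Laird moment estimator as one common choice, and the five effect sizes and variances are hypothetical values made up for illustration.

```python
import math
from typing import List, Tuple


def pool(effects: List[float], variances: List[float], tau2: float = 0.0) -> Tuple[float, float]:
    """Inverse-variance weighted mean and its standard error.

    tau2 = 0 gives the fixed effects estimate; tau2 > 0 adds the
    between-study variance to each study's variance (random effects).
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return mean, math.sqrt(1.0 / total)


def dersimonian_laird_tau2(effects: List[float], variances: List[float]) -> float:
    """Moment estimate of the between-study variance, truncated at zero."""
    w = [1.0 / v for v in variances]
    fixed_mean, _ = pool(effects, variances)
    # Heterogeneity statistic Q: weighted squared deviations from the fixed mean.
    q = sum(wi * (y - fixed_mean) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(effects) - 1)) / c)


# Hypothetical study-level effect sizes (e.g. log odds ratios) and variances.
effects = [0.10, 0.30, 0.35, 0.65, 0.45]
variances = [0.03, 0.03, 0.05, 0.01, 0.05]

fixed_mean, fixed_se = pool(effects, variances)
tau2 = dersimonian_laird_tau2(effects, variances)
random_mean, random_se = pool(effects, variances, tau2)

print(f"fixed effects:  {fixed_mean:.3f} (SE {fixed_se:.3f})")
print(f"random effects: {random_mean:.3f} (SE {random_se:.3f}), tau2 = {tau2:.3f}")
```

When the estimated between-study variance is zero the two sets of weights coincide, so the fixed and random effects results are identical. The random effects weights are also the weights the text describes for meta-regression.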
As an example of the systematic review and associated meta-analysis we shall consider transcranial magnetic stimulation (TMS) for the treatment of depression. Such treatment involves placing a high-intensity magnetic field of brief duration at the scalp surface to induce an electrical field at the cortical surface that can alter neuronal function. Repetitive TMS (rTMS) involves applying trains of these magnetic pulses. In humans rTMS has been shown to produce changes in frontal lobe blood flow and to normalise the response to dexamethasone in depression. Since trials in the late 1990s, rTMS has been proposed as a treatment for drug-resistant depression, schizophrenia and mania. McNamara et al. (2001) report a systematic review of the published data, in which RANDOMISED CONTROLLED TRIALS were searched for using a variety of databases, including Medline and Embase. Sixteen published clinical trials of rTMS for depression were identified, but eight were excluded because there was no randomised control group and a further three excluded for reasons given in the original paper. The results from the five trials accepted for the meta-analysis are shown in the table.
systematic reviews and meta-analysis  Data for five RCTs of rTMS
Trial 1 Trial 2 Trial 3 Trial 4 Trial 5
rTMS
Placebo
II
6
11 I
Improved Not improved
6 7 I 8 4 4 6
Improved Not improved
17 18
8 24
Improved Not improved Improved Not improved Improved Not improved
4 2
4 I 10
The results from both the fixed effects and random effects models are, for these data, exactly the same. The overall effect size (log odds ratio) is estimated to be 1.33 with a standard error of 0.37, leading to an estimated odds ratio of 3.78 with 95% confidence interval (1.83, 7.81). BSE
(See also FOREST PLOT)
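The step from the pooled log odds ratio to the reported odds ratio and interval, together with the accompanying test of zero effect size, can be sketched as below. The normal multiplier of 1.96 and the two-sided z test are assumptions of this sketch; the text quotes only the estimate, its standard error and the resulting interval.

```python
import math

log_odds_ratio = 1.33   # pooled effect size quoted in the text
standard_error = 0.37   # its standard error, also from the text
z_crit = 1.96           # assumed normal multiplier for a 95% interval

odds_ratio = math.exp(log_odds_ratio)
ci_lower = math.exp(log_odds_ratio - z_crit * standard_error)
ci_upper = math.exp(log_odds_ratio + z_crit * standard_error)

# Two-sided z test of the null hypothesis that the log odds ratio is zero.
z_stat = log_odds_ratio / standard_error
p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))

print(f"odds ratio = {odds_ratio:.2f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
print(f"z = {z_stat:.2f}, two-sided P = {p_value:.5f}")
```

The interval is symmetric on the log scale and therefore asymmetric around the odds ratio itself.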
Duval, S. and Tweedie, R. L. 2000: A nonparametric 'trim and fill' method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association 95, 89-98.
Fleiss, J. L. 1993: The statistical basis of meta-analysis. Statistical Methods in Medical Research 2, 121-45.
McNamara, B., Ray, J. L., Arthurs, O. J. and Boniface, S. 2001: Transcranial magnetic stimulation for depression and other psychiatric disorders. Psychological Medicine 31, 1141-6.
Normand, S. T. 1999: Meta-analysis: formulating, evaluating, combining and reporting. Statistics in Medicine 18, 321-59.
Pocock, S. J. 1996: Clinical trials: a statistician's perspective. In Armitage, P. and David, H. A. (eds), Advances in biometry. Chichester: John Wiley & Sons, Ltd.
Sutton, A. J., Abrams, K. R., Jones, D. R. and Sheldon, T. A. 2000: Methods for meta-analysis in medical research. Chichester: John Wiley & Sons, Ltd.
systematic sample
Every kth element in a list is included in the sample. To obtain such a sample, begin with the sampling frame and arrange it in an order, which may be alphabetical or some other order. Then select the number of samples to be taken. Select a random starting point in the list. Divide the size of the population by the number of samples to be taken. This is the length of interval, k. Then every kth unit, depending on the starting point, is included in the sample until the number of samples to be taken is reached. This may mean starting again from the beginning of the list. For example, Little, Keefe and White (1995) used a systematic sample when studying melanoma patients in a general practice. Every 125th patient on the general practice register was selected to be included: this was to yield a minimum of 60 individuals.
The main advantage of systematic sampling is that it is a quick and easy-to-use sampling method, particularly when dealing with large samples, where it is often used in preference to SIMPLE RANDOM SAMPLES. However, if there is a periodic cycle within the sampling frame then estimates obtained from systematic sampling may be incorrect. If the periodic cycle is recognised then the starting point and the length of interval between chosen items can be varied. Systematic sampling can often only be used when there is a sampling frame available that can be ordered in some way. For further details see Crawshaw and Chambers (1994) and Upton and Cook (2002). SLV
Crawshaw, J. and Chambers, J. 1994: A concise course in A level statistics, 3rd edition. Cheltenham: Stanley Thornes Publishers Ltd.
Little, P., Keefe, M. and White, J. 1995: Self-screening for risk of melanoma: validity of self-mole counting by patients in a single general practice. British Medical Journal 310, 912-16.
Upton, G. and Cook, I. 2002: Dictionary of statistics. Oxford: Oxford University Press.
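The selection procedure described in the systematic sample entry can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a prescribed implementation: the function name `systematic_sample`, the integer rounding of the interval k and the patient register of 1000 identifiers are all hypothetical.

```python
import random


def systematic_sample(frame, sample_size, rng=None):
    """Take every kth unit from an ordered sampling frame, starting at a random point."""
    rng = rng or random.Random()
    k = len(frame) // sample_size  # length of interval (rounded down, an assumption)
    if k < 1:
        raise ValueError("sample_size cannot exceed the size of the sampling frame")
    start = rng.randrange(len(frame))  # random starting point in the list
    # The modulo wraps round, i.e. starts again from the beginning of the list.
    return [frame[(start + i * k) % len(frame)] for i in range(sample_size)]


# Hypothetical register of 1000 patient identifiers, ordered 1 to 1000.
register = list(range(1, 1001))
sample = systematic_sample(register, sample_size=8, rng=random.Random(0))
print(sample)
```

Here k = 1000/8 = 125, so consecutive selected identifiers are 125 places apart on the register, wrapping round at the end of the list.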
t-distribution
Also known as Student's t-distribution, this is the distribution of the estimate of the MEAN of a NORMAL DISTRIBUTION when the STANDARD ERROR has also been estimated. The distribution is used when performing STUDENT'S t-TEST. If we have a normal distribution of known VARIANCE, $\sigma^2$, then we can test to see if the mean of a set of observations $x_1, x_2, \ldots, x_n$, denoted $m$, is consistent with a hypothesised mean $\mu$ by calculating a CONFIDENCE INTERVAL for $\mu$. This is done by considering that:

$$N = \frac{m - \mu}{\sqrt{\sigma^2/n}}$$

will have an approximately standard normal distribution (mean 0, standard deviation 1), denoted N(0,1), and calculating a 95% confidence interval for $\mu$ as:

$$m - 1.96\sqrt{\sigma^2/n} < \mu < m + 1.96\sqrt{\sigma^2/n}$$

If, instead, the variance is unknown and is estimated by the sample variance $s^2$, then:

$$T = \frac{m - \mu}{\sqrt{s^2/n}}$$

will have a t-distribution with n - 1 DEGREES OF FREEDOM, written t(n - 1). A 95% confidence interval for $\mu$ can then be expressed as:

$$m - t_{0.975}\sqrt{s^2/n} < \mu < m + t_{0.975}\sqrt{s^2/n}$$

where $t_{0.025} = -t_{0.975}$ and $t_{0.975}$ are the critical values from the t(n - 1) distribution. These values, which are chosen to ensure 2.5% of the probability density lies in each tail, can be found from tables (Lindley and Scott, 1984) or computer packages. As long as n is at least three, like the N(0,1) distribution, the t-distribution has a zero mean and is symmetric, but the variance (which is finite only when n is greater than three) is (n - 1)/(n - 3). As the sample size, n, increases the variance approaches 1, but for small sample sizes the variance will be greater, which reflects the uncertainty in the estimation of $\sigma^2$.
The t-distribution is related to other common distributions. If we compare the statistics N and T, we will see that T is merely N divided through by $\sqrt{s^2/\sigma^2}$. Now, $\sum_i (x_i - m)^2/\sigma^2 = (n - 1)s^2/\sigma^2$ is known to have a CHI-SQUARED DISTRIBUTION with n - 1 degrees of freedom and so we see that, more generally, the t-distribution with n - 1 degrees of freedom arises when an N(0,1) variable is multiplied by the square root of (n - 1) and divided by the square root of an independent $\chi^2(n - 1)$ variable. Having observed this, if we now square T, we can see that it is (n - 1) times the square of an N(0,1) variable divided by a $\chi^2(n - 1)$ variable. Now the square of an N(0,1) variable is a $\chi^2(1)$ variable, so the square of T is (n - 1) times a $\chi^2(1)$ variable divided by 1 times a $\chi^2(n - 1)$ variable. The division of one chi-square variable by another independent one with the correct multipliers is known to generate an F-distribution, and so we can see that the square of a t-distributed variable (with n - 1 degrees of freedom) will have an F-DISTRIBUTION (with 1 and n - 1 degrees of freedom). For further reading see Leemis (1986), Altman (1991) and Jones (2008). AGL
Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.
Jones, M. C. 2008: The t family and their close and distant relations. Journal of the Korean Statistical Society 37, 293-302.
Leemis, L. M. 1986: Relationships among common univariate distributions. The American Statistician 40, 2, 143-6.
Lindley, D. V. and Scott, W. F. 1984: New Cambridge elementary statistical tables. Cambridge: Cambridge University Press.
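The variance result quoted in the t-distribution entry can be illustrated by simulation. The sketch below is a hedged example, not part of the original entry: it draws repeated normal samples of size n = 10, computes T for each, and compares the average of T squared with (n - 1)/(n - 3). The function name, the seed and the number of replicates are arbitrary choices made for illustration.

```python
import random


def simulate_t_statistics(n, replicates, mu=0.0, sigma=1.0, seed=2011):
    """Simulate T = (m - mu) / sqrt(s^2 / n) for repeated normal samples of size n."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(replicates):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        s2 = sum((x - m) ** 2 for x in sample) / (n - 1)  # sample variance
        t_values.append((m - mu) / (s2 / n) ** 0.5)
    return t_values


n = 10
t_values = simulate_t_statistics(n, replicates=20000)
# T has mean zero, so the average of T squared estimates its variance.
empirical_variance = sum(t * t for t in t_values) / len(t_values)
theoretical_variance = (n - 1) / (n - 3)
print(round(empirical_variance, 3), round(theoretical_variance, 3))
```

Because T squared follows an F-distribution with 1 and n - 1 degrees of freedom, the same average also estimates the mean of that F-distribution, which is again (n - 1)/(n - 3).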
teaching medical statistics
Four main groups of people are expected to learn medical statistics: undergraduate students of medicine and other healthcare professional subjects, healthcare practitioners, researchers in healthcare and would-be medical statisticians. For all these groups we must select the most appropriate material from the huge amount available (even to master the contents of the journal Statistics in Medicine would take me several lifetimes). This material will, in turn, partly determine the teaching method.
Students rarely have much time in their crowded curriculum for statistics and rarely have much natural sympathy for the subject. They see their futures as practical people busy saving lives and caring for the sick, not analysing data or reading journals. They are also usually at the age of maximum confidence in their own infallibility and hence difficult to persuade that they might be mistaken in their image of their future roles. It is easier to persuade them of the relevance of reading evidence than number crunching, and such courses do better if they concentrate on the understanding of research publications. I have found that my conventional lectures are of little value to this group and seminars where they discuss papers, backed up by printed notes or web pages on the statistical principles, are more effective.
Lectures work better when students who have been challenged with this material are then able to ask a statistician to explain things that puzzle them. The concepts acquired in this way are more likely to be backed up by other parts of their course than the calculation of CHI-SQUARED TESTS. If students can be equipped with the basic ideas of variability, measurement, RANDOMISATION, estimation and significance, we have done well. The machines can do the sums.
Increasing numbers of healthcare students are taught by problem-based learning (PBL), a system intended to prepare students for a life of EVIDENCE-BASED MEDICINE. Statistics is rarely taught as part of the core PBL programme, but is instead taught as a separate addition to PBL cases, or in a separate, parallel lecture- or seminar-based course, or not at all. This is bad news not only for medical statistics (and those who teach it) but also for medicine. Surely the skills needed for the interpretation of evidence should be central in a course preparing students for evidence-based practice. It happens because tutors, mainly laboratory scientists or clinicians, feel insecure about teaching statistics and because the 'problems' of PBL are usually descriptions of a patient. Tutors need to be convinced that they do not need to know the subject to facilitate students' mutual education and course organisers need to be convinced that problems can be a publication or a community problem, that the patient case is not the only way.
Healthcare practitioners are usually taught statistics as part of study for a higher professional qualification. The key application is still the interpretation of numerical evidence, mainly in the context of published research. However, they often have the more immediate goal of passing a demanding examination with a high failure rate. Some of these examinations include some quite advanced statistics, such as those in radiotherapy or public health. The teacher can make use of this by collaborating with the students to defeat the examiner and concentrating on past questions. I find that starting with a few multiple-choice questions to identify areas of difficulty and then explaining the answers the students get wrong works very well. Once the basics have been covered in this way, past examination questions form the ideal motivator. It is for the examiners to design their tests so that in order to pass them the students must learn what the examiners think they need to know.
For those who do not have to satisfy an examiner but simply wish to understand their own subject's literature better, indirect teaching is frequently used. Many journals have carried long series of articles on statistics intended to help their readers understand what is published, a practice that began with the early ground-breaking Lancet articles by Bradford Hill in the 1930s and continues still.
Researchers have very different needs from practitioners. They must acquire the skills to design studies and analyse data. Understanding of concepts, while still central, is not enough. Practical skills are usually developed in hands-on computing practical classes, preferably using software of the type that they will use in their own research. Lectures have a more natural place in this teaching, as methods and their applications and limitations can be described. We can even risk a few mathematical formulae without too much discouragement of well-motivated students. The opportunity to discuss their own projects is very attractive to these students. Textbooks are particularly important to this group. At one time the market was flooded with poor books on medical statistics (Bland and Altman, 1987), but there are now many good ones. Another source of statistical education for researchers comes from individual discussions of their projects with a statistician. This is a two-way street, as they educate the statistician about the research topic and medicine in general. I have learned so much from the people who have come to me for help.
For new statisticians, statistics is usually a Master's course taken by graduates in mathematics or other quantitative subjects. It is possible to study statistics as a branch of mathematics without real data making much of an appearance, but if students have chosen to study medical statistics specifically we would expect them to want a practical course with the focus on application to real problems. Clearly, they must become familiar with the common techniques of design and analysis and should be able to analyse data within both the frequentist and the Bayesian frameworks (see BAYESIAN METHODS). As nearly all statistical analysis is now done using general-purpose statistical software, they should learn the basics of the software they are likely to meet. At the time of writing, SAS, Stata and BUGS would be contenders for the programs of choice, but familiarity with other widely used or specialist software could be included. Statisticians need not only technical skills but also the ability to collaborate with and give advice to members of other disciplines.
Experience is the best teacher, but experience is what you get just after you needed it. We would like to give our students a bit of experience before they are plunged into real-life problems. Medicine and other healthcare professions have much to teach us here. I used to run a session for MSc students in medical statistics where I invited clinical researchers who sought my advice to come and get it in front of a live audience. I pointed out to them that if I went to consult them, they would do it with an audience of medical students. Perhaps we could incorporate this type of advisory clinic into our teaching. We want to enable our students not only to use the current set of statistical methods but also to develop new ones where these are needed. To this end they need some theory as well as the practice of statistics. I think that statisticians should also have a secure basis for thinking that the statistical methods that we routinely use are in some way the best methods we could use and for this reason a theoretical course will provide valuable grounding, even though they may never use it again.
We should not think that students will learn and retain all we teach them or that if we do not teach them something they will never know it. Being taught is only a part of learning and good students will continue to learn throughout their careers. What we must try to do is to give them the desire to retain what they have learned already and the ability to add to their knowledge whenever they need to. JMB
Bland, J. M. and Altman, D. G. 1987: Caveat doctor: a grim tale of medical statistics textbooks. British Medical Journal 295, 979.
tetrachoric correlation coefficient
See CORRELATION
thrive lines
See GROWTH CHARTS
time-dependent variables
Time-dependent covariates, also known as time-varying covariates or updating covariates, are variables that can change their value over time. They are particularly important for prognostic models, such as COX'S REGRESSION MODEL. They should be distinguished from fixed covariates, which are measurable at baseline and do not change with time. Examples of fixed covariates are race and sex. Age varies with time, but is completely predictable from baseline data and so is not included among time-dependent covariates. Time-dependent covariates may be classed as being internal and external (Altman and de Stavola, 1994). External factors impact on outcome but do not explicitly reference time, e.g. the half-life of a drug treatment; whereas internal factors are measurements taken at set times relating to the individual or their condition, e.g. blood pressure or blood markers.
The reason for considering the inclusion of time-dependent covariates is that including only baseline variables may ignore a great deal of potential prognostic information. Therefore, the inclusion of time-dependent covariates may substantially increase the potential detail and accuracy in a model. For example, increases (or decreases) over time in patients' blood pressure may be a better predictor of future prognosis than a single baseline value of blood pressure.
The Cox model can be extended to include time-dependent covariates instead of, or in addition to, fixed covariates. In simplest terms, the hazard for a time-dependent covariate takes the form $h(t) = \exp(\gamma z(t))$, where $h(t)$ is the hazard at time $t$, $z(t)$ is a time-dependent covariate and $\gamma$ is its coefficient value. As for fixed covariates, all data types can be entered as time-dependent covariates into a Cox regression model. It is important to assess the assumption of PROPORTIONAL HAZARDS, once any time-dependent covariate has been taken into account.
These variables do add additional complications to any model. First, they require the dataset being analysed to contain additional variables or additional observations (depending on the dataset's structure). Second, it can be difficult to obtain complete data on these variables, especially with increasing time. MISSING DATA can be problematic. Third, these variables effectively increase the choice of Cox models available for consideration. One must ensure that issues of multiplicity of testing are addressed. Finally, there are issues of interpretation. Including time-dependent covariates in a model may be practically simple, but the greater difficulty lies in interpreting the data: one must be sure how any variable would be interpreted before including it in a model. Simply, the hazard ratio for a time-dependent covariate represents an additional change in risk associated with a change in this variable over time. For example, when considering bone pain as an outcome after treatment for prostate cancer, one may wish to record the development of osteoarthritis over time as the second condition may increase the risk of bone pain. When interpreting output it can be complicated trying to tease cause from effect with such variables. MS/MP
Altman, D. G. and de Stavola, B. L. 1994: Practical problems in fitting a proportional hazards model to data with updated measurements of the covariates. Statistics in Medicine 13, 4, 301-41.
Cleves, M. A., Gould, W. W. and Gutierrez, R. G. 2003: An introduction to survival analysis using Stata, revised edition. Texas: Stata Press.
Machin, D. and Parmar, M. K. B. 1995: Survival analysis: a practical approach. Chichester: John Wiley & Sons, Ltd.
Piantadosi, S. 1997: Clinical trials. New York: Wiley Interscience.
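The remark in the time-dependent variables entry that such covariates may require additional observations in the dataset can be made concrete. One common layout, assumed here for illustration and not prescribed by the entry, splits each subject's follow-up into (start, stop] intervals, one row per interval, with the covariate value held constant within the interval. The function name, field names and blood pressure values below are hypothetical.

```python
def split_follow_up(subject_id, measurements, end_time, event):
    """Expand one subject into one row per interval between covariate measurements.

    measurements: list of (time, value) pairs sorted by time, the first at time 0
                  and all taken before end_time (an assumption of this sketch).
    end_time:     time of the event or of censoring.
    event:        1 if the event occurred at end_time, 0 if censored.
    """
    rows = []
    for index, (start, value) in enumerate(measurements):
        is_last = index == len(measurements) - 1
        stop = end_time if is_last else measurements[index + 1][0]
        rows.append({
            "id": subject_id,
            "start": start,
            "stop": stop,
            "value": value,
            # The event can only be recorded in the final interval.
            "event": event if is_last else 0,
        })
    return rows


# Hypothetical subject: blood pressure recorded at months 0, 6 and 12, event at month 15.
rows = split_follow_up("A01", [(0, 150), (6, 140), (12, 160)], end_time=15, event=1)
for row in rows:
    print(row)
```

A dataset in this form has three rows for the subject rather than one, which is the sense in which the structure of the data grows with each updated measurement.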
time series in medicine
Chatfield (1989) has defined a time series as 'a sequence of observations ordered in time'. In medicine and medical research, observations are often ordered in time and special techniques have evolved to deal with them. It is helpful to think of three types of time series: (1) single series, often long; (2) more than one series, each of moderate length; and (3) many shorter series. There are at least three reasons for collecting a single time series:
(a) To predict some future event. An example might be measuring creatinine clearance from kidney failure patients where the main aim is to predict complete kidney failure.
(b) To test whether some event in time has an effect on subsequent outcomes. These are sometimes called before-and-after studies or interrupted time series (Glass et al., 1975). Examples include the effect of seat belt legislation on deaths due to car accidents, effect of NHS Direct on consultation to a general practitioner and behavioural experiments in psychology.
(c) To look for trends and rhythms in the series. An example would be a spectral analysis of the electroencephalogram (EEG) signal to measure the strength of alpha waves.
For series of more moderate length a common theme is to examine whether there is an ASSOCIATION between two series. Examples include studies to examine relationships between cot deaths and environmental temperature and daily deaths from heart disease and air pollution.
Shorter series are often dealt with under the term 'repeated measures'. The reason for measuring observations over time is that a series would more accurately reflect the action of treatment than a single measure at one point in time. A typical example would be repeated measures in a CLINICAL TRIAL, such as blood pressure measured monthly for a year. Summary measures (Matthews et al., 1990) such as the AREA UNDER THE CURVE or the slope of the response over time are often the outcomes of interest. Repeating observations can improve the accuracy of the estimates of treatment effects.
The most basic time series model is the autoregressive (AR(1)) model (Diggle, 1990). Given a time series $x_t$, from which the MEAN value has been subtracted, an AR(1) is given by:

$$x_t = \alpha x_{t-1} + \varepsilon_t, \quad \text{where } -1 < \alpha < 1$$
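As a hedged illustration of the AR(1) recursion above, the following Python sketch simulates a series with a chosen value of the autoregressive parameter and recovers it approximately from the lag-1 sample autocorrelation. It assumes independent standard normal errors; the function names, the seed and the burn-in length are illustrative choices.

```python
import random


def simulate_ar1(alpha, n, seed=1, burn_in=200):
    """Simulate x_t = alpha * x_(t-1) + e_t with independent N(0, 1) errors e_t."""
    rng = random.Random(seed)
    x = 0.0
    series = []
    for step in range(n + burn_in):
        x = alpha * x + rng.gauss(0.0, 1.0)
        if step >= burn_in:  # discard early values influenced by the start at zero
            series.append(x)
    return series


def lag1_autocorrelation(series):
    """Sample autocorrelation at lag 1 after subtracting the mean."""
    mean = sum(series) / len(series)
    centred = [value - mean for value in series]
    numerator = sum(centred[t] * centred[t - 1] for t in range(1, len(centred)))
    return numerator / sum(value * value for value in centred)


series = simulate_ar1(alpha=0.7, n=5000)
r1 = lag1_autocorrelation(series)
print(round(r1, 3))
```

For a long series the lag-1 autocorrelation should lie close to the value of the parameter used in the simulation, here 0.7.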
This model is often called a Markov model because the value at one point in time only depends on the value at the point immediately preceding it. The model is easily extended to AR(p), where p > 1. For p = 2, certain values of the coefficients can give models in which cycles appear. Some forecasting models use autoregressive models, but in medicine these are rarely used because usually one is more interested in estimating trends, which are more easily fitted using conventional models. The complementary model to the AR(1) is the moving average model MA(1):

$$x_t = \varepsilon_t + \beta \varepsilon_{t-1}, \quad \text{where } -1 < \beta < 1$$

which extends similarly to MA(q), where q > 1. These models can be combined to produce an autoregressive moving average ARMA model. This has been found to model many time series and requires only low values of p and q. The procedure of fitting these models is often known as Box-Jenkins modelling (Box and Jenkins, 1976). In general, this type of modelling is more common in the forecasting and control of industrial processes, although on occasion it has been applied in medicine.
Perhaps the most common feature of the analysis of clinical signals is to look for regularly occurring features or rhythms (Campbell, 1996). For humans to maintain stable bodily functions, clinical signals must be constrained to lie within certain limits. This is done using nonlinear feedback loops, which tends to make patterns within signals recur regularly. A simple example will illustrate the point. To remain healthy, humans must maintain blood pressure to within certain narrow limits. Blood pressure is mediated through the baroreceptors, located in the wall of the aortic arch and in the wall of the carotid sinus. If blood pressure is too high, then signals from the baroreceptors result in vasodilation, which drops the blood pressure. If pressure is too low, then vasoconstriction occurs to increase the blood pressure. The feedback mechanism is thought to be nonlinear and incorporates a delay and, for these reasons, at-rest rhythms can occur spontaneously.
Periodogram analysis involves decomposing a signal into individual frequency components where the amplitude of these components is proportional to the 'energy' of the signal at that frequency. It is a convenient method for summarising a long time series and is a natural procedure if we believe there are rhythms in the data. Periodogram analysis is the method of choice for the analysis of clinical signals. The problem with the periodogram is that it is an inconsistent estimator, in that its VARIANCE does not reduce as the sample size increases. To achieve a consistent estimator various smoothing techniques (known as 'windows') are applied to the periodogram, so that it estimates what is known as the spectrum.
There are three major components to be found in a typical heart rate spectrum and these are also present in the blood pressure spectrum. A region of activity occurs at around 0.25 Hz, which is attributable to respiration (respiratory sinus arrhythmia) and this is thought to be a marker of vagal (parasympathetic) activity. A second component at around 0.1 Hz arises from spontaneous vasomotor activity within the blood pressure control system and is mediated by vagal and sympathetic activity. A third, low-frequency component at around 0.04 Hz is thought to arise from thermoregulatory activity. An example is given in the first figure (Bernardi et al., 2001), which shows the effect of recitation of mantras or prayers on the spectrum of respiration, heart rate (RR interval), blood pressure and mid-cerebral blood flow.
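To make the idea of a periodogram concrete, the following Python sketch computes a raw (unsmoothed) periodogram by a direct discrete Fourier transform and locates its largest ordinate. It is an illustration under stated assumptions only: the signal is simulated, not clinical, consisting of a 0.1 Hz sine wave sampled once per second with added normal noise, and the scaling of the ordinates by 1/n is one of several conventions in use.

```python
import cmath
import math
import random


def periodogram(series, sampling_interval=1.0):
    """Return (frequency, ordinate) pairs of the raw periodogram at the Fourier frequencies."""
    n = len(series)
    mean = sum(series) / n
    centred = [value - mean for value in series]
    ordinates = []
    for k in range(1, n // 2 + 1):
        total = sum(value * cmath.exp(-2j * math.pi * k * t / n)
                    for t, value in enumerate(centred))
        ordinates.append((k / (n * sampling_interval), abs(total) ** 2 / n))
    return ordinates


# Simulated signal: a rhythm at 0.1 Hz (6 cycles per minute) plus noise, sampled at 1 Hz.
rng = random.Random(3)
n = 200
signal = [math.sin(2 * math.pi * 0.1 * t) + rng.gauss(0.0, 0.5) for t in range(n)]
peak_frequency, peak_power = max(periodogram(signal), key=lambda pair: pair[1])
print(peak_frequency)
```

The largest ordinate falls at the frequency of the simulated rhythm. A smoothed spectrum estimate would additionally average neighbouring ordinates with a window, which this sketch does not attempt.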
It can be seen that recitation concentrates the power of the signal at a cycle with a frequency of about 6 cycles per minute (0.1 Hz).
Some signals are essentially continuous, whereas others are discrete. For example, the heart rate is measured from surface electrodes on the chest from the electrocardiogram (ECG). Although the ECG is continuous, the heart rate is usually derived from the 'R' wave in the ECG, which is a sudden spike just preceding the ventricular contraction. Thus, the heart beat signal is essentially a point process. Some authors have analysed the interbeat intervals, thus arriving at a spectrum, which estimates frequencies per beat, rather than per unit time. Others sample the heart rate (or RR interval) signal at regular intervals or filter the point process to produce a continuous signal that can be sampled.
The electroencephalogram (EEG) is electrical activity of the brain measured by electrodes at the surface of the skull. There is an immense amount of literature devoted to the spectral analysis of EEGs. In particular, six spectral peaks can be identified. These peaks, with a typical range of frequencies, are: delta 1 (0.5-2.0 Hz), delta 2 (2.0-4.0 Hz), theta (4.0-8.0 Hz), alpha (8.0-12.0 Hz), sigma (12.0-14.0 Hz) and beta (14.0-20.0 Hz). The peaks can be used, for example, to classify different levels of sleep. Recently, there has been interest in describing neural processes in the context of nonlinear dynamics and, in particular, in the rapidly evolving field of 'deterministic chaos'.

time series in medicine Spectrum showing effects (in one subject) of rhythmic rituals compared with spontaneous breathing, on respiratory and cardiovascular rhythms. Note slow rhythmic oscillations (approximately 6/min) in all signals during recitation (Bernardi et al., 2001, British Medical Journal 323, 1446-9, with permission from the BMJ Publishing Group)

An assumption before applying spectral analysis is that the signal is stationary (i.e. the mean and variance do not change). However, medical signals are not stationary in the usual sense. They contain rhythms that may come and go in the time interval, the frequencies may vary or amplitudes of cycles at certain frequencies increase or decrease. Spectral analysis considers the entire time interval and so cycles that only occur in part of the interval will have their spectral peaks attenuated by the low power in other parts of the interval. One solution is to divide the series up into sections and compute the spectrum for each section. The difficulty here is that it is not realistic to think of a signal being stationary in sections. A better intuitive model is one in which the signal 'evolves' slowly so that the nonstationary component is slow in comparison with the signal in which we are interested. The newly developed field of WAVELET ANALYSIS is used to analyse signals of this kind.
The main problem with time series is that serial correlation invalidates one of the main assumptions of conventional regression, namely that the errors in the model are independent of each other. A second problem is that if we are interested in the relationship between two time-dependent variables y and z, any other variable associated with time will be a confounder between them. For example, if both series increase with time, then they will appear correlated. This has been the source of much amusement, with positive correlations such as those between the annual population of Holland and the number of storks' nests and sales of ice cream and deaths by drowning being quoted as evidence of causation! To make progress one has to try and fit a model that removes the confounder variables. For a continuous outcome
suppose the model is $y_t = \beta x_t + \nu_t$, $t = 1, \ldots, n$, where $y_t$ is the dependent value measured at time $t$, $x_t$ a vector of confounder and predictor variables and $\beta$ a vector of regression coefficients. If the residuals are serially correlated, then ordinary least squares does not provide valid estimates of the STANDARD ERRORS of the parameters. If we assume that $x_t$ and $\nu_t$ are generated by AR(1) processes with parameters $\alpha$ and $\gamma$, then, using ordinary least squares to estimate $\beta$, the ratio of the estimated variance to the true variance is approximately $(1 - \alpha\gamma)/(1 + \alpha\gamma)$. In general, $x_t$ and $\nu_t$ are each likely to be positively serially correlated, so that this ratio is less than 1. Thus the effect of ignoring serial CORRELATION is to give artificially low estimates of the standard error (SE) of the regression coefficients. This means declaring significance more often than the significance level would suggest, under the NULL HYPOTHESIS of no ASSOCIATION.
Assuming the autoregressive parameter of the errors, written $\alpha$ in what follows, is known, a method of generalised least squares known as the Cochrane-Orcutt procedure (Cochrane and Orcutt, 1949) can be employed. Write $y^*_t = y_t - \alpha y_{t-1}$ and $x^*_t = x_t - \alpha x_{t-1}$. Obtain an estimate of $\beta$ using ordinary least squares on $y^*_t$ and $x^*_t$. However, since $\alpha$ will not usually be known it can be estimated from the ordinary least squares residuals $e_t$ by:

$$\hat{\alpha} = \sum_{t=2}^{n} e_t e_{t-1} \Big/ \sum_{t=2}^{n} e_{t-1}^2$$

This leads to an iterative procedure in which we can construct a new set of transformed variables and thus a new set of regression estimates and so on until convergence. The iterative Cochrane-Orcutt procedure can be interpreted as a stepwise algorithm for computing MAXIMUM LIKELIHOOD ESTIMATES of $\alpha$ and $\beta$ where the initial observation $y_1$ is regarded as fixed. If the residuals can be assumed to be normally distributed then full maximum likelihood methods are available, which estimate $\alpha$ and $\beta$ simultaneously, and this can be generalised to higher order autoregressive models. These models can be fitted using (say) PROC AUTOREG or AUTOREGRESSION in the computer packages SAS and SPSS respectively (see STATISTICAL PACKAGES). However, AUTOCORRELATION of residuals can appear because the wrong model is being fitted. For example, if the true response was quadratic and a linear model was fitted, the errors would appear as a group of negative errors, a group of positive errors and then a group of negative errors. It is a better strategy to obtain a good model than using an autoregressive error model as a panacea for models that simply do not fit.
Interrupted time series are often either before-and-after treatment for single subjects or before-and-after intervention for populations. An important question for the analysis is whether the data are correlated. The main reason for correlation might be because the same subject is measured before and after. However, if we removed the subject effect, the data may be independent and so, for example, a two-sample STUDENT'S t-TEST would be valid. One could look for three different sorts of effect on an outcome of an intervention at one point in time: (a) a change in slope, (b) a change in level or (c) a combination of a change in slope and a change in level.
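The iterative Cochrane-Orcutt procedure can be sketched for the simplest case of one predictor plus an intercept. This is a minimal Python illustration under stated assumptions, not the algorithm used by any particular package: the data are simulated with AR(1) errors, the autoregressive parameter is estimated by regressing each residual on its predecessor, and all function names, seeds and true values are hypothetical.

```python
import random


def ols(x, y):
    """Simple least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_alpha(residuals):
    """Estimate the AR(1) parameter from residuals e_t and e_(t-1)."""
    pairs = range(1, len(residuals))
    numerator = sum(residuals[t] * residuals[t - 1] for t in pairs)
    denominator = sum(residuals[t - 1] ** 2 for t in pairs)
    return numerator / denominator


def cochrane_orcutt(x, y, tolerance=1e-8, max_iterations=100):
    """Alternate between estimating alpha and refitting on transformed variables."""
    intercept, slope = ols(x, y)
    alpha = 0.0
    for _ in range(max_iterations):
        residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
        new_alpha = estimate_alpha(residuals)
        x_star = [x[t] - new_alpha * x[t - 1] for t in range(1, len(x))]
        y_star = [y[t] - new_alpha * y[t - 1] for t in range(1, len(y))]
        intercept_star, slope = ols(x_star, y_star)
        # The transformed intercept equals intercept * (1 - alpha).
        intercept = intercept_star / (1.0 - new_alpha)
        converged = abs(new_alpha - alpha) < tolerance
        alpha = new_alpha
        if converged:
            break
    return intercept, slope, alpha


# Simulated data: true intercept 1, slope 2, AR(1) errors with parameter 0.6.
rng = random.Random(42)
x, y, error = [], [], 0.0
for _ in range(2000):
    error = 0.6 * error + rng.gauss(0.0, 1.0)
    xi = rng.gauss(0.0, 1.0)
    x.append(xi)
    y.append(1.0 + 2.0 * xi + error)

intercept_hat, slope_hat, alpha_hat = cochrane_orcutt(x, y)
print(round(intercept_hat, 2), round(slope_hat, 2), round(alpha_hat, 2))
```

The first observation is dropped when the variables are transformed, which corresponds to treating the initial observation as fixed.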
time series in medicine Monthly number of contacts with GPs before and after introduction of NHS Direct (Munro et al., 2000, British Medical Journal 321, 150-3, with permission from the BMJ Publishing Group). Vertical axis: number of contacts (0 to 25000); horizontal axis: month (1997 to 1998). Series shown: control cooperatives; cooperatives in NHS Direct areas (all contacts, contacts dealt with by telephone advice, contacts resulting in direct contact with a doctor).
NHS Direct is a telephone system designed to relieve pressure on general practitioners (GPs). It was introduced into the UK in 1998 (Munro et al., 2000). We are interested in whether it has an effect on the number of telephone calls to GPs. The most likely model is a change in slope (see the second figure). One simple method is the following. We make the origin for time the point at which the intervention occurred. The model is $y_t = \alpha + \beta_1 t + \beta_2 t' + \varepsilon_t$, where $y_t$ is the monthly number of calls to selected practices in month $t$ and $t' = 0$ if $t < 0$, $t' = t$ if $t \geq 0$. Thus a test of the effect is to test whether $\beta_2 = 0$. As stated earlier, conventional regression will give invalid results if the errors are serially correlated and so we need to check the serial correlation of $\varepsilon_t$. Again, we can use certain statistical packages to fit the model assuming the errors are generated by an autoregressive process.
Many epidemiological series consist of counts and require Poisson regression rather than ordinary linear regression. We can also employ a method similar to the Cochrane-Orcutt method to allow for serial correlation and use GENERALISED ESTIMATING EQUATIONS to estimate the parameters. Campbell (1994) analysed the dependence of daily deaths from sudden infant death syndrome (SIDS) in England and Wales from 1979 to 1983 on the mean daily environmental temperature measured in London. The input was mean daily temperature and the output daily deaths due to SIDS. There is clear seasonality in the mortality series but this does not mean that there is a causal relationship between temperature and cot deaths since many factors behave seasonally, such as length of day and rainfall. It is only when these effects are removed that we can deduce a possible relationship.
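The change-in-slope model for an interrupted time series can be fitted by ordinary least squares once the extra time variable has been constructed. The sketch below is illustrative only: it uses invented, noise-free monthly counts (not the NHS Direct data), takes the extra variable to be zero before the intervention and equal to t afterwards, and solves the normal equations directly. It ignores serial correlation in the errors, which would need to be checked in a real analysis.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - tail) / m[r][r]
    return solution


def fit_change_in_slope(times, y):
    """Least squares fit of y = a + b1 * t + b2 * t', with t' = t after the intervention."""
    design = [[1.0, float(t), float(t) if t > 0 else 0.0] for t in times]
    k = 3
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented series: time origin at the intervention, slope 2 before and 2 - 3 = -1 after.
times = list(range(-12, 13))
y = [100.0 + 2.0 * t - 3.0 * (t if t > 0 else 0) for t in times]
a_hat, b1_hat, b2_hat = fit_change_in_slope(times, y)
print(round(a_hat, 6), round(b1_hat, 6), round(b2_hat, 6))
```

The coefficient of the extra variable estimates the change in slope, so testing whether it is zero is the test of the intervention effect described above.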
A model was fitted that removed seasonality and then included a linear temperature effect. The coefficient associated with mean temperature 3-5 days before the death was -0.041 (SE 0.005). We interpret this as saying that a 1 °C drop in temperature is associated with a rise in SIDS of about 4%. Further investigations demonstrated that the relationship was approximately linear. We can test the residuals for autocorrelation, using tests such as the Durbin-Watson test (first-order AR) and the Ljung-Box test (general order). However, one should ask if it is sensible to test for serial correlation and only include serial correlation in the model if the test is significant. One should also ask why the data are serially correlated. Serial correlation could be split into intrinsic correlation (endogenous) and extrinsic correlation (exogenous). Intrinsic correlation means that the value at a particular time depends directly on the value at an earlier time. Examples include: serum cholesterol at different times, population in age groups in successive years, epidemics of measles. Extrinsic correlation occurs because both variables depend on some third (time-dependent) variable. Examples include daily SIDS, where the
deaths are not caused by epidemics and are truly unrelated to each other except through (say) the weather. We will not cover repeated measures in detail here. Commonly they arise when individuals have measurements taken repeatedly over time (see REPEATED MEASURES ANALYSIS OF VARIANCE). Often the serial correlation aspect of the data can be removed by the simple expedient of using summary measures (Matthews et al., 1990). If not, usually either a simple AR(1) model is assumed, or what is known as an exchangeable correlation model or compound symmetry. This is generated by a model of the form:
$$y_{it} = \beta + \alpha_i + \varepsilon_{it}$$
where $y_{it}$ is an outcome at time $t$ on subject $i$, $\alpha_i$ is the effect of subject $i$, which is assumed normally distributed with variance $\sigma_\alpha^2$, $E(\alpha_i \alpha_j) = 0$ when $i \neq j$ and $\varepsilon_{it}$ has variance $\sigma^2$. The effect of this is to generate a covariance matrix with $\sigma_\alpha^2$ on the off-diagonal and $\sigma^2 + \sigma_\alpha^2$ on the diagonal terms. Although one would expect measurements made further apart to be less correlated (i.e. perhaps an AR(1)), in practice compound symmetry has been found to be a reasonable assumption in many cases. We need to distinguish between methods where serial correlation is an important part of the model, such as for prediction, and where it is simply a nuisance. If it is a nuisance, then we need to examine intrinsic and extrinsic correlation. We should allow for serial correlation in regression modelling. Often serial correlation can be 'made to go away' and so the time series aspect is not a major concern. Compound symmetry is a useful assumption for repeated measures in RANDOMISED CONTROLLED TRIALS. MJC
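To make the contrast between compound symmetry and AR(1) concrete, the following sketch builds both covariance matrices for the repeated measurements on one subject. It is an illustration only: the function names are hypothetical, and the AR(1) form (correlation $\rho^{|s-t|}$ between times $s$ and $t$) is the standard definition rather than something spelled out in the entry.

```python
import numpy as np

def compound_symmetry_cov(n_times, sigma2, sigma2_alpha):
    """Covariance of one subject's measurements under y_it = beta + alpha_i + eps_it.

    Off-diagonal terms equal sigma2_alpha; diagonal terms equal sigma2 + sigma2_alpha.
    """
    return sigma2_alpha * np.ones((n_times, n_times)) + sigma2 * np.eye(n_times)

def ar1_cov(n_times, variance, rho):
    """AR(1) covariance for comparison: correlation rho**|s - t| decays with the lag."""
    times = np.arange(n_times)
    lags = np.abs(np.subtract.outer(times, times))
    return variance * rho ** lags

cs = compound_symmetry_cov(4, sigma2=1.0, sigma2_alpha=0.5)
ar = ar1_cov(4, variance=1.5, rho=0.5)
```

Under compound symmetry every pair of occasions has the same covariance, whereas under AR(1) the entries shrink as the occasions move further apart.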
References
Bernardi, L., Sleight, P., Bandinelli, G., Cencetti, S., Fattorini, L., Wdowczyc-Szulc, J. and Lagi, A. 2001: Effect of rosary prayer and yoga mantras on autonomic cardiovascular rhythms: comparative study. British Medical Journal 323, 1446-9. Box, G. E. P. and Jenkins, G. 1976: Time series analysis: forecasting and control. San Francisco: Holden-Day. Campbell, M. J. 1994: Time series regression for counts: an investigation into the relationship between sudden infant death syndrome and environmental temperature. Journal of the Royal Statistical Society, Series A 157, 191-201. Campbell, M. J. 1996: Spectral analysis of clinical signals: an interface between medical statisticians and medical engineers. Statistical Methods in Medical Research 5, 51-66. Chatfield, C. 1989: The analysis of time series: an introduction, 4th edition. London: Chapman & Hall. Cochrane, D. and Orcutt, G. H. 1949: Application of least squares regression to relationships containing autocorrelated error terms. Journal of the American Statistical Association 44, 32-61. Diggle, P. J. 1990: Time series: a biostatistical introduction. Oxford: Oxford Science Publications. Glass, G. V., Willson, V. L. and Gottman, J. M. 1975: Design and analysis of time-series experiments. Colorado: Colorado Associated Press. Matthews, J. N. S., Altman, D. G., Campbell, M. J. and Royston, J. P. 1990: Analysis of serial measurements in medical research. British Medical Journal 300, 230-5. Munro, J., Nicholl, J., O'Cathain, A. and Knowles, E. 2000: Impact of NHS Direct on demand for immediate care: observational study. British Medical Journal 321, 150-3.
time trade-off technique See VON NEUMANN-MORGENSTERN STANDARD GAMBLE
total fertility rate (TFR) See DEMOGRAPHY
transformations The use of transformations in statistics has a long history. For example, the Wilson-Hilferty cube root transformation for CHI-SQUARE DISTRIBUTIONS, the Fisher z-TRANSFORMATION for CORRELATIONS, the use of logarithms for biological data and the arc-sine root transformations for proportions are well-known procedures. In most cases the use of transformations is not an end in itself, but rather a means to an end. The ultimate benefit is usually not what the transformation directly achieves, but rather that it allows subsequent analysis to be simpler, more revealing or more accurate. What is most important is how the transformation aids in the interpretation and description of the data. The transformations may be applied to observations, either response or explanatory variables, or to parameters or statistics, or they might be an explicit part of a statistical model. The purposes of using transformations include:
[...] correct structure in the systematic part of the model and also achieve homogeneous and Gaussian error distributions. It is unlikely that a single transformation will achieve all of these exactly. It is the correct systematic structure that is the most important of the three aspects to achieve. For a set of observations $y^T = (y_1, \ldots, y_n)$, Box and Cox proposed the model:

$$y_i^{(\lambda)} = X_i\beta + e_i \qquad (1)$$

where $X_i$ is the $i$th row of the design matrix $X$ and the $e_i$ are independent, $e_i \sim N(0, \sigma^2)$. In this model a single $\lambda$ is assumed to achieve the three objectives of a simple systematic structure, homogeneity of variance and normal errors. A problematic aspect of equation (1) is that the interpretation of the parameter $\beta$ depends on $\lambda$. However, various aspects of $\beta$, in particular the direction of $\beta$ (represented by $\beta/\mathrm{length}(\beta)$, say) or the ratio of two regression coefficients, $\beta_1/\beta_2$, which measures the relative importance of one explanatory variable to another, both have interpretations that are not dependent on $\lambda$. For models involving transformations some aspects of inference for the regression coefficient, $\beta$, have been controversial. In an unconditional approach $\lambda$ is treated as a parameter on an equal footing with all the other parameters; however, the interpretation of $\beta$ depends on the estimated $\lambda$, and the variance of $\beta$ is very large. In the conditional approach inference about $\beta$ is made on the estimated transformed scale, ignoring the fact that $\lambda$ has been estimated from the observations, which is not entirely satisfactory because it ignores the uncertainty associated with the estimation of $\lambda$. Aspects of inference that are less controversial are tests of $\beta = 0$, hypotheses that have an interpretation irrespective of $\lambda$. In some applications transformations back to the original $Y$ scale are desirable and can be particularly useful for graphical display. Caution is necessary in transferring interpretations from one scale to the other: e.g. a lack of interaction on the transformed scale should not be interpreted as a lack of interaction on the original scale. Power transformations allow predictions back to the original scale because they are monotonic. For model (1), the quantity $(1 + \lambda X_i\beta)^{1/\lambda}$ when $\lambda \neq 0$, or $\exp(X_i\beta)$ when $\lambda = 0$, is the predicted median of the distribution of $Y$ given $X_i$.
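The back-transformation to the predicted median can be sketched in a few lines. The sketch assumes the usual Box-Cox family, $y^{(\lambda)} = (y^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and $\log y$ for $\lambda = 0$, which is consistent with the median formula above but is not defined in the surviving text; the function names are hypothetical.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox power transformation of positive values y (assumed standard form)."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    """Map values on the transformed scale back to the original scale."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (1.0 + lam * z) ** (1.0 / lam)

def predicted_median(x_row, beta, lam):
    """Predicted median of Y given one covariate row under model (1).

    Because the transformation is monotonic, the median on the transformed
    scale (the linear predictor) maps to the median on the original scale.
    """
    linear_predictor = float(np.dot(x_row, beta))
    return float(inverse_box_cox(linear_predictor, lam))
```

Note that this recovers a median, not a mean: the mean on the original scale is generally not the back-transformed mean of the transformed scale.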
In multiple regression modelling it is common to consider transformations of explanatory variables. In the model:

$$Y_i = \alpha + \beta X_i^{(\lambda)} + e_i \qquad (2)$$

for scalar $X$, where $e_i \sim N(0, \sigma^2)$, the purpose of adding the extra parameter $\lambda$ is to fit the systematic structure of the model better, with the requirement of homogeneity of variance and normality of $e_i$ playing a lesser role. Whether the transformation achieves symmetry or normality of the marginal
distribution of $X^{(\lambda)}$ is usually of less importance, other than to reduce the sensitivity to OUTLIERS in $X$ or unless one needs to model the $X$ distribution. Carroll and Ruppert (1988) have developed an approach for nonlinear regression models in which the systematic part of the model, $f(X, \beta)$, is known through subject matter considerations. For example, the Michaelis-Menten equation for enzyme reactions is $Y = \beta_0 X/(\beta_1 + X)$. The same transformation is applied to both sides of this equation by assuming the model $Y_i^{(\lambda)} = (f(X_i, \beta))^{(\lambda)} + e_i$, where $e_i \sim N(0, \sigma^2)$. The model assumes that the untransformed relationship already fits the MEDIAN of the data adequately, but that the residuals exhibit heteroscedasticity and/or nonnormality. The main aim of the transform-both-sides approach is to make the residuals normal with constant variance, hence improving properties and inference associated with estimates of $\beta$. An important aspect of the approach is that the interpretation of $\beta$ does not depend on $\lambda$. The accelerated failure time model for censored survival data, $\log(T_i) = X_i\beta + e_i$, where $e_i \sim N(0, \sigma^2)$, can be viewed as a special case of the power transformation model. In the extension of the Box-Cox procedure to multivariate data a separate power transformation parameter is assumed for each component of the multivariate vector. Solomon and Taylor (1999) considered Box-Cox transformations for components of variance models. Power transformations have been used to assist in estimating regression centiles, which have application in establishing reference ranges. In the LMS METHOD at each fixed value of a scalar variable $X$ the distribution of $Y$ is assumed to be normal following a power transformation; from this the percentiles can be easily calculated. The model assumes that the median of $Y$,
the power transformation parameter and the scale parameter of the NORMAL DISTRIBUTION all vary smoothly as a function of $X$. In AIDS studies, viral load and CD4 count, a measure of the immune system, are frequently measured and are important indicators of disease progression. Both of these variables are highly skewed. Relatively complicated longitudinal and joint longitudinal-survival models have been applied to serial measurements of these markers. Nearly everyone uses the logarithm of viral load, either $\log_e$, $\log_{10}$ or $\log_2$. For CD4 counts some authors use the log for ease of interpretation, whereas others use CD4$^{1/4}$ or CD4$^{1/2}$ to ensure that the assumptions in their models, such as homogeneity of variance and symmetry for the MEASUREMENT ERROR, are satisfied for their data. PSA is a common blood test used in prostate cancer studies both for screening and to monitor disease progression. PSA is quite skew and it is natural to consider a logarithm transformation because the PSA value is thought to be roughly proportional to the volume of the tumour, which grows approximately exponentially. In practice log(PSA + 1) has been used, because PSA can be close to or even equal to zero, causing a standard log transformation to produce too many large negative values or not be calculable. For further details see Cole and Green (1992), Tsiatis, DeGruttola and Wulfsohn (1995), Slate and Cronin (1997) and Wang and Taylor (2001). JMGT
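The remark that percentiles are easily calculated under the LMS method can be illustrated with a short sketch. It assumes the usual LMS parameterisation attributed to Cole and Green (1992), in which, at a given value of $X$, $L$ is the power, $M$ the median and $S$ the scale (approximately the coefficient of variation), so that the centile at probability $p$ is $M(1 + L S z_p)^{1/L}$ for $L \neq 0$ and $M\exp(S z_p)$ for $L = 0$. This formula is not given in the entry, and the function name and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def lms_centile(L, M, S, p):
    """Centile of Y at probability p for given LMS parameters at one value of X."""
    z = NormalDist().inv_cdf(p)          # standard normal quantile z_p
    if L == 0:
        return M * math.exp(S * z)       # log-normal limiting case
    return M * (1.0 + L * S * z) ** (1.0 / L)

# Example with made-up parameter values: median 10, scale 0.1, no power needed (L = 1).
upper_reference_limit = lms_centile(1.0, 10.0, 0.1, 0.975)
```

In a real application $L$, $M$ and $S$ would be smooth curves in $X$ estimated from data, and the function would be evaluated along those curves to draw reference centiles.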
Box, G. E. P. and Cox, D. R. 1964: An analysis of transformations (with discussion). Journal of the Royal Statistical Society, Series B 26, 211-52. Carroll, R. J. and Ruppert, D. 1988: Transformation and weighting in regression. London: Chapman & Hall. Cole, T. J. and Green, P. J. 1992: Smoothing reference centile curves: the LMS method and penalized likelihood. Statistics in Medicine 11, 1305-19. Sakia, R. M. 1992: The Box-Cox transformation technique: a review. The Statistician 41, 169-78. Slate, E. H. and Cronin, K. A. 1997: Changepoint modeling of longitudinal PSA as a biomarker for prostate cancer. Case Studies in Bayesian Statistics III. Springer-Verlag, pp. 435-56. Solomon, P. J. and Taylor, J. M. G. 1999: Orthogonality and transformations in variance components models. Biometrika 86, 289-300. Tsiatis, A. A., DeGruttola, V. and Wulfsohn, M. S. 1995: Modeling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association 90, 27-37. Wang, Y. and Taylor, J. M. G. 2001: Jointly modeling longitudinal and event time data: with application in AIDS studies. Journal of the American Statistical Association 96, 895-905.
tree-structured methods (TSM)
These are methods designed to produce interpretable prediction rules by subdividing data into subgroups that are homogeneous with respect to both covariates and outcome. Predictions flow from this outcome constancy, with simple subgroup summaries sufficing. The interpretability of the attendant prediction rules derives from the simple, recursive fashion by which the covariates are employed in eliciting the subgroups. As a consequence of this simplicity tree-structured methods have enjoyed widespread popularity, particularly in biomedical settings. However, of course, all this simplicity belies a number of issues, especially pertaining to prediction performance, that have spawned considerable recent research activity. TSM prediction rules can be developed for both categorical and continuous outcomes, reflecting classification and regression problems respectively. It was the correspondingly named monograph Classification and regression trees by Breiman et al. (1984) that, by way of establishing a methodological framework and providing several compelling applications, fuelled the subsequent popularity of TSM. Indeed, tree-structured methods are frequently referred to by the monograph's title-based acronym, CART. The terminology 'recursive partitioning' is also commonplace, with another relevant monograph being that by Zhang and Singer (1999). The 'tree' terminology itself derives from the companion graphical depictions of the fitted models (see the first figure). As an historical note, the forerunners of TSM date to the 1960s.
tree-structured methods Log-rank survival tree

The basic TSM paradigm, as developed by Breiman et al. (1984) and outlined later, has been extended in many directions. Of particular importance from a biomedical standpoint are extensions to survival outcomes and LONGITUDINAL DATA. The resultant methods are described in an overview article (Segal, 1995). Here, illustration of TSM will make recourse to the SURVIVAL ANALYSIS applications since this allows exposition of TSM fundamentals, as well as showcasing a setting for which tree concepts seem particularly well suited. The central thrust of tree techniques is the elicitation of subgroups. Within these subgroups covariates are homogeneous and between subgroups outcomes are distinct. Therefore, in clinical settings with survival outcomes, interpretation in terms of prognostic group identification is frequently possible. Creation of the subgroups according to a tree structure (binary recursion) mimics, at least simplistically, medical decision making: if the patient is female, has a family history of breast cancer and is over 40, then annual mammograms are recommended. Similarly, given a survival tree, it is straightforward to classify a new patient to a prognostic group by simply answering the sequence of yes/no (binary) questions or splits that give rise to each subgroup or node. It is reasonable to assess whether this goal of subgroup extraction requires new methodology. Could not, for example, the Cox REGRESSION MODEL (Cox, 1972) be employed for this purpose? Suppose, without loss of generality, that in fitting a PROPORTIONAL HAZARDS model with three continuous covariates we obtain positive coefficients for each: i.e. each
variable is adverse: increased values of each are associated with elevated risk. Thus, we might try to create a high-risk group by combining individuals who have high values for all three covariates. However, this approach may fail due to no patients possessing such a covariate profile. Alternatively, we could compute a risk score for each member of the sample based on substitution of the actual covariate profiles into the LOG-LINEAR MODEL, using the fitted coefficients. Then a high-risk stratum could be obtained by selecting the desired percentile of the sample risk scores. The difficulty here is that individuals with potentially disparate covariate values are combined and hence the resultant risk group is hard to label or interpret. In addition to identifying important prognostic groups, which can be thought of as local interactions, survival tree techniques can also be informative about individual covariates. This derives from single splitting (subdivision) being revealing about threshold effects for time-independent covariates or change points in the case of time-dependent covariates. Also, repeated splitting on a given covariate can be revealing about more complex nonlinearities. However, use of (smoothed) martingale residual plots (Therneau, Grambsch and Fleming, 1990) is arguably a more direct way for determining appropriate functional form. Further, tree methods in general are not geared towards making global assessments of a covariate's importance. This is for a variety of reasons. First, if a covariate is used (to define a split) in just one branch of the tree, then it is problematic trying to gauge its overall importance. Second, masking, whereby a covariate selected as (the best) split variable precludes another, almost as good, covariate from emerging, complicates covariate evaluation. (Splitting criteria are discussed later; these allow determination as to which covariate constitutes the best split variable.
Most software implementations of TSM provide output detailing several of the top competing splits (not just the best), as well as measures of overall covariate importance. The related issue of instability is further discussed in the context of improving TSM predictive performance.) Finally, covariate splits are selected by optimising a split criterion and are therefore highly adaptive. While some corresponding distributional results have been obtained, difficulties remain in assigning significances to a sequence of splits, and hence to formally appraising covariate importance. The prescription for tree construction advanced by Breiman et al. (1984) has served as the foundation for many extensions and refinements and therefore is worth detailing. Their approach features four constituent components: a set of questions, or splits, phrased in terms of the covariates that serve to partition the covariate space. A tree structure derives from the recursive application of these questions and a binary tree results if the questions are binary (yes/no). The subgroups created by assigning cases according to these splits
are termed nodes; a split function (or split criterion) $\phi(s, g)$ that can be evaluated for any split $s$ of any node $g$, and which is used to assess the worth of the competing splits; a means for determining appropriate tree size; and statistical summaries for the nodes of the tree. The first item defines what sort of subdivisions are permitted: these are the allowable splits. Binary splits are generally used, mostly for computational reasons. These have the flavour: 'Is age less than 45?' or 'Is ethnicity Asian, black or Hispanic?'. The answers to such questions induce a partition, or split, of the covariate space: cases for which the answer is 'yes' belong to the corresponding region while those for which the answer is 'no' belong to the complementary region. The allowable splits satisfy the following constraints: each split depends on the value of only a single covariate; for ordered (continuous or categorical) covariates, $X_j$, only splits resulting from questions of the form 'Is $X_j \le c$?' for $c \in \mathrm{domain}(X_j)$ are considered, so that ordering is preserved (see the first question); for unordered categorical predictors all possible splits into disjoint subsets of the categories are allowed (see the second question). The allowable splits are formulated in this fashion in order to balance flexibility and interpretability of the fitted models with computational feasibility. While variants and extensions have been subsequently promoted, this formulation underlies most implementations. Given a set of allowable splits a tree is grown as follows: for each subgroup or node: (a) examine every allowable split on each predictor variable and (b) select and execute (create left and right daughter nodes) the best of these splits. The initial or root node comprises the entire sample. Steps (a) and (b) are then reapplied to each of the daughter nodes and so on. It is this reapplication that gives rise to the recursive partitioning terminology.
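The two kinds of allowable split described above can be enumerated directly. The sketch below is only illustrative (the function names are hypothetical): for an ordered covariate it lists the cut-points $c$ for the question 'Is $X_j \le c$?', and for an unordered categorical covariate it lists each division of the categories into two non-empty subsets exactly once.

```python
from itertools import combinations

def ordered_splits(values):
    """Cut-points c for questions 'Is X <= c?' on an ordered covariate."""
    distinct = sorted(set(values))
    # The largest value would send every case to the 'yes' side, so it is not a split.
    return distinct[:-1]

def unordered_splits(categories):
    """Category subsets answering 'yes' for an unordered categorical covariate."""
    cats = sorted(set(categories))
    if len(cats) < 2:
        return []
    first, rest = cats[0], cats[1:]
    splits = []
    # Keeping `first` on the 'yes' side avoids listing each partition twice;
    # stopping before len(rest) excludes the subset containing every category.
    for size in range(len(rest)):
        for combo in combinations(rest, size):
            splits.append({first, *combo})
    return splits
```

For $K$ categories this yields $2^{K-1} - 1$ splits, which grows quickly with $K$ and is one reason computational feasibility matters in the formulation.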
The determination of tree size (how many splits), the third component of the paradigm, is important yet complicated: details are deferred to Breiman et al. (1984) and Segal (1995). Thus, it remains to define what constitutes a best split; this is the province of the second component.
Best splits are decided by optimising a split function $\phi(s, g)$ that can be evaluated for any split $s$ of any node $g$. For regression (i.e. continuous outcomes) Breiman et al. (1984) describe two possibilities: least squares (LS), detailed later, and least absolute deviations. Let $g$ designate a node of the tree; i.e. $g$ contains a subsample of cases $\{(x_i, y_i)\}$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ is the vector of observed covariate values and $y_i$ is the observed outcome for the $i$th case. Let $N_g$ be the total number of cases in $g$ and let $\bar{y}(g) = (1/N_g)\sum_{i \in g} y_i$ be the outcome average for node $g$. Then the within-node sum of squares is given by $SS(g) = \sum_{i \in g}(y_i - \bar{y}(g))^2$. Now suppose a split $s$ partitions $g$ into left and right daughter nodes $g_L$ and $g_R$. The LS split function is $\phi(s, g) = SS(g) - SS(g_L) - SS(g_R)$ and the best split $s^*$ of $g$ is the split such that

$$\phi(s^*, g) = \max_{s \in \Omega} \phi(s, g),$$

where $\Omega$ is the set of all allowable splits $s$ of $g$. An LS regression tree is constructed by recursively splitting nodes so as to maximise the above $\phi$ function. The function is such that we create smaller and smaller nodes of progressively increased homogeneity on account of the nonnegativity of $\phi$: $\phi \ge 0$, since $SS(g) \ge SS(g_L) + SS(g_R)$ for all $s$. It is worth noting that a tree grown in accordance with this LS split function will coincide with a tree grown using a two-sample t-statistic as the split function if the latter uses a pooled estimate of VARIANCE. Selecting the split that makes the resultant t-statistic maximal can be viewed as optimising node separation as measured by the difference in the respective node averages. Modifications to the split function are a primary means for expanding the scope of TSM. Several such modifications have been proposed to enable handling of (censored) survival outcomes. One suite of such split functions is based on notions of between-node separation, analogous to use of the t-statistic. The log-rank statistic provides a familiar and readily implemented example. The resultant rewarding of subgroups that are internally homogeneous with regard to covariates (as imparted by allowable splits), yet externally different, dovetails with the objective of identifying distinct prognostic groups. Further, use of the log-rank statistic as a split function allows additional accommodation of left-truncated survival times as well as time-dependent covariates (see TIME-DEPENDENT VARIABLES).
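The LS split function lends itself to a direct sketch. The code below evaluates $\phi(s, g) = SS(g) - SS(g_L) - SS(g_R)$ for every cut-point on a single ordered covariate and returns the maximiser. It is a minimal illustration with hypothetical function names and made-up data, not the implementation of Breiman et al. (1984), and it searches one covariate in one node only.

```python
import numpy as np

def sum_of_squares(y):
    """Within-node sum of squares SS(g); defined as 0 for an empty node."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def ls_split_value(x, y, c):
    """phi(s, g) for the split 'Is x <= c?' of the node holding the cases (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    left = x <= c
    return sum_of_squares(y) - sum_of_squares(y[left]) - sum_of_squares(y[~left])

def best_ls_split(x, y):
    """Search all cut-points on one ordered covariate; return (cut-point, phi)."""
    cuts = np.unique(x)[:-1]        # the largest value would leave the right node empty
    if cuts.size == 0:
        raise ValueError("covariate needs at least two distinct values to split")
    values = [ls_split_value(x, y, c) for c in cuts]
    k = int(np.argmax(values))
    return float(cuts[k]), float(values[k])

# Made-up node: outcomes jump from about 1 to about 5 once x exceeds 3.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
cut, phi = best_ls_split(x, y)
```

A useful check on the link with the t-statistic is the identity $\phi(s, g) = (N_L N_R / N_g)(\bar{y}(g_L) - \bar{y}(g_R))^2$, which shows that $\phi$ rewards splits whose daughter means are far apart. Growing a full tree would apply `best_ls_split` to every covariate and then recurse into the two daughter nodes.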
We present an illustrative example of TSM with survival that pertains to HIV disease progression. The latency or incubation period for AIDS (i.e. the time from HIV infection to an AIDS diagnosis) is both long and variable. In order to try to explain this variability in terms of immune function decay, markers of immune function are regularly measured on longitudinally followed cohorts of HIV seropositive and seroconverting individuals. In particular, counts of CD4+ T lymphocytes have been widely used both to follow the course of immune function loss and to predict time to AIDS or death. Here we consider an additional marker, delayed-type hypersensitivity (DTH) skin tests, as a putative supplement to CD4. Use of DTH is motivated in part by deficiencies of CD4: quantification of peripheral blood CD4 depletion underestimates the severity of the HIV-induced loss of antigen-specific cellular immunity and provides no guide as to which antigen-specific responses have been lost. More sensitive measures of antigen-specific cellular immunity are therefore required as an adjunct to the monitoring of CD4 counts. Testing of cutaneous DTH responses to recall antigens provides a direct measure of cell-mediated antigen-specific responses in vivo. Assessment of these markers made recourse to the Western Australian HIV database. Patients in Western Australia with HIV infection have been managed at a single specialist referral centre since the first AIDS case was confirmed in August 1983. Both HIV-infected and at-risk individuals were followed regularly. The closely scheduled (bimonthly) visits and early inception of the cohort provide a good opportunity for marker evaluation. Further details on the cohort, markers, handling seroprevalent (HIV positive at enrolment) subjects in the context of survival analysis, treating markers as time-dependent covariates and complementary Cox proportional hazards analyses are given in Segal et al. (1995). Results from a TSM analysis of the survival endpoint time-to-AIDS are presented in the two figures for tree-structured methods. The log-rank statistic was used as a split function, although results were insensitive to this choice. The covariates used were age, CD4 and DTH, all measured at enrolment. CD4 is expressed as the percentage of T lymphocytes that are CD4 positive. This is in accord with studies that, while noting strong correlations between this and other measures (CD4 count, CD4/CD8 ratio), found CD4 % to be the most prognostic for time-to-AIDS and exhibiting the smallest variability on repeated determinations. The initial sample features 336 individuals, 76 of whom progressed to AIDS, as depicted in the uppermost (root) node. This sample is subdivided on the basis of CD4 %, the optimal cut-off point being 11.5 %, as shown on the emanating branches. In determining this split all possible cut-off points on all three (continuous) covariates were evaluated; i.e. corresponding log-rank statistics were computed. The selected split was maximal among all these statistics. The 291 individuals with CD4 % exceeding 11.5 % are also subdivided on the basis of CD4 %: covariates can be used repeatedly. Examination of these CD4-based splits affirms the anticipated: the subgroups with higher CD4 % have superior survival.
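The search over cut-off points with the log-rank statistic as split function can be sketched as follows. This is an illustration under simplifying assumptions, not the analysis of Segal et al. (1995): it uses the standard two-group log-rank chi-square statistic with right censoring only, ignores left truncation and time-dependent covariates, handles a single covariate, and uses hypothetical function names and made-up data.

```python
import numpy as np

def log_rank_statistic(time, event, in_left):
    """Two-group log-rank chi-square statistic comparing left and right daughter nodes."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)       # True = event observed, False = censored
    in_left = np.asarray(in_left, dtype=bool)
    observed_minus_expected = 0.0
    variance = 0.0
    for t in np.unique(time[event]):            # loop over distinct event times
        at_risk = time >= t
        n = at_risk.sum()                        # total number at risk at time t
        n_left = (at_risk & in_left).sum()       # number at risk in the left node
        events_now = event & (time == t)
        d = events_now.sum()                     # events at time t in both nodes
        d_left = (events_now & in_left).sum()    # events at time t in the left node
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return float(observed_minus_expected ** 2 / variance) if variance > 0 else 0.0

def best_log_rank_split(x, time, event):
    """Evaluate every cut-point 'Is x <= c?' and return (cut-point, statistic) of the best."""
    x = np.asarray(x, dtype=float)
    cuts = np.unique(x)[:-1]
    if cuts.size == 0:
        raise ValueError("covariate needs at least two distinct values to split")
    stats = [log_rank_statistic(time, event, x <= c) for c in cuts]
    k = int(np.argmax(stats))
    return float(cuts[k]), float(stats[k])

# Made-up data: low marker values go with short times to the event.
marker = [1, 2, 3, 4, 5, 6, 7, 8]
years = [1, 2, 3, 4, 10, 11, 12, 13]
observed = [1, 1, 1, 1, 1, 1, 1, 1]
cut, stat = best_log_rank_split(marker, years, observed)
```

In a full tree the same search would be run over all covariates in the node (age, CD4 % and DTH in the example above), and the split with the largest statistic would be executed before recursing into the daughter nodes.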
Indeed, the Kaplan-Meier curves (seen in the second figure) for the respective terminal nodes (rectangles in the first figure) showcase dramatic differences between the CD4 extremes, with survival prospects for the low CD4 group (node 1) being dismal in comparison with the high group (node 4). The 89 individuals with intermediary CD4 % (11.5 < CD4 % < 24.5) are further partitioned, this time on the basis of DTH. The optimal cut-off point is at a DTH value of 6.5 mm. The second figure again reveals survival differences for those in the two DTH-defined subgroups (node 2 versus node 3). Thus, it is possible that DTH can serve to augment CD4 % as a marker for HIV progression: for individuals whose CD4 % values are intermediary, rather than extreme, additional prognostic information may be obtained from their DTH values. It is interesting to contrast the splits with smoothed martingale residual plots obtained from a null Cox proportional hazards model. Such graphics are useful for informing an appropriate covariate functional form (Therneau, Grambsch and Fleming, 1990). Each split conforms to a step function approximation of the smoothed martingale residuals, which in turn are suggestive of threshold effects. Of course, TSM are not without shortcomings. The primary deficiencies pertain to the interrelated concerns of instability, modest prediction performance and inefficiency in capturing underlying smooth response surfaces. Instability refers to the frequently large impact (on tree topology, selected covariates and cut-off points) that can result from small changes in data and/or inputs. This instability in part leads to relatively modest prediction performance by way of the associated large prediction variances that are exhibited when applying a TSM to independent test data. A number of so-called committee or ensemble prediction methods have recently emerged that improve performance by
tree-structured methods Kaplan-Meier curves: log-rank tree (survival curves for Nodes 1 to 4; vertical axis from 0.0 to 1.0; horizontal axis: Time to AIDS (years))
reducing this variability by way of strategic creation and combination of (many) individual TSMs. Bagging, BOOSTING and RANDOM FORESTS are examples; see Breiman (2001). Nonetheless, by virtue of their ready interpretability, TSMs are a valuable tool for a wide range of biomedical problems. MRS
References
Breiman, L. 2001: Statistical modeling: the two cultures. Statistical Science 16, 199-215. Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. 1984: Classification and regression trees. Belmont: Wadsworth. Cox, D. R. 1972: Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B 34, 187-220. Segal, M. R. 1995: Extending the elements of tree-structured regression. Statistical Methods in Medical Research 4, 219-36. Segal, M. R., James, I. R., French, M. A. H. and Mallal, S. 1995: Statistical issues in the evaluation of markers of HIV progression. International Statistical Review 63, 179-97. Therneau, T. M., Grambsch, P. M. and Fleming, T. R. 1990: Martingale-based residuals for survival models. Biometrika 77, 147-60. Zhang, H. and Singer, B. 1999: Recursive partitioning in the health sciences. New York: Springer.
trellis graphics This is an approach used to examine high-dimensional structure in data by means of one-, two- and three-dimensional graphs. The problem addressed is how observations on one or more variables depend on the values of other variables. The essential feature of the approach is the multiple conditioning that allows some types of graphic involving one or more variables to be displayed several times, each time as it appears when one or more other variables take particular values. The simplest example of a trellis graphic is the coplot, which is a SCATTERPLOT of two variables conditioned on the values taken by a third variable. For example, the first figure shows a coplot of mortality versus latitude conditioned on population size. In this diagram, the panel at the top of the figure is known as the 'given' panel; those below are 'dependence' panels. Each rectangle in the given panel specifies a range of values of population size. On a corresponding dependence panel, mortality is plotted against latitude for those countries whose population sizes lie in the particular interval. To match population size intervals to a dependence panel, the latter are examined in order from left to right in the bottom row and then again from left to right in subsequent rows. The association between higher values of mortality and lower values of latitude (and vice versa) is seen to hold for all levels of population size. A more complex example of a trellis graphic is shown in the second figure on page 472. Here a three-dimensional plot of mortality, latitude and longitude is given for four ranges of population size. Several other examples of the use of trellis graphics are given in Verbyla et al. (1999). BSE
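The conditioning step behind a coplot can be sketched without any plotting code. The example below cuts the conditioning variable into intervals and collects the points belonging to each dependence panel; drawing each panel as a scatterplot is left to whatever graphics library is at hand. It assumes simple quantile-based intervals that share only their end-points, whereas coplot implementations often use overlapping intervals; the function names and data are hypothetical.

```python
import numpy as np

def conditioning_intervals(z, n_panels):
    """Cut the conditioning variable z into n_panels quantile-based intervals."""
    edges = np.quantile(np.asarray(z, dtype=float), np.linspace(0, 1, n_panels + 1))
    return [(float(low), float(high)) for low, high in zip(edges[:-1], edges[1:])]

def coplot_panels(x, y, z, n_panels=4):
    """For each interval of z, return the (x, y) points to draw in that dependence panel."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    panels = []
    for low, high in conditioning_intervals(z, n_panels):
        inside = (z >= low) & (z <= high)       # end-points are included in both neighbours
        panels.append(((low, high), x[inside], y[inside]))
    return panels

# Made-up data: x plays the role of latitude, y of mortality, z of population size.
latitude = np.array([30, 32, 35, 37, 40, 42, 45, 47], dtype=float)
mortality = np.array([200, 190, 180, 170, 150, 140, 120, 110], dtype=float)
population = np.arange(1, 9, dtype=float)
panels = coplot_panels(latitude, mortality, population, n_panels=2)
```

Each element of `panels` pairs an interval from the 'given' panel with the data for the matching dependence panel, which is the correspondence the text describes.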
ISee also SCA11ERPLOI' UA1IUCESJ V...,.., A. p.. c.DII," ",lCawml, l\L G.... W....., So J. 1999: 1'IIc . . . . or _peel ap:riIneats ad IaasJludiDaI data using aaaalbiDl spIincs (with clilcussian). Applied SltlIUJiC$ 41, 209-312-
triangular test See SEQUENTIAL ANALYSIS
t-test See STUDENT'S t-TEST
[Figure] trellis graphics Coplot of mortality versus latitude conditioned on population size (horizontal axis of the dependence panels: latitude)
[Figure] trellis graphics Trellis graphic of mortality, latitude and longitude conditioned on population size (four panels, one for each range of population size; horizontal axis: longitude)
twin analysis The analysis of diseases or other phenotypes in twins is an important tool in GENETIC EPIDEMIOLOGY. Twins may be either monozygotic (MZ), i.e. arising from the same fertilised egg, or dizygotic (DZ). MZ twins are genetically identical to one another. DZ twins, contrariwise, are genetically equivalent to siblings and share on average half their genetic material. Since both MZ and DZ twin pairs share the same environment (at least in utero and in early life) they provide a well-controlled study design to evaluate genetic influences. For any phenotype that has a genetic component, MZ twins will tend to be more similar than DZ twins. MZ and DZ twins may be distinguished in studies by QUESTIONNAIRE (e.g. 'Are you like two peas in a pod?') or more objectively by DNA analysis.

For quantitative traits, the extent of concordance between MZ and DZ twins can be expressed in terms of intraclass correlations. Simple comparisons can then be made using F-tests. Since these tests assume normality, some transformation of the trait may be required. More general analysis of twin data is usually conducted in the framework of a VARIANCE COMPONENTS analysis. Thus, we think of the trait X as being decomposed into a sum of components:

$$X = G + E + R$$

where G is the genetic component, E is a component due to shared environmental factors and R is a residual component. The proportion of the trait variance due to the genetic component G is called the heritability of the trait (often written H). The genetic component can be further decomposed into an additive genetic component A and a nonadditive (or dominance) component D. An additive component in this sense is such that the value for any individual is the MEAN of the values in their two parents. If the proportions of the trait variance due to additive and nonadditive genetic factors and shared environment are $p_A$, $p_D$ and $p_E$, then the correlations between MZ and DZ twin pairs will be given by:

$$\rho_{MZ} = p_A + p_D + p_E$$

$$\rho_{DZ} = \tfrac{1}{2} p_A + \tfrac{1}{4} p_D + p_E$$

Since twin studies only allow two CORRELATIONS to be estimated, it is not possible to estimate the additive, dominance and shared environmental components simultaneously. A common assumption is that the dominance component is zero, so that the genetic component is purely additive. Data on other relatives may allow all three components to be estimated. Where necessary, other covariates may be included in this framework. Age may be a particularly important covariate to include since twin pairs are identical in age.
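As an illustration of the assumption that the dominance component is zero, the two correlation equations reduce to two equations in two unknowns and can be solved directly. The sketch below is a minimal example, not a substitute for a full variance components analysis; the function name and the example correlations (0.8 and 0.5) are hypothetical, and the residual share is taken as whatever trait variance the MZ correlation does not account for.

```python
def components_from_twin_correlations(r_mz, r_dz):
    """Solve r_mz = p_A + p_E and r_dz = p_A/2 + p_E, assuming p_D = 0."""
    p_a = 2.0 * (r_mz - r_dz)   # additive genetic proportion
    p_e = 2.0 * r_dz - r_mz     # shared-environment proportion
    residual = 1.0 - r_mz       # proportion left to the residual component
    return {"additive": p_a, "shared_env": p_e, "residual": residual}


# Hypothetical intraclass correlations, chosen only for illustration.
estimates = components_from_twin_correlations(r_mz=0.8, r_dz=0.5)
print(estimates)
```

Estimates obtained this way can fall outside the interval from 0 to 1 when the observed correlations are inconsistent with the model, which is one reason a formal variance components fit is usually preferred.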
An important assumption in these analyses is that the shared environment component E is identical in MZ and DZ twins. This assumption may sometimes be violated if certain in utero effects are important.

Twin data for binary traits are often expressed in terms of the concordance rate, defined as the proportion of twins of affected individuals who are themselves affected. A higher concordance rate in MZ twins is expected for traits with a genetic component. Concordance rates may be calculated in two ways. If the starting point is a series of affected individuals (e.g. from hospital records), the concordance rate can be estimated straightforwardly by identifying those individuals with twins and calculating the rates in the co-twins. If, however, the starting point is a register of twins, the concordance rate can be calculated from the number of pairs where both twins are affected (concordant) and where only one twin is affected (discordant), thus:

$$\frac{2 \times \#\,\text{concordant pairs}}{2 \times \#\,\text{concordant pairs} + \#\,\text{discordant pairs}}$$

The factor of two is necessary to allow for the fact that concordant twins may be ascertained through either twin. Studies based on twin registers can be seriously biased if concordant twin pairs are more likely to be identified. For this reason, the best twin data on disease come from population-based registers with record linkage to medical records, as are available, for example, in Scandinavia.

More general methods of analysing binary twin data have been developed. A common approach is to extend the ideas of LOGISTIC REGRESSION or, for a chronic disease, POISSON or COX REGRESSION. The genetic and shared environmental effects are modelled by including RANDOM EFFECTS terms in the model in addition to any fixed covariates. DE
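The register-based concordance rate described above is simple arithmetic, and a short sketch may help to make the role of the factor of two concrete. The function name and the pair counts are hypothetical and serve only as an illustration.

```python
def register_concordance_rate(concordant_pairs, discordant_pairs):
    """Concordance rate from a twin register: 2C / (2C + D)."""
    affected_in_concordant = 2 * concordant_pairs  # both twins are affected
    total_affected = affected_in_concordant + discordant_pairs
    if total_affected == 0:
        raise ValueError("no affected twins in the register")
    return affected_in_concordant / total_affected


# Hypothetical register: 10 concordant and 30 discordant pairs.
rate = register_concordance_rate(concordant_pairs=10, discordant_pairs=30)
print(rate)
```

With these counts there are 50 affected individuals, 20 of whom have an affected co-twin, so the rate is 20/50.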
[See also GENETIC EPIDEMIOLOGY, GENOTYPE, PHENOTYPE]
Balding, D. J., Bishop, M. and Cannings, C. (eds) 2001: Handbook of statistical genetics. Chichester: John Wiley & Sons, Ltd. Sham, P. 1998: Statistics in human genetics. London: Arnold.
two-dimensional contingency table See CONTINGENCY TABLES
two-sample t-test See STUDENT'S t-TEST
two-sided tests In hypothesis tests we try to distinguish between chance variation in a dataset and a genuine effect. We do this by comparing the NULL HYPOTHESIS, which states that there is no difference between the populations in which the data arose, and the alternative hypothesis, which states that there is, in fact, a difference. If no direction for the difference is specified by either the null hypothesis or the alternative hypothesis, we have a two-sided test, sometimes referred to as a two-tailed test. We are therefore looking for a difference but are equally interested in differences in either direction. For instance, when comparing a new treatment to an existing one we would be interested in detecting differences both in favour of or against the new treatment. In the majority of cases, the two-sided alternative is the appropriate one as it allows for the uncertainty about the direction of an effect that is often present. Whether or not to use a two-sided test should be decided based on the design of the study. Unless the study specifically seeks to detect an upward or downward change determined in advance, two-sided tests should be used. It is usually assumed that the P-VALUE reported from a specific statistical test is two-sided unless stated otherwise. More details can be found in Bland and Altman (1994). MMB
[See also ONE-SIDED TESTS]
Bland, J. M. and Altman, D. G. 1994: Statistics notes: one- and two-sided tests of significance. British Medical Journal 309, 248.
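To make the distinction between one-sided and two-sided P-values concrete, the following sketch computes both for a test statistic that is assumed to follow a standard normal distribution under the null hypothesis. This is only one illustrative setting; the function names are hypothetical and the value 1.96 is chosen purely as an example.

```python
import math


def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_values_for_z(z):
    """One-sided and two-sided P-values for a standard normal test statistic."""
    upper = 1.0 - standard_normal_cdf(z)  # alternative: effect is positive
    lower = standard_normal_cdf(z)        # alternative: effect is negative
    two_sided = 2.0 * min(upper, lower)   # alternative: effect in either direction
    return {"upper": upper, "lower": lower, "two_sided": two_sided}


result = p_values_for_z(1.96)
print(result)
```

For a symmetric null distribution such as this one, the two-sided P-value is twice the smaller one-sided P-value, which is why a result can be 'significant' one-sided yet not two-sided at the same level.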
two-way analysis of variance This is a test to see if the mean varies with either (or both) of two categorical factors. The one-way ANALYSIS OF VARIANCE seeks to partition the variation in a sample into that due to the group factor (the between-groups sum of squares) and the residual variation that cannot be explained by a factor (the within-groups sum of squares). In a two-way analysis of variance, there are two factors that define the groups and each factor explains some of the variation. It is therefore necessary to partition the variation in the sample into that due to factor A, that due to factor B and the residual variation.

The total variation/sum of squares (total SS) can be calculated in the same way as for one-way analysis of variance. The sum of squares due to factor A is the sum over all individuals of the squared differences between the overall MEAN and the mean value associated with the level of the factor appropriate to the individual. The sum of squares due to factor B can be calculated in a similar manner. The residual sum of squares is then calculated as the total sum of squares minus the sums of squares due to factors A and B. If we wish to calculate the sum of squares due to the interaction between the factors, then this is calculated as the sum over all individuals of the squared difference between the mean for the appropriate combination of the factors and the sum of the means associated with the relevant levels of the individual factors less the overall mean.

Like the one-way flavour, the two-way ANOVA can be presented as a table. If factor A has k levels and factor B has j levels, then the statistics to test for factor effects are presented in the first table. The F-statistic associated with factor A will be compared to an F-DISTRIBUTION with $k - 1$ and $N - k - j + 1$ DEGREES OF FREEDOM. That associated with factor B will be compared to an F-distribution with $j - 1$ and $N - k - j + 1$ degrees of freedom. If the interaction term is required, then the analysis is as shown in the second table.

Reneman et al. (2001) use two-way ANOVA to compare alcohol use in eight groups defined by gender ($k = 2$) and four levels of ecstasy use ($j = 4$). With 69 people in the study and no interaction being considered, the test for a gender effect is conducted by comparing the F-statistic to an F-distribution with 1 and 64 degrees of freedom. For further reading see Armitage and Berry (1987). AGL

two-way analysis of variance Two-way ANOVA table for main factor effects only. Entries in bold must be calculated directly from the data; the other entries follow in the manner indicated

Source of variance   Degrees of freedom   Sums of squares                     Mean squares                      F              P-value
Factor A             k - 1                A SS                                A MS = A SS/(k - 1)               A MS/Res MS    P
Factor B             j - 1                B SS                                B MS = B SS/(j - 1)               B MS/Res MS    P
Residual             N - k - j + 1        Res SS = Total SS - (A SS + B SS)   Res MS = Res SS/(N - k - j + 1)
Total                N - 1                Total SS

two-way analysis of variance Two-way ANOVA table when an interaction effect is included. Entries in bold must be calculated directly from the data; the other entries follow in the manner indicated

Source of variance   Degrees of freedom   Sums of squares                              Mean squares                         F                P-value
Factor A             k - 1                A SS                                         A MS = A SS/(k - 1)                  A MS/Res MS      P
Factor B             j - 1                B SS                                         B MS = B SS/(j - 1)                  B MS/Res MS      P
Interaction          (k - 1)(j - 1)       Int SS                                       Int MS = Int SS/((k - 1)(j - 1))     Int MS/Res MS    P
Residual             N - kj               Res SS = Total SS - (Int SS + A SS + B SS)   Res MS = Res SS/(N - kj)
Total                N - 1                Total SS
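The main-effects-only calculation can be sketched in a few lines. The example below follows the sums of squares as described in this entry and assumes a balanced design (equal numbers in every combination of factor levels), for which subtracting the two factor sums of squares from the total is appropriate. The function name and the small dataset are hypothetical, and P-values are left out because they require an F-distribution routine from a statistics library.

```python
from collections import defaultdict


def two_way_anova_main_effects(records):
    """records: list of (level_of_A, level_of_B, value); balanced design assumed."""
    n = len(records)
    grand_mean = sum(value for _, _, value in records) / n

    def factor_sum_of_squares(position):
        by_level = defaultdict(list)
        for record in records:
            by_level[record[position]].append(record[2])
        # Every individual contributes (mean of its level - overall mean)^2.
        ss = sum(len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
                 for vals in by_level.values())
        return ss, len(by_level)

    ss_a, k = factor_sum_of_squares(0)
    ss_b, j = factor_sum_of_squares(1)
    ss_total = sum((value - grand_mean) ** 2 for _, _, value in records)
    ss_res = ss_total - (ss_a + ss_b)
    df_res = n - k - j + 1
    ms_res = ss_res / df_res
    return {
        "A": {"df": k - 1, "ss": ss_a, "F": (ss_a / (k - 1)) / ms_res},
        "B": {"df": j - 1, "ss": ss_b, "F": (ss_b / (j - 1)) / ms_res},
        "Residual": {"df": df_res, "ss": ss_res},
        "Total": {"df": n - 1, "ss": ss_total},
    }


# Hypothetical balanced data: two levels of each factor, two values per cell.
data = [("a1", "b1", 1), ("a1", "b1", 3), ("a1", "b2", 4), ("a1", "b2", 6),
        ("a2", "b1", 5), ("a2", "b1", 7), ("a2", "b2", 8), ("a2", "b2", 10)]
table = two_way_anova_main_effects(data)
print(table)
```

Each F value would then be referred to an F-distribution with the factor's degrees of freedom and the residual degrees of freedom, as set out in the first table.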
[See also GENERALISED LINEAR MODEL]
Armitage, P. and Berry, G. 1987: Statistical methods in medical research. Oxford: Blackwell. Reneman, L. et al. 2001: Effects of dose, sex, and long-term abstention from use on toxic effects of MDMA (ecstasy) on brain serotonin neurons. Lancet 358, 1864-9.

Type I and Type II errors After every HYPOTHESIS TEST the decision to accept or reject the NULL HYPOTHESIS is made. This decision can, however, lead to two possible errors. First, we can obtain a significant result, and thus reject the null hypothesis, when the null hypothesis is in reality true. This is termed a Type I error, and can be considered as a false positive result. Secondly, we may obtain a nonsignificant result when the null hypothesis is not true, in which case the error is called a Type II error, which may be considered a false negative result. A Type II error thus refers to a mistaken failure to reject the null hypothesis when the alternative hypothesis is true and there is a real difference between the study groups.

The Type I error rate is no more than the so-called SIGNIFICANCE LEVEL, a PROBABILITY frequently denoted by alpha, $\alpha$. Thus the significance level represents the chance that the null hypothesis is rejected when it is actually true. For every hypothesis test the significance level should be decided upon beforehand; the typical value chosen for this is 0.05. Consequently, over many trials in which the null hypothesis is true, 5%, or 1 in 20, are expected to yield false positive results. Sometimes, however, smaller values for alpha are used, in particular to help deal with the problem of MULTIPLE TESTING (see MULTIPLE COMPARISONS).

The probability of making a Type II error is represented by the Greek letter beta ($\beta$). The Type II error is closely related to
U

uncle test for randomisation See ETHICS AND CLINICAL TRIALS
uniform distribution This is the PROBABILITY DISTRIBUTION whereby all outcomes (within a RANGE) are equally likely. There is a discrete uniform distribution and a continuous uniform distribution (also known as the rectangular distribution), and we consider the discrete version first. If a random variable has a discrete uniform distribution on the integers from a to a + b, then we can write the probability mass function as:

$$P(X = x) = \frac{1}{b + 1}, \quad a \le x \le a + b$$

Expressed in this way, the distribution has a MEAN of $a + b/2$ and a VARIANCE of $(b^2 + 2b)/12$. As all PROBABILITIES are the same, it is of course symmetric about the mean. The discrete uniform distribution is useful in assessing DIGIT PREFERENCE. If durations of operations are being recorded to the nearest minute, and the operations typically last between two and four hours, then it might be presumed that the distribution of the terminal digit of the minutes would be approximately uniform from 0 to 9. The observed digits can be compared to the uniform distribution to detect any BIAS. The most common use of the uniform distribution in the medical sciences is perhaps in generating random sequences for CLINICAL TRIALS.

The probability density function of a continuous uniform distribution appears as a rectangle on a graph (hence its alternative name). Since the area of the rectangle must be equal to 1, the value of the density function is defined by the range of plausible observations (to keep the area of the rectangle constant, the height is defined by the width). The probability density function for the continuous uniform distribution is:

$$f(x) = \frac{1}{b - a}, \quad a \le x \le b$$

In this formulation, the mean is $(a + b)/2$ and the variance is $(b - a)^2/12$. As with the discrete uniform distribution, the distribution is symmetric about the mean. Note that when $a = 0$ and $b = 1$, this is a special case of the BETA DISTRIBUTION.

The cumulative distribution function maps observations from a distribution to probabilities that should be uniformly distributed between 0 and 1. Conversely, the inverse of the cumulative distribution function maps uniform observations between 0 and 1 to observations from the distribution in question. The first property provides a useful check of model fit. The second allows us to generate random observations from many distributions, if we can generate random uniformly distributed data and write out the inverse of the cumulative distribution function. For further reading see Altman (1991). AGL
Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.
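Two points from this entry lend themselves to a short numerical sketch: the mean and variance of the discrete uniform distribution on a to a + b, and the use of the inverse cumulative distribution function to turn uniform random numbers into draws from another distribution. The exponential distribution is used here only because its inverse cumulative distribution function can be written down easily; that choice, the function names and the seed are assumptions made for illustration.

```python
import math
import random


def discrete_uniform_moments(a, b):
    """Mean and variance of the discrete uniform distribution on a, ..., a + b."""
    values = range(a, a + b + 1)
    n = b + 1
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, variance


def exponential_inverse_cdf(u, rate):
    """Inverse of F(x) = 1 - exp(-rate * x), for u in [0, 1)."""
    return -math.log(1.0 - u) / rate


def sample_exponential(n, rate, seed=0):
    """Draw n exponential observations from uniform random numbers."""
    rng = random.Random(seed)
    return [exponential_inverse_cdf(rng.random(), rate) for _ in range(n)]


digit_mean, digit_variance = discrete_uniform_moments(0, 9)  # terminal digits 0 to 9
draws = sample_exponential(10000, rate=2.0)
sample_mean = sum(draws) / len(draws)
print(digit_mean, digit_variance, sample_mean)
```

For the terminal digits 0 to 9 the formulas in the entry give a mean of 0 + 9/2 = 4.5 and a variance of (81 + 18)/12 = 8.25, which the direct calculation reproduces; the mean of the simulated exponential draws should be close to 1/rate.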
V

validity of scales The concepts of quality of assessments by rating scales and multi-item QUESTIONNAIRES are validity and reliability (see MEASUREMENT PRECISION AND RELIABILITY). A rating scale is valid if it measures what it is intended to measure in the specific study. The validity of subjective assessments is relative and study specific, and cannot be determined absolutely. Therefore there are various concepts of validity, each addressing a specific type of quality assessment. The main concepts are criterion, construct and content validity, but a large number of subconcepts are also used. The meaning of these concepts is not universal and depends on applications and research paradigms.

Criterion validity refers to the conformity of a scale to a true state or a gold standard, and depending on the purpose of the study subconcepts like clinical, predictive and concurrent validity will be used. In the absence of a true state or a gold standard, construct validity, which refers to the consistency between scales having the same theoretical definition, and subconcepts like convergent, divergent, discriminative and translation validity and parallel reliability have been used in studies. Discriminative rating scales are intended to distinguish between individuals or groups; when no external criterion is available, discriminant validity is to be assessed. Parallel reliability refers to the interchangeability of scales. The concept of content validity refers to the completeness of the scale or multi-item questionnaire in the coverage of important areas. Subconcepts like face, ecological, decision, consensual and sampling validity, comprehensiveness and feasibility have been used in studies (Svensson, 1993).

Assessments on RATING SCALES generate ordinal data with rank-invariant properties only, which means that the responses indicate an ordered structure and not a mathematical value. The results of statistical treatment of data must not be changed when relabelling the ordered responses (see RANK INVARIANCE). Appropriate statistical methods for evaluation of criterion and construct validity often refer to the order consistency or to the relationship between the scales of comparison.

The SCATTERPLOT of low back pain perceived by 48 patients with at least 4 years' pain history, made on a VISUAL ANALOGUE SCALE (VAS) and a verbal descriptive scale (VDS-5) having the five categories of perceived pain, 'none', 'negligible', 'moderate', 'rather severe' and 'very severe', is shown in the first figure (Svensson et al., 2009). As is evident from the plot there is large overlapping; even patients with 'moderate pain' on the VDS used VAS positions from 28 to 73, and the same VAS positions were used by patients with 'rather severe pain'. The proportion of overlapping pairs among all possible different pairs of data defines the measure of disorder D. D equals 0.07, which means that 7% of all possible combinations of different pairs are disordered.

The expected pattern of complete order consistency, the rank-transformable pattern of agreement (RTPA), is constructed by pairing off the two sets of distributions of data against each other. The measure of disorder expresses the observed dispersion of pairs from this order-consistent distribution of interchangeability between the scales. The cut-off response values for inter-scale calibration are also provided by the RTPA, and it is obvious that there is no linear correspondence between VAS and discrete scale assessments (see the second figure) (see RANKS, RANKING PROCEDURES) (Svensson, 1993, 2000a, 2000b).

[Figure] validity of scales The distribution of paired assessments of back pain on a visual analogue scale for pain (VAS pain) and a five-point verbal descriptive scale for pain (VDS-5 pain) (panel title: VAS vs VDS pain, n = 48; horizontal axis: VAS pain, 0 to 100; vertical axis: VDS-5 categories 1 to 5)
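A rough sketch of how a measure of disorder of this kind might be counted is given below. It is an illustration under stated assumptions, not the published procedure: a pair of individuals is treated as disordered when the two scales rank them in opposite directions, and the denominator is taken to be all n(n - 1)/2 pairs of individuals. The function name and the data are hypothetical.

```python
from itertools import combinations


def proportion_disordered_pairs(scale_x, scale_y):
    """Share of pairs of individuals that the two scales order in opposite ways.

    Assumption: the denominator is all n(n - 1)/2 pairs of individuals.
    """
    pairs = list(combinations(range(len(scale_x)), 2))
    disordered = sum(
        1 for i, j in pairs
        if (scale_x[i] - scale_x[j]) * (scale_y[i] - scale_y[j]) < 0
    )
    return disordered / len(pairs)


# Hypothetical paired assessments for four individuals on two scales.
vas = [10, 35, 50, 80]
vds = [1, 3, 2, 4]
d = proportion_disordered_pairs(vas, vds)
print(d)
```

Because only the ordering of the responses is used, relabelling the categories with any other increasing set of codes leaves the result unchanged, in keeping with the rank-invariant properties described in the entry.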
[Figure] validity of scales The rank-transformable pattern of agreement (RTPA), uniquely defined by the two sets of frequency distributions of data in the first figure (panel title: RTPA VAS vs VDS-5 back pain; horizontal axis: VAS pain, 0 to 100)

There are other measures that could be applied to evaluation of various kinds of validity of scales. Depending on the purpose, SPEARMAN'S RANK CORRELATION COEFFICIENT, Goodman-Kruskal's gamma (see Everitt, 1992) and Kendall's tau (see CORRELATION) could be suitable. Spearman's rank-order correlation coefficient is a commonly used nonparametric measure of ASSOCIATION. However, a strong association does not necessarily mean a high level of order consistency, and does not indicate that two scales are interchangeable. The Pearson correlation coefficient (see CORRELATION), STUDENT'S t-TEST and the ANALYSIS OF VARIANCE are also common in validity studies. A serious drawback is that these methods assume normally distributed quantitative data (see NORMAL DISTRIBUTION), and such requirements are not met by data from rating scales (Svensson, 2000a). ES
Everitt, B. S. 1992: The analysis of contingency tables, 2nd edition. London: Chapman & Hall. Svensson, E. 1993: Analysis of systematic and random differences between paired ordinal categorical data (dissertation). Göteborg: Göteborg University. Svensson, E. 2000a: Comparison of the quality of assessments using continuous and discrete ordinal rating scales. Biometrical Journal 42, 417-34. Svensson, E. 2000b: Concordance between ratings using different scales for the same variable. Statistics in Medicine 19, 3483-96. Svensson, E., Schillberg, B., Kling, A.-M. and Nyström, B. 2009: The balanced inventory for spinal disorders. The validity of a disease specific questionnaire for evaluation of outcomes in patients with various spinal disorders. Spine 34, 1976-83.

variance The variance is the square of the STANDARD DEVIATION. It is calculated using the following formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

in which n is the number of observations, i takes values from 1 to n and the $\Sigma$ notation denotes the sum, i.e. $(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \ldots + (x_n - \bar{x})^2$.

Both $s^2$ and $\sigma^2$ are used to indicate the variance. Technically, the former refers to the variance of the sample and the latter to the variance of the population, which is being estimated by the sample and is marginally smaller, since the divisor is n instead of n - 1 in the formula. When quoting a mean to summarise data, it is also customary to quote a sample standard deviation. This is the square root of the sample variance, and is in the same units as the raw data. SRC
[See also COVARIANCE MATRIX]

variance components See COMPONENTS OF VARIANCE
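For the variance entry, the effect of dividing by n - 1 rather than n can be seen in a short calculation. The sketch below computes both versions by hand and checks them against Python's standard `statistics` module; the data values are hypothetical.

```python
import math
import statistics


def sum_of_squared_deviations(xs):
    """Sum over i of (x_i - mean)^2."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def sample_variance(xs):
    """Divisor n - 1, as in the formula of the variance entry."""
    return sum_of_squared_deviations(xs) / (len(xs) - 1)


def variance_with_divisor_n(xs):
    """Divisor n, the version described as marginally smaller."""
    return sum_of_squared_deviations(xs) / len(xs)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations
s2 = sample_variance(data)
v_n = variance_with_divisor_n(data)
sd = math.sqrt(s2)  # sample standard deviation, in the units of the raw data
print(s2, v_n, sd)
```

Here the sum of squared deviations is 32, so the two versions are 32/7 and 32/8; the gap between them shrinks as the number of observations grows.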
variogram This is a procedure that provides a description of the autocorrelation (see CORRELATION) in time series or spatial clusters. It is the latter that forms the focus for the following account. It is important to describe and model this autocorrelation so as to incorporate it into estimation and prediction procedures. For example, consider disease incidences measured at spatial locations. To construct a map, one would need to interpolate the incidence value for the locations at which it was not observed and, in the absence of large-scale spatial trend, such predictions should give larger weights to nearby locations if the autocorrelation were increasing with decreasing distances.

The variogram is based on the semi-variance $\gamma(x, y; h_x, h_y)$, which measures half the variance of the difference between two values of an outcome, Z, observed at two spatial locations referenced by the spatial coordinates $(x, y)$ and $(x + h_x, y + h_y)$. Strictly speaking, the theoretical variogram is defined as twice the semi-variance, i.e.:

$$2\gamma(x, y; h_x, h_y) = E\{[Z(x, y) - Z(x + h_x, y + h_y)]^2\}$$

but the semi-variance itself is usually referred to as the variogram. It represents an inverse measure of the statistical dependency of the variables at locations $(x, y)$ and $(x + h_x, y + h_y)$.

In all generality, the variogram is a function of both the location $(x, y)$ and the distance and direction $(h_x, h_y)$. Hence, to estimate it, replicate observations at each location would be needed. In practice, only one such realisation is available. To overcome this, the intrinsic hypothesis is introduced, which makes assumptions about the difference $Z(x, y) - Z(x + h_x, y + h_y)$. It states that for the spatial area under investigation (1) the expectation of the difference is zero, i.e. that there is no spatial trend, and (2) the variance of the difference depends only on the distance vector $(h_x, h_y)$ and not the location. For variograms that reach an asymptote (so-called bounded variograms) this hypothesis is equivalent to assuming second-order stationarity of the measures themselves. Under this assumption it is possible to estimate the variogram from the data. For simplicity, further assuming that the variogram is isotropic, i.e. that it only depends on the distance, h, between two locations and not the direction of $(h_x, h_y)$, the variogram can be estimated by the empirical variogram:

$$\hat{\gamma}(h) = \frac{1}{2|N(h)|} \sum_{N(h)} (z_i - z_j)^2$$

where $N(h)$ is the set of distinct location pairs $(i, j)$ that are distance h apart, $|N(h)|$ the number of such pairs and $z_i$ and $z_j$ the observed values at these locations. To achieve reasonable numbers, estimation is carried out at discrete lags and distance bins are allowed for. Care has to be taken when choosing the number of lags, lag increments and bin widths (see Cressie, 1993).

Since the main goal of variogram analysis is to find a parsimonious description of the spatial autocorrelation structure of a variable, a variogram of a particular functional form is usually fitted to the empirical variogram. Suitable monotonically increasing functions for bounded variograms are defined through three parameters: the nugget effect represents microscale variation or measurement error, the sill represents the variance of the outcome measure and the effective range is the distance at which autocorrelation becomes negligible. For an example, the open symbols in the figure show an empirical variogram to which a spherical variogram function (a particular choice of functional form) was fitted. The curve is fully described by the parameter estimates (effective range = 0.8, sill = 1.75, nugget = 0.7) and indicates small-scale spatial autocorrelation up to a distance of 0.8.

Once a variogram function has been identified and fitted, this function is usually considered as known and fed into a prediction routine to specify the interpolation weights (see Lawson, 1998), although simultaneous maximum likelihood estimation of the variogram parameters and possible trend parameters is considered preferable (see Cressie, 1993). SL

[Figure] variogram Spherical variogram model (curve) fitted to an empirical variogram (open symbols) by optimising an objective function (horizontal axis: distance)

Cressie, N. A. C. 1993: Statistics for spatial data. New York: John Wiley & Sons, Inc. Lawson, A. B. 1998: Statistical maps. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd.

velocity charts See GROWTH CHARTS
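For the variogram entry, both the empirical estimate and a spherical model can be sketched briefly. The code below is a minimal illustration under stated assumptions: the variogram is treated as isotropic, pairs are grouped into distance bins of a fixed width, and the spherical model is parameterised so that the sill is the plateau value (nugget included). Other parameterisations are in use. The function names and the tiny dataset are hypothetical.

```python
import math
from collections import defaultdict


def empirical_variogram(coords, values, bin_width):
    """Isotropic empirical variogram, with pairs grouped into distance bins.

    Returns {bin index: half the mean squared difference of pairs in the bin},
    where bin b holds distances in [b * bin_width, (b + 1) * bin_width).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            b = int(h // bin_width)
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return {b: sums[b] / (2 * counts[b]) for b in sorted(counts)}


def spherical_variogram(h, nugget, sill, effective_range):
    """Spherical model; assumes `sill` is the plateau reached at the range."""
    if h <= 0:
        return 0.0
    if h >= effective_range:
        return sill
    r = h / effective_range
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


# Hypothetical observations at three locations on a line.
gamma_hat = empirical_variogram([(0, 0), (1, 0), (2, 0)], [1.0, 3.0, 2.0], 1.0)
# Model value halfway to the range, using the estimates quoted in the entry.
model_value = spherical_variogram(0.4, nugget=0.7, sill=1.75, effective_range=0.8)
print(gamma_hat, model_value)
```

In practice the model parameters would be chosen by minimising some objective function measuring the discrepancy between the model curve and the binned estimates; that fitting step is not shown here.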
visual analogue scale These are scales used to measure a subjective assessment, such as 'amount of pain' or 'level of anxiety', particularly when the assessment is believed to lie along a continuum rather than only taking a discrete set of values. The item consists of a line, typically 10 cm in length, with lowest and highest values indicated by labels at each end. The subject is expected to place a mark on the line to represent his or her assessment. An example follows (with a cross indicating where a subject has placed a mark):

How much pain do you feel?

No pain |--x-----------------| Unbearable pain

The data from a visual analogue scale are recorded by measuring how far along the line from the left end the subject has placed a mark. It is important to remember that, although it is possible to record the data from a visual analogue scale with great accuracy, the value is very subjective. In the example, the subject's response may be recorded as 1.1 (because it is 1.1 cm from the left-hand end of the line), but there is no objective unit on this value. If another subject records a value of 2.1, it is not necessarily the case that this subject experiences more pain than the first subject. However, if the first subject is measured again (say, after a month) and gives a score of 2.1, it is possible to interpret this to mean that the first subject is now experiencing more pain than previously.

It is also important to remember that a visual analogue scale is unlikely to be linear. For example, a distance of 1 cm at one end of the scale does not necessarily represent the same difference as a distance of 1 cm at the other end of the scale. This cautions against the use of standard methods for continuous data; a common recommendation is to analyse ranks of the scores rather than the raw scores. For further details see Altman (1991) and Streiner and Norman (1995). PM
Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall. Streiner, D. L. and Norman, G. R. 1995: Health measurement scales: a practical guide to their development and use, 2nd edition. Oxford: Oxford University Press.
Von Neumann-Morgenstern standard gamble This classic method of measuring preferences in health economics was first presented in von Neumann and Morgenstern (1953). The method uses hypothetical lotteries as a means of measuring people's preferences when faced with a choice between treatments that offer potential benefit in quality of life (see QUALITY OF LIFE MEASUREMENT), but with the trade-off that there is a finite possibility that the patient will not survive the treatment. An individual might be asked to choose between the certainty of surviving for a fixed period in a particular state of ill health and a gamble between surviving for the same period without disability, on the one hand, and immediate death, on the other. The probability of surviving without disability, as opposed to dying, is then varied until the person shows no preference between the certain option and the gamble. This probability then defines the utility of an individual for the disabled state on a scale between 0 and 1, whose ENDPOINTS are death and perfect health.

Because few patients are accustomed to dealing in probabilities, an alternative procedure called the time trade-off technique is often suggested. This begins by estimating the likely remaining years of life for a healthy subject, using actuarial tables, and then the following question is asked: 'Imagine living the remainder of your natural span (an estimated number of years would be given) in your present state. Contrast this with the alternative, that you remain in perfect health for fewer years. How many years would you sacrifice if you could have perfect health?'

An example of the use of the standard gamble approach is given in Petrou and Campbell (1997), who use it to estimate utilities for a range of health states in colorectal carcinoma. They were able to demonstrate that the quality of life benefits of stabilisation in the treatment of advanced metastatic colorectal cancer were rated almost as highly as those of a partial response. They also showed that the benefits of a drug licensed for the treatment of colorectal cancer in patients who had failed an established 5-FU-containing regimen outweighed the short-term impact of toxicity in those patients who achieved at least a stabilisation of their disease. BSE
Petrou, S. and Campbell, N. 1997: Stabilisation in colorectal cancer. International Journal of Palliative Nursing 3, 275-80. Von Neumann, J. and Morgenstern, O. 1953: Theory of games and economic behavior. New York: John Wiley & Sons, Inc.
W

washout period See CROSSOVER TRIALS

wavelet analysis This is a method of representing a function by projecting it on to a collection of basis functions derived from a single wavelet (often referred to as the mother wavelet $\psi(t)$). All basis functions (wavelets) required in the analysis are translated (shifted) and dilated (stretched) versions of the mother wavelet. Unlike the Fourier transform (see Subba Rao, 1998), whose basis functions are derived from sine and cosine waves with persistent oscillations, the basis functions for the wavelet transform are nonzero and oscillate for a short interval. As a result, the wavelet transform simultaneously localises information from a function in both time and frequency. For functions with time-varying characteristics or sudden changes, the wavelet transform has proved quite useful.

Two main flavours of the wavelet transform are the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). They differ by the fact that the transform works with continuous or discrete translations and dilations, respectively, of the wavelet function. The CWT is a highly redundant transform with the family of wavelets being computed via $\psi(at + b)$, where $a$ and $b$ are real numbers. In general, the number of wavelet coefficients is much greater than the number of observations. Popular continuous wavelet functions include the Morlet, Mexican hat and first derivative of a Gaussian density function (top row of the figure). Notice that all the wavelet functions oscillate - i.e. they have both positive and negative values - and the Morlet wavelet is complex valued (the real and imaginary portions are plotted using different line types).

The DWT uses a wavelet that is translated and dilated by discrete values, i.e. of the form $\psi(2^{j}t + k)$, where $j$ and $k$ are integers. The parameter $j$ is commonly referred to as the scale. The DWT may be an orthogonal or biorthogonal transform depending on the wavelet function. Popular orthogonal discrete wavelet functions include the Haar and Daubechies families of wavelets (bottom row of the figure). Although the discrete wavelet functions displayed look continuous, they are derived from two, four and eight unique values, from left to right. Notice that the discrete wavelet functions are not symmetric, except for the Haar wavelet, and are much less smooth when compared to continuous wavelet functions. A compromise between the CWT and DWT is partially achieved by using the translation invariant DWT, where a discrete wavelet function is applied to all possible integer shifts for the data in time via $\psi(2^{j}t + k)$. This results in a redundant transform.

[Figure: wavelet analysis. Continuous and discrete wavelet functions. Top row: Morlet, Mexican hat, first derivative of Gaussian. Bottom row: Haar, Daubechies (length 4), Daubechies (length 8).]
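As an illustration of an orthogonal DWT, the following is a minimal sketch of the Haar transform in plain Python. It is not taken from the entry: the function names, the example signal and the sign convention for the detail coefficients are assumptions made for illustration, and the signal length is assumed to be a power of two.

```python
import math


def haar_step(values):
    """One level of the orthonormal Haar DWT (even-length input assumed)."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) * scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) * scale for i in range(0, len(values), 2)]
    return approx, detail


def haar_dwt(signal):
    """Full Haar decomposition; returns (approximation, details coarse to fine)."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.insert(0, detail)  # keep the coarsest level first
    return approx, details


# Hypothetical signal: flat apart from one abrupt bump.
signal = [4.0, 4.0, 4.0, 4.0, 10.0, 10.0, 4.0, 4.0]
approx, details = haar_dwt(signal)

# An orthogonal DWT is not redundant: as many coefficients as observations.
n_coeffs = len(approx) + sum(len(level) for level in details)

# Orthogonality also means the sum of squares is preserved.
energy_in = sum(v * v for v in signal)
energy_out = sum(c * c for c in approx) + sum(c * c for level in details for c in level)

print(n_coeffs, energy_in, round(energy_out, 9))
```

In this example only two detail coefficients are nonzero, both tied to the edges of the bump, which is the localisation in time that the entry describes.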
The DWT is most commonly used in nonparametric regression (see Hazelton, 1998), using a technique known as wavelet denoising. The key contribution of the wavelet transform is that both smooth and abrupt changes in the signal will result in a few large wavelet coefficients, while the noise will be dispersed throughout the entire vector of wavelet coefficients. Thus, thresholding all wavelet coefficients will eliminate the noise and retain the features of interest. Wavelet denoising has been adapted to cases where the noise exhibits autocorrelation and may be non-Gaussian (e.g. Poisson distributed).

It is very important when performing a wavelet analysis to select the wavelet function (discrete or continuous) that best matches the features present in the function of interest. This will concentrate the function into a small number of wavelet coefficients, simplifying the analysis. The wavelet transform may be applied to data of several dimensions, e.g. in time series analysis (one dimension), image analysis (two dimensions) or spatiotemporal analysis (three or more dimensions). For further details behind the theory and applications to medicine and time series, see Aldroubi and Unser (1996), Chui (1997) and Percival and Walden (2000). BW

Aldroubi, A. and Unser, M. (eds) 1996: Wavelets in medicine and biology. Boca Raton: Chapman & Hall/CRC. Chui, C. K. 1997: Wavelets: a mathematical tool for signal analysis. SIAM monographs on mathematical modeling and computation. Philadelphia: Society for Industrial and Applied Mathematics. Hazelton, M. L. 1998: Nonparametric regression. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd. Percival, D. B. and Walden, A. T. 2000: Wavelet methods for time series analysis. Cambridge: Cambridge University Press. Subba Rao, T. 1998: Fast Fourier transform. In Armitage, P. and Colton, T. (eds), Encyclopedia of biostatistics. Chichester: John Wiley & Sons, Ltd.
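The thresholding idea behind wavelet denoising can be sketched as follows. This is an illustrative sketch only, assuming a Haar wavelet, hard thresholding with an arbitrary fixed threshold, simulated Gaussian noise and a signal whose length is a power of two. None of these choices comes from the entry, and the function names are hypothetical.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """Orthonormal Haar DWT; returns (approximation, details coarse to fine)."""
    if len(x) == 0 or len(x) & (len(x) - 1):
        raise ValueError("length must be a power of two")
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.insert(0, [(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx[0], details


def haar_inverse(approx, details):
    """Invert haar_forward by rebuilding the signal level by level."""
    x = [approx]
    for level in details:  # coarse to fine
        rebuilt = []
        for a, d in zip(x, level):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        x = rebuilt
    return x


def hard_threshold(details, threshold):
    """Set detail coefficients no larger than the threshold (in absolute value) to zero."""
    return [[c if abs(c) > threshold else 0.0 for c in level] for level in details]


def mse(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


rng = random.Random(1)                       # fixed seed for repeatability
clean = [2.0] * 8 + [10.0] * 8               # hypothetical signal with one abrupt change
noisy = [v + rng.gauss(0.0, 0.3) for v in clean]

approx, details = haar_forward(noisy)
roundtrip = haar_inverse(approx, details)    # no thresholding: recovers the noisy data
denoised = haar_inverse(approx, hard_threshold(details, 1.0))

print(mse(noisy, clean), mse(denoised, clean))
```

The abrupt change is carried by a single large coefficient that survives the threshold, whereas the small noise coefficients spread across all levels are set to zero. The threshold of 1.0 is arbitrary here; in practice it would be chosen in relation to the noise level.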
web resources in medical statistics The growth of the internet has revolutionised access to information for everyone from academics to the general public. The potential uses of the internet in the area of medical statistics are vast and here we can only provide an overview of these uses as they currently stand. As the internet continues to develop, its potential for use in this area can only increase. Internet resources in medical statistics can be grouped loosely under the following headings: sources of routinely collected data; reference (online encyclopaedias, dictionaries, lecture notes); email discussion lists; statistical software, reviews and downloads; e-journals; and datasets for use in teaching.

For routinely collected data, the World Health Organization (WHO) website (www.who.int/en) is a good place to start. A number of databases are available to browse, including the WHO Statistical Information System (WHOSIS), containing national statistics on mortality, morbidity, risk factors, service coverage and health systems, and a Global Health Atlas, containing statistics for infectious diseases at country, regional and global levels. Many countries have their own websites for routinely collected data, e.g. National Statistics Online for the UK as a whole, which incorporates health statistics (www.statistics.gov.uk), Scottish Health Statistics (www.isdscotland.org) and the CDC National Centre for Health Statistics in the United States (www.cdc.gov/nchs). Data are routinely available from all of these sites in summary tables and charts, and some sites allow access to some of the data in the form of Excel spreadsheets, which can be customised by the user. Most national sites provide information from censuses, mortality data, morbidity data and information on usage and performance of health services. If any of these links become redundant in the future, many university libraries will continue to maintain up-to-date, accessible links to the latest information on their web pages. For example, the Glasgow University Library website (www.lib.gla.ac.uk and select the link to Maps, Official Publications and Statistics), which can be accessed by anyone, has an excellent page of links to local, national and international data sources and is regularly updated.

The internet is increasingly a source of good reference material in medical statistics. There are many online dictionaries, glossaries, encyclopaedias, sets of lecture notes, lists of statisticians and statistical bodies (e.g. The Royal Statistical Society: www.rss.org.uk), interactive training websites (e.g. Computer-Assisted Statistics Teaching: http://cast.massey.ac.nz), Java applets for use in teaching (e.g. www.stat.sc.edu/~west/javahtml; Rice Virtual Lab in Statistics: http://onlinestatbook.com), free statistical software (e.g. Epi Info: www.cdc.gov/EpiInfo; StatCrunch: www.statcrunch.com; and R: http://cran.r-project.org) and e-journals. It is likely that links to some of these materials, especially lecture notes and teaching materials, will be volatile, but a reasonably up-to-date list (including all of the links listed in this article) should be obtained by a carefully worded search in Google. There are many other widely available search engines, including some that are dedicated to resources in the area of health and medicine, such as Intute (www.intute.ac.uk/medicine) and Medscape (www.medscape.com). Also, Wikipedia (www.wikipedia.org) has many entries on topics related to medical statistics, of varying quality and depth. It is probably safer at present, however, to rely on articles that are available from a reputable university website, rather than those in Wikipedia.

Most leading medical journals have their own websites with useful links. They also generally allow free access to abstracts of papers and sometimes also to the full text of papers. For example, the British Medical Journal (www.bmj.com), The Lancet (www.thelancet.com) and the New England Journal of Medicine (http://content.nejm.org) have published many articles on statistical methods, which are written at a level accessible to nonstatisticians, as well as reports on CLINICAL TRIALS and SYSTEMATIC REVIEWS, which (from a teaching point of view) illustrate the use of these methods.

The development and use of the internet has taken place at such a phenomenal rate that it is difficult to predict what further changes will take place in the future. The one thing that is certain is that its importance as a resource for all those with an interest in medical statistics will increase. WHG
weighted kappa See KAPPA AND WEIGHTED KAPPA

weighted least squares estimator (WLSE) See LEAST SQUARES ESTIMATION

Wennberg's design See RANDOMISATION
when to use which test? This is a question raised by many clinical researchers wanting to grasp basic statistics and hence feel equipped to analyse their own data. However, there are several reasons why it may not be the most appropriate question to ask and it is instructive to begin by considering why not.

First, many medical statisticians, and informed medical journal editors, nowadays prefer analyses by CONFIDENCE INTERVALS instead of HYPOTHESIS TESTS. There are numerous reasons for this, some of which are intrinsic problems with the inferential procedure of hypothesis testing, while others are due to this procedure's historical misuse, overuse or abuse. Thus, the first advice is to counter the question with: 'Are you sure you want to do a hypothesis test?'

Second, the question betrays a false view of statistics, as if there were a formulaic approach with one size fitting all. Unfortunately for those seeking quick answers to apparently simple questions, this is not the case, as doctors may know from their own experiences of making differential diagnoses from outwardly similar signs and symptoms. Equally, medical statistics is a diverse discipline with no two research studies being quite the same, and hence different analytical strategies may apply in similar-looking circumstances.

Third, it may be inappropriate to perform a hypothesis test if the topic of investigation is only raised by the data themselves and not by a pre-existing research question (see PITFALLS IN MEDICAL STATISTICS).

Fourth, research must never be seen as a chase to get a P-VALUE, and preferably one less than 0.05, for among other things this confuses CLINICAL VERSUS STATISTICAL SIGNIFICANCE.

Consequently, we may here not fully answer the question as posed but, with suitable precautions and having decided a test is appropriate, will go as far as identifying the proper test to use in commonly encountered univariate situations. Fuller details concerning methods and applications of each test procedure mentioned can be found under their specific entry. Note also that one can easily find guides about choice of statistical procedures on the internet (see WEB RESOURCES IN MEDICAL STATISTICS); e.g. www.whichtest.info/index.htm may prove helpful.

when to use which test? When to use which test, according to number and type of samples and outcome (or response) variable measured. Some entries contain alternative tests. Last two rows indicate circumstances to use various approaches for assessing association or agreement. The table is not a complete categorisation of all possible tests and data types, but includes those referred to elsewhere under individually named entries

Samples | Nominal (categorical) | Ordered categorical or continuous and non-normal | Continuous and normally distributed
One sample | χ²-test | Kolmogorov-Smirnov test, Sign test | Student's t-test
Two samples, independent | χ²-test (2 × k), Fisher's exact test | Mann-Whitney rank sum test | Unpaired t-test
Two samples, paired | McNemar's test | Wilcoxon signed rank test, Sign test | Paired t-test
Multiple samples (k > 2), independent | χ²-test (r × k) | Kruskal-Wallis test, Jonckheere-Terpstra test | Analysis of variance (ANOVA)
Multiple samples (k > 2), related | Cochran Q-test | Friedman test | Repeated measures ANOVA
Association between two variables | Contingency coefficient | Spearman's rank correlation, Kendall's tau correlation | Pearson product-moment correlation
Agreement between two variables | Kappa coefficient | Weighted kappa coefficient | Limits of agreement
Three factors influence choice of statistical test: the nature of the response (type of data being analysed); the number of groups sampled (one, two or many); and, if more than one, the nature of the sampling (matched or independent). The response or outcome variable can be continuous and approximately normally distributed or dichotomous (a binary 'yes/no' outcome) or intermediate to these in a variety of ways. For example, the response variable could be in ordered categories (see ANALYSIS OF ORDINAL DATA). Otherwise, the response variable could be continuous and non-normally distributed, being skewed or containing OUTLIERS, perhaps. In either of the latter cases it is appropriate to apply one of the many NONPARAMETRIC METHODS. In the special case of the response variable being the time until an event, which may or may not have occurred by the time of analysis (strictly, database closure), then survival methods would be used to handle the CENSORED OBSERVATIONS. Notably, this entails a version of the log-rank test or one of its alternatives and can be stratified or not depending on the structure of the data (see SURVIVAL ANALYSIS). The number of groups being sampled is generally obvious, although sometimes care must be taken about analysing the correct statistical unit. In a cluster randomised study, for instance, it is the clusters that need to be analysed, not the individuals forming the clusters. When repeated measurements are taken, while more sophisticated approaches can be adopted, the simplest is to convert each individual's data into a suitable summary statistic prior to analysis. For example, this statistic might be the AREA UNDER THE CURVE, slope of the regression line or MEAN observation, etc., depending on whatever was previously decided to be the most clinically meaningful. In practice, note that statistical convenience should not be the criterion for choosing among possible summary statistics (see SUMMARY MEASURE ANALYSIS).

Lastly, the relationship among groups is crucial for deciding on the correct testing procedure. In the simplest case involving two groups, one needs to know whether sample data were paired (also known as matched, related or dependent) or unpaired (unmatched, unrelated or independent). This is usually straightforward, e.g. whenever data are collected on the same patients before and after an intervention, or when a pair of organs (ear, eye, hand, kidney, etc.) is measured within the same person, or when twins are studied within a controlled experiment. It can be less clear, however, how best to analyse data in certain CASE-CONTROL STUDIES, matched by sex and age to within a fixed number of years. This is because, often, the purpose of matching is to create broadly comparable groups according to basic demographic status, rather than attempting to achieve precisely well-matched pairs (see MATCHING).

The table shows, according to the three basic criteria, when to use which test method in the simplest cases. For completion, it also indicates which procedure applies when assessing ASSOCIATION or AGREEMENT. Again, further details can be found under individually named entries. However, as emphasised throughout, confidence intervals are preferred to tests and, for more informative analyses still, modelling or regression techniques can be better still. These provide mutually adjusted results for important confounders, an altogether more satisfactory approach to handling data and superior to expecting it to be adequately described by a P-value, as if relationships within the data could possibly be encapsulated by a single number, a hopelessly false ambition. Nevertheless, viewed positively, the right hypothesis test can help to rule out chance as an explanation for discrepant data apparent in one or more random samples, and lead the investigator on towards a fuller analysis of the data collected and, in turn, a deeper clinical understanding. CRP

(See also EXACT METHODS FOR CATEGORICAL DATA, HYPOTHESIS TESTS)
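The three criteria can be read as keys into a lookup, which the sketch below makes explicit. It simply transcribes the table of this entry into a Python dictionary. The key strings and the function name `which_test` are hypothetical labels chosen for illustration, and the lookup is no substitute for the cautions set out above about whether a test is appropriate at all.

```python
# Rows: number and type of samples. Columns: type of outcome variable.
WHICH_TEST = {
    "one sample": {
        "nominal": ["chi-square test"],
        "ordinal or non-normal": ["Kolmogorov-Smirnov test", "sign test"],
        "normal": ["Student's t-test"],
    },
    "two samples, independent": {
        "nominal": ["chi-square test (2 x k)", "Fisher's exact test"],
        "ordinal or non-normal": ["Mann-Whitney rank sum test"],
        "normal": ["unpaired t-test"],
    },
    "two samples, paired": {
        "nominal": ["McNemar's test"],
        "ordinal or non-normal": ["Wilcoxon signed rank test", "sign test"],
        "normal": ["paired t-test"],
    },
    "multiple samples, independent": {
        "nominal": ["chi-square test (r x k)"],
        "ordinal or non-normal": ["Kruskal-Wallis test", "Jonckheere-Terpstra test"],
        "normal": ["analysis of variance (ANOVA)"],
    },
    "multiple samples, related": {
        "nominal": ["Cochran Q-test"],
        "ordinal or non-normal": ["Friedman test"],
        "normal": ["repeated measures ANOVA"],
    },
}


def which_test(design, outcome):
    """Return the candidate tests for a sampling design and outcome type."""
    try:
        return WHICH_TEST[design][outcome]
    except KeyError:
        raise ValueError(f"no table entry for {design!r} / {outcome!r}") from None


# Before-and-after measurements on the same patients, skewed outcome:
candidates = which_test("two samples, paired", "ordinal or non-normal")
print(candidates)
```

Returning a list rather than a single name reflects the fact that several cells of the table offer alternative tests.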
Wilcoxon rank sum test See MANN-WHITNEY RANK SUM TEST
Wilcoxon signed rank test This is a nonparametric version of the paired t-test (see STUDENT'S t-TEST) used for two groups that are either matched or paired. It is more sensitive than the SIGN TEST as it uses the magnitude of the differences between the pairs, not simply the sign of the difference. It gives more weight to pairs that show large differences than those that show small differences. The Wilcoxon signed rank test tests the assumption that the sum of the positive RANKS equals the sum of the negative ranks. The test statistic is the smaller of the sum of the positive and the sum of the negative ranks. The data should be continuous or ordinal in nature. The paired differences should be independent and symmetrical about the true MEDIAN difference.

First, calculate the difference for each pair (variable 1 - variable 2). Then rank the absolute values of the differences, lowest to highest, assigning average ranks to ties in the differences and no rank to zero differences. Find the sum of the ranks for the positive differences (W+) and the sum of the ranks of the negative differences (W-). Using N, the number of differences not including zero differences, find the critical value from tables for W = min(W+, W-); reject the null hypothesis if W is less than or equal to the critical value. If W+ > W-, then variable 1 tends to be greater than variable 2 and vice versa.

In an example of a study, Mcm2 and Ki67 values were compared to see if there was a difference between them in patients with cancer. Data are shown in the table. A plot of the differences shows they are plausibly symmetrical, so the assumption of symmetry appears reasonable.

Wilcoxon signed rank test Mcm2 and Ki67 values, data from a study of patients with cancer

Patient | Mcm2 | Ki67 | Difference | Rank of difference | Signed rank of difference
1 | 14.78 | 14.78 | 0 | |
2 | 7.96 | 8.68 | -0.72 | 1 | -1
3 | 10.89 | 1.57 | 9.32 | 2 | +2
4 | 12.10 | 1.85 | 10.25 | 3 | +3
5 | 18.23 | 5.84 | 12.39 | 4 | +4
6 | 16.40 | 3.04 | 13.36 | 5 | +5
7 | 18.02 | 3.96 | 14.06 | 6 | +6
8 | 23.35 | 8.16 | 15.19 | 7 | +7
9 | 26.70 | 8.40 | 18.30 | 8 | +8

The results are W = min(W+, W-) = 1 with N = 8. From tables (N = 8, α = 0.05) the critical value is 3. As 1 is less than 3, there is sufficient evidence to reject the null hypothesis. Therefore there is a difference between Mcm2 and Ki67 values. Mcm2 values tend to be higher than Ki67 values. For further details see Pett (1997), Siegel and Castellan (1988) and Swinscow and Campbell (2002). SLV

Pett, M. A. 1997: Nonparametric statistics for health care research. Thousand Oaks: Sage. Siegel, S. and Castellan, N. J. 1988: Nonparametric statistics for the behavioral sciences, 2nd edition. New York: McGraw-Hill. Swinscow, T. D. V. and Campbell, M. J. 2002: Statistics at square one, 10th edition. London: BMJ Books.
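The ranking steps described in this entry can be checked against the Mcm2 and Ki67 data with a short sketch. The function name `signed_rank_sums` is hypothetical, and the critical value of 3 is simply the tabulated value quoted in the entry for N = 8 and α = 0.05, not something the code derives.

```python
def signed_rank_sums(x, y):
    """Return (W+, W-, N) for paired samples, dropping zero differences
    and giving tied absolute differences their average rank."""
    diffs = [round(a - b, 10) for a, b in zip(x, y)]
    nonzero = sorted((d for d in diffs if d != 0), key=abs)
    w_plus = w_minus = 0.0
    i = 0
    while i < len(nonzero):
        j = i
        while j < len(nonzero) and abs(nonzero[j]) == abs(nonzero[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of ranks i+1, ..., j
        for d in nonzero[i:j]:
            if d > 0:
                w_plus += avg_rank
            else:
                w_minus += avg_rank
        i = j
    return w_plus, w_minus, len(nonzero)


# Data from the table in this entry (patients 1 to 9).
mcm2 = [14.78, 7.96, 10.89, 12.10, 18.23, 16.40, 18.02, 23.35, 26.70]
ki67 = [14.78, 8.68, 1.57, 1.85, 5.84, 3.04, 3.96, 8.16, 8.40]

w_plus, w_minus, n = signed_rank_sums(mcm2, ki67)
w = min(w_plus, w_minus)

critical_value = 3  # from tables, N = 8, alpha = 0.05 (as quoted in the entry)
reject_null = w <= critical_value
print(w_plus, w_minus, n, reject_null)  # 35.0 1.0 8 True
```

Patient 1 has a zero difference and so contributes no rank, which is why N is 8 rather than 9.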
WinBUGS See BUGS AND WINBUGS

WLSE Abbreviation for weighted least squares estimator. See LEAST SQUARES ESTIMATION
Y

Yates' correction This is an adjustment made to a CHI-SQUARE TEST when the number of observations is small. Yates' correction is an example of a continuity correction and is designed specifically for 2 × 2 frequency tables. The chi-square test uses the CHI-SQUARE DISTRIBUTION to determine whether a set of observed counts from a study (the number of observations in each cell of a frequency table) differ significantly from the expected counts predicted by a hypothesis. This use of the chi-square distribution is an approximation based on the use of the NORMAL DISTRIBUTION to approximate the distribution of the number of observations in each cell of the frequency table. The use of the normal distribution is only an approximation because the number of observations in a cell of a frequency table is a discrete value (it can only take nonnegative integer values: 0, 1, 2, ...) whereas the normal distribution is continuous (it can take any value). When the number of observations is large, this difference becomes irrelevant (the approximation becomes very good), but when the number of observations is small the approximation can mean that the standard chi-square test result is invalid (the P-VALUE reported by the test may be too low).

Yates' correction is an attempt to take account of the approximation in the calculation of the chi-square test statistic. The effect of the correction is to decrease the size of $\chi^2$, which increases the P-value from the test. This means that Yates' correction will always give a more conservative test; it will be less likely to report a significant result. There is no universal agreement on when Yates' correction should be applied. With ever-greater processing speeds and memory capacity it is now usually feasible to use exact methods instead, which avoid approximations altogether (see FISHER'S EXACT TEST). For further details see Altman (1991). PM

Altman, D. G. 1991: Practical statistics for medical research. London: Chapman & Hall.
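The effect of the correction on a 2 × 2 table can be illustrated numerically. The entry does not give the formula; the sketch below assumes the usual form of the continuity correction, in which 0.5 is subtracted from each absolute difference between observed and expected counts before squaring. The table of counts is hypothetical, and the P-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_2x2(table, yates=False):
    """Chi-square statistic and P-value (1 df) for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            stat += diff ** 2 / expected
    # Upper tail of chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


table = [[8, 2], [3, 7]]  # hypothetical counts, 20 observations in all

stat_plain, p_plain = chi_square_2x2(table)
stat_yates, p_yates = chi_square_2x2(table, yates=True)
print(round(stat_plain, 3), round(p_plain, 3))
print(round(stat_yates, 3), round(p_yates, 3))
```

For these counts the uncorrected statistic is about 5.05 and the corrected one about 3.23, so the uncorrected test is significant at the 5% level while the corrected test is not, which is the conservative behaviour the entry describes.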
:JI'~ 1Wsi~.~..,.ar~~~~
.,,·ineiisumd ..
.... to ....UIY.. ~.n.s diIraaIl scales. s..pJe waJ.sX amCXlllYCltelliO HCGIa Z
.••. 1IIa i'armala:
QU~~.Z~of~';'61~.~" . . . ....100,25111 caiIiIe:. ... positi_Z:~~"'10 qUaDtiIc:s iii 11M: ........f.,of the diIliibaliaa. ;,. .LMS eakmds .MmIoD ' ".. . .. . .WIiDD of die Haire .. . .Ia. -
. . . fbrlDWNESS:
$$Z = \frac{X - \mu}{\mathrm{SD}}$$
whaa ... ~JI·aad'SWCDARD DBVL\1ION (SO) Ix' X'. ;aIui_ either,~'1iIIIIII* or ...cid.er :,..... ·'Ibe.DICIIII and SD ot%IR ......y.cIose to ...... CIIIC
...am.
__ "~lhcmall..arg;s_caefti:lltol'~6~
~. SD.~ by Ihe ...., ~ L "",,,~.pDWa'. ~'·Io'~ Ihe~ .. Cs6ma1ed by
DIII'mIinaN,
. . . .~.1IICIIIacIs. Whc:a. L-I(i.e. a nanna1 disui. . . .) dd5....IUlcs Io'~ ftnt .•IliiIiOn~ TIC'
,aq.......
·iI: . . . ., coI:·. JlCItIW.
Lx$.
MCaI'e"
n:Spec:tively.;.IIi·.caaiatorLiNDRlBItJ5SiiJNihe ~ Jf; IIIHi _Jiic
10 ·11........ ....... ..... distriIMJIJaa
Z_(~/M)'-l
lhe~_ "'e.xpR~~ •.• ~_.~ ar